Abstract:
To achieve accurate and effective image emotion transfer, we propose an Emotional Modeling Relationship guided Regional Context Perception enhanced Network (EMR-RCPN). Guided by external emotion knowledge and jointly optimized through multiple losses, it accurately transfers the emotion of an image. Within this network, we introduce a novel Regional Context Perception Block (RCPB) to enhance the Twins-SVT encoder. The RCPB extracts contextual features over different receptive fields via Locally-grouped Self-Attention, Axis-wise Self-Attention, and Neighbor Self-Attention, and then fuses them adaptively through cross-attention to comprehensively integrate image information. Furthermore, details lost during the fusion of deep features are restored through residual connections, which more faithfully preserves image content. We also propose a novel Emotion-wheel Guided Module (EGM), which uses the emotion distribution on the emotion wheel to guide the model toward accurate emotion transfer. To evaluate a model's ability to transfer image emotion accurately and comprehensively, we propose the Emotion Transfer Comprehensive Metric (ETCM), which assesses transfer quality from multiple perspectives: emotion categories, emotion polarities, and positions on the emotion wheel. To enable a more thorough evaluation, we further construct a new emotion transfer dataset, FATE, built from four widely used and stylistically diverse emotion datasets: FI, Twitter-LDL, Emotion6, and Artphoto. Extensive experiments on FATE demonstrate the effectiveness of the proposed method, which outperforms competing approaches.