Abstract:
To achieve accurate and effective image emotion transfer, a region-context-aware enhancement network guided by emotion modeling relationships, EMR-RCPN, is proposed. Under the guidance of external emotion knowledge, multiple losses are combined to achieve accurate image-induced emotion transfer. In this network, a novel Regional Context Perception Block (RCPB) is introduced to enhance the Twins-SVT encoder. It extracts contextual features at different receptive fields through Locally-grouped Self-Attention, Axis-wise Self-Attention, and Neighbor Self-Attention, and adaptively fuses them through cross-attention to comprehensively integrate image information. On this basis, information lost during deep-feature fusion is restored through residual connections to more accurately preserve the image content. Additionally, a novel Emotion-wheel Guided Module (EGM) is proposed, which uses the emotion distribution in the emotion wheel to guide the model toward accurate image emotion transfer. To evaluate the model's emotion transfer ability accurately and effectively, an innovative Emotion Transfer Comprehensive Metric (ETCM) is proposed, which assesses the effect of image emotion transfer from multiple perspectives, including emotion categories, emotion polarities, and the positions of emotions in the emotion wheel. To support this evaluation, a new emotion transfer dataset, FATE, is constructed from four widely used emotion datasets with different styles: FI, Twitter-LDL, Emotion6, and Artphoto. Extensive experiments on FATE validate the effectiveness of the proposed method and demonstrate its superiority over the compared methods.