Abstract:
Face frontalization is a challenging research topic with strong motivations and broad application prospects. Owing to the diversity and complexity of face images, face recognition and related tasks often perform poorly on non-frontal views. Frontalization techniques can improve the accuracy and reliability of recognition and are applicable in many areas such as digital art, entertainment, and gaming, where adjusting the angle and pose of face images offers users more personalized choices and richer entertainment experiences. In recent years, advances in deep learning have solved more and more face recognition problems. However, most face frontalization networks must be trained on paired face datasets, which are difficult to obtain. Moreover, such trained models generalize poorly, and facial attributes other than pose are also altered in the generated images. To address these problems, this work proposes a novel Self-Supervised Face Frontalization Model (SFM) based on the StyleGAN generator, which frontalizes faces by editing the latent-space encoding. To synthesize high-quality frontal face images, both a Contrastive Language-Image Pretraining (CLIP) module and an Adaptive Enhancement Module (AEM) are employed during latent-space editing, maximizing the modification of facial pose while leaving other facial attributes unchanged. Experimental results show that the proposed method can generate high-quality, complete frontal face images without the need for paired face datasets. Ablation experiments verify the effectiveness of the ID Encoder and the AEM in improving the model's performance, and both qualitative and quantitative comparisons demonstrate that the proposed method is superior to existing approaches.