Abstract:
Face frontalization is a challenging research topic with strong motivations and broad application prospects. Owing to the diversity and complexity of face images, face recognition and related tasks often perform poorly on non-frontal views. Frontalization techniques can improve the accuracy and reliability of recognition and are applicable in many areas such as digital art, entertainment, and gaming, where adjusting the angle and pose of face images offers users more personalized choices and richer entertainment experiences. In recent years, advances in deep learning have solved more and more face recognition problems. However, most face frontalization networks must be trained on paired face datasets, which are difficult to obtain. Moreover, such trained models generalize poorly, and facial attributes other than pose are also altered in the generated images. To address these problems, this work proposes a novel Self-Supervised Face Frontalization Model (SFM) based on the StyleGAN generator, which frontalizes faces by editing the latent-space encoding. To synthesize high-quality frontal face images, both a Contrastive Language-Image Pretraining (CLIP) module and an Adaptive Enhancement Module (AEM) are employed during latent-space editing, maximizing the modification of facial pose while leaving other facial attributes unchanged. Experimental results show that the proposed method can generate high-quality, complete frontal face images without the need for paired face datasets. Ablation experiments verify the effectiveness of the ID Encoder and the AEM in improving the model's performance, and both qualitative and quantitative comparisons demonstrate that the proposed method is superior to existing approaches.