Abstract
Recent years have witnessed rapid advances in facial image editing tasks such as face swapping and face reenactment. However, existing methods are confined to handling one specific task at a time. Moreover, for facial video editing, previous methods either apply transformations frame by frame or use multiple frames in a concatenated or iterative fashion, which leads to noticeable visual flicker. In this paper, we propose a unified, temporally consistent facial video editing framework termed UniFaceGAN. Built on a 3D reconstruction model and a simple yet efficient dynamic training sample selection mechanism, our framework handles face swapping and face reenactment simultaneously. To enforce temporal consistency, we introduce a novel 3D temporal loss based on barycentric coordinate interpolation. In addition, we propose a region-aware conditional normalization layer to replace the traditional AdaIN or SPADE layers and synthesize more context-harmonious results. Compared with state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.
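To make the temporal constraint concrete, the sketch below illustrates the general idea behind barycentric coordinate interpolation: a point on a mesh triangle keeps the same barycentric weights as the mesh deforms, which yields dense correspondences between consecutive frames. This is not code from the paper; the function names and the loss pseudocode in the comments are hypothetical, and the actual 3D temporal loss in UniFaceGAN may differ in detail.

```python
import numpy as np

def barycentric_weights(p, tri):
    """Barycentric coordinates of a 2D point p w.r.t. a triangle tri (3x2)."""
    a, b, c = tri
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    w1 = (d11 * d20 - d01 * d21) / denom
    w2 = (d00 * d21 - d01 * d20) / denom
    return np.array([1.0 - w1 - w2, w1, w2])

def track_point(weights, tri_next):
    """Re-project a surface point into the next frame: the barycentric
    weights stay fixed while the triangle's projected vertices move."""
    return weights @ tri_next

# Toy example: the same mesh surface point observed in two frames.
tri_t  = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # projected triangle at t
tri_t1 = tri_t + np.array([0.10, 0.05])                   # same triangle at t+1
w    = barycentric_weights(np.array([0.3, 0.3]), tri_t)
p_t1 = track_point(w, tri_t1)   # corresponding pixel location at frame t+1

# A temporal loss in this spirit penalizes differences between generated
# frames at corresponding locations, e.g.
#   L_temp = mean over sampled points of |G(x_t)[p_t] - G(x_{t+1})[p_t1]|
```

Similarly, one plausible reading of a "region-aware conditional normalization" layer is SPADE-style modulation where the scale and shift are predicted per semantic region rather than per pixel. The PyTorch module below is a minimal sketch under that assumption; the class name, the style code, and the one-hot parsing mask are all assumptions for illustration, not the published design.

```python
import torch
import torch.nn as nn

class RegionAwareNorm(nn.Module):
    """Hypothetical region-aware conditional normalization: one (gamma, beta)
    pair per semantic region, broadcast onto pixels via a parsing mask."""
    def __init__(self, channels, num_regions, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.gamma = nn.Linear(style_dim, num_regions * channels)
        self.beta = nn.Linear(style_dim, num_regions * channels)
        self.num_regions, self.channels = num_regions, channels

    def forward(self, x, style, mask):
        # x: (B, C, H, W) features; style: (B, style_dim) conditioning code
        # mask: (B, R, H, W) one-hot semantic regions (skin, eyes, mouth, ...)
        b = x.size(0)
        g  = self.gamma(style).view(b, self.num_regions, self.channels, 1, 1)
        bt = self.beta(style).view(b, self.num_regions, self.channels, 1, 1)
        gamma_map = (mask.unsqueeze(2) * g).sum(dim=1)    # (B, C, H, W)
        beta_map  = (mask.unsqueeze(2) * bt).sum(dim=1)
        return self.norm(x) * (1.0 + gamma_map) + beta_map
```

Compared with AdaIN (one global scale/shift per feature map) and SPADE (dense per-pixel modulation maps), a per-region scheme sits in between, which would be consistent with the "context-harmonious" claim; again, this is a sketch rather than the published layer.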
| Original language | English |
| --- | --- |
| Article number | 9464699 |
| Pages (from-to) | 6107-6116 |
| Number of pages | 10 |
| Journal | IEEE Transactions on Image Processing |
| Volume | 30 |
| Early online date | 24 Jun 2021 |
| DOIs | |
| Publication status | Published - 7 Jul 2021 |
Keywords
- Faces
- Three-dimensional displays
- Training
- Task analysis
- Image reconstruction
- Optical losses
- Solid modeling