Facial Expression Retargeting from Human to Avatar Made Easy

What?

Previously, transferring expressions from a human to an avatar required a tedious 3D modeling process, and the result depended on the modeler's experience.
To address this, the paper proposes a solution that transfers expressions between the two domains, human and avatar, through nonlinear expression embedding and expression domain translation.
- Use VAEs to build low-dimensional latent spaces for human and avatar facial expressions
- Impose geometric and perceptual constraints to express the similarity between the two latent spaces
With the proposed solution, non-professional users can also perform high-quality facial expression retargeting with little time and effort.

[Figure: blendshape-based expression transfer]

The figure above shows the blendshape-based expression transfer method.
- The semantics are fundamentally identical between human and avatar, because the blendshape weights are copied as-is from the source parametric space to the target parametric space
- However, building avatar blendshapes that match the human references is very labor-intensive and requires highly skilled professional animators
- The quality of the retargeting results is inevitably influenced by the animators' aesthetic preferences
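
The weight copy described above can be sketched as follows. This is only a minimal illustration, assuming a standard linear blendshape model (neutral mesh plus weighted target offsets); the mesh sizes, the number of blendshapes `K`, and the helper name `blend` are hypothetical and not taken from the paper.

```python
import numpy as np

def blend(neutral, targets, weights):
    """Linear blendshape model: neutral + sum_i w_i * (target_i - neutral).

    neutral: (V, 3) vertices, targets: (K, V, 3), weights: (K,)
    """
    deltas = targets - neutral[None, :, :]                   # per-blendshape offsets
    return neutral + np.tensordot(weights, deltas, axes=1)   # (V, 3)

rng = np.random.default_rng(0)
K = 4  # number of semantically matched blendshapes (hypothetical)

# Toy human and avatar rigs with different vertex counts but the same K.
human_neutral = rng.normal(size=(5, 3))
human_targets = human_neutral + rng.normal(scale=0.1, size=(K, 5, 3))
avatar_neutral = rng.normal(size=(7, 3))
avatar_targets = avatar_neutral + rng.normal(scale=0.1, size=(K, 7, 3))

# Weights estimated on the human side are copied unchanged to the avatar.
weights = np.array([0.8, 0.0, 0.3, 0.0])
human_mesh = blend(human_neutral, human_targets, weights)
avatar_mesh = blend(avatar_neutral, avatar_targets, weights)
```

The copy itself is trivial; the cost lies in authoring `avatar_targets` so that each one matches the meaning of the corresponding human blendshape.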

[Figure: overall framework]

The overall framework of the paper is shown above.
- Embed the human and avatar expressions ($M_{human}, M_{avatar}$) into latent spaces ($S_{human}, S_{avatar}$)
- The idea is to perform domain translation between $S_{human}$ and $S_{avatar}$
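
As a rough sketch of how these pieces compose at inference time: encode the human expression, translate the latent code, then decode on the avatar side. Everything here is a placeholder. The random linear maps stand in for trained networks, and the dimensions and function names (`encode_human`, `translate`, `decode_avatar`, `retarget`) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D_HUMAN, D_AVATAR = 30, 24   # flattened expression feature sizes (hypothetical)
Z_HUMAN, Z_AVATAR = 8, 6     # latent sizes of S_human and S_avatar (hypothetical)

# Random weights stand in for trained networks.
W_enc = rng.normal(size=(Z_HUMAN, D_HUMAN))
W_map = rng.normal(size=(Z_AVATAR, Z_HUMAN))
W_dec = rng.normal(size=(D_AVATAR, Z_AVATAR))

def encode_human(m_human):
    """M_human -> S_human (placeholder for the human expression encoder)."""
    return W_enc @ m_human

def translate(s_human):
    """S_human -> S_avatar (placeholder for the domain translation)."""
    return np.tanh(W_map @ s_human)

def decode_avatar(s_avatar):
    """S_avatar -> M_avatar (placeholder for the avatar decoder)."""
    return W_dec @ s_avatar

def retarget(m_human):
    return decode_avatar(translate(encode_human(m_human)))

m_avatar = retarget(rng.normal(size=D_HUMAN))
```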

A VAE makes a nonlinear expression representation possible.
A separate VAE network must be trained for each avatar character. The training expressions are randomly generated. This yields a latent space for the avatar's expressions.
- The latent space captures the avatar's most valid expressions
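
To make the per-avatar VAE step more concrete, here is a minimal forward pass and loss in NumPy. It is a sketch under stated assumptions, not the paper's architecture: random expressions are assumed to be random rig weights in [0, 1], the network is a small one-hidden-layer MLP, and the loss is the usual reconstruction term plus KL divergence to a standard normal prior. Training (gradient updates) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, Z, H = 256, 10, 3, 16  # samples, rig parameters, latent size, hidden size (all hypothetical)

# Randomly generated avatar expressions: random rig weights in [0, 1] (assumption).
X = rng.uniform(0.0, 1.0, size=(N, K))

def init(n_in, n_out):
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

W1, b1 = init(K, H)
W_mu, b_mu = init(H, Z)
W_lv, b_lv = init(H, Z)
W2, b2 = init(Z, H)
W3, b3 = init(H, K)

def encode(x):
    """Return the mean and log-variance of q(z | x)."""
    h = np.tanh(x @ W1 + b1)
    return h @ W_mu + b_mu, h @ W_lv + b_lv

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    return np.tanh(z @ W2 + b2) @ W3 + b3

def vae_loss(x):
    mu, logvar = encode(x)
    x_hat = decode(reparameterize(mu, logvar))
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    # KL(q(z|x) || N(0, I)), averaged over the batch
    kl = np.mean(-0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1))
    return recon + kl, recon, kl

loss, recon, kl = vae_loss(X)
```

After training, `encode(X)[0]` would give the low-dimensional codes that make up the avatar's latent space.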
Human faces are decomposed into identity and expression using disentangled 3D face representation learning [1].

[1] Z.-H. Jiang, Q. Wu, K. Chen, and J. Zhang, “Disentangled representation learning for 3D face shape,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 11957–11966.
- The human face embedding borrows the training scheme of the paper above, which encodes 3D face shape nonlinearly and disentangles identity from expression
- In more detail, the 3D face mesh is represented by vertex-based deformation rather than Euclidean coordinates
- Thanks to this training, the human expression representation in this paper is not affected by differences in identity

Once the embeddings are complete, a mapping function $F: S_{human} \rightarrow S_{avatar}$ must be defined for the domain translation from human to avatar.
Two constraints are considered for an accurate transfer:
- Geometric consistency constraint: the original and retargeted expressions should be geometrically similar
- Perceptual consistency constraint: the original and retargeted expressions should have similar semantics from the viewpoint of human visual perception
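
One way to picture how two such constraints could be combined into a single training objective for the mapping is sketched below. The concrete forms are assumptions chosen for illustration and are not confirmed by these notes: the geometric term compares displacements of corresponding points from each face's neutral pose, and the perceptual term is a triplet margin loss built from a "more similar / less similar" judgment. The names `geometric_loss`, `triplet_loss`, `total_loss`, and the weight `lam` are hypothetical.

```python
import numpy as np

def geometric_loss(human_disp, avatar_disp):
    """Mean squared difference between corresponding point displacements, each (P, 3)."""
    return float(np.mean(np.sum((human_disp - avatar_disp) ** 2, axis=1)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on distances: the perceptually similar sample should be closer than the dissimilar one."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return float(max(0.0, d_pos - d_neg + margin))

def total_loss(human_disp, avatar_disp, anchor, positive, negative, lam=1.0):
    """Geometric term plus a weighted perceptual term."""
    return geometric_loss(human_disp, avatar_disp) + lam * triplet_loss(anchor, positive, negative)

# Toy values: identical displacements and a well-separated triplet.
disp = np.array([[0.0, 0.1, 0.0], [0.2, 0.0, 0.0]])
anchor = np.array([0.0, 0.0])
similar = np.array([0.1, 0.0])
dissimilar = np.array([1.0, 0.0])
loss = total_loss(disp, disp, anchor, similar, dissimilar)
```

In this toy case both terms vanish: the displacements match exactly, and the similar sample is already closer to the anchor than the dissimilar one by more than the margin.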