Abstract


This white paper describes how we use the Datagen Platform to create a synthetic face dataset that:

  1. Achieves comparable results to other real and synthetic datasets on the task of landmark detection.
  2. Boosts model performance while reducing the amount of real data required.

In this paper, we take a data-centric approach: we iteratively improve performance by optimizing our data rather than our model.

We describe the domain gap that naturally arises when training on synthetic data and testing on real data. We encounter both a visual domain gap - the images differ visually - and a label domain gap. The label domain gap is caused by the differences between human 2D labeling and 2D labels derived from a 3D model (available only in synthetic data). Additionally, 2D human labeling is prone to more noise than the pixel-perfect annotations available with synthetic data. In this work, we describe the measures we took to mitigate these gaps.
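To make the label domain gap concrete: synthetic 2D landmarks are typically obtained by projecting the 3D face model's landmark points through the known virtual camera, which is why they are pixel-perfect. The sketch below illustrates this with a basic pinhole camera model; the function name, intrinsics, and point values are our own illustrative assumptions, not Datagen's actual parameters or pipeline.

```python
import numpy as np

def project_landmarks(points_3d, fx, fy, cx, cy):
    """Project 3D landmark points (N, 3), given in camera coordinates,
    to 2D pixel coordinates (N, 2) with a pinhole camera model
    (focal lengths fx, fy and principal point cx, cy; no lens distortion)."""
    pts = np.asarray(points_3d, dtype=float)
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    u = fx * x / z + cx
    v = fy * y / z + cy
    return np.stack([u, v], axis=1)

# Illustrative values: a point on the optical axis projects to the principal point.
uv = project_landmarks([[0.0, 0.0, 2.0], [0.1, 0.0, 2.0]], fx=100, fy=100, cx=64, cy=64)
```

Because the 3D landmark positions and the camera are known exactly in a synthetic scene, this projection introduces no annotation noise, unlike human labeling of real images.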

We compare two strategies for combining different amounts of real and synthetic data:

Methodology


Differences in how the data was obtained or labeled, the position of the camera, and the distribution of populations within the datasets are all examples of domain gaps one may need to bridge.

Visual domain gap

Our source and target domains (synthetic training data and real test data, respectively) come from different distributions. In tackling this issue, we learned that the preprocessing stage is a key factor with a strong influence on generalization to the target domain.

The preprocessing steps we experimented with included initial cropping around the face area and different augmentations.
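As an illustration of the initial cropping step, the sketch below expands a face bounding box by a relative margin, clips it to the image bounds, and shifts the landmark coordinates into the crop's frame. The function name, margin value, and sample inputs are hypothetical, chosen only to demonstrate the idea; they are not the parameters used in our pipeline.

```python
import numpy as np

def crop_face(image, bbox, landmarks, margin=0.2):
    """Crop `image` (H, W, C) around `bbox` = (x0, y0, x1, y1), enlarged by
    `margin` of the box size on each side, and translate `landmarks` (N, 2)
    into the coordinate frame of the crop."""
    h, w = image.shape[:2]
    x0, y0, x1, y1 = bbox
    mx, my = margin * (x1 - x0), margin * (y1 - y0)
    # Expand the box by the margin, then clip to the image bounds.
    x0 = max(int(x0 - mx), 0)
    y0 = max(int(y0 - my), 0)
    x1 = min(int(x1 + mx), w)
    y1 = min(int(y1 + my), h)
    crop = image[y0:y1, x0:x1]
    shifted = np.asarray(landmarks, dtype=float) - [x0, y0]
    return crop, shifted

# Illustrative usage: a 40x40 box in a 100x100 image, expanded by 20% per side.
img = np.zeros((100, 100, 3), dtype=np.uint8)
crop, lm = crop_face(img, bbox=(30, 30, 70, 70), landmarks=[[50, 50]], margin=0.2)
```

Applying the same crop logic to both synthetic training images and real test images helps keep the face framing consistent across domains, which is one way such preprocessing can narrow the visual domain gap.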