3D Face Reconstruction with Position Map Regression Networks

Favio Vázquez
Published in Heartbeat · 6 min read · May 24, 2018


Position Map Regression Networks (PRN) is a method to jointly regress dense alignment and 3D face shape in an end-to-end manner. In this article, I’ll provide a short explanation and discuss its applications in computer vision.

https://github.com/YadiraF/PRNet

When I was a child, I imagined that (due to movies, of course), in the future, we’d be able to have these crazy holograms where you could see people talking to you as if they were there. These kinds of applications for computer vision suggest we aren’t that far from achieving something similar.

In the last few decades, many important research groups in computer vision have made amazing advances in 3D face reconstruction and face alignment, mostly using CNNs as the de facto network for the task. However, the performance of these methods is restricted by the low-dimensional solution space defined by the face model templates they use for mapping.

Position Map Regression Networks (PRN)

The architecture of PRN. The green rectangles represent the residual blocks, and the blue ones represent the transposed convolutional layers.

In a recent paper, Yao Feng and others proposed an end-to-end method called Position Map Regression Networks (PRN) to jointly predict dense alignment and reconstruct 3D face shape. They claim their method surpasses all previous attempts at both 3D face alignment and reconstruction on multiple datasets.

Specifically, they designed a UV position map: a 2D image recording the 3D coordinates of a complete facial point cloud while maintaining the semantic meaning of each point in UV space. They then train a simple encoder-decoder network with a weighted loss that focuses more on discriminative regions to regress the UV position map from a single 2D facial image.
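To make the architecture concrete, here's a minimal TensorFlow/Keras sketch of an encoder-decoder of this kind: residual blocks downsample the cropped face image, and transposed convolutions upsample back to a position map of the same resolution. The layer counts, filter sizes, and the residual_block / build_prn_sketch names are my own illustrative assumptions, not the authors' exact configuration.

import tensorflow as tf
from tensorflow.keras import layers, Model

def residual_block(x, filters, stride=2):
    # Illustrative residual block: a 1x1 projection shortcut plus two 4x4 convolutions.
    shortcut = layers.Conv2D(filters, 1, strides=stride, padding="same")(x)
    y = layers.Conv2D(filters // 2, 4, strides=stride, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 4, padding="same")(y)
    return layers.ReLU()(layers.add([shortcut, y]))

def build_prn_sketch(size=256):
    # Input: a cropped face image. Output: a UV position map of the same resolution,
    # where each pixel stores the (x, y, z) coordinate of one point on the face surface.
    image = layers.Input((size, size, 3))
    x = layers.Conv2D(16, 4, padding="same", activation="relu")(image)

    # Encoder: residual blocks halve the resolution (256 -> 8) while widening the channels.
    for filters in (32, 64, 128, 256, 512):
        x = residual_block(x, filters)

    # Decoder: transposed convolutions upsample back to 256 x 256.
    for filters in (256, 128, 64, 32, 16):
        x = layers.Conv2DTranspose(filters, 4, strides=2, padding="same", activation="relu")(x)

    # Three output channels hold the regressed 3D coordinates, squashed to [0, 1].
    uv_position_map = layers.Conv2D(3, 4, padding="same", activation="sigmoid")(x)
    return Model(image, uv_position_map)

model = build_prn_sketch()
model.summary()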

The qualitative results of their method. Odd rows: alignment results (only 68 key points are plotted for display). Even rows: 3D reconstruction results (reconstructed shapes are rendered with head light for better views).

Their contributions can be summarized here (from the same paper):

- For the first time, we solve the problems of face alignment and 3D face reconstruction together in an end-to-end fashion, without the restriction of low-dimensional solution space.

- To directly regress the 3D facial structure and dense alignment, we develop a novel representation called UV position map, which records the position information of a 3D face and provides dense correspondence to the semantic meaning of each point on the UV space.

- For training, we propose a weight mask that assigns a different weight to each point on the position map and compute a weighted loss. We show that this design helps improve the performance of our network. (A sketch of such a weighted loss follows this list.)

- We finally provide a lightweight framework that runs at over 100 FPS to directly obtain the 3D face reconstruction and alignment result from a single 2D facial image.

- Comparison on the AFLW2000–3D and Florence datasets shows that our method achieves more than 25% relative improvements over other state-of-the-art methods on both 3D face reconstruction and dense face alignment.
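To illustrate the weight-mask idea from the third contribution, here is a minimal sketch (my own, not the authors' exact weighting scheme) of a weighted mean-squared-error loss over the UV position map. A precomputed mask gives the largest weight to the 68 key points, a large weight to the eye/nose/mouth region, a smaller weight to the rest of the face, and zero to the neck and background; the exact values below are illustrative assumptions.

import tensorflow as tf

def weighted_position_map_loss(weight_mask):
    # weight_mask: a (256, 256, 1) array of per-pixel weights, e.g. largest on the
    # 68 key-point locations, large on the eye/nose/mouth region, smaller on the
    # rest of the face, and zero on the neck and background (illustrative values).
    mask = tf.constant(weight_mask, dtype=tf.float32)

    def loss(y_true, y_pred):
        # Squared error on the regressed 3D coordinates, scaled per pixel by the mask.
        squared_error = tf.reduce_sum(tf.square(y_true - y_pred), axis=-1, keepdims=True)
        return tf.reduce_mean(squared_error * mask)

    return loss

# Usage sketch: model.compile(optimizer="adam", loss=weighted_position_map_loss(mask_image))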

Implementation

Their code is implemented in Python using TensorFlow. You can take a look at the official repo here: https://github.com/YadiraF/PRNet

If you want to run their examples, you'll need the following:

  • Python 2.7 (numpy, skimage, scipy)
  • TensorFlow >= 1.4
  • dlib (for face detection; not needed if you can provide the bounding box information yourself)
  • OpenCV 2 (for displaying results)

The trained model can be downloaded at BaiduDrive or GoogleDrive.

The code is still under development, and the authors plan to add more functionality in the near future.


Applications

Basics (evaluated in the paper)

  • Face Alignment: dense alignment of both visible and non-visible points (including the 68 key points).
  • 3D Face Reconstruction: get the 3D vertices and corresponding colors from a single image, and save the result as mesh data (.obj), which can be opened with Meshlab or Microsoft 3D Builder. Note that the texture of non-visible areas is distorted due to self-occlusion (see the sketch below for writing such a colored mesh to an .obj file).
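As a rough illustration of the mesh output (my own sketch, not PRNet's actual export code), per-vertex positions and colors can be written to a Wavefront .obj file that Meshlab understands:

import numpy as np

def save_colored_obj(path, vertices, colors, triangles=None):
    # vertices: (N, 3) array of 3D coordinates; colors: (N, 3) RGB values in [0, 1];
    # triangles: optional (M, 3) array of 0-based vertex indices forming the faces.
    with open(path, "w") as f:
        for (x, y, z), (r, g, b) in zip(vertices, colors):
            # Meshlab reads per-vertex colors appended after the coordinates.
            f.write("v %.6f %.6f %.6f %.4f %.4f %.4f\n" % (x, y, z, r, g, b))
        if triangles is not None:
            for i, j, k in triangles:
                # .obj face indices are 1-based.
                f.write("f %d %d %d\n" % (i + 1, j + 1, k + 1))

# Usage sketch with dummy data:
save_colored_obj("face.obj", np.random.rand(100, 3), np.random.rand(100, 3))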

To be added:

  • 3D Pose Estimation: rather than using only 68 key points to calculate the camera matrix (which is easily affected by expression and pose), all vertices (more than 40K) are used to calculate a more accurate pose (a pose-fitting sketch follows this list).
  • Depth image.
  • Texture Editing: data augmentation / selfie editing, i.e. modifying specific parts of the input face, such as the eyes.
  • Face Swapping: replace the texture with another face's, warp it to the original pose, and use Poisson editing to blend the images.
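For the pose-estimation item above, here's a minimal sketch of the underlying idea (my own, not PRNet's implementation): fit a scaled-orthographic camera by least squares over all dense 3D-2D correspondences instead of only 68 landmarks. The function name and decomposition details are illustrative assumptions.

import numpy as np

def fit_scaled_orthographic_pose(points_3d, points_2d):
    # points_3d: (N, 3) canonical 3D vertices; points_2d: (N, 2) their 2D image projections.
    # Solve points_2d ~= P @ [X; 1] for a 2x4 affine camera matrix P by least squares,
    # using every vertex instead of only 68 landmarks.
    n = points_3d.shape[0]
    homogeneous = np.hstack([points_3d, np.ones((n, 1))])                    # (N, 4)
    P = np.linalg.lstsq(homogeneous, points_2d, rcond=None)[0].T             # (2, 4)

    # Rough decomposition into scale, rotation, and translation
    # (not re-orthogonalized; a careful implementation would use SVD).
    r1, r2 = P[0, :3], P[1, :3]
    scale = (np.linalg.norm(r1) + np.linalg.norm(r2)) / 2.0
    r1, r2 = r1 / np.linalg.norm(r1), r2 / np.linalg.norm(r2)
    rotation = np.vstack([r1, r2, np.cross(r1, r2)])
    translation = P[:, 3] / scale
    return scale, rotation, translation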

Basic usage

  • Clone the repository
git clone https://github.com/YadiraF/PRNet
cd PRNet
  • Download the PRN trained model at BaiduDrive or GoogleDrive, and put it into Data/net-data.
  • Run the test code (it runs on the AFLW2000 images):

python run_basics.py  # can run with only Python and TensorFlow

  • Run it on your own images:

python demo.py -i <inputDir> -o <outputDir> --isDlib True

Run python demo.py --help for more details.

I’ll be using Deep Cognition’s Deep Learning Studio to test this and other frameworks in the near future, so start by creating an account :).


