Unite the People – Closing the Loop Between 3D and 2D Human Representations
Christoph Lassner, Javier Romero, Martin Kiefel, Federica Bogo, Michael J. Black, Peter V. Gehler

Abstract. 3D models provide a common ground for different representations of human bodies. In turn, robust 2D estimation has proven to be a powerful tool to obtain 3D fits “in-the-wild”. However, depending on the level of detail, it can be hard to impossible to acquire labeled data for training 2D estimators at large scale. We propose a hybrid approach to this problem: with an extended version of the recently introduced SMPLify method, we obtain high-quality 3D body model fits for multiple human pose datasets. Human annotators solely sort good and bad fits. This procedure leads to an initial dataset, UP-3D, with rich annotations. With a comprehensive set of experiments, we show how this data can be used to train discriminative models that produce results with an unprecedented level of detail: our models predict 31 segments and 91 landmark locations on the body. Using the 91 landmark pose estimator, we present state-of-the-art results for 3D human pose and shape estimation using an order of magnitude less training data and without assumptions about gender or pose in the fitting procedure. We show that UP-3D can be enhanced with these improved fits to grow in quantity and quality, which makes the system deployable at large scale. The data, code and models are available for research purposes.

Paper | Supplementary Text | Supplementary Video


  title = {Unite the People: Closing the Loop Between 3D and 2D Human Representations},
  author = {Lassner, Christoph and Romero, Javier and Kiefel, Martin and Bogo, Federica and Black, Michael J. and Gehler, Peter V.},
  booktitle = {IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
  month = jul,
  year = {2017},
  url = {http://up.is.tuebingen.mpg.de},
  month_numeric = {7}


The images for the datasets originate from the Leeds Sports Pose dataset and its extended version, as well as from the people tagged as single persons in the MPII Human Pose Dataset. Because we publish several types of annotations for the same images, a clear nomenclature is important: we name the datasets with the prefix “UP” (for Unite the People), optionally followed by an “i” for initial, i.e., not including the FashionPose dataset. After a dash, we specify the type of the annotation (Segmentation, Pose or 3D) and the granularity. If the annotations have been acquired by humans, we append an “h”. We make the annotations freely available for academic and non-commercial use (see also license). For the images, the license of the original dataset always applies; we only provide our cut-outs for convenience and reproducibility. The datasets are linked from their respective thumbnail images.
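As an illustration of the naming scheme described above, the following is a minimal Python sketch of a parser for dataset names. It is not part of the released code. The abbreviations `S`, `P` and `3D` for the annotation types are an assumption inferred from names such as “UP-3D” and “UPi-S1h”, and the names `DatasetName` and `parse_dataset_name` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed abbreviations for the annotation types, inferred from
# dataset names such as "UP-3D" and "UPi-S1h" (not an official list).
ANNOTATION_TYPES = {"S": "segmentation", "P": "pose", "3D": "3d"}

# "UP", optional "i" (initial), a dash, the annotation type,
# an optional numeric granularity, and an optional "h" (human annotated).
_NAME_RE = re.compile(
    r"^UP(?P<initial>i)?-(?P<kind>3D|S|P)(?P<granularity>\d+)?(?P<human>h)?$"
)


class DatasetName(NamedTuple):
    """Hypothetical container for the parts of a dataset name."""
    initial: bool                # True if the name carries the "i" marker
    annotation: str              # "segmentation", "pose" or "3d"
    granularity: Optional[int]   # None if the name has no granularity
    human_annotated: bool        # True if the name ends with "h"


def parse_dataset_name(name: str) -> DatasetName:
    """Split a dataset name into its components, or raise ValueError."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognized dataset name: {name!r}")
    granularity = match.group("granularity")
    return DatasetName(
        initial=match.group("initial") is not None,
        annotation=ANNOTATION_TYPES[match.group("kind")],
        granularity=int(granularity) if granularity is not None else None,
        human_annotated=match.group("human") is not None,
    )


if __name__ == "__main__":
    for example in ("UP-3D", "UPi-S1h"):
        print(example, parse_dataset_name(example))
```

Under these assumptions, “UPi-S1h” parses as an initial, human-annotated segmentation dataset with granularity 1, while “UP-3D” parses as a 3D dataset without a granularity.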

Our six-body-part segmentation annotations from human annotators, for evaluation on the Leeds Sports Pose Dataset, are included in the UPi-S1h download.


The models will be available soon!

We provide our trained models for download. Again, they are freely available for academic and non-commercial use (see also license). Click the thumbnails to download them (the direct prediction model will be made available together with the code). The “S31” model must be run with the DeepLab V2 Caffe; the “P91” model was trained with the DeeperCut-CNN Caffe, but should run for deployment with any recent Caffe version.


Our code, including the training scripts, will soon be available on GitHub.

License & Contact

We make the code as well as the datasets available for academic or non-commercial use under the Creative Commons Attribution-NonCommercial 4.0 International license.