Multi‐modal deep network for RGB‐D segmentation of clothes

Boris Joukovsky, Pengpeng Hu, Adrian Munteanu

April 2020

Abstract

In this Letter, the authors propose a deep learning-based method to perform semantic segmentation of clothes from RGB-D images of people. First, they present a synthetic dataset containing more than 50,000 RGB-D samples of characters in different clothing styles, featuring various poses and environments for a total of nine semantic classes. The proposed data generation pipeline allows for fast production of RGB images, depth images and ground-truth label maps. Secondly, a novel multi-modal encoder–decoder convolutional network is proposed which operates on RGB and depth modalities. Multi-modal features are merged by trained fusion modules that apply multi-scale atrous convolutions. The method is numerically evaluated on synthetic data and visually assessed on real-world data. The experiments demonstrate the efficiency of the proposed model over existing methods.
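To make the idea of fusing two modalities with multi-scale atrous (dilated) convolutions concrete, here is a minimal toy sketch in plain Python on 1D feature vectors. It is not the network from the Letter: the function names `atrous_conv1d` and `fuse`, the element-wise sum used to merge the modalities, the averaging of branches, and the fixed kernel weights are all assumptions made for illustration, whereas the Letter's fusion modules are trained and operate on 2D convolutional feature maps.

```python
def atrous_conv1d(signal, kernel, rate):
    """Zero-padded, same-length 1D dilated convolution (cross-correlation).

    `rate` is the dilation: kernel taps are spaced `rate` samples apart.
    Assumes an odd-length kernel so that it has a centre tap.
    """
    centre = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + (j - centre) * rate
            if 0 <= idx < len(signal):  # taps outside the signal read as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


def fuse(rgb_feat, depth_feat, kernel, rates):
    """Hypothetical fusion step: merge modalities, then average atrous branches.

    Each rate gives one branch with a different receptive field; averaging
    the branches mixes context from several scales.
    """
    merged = [r + d for r, d in zip(rgb_feat, depth_feat)]
    branches = [atrous_conv1d(merged, kernel, rate) for rate in rates]
    return [sum(values) / len(branches) for values in zip(*branches)]


rgb = [1.0, 0.0, 0.0, 0.0, 0.0]    # toy RGB feature vector
depth = [0.0, 0.0, 0.0, 0.0, 0.0]  # toy depth feature vector
fused = fuse(rgb, depth, kernel=[1.0, 1.0, 1.0], rates=[1, 2])
print(fused)  # [1.0, 0.5, 0.5, 0.0, 0.0]
```

With rate 1 the impulse at index 0 spreads only to its immediate neighbour, while with rate 2 it reaches index 2; averaging the two branches shows how larger dilation rates widen the receptive field without adding kernel weights.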

Type

Journal article

Publication

Electronics Letters

Pengpeng Hu

Senior Lecturer (Associate Professor)

Pengpeng Hu is currently a Senior Lecturer (Associate Professor) with The University of Manchester. His research interests include biometrics, geometric deep learning, 3D human body reconstruction, point cloud processing, and vision-based measurement. He serves as an Associate Editor for IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Automation Science and Engineering, and Engineering and Mathematics in Medical and Life Sciences, as well as an Academic Editor for PLOS ONE and a member of the editorial board for Scientific Reports. He is also the Programme Chair for the 25th UK Workshop on Computational Intelligence (UKCI 2026) and an Area Chair for the 35th British Machine Vision Conference (BMVC 2024). He is the recipient of the Emerald Literati Award for an outstanding paper in 2019.