Dress Segmentation with Autoencoder in Keras

Extract dresses from photographs

Photo by Oladimeji Odunsi on Unsplash

The fashion industry is a very profitable field for Artificial Intelligence. There are many areas where data scientists can develop interesting use cases and deliver value. I have already demonstrated my interest in this sector here, where I developed a solution for recommending and tagging dresses from the Zalando online store.

In this post, I go a step further and develop a system that receives raw images as input (taken from the web or with a smartphone) and tries to extract the dresses shown in them. Segmentation tasks are notorious for the heavy noise present in the original images, so we try to build a robust solution that uses a few clever preprocessing tricks to deal with it.

In the end, you can also try to merge this solution with the previous one. This lets you build a system for real-time recommendation and tagging of dresses from photographs you take while out and about.

THE DATASET

A Kaggle competition on visual analysis and segmentation of clothing was also launched recently. It is a very interesting challenge, but it is not for us. My objective is to extract dresses from photographs, so that dataset is not adequate because of its redundancy and fine-grained attributes. We need images that contain mostly dresses, so the best choice was to build the data ourselves.

I collected images from the web of people wearing women's dresses of various types and in different scenarios. The next step was to create masks: this is necessary for every object segmentation task if we want to train a model that focuses only on the points of real interest.

Below is a sample of the data at our disposal. I collected the original images from the internet and then had fun cutting them further, separating people from dresses.

Example of image segmentation

We make this distinction because we want to mark the separation among background, skin, and dress. Background and skin are the most relevant sources of noise in this kind of problem, so we try to suppress them.

With these cut-outs we can recreate our masks as shown below; this is done simply by binarizing the image. The skin is obtained as the difference between person and dress.

Example of masks
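The binarization and the skin difference can be sketched in a few lines of NumPy. This is only an illustration, not the exact code used for the dataset: the helper names `binarize` and `skin_mask`, the white background of the cut-outs, and the tolerance value are all assumptions made for the example.

```python
import numpy as np

def binarize(cutout, background=255, tol=10):
    """Return a 0/1 mask that is 1 wherever the cut-out is not background.

    Assumes the cut-out was saved on a uniform (here white) background.
    """
    # A pixel counts as foreground if any channel differs from the
    # background value by more than `tol`.
    diff = np.abs(cutout.astype(np.int16) - background)
    return (diff.max(axis=-1) > tol).astype(np.uint8)

def skin_mask(person_mask, dress_mask):
    """Skin = pixels that belong to the person but not to the dress."""
    return (person_mask.astype(bool) & ~dress_mask.astype(bool)).astype(np.uint8)

# Toy 4x4 cut-outs: a "person" in the two middle columns,
# with a "dress" covering the two lower rows of that person.
person = np.full((4, 4, 3), 255, np.uint8)
person[:, 1:3] = (120, 90, 80)
dress = np.full((4, 4, 3), 255, np.uint8)
dress[2:, 1:3] = (200, 30, 60)

person_m = binarize(person)
dress_m = binarize(dress)
skin_m = skin_mask(person_m, dress_m)
```

On real cut-outs the tolerance would need tuning, since JPEG compression tends to leave near-white pixels around the edges.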

As the final step, we merge everything into a single image of three dimensions (one channel each for background, skin, and dress). This picture encodes the relevant features of our original image that we are interested in. Our purpose is to maintain the separation among background, skin, and dress: this result is perfect for our purposes!

Final mask

We repeated this process for every image in our dataset, so that every original image has an associated mask of three dimensions.
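The merge step can be sketched as follows. The function name `build_target` and the channel order (background, skin, dress) are assumptions for illustration; any fixed order works as long as it is used consistently at training and prediction time.

```python
import numpy as np

def build_target(person_mask, dress_mask):
    """Stack binary masks into an H x W x 3 target.

    Channel order (an assumption): 0 = background, 1 = skin, 2 = dress.
    """
    person = person_mask.astype(bool)
    dress = dress_mask.astype(bool)
    background = ~(person | dress)   # everything that is neither person nor dress
    skin = person & ~dress           # person pixels not covered by the dress
    return np.stack([background, skin, dress], axis=-1).astype(np.float32)

# Toy 4x4 masks: person in the two middle columns, dress on the lower half.
person_mask = np.zeros((4, 4), np.uint8)
person_mask[:, 1:3] = 1
dress_mask = np.zeros((4, 4), np.uint8)
dress_mask[2:, 1:3] = 1

target = build_target(person_mask, dress_mask)
```

Built this way, every pixel belongs to exactly one of the three channels.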

THE MODEL

We now have everything we need to create our model. The workflow we have in mind is very simple:

We fit a model that receives a raw image as input and outputs a three-dimensional mask, i.e. it learns to recreate from the original image the desired separation among background, skin, and dress. In this way, when a new raw image comes in, we can separate it into three different parts: background, skin, and dress. We take into consideration only the channel of interest (dress), use it to create a mask for the input image, and cut the image with it to recover the original dress.
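The last step of this workflow, keeping only the dress channel and cutting the input image with it, might look like the sketch below. The channel index, the 0.5 threshold, and the white fill colour are assumptions chosen for the example, and the "prediction" here is a hand-made array rather than real model output.

```python
import numpy as np

DRESS_CHANNEL = 2  # assumption: channels are ordered background, skin, dress

def extract_dress(image, pred_mask, threshold=0.5, fill=255):
    """Keep the pixels the model marks as dress and fill the rest with `fill`."""
    dress = pred_mask[..., DRESS_CHANNEL] > threshold
    out = np.full_like(image, fill)
    out[dress] = image[dress]
    return out

# Toy input: a random 4x4 RGB image and a fake prediction whose
# dress channel is confident only in the central 2x2 square.
rng = np.random.default_rng(0)
image = rng.integers(0, 255, size=(4, 4, 3), dtype=np.uint8)
pred = np.zeros((4, 4, 3), np.float32)
pred[1:3, 1:3, DRESS_CHANNEL] = 0.9

dress_only = extract_dress(image, pred)
```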

All this magic is possible thanks to the power of UNet. This deep convolutional autoencoder is often used in segmentation tasks like this one. It is easy to replicate in Keras, and we train it to recreate, pixel by pixel, each channel of our desired mask.
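For readers who have not built one before, here is a deliberately small UNet-style network in Keras. It is a sketch and not the architecture trained for this post: the depth, the number of filters, the 224x224 input size, the sigmoid output, and the binary cross-entropy loss are all assumptions chosen to keep the example short.

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    """Two 3x3 convolutions, the basic building block of UNet."""
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(224, 224, 3), n_channels=3, base_filters=16):
    inputs = layers.Input(shape=input_shape)

    # Encoder: convolve, then halve the spatial resolution
    c1 = conv_block(inputs, base_filters)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = conv_block(p1, base_filters * 2)
    p2 = layers.MaxPooling2D(2)(c2)

    # Bottleneck
    b = conv_block(p2, base_filters * 4)

    # Decoder: upsample and concatenate the matching encoder features
    u2 = layers.concatenate([layers.UpSampling2D(2)(b), c2])
    c3 = conv_block(u2, base_filters * 2)
    u1 = layers.concatenate([layers.UpSampling2D(2)(c3), c1])
    c4 = conv_block(u1, base_filters)

    # One sigmoid output per mask channel (background, skin, dress)
    outputs = layers.Conv2D(n_channels, 1, activation="sigmoid")(c4)

    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

model = build_unet()
```

Training would then be a call such as `model.fit(images, masks, ...)`, where `images` and `masks` are arrays of shape `(N, 224, 224, 3)`. The skip connections (the `concatenate` calls) are what distinguish UNet from a plain convolutional autoencoder.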

Before starting training we decided to standardize all our original images with their RGB mean.
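One common way to do this is to subtract the per-channel mean computed over the training images, as in the sketch below. The function names are hypothetical, and reading "RGB mean" as a dataset-wide per-channel mean is my assumption here; a per-image mean would be a small change.

```python
import numpy as np

def rgb_mean(images):
    """Per-channel mean over a batch of images with shape (N, H, W, 3)."""
    return images.reshape(-1, 3).mean(axis=0)

def standardize(images, mean):
    """Centre the images by subtracting the per-channel mean."""
    return images.astype(np.float32) - mean.astype(np.float32)

# Toy batch: three constant 2x2 images with values 0, 100 and 200.
images = np.stack([np.full((2, 2, 3), v, np.uint8) for v in (0, 100, 200)])

mean = rgb_mean(images)
centered = standardize(images, mean)
```

The same `mean` computed on the training set has to be reused at prediction time, otherwise new images are shifted differently from what the network saw during training.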

RESULTS AND PREDICTIONS

We notice that during prediction, when we encounter an image with high noise (in terms of ambiguous background or skin), our model starts to struggle. This problem can be overcome simply by increasing the number of training images, but we also developed a clever shortcut to avoid these mistakes.

We make use of the GrabCut algorithm provided by OpenCV. This algorithm separates the foreground from the background using Gaussian Mixture Models. It works well for us because it helps to isolate the person in the foreground and removes the noise all around.

Here is the simple function we implemented to make this possible. We assume that the person of interest stands in the middle of the image.

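import cv2 as cv
import numpy as np

def cut(img):
    # Work at a fixed 224x224 resolution
    img = cv.resize(img, (224, 224))

    # Mask and model buffers that cv.grabCut fills in place
    mask = np.zeros(img.shape[:2], np.uint8)
    bgdModel = np.zeros((1, 65), np.float64)
    fgdModel = np.zeros((1, 65), np.float64)

    # Rectangle (x, y, w, h) around the centre, where we assume the person stands
    height, width = img.shape[:2]
    rect = (50, 10, width - 100, height - 20)
    cv.grabCut(img, mask, rect, bgdModel, fgdModel, 5,
               cv.GC_INIT_WITH_RECT)

    # Labels 0 and 2 are (probable) background; everything else is foreground
    mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')
    img2 = img * mask2[:, :, np.newaxis]
    img2[mask2 == 0] = (255, 255, 255)  # paint the background white

    final = img2

    return mask, final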
GrabCut in action

Now we apply UNet and are ready to see some results on new images!

Input - GrabCut + Prediction - Final Dress (five examples)

Our preprocessing step, combined with the power of UNet, achieves great performance.

SUMMARY

In this post, we developed an end-to-end solution for dress segmentation. To achieve this, we made use of a powerful autoencoder combined with clever preprocessing techniques. We designed this solution to be used in a realistic scenario with real photographs, with the possibility of building a visual recommendation system on top of it.


CHECK MY GITHUB REPO

Keep in touch: LinkedIn