
October Edition: Unsupervised Learning

8 Must-Read Articles

Pixabay.com

Data scientists and other data practitioners use unsupervised learning, a family of machine learning algorithms in which no known output or label guides the learning algorithm. Common types include clustering, such as K-means; text mining; and dimensionality reduction, such as Principal Component Analysis (PCA), which is often used in the pre-processing stage to transform a dataset before supervised machine learning modelling.

Clustering algorithms have recently been applied in many settings: airlines have used them to analyse customer feedback surveys, insurance companies to review their customer complaints registers, and analysts to group election Twitter feeds and monitor key discussion themes throughout the day. Topic modelling has likewise been applied to corpora of news documents to extract their key themes.

Clustering partitions the data into distinct groups based on similarity. The algorithms vary: hierarchical clustering connects objects based on the distance between them, while K-means clustering represents each cluster by a single mean vector.
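To make the K-means description above concrete, here is a minimal sketch of the standard assign-then-update loop (Lloyd's algorithm) in plain Python. The function name `kmeans`, its parameters, and the toy `points` are assumptions made for illustration; they are not taken from any of the featured articles, and a library implementation would normally be preferred in practice.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples.

    Returns (centroids, labels). Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean vector of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Toy data (assumed): two visually separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Each cluster ends up summarised by one mean vector, which is the sense in which K-means "represents the cluster as a single mean vector". Which cluster receives label 0 or 1 depends on the random initialisation, so only the grouping itself is meaningful.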

The challenge with unsupervised learning is determining whether it has ‘performed well’, since there are no known output labels to compare against. Its benefits include the ability to preprocess data during the exploratory data analysis phase, for example by scaling the training and testing data. The transformed data then allows the data scientist to visualize the main directions of variation in the data and to reduce the number of dimensions or features using principal component analysis before applying a supervised learning method such as ridge regression.
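The workflow described above (scale, reduce dimensions with PCA, then fit a ridge regression) can be sketched with NumPy alone. Everything here is an assumption for illustration: the helper names `standardize`, `pca_components` and `ridge_fit`, the synthetic data, the choice of two components, and `alpha=1.0`. It is a sketch of the general idea, not the method of any particular article.

```python
import numpy as np


def standardize(X_train, X_test):
    """Scale both splits using statistics from the training split only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features
    return (X_train - mean) / std, (X_test - mean) / std


def pca_components(X_centred, n_components):
    """Return the top principal directions as columns (n_features x n_components)."""
    _, _, vt = np.linalg.svd(X_centred, full_matrices=False)
    return vt[:n_components].T


def ridge_fit(Z, y, alpha=1.0):
    """Closed-form ridge regression; Z is assumed centred, so the intercept is mean(y)."""
    intercept = y.mean()
    gram = Z.T @ Z + alpha * np.eye(Z.shape[1])
    weights = np.linalg.solve(gram, Z.T @ (y - intercept))
    return weights, intercept


# Synthetic data (assumed): five features, two of them strongly correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=100)
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

# 1) scale, 2) project onto the top two principal directions, 3) fit ridge.
X_train_s, X_test_s = standardize(X_train, X_test)
W = pca_components(X_train_s, n_components=2)
Z_train, Z_test = X_train_s @ W, X_test_s @ W

weights, intercept = ridge_fit(Z_train, y_train, alpha=1.0)
y_pred = Z_test @ weights + intercept
```

The scaling statistics and the principal directions are both computed from the training split and then reused on the test split, so no information from the test data leaks into the transformation.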

We hope that you will enjoy and learn something new this month as we explore unsupervised learning with our carefully curated selection of the Editor’s picks from Towards Data Science.

Wendy Wong, TDS Editor.


Unsupervised Learning with Python

By Vihar Kurama – 7 min read

Unsupervised Learning is a class of Machine Learning techniques to find the patterns in data. The data given to an unsupervised algorithm are not labelled, which means only the input variables (X) are given with no corresponding output variables. In unsupervised learning, the algorithms are left to themselves to discover interesting structures in the data.


Using Unsupervised Learning to plan a vacation to Paris: Geo-location clustering

By Hamza Bendemra – 6 min read

Since I’ve been to Paris a few times myself, I figured I could help in other ways like contributing to the list of sights and places to visit. After listing all those sights and attractions, I created a Google map with a pin for each location.


K-Means in Real Life: Clustering Workout Sessions

By Carolina Bento – 6 min read

K-means clustering is a very popular unsupervised learning algorithm. In this article I want to provide a bit of background about it, and show how we could use it in an anecdotal real-life situation.


What a Disentangled Net We Weave: Representation Learning in VAE

By Cody Marie Wild – 15 min read

One common strategy for unsupervised learning is that of generative models, the idea of which is: you should give a model the task of producing samples from a given distribution, because performing well at that task requires the model to implicitly learn about that distribution.


A wizard’s guide to Adversarial Autoencoders

By Naresh Nagabushan – 9 min read

We know that Convolutional Neural Networks (CNNs) or, in some cases, dense fully connected layers (MLP – multi-layer perceptron, as some would like to call it) can be used to perform image recognition. But a CNN (or MLP) alone cannot be used to perform tasks like content and style separation from an image…


Unsupervised Learning of Gaussian Mixture Models on a SELU auto-encoder (Not another MNIST)

By Gonçalo Abreu – 5 min read

Most clustering methods suffer from the curse of dimensionality. Hence, to perform unsupervised learning, a dimensionality reduction method is necessary.


Discovering similarities across my Spotify music using data, clustering and visualization

By Juan De Dios Santos – 16 min read

Music taste is a very unique, peculiar and characteristic quality of a person. Of all the millions of songs and sounds that exist, the fact that many people decide to develop a liking for a particular style, genre or a "subset" of music is something that I believe is not by mere chance.


A Practitioner’s Guide to Natural Language Processing

By Dipanjan (DJ) Sarkar – 31 min read

Unstructured data, especially text, images and videos, contain a wealth of information. However, due to the inherent complexity in processing and analyzing this data, people often refrain from spending extra time and effort in venturing out from structured datasets to analyze these unstructured sources of data, which can be a potential gold mine.

