
Intro
Working with large datasets presents many challenges, one of which is a loss of model efficiency and performance due to excessively high dimensionality.
Luckily, many dimensionality reduction techniques are available that can help us overcome these challenges by removing the "less important" parts of the data.
In this article, I dive into metric Multidimensional Scaling (MDS) to give you an understanding of how it works and how to use it for your Data Science projects. I do this by covering the following topics:
- Types of Multidimensional Scaling (MDS)
- MDS's place in the universe of Machine Learning algorithms
- How does metric MDS actually work?
- How can I use MDS to reduce data dimensions in Python?
Types of Multidimensional Scaling (MDS)
There are two major types of MDS, metric (classical) and non-metric. While both aim to find the best lower-dimensional representation of your high-dimensional data, their differences arise in the type of data they are designed to work with.
- Metric (classical) MDS – also known as Principal Coordinate Analysis (PCoA). Make sure not to confuse it with Principal Component Analysis (PCA), a separate yet similar technique. Metric MDS attempts to model the similarity/dissimilarity of data by calculating distances between each pair of points using their geometric coordinates. The key here is the ability to measure a distance on a linear scale. E.g., a distance of 10 units would be considered twice as far as a distance of 5 units.
- Non-metric MDS – designed to deal with ordinal data. E.g., you may have asked your customers to rate your products on a scale of 1 to 5, where 1 is terrible and 5 is amazing. Here, a product with a rating of 2 is not necessarily twice as good as a product with a rating of 1. It’s the order that matters (1 < 2 < 3 < 4 < 5) rather than the absolute value. This is the kind of situation where you would use non-metric MDS.
As mentioned in the intro, in this article, I focus on metric MDS. Note, though, that Sklearn’s implementation of the MDS algorithm in Python lets you easily switch between metric and non-metric approaches. Hence, you could use the Python example provided at the end of this article for a non-metric approach too.
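As a minimal sketch (my illustration, not part of the walk-through below), switching between the two approaches is just a matter of one flag:

from sklearn.manifold import MDS

# metric=False runs the non-metric (ordinal) variant of the algorithm;
# fitting and transforming work exactly the same way as in the metric case
nonmetric_mds = MDS(n_components=2, metric=False, random_state=42)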
Multidimensional Scaling (MDS) in the universe of Machine Learning Algorithms
The truth is there are way more Machine Learning algorithms than any of us can list. However, I have attempted to collect and categorize some of the most commonly used ones.
Multidimensional Scaling sits under the Unsupervised branch of Machine Learning algorithms within the group of Dimensionality Reduction techniques.
How does metric Multidimensional Scaling (metric MDS) actually work?
In general, metric MDS calculates the distances between each pair of points in the original high-dimensional space and then maps the points to a lower-dimensional space while preserving those distances as well as possible.
Note that you can choose the number of dimensions for the lower-dimensional space. Typically, one would choose either 2D or 3D, as that allows the data to be visualized.
So let’s take a look at high-level steps performed by metric MDS. I have tried to keep maths to a minimum in this explanation, but it was impossible to avoid it altogether.
Steps used by metric MDS algorithm
Step 1 – The algorithm calculates the distances between each pair of points, as sketched in the example below.
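To make this step concrete, here is a minimal, self-contained sketch (my illustration, separate from the swiss roll example later in this article) that computes a pairwise Euclidean distance matrix with Scikit-learn:

import numpy as np
from sklearn.metrics import euclidean_distances

# Three points in 3D space
points = np.array([[0.0, 0.0, 0.0],
                   [3.0, 4.0, 0.0],
                   [0.0, 0.0, 5.0]])

# D[i, j] holds the Euclidean distance between points i and j
D = euclidean_distances(points)
print(D)  # symmetric matrix with zeros on the diagonal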

Step 2 – With the original distances known, the algorithm attempts to solve the optimization problem by finding a set of coordinates in a lower-dimensional space that minimizes the value of Stress.
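For reference, one standard way to define Stress (Kruskal's Stress-1; this definition is my addition, not quoted from Sklearn's documentation) is shown below, where $d_{ij}$ is the distance between points $i$ and $j$ in the original space and $\hat{d}_{ij}$ is the distance between the same points in the lower-dimensional space. Note that Sklearn's stress_ attribute reports the unnormalized sum of squared differences instead.

$$\text{Stress} = \sqrt{\frac{\sum_{i<j} \left( d_{ij} - \hat{d}_{ij} \right)^2}{\sum_{i<j} d_{ij}^2}}$$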

Multiple approaches can be used to optimize the above cost function, such as Kruskal’s steepest descent method or De Leeuw’s iterative majorization method (SMACOF), which is the one Sklearn implements. However, I will not delve into the maths this time to keep this article focused on a high-level explanation.
One important thing to note is that both of the aforementioned methods are iterative and can sometimes give different results since they are sensitive to the initial starting position.
However, Sklearn’s implementation of MDS allows us to specify how many times we want to initialize the process. In the end, the configuration with the lowest stress is picked as the final result.
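To make this concrete, here is a small illustrative sketch (a hypothetical toy run, separate from the main example below) that compares the final stress for different values of n_init:

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import MDS

# Small toy dataset so the runs finish quickly
X_demo, _ = make_swiss_roll(n_samples=200, random_state=0)

# More initializations give the optimizer more chances to land on a low-stress configuration
for n_init in (1, 4, 8):
    mds = MDS(n_components=2, n_init=n_init, random_state=0)
    mds.fit(X_demo)
    print(f'n_init={n_init}: stress={mds.stress_:.2f}')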

How can I use MDS to reduce data dimensions in Python?
Leaving theory behind, let’s get into the fun bits and use MDS for dimensionality reduction in Python.
Typically, you would want to use MDS for high-dimensional data, for example, images of hand-written text or numbers. However, I am keen to demonstrate what the data looks like before and after the transformation. Hence, I will use MDS on simpler data that can be visualized in both 3D and 2D.
Setup
We will use the following data and libraries:
- Scikit-learn library for 1) creating the data for us to use (make_swiss_roll) and 2) performing Multidimensional Scaling (MDS);
- Plotly for data visualizations
- Pandas for data manipulation
Let’s start by importing libraries.
# Data manipulation
import pandas as pd # for data manipulation
# Visualization
import plotly.express as px # for data visualization
# Sklearn
from sklearn.datasets import make_swiss_roll # for creating a swiss roll
from sklearn.manifold import MDS # for MDS dimensionality reduction
Next, we create some data using Sklearn’s make_swiss_roll and display it on a 3D plot.
# Make a swiss roll
X, y = make_swiss_roll(n_samples=2000, noise=0.05)
# Make it thinner
X[:, 1] *= .5

# Create a 3D scatter plot
fig = px.scatter_3d(None, x=X[:, 0], y=X[:, 1], z=X[:, 2], color=y)

# Update chart looks
fig.update_layout(showlegend=False,
                  scene_camera=dict(up=dict(x=0, y=0, z=1),
                                    center=dict(x=0, y=0, z=-0.1),
                                    eye=dict(x=1.25, y=1.5, z=1)),
                  margin=dict(l=0, r=0, b=0, t=0),
                  scene=dict(xaxis=dict(backgroundcolor='white',
                                        color='black',
                                        gridcolor='#f0f0f0',
                                        title_font=dict(size=10),
                                        tickfont=dict(size=10)),
                             yaxis=dict(backgroundcolor='white',
                                        color='black',
                                        gridcolor='#f0f0f0',
                                        title_font=dict(size=10),
                                        tickfont=dict(size=10)),
                             zaxis=dict(backgroundcolor='lightgrey',
                                        color='black',
                                        gridcolor='#f0f0f0',
                                        title_font=dict(size=10),
                                        tickfont=dict(size=10))))

# Update marker size
fig.update_traces(marker=dict(size=3, line=dict(color='black', width=0.1)))

# Hide the color scale
fig.update(layout_coloraxis_showscale=False)
fig.show()
If you run the code above, make sure to explore the resulting interactive graph from every angle by rotating it.
Performing MDS
We will now use MDS to map this 3D structure to 2 dimensions while preserving distances between points as best as possible. Note that the depth of the swiss roll is smaller than its height and width. We expect this feature to be preserved in the 2D graph.
Note, "n_components" tells the algorithm how many dimensions you would like to have. Meanwhile, "metric" can be set to "False" if you would like to use non-metric MDS instead of metric MDS. Also, you can specify how many times you would like to initialize the optimization using the "n_init" hyperparameter.
### Step 1 - Configure the MDS function; note we use mostly default hyperparameter values for this example
model2d = MDS(n_components=2,
              metric=True,
              n_init=4,
              max_iter=300,
              verbose=0,
              eps=0.001,
              n_jobs=None,
              random_state=42,
              dissimilarity='euclidean')

### Step 2 - Fit the data and transform it, so we have 2 dimensions instead of 3
X_trans = model2d.fit_transform(X)

### Step 3 - Print a few stats
print('The new shape of X: ', X_trans.shape)
print('No. of Iterations: ', model2d.n_iter_)
print('Stress: ', model2d.stress_)

# Dissimilarity matrix contains distances between data points in the original high-dimensional space
#print('Dissimilarity Matrix: ', model2d.dissimilarity_matrix_)

# Embedding contains coordinates for data points in the new lower-dimensional space
#print('Embedding: ', model2d.embedding_)
The above code gives us these results:

We can see that the shape of the new array is 2000 by 2, which means that we have successfully reduced it to 2 dimensions. Also, it took the algorithm 64 iterations to reach the lowest Stress level.
Let’s now plot the new 2D data to see how it compares to the original 3D version.
# Create a scatter plot
fig = px.scatter(None, x=X_trans[:, 0], y=X_trans[:, 1], opacity=1, color=y)

# Change chart background color
fig.update_layout(dict(plot_bgcolor='white'))

# Update axes lines
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey',
                 zeroline=True, zerolinewidth=1, zerolinecolor='lightgrey',
                 showline=True, linewidth=1, linecolor='black')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey',
                 zeroline=True, zerolinewidth=1, zerolinecolor='lightgrey',
                 showline=True, linewidth=1, linecolor='black')

# Set figure title
fig.update_layout(title_text="MDS Transformation")

# Update marker size
fig.update_traces(marker=dict(size=5, line=dict(color='black', width=0.2)))
fig.show()

The results are pretty good: we preserved the global structure while also retaining the separation observed between points in the original depth dimension.
While it depends on the exact problem we want to solve, MDS seems to perform better in this scenario than PCA (Principal Component Analysis). For comparison, the below graph shows a 2D representation of the same 3D swiss roll after applying PCA transformation.

As you can see, PCA gives us a result that looks like a picture from one side of the swiss roll, failing to preserve depth information from the third dimension.
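For reference, here is a minimal sketch of how that PCA comparison can be reproduced (my addition, reusing the X and y swiss roll arrays from the earlier code; it was not part of the original walk-through):

from sklearn.decomposition import PCA
import plotly.express as px

# Project the same 3D swiss roll onto its first two principal components
X_pca = PCA(n_components=2).fit_transform(X)

# Plot the 2D PCA projection, colored the same way as the MDS result
fig = px.scatter(x=X_pca[:, 0], y=X_pca[:, 1], color=y, title='PCA Transformation')
fig.show()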
Conclusions
Multidimensional Scaling is a good technique to use when you wish to preserve both global and local structures of your high-dimensional data. This is achieved by keeping distances between points in lower dimensions as similar as possible to distances in the original high-dimensional space.
However, if your analysis requires you to focus more on the global structures, you may wish to use PCA.
PCA: Principal Component Analysis – How to Get Superior Results with Fewer Dimensions?
Alternatively, you can explore Isomap (Isometric Mapping), which combines kNN (k-Nearest Neighbors) and MDS for better preservation of local structures.
Isomap Embedding – An Awesome Approach to Non-linear Dimensionality Reduction
Feel free to reach out if you have any questions or suggestions.
Cheers 👏 Saul Dobilas