Dimensionality reduction methods let us examine a dataset along new axes derived from relationships among its many features, such as correlation, distance, and variance. After this stage, operations such as classification with supervised or unsupervised learning methods become easier. Consider a dataset with 30 features: instead of attempting a 30-dimensional visualization, it is far easier to view those 30 features from a different aspect according to the relationships among them and reduce them to 2 dimensions. The way dimensionality reduction methods handle a dataset from a different aspect can be likened to transforming a time-domain signal into the frequency domain with the Fourier Transform in order to process it effectively. This article covers the various types of multidimensional scaling, one of the dimensionality reduction and visualization methods, at a theoretical level, discusses its areas of use, and enriches the discussion with Python implementations.
Table of Contents
1. What is Multi-Dimensional Scaling?
2. Goodness of fit - Stress -
3. PCA vs MDS
4. Different Distance Approaches on image dataset
- Euclidean Distance
- Manhattan Distance
- Chebyshev Distance
- Minkowski Distance
5. Tutorials
- S curve
- Digits Dataset
6. Metric MDS and Non-Metric MDS
7. References

1. What is Multi-Dimensional Scaling?
Multidimensional scaling is a visual representation of distances or dissimilarities between sets of objects [1]. "Objects" can be colors, faces, map coordinates, political persuasions, or any kind of real or conceptual stimuli [2]. As well as interpreting dissimilarities as distances on a graph, MDS can also serve as a dimensionality reduction technique for high-dimensional data [3]. In short, the main purpose of MDS is to preserve these dissimilarities in the reduced dimensionality.
In the MDS implementation of the scikit-learn library, the distance is set to Euclidean by default. Other distances, such as Manhattan, can also be configured and used (see Section 4).

In the code block below, a 6 x 6 distance matrix is created using the airline (straight-line) distances between the cities shown in Figure 1 and Figure 2 (left), and the default MDS from scikit-learn is applied.
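A minimal sketch of this step. The distance values below are illustrative approximations (in km) between six Turkish cities, since the exact matrix from the figures is not reproduced here:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import MDS

# Illustrative approximate straight-line distances (km) between six
# Turkish cities; the exact values in the article's figures may differ.
cities = ["Istanbul", "Ankara", "Izmir", "Antalya", "Adana", "Erzurum"]
D = np.array([
    [   0,  350,  330,  480,  710, 1050],
    [ 350,    0,  520,  400,  390,  700],
    [ 330,  520,    0,  360,  800, 1180],
    [ 480,  400,  360,    0,  480,  900],
    [ 710,  390,  800,  480,    0,  600],
    [1050,  700, 1180,  900,  600,    0],
], dtype=float)

# dissimilarity="precomputed" tells MDS to use the matrix D directly
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=42)
coords = mds.fit_transform(D)          # shape: (6, 2)
print("raw stress:", mds.stress_)

plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), name in zip(coords, cities):
    plt.annotate(name, (x, y))
plt.title("MDS embedding of city distances")
plt.show()
```

Note that scikit-learn's `stress_` attribute reports raw stress (a sum of squared distance errors), which is not on the same scale as the normalized stress values discussed in the next section.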

The result is seen in Figure 2 (right), and the stress value is 0.188. Before moving on to the topic of stress, let's discuss the results obtained. Initially we had a 6 x 6 matrix; by applying MDS, the dimensionality of the dataset is reduced to 6 x 2 and then visualized.
On an ordinary coordinate plane, any (x, y) point is positioned with respect to the (0, 0) origin. In MDS, by contrast, the configuration is determined from the pairwise distances between points, computed with the specified distance type; this is why distance is preserved with MDS. Looking at Figure 2, Erzurum is the farthest from the other cities. Similarly, cities that are close to each other remain close after MDS is applied, so the MDS result presents a similar picture. The dataset is viewed from a different angle while the distance relationships between the data points are maintained.
2. Goodness of fit – Stress –
In data analysis applications, a specific measure is needed to assess how well a reduced representation fits the data. In PCA, the cumulative explained variance is examined by drawing a scree plot. In MDS, the distances themselves are modeled, so the natural measure is based on the differences between the actual distances and their estimated values. This measure is called stress. The code block below draws the stress graph for the example above.
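One way to sketch such a graph is to fit MDS for an increasing number of components on the same precomputed distance matrix and record the stress of each solution (the matrix below is an illustrative stand-in for the city distances):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import MDS

# Illustrative 6 x 6 symmetric distance matrix (approximate km values)
# standing in for the city-distance data used earlier.
D = np.array([
    [   0,  350,  330,  480,  710, 1050],
    [ 350,    0,  520,  400,  390,  700],
    [ 330,  520,    0,  360,  800, 1180],
    [ 480,  400,  360,    0,  480,  900],
    [ 710,  390,  800,  480,    0,  600],
    [1050,  700, 1180,  900,  600,    0],
], dtype=float)

# Fit MDS for 1..5 components and record the stress of each solution
dims = list(range(1, 6))
stresses = []
for k in dims:
    mds = MDS(n_components=k, dissimilarity="precomputed", random_state=42)
    mds.fit(D)
    stresses.append(mds.stress_)

plt.plot(dims, stresses, marker="o")
plt.xlabel("Number of components")
plt.ylabel("Stress")
plt.show()
```

Stress generally drops as the number of components grows; the "elbow" of this curve plays the same role as a scree plot in PCA.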

The result is shown in Figure 3. In the original paper on MDS, Kruskal (1964) gave the following advice about stress values based on his experience[4]:
0.2 – poor
0.1 – fair
0.05 – good
0.025 – excellent
0 – perfect
Nowadays, academic studies consider it misleading to follow this table rigidly, since an acceptable stress level depends on the size of the dataset and the quality of the data.
3. PCA vs MDS
What is achieved in MDS by preserving distances is achieved in PCA by considering variance and correlation values. Minimizing the linear distance using the Euclidean distance is similar to maximizing the linear correlations. Therefore, the 2D plots of a dataset after PCA and after MDS can be expected to have similar characteristics. Of course, this only applies when MDS is used with the Euclidean distance; different distance methods can be applied according to the project (for example, the Euclidean distance can be weak for high-dimensional datasets). In the code block below, PCA is applied to the dataset above, and the result is shown in Figure 4.
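A sketch of the PCA step, again using illustrative approximate city distances in place of the article's exact matrix:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Illustrative approximate city distances (km); not the article's exact values
cities = ["Istanbul", "Ankara", "Izmir", "Antalya", "Adana", "Erzurum"]
D = np.array([
    [   0,  350,  330,  480,  710, 1050],
    [ 350,    0,  520,  400,  390,  700],
    [ 330,  520,    0,  360,  800, 1180],
    [ 480,  400,  360,    0,  480,  900],
    [ 710,  390,  800,  480,    0,  600],
    [1050,  700, 1180,  900,  600,    0],
], dtype=float)

# Treat each city's row of distances as a feature vector and project to 2D
pca = PCA(n_components=2)
coords = pca.fit_transform(D)
print("explained variance ratio:", pca.explained_variance_ratio_)

plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), name in zip(coords, cities):
    plt.annotate(name, (x, y))
plt.title("PCA projection of city distances")
plt.show()
```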

4. Different Distance Approaches
It has already been mentioned that the Euclidean distance is used by default in the scikit-learn library. Other distances can be used by setting dissimilarity="precomputed" and passing a precomputed distance matrix. In the code block below, MDS is applied to the fetch_olivetti_faces dataset from scikit-learn with various distance metrics and visualized in 2D.
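A sketch of this procedure, computing each distance matrix with SciPy's cdist and feeding it to MDS. To keep the example self-contained and offline, random features stand in for the face images returned by sklearn.datasets.fetch_olivetti_faces (which downloads the data):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import cdist
from sklearn.manifold import MDS

# Random features standing in for the 400 x 4096 Olivetti faces matrix
rng = np.random.default_rng(0)
X = rng.random((40, 64))

# Display names mapped to scipy metric identifiers and extra kwargs
metrics = {
    "Euclidean": ("euclidean", {}),
    "Manhattan": ("cityblock", {}),
    "Chebyshev": ("chebyshev", {}),
    "Minkowski (p=3)": ("minkowski", {"p": 3}),
}

fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, (name, (metric, kwargs)) in zip(axes, metrics.items()):
    # Precompute the pairwise distance matrix with the chosen metric
    D = cdist(X, X, metric=metric, **kwargs)
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=42)
    coords = mds.fit_transform(D)
    ax.scatter(coords[:, 0], coords[:, 1], s=10)
    ax.set_title(name)
plt.show()
```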
Euclidean Distance

Manhattan Distance

Chebyshev Distance

Minkowski Distance

Looking at the images above, each result is shaped according to a different characteristic of the chosen distance. The distance should be selected according to the structure of the dataset. For example, biologists studying gene expression are interested in log-fold changes, so they should use a distance defined on log-fold changes.
5. Tutorials
S curve
In the code block below, the 3D S curve is imported, reduced to 2D with PCA and with MDS, and visualized. The results are shown in Figure 9.
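A sketch of this comparison (500 samples is an arbitrary choice here):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_s_curve
from sklearn.decomposition import PCA
from sklearn.manifold import MDS

# 3D S curve; `color` encodes the position along the curve
X, color = make_s_curve(n_samples=500, random_state=0)

pca_coords = PCA(n_components=2).fit_transform(X)
# n_init=1 keeps the SMACOF runtime modest for this sketch
mds_coords = MDS(n_components=2, n_init=1, random_state=0).fit_transform(X)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(pca_coords[:, 0], pca_coords[:, 1], c=color, cmap="viridis", s=8)
ax1.set_title("PCA")
ax2.scatter(mds_coords[:, 0], mds_coords[:, 1], c=color, cmap="viridis", s=8)
ax2.set_title("MDS")
plt.show()
```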

If the project calls for classifying the dataset with machine learning, the classification process can easily be carried out after applying MDS, using various supervised or unsupervised learning methods.
Digits
The first 5 classes of the Digits dataset, that is, the digits 0-4, are imported. The shape of the dataset is reduced from 901 x 64 to 901 x 2 by applying MDS and PCA separately. The results are then visualized, as shown in Figure 10.
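A sketch of this step with scikit-learn's built-in loader:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import MDS

# n_class=5 loads only the digits 0-4
digits = load_digits(n_class=5)
X, y = digits.data, digits.target
print(X.shape)  # (901, 64)

# n_init=1 keeps the SMACOF runtime modest on 901 samples
mds_coords = MDS(n_components=2, n_init=1, random_state=0).fit_transform(X)
pca_coords = PCA(n_components=2).fit_transform(X)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(mds_coords[:, 0], mds_coords[:, 1], c=y, cmap="tab10", s=8)
ax1.set_title("MDS")
ax2.scatter(pca_coords[:, 0], pca_coords[:, 1], c=y, cmap="tab10", s=8)
ax2.set_title("PCA")
plt.show()
```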

After the MDS process, it is seen that classes 2 and 3 in particular form better-separated clusters than with PCA. Applying various machine learning methods after this stage should give effective results.
6. Metric MDS and Non-Metric MDS
So far, the focus has been on metric (classical) multidimensional scaling, also called Principal Coordinate Analysis (PCoA). In this method, dimensionality reduction is performed by considering the distances between the observations, and the result is visualized.
Non-metric MDS is suitable for ordinal datasets. For example, suppose that in survey data collected during market research, respondents rate a brand X car out of 10. Here, a score of 8 means better quality than a score of 3. The labels 0-1-2-3-4 were used in the Digits dataset above, but none of them had an advantage over another. As another example, when psychiatric patients are asked to rate their mood, a high score means something different from a low score.
In a nutshell, while metric MDS assumes a linear relationship between dissimilarities and distances, non-metric MDS (also called ordinal MDS) is described by curves that depend only on the rank order of the dissimilarities [5]. In the scikit-learn library, non-metric MDS is obtained by setting metric=False.
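A minimal sketch contrasting the two modes, run here on a small sample of the digits data (an arbitrary choice for illustration):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import MDS

# Small sample so both fits run quickly
X = load_digits(n_class=5).data[:100]

# metric=True (the default): fit the dissimilarity values themselves
metric_coords = MDS(n_components=2, metric=True,
                    random_state=0).fit_transform(X)

# metric=False: fit only the rank order of the dissimilarities
nonmetric_coords = MDS(n_components=2, metric=False,
                       random_state=0).fit_transform(X)

print(metric_coords.shape, nonmetric_coords.shape)
```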
7. References
[1] "Multidimensional Scaling: Definition, Overview, Examples – Statistics How To." https://www.statisticshowto.com/multidimensional-scaling/ (accessed Oct. 14, 2021).
[2] "Multidimensional Scaling – Joseph B. Kruskal, Myron Wish – Google Books." https://books.google.de/books/about/Multidimensional_Scaling.html?id=iTydugEACAAJ&redir_esc=y (accessed Oct. 13, 2021).
[3] A. Buja, D. F. Swayne, M. L. Littman, N. Dean, H. Hofmann, and L. Chen, "Data Visualization with multidimensional scaling," J. Comput. Graph. Stat., vol. 17, no. 2, pp. 444–472, Jun. 2008, doi: 10.1198/106186008X318440.
[4] NCSS, LLC, "Multidimensional Scaling," NCSS Statistical Software Documentation, Chapter 435.
[5]"Multidimensional Scaling: Definition, Overview, Examples – Statistics How To." https://www.statisticshowto.com/multidimensional-scaling/ (accessed Oct. 14, 2021).