
Unsupervised Regression for Dimensionality Reduction

Using regression to compute low-dimensional embeddings

Image created with an AI.

Every first-year student in data science and AI learns that regression is a supervised learning method. In the classic setting this is true, but we can turn things on their head and use regression for Dimensionality Reduction, namely as Unsupervised Regression [1]. In dimensionality reduction, we look for low-dimensional embeddings of high-dimensional data. For purposes such as visualization or preprocessing, the task is to find low-dimensional embeddings that satisfy constraints such as distance, density, and neighborhood preservation. For example, patterns that are close in data space should have close counterparts in the low-dimensional space, while embeddings of dissimilar patterns should be far apart.

Unsupervised Regression

For Unsupervised Regression, we need regression methods that can map to a multidimensional label space. Examples are Nearest Neighbor Regression and Kernel Regression, i.e., the Nadaraya-Watson estimator. The regression method maps from the low-dimensional latent space to the high-dimensional data space. The question becomes: what do the low-dimensional patterns look like that map to their high-dimensional counterparts?
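To make the mapping direction concrete, here is a minimal NumPy sketch of a Nadaraya-Watson estimator with a Gaussian kernel that maps latent points to data space as a kernel-weighted average of the data patterns. The function name and the bandwidth h are illustrative choices, not taken from the papers.

```python
import numpy as np

def nadaraya_watson(X_query, X, Y, h=0.1):
    """Map latent points to data space as a Gaussian-kernel-weighted average of Y.

    X:       (n, q) latent positions of the embedded patterns
    Y:       (n, N) high-dimensional data patterns (the regression "labels")
    X_query: (m, q) latent points to map into data space
    """
    # pairwise squared distances between query points and latent positions
    d2 = ((X_query[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2 * h ** 2))       # Gaussian kernel weights
    W /= W.sum(axis=1, keepdims=True)    # normalize rows -> weighted average
    return W @ Y                         # (m, N) predictions in data space
```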

Imagine we use Kernel Regression to map from a 2-dimensional latent space to an N-dimensional data space. The task is to map n (initially randomly chosen) 2-dimensional points X = [x₁,…,xₙ] (which we do not know yet but want to find) with Kernel Regression to the N-dimensional space; let us write this mapping in short as F(X). The optimization problem is to minimize the reconstruction error of the data Y = [y₁,…,yₙ] we seek to embed, i.e., the data space reconstruction error

E(X) = ‖Y − F(X)‖_F,

which we minimize by changing X. The norm used here is the Frobenius norm, the square root of the sum of the squared elements of a matrix.
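Given any such regression mapping F, the data space reconstruction error is straightforward to compute; a minimal sketch with a hypothetical helper name:

```python
import numpy as np

def reconstruction_error(F, X, Y):
    """Data space reconstruction error E(X) = ||Y - F(X)||_F (Frobenius norm)."""
    return np.linalg.norm(Y - F(X), ord="fro")
```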

How can we change X to minimize E? The answer: by gradient descent or by random sampling. For example, Unsupervised Kernel Regression (UKR) [1] uses gradient descent. The Kernel Regression function with a Gaussian kernel is differentiable, so based on the data space reconstruction error E one can derive the gradient for UKR and perform gradient descent in the latent space X. UKR mainly preserves density information in the dimensionality reduction results, since Kernel Regression is a density-based method: the higher the density around a pattern, the higher its contribution to the prediction. In the case of UKR, the model must be regularized to avoid overfitting; otherwise, it would only reconstruct the patterns without any ability to generalize.
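The analytic UKR gradient is derived in [1]. Purely as an illustration of the idea of descending E in latent space, here is a naive sketch that approximates the gradient with finite differences; this is a simplification for readability, not the method of the paper, and it omits the regularization mentioned above.

```python
import numpy as np

def descend_latent(E, X, lr=0.05, steps=100, eps=1e-4):
    """Minimize a scalar error E(X) by gradient descent in latent space.

    E: callable mapping an (n, q) latent matrix to a scalar error
    X: (n, q) initial latent positions (floats); an updated copy is returned
    """
    X = X.copy()
    for _ in range(steps):
        grad = np.zeros_like(X)
        # finite-difference approximation of dE/dX, entry by entry
        for idx in np.ndindex(*X.shape):
            X_plus, X_minus = X.copy(), X.copy()
            X_plus[idx] += eps
            X_minus[idx] -= eps
            grad[idx] = (E(X_plus) - E(X_minus)) / (2 * eps)
        X -= lr * grad  # descent step in latent space
    return X
```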

A Simple Example

A simple example that we look at in more detail here is Unsupervised Nearest Neighbors (UNN) [2], which is based on random sampling instead of gradient descent. K-nearest neighbors regression is used to embed the data pattern by pattern via random sampling in the latent space. Let us assume we want our embeddings to live in the space [0,1]². The first pattern y₁ is embedded anywhere, for example in the middle (0.5, 0.5) of the latent space. Each new pattern yᵢ is embedded by randomly sampling ν points in [0,1]² and choosing the x* with the smallest pattern reconstruction error, i.e., the difference between yᵢ and the nearest neighbor prediction based on the K nearest neighbors in latent space among the previously embedded points X and their "labels" Y.

In Python, UNN can be implemented with the KNeighborsRegressor from scikit-learn for the K-nearest neighbor prediction.
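A minimal sketch of this sampling loop follows; the candidate count ν appears as n_candidates, and the function name, seed, and defaults are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsRegressor

def unn_embed(Y, K=2, n_candidates=100, seed=0):
    """Unsupervised Nearest Neighbors: embed patterns Y one by one in [0, 1]^2."""
    rng = np.random.default_rng(seed)
    X = [np.array([0.5, 0.5])]  # first pattern goes to the middle of the latent space
    for i in range(1, len(Y)):
        # nu candidate latent positions, sampled uniformly in [0, 1]^2
        candidates = rng.uniform(0.0, 1.0, size=(n_candidates, 2))
        # KNN regressor trained on the already embedded points and their "labels" Y
        knn = KNeighborsRegressor(n_neighbors=min(K, i))
        knn.fit(np.vstack(X), Y[:i])
        # choose the candidate x* whose prediction reconstructs y_i best
        errors = np.linalg.norm(knn.predict(candidates) - Y[i], axis=1)
        X.append(candidates[np.argmin(errors)])
    return np.vstack(X)

iris = load_iris()
X_latent = unn_embed(iris.data, K=2)  # K=2 as in Figure 1
```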

On the famous and simple Iris dataset, UNN generates the embedding shown in Figure 1. The colors of the embeddings are based on the class labels and show that UNN separates the different classes with few exceptions.

Figure 1: Embedding of the Iris dataset with UNN, K=2; plotting based on the scikit-learn Iris sample script

Conclusions

Unsupervised regression is not only an interesting regression variant for solving dimensionality reduction problems; it has also been shown to yield good embeddings. The result depends on the characteristics of the regression method employed: nearest neighbors places a pattern in the neighborhood of already embedded patterns, while kernel regression takes the density information of neighboring patterns into account. Classic competitors in dimensionality reduction such as Kernel PCA, ISOMAP, LLE, t-SNE, and autoencoders are always good choices, but for your next embedding task you might want to try Unsupervised Regression.

All images unless otherwise noted are by the author.

References

[1] P. Meinicke, S. Klanke, R. Memisevic, H. J. Ritter, Principal Surfaces from Unsupervised Kernel Regression. IEEE Trans. Pattern Analysis and Machine Intelligence 27(9): 1379–1391 (2005)

[2] O. Kramer, Unsupervised nearest neighbor regression for dimensionality reduction. Soft Computing 19(6): 1647–1661 (2015)

