Graph Laplacian and its application in Machine learning

Applying the Graph Laplacian in spectral clustering

Taaniya Arora
Towards Data Science


This article highlights graphs, the properties of their representations, and their application in machine learning to perform spectral clustering.

Introduction

A graph is a data structure with nodes connected to each other through directed or undirected edges. The edges can carry weights to represent, for example, the distance between 2 cities in a graph that models an urban road network with cities as nodes.
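For instance, a small road-network graph could be represented like this (a toy example; the city names and distances are made up):

import networkx as nx

# An undirected road network: nodes are cities, edge weights are distances in km
road_network = nx.Graph()
road_network.add_weighted_edges_from([
    ("CityA", "CityB", 120),
    ("CityB", "CityC", 85),
    ("CityA", "CityC", 200),
])
print(road_network["CityA"]["CityB"]["weight"])  # -> 120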

Contents

  • Loading data
  • Constructing a graph from features
  • Deriving its Laplacian representation
  • Explore properties of Graph Laplacian
  • Applications in machine learning — Spectral clustering

Loading data

We will begin with a toy make_moons data set from the Python library scikit-learn.

import numpy as np
from scipy import sparse
from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import KMeans
from sklearn.metrics import homogeneity_score, completeness_score, v_measure_score
import networkx as nx
import matplotlib.pyplot as plt

random_state = 213
np.random.seed(random_state)

data_size = 150
features, y = make_moons(n_samples=data_size, noise=0.07, random_state=213)
Data set with 2 clusters

Construction of K-nearest neighbors graph

A K-nearest neighbors graph can be constructed in 2 modes: ‘distance’ or ‘connectivity’.

With ‘distance’ mode, the edges hold the distance between 2 nodes, while with ‘connectivity’ mode the edge weights are 1 or 0 to denote the presence or absence of an edge between them. We will use the Euclidean distance metric to compute the distances.

n_neighbors = 10
knn_dist_graph = kneighbors_graph(X=features,
                                  n_neighbors=n_neighbors,
                                  mode='distance',
                                  metric='euclidean',
                                  n_jobs=6)

It returns a sparse graph whose edges represent the distance between data points. The distances from the first data point to its 10 nearest neighbors are shown below. Note the sparsity of the graph: it holds a continuous distance value only at the indices corresponding to a point's k nearest neighbors; all other entries are zero.

knn_dist_graph.todense()[0]

Such data is commonly used to find groups among the data points, where similar points lie in the same class or cluster.

Similarly, when we want to capture the similarity between data points rather than the distance, we can convert this graph into a similarity graph using a Gaussian kernel with width sigma = 1,

w(x1, x2) = exp( -d(x1, x2)^2 / (2 * sigma^2) )

where d(x1, x2) is the Euclidean distance stored at the non-zero entries of the sparse graph above.

Note that a value of 0 in the array above means the corresponding point is not among the k nearest neighbors, i.e. its distance is beyond the largest of the k neighbor distances. Fed directly to the Gaussian kernel, such a 0 would be interpreted as the highest possible similarity, so we apply the kernel only to the indices that actually contain distances.

sigma = 1
similarity_graph = sparse.csr_matrix(knn_dist_graph.shape)
nonzeroindices = knn_dist_graph.nonzero()
# Apply the Gaussian kernel only at the indices that hold distances
similarity_graph[nonzeroindices] = np.exp(
    -np.asarray(knn_dist_graph[nonzeroindices]) ** 2 / (2.0 * sigma ** 2))
# Make the graph symmetric (explained below)
similarity_graph = 0.5 * (similarity_graph + similarity_graph.T)
similarity_graph.todense()[0]

This graph may be asymmetric since it is based on k nearest neighbors: point A can be among B's nearest neighbors without B being among A's. We need to make the graph symmetric, for reasons we will see next in the properties of the Laplacian. To do so, we add the transpose of the graph to itself and divide all values by 2.

Let’s see what the graph looks like when visualized.

Visualization with networkx
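The plotting code for this figure isn't shown above; a minimal sketch using networkx could look like the following. It also builds nx_graph, which is used further below to count connected components (from_scipy_sparse_matrix was renamed to from_scipy_sparse_array in networkx 3.0):

# Build a networkx graph from the symmetric similarity matrix and draw it
nx_graph = nx.from_scipy_sparse_matrix(similarity_graph)  # networkx < 3.0
plt.figure(figsize=(8, 6))
nx.draw(nx_graph, node_size=30)
plt.show()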

Deriving Graph Laplacian representation

L = D-W

D: Degree matrix

W: Similarity graph (represented by weighted adjacency matrix)

We will create the degree matrix of the graph, which is a diagonal matrix with the node degrees d on its diagonal. For a weighted graph, the degree of a node is the sum of the weights of its edges.

degree_matrix = similarity_graph.sum(axis=1)
diagonal_matrix = np.diag(np.asarray(degree_matrix).reshape(data_size,))
# Keep L sparse so that .todense() can be called on it below
L = sparse.csr_matrix(diagonal_matrix) - similarity_graph

Note: by subtracting the similarity matrix from the degree matrix, the effect of self-loops in the graph gets nullified. Finally, the Laplacian contains the node degrees on its diagonal and the negated edge weights in the rest of the matrix.
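As a quick illustration with a toy graph (separate from the moons data), here is the Laplacian of a 3-node path graph with unit edge weights; note that every row of L sums to zero, which is why the all-ones vector is always an eigenvector with eigenvalue 0:

# Toy 3-node path graph: 0 -- 1 -- 2 with unit edge weights
W_toy = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]])
D_toy = np.diag(W_toy.sum(axis=1))  # degree matrix
L_toy = D_toy - W_toy
print(L_toy)
# [[ 1 -1  0]
#  [-1  2 -1]
#  [ 0 -1  1]]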

Properties of Graph Laplacian

  1. Real symmetric

Because it is real and symmetric, its eigenvalues are real and its eigenvectors are orthogonal.
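We can do a quick numerical sanity check on the Laplacian we just built (a small sketch, not part of the original walkthrough):

# L equals its transpose (up to floating point error) ...
print(np.allclose(L.todense(), L.todense().T))  # expected: True
# ... and its eigenvalues have (numerically) zero imaginary part
print(np.allclose(np.imag(np.linalg.eigvals(L.todense())), 0))  # expected: True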

2. Positive semi-definite

The Laplacian has at least one eigenvalue equal to 0. We can verify this through its quadratic form. With L real symmetric and x an n x 1 column vector, its quadratic form Q is given by

Q = x^T L x = (1/2) * sum_ij w_ij * (x_i - x_j)^2

A quadratic form is positive semi-definite if:

  • Q ≥ 0 for all x, and Q = 0 for some x ≠ 0 (which is what makes it semi-definite rather than positive definite)

We will set x to be a column vector of 1's.

x = np.ones(shape=(data_size,1))
Q = np.dot(np.dot(x.T, L.todense()), x)
Q.round(10)

Any column vector x with the same value in every entry will give a quadratic form equal to 0; such a vector is an eigenvector of L with eigenvalue 0.
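To see that the quadratic form really equals the weighted sum of squared differences, we can check the identity numerically on a random vector, using the matrices built earlier:

# Check x^T L x == 0.5 * sum_ij w_ij * (x_i - x_j)^2 on a random vector
x_rand = np.random.randn(data_size, 1)
L_dense = np.asarray(L.todense())
W_dense = np.asarray(similarity_graph.todense())

q_direct = (x_rand.T @ L_dense @ x_rand).item()
pairwise_diff = x_rand - x_rand.T  # entry (i, j) is x_i - x_j
q_pairwise = 0.5 * np.sum(W_dense * pairwise_diff ** 2)

print(np.isclose(q_direct, q_pairwise))  # expected: True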

3. The number of zero eigenvalues of the Laplacian is equal to the number of connected components in the graph

# Finding eigen values and eigen vectors
e, evecs = np.linalg.eig(L.todense())
e.shape, evecs.shape
# No. of eigen values equal to 0
e_rounded_off = e.round(5)
e_rounded_off[e_rounded_off == 0].shape
# No. of connected components
nx.number_connected_components(nx_graph)

4. Two similar data points connected by a high edge weight have similar values at their corresponding indices in the resulting eigenvectors

One of these eigenvectors is the Fiedler vector, the eigenvector corresponding to the smallest non-zero eigenvalue. In the image below, the data points are quite well separated by sign.

Eigenvector plot corresponding to the smallest non-zero eigenvalue

Application of Graph Laplacian

By extension of all the above properties, and the fact that the eigenvectors separate data points into groups, the Laplacian is used for clustering. This method is called spectral clustering.

This is performed by choosing a threshold on the eigenvector of the smallest non-zero eigenvalue (the Fiedler vector) to separate the data points into 2 clusters. For more than 2 clusters, we can run the K-means algorithm on the first K eigenvectors, i.e. those belonging to the K smallest eigenvalues, to obtain K clusters directly, as sketched below.
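Here is a minimal sketch of that idea, using the eigendecomposition computed earlier (variable names such as spectral_embedding are just illustrative):

# Embed each point using the eigenvectors of the K smallest eigenvalues
k = 2
order = np.argsort(e)  # ascending eigenvalue order
spectral_embedding = np.asarray(evecs)[:, order[:k]].real

# Run K-means on the rows of the embedding to obtain K clusters
spectral_kmeans_labels = KMeans(n_clusters=k, random_state=random_state).fit_predict(spectral_embedding)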

Eigenvalues plotted against data indices
Eigenvalues (sorted) plotted against data indices
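The code that produced these two plots isn't included above; a minimal sketch that reproduces them and also defines sorted_indices, which the partitioning code below relies on:

# Sort the eigenvalues in ascending order; sorted_indices is reused below to pick out the Fiedler vector
sorted_indices = np.argsort(e)

plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.scatter(np.arange(data_size), e.real)
plt.title("Eigenvalues (original order)")
plt.subplot(1, 2, 2)
plt.scatter(np.arange(data_size), e[sorted_indices].real)
plt.title("Eigenvalues (sorted)")
plt.show()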

Using Fiedler vector to partition data points

Graph partitioning: the vertices are partitioned into disjoint sets. The graph is partitioned such that the edges within a group have high weights (points within a cluster are similar to each other) and the edges between groups have small weights (points in different clusters are dissimilar).

The Fiedler vector is the eigenvector corresponding to the smallest non-zero eigenvalue. Indices with values below 0 are assigned to cluster 1, and the remaining indices are assigned to cluster 2.

We can use this vector in this way because it is approximately a scaled version of a cluster indicator vector. An indicator vector ideally contains the binary values 0 or 1 to mark cluster membership, and the indicator vectors of different clusters are orthogonal to each other; in the ideal case of a graph whose clusters form separate connected components, the eigenvectors with eigenvalue 0 span exactly these indicator vectors.

In our case, instead of an indicator vector containing 0’s and 1’s, we have signed continuous values.

# Get the index of the smallest non-zero eigenvalue; its eigenvector gives the partition into clusters
fiedler_index = sorted_indices[1]

# Plot the eigenvector of the smallest non-zero eigenvalue, i.e. the Fiedler vector
plt.figure(figsize=(8, 6))
plt.scatter(np.arange(data_size), evecs[:, fiedler_index].tolist())
plt.title("Eigen (Fiedler) vector plot")
plt.show()

fiedler_vector = evecs[:, fiedler_index].copy()

# Threshold the values in this eigenvector at 0
fiedler_vector[fiedler_vector < 0.0] = 0
fiedler_vector[fiedler_vector > 0.0] = 1
new_labels = np.asarray(fiedler_vector)[:, 0]

# Plot the cluster result
plt.scatter(features[:, 0], features[:, 1], c=new_labels.astype(float))
plt.title("Clusters plot")
plt.show()
Clusters obtained from Spectral clustering

Evaluation with entropy-based external cluster evaluation measures

The homogeneity score, completeness score, and V-measure evaluate the clustering result against the ground truth; they take into account the distribution of class and cluster labels to measure the coherence of the clusters obtained. The highest possible score is 1.0 for each measure. We get a score of 1.0 on every metric with the above procedure.

# Evaluation of clustering result of the above procedure
homogeneity_score(y, new_labels), completeness_score(y, new_labels), v_measure_score(y, new_labels)

Comparison with Kmeans

num_clusters = 2
kmean_labels = KMeans(n_clusters=num_clusters, random_state=random_state, n_jobs=6).fit_predict(features)
plt.scatter(features[:,0], features[:,1], c=kmean_labels.astype(float))
plt.show()
Clusters obtained from KMeans clustering
# Evaluation of clustering result of KMeans
homogeneity_score(y, kmean_labels), completeness_score(y, kmean_labels), v_measure_score(y, kmean_labels)
# Scores
# (0.1836464702880451, 0.1837407327840609, 0.18369358944333708)

Kmeans, which assumes spherical clusters, isn’t able to separate the 2 clusters well.

Recalling the Laplacian representation and how we solved the eigenvalue system to obtain clusters, one question remains: why an eigenvalue system at all?

This is because the eigenvalue system approximates a graph cut.

For example, the cut Cut(V1, V2) that splits the vertices into 2 partitions V1 and V2 is the total weight of the edges running between the two partitions:

Cut(V1, V2) = sum over i in V1, j in V2 of w_ij

In the above approach, the eigenvalue system approximates a normalized graph cut, NCut(V1, V2):

NCut(V1, V2) = Cut(V1, V2) / vol(V1) + Cut(V1, V2) / vol(V2)

where vol(V) is the sum of the degrees of the nodes in V.

How is it so?

The above equation of NCut can be re-expressed as a Rayleigh quotient, whose minimum, under the constraint below, is attained at the second smallest eigenvalue of a generalized eigenvalue problem. Let x be an N-dimensional indicator vector with xi = 1 if graph node i is in V1 and 0 otherwise. Then the minimization of NCut can be written as the following Rayleigh quotient,

min_x NCut(x) = min_y ( y^T (D - W) y ) / ( y^T D y )

subject to the condition

y^T D 1 = 0,  where 1 is the all-ones vector,

and where y is also subject to a discreteness constraint analogous to that on the indicator vector x (its entries may take only two distinct values).

Relaxing y to take arbitrary real values turns this minimization into the generalized eigenvalue problem (D - W) y = lambda * D y. In this way, the graph partitioning problem is converted into a clustering problem.
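As a side note (a sketch that is not part of the original walkthrough), the relaxed problem can be solved directly with scipy.linalg.eigh, which handles symmetric generalized eigenvalue problems, and the resulting vector can be thresholded at 0 just like the Fiedler vector earlier:

from scipy.linalg import eigh

# Solve the generalized eigenvalue problem (D - W) y = lambda * D y
L_dense = np.asarray(L.todense())
D_dense = np.asarray(diagonal_matrix, dtype=float)
gen_eigvals, gen_eigvecs = eigh(L_dense, D_dense)  # eigenvalues in ascending order

# The eigenvector of the second smallest eigenvalue is the relaxed NCut solution
ncut_labels = (gen_eigvecs[:, 1] > 0).astype(int)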

Conclusion

We began with the features of the data, constructed its graph and Laplacian, and used its spectral embeddings (eigenvectors) to find partitions that yield clusters. Spectral theory, the source of this idea of working with the eigenvalues and eigenvectors of a graph representation, is also used in other areas of machine learning such as image segmentation and spectral graph convolutional neural networks, in industry as well as in the research community.

This article aimed to summarize these concepts and their applications in machine learning and linear algebra, and I hope it also sparks curiosity to explore and learn more about them.

I hope you found it helpful. Do share your feedback, or let me know if you come across applications of these concepts in other areas.

You can find the code on GitHub here.
