GSoC 2020 with CERN-HSF | Dark Matter and Deep Learning

This blog is a very brief summary of my Google Summer of Code (GSoC) 2020 project under CERN-HSF. This year marks the 16th anniversary of Google Summer of Code, which saw 8,902 proposals from 6,626 students, 1,198 of whom were given the opportunity to work with 199 organizations.


Image credits: Google Summer of Code

The DeepLense Project

Project description

DeepLense is a deep learning pipeline for particle dark matter searches with strong gravitational lensing and is a part of the umbrella organization CERN-HSF. Specifically, my project is an extension of the work published in the paper titled "Deep Learning the Morphology of Dark Matter Substructure," in which my mentors have explored the use of state-of-the-art supervised deep learning models such as ResNet for the multiclass classification of strong lensing images.

Gravitational lensing has been a cornerstone of many cosmology experiments and studies since it appeared in Einstein’s calculations in 1936 and was first observed in 1979, and one area of particular interest is the study of dark matter via substructure in strong lensing images. While statistical and supervised machine learning algorithms have been implemented for this task, the potential of unsupervised deep learning algorithms is yet to be explored and could prove crucial in the analysis of LSST data. The primary aim of this GSoC 2020 project is to design a Python-based framework for implementing unsupervised deep learning architectures to study strong lensing images.

Refer to the paper "Decoding Dark Matter Substructure without Supervision" for more details.

Repositories

I have compiled my work into two open-source repositories. The first, PyLensing, is a tool for generating lensing images based on PyAutoLens simulations, and the second, Unsupervised Lensing, is a PyTorch-based tool for unsupervised deep learning applications in strong lensing cosmology.

About Me

I am K Pranath Reddy, an M.Sc (Hons.) Physics and B.E (Hons.) Electrical and Electronics Engineering major at Birla Institute of Technology and Science (BITS) Pilani – Hyderabad Campus, India.

Why DeepLense?

Being a physics student, I am familiar with the operations of CERN and have a fundamental understanding of many of the projects associated with the organization, and I have worked extensively on applications of deep learning in cosmology. This experience motivated me to contribute to the DeepLense project.

The Data

Simulated sample lensing images with different substructure: none (left), vortex (middle), and spherical (right). | Image by Author

Our dataset consists of three classes: strong lensing images with no substructure, with vortex substructure, and with spherical substructure. Treating the samples with substructure as outliers, we train our unsupervised models on a set of strong lensing images with no substructure to solve the task of anomaly detection.

We have generated two sets of lensing images, Model A and Model B, using the Python package PyAutoLens for our simulations. The difference between the two models is that all simulated images for Model A are held at a fixed redshift, while Model B allows the lensed and lensing galaxy redshifts to float over a range of values. An additional difference is the SNR: images for Model A have SNR ≈ 20, whereas Model B is constructed so that simulations produce images whose SNR varies from 10 to 30. More details about the simulations can be found in the paper.
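As a quick sanity check, the simulated arrays can be inspected directly. Below is a minimal sketch, assuming the .npy files used by the training scripts later in this post; the (N, C, H, W) array layout is an assumption on my part.

import numpy as np
import matplotlib.pyplot as plt

# Load the simulated training set (file name from the training scripts below)
data = np.load('./Data/no_sub_train.npy')
print(data.shape)

# Display the first sample's first channel; an (N, C, H, W) layout is assumed
plt.imshow(data[0, 0], cmap='inferno')
plt.axis('off')
plt.show()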

Unsupervised Models

I have studied and implemented various unsupervised models in the context of anomaly detection. In this section, I will discuss four models, namely the Deep Convolutional Autoencoder (DCAE), the Convolutional Variational Autoencoder (VAE), the Adversarial Autoencoder (AAE), and the Restricted Boltzmann Machine (RBM), along with the code for implementing them using my PyTorch tool Unsupervised Lensing.

Deep Convolutional Autoencoder (DCAE)

An autoencoder is a type of neural network that learns a representation of its input and consists of an encoder network and a decoder network. The encoder learns to map the input samples to a latent vector whose dimensionality is lower than that of the input samples, and the decoder learns to reconstruct the input from the latent vector. Thus, autoencoders can be understood qualitatively as algorithms for finding the optimal compressed representation of a given class.
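To make this concrete, here is a minimal convolutional autoencoder in PyTorch. This is only an illustrative sketch for single-channel images, not the DCAE architecture used in this project (that is detailed in Appendix B of the paper).

import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: compress the image into a lower-dimensional feature map
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: reconstruct the input from the latent representation
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)     # latent representation
        return self.decoder(z)  # reconstruction of the input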

We first consider a deep convolutional autoencoder, which is primarily used for feature extraction and reconstruction of images. During training, we make use of the mean squared error (MSE),

MSE Loss | Image by Author

as our reconstruction loss, where θ and θ′ are the real and reconstructed samples.
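Written out explicitly for n samples, this is the standard mean squared error:

\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \theta_i - \theta'_i \right)^2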

Implementation using the PyTorch tool:

from unsupervised_lensing.models import Convolutional_AE
from unsupervised_lensing.models.DCAE_Nets import *
from unsupervised_lensing.utils import loss_plotter as plt
from unsupervised_lensing.utils.EMD_Lensing import EMD
# Model Training
out = Convolutional_AE.train(data_path='./Data/no_sub_train.npy', 
                             epochs=100,
                             learning_rate=2e-3,
                             optimizer='Adam',
                             checkpoint_path='./Weights',         
                             pretrain=True,                       
                             pretrain_mode='transfer',            
                             pretrain_model='A')                  

# Plot the training loss
plt.plot_loss(out)
# Model Validation
recon_loss = Convolutional_AE.evaluate(data_path='./Data/no_sub_test.npy', 
                                       checkpoint_path='./Weights',        
                                       out_path='./Results')               

# Plot the reconstruction loss
plt.plot_dist(recon_loss)

# Calculate Wasserstein distance
print(EMD(data_path='./Data/no_sub_test.npy', recon_path='./Results/Recon_samples.npy'))

Convolutional Variational Autoencoder (VAE)

We also consider a variational autoencoder, which introduces an additional constraint on the representation of the latent dimension in the form of the Kullback-Leibler (KL) divergence,

Kullback-Leibler (KL) divergence | Image by Author

where P(x) is the target distribution and Q(x) is the distribution learned by the algorithm. The first term on the r.h.s. is the cross-entropy between P and Q, and the second term is the entropy of P. Thus the KL divergence encodes how far the distribution Q is from P. In the context of variational autoencoders, the KL divergence serves as a regularization term that imposes a prior on the latent space. For our purposes, P is chosen to take the form of a Gaussian prior on the latent space z, and Q corresponds to the approximate posterior q(z|x) represented by the encoder. The total loss of the model is the sum of the reconstruction (MSE) loss and the KL divergence.
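In standard form, consistent with the description above, the divergence and the resulting total objective can be written as:

D_{\mathrm{KL}}(P \,\|\, Q) = -\sum_x P(x) \log Q(x) + \sum_x P(x) \log P(x) = H(P, Q) - H(P)

\mathcal{L}_{\mathrm{VAE}} = \mathrm{MSE} + D_{\mathrm{KL}}\big( q(z|x) \,\|\, p(z) \big)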

Implementation using the PyTorch tool:

from unsupervised_lensing.models import Variational_AE
from unsupervised_lensing.models.VAE_Nets import *
from unsupervised_lensing.utils import loss_plotter as plt
from unsupervised_lensing.utils.EMD_Lensing import EMD
# Model Training
out = Variational_AE.train(data_path='./Data/no_sub_train.npy', 
                           epochs=100,
                           learning_rate=2e-3,
                           optimizer='Adam',
                           checkpoint_path='./Weights',         
                           pretrain=True,                      
                           pretrain_mode='transfer',            
                           pretrain_model='A')
# Plot the training loss
plt.plot_loss(out)
# Model Validation
recon_loss = Variational_AE.evaluate(data_path='./Data/no_sub_test.npy', 
                                     checkpoint_path='./Weights',        
                                     out_path='./Results')
# Plot the reconstruction loss
plt.plot_dist(recon_loss)
# Calculate Wasserstein distance
print(EMD(data_path='./Data/no_sub_test.npy', recon_path='./Results/Recon_samples.npy'))

Adversarial Autoencoder (AAE)

Finally, we consider an adversarial autoencoder, which replaces the KL divergence of the variational autoencoder with adversarial learning. We train a discriminator network D to distinguish between samples generated by the autoencoder G and samples taken from a prior distribution P(z) corresponding to our training data. The total loss of the model is the sum of the reconstruction (MSE) loss and the loss of the discriminator network,

Discriminator Loss | Image by Author

We additionally add a regularization term to the autoencoder of the following form,

Image by Author

As the autoencoder becomes proficient at reconstructing its inputs, the discriminator’s ability to tell real from generated samples degrades. The discriminator network then iterates, improving its performance at distinguishing the real and generated data.
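For reference, in the standard adversarial autoencoder formulation these two terms take the following form (this generic form is an assumption on my part; the exact expressions used in the analysis are the ones shown in the figures above), where G(x) denotes the latent code produced by the encoder:

L_{D} = -\mathbb{E}_{z \sim P(z)}\left[\log D(z)\right] - \mathbb{E}_{x}\left[\log\left(1 - D(G(x))\right)\right]

L_{\mathrm{reg}} = -\mathbb{E}_{x}\left[\log D(G(x))\right]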

Implementation using the PyTorch tool:

from unsupervised_lensing.models import Adversarial_AE
from unsupervised_lensing.models.AAE_Nets import *
from unsupervised_lensing.utils import loss_plotter as plt
from unsupervised_lensing.utils.EMD_Lensing import EMD
# Model Training
out = Adversarial_AE.train(data_path='./Data/no_sub_train.npy', 
                           epochs=100,
                           learning_rate=2e-3,
                           optimizer='Adam',
                           checkpoint_path='./Weights',         
                           pretrain=True,                       
                           pretrain_mode='transfer',            
                           pretrain_model='A')
# Plot the training loss
plt.plot_loss(out)
# Model Validation
recon_loss = Adversarial_AE.evaluate(data_path='./Data/no_sub_test.npy', 
                                     checkpoint_path='./Weights',        
                                     out_path='./Results')
# Plot the reconstruction loss
plt.plot_dist(recon_loss)
# Calculate Wasserstein distance
print(EMD(data_path='./Data/no_sub_test.npy', recon_path='./Results/Recon_samples.npy'))

Restricted Boltzmann Machine (RBM)

To compare with our three autoencoder models, we also train a restricted Boltzmann machine (RBM), a generative neural network algorithm realized as a bipartite graph that learns a probability distribution over its inputs. An RBM consists of two layers, a visible layer and a hidden layer, and training is done through a process called contrastive divergence.
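To make contrastive divergence concrete, below is a minimal single-step (CD-1) update for a binary RBM in PyTorch. This is an illustrative sketch only, not the training loop used by the Unsupervised Lensing tool.

import torch

def cd1_step(v0, W, b_v, b_h, lr=1e-3):
    # Positive phase: sample hidden units conditioned on the visible data
    p_h0 = torch.sigmoid(v0 @ W + b_h)
    h0 = torch.bernoulli(p_h0)
    # Negative phase: one Gibbs step back to the visible layer and up again
    p_v1 = torch.sigmoid(h0 @ W.t() + b_v)
    p_h1 = torch.sigmoid(p_v1 @ W + b_h)
    # Contrastive divergence approximation of the log-likelihood gradient
    W += lr * (v0.t() @ p_h0 - p_v1.t() @ p_h1) / v0.size(0)
    b_v += lr * (v0 - p_v1).mean(0)
    b_h += lr * (p_h0 - p_h1).mean(0)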

A detailed architecture of all the models can be found in Appendix B of the paper.

Implementation using the PyTorch tool:

from unsupervised_lensing.models import RBM_Model
from unsupervised_lensing.models.RBM_Nets import *
from unsupervised_lensing.utils import loss_plotter as plt
from unsupervised_lensing.utils.EMD_Lensing import EMD
# Model Training
out = RBM_Model.train(data_path='./Data/no_sub_train.npy', 
                      epochs=100,
                      learning_rate=2e-3,
                      optimizer='Adam',
                      checkpoint_path='./Weights',         
                      pretrain=True,                       
                      pretrain_mode='transfer',            
                      pretrain_model='A')
# Plot the training loss
plt.plot_loss(out)
# Model Validation
recon_loss = RBM_Model.evaluate(data_path='./Data/no_sub_test.npy', 
                                checkpoint_path='./Weights',        
                                out_path='./Results')
# Plot the reconstruction loss
plt.plot_dist(recon_loss)
# Calculate Wasserstein distance
print(EMD(data_path='./Data/no_sub_test.npy', recon_path='./Results/Recon_samples.npy'))

Results

I have used 25,000 samples with no substructure for training and 2,500 validation samples per class for evaluating the unsupervised models. The models are implemented using the PyTorch package and were run on a single NVIDIA Tesla K80 GPU for 500 epochs. We use the area under the ROC curve (AUC) as the metric of classifier performance for all our models. For the unsupervised models, the ROC curve is traced out by varying a threshold on the reconstruction loss. Additionally, we use the Wasserstein distance to compare the fidelity of the reconstructions. A more detailed set of results can be found in the paper.
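As an illustration of how the reconstruction loss translates into an ROC curve, the per-sample loss can be treated as an anomaly score. A minimal sketch, with hypothetical file names for the per-sample losses on the two validation sets:

import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical per-sample reconstruction losses for validation images
# without substructure (label 0) and with substructure (label 1)
loss_no_sub = np.load('./Results/loss_no_sub.npy')
loss_sub = np.load('./Results/loss_sub.npy')

scores = np.concatenate([loss_no_sub, loss_sub])
labels = np.concatenate([np.zeros(len(loss_no_sub)), np.ones(len(loss_sub))])

# Higher reconstruction loss -> more likely to contain substructure (outlier)
print('AUC:', roc_auc_score(labels, scores))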

ROC-AUC curves for the unsupervised algorithms. Plots on the left correspond to Model A, plots on the right to Model B. | Image by Author
Performance of the architectures used in this analysis. AUC values for ResNet are calculated for classification between images with and without substructure, so it is not a macro-averaged AUC. W₁ is the average 1st Wasserstein distance for images without substructure. | Image by Author

Future Work and Final thoughts

Although we obtained some very promising results with our unsupervised models, there is still room for improvement in their performance relative to the supervised results of the ResNet model. I am currently exploring the application of graph-based models, since they have been successful in tasks involving sparse datasets, such as sparse 3D point clouds and sparse detector data. Another future task is to use transfer learning to train our architecture on real data, starting from our models that have been trained on simulations.

I want to thank my mentors Michael Toomey, Sergei Gleyzer, Stephon Alexander, and Emanuele Usai, and the entire CERN-HSF community for their support. I had a great summer working on my GSoC project. I also want to thank Ali Hariri, Hanna Parul, and Ryker Von Klar for their useful discussions.

To students who want to participate in GSoC in the future, don’t view GSoC as a competition or an exam that needs to be "cracked". GSoC is about open-source development and becoming a part of a wonderful community of developers. Find projects that you are passionate about and understand the requirements of the organization. Most importantly, stay active on community forums and interact with your project mentors regularly.

Thank you, Google, for giving me such an amazing opportunity.

Update: The DeepLense project is now a part of the ML4SCI umbrella organization.

Important links

Decoding Dark Matter Substructure without Supervision

Simulating Dark Matter with Strong Gravitational Lensing

DeepLense-Unsupervised/unsupervised-lensing

DeepLense-Unsupervised/PyLensing

Pranath Reddy – Birla Institute of Technology and Science, Pilani – Hyderabad, Telangana, India |…

pranath-reddy – Overview

