Thoughts and Theory
Maxim Ziatdinov¹ ² & Sergei V. Kalinin¹
¹ Center for Nanophase Materials Sciences and ² Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, United States
Data often comes in the form of one-dimensional signals. These can be time series recorded by a sensor or detector measuring temperature, humidity, and wind velocity in meteorology, the heart rate in a personal fitness tracker, or the output of more sophisticated detection systems that are now becoming widespread due to the proliferation of Internet of Things (IoT) devices and lightweight edge-computing platforms such as Arduino, Raspberry Pi, and NVIDIA Jetson. Even without any specialized electronics, recent apps such as Arduino Science Journal give access to the pressure, magnetic field, and acceleration sensors on a cell phone, allowing for a multitude of home experiments ("How much does the pressure change if I walk up the stairs?", "What is the acceleration pattern when I run or walk?").
In many of these applications, the signal has characteristic aspects, whether a heartbeat rhythm, a walking gait, or a vibration pattern, that contain information on fitness, health, and other characteristics we may want to discover. If the time reference points for the signals are well known or the data is periodic, the analysis of such data is straightforward. However, the precise positions of events and characteristic features are often unavailable and can be something we aim to discover jointly with their characteristic patterns.
Similar problems abound in scientific domains. A simple example is X-ray scattering data, where the peaks corresponding to specific indices can shift with the molar volume of the material in solid solutions. Other examples include mass-spectrometry data, where peaks can shift due to surface charging during data acquisition; Raman spectra, where peak shifts indicate strains in materials; and so on. In many cases, such data are available over a large parameter space, e.g., position in hyperspectral imaging, concentration in combinatorial library assessment, or voltage in active device characterization. The question is: can machine learning help us process and understand these types of data in an unsupervised manner?
For one of the coauthors (S.V.K.), the interest in this area started with the introduction of the band excitation (BE) method in scanning probe microscopy (SPM). Existing SPM methods are generally based on the interaction between a sharp (sometimes atomically sharp) probe and the surface. The probe is positioned at the end of a cantilever with a laser beam bouncing off its backside. Small displacements of the probe result in large shifts of the laser beam on the detector, allowing for the detection of minute displacements. Practically, displacements as small as fractions of a nanometer can be detected. In terms of relevant forces, for spring constants on the order of ~1 N/m, the corresponding forces are on the order of interatomic forces. Hence the name Atomic Force Microscopy.
The late 1990s saw a broad introduction of dynamic SPM methods [1, 2]. In these, the cantilever is excited by an oscillatory signal applied either to the piezo element driving the cantilever (intermittent-contact topographic imaging, magnetic force microscopy) or to the cantilever directly (Kelvin probe force microscopy and electrostatic force microscopy). The microscope electronics detects either the amplitude and phase of the signal at a single frequency (lock-in amplifier, LIA) or the changes of the resonance frequency due to tip-surface interactions (phase-locked loop, PLL). However, both LIA and PLL detect the signal at a single frequency only, leaving the system physically underdetermined (a simple harmonic oscillator has three parameters: amplitude, resonance frequency, and quality factor). The assumption of a constant driving force, which underlies single-frequency detection, is only approximately correct in many SPM methods, leading to spurious signal changes and a lack of quantitative information, and does not apply to many others.
Band excitation (BE) scanning probe microscopy was developed to address this problem [3, 4]. In this method, the atomic force microscope detects a segment of the amplitude/phase-frequency resonance curve within a chosen frequency band in parallel (hence the name). The 2D SPM image thus becomes 3D, where each spatial pixel now contains the full frequency-dependent response curve. A fit to the simple harmonic oscillator (SHO) model then allows reconstructing the amplitude, resonance frequency, and quality factor maps from which materials properties can be extracted. However, the peak shape can (and more often than not does) differ from the SHO shape, and this peak-shape information is ignored in the analysis. As one possible strategy, we can utilize prior physical knowledge, in the form of the functional form of the responses, in least-squares fits or Bayesian inference [5]. However, this strategy is often time-consuming, and in many cases analytical solutions are unavailable.
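To make the fitting step concrete, here is a minimal sketch of a least-squares SHO fit of a single resonance curve with SciPy; the frequency band, parameter values, and noise level are made up for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

def sho_amplitude(w, a0, w0, q):
    """Amplitude response of a driven simple harmonic oscillator."""
    return a0 * w0**2 / np.sqrt((w0**2 - w**2)**2 + (w * w0 / q)**2)

# Synthetic noisy resonance curve within a chosen frequency band (arb. units)
w = np.linspace(0.8, 1.2, 200)
rng = np.random.default_rng(0)
y = sho_amplitude(w, 1.0, 1.02, 50.0) + 0.5 * rng.normal(size=w.size)

# Recover amplitude, resonance frequency, and quality factor by least squares
(a0_fit, w0_fit, q_fit), _ = curve_fit(sho_amplitude, w, y, p0=(1.0, 1.0, 30.0))
```

In BE analysis, a fit like this is repeated at every spatial pixel to build the amplitude, resonance frequency, and Q maps.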
Unsupervised machine learning to the rescue! The application of principal component analysis (PCA) to BE data in 2007–2008 [6] started our group's foray into ML methods, one that still continues. Ironically, however, this first application of PCA to band excitation data showed that meaningful results can be obtained only in a small number of cases, namely when the peak shifts within the image are small and the noise level is relatively high. Then a small number of PCA components is sufficient to represent the data, and they can even be identified with specific physical mechanisms. For data sets where the resonance frequency changes a lot, or where noise levels are low, PCA yields a large number of components [7] whose physical meaning (or even usefulness for exploratory data analysis) is unclear. The same fate befalls other linear decomposition methods such as non-negative matrix factorization, Bayesian linear unmixing [8], and others [9].
However, we can now clearly formulate our expectations for the "ideal" unsupervised ML method for such data sets. It should be able to find the descriptors, or disentangle the representations, of the 1D data, ignoring (or separating out) the relative positions of the peaks. And this is exactly what the shift variational autoencoder (shift-VAE) is designed to accomplish.
In shift-VAE, we designate one of the latent variables to capture the information about the relative position of the spectra, whereas the rest of the latent variables "look" for factors of variation other than position, such as height, width, inter-peak distance (if there is more than one peak), etc. Let’s call this special latent variable the "offset latent variable". We start by creating a 1D x-coordinate grid whose length is equal to the number of points in the spectra. Our encoder maps the inputs (spectra) into the offset latent variable and two (or more) conventional latent variables. Assuming that the shifts in position are normally distributed, the offset latent variable is sampled from a unit Gaussian distribution. The sampled values are used to shift our coordinate grid, which is then concatenated with the conventional latent variables and passed to the VAE’s decoder to enforce geometric consistency between the shifted spectra.
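The core of this mechanism can be illustrated in plain NumPy (the encoder outputs below are placeholder numbers, since a full encoder network is beyond a short sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 128
x_grid = np.linspace(-1.0, 1.0, n_points)  # fixed 1D coordinate grid

# Placeholder encoder outputs for a single spectrum: means and log-variances
# of the offset latent variable and of two conventional latent variables
mu_off, logvar_off = 0.3, -2.0
mu_z, logvar_z = np.array([0.1, -0.5]), np.array([-1.0, -1.0])

# Reparameterization trick: sample the latents from the approximate posterior
offset = mu_off + np.exp(0.5 * logvar_off) * rng.normal()
z = mu_z + np.exp(0.5 * logvar_z) * rng.normal(size=2)

# The sampled offset shifts the coordinate grid, which is then concatenated
# (point-wise) with the conventional latents and fed to the decoder
shifted_grid = x_grid - offset
decoder_input = np.column_stack([shifted_grid, np.tile(z, (n_points, 1))])
print(decoder_input.shape)  # (128, 3)
```

The decoder thus sees the same conventional latents regardless of where the peak sits on the grid, which is what pushes the positional information into the offset variable.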
Now, let’s experiment with the shift-VAE and compare it to PCA/NMF and a conventional VAE. As a first step, we can generate a synthetic data set formed by Gaussians (or linear combinations of Gaussians and Lorentzians with the same center).
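For instance, such a data set can be generated as follows (the ranges of peak positions and widths and the noise level are chosen here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n_spectra, n_points = 500, 128
x = np.linspace(-5, 5, n_points)

centers = rng.uniform(-2, 2, n_spectra)    # ground truth: peak positions
widths = rng.uniform(0.3, 1.0, n_spectra)  # ground truth: peak widths

# Each row is a noisy Gaussian curve; the rows form the feature set
X = np.exp(-(x[None, :] - centers[:, None])**2 / (2 * widths[:, None]**2))
X += 0.05 * rng.normal(size=X.shape)
print(X.shape)  # (500, 128)
```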

Here, the centers and widths of the Gaussians are stored as the ground truth, whereas the curves themselves form the feature set.

A simple PCA analysis implemented in the accompanying notebook (see the link at the end of the article) indeed illustrates that the number of components required to represent such a data set is large, and the readers are welcome to explore the behavior of the components as a function of the data spread in parameter space and the noise level.
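A compact way to see this is to count the components needed to capture most of the variance (a scikit-learn sketch on the same kind of synthetic data; even though the data has only two true factors of variation, many more components are required):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic shifting Gaussians, as before
rng = np.random.default_rng(42)
x = np.linspace(-5, 5, 128)
centers = rng.uniform(-2, 2, 500)
widths = rng.uniform(0.3, 1.0, 500)
X = np.exp(-(x[None, :] - centers[:, None])**2 / (2 * widths[:, None]**2))
X += 0.05 * rng.normal(size=X.shape)

pca = PCA().fit(X)
# Number of components needed to capture 95% of the variance
n_95 = np.searchsorted(np.cumsum(pca.explained_variance_ratio_), 0.95) + 1
print(n_95)
```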
Now let’s apply a simple VAE. To do this, we are going to use our pyroVED package, which is built on top of the Pyro probabilistic programming language. With pyroVED (imported as pv), a VAE can be trained in just a few lines of code:
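A minimal sketch of the training loop is shown below. It assumes the pyroVED API at the time of writing (pv.models.iVAE with a coord keyword, pv.utils.init_dataloader, pv.trainers.SVItrainer); the exact signatures may differ between package versions.

```python
import numpy as np
import torch
import pyroved as pv

# Synthetic spectra (generated as above), wrapped into a torch dataloader
rng = np.random.default_rng(42)
x = np.linspace(-5, 5, 128)
centers = rng.uniform(-2, 2, 500)
widths = rng.uniform(0.3, 1.0, 500)
X = np.exp(-(x[None, :] - centers[:, None])**2 / (2 * widths[:, None]**2))
train_data = torch.from_numpy(X).float()
train_loader = pv.utils.init_dataloader(train_data, batch_size=100)

# Conventional VAE: two latent variables, no special offset variable
vae = pv.models.iVAE(data_dim=(128,), latent_dim=2, coord=0)
trainer = pv.trainers.SVItrainer(vae)
for e in range(100):
    trainer.step(train_loader)
```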

In this case, the VAE encodes the dataset via the two latent variables, and the distribution of the data points in the latent space of the trained VAE model is shown in the figure below.


The results are not bad at all! Our data points clearly show a structure resembling a joint uniform distribution. To check this guess, we can color each point using the ground truth labels and see that the labels corresponding to width change (mostly) from top to bottom of the image, whereas the ones corresponding to center position change (mostly) from left to right. We can also reconstruct the curves from a grid of points in the latent space and observe that indeed the width changes from top to bottom and the position changes from left to right. Hence, our variational autoencoder has (mostly) disentangled the representations of the data; in this case, one of the disentangled factors of variation is the width and the other is the position.
However, once we plot our latent variables against the ground truth, we see that while there is a definite relationship between the ground truth variables and the latent variables, they are not equal. Furthermore, the offsets are encoded in arbitrary units, which is not of great practical use.

Now, let’s repeat this analysis with the shift-VAE. Here, the relative shift is separated out as a special (offset) latent variable, and the remaining variability is encoded in two conventional latent variables. To do this in pyroVED, we just need to change coord=0 to coord=1:
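A sketch of the shift-VAE setup, under the same API assumptions as the previous model (the coord keyword and class/function names may differ between pyroVED versions):

```python
import numpy as np
import torch
import pyroved as pv

# Same kind of synthetic spectra as before
rng = np.random.default_rng(42)
x = np.linspace(-5, 5, 128)
X = np.exp(-(x[None, :] - rng.uniform(-2, 2, (500, 1)))**2
           / (2 * rng.uniform(0.3, 1.0, (500, 1))**2))
train_loader = pv.utils.init_dataloader(
    torch.from_numpy(X).float(), batch_size=100)

# coord=1 designates one latent variable as the offset (shift) variable;
# the other two latent variables capture the remaining factors of variation
shift_vae = pv.models.iVAE(data_dim=(128,), latent_dim=2, coord=1)
trainer = pv.trainers.SVItrainer(shift_vae)
for e in range(100):
    trainer.step(train_loader)
```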

The corresponding latent space representation is shown below:

Note that the latent space is quasi-collapsed, with the data points forming a 1D manifold (although there is a tiny "leakage" from the offset latent variable into the first ("collapsed") latent variable). Classically, such a collapse of the latent space in VAEs is perceived as a problem calling for an adjustment of the loss function. However, here we know that our ground truth data set has only two factors of variability, one of which is position. Hence, the dimensionality of our latent space hints at the true physical dimensionality of the data!

Note that the spectra reconstructed from the latent space are all centered at zero, suggesting that the shift-VAE was able to separate the shift from the other variability factors. Finally, the offsets derived from the shift-VAE are numerically very close to the ground truth values:
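Since the offset latent is learned in arbitrary units, in practice one calibrates it against the known coordinate grid. A self-contained sketch of that calibration is below; because training a model is beyond this snippet, the encoded offsets are mocked as a noisy affine transform of the true centers (in practice they would come from the trained shift-VAE's encoder):

```python
import numpy as np

rng = np.random.default_rng(0)
centers = rng.uniform(-2, 2, 500)  # ground-truth peak positions

# Stand-in for the encoded offset latent variable: an affine transform
# of the true centers plus a small amount of encoder noise
offsets = 1.7 * centers + 0.2 + 0.02 * rng.normal(size=500)

# Calibrate the arbitrary latent units with a linear fit ...
slope, intercept = np.polyfit(centers, offsets, 1)
# ... and map the offsets back onto the physical coordinate axis
recovered = (offsets - intercept) / slope
print(np.max(np.abs(recovered - centers)))  # small residual
```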

This concludes the introduction to the shift-VAE. Feel free to play with the notebook and apply it to your own data sets. Please check out our pyroVED software package for applying this and other VAEs to scientific image and spectral data (it is in the alpha stage, and contributions/suggestions are more than welcome!). If you are interested in learning more about different SPM modes as applied to electrical and electromechanical characterization, you are welcome to subscribe to our YouTube channel M*N: Microscopy, Machine Learning, Materials.
Finally, as is customary in the scientific world, we acknowledge the sponsor that funded this research. This effort was performed and supported at Oak Ridge National Laboratory’s Center for Nanophase Materials Sciences (CNMS), a U.S. Department of Energy Office of Science User Facility. You can take a virtual walk through it using this link and tell us if you want to know more.
The executable Google Colab notebook is available here.
References
- Butt, H. J.; Cappella, B.; Kappl, M., Force measurements with the atomic force microscope: Technique, interpretation and applications. Surf. Sci. Rep. 2005, 59 (1–6), 1–152.
- Garcia, R.; Perez, R., Dynamic atomic force microscopy methods. Surf. Sci. Rep. 2002, 47 (6–8), 197–301.
- Jesse, S.; Kalinin, S. V.; Proksch, R.; Baddorf, A. P.; Rodriguez, B. J., The band excitation method in scanning probe microscopy for rapid mapping of energy dissipation on the nanoscale. Nanotechnology 2007, 18 (43), 435503.
- Jesse, S.; Kalinin, S. V., Band excitation in scanning probe microscopy: sines of change. J. Phys. D-Appl. Phys. 2011, 44 (46), 464006.
- Vasudevan, R. K.; Kelley, K. P.; Eliseev, E.; Jesse, S.; Funakubo, H.; Morozovska, A.; Kalinin, S. V., Bayesian inference in band excitation scanning probe microscopy for optimal dynamic model selection in imaging. J. Appl. Phys. 2020, 128 (5), 10.
- Jesse, S.; Kalinin, S. V., Principal component and spatial correlation analysis of spectroscopic-imaging data in scanning probe microscopy. Nanotechnology 2009, 20 (8), 085714.
- Belianinov, A.; Kalinin, S. V.; Jesse, S., Complete information acquisition in dynamic force microscopy. Nat. Commun. 2015, 6.
- Dobigeon, N.; Brun, N., Spectral mixture analysis of EELS spectrum-images. Ultramicroscopy 2012, 120, 25–34.
- Kannan, R.; Ievlev, A. V.; Laanait, N.; Ziatdinov, M. A.; Vasudevan, R. K.; Jesse, S.; Kalinin, S. V., Deep data analysis via physically constrained linear unmixing: universal framework, domain examples, and a community-wide platform. Adv. Struct. Chem. Imag. 2018, 4, 20.