
TensorFlow 2: How to use AutoEncoder for Interpolation

A Short Tutorial on Using TensorFlow 2 with an AutoEncoder for Interpolation and Denoising


Original image from a simulation by the author.

Autoencoder

Autoencoders are another type of neural network, used to reproduce their input in a compressed fashion. A distinctive property of an autoencoder is that the number of input neurons is the same as the number of output neurons.

Image created by the author using draw.io

Look at the image above. The goal of an autoencoder is to create a representation of the input at the output layer such that output and input are similar, but the practical use of an autoencoder is to determine a compressed version of the input data with the lowest possible loss of information. This is very similar to what Principal Component Analysis does, albeit in a black-box manner. The encoder part of an autoencoder compresses the data while ensuring that the important information is not lost, so that the size of the data is reduced.

The downside of using an autoencoder for interpolation is that the compressed data is a black-box representation: we do not know the structure of the data in the compressed version. Suppose we have a dataset with 10 parameters and we train an autoencoder on this data. The encoder does not omit some of the parameters for a better representation; rather, it fuses the parameters into a compressed version with fewer parameters (bringing the number down from, say, 10 to 5). An autoencoder has two parts, an encoder and a decoder. The encoder compresses the input data, and the decoder does the opposite, producing an uncompressed version of the data that reconstructs the input as closely as possible.
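As a concrete sketch (an illustration of the idea, not the model used later in this article), here is a minimal Keras autoencoder that fuses a hypothetical 10-parameter input into a 5-dimensional code and reconstructs the input from it:

import tensorflow as tf

inputs = tf.keras.Input(shape=(10,))                                # 10 input parameters
encoded = tf.keras.layers.Dense(5, activation='relu')(inputs)       # encoder: 10 -> 5
decoded = tf.keras.layers.Dense(10, activation='linear')(encoded)   # decoder: 5 -> 10
autoencoder = tf.keras.Model(inputs, decoded)
autoencoder.compile(loss='mse', optimizer='adam')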


Interpolation Methods

Interpolation is the process of estimating the value of a function between two known data points. For example, you are given x = [1, 3, 5, 7, 9] and y = [230.02, 321.01, 305.00, 245.75, 345.62], and based on the given data you want to know the value of y at x = 4. There are plenty of interpolation methods available in the literature: some are model-based and some are model-free, i.e. data-driven. The most common way of achieving interpolation is through data fitting; as an example, you can use linear regression analysis to fit a linear model to the given data.
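For instance, a simple model-free linear interpolation of this example can be done directly with NumPy:

import numpy as np

x = np.array([1, 3, 5, 7, 9])
y = np.array([230.02, 321.01, 305.00, 245.75, 345.62])
print(np.interp(4, x, y))   # 313.005, halfway between y(3) and y(5)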

In linear regression, given the explanatory/predictor variable X and the response variable Y, the data is fitted using the formula Y = β0 + β1X, where β0 and β1 are determined by a least-squares fit. As the name suggests, linear regression is linear: it fits a straight line even though the relationship between the predictor and the response variable might be non-linear.
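Continuing the same example, β0 and β1 can be obtained with a degree-1 least-squares fit:

import numpy as np

x = np.array([1, 3, 5, 7, 9])
y = np.array([230.02, 321.01, 305.00, 245.75, 345.62])
beta1, beta0 = np.polyfit(x, y, deg=1)   # polyfit returns the highest degree first
print(beta0 + beta1*4)                   # linear-model prediction at x = 4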

However, the most general form of interpolation is polynomial fitting. Given k sample points, it is straightforward to fit a polynomial of degree k − 1. Given the data set {x_i, y_i}, the polynomial fit is obtained by determining the coefficients a_i of the function

f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{k-1} x^{k-1}

by solving the matrix inversion in the following expression:

\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix} =
\begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{k-1} \\
1 & x_2 & x_2^2 & \cdots & x_2^{k-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_k & x_k^2 & \cdots & x_k^{k-1}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{k-1} \end{bmatrix}

Once we have the coefficients a_i, we can find the value of the function f for any x.
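In code, this amounts to constructing the Vandermonde matrix and solving the resulting linear system. A minimal sketch using NumPy and the example data from earlier:

import numpy as np

x = np.array([1, 3, 5, 7, 9], dtype=float)
y = np.array([230.02, 321.01, 305.00, 245.75, 345.62])
V = np.vander(x, increasing=True)   # columns are 1, x, x^2, ..., x^(k-1)
a = np.linalg.solve(V, y)           # polynomial coefficients a_0 ... a_(k-1)
print(np.polyval(a[::-1], 4.0))     # evaluate f(4); polyval expects highest degree first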

There are some specific cases of polynomial fitting where a piecewise cubic polynomial is fitted to the data. A few other non-parametric methods include cubic splines, smoothing splines, regression splines, kernel regression, and density estimation.
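As a quick illustration of one of these, here is a cubic-spline interpolation of the same example data using SciPy; the spline also provides smooth derivatives:

import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([1, 3, 5, 7, 9], dtype=float)
y = np.array([230.02, 321.01, 305.00, 245.75, 345.62])
cs = CubicSpline(x, y)   # piecewise cubic polynomial through all points
print(cs(4.0))           # interpolated value at x = 4
print(cs(4.0, 1))        # first derivative of the spline at x = 4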

Picture from the author’s coursework

However, the point of this article is not polynomial fitting but interpolation; polynomial fitting just happens to facilitate interpolation. There is an issue with polynomial fitting methods, though: whether parametric or non-parametric, they behave the way they are taught. That is, if the data is clean, the fit will be clean and smooth, but if the data is noisy, the fit will be noisy. This issue is especially prevalent in sensor data, for example, heart-beat data captured from your heart-rate sensor, distance data from a LiDAR, CAN bus speed data from your car, GPS data, and so on.


Further, because of the noise, such data are harder to deal with, especially if your algorithm requires taking the first or second derivative of the data. In general, these sensor data are timeseries data, i.e. they are collected over time: the response variable is some physical quantity such as speed, the distance of objects from a LiDAR mounted on top of a self-driving car, or heart rate, and the predictor variable is time. When operating on such data, there can be several objectives. I want data interpolated to time-stamps at which my sensor could not record a response; but since sensors operate in the real world and are subject to the underlying physics, the data remain noisy, so I also want a reliable interpolation that is not affected by sensor noise. Further, I may also need derivatives of such timeseries data, and derivatives tend to amplify the noise present in the underlying timeseries. What if there were a way to obtain an underlying representation of the data while discarding the noise at the same time? In such a case, the autoencoder comes to the rescue.
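To see why derivatives are so troublesome, here is a small synthetic illustration (my own toy example, not the sensor data used below): differentiating a slightly noisy signal by finite differences amplifies the noise dramatically.

import numpy as np

t = np.linspace(0, 1, 500)
clean = np.sin(2*np.pi*t)
noisy = clean + np.random.normal(0, 0.05, t.shape)   # small, sensor-like noise

d_clean = np.diff(clean)/np.diff(t)   # finite-difference derivative
d_noisy = np.diff(noisy)/np.diff(t)
print(np.std(d_noisy - d_clean))      # around 35, i.e. the 0.05 noise is amplified ~700x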


Autoencoder as Interpolator

To demonstrate the denoising-plus-interpolation objective using an autoencoder, I use an example of distance data collected from a vehicle by my lab, where the response variable is the distance to the vehicle ahead of mine and the predictor is time. I have made a small subset of the data available on my GitHub repo as a part of the demonstration, which you are free to use. However, it is really small and serves no purpose beyond the tutorial described in this article.

rahulbhadani/medium.com

Okay, it is time to code now.

Note: Before you use the data, I should point out that the time (predictor) and message (response) must be re-scaled. In my case, the original time starts at 1594247088.289515 (in POSIX format, in seconds) and ends at 1594247110.290019. I normalized my time values using the formula (time - start_time)/(end_time - start_time). Similarly, the response variable was normalized using (message - message_min)/(message_max - message_min). The sample data provided in my GitHub repo is already normalized, and you can reuse it out of the box.
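For completeness, this min-max scaling is only a few lines of pandas. The file name lead_dist_raw.csv below is hypothetical, standing in for an un-normalized recording:

import pandas as pd

df_raw = pd.read_csv("lead_dist_raw.csv")   # hypothetical un-normalized file
time_scaled = (df_raw['Time'] - df_raw['Time'].min()) / (df_raw['Time'].max() - df_raw['Time'].min())
message_scaled = (df_raw['Message'] - df_raw['Message'].min()) / (df_raw['Message'].max() - df_raw['Message'].min())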

Training

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

# Load the normalized sample data
df = pd.read_csv("../data/lead_dist_sample.csv")
time = df['Time'].values
message = df['Message'].values

# A fully connected network mapping time -> message with a
# 1-128-64-32-64-128-1 bottleneck architecture
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(units=1, activation='linear', input_shape=[1]))
model.add(tf.keras.layers.Dense(units=128, activation='relu'))
model.add(tf.keras.layers.Dense(units=64, activation='relu'))
model.add(tf.keras.layers.Dense(units=32, activation='relu'))
model.add(tf.keras.layers.Dense(units=64, activation='relu'))
model.add(tf.keras.layers.Dense(units=128, activation='relu'))
model.add(tf.keras.layers.Dense(units=1, activation='linear'))
model.compile(loss='mse', optimizer='adam')
model.summary()

# Training
model.fit(time, message, epochs=1000, verbose=True)

As you can see, I have not performed any regularization, since I deliberately want to overfit so that the model captures the underlying nature of the data to the full extent. Now it’s time to make a prediction. You will see that I rescale the time axis back to the original values before making predictions. For this example, the original time spans `t_start = 1594247088.289515` to `t_end = 1594247110.290019` (POSIX seconds), and the message spans `msg_min = 33` to `msg_max = 112`:

# Constants recovered from the original (un-normalized) recording
t_start, t_end = 1594247088.289515, 1594247110.290019
msg_min, msg_max = 33, 112

newtimepoints_scaled = np.linspace(time[0] - (time[1] - time[0]), time[-1], 10000)
y_predicted_scaled = model.predict(newtimepoints_scaled)
newtimepoints = newtimepoints_scaled*(t_end - t_start) + t_start    # undo the time scaling
y_predicted = y_predicted_scaled*(msg_max - msg_min) + msg_min      # undo the message scaling

Note that I am creating a much denser set of time points in the variable newtimepoints_scaled, which allows me to interpolate the data at unseen time points. Finally, here is the resulting curve:

# Display the result
import matplotlib.pylab as pylab
params = {'legend.fontsize': 'x-large',
          'figure.figsize': (15, 5),
          'axes.labelsize': 'x-large',
          'axes.titlesize': 'x-large',
          'xtick.labelsize': 'x-large',
          'ytick.labelsize': 'x-large'}
pylab.rcParams.update(params)
plt.scatter(time*(t_end - t_start) + t_start, message*(msg_max - msg_min) + msg_min, label='Original Data')
plt.scatter(newtimepoints, y_predicted, c = 'red', s = 1, label = 'Interpolated Data')
plt.xlabel('Time')
plt.ylabel('Message')
plt.legend()
plt.show()
Image by the author: Interpolated Data and Original Data

Concluding Remarks

While I trained for only 1,000 epochs, your training might not be that short if your data is big. The biggest advantage of this method appears when taking derivatives: as the following plot shows, the derivative computed on the original data is poor and may not even represent the true derivative!

df_interpolation = pd.DataFrame()
df_interpolation['Time'] = newtimepoints
df_interpolation['Message'] = y_predicted.flatten()   # predict() returns an (N, 1) array
df_interpolation['diff'] = df_interpolation['Message'].diff()/df_interpolation['Time'].diff()

df_original = pd.DataFrame()
df_original['Time'] = time*(t_end - t_start) + t_start
df_original['Message'] = message*(msg_max - msg_min) + msg_min
df_original['diff'] = df_original['Message'].diff()/df_original['Time'].diff()
# Display the result
import matplotlib.pylab as pylab
params = {'legend.fontsize': 'x-large',
          'figure.figsize': (15, 5),
          'axes.labelsize': 'x-large',
          'axes.titlesize': 'x-large',
          'xtick.labelsize': 'x-large',
          'ytick.labelsize': 'x-large'}
pylab.rcParams.update(params)
plt.scatter(df_original['Time'], df_original['diff'], label='Derivative on Original Data')
plt.scatter(df_interpolation['Time'], df_interpolation['diff'], s= 10, c = 'red', label='Derivative on Interpolated Data')
plt.xlabel('Time')
plt.ylabel('d(Message)/d(Time)')
plt.legend()
plt.show()
Image by the author: A comparison of calculating derivative on original and interpolated data

The only downside of this method is its time complexity: depending on the number of data points, it may take hours for training to complete. However, if you have access to high-performance computing clusters, Amazon EC2, or the like, training your autoencoder may not take too much time.

The notebook to reproduce this tutorial can be found on my GitHub at

https://github.com/rahulbhadani/medium.com/blob/master/01_02_2021/AutoEncoder_Interpolation_TF2.ipynb.

A longer version of this article is posted on arXiv.org.

If this article benefits you, please use the following citations for referencing my work:

Rahul Bhadani. Autoencoder for interpolation. arXiv preprint arXiv:2101.00853, 2021.

or

@article{bhadani2021autoencoder,
  title={AutoEncoder for Interpolation},
  author={Rahul Bhadani},
  year={2021},
  eprint={2101.00853},
  archivePrefix={arXiv},
  primaryClass={stat.ML},
  journal={arXiv preprint arXiv:2101.00853}
}

If you like this article, you may want to learn more about how to use TensorFlow 2. Check out some of my other articles on TensorFlow 2:

Tensorflow 2: Model validation, regularization, and callbacks

Lambda Layer in tf.keras

