Spatial Transformer Networks Tutorial, Part 2 — Bilinear Interpolation

A Self-Contained Introduction

Thomas Kurbiel
Towards Data Science

--

Spatial Transformer modules are a popular way to increase spatial invariance of a model against spatial transformations such as translation, scaling, rotation, cropping, as well as non-rigid deformations. They can be inserted into existing convolutional architectures: either immediately following the input or in deeper layers. They achieve spatial invariance by adaptively transforming their input to a canonical, expected pose, thus leading to a better classification performance. The word adaptive indicates, that for each sample an appropriate transformation is produced, conditional on the input itself. Spatial transformers networks can be trained end-to-end using standard backpropagation.

Spatial transformer module transforms inputs to a canonical pose, thus simplifying recognition in the following layers (Image by author)

In this tutorial, we are going to cover all prerequisites needed for gaining a deep understanding of spatial transformers. In the last post, we have introduced the concepts of forward and reverse mapping. In this post we will delve into the details of bilinear interpolation. In the next post, we will introduce all building blocks a spatial transformer module is made of. Finally, in the fourth and last post, we will derive all backpropagation equations from scratch.

Linear Interpolation

As a warm-up we will start with the simple one-dimensional case. Here we are dealing with a sequence of data points that lie on an equally spaced grid:

Sequence of data points (Image by author)

Since in this tutorial, we will mostly deal with image data, we can safely assume the space between two consecutive data points to be one without loss of generality.

Please note, that the discrete sequence in the diagram above is defined only for integer positions and undefined everywhere else. However, oftentimes we require values at non-integer positions, such as 2.7 in the above example. This is accomplished by interpolation techniques, which estimate the unknown data values from known data values. In the following, we will refer to undefined points, which we want to estimate using an interpolation technique, as sample points and denote them with letter 𝑥.

In linear interpolation we simply fit a straight line to only (!) two neighboring points of 𝑥 and then look up the desired value:

Linear interpolation (Image by author)

We find the neighboring points of 𝑥 by taking the floor and ceil operations. Remember: floor() rounds 𝑥 to the nearest integer below its value, whereas ceil() rounds it to the nearest integer above its value.

Since both neighboring points lie on the grid, we known their 𝑈 values (shown above to the right of the arrow). Furthermore, for two consecutive points on the grid we have that:

Next, we must fit the following straight line to the two neighboring points:

where 𝑚 is the slope and 𝑏 is the intersect. We obtain 𝑏 by plugging the left neighboring point into the equation:

The slope 𝑚 is obtained by plugging the right neighboring point into the equation:

Our preliminary formula hence is:

To gain more intuition, let us expand the above product and rearrange:

Let us denote the distance from the sampling point 𝑥 to its right neighboring point as:

which we can also rewrite as:

Thus, we arrive at our final formula:

See how linear interpolation can be interpreted in terms of the values of neighboring points weighted by opposite distances:

Linear interpolation intuition (Image by author)

Bilinear Interpolation

Next, we will extend our foray to the two-dimensional case. Here we are dealing with a set of data points that lie on an equally spaced two-dimensional grid:

Data points on an equidistant 2-D grid (Image by author)

As in the 1-dimensional case, values at non-integer positions are undefined and must be estimated using interpolation techniques.

In bilinear interpolation we fit a plane to the closest four (!) neighboring points surrounding the sample point (𝑦, 𝑥) and then look up the desired value:

Bilinear interpolation (Image by author)

Luckily, to obtain the equation of the plane, we don’t need to solve a complicated system of linear equations, as done in the last section. Instead we will repeatedly apply linear interpolation (1-D) to two points each. We start with the upper two points and apply linear interpolation in horizontal direction. Next, we do the same for the lower two points. Finally, to obtain the desired value, we apply linear interpolation in vertical direction to the just obtained intermediate points, see animation below:

Extending linear interpolation to bilinear interpolation (Image by author)

Linear interpolation in 𝑥-dimension:

Linear interpolation in 𝑦-dimension:

Combined, we get:

Due to symmetry, the order of the above two steps is irrelevant, i.e. carrying out the linear interpolation first in vertical direction and subsequently in horizontal direction, yields exactly the same formula.

Bilinear interpolation has also a nice geometric visualization: to obtain the value of the desired point (blue), we have to sum up the products of the value at each corner and the partial area diagonally opposite the corner:

Bilinear interpolation intuition (Image by author)

There are two other important interpolation techniques called: nearest-neighbor interpolation and bicubic interpolation. The nearest-neighbor interpolation simply takes the nearest pixel to the sample point and copies it to the output location. Bicubic interpolation considers 16 neighbors of the sample point and fits a higher order polynomial to obtain the estimate.

In the upcoming parts of the tutorial we will explain all concepts of spatial transformer networks using bilinear interpolation. The concepts can easily be extended to more advanced interpolation techniques.

References

Original Paper
Linear Interpolation
Bilinear Interpolation

--

--