Data Science Deciphered: What is a Spline?

Katie House
Towards Data Science
4 min read · Aug 10, 2018


In a meeting at the Reich Lab this week, I had a realization. We were not using normal-people talk. It’s funny how mathematical terms are so enigmatic, when in fact, they (usually) refer to something not-so-complicated!

This week, the term was “fitting a basis spline,” which at first threw me for a loop. Let’s dissect this for a second.

** Disclaimer: before you read any further, there is no math in this article. I will try to investigate splines so that they make sense in simpler terms. If you are looking for a proven mathematical definition of splines, check out this chapter.**

Fitting Makes Sense

The word “fitting” isn’t too scary. We use this word all the time! A dress fitting, fitting in the elevator, fitting into old clothes — to name a few… Surprisingly, the mathematical term “fitting” means almost the same thing.

Let’s say you are buying some new cycling shoes. How do you know you have a “perfect fit”? If the shoe is too loose, your feet will wiggle around. If the shoe is too tight, you’re in for a surprise bunion. A perfect fit is hard to come by. You probably have to compromise in some way and find a shoe somewhere between too tight and too loose.

Now imagine a bunch of data points on a graph. Let’s say you wanted to find a line that fits those points. “Fitting a curve” basically means finding a curve that is as close to the data as you want it to be. This is a lot like finding a perfect shoe. Your data points (your feet) are usually unique, and the line you fit (the shoe) will most likely not look exactly like the data.

Check out this line fitting a bunch of data:

Notice how the final curve doesn’t go through every data point. Instead, it finds a good middle ground. This middle ground is determined by whatever mathematical function or algorithm you choose for the fit. To sound fancy, you can say that today you explored data fit to a curve by a Gauss–Newton algorithm with a variable damping factor (the gif above). Or you can say you found a new pair of cycling shoes.
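If you want to see a fit in action, here is a minimal sketch in Python. It is not the Gauss–Newton fit from the gif. It swaps in the simplest fit I know, an ordinary least-squares straight line, using NumPy's polyfit. The data points are made up purely for illustration.

```python
import numpy as np

# Made-up data for illustration: points scattered around the line y = 2x + 1.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=x.size)

# Ordinary least squares: pick the slope and intercept that make the
# squared vertical gaps between the line and the points as small as possible.
slope, intercept = np.polyfit(x, y, deg=1)

fitted = slope * x + intercept
residuals = y - fitted  # how far each point sits above or below the line

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"largest miss = {np.abs(residuals).max():.2f}")
```

The "largest miss" is not zero: the line misses individual points on purpose, which is the shoe compromise from earlier.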

A Spline is…

What is it… a bunch of cactus spines?

Unfortunately, not. :(

The Wikipedia definition, “a function defined piecewise by polynomials,” doesn’t help much either. This definition assumes you know what a piecewise polynomial function is, which may be a good subject for the next Data Science Deciphered article (yes, I am hoping to do more!).

Let’s put it in simpler terms. Have you ever used the line tool on PowerPoint? If not, this pretty much sums it up:

Turns out, PowerPoint knows how to make splines! Splines join curves together to make one continuous, irregular curve. When using this tool, each click creates a new piece of the line, or a segment. Each click also creates what’s called a control point, a point that helps determine the shape of the curve.

And that’s the gist of a spline. Splines create smooth curves out of irregular data points. Cool, right?!
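For the curious, here is a tiny sketch of what “defined piecewise by polynomials” can look like in Python. The two pieces and the joining point at x = 1 are my own made-up example, chosen so the pieces meet without a jump or a sharp corner.

```python
def spline(x):
    """A tiny quadratic spline: two polynomial pieces joined at x = 1."""
    if x < 1:
        return x ** 2             # first piece
    return -(x - 2) ** 2 + 2      # second piece


def slope(x):
    """Slope of each piece, worked out by hand from the formulas above."""
    if x < 1:
        return 2 * x
    return -2 * (x - 2)


eps = 1e-9
# The values just left and right of the joining point agree, so there is no jump.
print(spline(1 - eps), spline(1 + eps))
# The slopes agree too, so there is no sharp corner either.
print(slope(1 - eps), slope(1 + eps))
```

Each piece is an ordinary polynomial. The spline part is the promise that the pieces line up where they meet.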

Okay so Basis Splines

Basis splines are just one type of spline. Splines can look different depending on which type you use. This makes sense: there are many different ways to draw a curve through data points.

Take this image for example:

source

Each colored line is a different type of spline. The red line is a type of basis spline!
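Here is a rough sketch of one way a basis spline curve can be built, assuming the standard Cox-de Boor recursion. Every point on the curve is a weighted blend of the control points, and the weights come from the “basis” functions. The control points and knot values below are hypothetical, picked only to keep the example small.

```python
def basis(i, degree, t, knots):
    """Value at t of the i-th B-spline basis function (Cox-de Boor recursion)."""
    if degree == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    total = 0.0
    left_width = knots[i + degree] - knots[i]
    if left_width > 0:
        total += (t - knots[i]) / left_width * basis(i, degree - 1, t, knots)
    right_width = knots[i + degree + 1] - knots[i + 1]
    if right_width > 0:
        total += ((knots[i + degree + 1] - t) / right_width
                  * basis(i + 1, degree - 1, t, knots))
    return total


# Hypothetical control points, like six clicks with a drawing tool.
control_points = [(0, 0), (1, 2), (2, -1), (3, 3), (4, 0), (5, 2)]
degree = 3
# "Clamped" knots: repeating the end values makes the curve start at the first click.
knots = [0, 0, 0, 0, 1, 2, 3, 3, 3, 3]


def weights(t):
    """How much each control point contributes at t (valid for 0 <= t < 3)."""
    return [basis(i, degree, t, knots) for i in range(len(control_points))]


def curve_point(t):
    """Blend the control points using the basis-function weights."""
    w = weights(t)
    x = sum(wi * px for wi, (px, _) in zip(w, control_points))
    y = sum(wi * py for wi, (_, py) in zip(w, control_points))
    return x, y


for t in (0.0, 0.75, 1.5, 2.25):
    print(t, curve_point(t), round(sum(weights(t)), 6))
```

At every t the weights add up to 1, so the curve is always a kind of weighted average of the clicks rather than something that has to pass through each one.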

Why fit a curve with basis spline?

This question goes a little beyond the scope of this article. Essentially, these types of curves are helpful when you’d like to fit a smooth curve to a bunch of points but are unsure of the underlying structure of the data.

Data is unpredictable sometimes:

And basis splines help make sense of it!
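As a rough illustration, here is a sketch of fitting a basis spline to wiggly, noisy data with SciPy's make_lsq_spline. The data is made up (a sine wave plus random noise), and the number of knots is an arbitrary choice of mine, not a recommendation.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

# Made-up data for illustration: a hidden wiggly pattern plus random noise.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 80)
true_y = np.sin(x)
y = true_y + rng.normal(scale=0.25, size=x.size)

degree = 3  # cubic pieces
# Six evenly spaced interior knots (an arbitrary choice), with the end
# values repeated degree + 1 times so the spline covers the whole x range.
interior_knots = np.linspace(x[0], x[-1], 8)[1:-1]
knots = np.concatenate((
    [x[0]] * (degree + 1),
    interior_knots,
    [x[-1]] * (degree + 1),
))

# Least-squares fit: the spline is never told the data came from a sine wave.
spline = make_lsq_spline(x, y, knots, k=degree)
smooth = spline(x)

noise_error = np.mean((y - true_y) ** 2)
fit_error = np.mean((smooth - true_y) ** 2)
print(f"noisy data vs. hidden pattern: {noise_error:.4f}")
print(f"spline fit vs. hidden pattern: {fit_error:.4f}")
```

In this sketch the fitted curve ends up closer to the hidden pattern than the raw points are, without anyone choosing "sine wave" as the model up front.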


Data scientist interested in making data science straightforward and drinking lots of coffee.