The t-distribution is a continuous probability distribution that is very similar to the normal distribution, but it has the following key differences:
- Heavier tails: More of its probability mass is located at the extremes (higher kurtosis). This means that it is more likely to produce values far from its mean.
- One parameter: The t-distribution has only one parameter, the degrees of freedom, as it's used when we don't know the population's variance.
An interesting fact about the t-distribution is that it is sometimes referred to as the "Student's t-distribution." Its inventor, the English statistician William Sealy Gosset, published it under the pseudonym "Student" to keep his identity anonymous, hence the name "Student's t-distribution."
Theory & Definition
Let’s go over some theory behind the distribution to build some mathematical intuition.
Origin
The t-distribution originates from the problem of modelling normally distributed data without knowing the population variance of that data.
For example, say we sample n data points from a normal distribution. The sample mean and sample variance are, respectively:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
Where:
- x̄ is the sample mean.
- s is the sample standard deviation.
Combining the above two equations, we can construct the following random variable:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
Here μ is the population mean, and t is the t-statistic, which follows the t-distribution!
See here for a more thorough derivation.
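To build some intuition for this, here is a quick simulation sketch (the parameter values below are made up for illustration): we repeatedly draw samples of size n from a normal distribution, compute the t-statistic for each sample, and check that its empirical variance matches that of a t-distribution with n−1 degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
mu, sigma, n, reps = 5.0, 2.0, 10, 100_000

# Draw many samples of size n and compute the t-statistic for each one
samples = rng.normal(mu, sigma, size=(reps, n))
x_bar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)  # sample standard deviation (n-1 denominator)
t_stats = (x_bar - mu) / (s / np.sqrt(n))

# These t-statistics should follow a t-distribution with n-1 degrees of freedom
df = n - 1
print(f"empirical variance:   {t_stats.var(ddof=1):.4f}")
print(f"theoretical variance: {stats.t(df).var():.4f}")  # df / (df - 2)
```

Note that the empirical variance comes out above 1, as expected from the heavier tails, and agrees with the theoretical value ν/(ν−2).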
Probability Density Function
As declared above, the t-distribution is parameterised by only one value, the degrees of freedom, ν, and its probability density function looks like this:

$$f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu \pi}\, \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$$
Where:
- t is the random variable (the t-statistic).
- ν is the degrees of freedom, which is equal to n−1, where n is the sample size.
- Γ(z) is the gamma function, which is:

$$\Gamma(z) = \int_0^\infty u^{z-1} e^{-u}\, du$$
Don’t worry too much about this scary maths (I certainly don’t!), but the key things to know are:
- The PDF is symmetric and is overall bell-shaped.
- It closely resembles the standard normal distribution, with a mean of 0, except that it is a bit shallower and wider (its variance is greater than 1).
- As ν increases, the t-distribution approaches the standard normal distribution.
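As a quick numerical check of this convergence (a small sketch using SciPy), we can measure the largest pointwise gap between the t PDF and the standard normal PDF for increasing degrees of freedom:

```python
import numpy as np
from scipy.stats import t, norm

x = np.linspace(-5, 5, 1001)

# Maximum pointwise gap between the t PDF and the standard normal PDF
gaps = {}
for df in [1, 5, 30, 100]:
    gaps[df] = np.max(np.abs(t.pdf(x, df) - norm.pdf(x)))
    print(f"df={df:>3}: max gap = {gaps[df]:.4f}")
```

The gap shrinks steadily as ν grows, and by ν = 30 the two PDFs differ by well under 0.01 everywhere.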
Characteristics
- The mean is defined as follows for ν > 1:

$$\mathbb{E}[t] = 0$$
- And the variance is defined as follows for ν > 2:

$$\mathrm{Var}[t] = \frac{\nu}{\nu - 2}$$
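We can sanity-check these moments against SciPy's built-in values (a small sketch over a few arbitrary choices of ν):

```python
import numpy as np
from scipy.stats import t

for df in [3, 10, 30]:
    mean, var = t.stats(df, moments='mv')
    print(f"df={df}: mean={float(mean)}, variance={float(var):.4f}, "
          f"formula nu/(nu-2) = {df / (df - 2):.4f}")
```

Note that the variance is always greater than 1 but tends towards 1 as ν grows, consistent with the convergence to the standard normal.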
Example Plots
Below is an example plot of the t-distribution as a function of various degrees of freedom and also compared to the standard normal distribution:
```python
# Import packages
import numpy as np
from scipy.stats import t, norm
import plotly.graph_objects as go

# Generate data
x = np.linspace(-5, 5, 1000)
normal_pdf = norm.pdf(x, 0, 1)

# Create plot
fig = go.Figure()

# Add standard normal distribution to plot
fig.add_trace(go.Scatter(x=x, y=normal_pdf, mode='lines', name='Standard Normal'))

# Add t-distributions to plot for various degrees of freedom
for df in [1, 5, 10, 20]:
    t_pdf = t.pdf(x, df)
    fig.add_trace(go.Scatter(x=x, y=t_pdf, mode='lines', name=f't-distribution (df={df})'))

fig.update_layout(title='Comparison of Normal and t-distributions',
                  xaxis_title='Value',
                  yaxis_title='PDF',
                  legend_title='Distribution',
                  font=dict(size=16),
                  title_x=0.5,
                  width=900,
                  height=500,
                  template="simple_white")
fig.show()
```

Notice that as the degrees of freedom, df, get larger, the t-distribution becomes more and more similar to the normal distribution. At around df=30, the two distributions are usually considered sufficiently similar.
Applications
The following are the most common applications of the t-distribution in data science and machine learning:
- T-test: The most famous application of the t-distribution is hypothesis testing via the t-test, which measures the statistical difference between two sample means. You can check my previous blog about it here:
- Confidence intervals: For small sample sizes (typically less than 30), it is used to compute confidence intervals for a statistic, accounting for the increased uncertainty of small samples. You can read more about confidence intervals here:
- Regression: The t-distribution is used to determine if we should add certain covariates to our regression model and calculate hypothesis tests around the significance of their coefficients.
- Bayesian Statistics: The t-distribution is sometimes used as a prior distribution in Bayesian inference, which can be applied in all areas of data science, particularly reinforcement learning. See here for more info:
- Quantitative Finance: In finance, asset and derivative returns often exhibit excess kurtosis, so they are modelled with the heavy-tailed t-distribution. This is very useful for data scientists in the finance space.
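To make the confidence-interval application concrete, here is a minimal sketch using `scipy.stats.t.interval` (the sample values below are made up):

```python
import numpy as np
from scipy import stats

# Small made-up sample (n < 30, population variance unknown)
data = np.array([4.2, 5.1, 3.8, 4.9, 5.3, 4.4, 4.0, 5.6, 4.7, 4.5])
n = len(data)
x_bar = data.mean()
se = data.std(ddof=1) / np.sqrt(n)  # standard error of the mean

# 95% confidence interval using the t-distribution with n-1 degrees of freedom
low, high = stats.t.interval(0.95, df=n - 1, loc=x_bar, scale=se)
print(f"mean = {x_bar:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Because the t critical value exceeds the normal one at small n, this interval is slightly wider than a z-based interval would be, reflecting the extra uncertainty from estimating the variance.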
Summary & Further Thoughts
The t-distribution is a useful statistical distribution that is very similar to the normal distribution but with heavier tails. This makes it an important tool in situations where the population variance is unknown. It is parametrised by a single parameter, the degrees of freedom; as this increases, the t-distribution tends towards the normal distribution. It has many applications within data science, spanning hypothesis testing with the t-test, constructing confidence intervals for small datasets, and assessing coefficient significance in regression modelling.
The code used in this article is available on my GitHub here:
Medium-Articles/Statistics/Distributions/t_dist.py at main · egorhowell/Medium-Articles
References & Further Reading
- Some more info on the t-distribution.
- Thorough mathematical derivation of the distribution.
- And some more maths behind the distribution.
Another Thing!
I have a free newsletter, Dishing the Data, where I share weekly tips for becoming a better Data Scientist. There is no "fluff" or "clickbait," just pure actionable insights from a practicing Data Scientist.