Hypothesis Testing and Climate Change

Learning hypothesis testing with an example of climate change

Published in Towards Data Science · 7 min read · Jul 4, 2022

Hypothesis testing is one of the fundamental topics in machine learning, and beyond. If you have a job interview coming up, it is certainly a topic you need to be prepared for. But why is it so important?

Why Hypothesis Testing?

Suppose you are working on solving a classification problem, and assume that you are only interested in the accuracy of the result. You start by very quickly implementing a simple model A that returns a certain accuracy acc(A) on your test data. After hard work and study, you manage to implement a second model B that is much more complicated than the first one and has an accuracy acc(B).

In the final stage of your project, the model selection phase, you compare the two accuracies and notice that acc(B) > acc(A). Does this mean that model B is better than A? Well, it’s not that easy… what if B just got lucky? Are you sure that, if you reran the two models on a different train-test split of your dataset, B would always perform better than A?
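To get a feel for how much luck can matter, here is a small simulation sketch. It is only an illustration under strong assumptions: the two "models" are not real classifiers but coin flips with a fixed true accuracy, their errors are treated as independent, and the names `simulated_accuracy` and `fraction_b_beats_a` as well as the numbers (80% accuracy, 200 test examples, 1000 splits) are made up for the example.

```python
import random

def simulated_accuracy(true_accuracy, n_test, rng):
    """Measured accuracy of a hypothetical model on one random test split."""
    correct = sum(rng.random() < true_accuracy for _ in range(n_test))
    return correct / n_test

def fraction_b_beats_a(acc_a, acc_b, n_test, n_splits, seed=0):
    """Share of simulated splits on which B scores strictly higher than A."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_splits):
        score_a = simulated_accuracy(acc_a, n_test, rng)
        score_b = simulated_accuracy(acc_b, n_test, rng)
        if score_b > score_a:
            wins += 1
    return wins / n_splits

# Both models have the SAME true accuracy, so every "win" for B is pure luck.
share = fraction_b_beats_a(acc_a=0.80, acc_b=0.80, n_test=200, n_splits=1000)
print(f"B beats A on {share:.0%} of the simulated splits")
```

Even though neither model is better, B comes out ahead on a large share of the splits, so a single comparison acc(B) > acc(A) tells you very little by itself.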

Thankfully, statistics provides us with tools to say objectively whether one model performs better than another, accounting for the chance events (such as a lucky split) that may occur. This is why we need to study hypothesis testing.

General Idea

Imagine you are a researcher who is investigating the daily temperature variation of the city of Rome in Italy. The daily temperature variation is the difference between the maximum and minimum temperatures measured in a day.

The scientific community believes that on average the daily temperature range is 13°. You, on the other hand, based on your studies, believe that this average is no longer 13° but is larger.
Let’s therefore state the two hypotheses. The null hypothesis is the one currently taken as true: the average range is 13°. Your alternative hypothesis is that the average range is larger than 13°.
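Written compactly, with µ standing for the true average daily temperature range and the usual (assumed) labels H0 and H1 for the null and alternative hypotheses:

```latex
\[
H_0 : \mu = 13^\circ
\qquad \text{versus} \qquad
H_1 : \mu > 13^\circ
\]
```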

But now how do you convince yourself and convince others that your hypothesis is correct and make your new idea prevail?

Test Statistic

Notice carefully that your goal is to reject the null hypothesis. If you do not manage to do so, we say that you fail to reject the null hypothesis; we do not say that you have proven it.

Well, the easiest thing to do is to go and measure the temperature range in Rome for several days and calculate the average. You decide to measure the daily temperature range over the next 60 days, so you collect a sample of data. You average the data for these 60 days and the result is µ = 13.7°. What do you think? That is barely above 13°, so maybe the null hypothesis is correct and your studies are wrong.

Then you take another 60-day sample and you obtain µ = 16.3°. Now you begin to believe instead that your alternative hypothesis is actually correct, although you are not sure.

However, in a final 60-day sample you measure µ = 24.3° instead. Now you are pretty much sure you are right and can finally reject the null hypothesis.

But these considerations are intuitive and subjective. Someone might say that even µ = 24.3° is not enough to reject the null hypothesis: perhaps there was just a lot of traffic on those days and the measured temperatures were higher. We therefore want to make such statements more objectively using statistical tests, and to reject or fail to reject the null hypothesis with a certain confidence level C (usually C = 95%).

The table below shows the measurements on your data sample taken over the 60 days:

The statistic we are measuring here is the mean of the values, and we want to understand whether it is actually different from the mean accepted by the scientific community: µ = 13°.

We can use a Z-test in our case to see if the average calculated on our sample is significantly different from 13° or if it is just a chance event.

For now, take it for granted that the z-test is the appropriate test to use in this case among the many that exist; in the next articles we will see how to choose the appropriate test for each situation. The formula for calculating the z-test is the following:
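In its standard textbook form, the one-sample z statistic can be written as below. The symbols are the usual ones and are assumptions about notation rather than something fixed by this article: x̄ is the average of your sample (the 13.7°, 16.3° or 24.3° above), µ0 = 13° is the mean under the null hypothesis, σ is the standard deviation of the daily range in the population (the z-test assumes it is known), and n = 60 is the number of days measured.

```latex
\[
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
\]
```

The denominator σ/√n is the standard error of the sample mean, so z simply counts how many standard errors your sample average lies above (or below) 13°.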

This test returns a number (z) that summarises the various values measured during the 60 days of hard work.

Confidence Interval

Now the goal is to use this z-value to see whether or not we can reject the null hypothesis. To do this we will use a right-tailed test, because our alternative hypothesis is µ > 13°. If it were µ < 13° we would use a left-tailed test, and for µ ≠ 13° a two-tailed test.

If the null hypothesis is true, the z-values we can obtain follow a Gaussian distribution centred at zero (with unit standard deviation), so we can calculate the probability of obtaining a z-value at least as extreme as ours. This probability corresponds to an area under the curve, which we call the p-value.

We can calculate the p-value of a z-score using the tables provided in all statistics books, such as this one. (Be careful to read the table properly, depending on whether you are using a left-tailed, right-tailed, or two-tailed test.)
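If you would rather not look the value up in a table, the same areas can be computed in a few lines. This is a minimal sketch using `NormalDist` from Python's standard `statistics` module (Python 3.8+); the helper name `p_value` and the example z-score of 1.81 are made up for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: the distribution of z under the null hypothesis.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def p_value(z, tail="right"):
    """Area under the standard normal curve for the chosen kind of test."""
    if tail == "right":   # alternative hypothesis: mean is larger
        return 1.0 - STANDARD_NORMAL.cdf(z)
    if tail == "left":    # alternative hypothesis: mean is smaller
        return STANDARD_NORMAL.cdf(z)
    if tail == "two":     # alternative hypothesis: mean is different
        return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))
    raise ValueError("tail must be 'right', 'left' or 'two'")

z = 1.81  # hypothetical z-score, not taken from the article's data
for tail in ("right", "left", "two"):
    print(f"{tail}-tailed p-value: {p_value(z, tail):.4f}")
```

Note how the same z-score gives three very different p-values, which is exactly why you need to pick the tail that matches your alternative hypothesis.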

We want our z-value to be far from the centre of the distribution, since the centre is where z-values tend to fall when the null hypothesis is true.
Keep in mind that we would like to reject the null hypothesis with a certain confidence level C of say 95%, or with a significance level α = 1-C = 0.05.

Now we can follow a simple rule to reject or fail to reject the null hypothesis: if p-value ≤ α we reject H0, otherwise we fail to reject it.

To understand this simple rule take a look at the following image.

To reject the null hypothesis at significance α = 0.05 (C = 95%), all we needed was a z-value at least as far out as the critical value that cuts off a tail of area α in the graph. Our z-value is even farther from H0 than that threshold, so its p-value is smaller than α and we can reject H0 at the 95% confidence level. This is why we reject if p-value ≤ α.

For example, if our p-value had been 0.16, we could have rejected H0 only with a confidence of C = 84%, which falls short of the 95% we asked for.
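The rule and the threshold it implies can be sketched in code as follows. The function names `decide` and `critical_z` are invented for this example; `NormalDist.inv_cdf` is the standard-library inverse of the normal CDF.

```python
from statistics import NormalDist

def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 if and only if p-value <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

def critical_z(alpha=0.05):
    """z-value that leaves an area of alpha in the right tail."""
    return NormalDist().inv_cdf(1.0 - alpha)

print(decide(0.03))               # p-value below alpha = 0.05
print(decide(0.16))               # p-value above alpha = 0.05
print(decide(0.16, alpha=0.16))   # only passes at C = 84%
print(round(critical_z(0.05), 3)) # 1.645
```

For a right-tailed test at α = 0.05, any z-value above roughly 1.645 therefore leads to rejecting H0, which is the same statement as p-value ≤ α.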

Let’s code
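Here is a minimal end-to-end sketch of a right-tailed z-test in plain Python, to be read as an illustration rather than a definitive implementation. Several things in it are assumptions: the 60 measurements are synthetic (drawn around a hypothetical true mean of 14°), the population standard deviation `SIGMA = 3.0` is invented because a z-test needs a known σ, and the function name `right_tailed_z_test` is mine.

```python
import math
import random
from statistics import NormalDist, mean

MU_0 = 13.0    # mean daily temperature range under the null hypothesis
SIGMA = 3.0    # ASSUMED known population standard deviation (hypothetical)
ALPHA = 0.05   # significance level, i.e. confidence C = 95%
N_DAYS = 60    # size of the sample

# Synthetic stand-in for 60 days of measurements (hypothetical true mean: 14).
rng = random.Random(42)
sample = [rng.gauss(14.0, SIGMA) for _ in range(N_DAYS)]

def right_tailed_z_test(sample, mu_0, sigma):
    """Return (z, p-value) for the alternative 'true mean > mu_0'."""
    n = len(sample)
    standard_error = sigma / math.sqrt(n)
    z = (mean(sample) - mu_0) / standard_error
    p = 1.0 - NormalDist().cdf(z)  # area in the right tail beyond z
    return z, p

z, p = right_tailed_z_test(sample, MU_0, SIGMA)
print(f"sample mean = {mean(sample):.2f}, z = {z:.2f}, p-value = {p:.4f}")
print("reject H0" if p <= ALPHA else "fail to reject H0")
```

To run it on your own measurements, replace `sample` with your list of daily ranges and set `SIGMA` to a value you can justify. If σ is not known and has to be estimated from the sample, a t-test is the more usual choice.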

Do you want real climate data?

Would you like to work on real climate data? I can suggest an interesting website where you can find freely available data collected by several Earth Observation space missions.

I am currently working at the European Space Agency, where I am focusing on improving this gateway, in which you can find a bunch of stuff (including data).

Home

Earth Online presents news and information on European Space Agency activities in the field of Earth observation. The…

earth.esa.int

Here you can browse data from the Copernicus missions.

Home - CSCDA (Copernicus Space Component Data Access)

Various collections of Earth Observation data have been generated since 2010, to cover the needs of the Copernicus…

spacedata.copernicus.eu

Open Access Hub

Since the beginning of operations of the Sentinel-1 mission, Wave Mode OCN products contain the significant wave height…

scihub.copernicus.eu

Final Thoughts

Hypothesis testing provides a reliable framework for making data decisions about the population of interest. It helps the researcher to successfully extrapolate data from the sample to the larger population.

Whether you are working on models to solve climate change issues, or to predict the value of stocks in the next month, you always need to evaluate and understand whether your model is better than previous models or the starting baseline, and to do this you need hypothesis testing.
In future articles I will address the various statistical tests more specifically in the area of Machine Learning and explain how to use them in real world applications.

The End

Marcello Politi

Linkedin, Twitter, CV