Note: This article is part of a bigger study. You can find it here and love it till death do you part.
The majestic ball of gas that is the Sun has crucial effects on our lives. We owe life as we know it to this star, and it is close enough for us to study it in depth, magnetic activity included.
In particular, solar flares are sudden flashes that occur on the Sun. You may think that you have more serious stuff to think about in your life, and you are probably right. Nonetheless, these little (not so little, actually) flashes have several consequences for our lives:
- These flares often occur close to sunspots, which are perhaps the most important physical phenomenon related to solar magnetic energy. Indirectly, the solar flares tied to these sunspots have important effects on our climate, as they affect the Sun’s energy.
- When these flares appear, they are usually accompanied by Coronal Mass Ejections, which can disrupt our communications, destroy our satellites, and kill our astronauts.
It is thus essential to monitor solar flares and classify them. It is even more appealing to teach a computer how to do that with (data and) Machine Learning.
Shall we dance? 🙂
Summary:
1. The challenge
2. The tool
3. The code
4. The conclusion
1. The challenge
As a non-astrophysicist, I know basically nothing about the Sun. If I knew a lot about it, I would have written an analytical function and found a way to detect solar flares by myself. Unfortunately, even those who know a lot about the Sun haven’t managed to do that so far. So the challenge here was to find an algorithm able to detect solar flares with zero domain knowledge, without using any physical quantity, theory, or hypothesis. The task is thus the following:
Given a certain number of images, build a classification algorithm that is able to detect whether or not there is a flare on the Sun.
To summarise this with an image, you can think about something like this:

2. The tool
The tool used in this process is called a Convolutional Neural Network (CNN). You can find more detail on how these little creatures work here, but essentially they slide small filters over the image and analyze it one patch of pixels at a time. The network then collapses into a last layer that outputs the probability that the image belongs to a certain class.
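To make the "run over the image, then collapse into a probability" idea concrete, here is a minimal sketch in plain Python. It is not the network used in this study: the single 2x2 filter, the weight, the bias, and the toy 4x4 image are all made-up assumptions, and a real CNN learns many filters from data.

```python
import math

def conv2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Keep positive responses, zero out the negative ones."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def sigmoid(x):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_probability(image, kernel, weight, bias):
    """Conv -> ReLU -> global average pooling -> one sigmoid output."""
    fmap = relu(conv2d(image, kernel))
    values = [v for row in fmap for v in row]
    pooled = sum(values) / len(values)
    return sigmoid(weight * pooled + bias)

# Toy 4x4 "image" with a bright blob in the middle (hypothetical data).
toy_image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
toy_kernel = [[1, 1], [1, 1]]  # hypothetical filter that responds to bright patches
probability = predict_probability(toy_image, toy_kernel, weight=1.0, bias=-1.0)
print(f"P(class) = {probability:.3f}")
```

The strongest filter response sits where the filter fully overlaps the bright blob, which is exactly the kind of local pattern a trained CNN filter picks up.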
In this study, two different networks have been used:
A) A convolutional neural network to detect, given a full image of the Sun, whether or not the Sun has active regions (a.k.a. magnetically active zones).
B) A convolutional neural network to detect, given an active zone of the Sun, whether or not there actually are solar flares in that region.
Why is that? Why am I using two networks? Because the Sun could have active regions, but that does not guarantee that these zones have solar flares too.
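One way to picture how two such networks could be chained is the sketch below. This is an assumption about the wiring, not the pipeline of the study: `detect_flares`, the 0.5 threshold, and the `mean_brightness` stand-in "model" are all hypothetical names and values used for illustration.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]

def detect_flares(
    full_disk: Image,
    regions: Sequence[Image],
    has_active_regions: Callable[[Image], float],
    has_flare: Callable[[Image], float],
    threshold: float = 0.5,
) -> List[bool]:
    """Ask network B about each region only if network A sees active regions at all."""
    if has_active_regions(full_disk) < threshold:
        # Network A says the Sun is quiet: no region is flagged.
        return [False] * len(regions)
    return [has_flare(region) >= threshold for region in regions]

def mean_brightness(image: Image) -> float:
    """Stand-in for a trained model: returns the average pixel value as a fake probability."""
    return sum(sum(row) for row in image) / (len(image) * len(image[0]))

bright = [[1.0, 1.0], [1.0, 1.0]]
dark = [[0.0, 0.0], [0.0, 0.0]]
print(detect_flares(bright, [bright, dark], mean_brightness, mean_brightness))
```

In a real setup, the two callables would wrap the predict functions of the two trained networks.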
Let’s take a look at these two beasts:
A) As an input, the first network "eats" black and white images, and it does this:

B) The second one takes as an input coloured images (RGB) and it goes like this:

3. The code
Ok, here we go.
Let’s start with this: the data extraction part was really (I mean, really) hard. As this is a physics topic, it was hard to find just the raw images, as a lot of sources referred to physical quantities only. Moreover, as this is an academic topic without much demand for open data, the data were hard to find in the first place, let alone reliable images or CSVs ready to be used. But we are data scientists and we don’t give up.
So I started scraping.
The scraping part is pretty intense, so please refer to GitHub if you want to see the lines of code, or we will get off track. Nonetheless, these were the strategies used to get the data.
For the first part, two online sources were used. As we are obviously talking about Supervised Learning, the labels were extracted by scraping the SpaceWeatherLive.com site. This source (ftp://ftp.swpc.noaa.gov/pub/) was used to get the solar flare images.
For the second part, a single source was used (this one). I extracted the CSV with the links to the solar flare/active region images and wrote a script that stores the images with their labels on my PC.
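The actual scraping scripts live in the GitHub repository. As a rough idea of the "CSV of links to labelled images on disk" step, here is a minimal sketch. The column names (`image_url`, `label`), the example URLs, and the one-folder-per-label layout are assumptions for illustration, not the real format of the source.

```python
import csv
import io
from pathlib import PurePosixPath
from urllib.parse import urlparse

def plan_downloads(csv_text, out_dir="images"):
    """Return (url, local_path, label) triples, with one sub-folder per label."""
    plan = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = row["image_url"].strip()
        label = row["label"].strip()
        # Reuse the file name found at the end of the URL path.
        filename = PurePosixPath(urlparse(url).path).name
        local_path = PurePosixPath(out_dir) / label / filename
        plan.append((url, str(local_path), label))
    return plan

# Hypothetical CSV content; the real file has its own columns and links.
sample_csv = """image_url,label
https://example.org/sun/20200101_0001.jpg,flare
https://example.org/sun/20200102_0002.jpg,no_flare
"""

download_plan = plan_downloads(sample_csv)
for url, path, label in download_plan:
    print(label, path)
```

The sketch only plans the downloads. Fetching each file could then be done with something like `urllib.request.urlretrieve(url, path)` after creating the target folders.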
As my final goal is to show you the power of CNNs, I will skip over the technical part of scraping (but please, again, visit the GitHub page or hit me up if you have any questions).
Let’s start with the first CNN.
3.1 First CNN
Here’s an example of an image of the dataset:
And here you can find an example of the dataset with its labels.
The labels are pretty balanced, as you can observe:
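A quick way to run this kind of balance check is sketched below. The `toy_labels` list is made up for illustration and is not the real dataset.

```python
from collections import Counter

def class_balance(labels):
    """Return the share of each class among the labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Hypothetical labels: 1 = active regions present, 0 = none.
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
balance = class_balance(toy_labels)
print(balance)
```

Shares close to each other mean that plain accuracy is a fair metric. With heavily skewed classes, a model could score well just by always predicting the majority class.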
So let’s start playing.
Train/test rigid split:
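As a rough idea of what a fixed split does, here is a minimal sketch in plain Python. The function name, the 80/20 ratio, the seed, and the toy data are illustrative assumptions, not the values used in the study.

```python
import random

def train_test_split(samples, labels, test_fraction=0.2, seed=42):
    """Shuffle once, then hold out a fixed fraction of the data for testing."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [samples[i] for i in train_idx]
    x_test = [samples[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test

# Hypothetical data: 10 "images" (just ids here) with alternating labels.
toy_samples = list(range(10))
toy_labels = [s % 2 for s in toy_samples]
x_train, x_test, y_train, y_test = train_test_split(toy_samples, toy_labels)
print(len(x_train), "training samples,", len(x_test), "test samples")
```

The model is trained on the first group only and scored on the held-out one. In practice, a library helper such as scikit-learn’s `train_test_split` does the same job.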
And cross validation (CV=10):
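The idea behind 10-fold cross validation can be sketched as follows. This is a generic illustration, not the code of the study: the helper names, the toy data, and the trivial "majority class" model are all assumptions standing in for the CNN.

```python
import random
from collections import Counter

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, folds[i]

def cross_validate(samples, labels, fit, score, k=10):
    """Train on k-1 folds, score on the remaining one, and average the k scores."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        model = fit([samples[i] for i in train_idx], [labels[i] for i in train_idx])
        scores.append(score(model,
                            [samples[i] for i in test_idx],
                            [labels[i] for i in test_idx]))
    return sum(scores) / len(scores)

def fit_majority(train_samples, train_labels):
    """Toy 'model': remember the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]

def accuracy(model, test_samples, test_labels):
    """Share of test labels that match the toy model's constant prediction."""
    return sum(1 for y in test_labels if y == model) / len(test_labels)

toy_samples = list(range(20))           # hypothetical image ids
toy_labels = [0] * 12 + [1] * 8         # hypothetical labels
mean_accuracy = cross_validate(toy_samples, toy_labels, fit_majority, accuracy, k=10)
print(f"mean accuracy over 10 folds: {mean_accuracy:.3f}")
```

Because every sample ends up in a test fold exactly once, the averaged score depends less on one lucky (or unlucky) split than a single train/test split does.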
Lovely accuracy (94.7%). And it was obtained with cross validation, so it is pretty reliable.
3.2 Second CNN
Here’s an example of the image



And here is the code
Good results on train/test split:
Good results on the Cross Validation too:
Almost 96% accuracy.
4. The conclusion
The topic of solar flares is extremely intriguing. A lot of research teams have their own accurate ways to detect and classify solar flares. Some look at physical thresholds, others use the energy distribution, and so on.
I think that we are not yet at the point where we can let a super-intelligence discover stuff, formulate theories, or kill us all, but these tools are actually a great support, and it is always pleasant to have something to rely on, especially in these crazy complex scenarios. In this particular case, the method is extremely powerful, as it gets it right almost 95 times out of 100.
Of course, a lot of technical issues were faced and solved in this 3-month project, and these are just the best results of a long process. Anyway, you can find more about the data, and a complete report on all the issues and their solutions, in my GitHub repository (here).
Please let me know if you have any questions or extra ideas to enlighten me (haha, you got it?).
And follow the Sun. 🙂