
Detecting Exoplanets from Light Curves of Kepler Mission

Analyzing to what extent FFT and Recurrence Plots affect the accuracy of Exoplanet classification

Figure 1: Photo by Brett Ritchie on Unsplash

What are Exoplanets, and how are they detected?

Before getting to Exoplanet detection, I would like to focus on why the search for Exoplanets is important. Is it really worth searching? Whether life is possible outside our solar system is one of the most profound questions of all time. If a planet rich with life were found, it would change humankind forever, and the discovery would also help answer some of the most fundamental questions about our existence. This is where Exoplanet detection comes into the picture.

A planet is a body composed of gas, dust, etc., orbiting a star. All planets outside our solar system that revolve around a star are called Exoplanets. Because of the glare of the star's bright light, Exoplanets are difficult to detect using just a telescope. To solve this problem, scientists have devised indirect methods for detecting these distant planets. Instead of viewing the planets directly through telescopes, which is not always feasible, they look for the effects these planets have on the stars they orbit.

One way to find these planets is to look for unsteady stars. A star with a planet revolving around it tends to wobble because of the planet's gravitational pull. Many planets have been discovered using this technique. The problem is that only massive planets like Jupiter have enough gravitational impact on their star to cause a detectable wobble. Smaller planets like Earth have less impact on a star, making the unsteady motion difficult to detect. How, then, are smaller Exoplanets detected?

Figure 2: Wobbling star due to gravity of exoplanet (Source: https://en.wikipedia.org/wiki/File:Dopspec-inline.gif)

Kepler detected smaller planets using another technique called the ‘transit method’. A transit occurs when a planet passes between its star and the observer. During the transit, there is a small drop in the intensity of the light reaching the observer, so the star appears slightly less bright. A planet revolving around a star will therefore produce a periodic dip in light intensity. This can be seen in the figure below.

Figure 3: Change in the light intensity due to transit (Source: https://en.wikipedia.org/wiki/Transit_(astronomy))

The primary eclipse denotes the dip in the intensity of light reaching the observer when the Exoplanet blocks part of the star. By studying the time interval between consecutive transits, one can judge whether the dips are caused by an orbiting planet or by some other celestial body. For this research, I have used the output of a similar technique to classify a celestial body as an Exoplanet or a Non-Exoplanet.
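
To make the idea of "time between consecutive transits" concrete, here is a minimal sketch on a synthetic, noiseless light curve. The box-shaped dip, the function names and all parameter values are illustrative assumptions, not the processing used for the Kepler data in this article.

```python
import numpy as np

def simulate_transit_flux(n_points, period, duration, depth, first_transit):
    """Noiseless, normalized light curve with box-shaped transits (toy model)."""
    flux = np.ones(n_points)
    # Phase counts samples since the most recent transit start.
    phase = (np.arange(n_points) - first_transit) % period
    flux[phase < duration] -= depth
    return flux

def transit_start_indices(flux, threshold):
    """Sample indices where the flux first drops below `threshold`."""
    in_transit = flux < threshold
    # A dip starts where this sample is in transit and the previous one is not.
    return np.flatnonzero(in_transit[1:] & ~in_transit[:-1]) + 1

flux = simulate_transit_flux(n_points=1000, period=200, duration=10,
                             depth=0.01, first_transit=50)
starts = transit_start_indices(flux, threshold=0.995)
intervals = np.diff(starts)  # spacing between consecutive dips, in samples
print(starts, intervals)
```

For a real planet the intervals are expected to be nearly constant, which is what separates a transit signal from irregular dimming. Real light curves are noisy, so a fixed threshold like this one would need to be replaced by something more robust.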

Extracting Light Curves from Astronomical Data

The time-series data is downloaded from the Kepler website. The files have a .FITS extension. The Flexible Image Transport System, also called FITS, is a standard format for exchanging astronomical data, independent of the hardware platform and software environment. In Python, the Astropy library can be used to read astronomical data. Both positive and negative samples are downloaded for training purposes. Each downloaded file contains a table with multiple columns of values, shown below.

Figure 4: Astronomical Data in .FITS file (Source: Image by author)

From all those columns, SAP_FLUX (the simple aperture photometry flux) was used to train the ML model. Plots of time vs. SAP_FLUX for positive and negative data are shown below.

Figure 5: Time series of positive and negative data points (Source: Image by author)

It is evident that the positive data has a specific pattern, caused by the Exoplanet repeatedly transiting its star. For the negative data, no repeating pattern can be seen, and in some cases the negative time series looks random. An SVM is trained directly on this time-series data. The accuracy is around 52% for this simple model. Along with the accuracy, the classification report and confusion matrix are presented for evaluation purposes. The rest of the article discusses the effect of FFT and Recurrence Plots (RP) as preprocessing techniques for time-series data on classification accuracy.

Figure 6: Classification Report and Confusion matrix for time series ML model (Source: Image by author)
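
The baseline can be sketched with scikit-learn as follows. The synthetic dataset, the RBF kernel, the per-row standardization and the train/test split are all assumptions made for illustration; the article does not state these details, and the numbers this toy example produces have nothing to do with the 52% reported above.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_toy_dataset(n_per_class=100, n_points=300):
    """Synthetic stand-in for flux data: class 1 has periodic dips, class 0 is noise."""
    X = 1.0 + rng.normal(0.0, 0.002, size=(2 * n_per_class, n_points))
    y = np.repeat([0, 1], n_per_class)
    for row in np.flatnonzero(y == 1):
        period = rng.integers(30, 60)
        start = rng.integers(0, period)
        phase = (np.arange(n_points) - start) % period
        X[row, phase < 3] -= 0.01  # three-sample dip once per period
    return X, y

def standardize_rows(a):
    """Scale every light curve to zero mean and unit variance."""
    return (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)

X, y = make_toy_dataset()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

clf = SVC(kernel="rbf")  # kernel choice is an assumption
clf.fit(standardize_rows(X_train), y_train)
y_pred = clf.predict(standardize_rows(X_test))

print(classification_report(y_test, y_pred))
cm = confusion_matrix(y_test, y_pred)
print(cm)
```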

Fast Fourier Transform and Recurrence Plot models

The Fast Fourier Transform (FFT) converts data from the time domain to the frequency domain. SciPy has built-in functions for converting the time-series flux data to the frequency domain. The data after FFT can be visualised as shown below.

Figure 7: FFT applied on time series data (Source: Image by author)
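
One possible way to compute such a spectrum with `scipy.fft` is sketched below. Removing the mean and keeping only the magnitude of the one-sided transform are assumptions about the preprocessing, and the function name `fft_magnitude` is made up for this example.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq

def fft_magnitude(flux, sample_spacing=1.0):
    """Return (frequencies, magnitudes) of the one-sided spectrum of a flux series."""
    flux = np.asarray(flux, dtype=float)
    centered = flux - flux.mean()  # remove the constant offset so bin 0 does not dominate
    magnitudes = np.abs(rfft(centered))
    freqs = rfftfreq(flux.size, d=sample_spacing)
    return freqs, magnitudes

# Toy check: a small sinusoidal modulation with 5 cycles over 200 samples.
t = np.arange(200)
signal = 1.0 + 0.01 * np.sin(2 * np.pi * 5 * t / 200)
freqs, mags = fft_magnitude(signal)
peak_freq = freqs[np.argmax(mags)]
print(peak_freq)  # frequency, in cycles per sample, of the strongest component
```

The `mags` array (or a truncated version of it) can then be used as the feature vector in place of the raw flux values.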

After applying FFT to the time series data, the transformed data is used to train the SVM model. The results after applying FFT are shown below,

Figure 8: Classification Report and Confusion matrix for FFT ML model (Source: Image by author)

It can be seen that the accuracy increases from 52% to almost 59% for the same number of data points and the same model. After FFT, Recurrence Plots were evaluated as a second preprocessing technique.

A Recurrence Plot is an image obtained from a time series that represents the pairwise distances between its time points. This technique was tested as another way to improve the accuracy of classifying Exoplanets. Python has a library called pyts that provides RecurrencePlot as a built-in transformer. The time series is fed in as input, and an image is generated as output. Recurrence Plot images of the input data are shown below.

Figure 9: Recurrence Plot applied on time series data. (Source: Image by author)
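
To show what such a transformer computes, here is a plain NumPy sketch of a recurrence plot. It is a stand-in written for illustration, not the pyts implementation; the parameter names `dimension`, `time_delay` and `threshold` are chosen to resemble the usual formulation, so check the pyts documentation for the exact options of `RecurrencePlot`.

```python
import numpy as np

def recurrence_plot(series, dimension=1, time_delay=1, threshold=None):
    """Pairwise Euclidean distances between delay-embedded points of a series.

    With threshold=None the raw distance matrix is returned; otherwise the
    matrix is binarized (1 where the distance is at most `threshold`).
    """
    x = np.asarray(series, dtype=float)
    n_traj = x.size - (dimension - 1) * time_delay
    if n_traj < 1:
        raise ValueError("series too short for this dimension and time_delay")
    # Row i is the trajectory (x[i], x[i + delay], ..., x[i + (dimension - 1) * delay]).
    idx = np.arange(n_traj)[:, None] + time_delay * np.arange(dimension)[None, :]
    traj = x[idx]
    diff = traj[:, None, :] - traj[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    if threshold is None:
        return dist
    return (dist <= threshold).astype(np.uint8)

rp = recurrence_plot([0.0, 1.0, 3.0])
print(rp)
```

With `dimension=1`, each pixel (i, j) is simply the absolute difference between the flux at time i and the flux at time j. A periodic signal therefore produces repeating diagonal structures, while noise produces an unstructured texture.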

From the above images, it can be seen that for positive data points, that is, the data points that correspond to an Exoplanet, a specific pattern forms in the images. Conversely, for Non-Exoplanets, no specific pattern can be found; the images look like random noise.

After converting the time-series data to RP images, these images were used to train a VGG16 model. The classification report after applying RP is shown below.

Figure 10: Classification Report and Confusion matrix for Recurrence plot model (Source: Image by author)

As seen above, FFT performs better than the other techniques. But why was FFT chosen for this research? The reason is that in the Kepler mission, Exoplanets were detected using the transit method, as explained at the start of this article, so an Exoplanet shows periodic dips in the intensity of light reaching the observer. When such periodic time-series data is converted into the frequency domain, the pattern becomes more visible for the positive class, while the transform has almost no effect on the negative class, which is dominated by random noise. Therefore, the ML model with FFT preprocessing performs better than the other techniques.
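
This argument can be made a little more precise with an idealized model. Assume, purely for illustration, a box-shaped transit with baseline flux F_0, fractional depth \delta, duration d and period P, with transits centered at t = nP. These symbols are not from the original analysis. The Fourier series of such a light curve is:

```latex
f(t) = F_0 \left[ 1 - \delta \left( \frac{d}{P}
       + \sum_{k=1}^{\infty} \frac{2 \sin(\pi k d / P)}{\pi k}
         \cos\!\left( \frac{2 \pi k t}{P} \right) \right) \right]
```

All of the time-varying signal sits at the discrete frequencies k/P, so it shows up as a few sharp peaks in the spectrum. Uncorrelated noise, by contrast, spreads its power roughly evenly over all frequencies, which is why the two classes are easier to separate after the transform.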

Conclusion

  1. The above results show that the accuracy is 52% when the model is trained directly on the time-series data.
  2. When FFT is applied to the time-series data, the accuracy increases to almost 59%. Thus, with the same input data and the same model, accuracy increases by about 7 percentage points. This is due to the preprocessing technique (FFT) applied to the data before training the model.
  3. FFT also performs better than the RP preprocessing technique.

Note: Only a small amount of data was used for training because of hardware limitations, which is why the accuracy is in the 50s range. With more data, the model would be expected to achieve higher accuracy. Still, the FFT approach performs better than directly using the time-series data for Exoplanet classification.

The complete code for this problem can be found on GitHub.


Contact

For more such stories related to Quantum Computing and Machine Learning, follow me on Medium. Also, check out my GitHub and LinkedIn.

