Using Machine Learning to Identify the Minerals in Meteorites

Jeremy Neiman

Published in Towards Data Science · Feb 17, 2019 · 7 min read

This past weekend, The American Museum of Natural History (AMNH) hosted its fifth annual hackathon. This year’s theme was “Hack the Solar System.” My team worked on a project to identify and map the mineral composition of meteorites.

Understanding the mineral composition of meteorites can teach us about the solar system. When they come from interplanetary bodies (such as asteroids), we can learn about how the solar system formed 4.5 billion years ago. Meteorites can even come from other planets, giving us a way to study them from millions of kilometers away.

Chemistry 101

Atoms are the teeny-tiny objects that make up matter. Atoms consist of a nucleus of protons and neutrons in the center surrounded by electrons. The number of protons in the nucleus defines what element that atom is. For example, hydrogen, the lightest element, has 1 proton; oxygen has 8 and iron has 26.

Molecules are composed of multiple atoms bonded together. For example, a water molecule is made of 2 hydrogen atoms and 1 oxygen atom. This can be written as the chemical formula H₂O.

Minerals are naturally occurring solids with a definite chemical composition and crystal structure, such as diamond, quartz or topaz. Atoms can come together in a myriad of ways to form very different minerals. There are only 118 known elements, but together they make up thousands of minerals.

Some mineral formulas can be quite complex, such as topaz: Al₂SiO₄(OH,F)₂, which means 2 aluminum, 1 silicon, 4 oxygen and 2 groups that are each either hydroxide (oxygen+hydrogen) or fluorine. Others are quite simple: the formula for diamond is C, just carbon. But the structure of the atoms is very important too. Diamond and graphite (the stuff in your pencil) are both made entirely of carbon, but they are so different because the carbon atoms are bonded in different ways.

The chemical structure of diamonds vs graphite. (http://www.jidonline.com/viewimage.asp?img=JInterdiscipDentistry_2011_1_2_93_85026_f3.jpg)

How Meteorites are Studied

The scientists scan meteorites using an electron microprobe (EMP). An EMP shoots a beam of electrons at the meteorite. When the beam of electrons collides with the atoms in the meteorite, the atoms emit x-rays. Each element emits x-rays at its own distinct, characteristic frequency.

A graph of characteristic frequencies of different elements. (https://commons.wikimedia.org/wiki/File:XRFScan.jpg)

Before starting the scan, the scientists set the EMP to detect a specific set of elements. When the characteristic x-rays of the desired element are emitted, the detector records the intensity. The EMP produces a series of grayscale images, one for each element, where each pixel represents the proportion of that element at that location.

The 10 images produced by the EMP when scanning a meteorite for 10 different elements. Brighter means that more of that element is present at that location. The 10 elements are: Aluminum (Al), Calcium (Ca), Chromium (Cr), Iron (Fe), Magnesium (Mg), Nickel (Ni), Phosphorus (P), Silicon (Si), Sulfur (S) and Titanium (Ti).

This can tell us what elements are present. But how do we determine the minerals?

The relative intensity of the x-rays from each element corresponds to the relative proportion of that element in the mineral. The more of an element that is present, the more often the electron beam hits an atom of that element and causes it to emit its characteristic x-ray. For example, the mineral troilite (FeS) is about 2/3 iron and 1/3 sulfur by weight, which means that we would expect 2/3 of the x-rays emitted to be x-rays characteristic of iron, and 1/3 to be characteristic of sulfur.
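To make the troilite numbers concrete, here is a minimal sketch of how an element's share of a mineral's weight can be computed from its chemical formula. The function name and the dictionary layout are made up for illustration and are not taken from our hackathon code; the atomic weights are standard rounded values.

```python
# Standard atomic weights in g/mol (rounded).
ATOMIC_WEIGHT = {"Fe": 55.845, "S": 32.06, "Ni": 58.693, "O": 15.999}


def weight_fractions(formula):
    """Return each element's share of the mineral's mass.

    `formula` maps element symbol -> number of atoms, e.g. {"Fe": 1, "S": 1}.
    (Hypothetical helper, for illustration only.)
    """
    masses = {el: count * ATOMIC_WEIGHT[el] for el, count in formula.items()}
    total = sum(masses.values())
    return {el: mass / total for el, mass in masses.items()}


# Troilite is FeS: one iron atom for every sulfur atom.
troilite = weight_fractions({"Fe": 1, "S": 1})
print({el: round(frac, 3) for el, frac in troilite.items()})
```

For troilite this comes out to roughly 64% iron and 36% sulfur by weight, which is where the "about 2/3 and 1/3" figure comes from.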

But there are a couple of wrinkles that make this difficult. First, the results can be very noisy, meaning that the EMP will rarely give results which match the theoretical element ratio in the mineral. Second, each element’s sensor is scaled differently. For example, an iron reading of 400 might mean 50% iron while a sulfur reading of 200 might mean 100% sulfur.

To control for these, meteorites are scanned along with “standards.” Standards are known minerals that are scanned to understand the behavior of the EMP. We had 8 standard minerals, arranged in a 4x2 grid as shown below. Each square is labeled with the known mineral that was placed there. The top is the image from the iron (Fe) scan, the middle is from the sulfur (S) scan and the bottom is from the nickel (Ni) scan.

Scans from 3 of the 10 elements for the known standard minerals. Each square is labeled with the known mineral that was placed there. The top is the image from the iron (Fe) scan, the middle is from the sulfur (S) scan and the bottom is from the nickel (Ni) scan. *SCOlv stands for San Carlos Olivine, which doesn’t have one distinct chemical formula.

This should give you a sense of how we could figure out minerals from their constituent elements. For example, the Fe-only square is bright in the Fe scan and dark for the rest, indicating that only Fe is present there. Fe₃O₄ and FeS both look similar in the Fe scan, but FeS also lights up in the S scan. Similarly, FeS and NiS look similar in the S scan, but the Ni scan can help differentiate them. Also notice all the speckling in the images. Even though we’re scanning known minerals, they are not giving completely consistent readings. We will have to keep this in mind when trying to identify the unknown minerals in the meteorites.

Machine Learning

Now we have all the pieces we need to try and solve this problem. To recap, our end goal is to identify the minerals inside of a meteorite. To do that, we scan the meteorite in an electron microprobe which can tell us the relative proportions of elements at each location in that meteorite. At the same time, we scan a set of known minerals called standards so that we can interpret the results from the EMP properly.

My goal is to create a classifier for the unidentified minerals in our meteorite, but so far we only have labeled data for the standards. We can’t directly build a classifier from the standards because if the meteorite has minerals that were not in the standards, we won’t be able to classify them properly. But if we have a guess as to which minerals might be in the meteorite, we can simulate those minerals (the target minerals) and build a classifier on top of that. And the standards can help us with the simulation.

The first step is to use the standards to find the coefficient that converts an element’s weight proportion to EMP intensity. We expect a linear relationship between weight proportion and intensity. For example, if a mineral is 50% iron we expect the intensity to be twice as high as if the mineral were 25% iron. This means that we can use linear regression to find the coefficients.

The following shows the regression results for each element. Each blue blob is from a different standard. In theory, each blue blob should be a single point, but they are stretched out because of noise in the readings. Even with noise, you can see a fairly clean linear relationship. But, in the case where the expected proportions are close, such as in the Fe graph, things can be ambiguous.

The linear regression results for each element. The x-axis is the intensity from the EMP results and the y-axis is the theoretical weight-proportion of that element in the mineral.
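As a rough sketch of this calibration step, the snippet below fits a straight line to readings for a single element using ordinary least squares. The readings are invented for illustration (they are not our EMP data), and `fit_line` is a hypothetical helper rather than the function we actually used. Here the line predicts intensity from weight proportion, the direction needed for simulation; the figure plots the same relationship with the axes swapped.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up iron readings: (theoretical weight fraction, measured intensity).
# Each standard contributes several noisy pixels.
fe_readings = [
    (0.000, 3), (0.000, 6),      # a standard containing no iron
    (0.635, 318), (0.635, 331),  # FeS
    (0.724, 355), (0.724, 371),  # Fe3O4
    (1.000, 497), (1.000, 508),  # pure Fe
]
weights = [w for w, _ in fe_readings]
intensities = [i for _, i in fe_readings]

slope, intercept = fit_line(weights, intensities)
print(f"intensity ~ {slope:.1f} * weight_fraction + {intercept:.1f}")
```

Notice that the FeS and Fe₃O₄ standards sit close together on the weight axis, which is the ambiguity mentioned above for the Fe graph.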

I can use these results to simulate the target minerals that we’re looking for. To simulate a mineral, we calculate the theoretical weight proportions for each of the mineral’s elements, convert them to expected intensities using the regression results and add random noise based on the noise we found in the standards.
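A minimal sketch of that simulation might look like the following. The calibration numbers are placeholders, and the noise model (Gaussian, with one fixed standard deviation per element) is an assumption made for this example, not necessarily what the standards showed.

```python
import random

# Hypothetical calibration per element:
# (intensity per unit weight fraction, standard deviation of the noise).
CALIBRATION = {"Fe": (500.0, 12.0), "S": (210.0, 8.0), "Ni": (450.0, 10.0)}


def simulate_pixel(weights, rng):
    """Draw one noisy intensity reading per element for a mineral.

    `weights` maps element symbol -> theoretical weight fraction;
    elements that are missing from the mineral count as 0.
    """
    reading = []
    for element, (coef, noise_sd) in CALIBRATION.items():
        expected = coef * weights.get(element, 0.0)
        # Intensities cannot be negative, so clip at zero.
        reading.append(max(0.0, rng.gauss(expected, noise_sd)))
    return reading


rng = random.Random(0)  # fixed seed so the example is repeatable
troilite = {"Fe": 0.635, "S": 0.365}
samples = [simulate_pixel(troilite, rng) for _ in range(5)]
for sample in samples:
    print([round(value, 1) for value in sample])
```

Each simulated sample is a list of intensities in the order Fe, S, Ni, which is the same shape of data the EMP gives us for a single pixel.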

I simulated each of the target minerals 10,000 times and split the samples into 80% train/20% test sets. I trained a random forest classifier on the training set and the resulting model achieved 96.8% accuracy on the test set.
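The training step can be sketched with scikit-learn as below. The three minerals, their expected intensities and the noise level are all made up for the example, so the accuracy it prints says nothing about the 96.8% reported above; only the overall shape (simulate, split 80/20, fit a random forest, score on the held-out set) mirrors what I did.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical expected intensities (Fe, S, Ni) for three target minerals.
EXPECTED = {
    "troilite (FeS)": [318.0, 77.0, 0.0],
    "magnetite (Fe3O4)": [362.0, 0.0, 0.0],
    "millerite (NiS)": [0.0, 74.0, 291.0],
}
NOISE_SD = 10.0         # assumed noise, the same for every element
N_PER_MINERAL = 10_000  # simulated samples per mineral

# One block of noisy samples per mineral, stacked into a single matrix.
X = np.vstack([
    rng.normal(loc=means, scale=NOISE_SD, size=(N_PER_MINERAL, len(means)))
    for means in EXPECTED.values()
])
y = np.repeat(list(EXPECTED), N_PER_MINERAL)

# 80% train / 20% test, keeping the minerals balanced in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Because these toy minerals are well separated, the classifier has an easy job. Real target minerals with similar compositions would be much harder to tell apart.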

Results

Applying that model to the meteorite scans, here are the results:

The predicted mineral composition of a meteorite. Each color represents a different mineral.

And in a second meteorite:

We don’t have the ground truth for these meteorites, so we don’t know how accurate the predictions are. But the scientists we were working with said that the initial results are promising!
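For readers wondering how a classifier turns a stack of grayscale scans into a mineral map, here is a small sketch. It assumes the per-element images are already loaded as same-sized arrays; the tiny 4x6 "scan", the two stand-in minerals and their intensities are fabricated for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Stand-in classifier trained on two made-up minerals, features = (Fe, S).
means = np.array([[320.0, 75.0], [360.0, 0.0]])
X_train = np.vstack([rng.normal(m, 10.0, size=(500, 2)) for m in means])
y_train = np.repeat([0, 1], 500)
model = RandomForestClassifier(n_estimators=20, random_state=0)
model.fit(X_train, y_train)

# Fake 4x6 scan: the left half is mineral 0, the right half is mineral 1.
truth = np.zeros((4, 6), dtype=int)
truth[:, 3:] = 1
fe_image = rng.normal(means[truth, 0], 10.0)
s_image = rng.normal(means[truth, 1], 10.0)

# Stack the element images so that every pixel becomes one feature row.
stack = np.stack([fe_image, s_image], axis=-1)  # (height, width, n_elements)
pixels = stack.reshape(-1, stack.shape[-1])     # (height * width, n_elements)

# Classify every pixel, then fold the labels back into image shape.
mineral_map = model.predict(pixels).reshape(truth.shape)
print(mineral_map)
```

Coloring each label in `mineral_map` differently gives the kind of picture shown above.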

Conclusions and Next Steps

One big issue with this approach is that it requires the scientists to know what minerals they’re looking for ahead of time. So it seems promising in identifying known-unknowns, but will give false classifications for unknown-unknowns.

Another approach my team took towards identifying minerals was to use unsupervised clustering with DBSCAN. It won’t tell you exactly what the mineral is, but it can tell you that there are minerals with similar elemental compositions in the meteorite that might represent a distinct mineral. We think that there might be a way to combine these approaches and get the best of both worlds.
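To give a feel for the clustering idea, here is a minimal DBSCAN sketch using scikit-learn. The pixel intensities are simulated around three invented compositions, and the choices of `eps`, `min_samples` and standardizing the features first are illustrative assumptions, not the settings my teammates used.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)

# Made-up (Fe, S, Ni) intensities for three unlabeled groups of pixels.
centers = np.array([
    [320.0, 75.0, 0.0],
    [360.0, 0.0, 0.0],
    [0.0, 74.0, 290.0],
])
pixels = np.vstack([rng.normal(c, 8.0, size=(300, 3)) for c in centers])

# Put the elements on a comparable scale before measuring distances.
scaled = StandardScaler().fit_transform(pixels)

# DBSCAN groups densely packed points; label -1 marks noise.
labels = DBSCAN(eps=0.4, min_samples=10).fit_predict(scaled)
n_clusters = len(set(labels) - {-1})
print(f"clusters found: {n_clusters}, noise pixels: {(labels == -1).sum()}")
```

The clusters come back as anonymous numbers rather than mineral names, which is exactly the trade-off described above: no list of target minerals is needed, but someone still has to work out what each cluster is.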

We’re continuing to work on this and are hopeful that it can be more than just a fun hackathon project and become a useful part of the scientist’s toolkit.

All our code can be found on GitHub. We’re still working on it, so things might be changing.

This page describes the original challenge in more detail: https://github.com/amnh/HackTheSolarSystem/wiki/Meteorite-Mineral-Mapping

The rest of the team was: Katy Abbott, Meret Götsche, Peter Kang, Jackson Lee, Cecina Babich Morrow and John Underwood. We were advised by Samuel Alpert and Marina Gemma. Thanks to Abigail Pope-Brooks for editing this post.

Cecina also wrote a blog post about the weekend.