Soil spectroscopy to predict soil properties
This is the first article of a series that I am devoting to the use of Deep Learning in Soil Science. My motivation is to show that deep learning is useful for things other than classifying photos of cats and dogs, or analysing sentiment. Not that there’s anything wrong with cats and dogs, but there are millions of examples of that already…
A bit of Context
Soil Science is a relatively broad discipline so I will try to give some context about what we do and the type of data with which we usually deal.
Soil in the field and the laboratory
Soil is a complex body which can be described in many ways, depending on whether you are interested in its physical, chemical and/or biological properties, its location in the landscape, its interaction with the rest of the biosphere, etc.
Usually, the description begins with a soil profile. We dig a pit and we are able to see something like the picture below.

The first thing you might notice in the picture is the different colours and the vertical organisation into layers. Each of those layers has different characteristics and, as soil scientists, we are interested in describing them as thoroughly as possible.
A typical description usually includes attributes that are observable in the field (coordinates, layer thickness, colour, etc.) and information that we get after processing samples in the laboratory (pH, particle size, nutrient content, etc.).
As you can imagine, the field and laboratory components of data acquisition are expensive, so we spend a lot of time trying to optimise sampling designs, predicting what we will find at a certain location, and predicting soil attributes based on things that are easier, faster or cheaper to measure.
Soil spectroscopy
Soil spectroscopy is a technique that allows rapid acquisition of soil information in the field or in the laboratory. Simply speaking, we hit the soil sample with a beam of light and measure what bounces back. Depending on the composition of the sample, the energy that bounces back to the instrument (spectrometer) differs, yielding a spectral signature for each sample.

Another way of representing this data is by generating a spectrogram. You have probably seen them used in audio analysis. You can find more information about them here. The 2D structure of the spectrogram makes it a perfect candidate to be ingested by a Convolutional Neural Network.
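To illustrate the idea, here is a minimal sketch of how a 1D spectrum could be converted into a spectrogram with SciPy. The random placeholder spectrum and the window settings are assumptions for illustration only, not the settings used in the paper:

```python
# Minimal sketch: turn a 1D spectrum into a 2D spectrogram using a
# short-time Fourier transform, treating the wavelength axis the way
# audio analysis treats the time axis.
import numpy as np
from scipy.signal import spectrogram

rng = np.random.default_rng(0)
spectrum = rng.random(2000)  # placeholder for a real vis-NIR spectrum

# nperseg/noverlap are illustrative; the paper's settings may differ.
freqs, segments, sxx = spectrogram(spectrum, nperseg=64, noverlap=32)

# sxx is a 2D array (frequency bins x spectrum segments) that can be
# fed to a 2D CNN as a 1-band image.
print(sxx.shape)
```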

Convolutional Neural Network model
Designing a convolutional neural network (CNN) is a highly iterative process. Sometimes it feels more like an art than an exact science. Nevertheless, it is always good to read and get inspired by what other people do. I will not explain how they work, but here you can find a good description with an example of image classification of the MNIST dataset.
When designing the CNN used in this work, a series of factors guided or constrained the process:
- 2D structure of the input data: The spectrogram is a matrix (a 1-band image), which is probably better processed using a 2D CNN.
- The dataset size is relatively small: It is important to remember that the data is obtained from field samples. That implies going to the field, digging a hole, and scanning the sample in the field or after sending it to the laboratory. This process is time-consuming and expensive. The dataset that we use in this example contains about 20,000 samples from all over Europe. Not the smallest dataset that you can find, but small compared with the dataset used to train AlexNet, for example (over 15 million images). It is easy to overfit a small dataset, so I used a small network.
- Multiple outputs: There are many soil properties that we can predict using spectroscopy. For this specific study we are interested in predicting: a) organic carbon content (OC), b) cation exchange capacity (CEC), c) clay particle size fraction, d) sand particle size fraction, e) pH measured in water, and f) total nitrogen content (N). We could train a different model for each of them, but I’m interested in the potential of multi-task learning to produce some kind of synergistic effect.
The final network structure looks like this:

The first part of the network ("Common layers") is a series of convolutional and max-pooling layers, as commonly seen in image classification. This section of the network is shared by all the target soil properties and should be able to learn how the spectrogram is structured. After the "Common layers" extract a general representation of the spectrogram, the information is directed to 6 different branches, one for each target soil property. Each branch consists of a convolutional layer with batch normalisation (BN), which is flattened (to 1D) before generating the output. The branches should be able to learn signals in the spectrogram that are specific to each soil property.
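To make the structure more concrete, below is a minimal Keras sketch of this kind of multi-task network. The input shape, number of filters, kernel sizes and training settings are illustrative assumptions, not the exact values used in the paper:

```python
# Sketch of a multi-task CNN: shared ("common") convolutional layers
# followed by one small branch per soil property.
from tensorflow.keras import layers, models

properties = ["OC", "CEC", "clay", "sand", "pH", "N"]

inputs = layers.Input(shape=(64, 64, 1))  # spectrogram as a 1-band image

# "Common layers": convolution + max-pooling trunk shared by all tasks.
x = layers.Conv2D(32, (3, 3), activation="relu")(inputs)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, (3, 3), activation="relu")(x)
x = layers.MaxPooling2D((2, 2))(x)

# One branch per target property: convolution + BN + flatten + output.
outputs = []
for prop in properties:
    b = layers.Conv2D(32, (3, 3), activation="relu", name=f"conv_{prop}")(x)
    b = layers.BatchNormalization(name=f"bn_{prop}")(b)
    b = layers.Flatten(name=f"flat_{prop}")(b)
    outputs.append(layers.Dense(1, name=prop)(b))

model = models.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="mse")  # one regression loss per branch
model.summary()
```

In this sketch the losses of the 6 branches are simply added together during training, so the shared layers receive gradients from every soil property at once.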
Results
Comparison with other conventional methods
Two commonly used models to predict soil properties from spectral data are the Cubist regression tree model (Quinlan, 1992) and Partial Least Squares regression (PLS; Martens and Naes, 1989). We used these models as a baseline to assess the performance of our CNN. The models were trained using the spectra (not the spectrograms), which were pre-processed using a series of methods commonly used in the literature (a sketch of these steps follows the list):
- Converting reflectance to apparent absorbance (a = −log10 (r)).
- Savitzky–Golay smoothing (Savitzky and Golay, 1964), using a window size of 11, and a second order polynomial.
- Trimming the edges (< 500 nm and > 2450 nm) to discard artefacts.
- Sampling every tenth measurement.
- Applying a standard normal variate transformation (Barnes et al., 1989).
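Just to make these steps concrete, here is a sketch of the pre-processing chain using NumPy and SciPy. The wavelength grid and the random reflectance matrix are placeholders, not the actual data:

```python
# Sketch of the spectra pre-processing chain described above.
import numpy as np
from scipy.signal import savgol_filter

wavelengths = np.arange(400, 2501, 2)  # assumed vis-NIR grid in nm
reflectance = np.random.default_rng(0).uniform(0.1, 0.9, (100, wavelengths.size))

# 1. Reflectance to apparent absorbance.
absorbance = -np.log10(reflectance)

# 2. Savitzky-Golay smoothing (window of 11, second order polynomial).
smoothed = savgol_filter(absorbance, window_length=11, polyorder=2, axis=1)

# 3. Trim the edges to discard artefacts.
keep = (wavelengths >= 500) & (wavelengths <= 2450)
trimmed = smoothed[:, keep]

# 4. Keep every tenth measurement.
subsampled = trimmed[:, ::10]

# 5. Standard normal variate: centre and scale each spectrum individually.
snv = (subsampled - subsampled.mean(axis=1, keepdims=True)) / subsampled.std(axis=1, keepdims=True)
```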
The figure below compares the prediction error of all the models (PLS, Cubist and CNN). We also included the error of a CNN that predicts a single property as a reference.

The CNN performed better than the PLS and Cubist models and the multi-task CNN generally performed better than a single-prediction CNN.
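For reference, a PLS baseline like this can be fitted with scikit-learn in a few lines. The synthetic data, the train/test split and the number of components are assumptions for illustration; Cubist needs a separate package and is omitted here:

```python
# Sketch of a PLS regression baseline with scikit-learn.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.random((500, 196))  # placeholder pre-processed spectra
y = rng.random(500)         # placeholder target, e.g. organic carbon

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

pls = PLSRegression(n_components=20).fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, pls.predict(X_test).ravel()))
print(f"PLS baseline RMSE: {rmse:.3f}")
```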
Synergistic effect of multi-task learning
I think the most interesting result is the synergistic effect observed when using a multi-task network. In the figure below you can see how the prediction error decreased (except for pH) as we increased the number of properties being simultaneously predicted (we modified the network architecture for each case, varying the number of branches from 1 to 6). Predicting our 6 properties simultaneously decreased the prediction error for OC by almost 50% compared with predicting OC by itself.

When the network is predicting a property, it uses the rest of the predicted properties as "hints" that constrain the prediction. A simplified example is the case of clay and sand content. In the most general case, soil mineral particles are divided into 3 groups of increasing size: clay, silt and sand. The proportions of those 3 groups should add up to 1 (100%). If the model predicts a very high clay content, that is a hint that the sand content should be low. Obviously, the interactions between the 6 properties are more complex, but the network captures this kind of effect and, consequently, we observe a decrease in the prediction error.
Final words
In my research group we usually use machine learning techniques like random forests, regression trees, etc. This was my first attempt to use convolutional neural networks to do something other than classify images.
What I especially liked about this work was the synergistic effect of multi-task learning. In our minds we generate rules that guide us when making decisions and it is neat that the CNN can do something similar.
The results were quite promising and, since I worked on this, I have been trying to use CNNs for everything!
In the next articles I will explore a little bit of transfer learning, and also some applications in soil mapping, so stay tuned!
Citation
More details about this work can be found in the corresponding paper.
Padarian, J., Minasny, B. and McBratney, A.B., 2018. Using deep learning to predict soil properties from regional spectral data. Geoderma Regional. https://doi.org/10.1016/j.geodrs.2018.e00198
References
- Barnes, R., Dhanoa, M. S. & Lister, S. J. (1989). Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra. Applied Spectroscopy, 43(5), 772–777.
- Martens, H. & Naes, T. (1989). Multivariate calibration. John Wiley & Sons.
- Quinlan, J. R. (1992). Learning with continuous classes. In: 5th Australian Joint Conference on Artificial Intelligence. Vol. 92. Singapore, pp. 343–348.
- Savitzky, A. & Golay, M. J. (1964). Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36(8), 1627–1639.