Soil spectroscopy and transfer learning
This is the second article of an ongoing series that I am devoting to the use of Deep Learning in Soil Science.
In the first part of the series I introduced a multi-task convolutional neural network (CNN) to simultaneously predict multiple soil properties from spectral data. We observed a significant decrease in the prediction error compared with conventional methods thanks to a) the superior capacity of CNNs to process complex signals, and b) the synergistic effect of multi-task learning.
In this article I will talk about transfer learning in the context of soil spectroscopy. This is an extension of the previous work, so I will skip concepts like soil spectral data, spectrograms and multi-task learning.
Context
The scale problem
Models generated within a specific area or soil domain are generally expected to perform best in that area and poorly when applied to contrasting soil types. In soil science, the high spatial dependency of soil properties means that a model generated for a specific region should be used carefully beyond that spatial domain, since it may lose its validity. Conversely, applying global, continental or national spectral libraries to local areas or regions can also be problematic. National models often perform poorly when applied to a local area within that country, and the same is observed when a global model is applied at country level. This is understandable, as a global or national model usually captures general trends that span different soil types. A local area, on the other hand, may have short-scale variation that the global models cannot capture.
This article uses the term "global" to refer to a model calibrated on a large-scale spectral library (the same European library used in the first part of this series), while "local" refers to an area within the "global" dataset (a country). Both global and local models are valuable and, ideally, we would like to transfer some of the rules learned by the more general global models to a local domain. In Machine Learning, this process of sharing information between domains is known as transfer learning.
This work aims to evaluate the effectiveness of transfer learning to "localise" a general soil spectral calibration model to a national context, simulating a situation where the global dataset is not available for the local user. As far as we know, this is the first time transfer learning has been applied to soil spectroscopy modelling.
Transfer learning
"A comic strip in my possession concerns industrial espionage between firms which manufacture expert systems. One firm has gained a lead in the market by developing super expert systems and another firm employs a spy to find out how they do it. The spy breaks in to the other firm only to discover that they are capturing human experts, removing their brains, slicing them very thin, and inserting the slices into their top-selling model."
- Harry Collins. Humans, machines, and the structure of knowledge.

As humans, we are capable of applying previously acquired knowledge to tasks that have similar characteristics. Transfer learning, also known as inductive transfer, is a branch of ML which tries to emulate this process. In our particular case, given a global data domain G and a local data domain L, with L⊂G, a traditional ML approach treats both domains as different and generates two independent models, f(G) and f(L). In contrast, acknowledging that G and L are somehow related, transfer learning can generate a model f(L’) using part of the generalisations learned by f(G) in conjunction with a data domain L’, with L’⊆L. It is worth noting that, in practice, it is possible that |L’|<<|L|, which gives transfer learning a considerable advantage, especially when data collection and analysis are limiting factors.

The logic behind this procedure is that in the first training (on the global dataset) the algorithm generates an internal representation of the nature of spectral data. To successfully learn this representation the model needs a big volume of observations, which is exactly what the global dataset provides. In the subsequent training (on the local dataset) the model already "knows" how spectral data behaves, requiring just enough observations to fine-tune the model and adjust it to local conditions.
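The two-stage logic can be illustrated with a deliberately tiny example. The sketch below is not the CNN used in this work: it is a one-variable linear model trained by plain gradient descent, and all data, function names and learning settings are made up for illustration. It only shows the mechanism: a model first trained on many "global" observations starts the "local" training with a much lower error than a model trained from scratch on the same few local observations.

```python
def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(data, w=0.0, b=0.0, lr=0.1, epochs=50):
    """Plain gradient descent; returns final weights and the error per epoch."""
    n = len(data)
    history = [mse(w, b, data)]
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
        history.append(mse(w, b, data))
    return w, b, history


# Synthetic "global" data: many observations following a general trend.
global_data = [(i / 100, 2.0 * (i / 100) + 1.0) for i in range(100)]
# Synthetic "local" data: few observations, slightly different relationship.
local_data = [(x, 2.2 * x + 0.9) for x in (0.1, 0.4, 0.7, 0.9)]

# Stage 1: learn the general relationship from the large dataset.
w_global, b_global, _ = train(global_data, epochs=500)

# Stage 2: same short local training, from scratch and from the global weights.
_, _, scratch = train(local_data, epochs=20)
_, _, tuned = train(local_data, w=w_global, b=b_global, epochs=20)

print(f"initial error: scratch={scratch[0]:.4f}, fine-tuned={tuned[0]:.4f}")
print(f"final error:   scratch={scratch[-1]:.4f}, fine-tuned={tuned[-1]:.4f}")
```

In a deep network the same idea is usually applied by reusing the pre-trained weights and re-training some or all layers on the local data, but the details of that step depend on the architecture.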
Models
We used multi-task CNNs to simultaneously predict organic carbon content, cation exchange capacity, clay content and pH. We compared 3 types of models:
- Local: Only utilising data from the corresponding country. From the available data we randomly held back 10% as a test dataset. From the remaining data, 90% was used for training and 10% for validation and hyper-parameter selection.
- Global: From the whole LUCAS dataset (with about 20,000 samples from all over Europe), the data from the corresponding country was excluded. From the remaining data, 90% was used for training and 10% for validation and hyper-parameter selection.
- Transfer: The training set of the corresponding country was used to "localise" the Global model previously trained with the global data. The validation set of the corresponding country was used for hyper-parameter selection.
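The data partitioning described above could be sketched as follows. This is a minimal illustration using only the Python standard library; the function names, the fixed seed and the `countries` input (one country code per sample) are assumptions made for the example and are not taken from the original study.

```python
import random


def split_indices(n, test_frac=0.10, val_frac=0.10, seed=0):
    """Shuffle n sample positions and return (train, val, test) lists.

    test_frac is held back first; the remainder is then split into
    validation (val_frac) and training (the rest).
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_frac)
    test, rest = idx[:n_test], idx[n_test:]
    n_val = round(len(rest) * val_frac)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


def make_splits(countries, target):
    """Build the Local and Global partitions for one target country."""
    local = [i for i, c in enumerate(countries) if c == target]
    other = [i for i, c in enumerate(countries) if c != target]

    # Local: 10% test, then 90/10 train/validation on the remainder.
    tr, va, te = split_indices(len(local))
    local_split = {
        "train": [local[i] for i in tr],
        "val": [local[i] for i in va],
        "test": [local[i] for i in te],
    }

    # Global: target country excluded, 90/10 train/validation, no test set.
    g_tr, g_va, _ = split_indices(len(other), test_frac=0.0)
    global_split = {
        "train": [other[i] for i in g_tr],
        "val": [other[i] for i in g_va],
    }
    return local_split, global_split


# Made-up example: 200 samples from one country, 800 from another.
countries = ["ES"] * 200 + ["FR"] * 800
local_split, global_split = make_splits(countries, target="ES")
print({k: len(v) for k, v in local_split.items()})
print({k: len(v) for k, v in global_split.items()})
```

Under this scheme the Transfer model would reuse `local_split["train"]` and `local_split["val"]`, so it never sees more local data than the Local model does.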
The whole transfer process can be summarised with the following figure:

Results
Country by country
In 18 of the 21 countries transfer learning shows a significant improvement for at least one soil property, compared with the Local and Global models. Four countries showed a significant improvement in all four properties. Even in cases where no significant changes were found, there is a clear trend that the transfer model reduced the mean RMSE. In 14 countries the mean RMSE was reduced for all four properties, and in 76 of the total 84 (90.5%) combinations of countries and properties. The Transfer model yielded a mean RMSE decrease, compared with the second best performing model, of 10.5, 11.8, 12.0 and 11.5% for OC, CEC, Clay and pH, respectively.
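For readers who want to reproduce this kind of comparison, the sketch below shows one way to compute an RMSE and the percentage decrease of the Transfer model relative to the better of the other two models. The function names and the example numbers are illustrative assumptions, and "second best" is interpreted here as the lower of the Local and Global RMSE values.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def decrease_vs_second_best(rmse_local, rmse_global, rmse_transfer):
    """Percentage RMSE decrease of Transfer vs the better baseline.

    A negative value means the Transfer model was worse than that baseline.
    """
    second_best = min(rmse_local, rmse_global)
    return 100.0 * (second_best - rmse_transfer) / second_best


# Made-up RMSE values for a single country and soil property.
change = decrease_vs_second_best(rmse_local=2.0, rmse_global=2.5, rmse_transfer=1.8)
print(f"RMSE decrease: {change:.1f}%")
```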
In a few cases, the Transfer model performed worse than the Global or Local models. In those cases the error of the Global model was much larger than that of the Local model, which produced negative transfer. In practice, when developing a local model, it is always useful to determine whether the source (global) model is beneficial or not, or whether a more aggressive transfer is needed, by tracking the performance of the models during training. Ideally, when a positive transfer occurs, the error of the transfer model should show lower initial and final magnitudes as shown below.
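The check on initial and final error magnitudes can also be written down as a small helper. This is only a sketch: the function name, the dictionary keys and the error values are hypothetical, and it assumes that both models were tracked on the same validation set for the same number of epochs.

```python
def compare_curves(transfer_errors, baseline_errors):
    """Compare two validation-error curves (one value per epoch).

    Positive transfer is flagged only when the transfer model starts
    AND ends with a lower error than the baseline model.
    """
    lower_start = transfer_errors[0] < baseline_errors[0]
    lower_end = transfer_errors[-1] < baseline_errors[-1]
    return {
        "lower_start": lower_start,
        "lower_end": lower_end,
        "positive_transfer": lower_start and lower_end,
    }


# Made-up validation RMSE values per epoch, for illustration only.
transfer_curve = [0.90, 0.70, 0.60, 0.55]
local_curve = [1.60, 1.10, 0.80, 0.70]
report = compare_curves(transfer_curve, local_curve)
print(report)
```

If `positive_transfer` comes back false, that is a hint to look more closely at how much of the source model is being reused before trusting the localised model.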

Final words
Transfer learning proved to be effective for localising a general soil spectral calibration model generated with a continental dataset. For most of the countries considered in this study, there was an improvement compared with using either a Global model (the general model) or a Local model (generated only with the data from the respective country).
Our findings also highlight the importance of global databases. They are crucial for understanding processes at the planetary scale but also important to complement our knowledge at a local scale. Collaboration can be beneficial for everyone, even for data-rich countries or organisations.
The transfer learning model does not require the global dataset to be available for local training. Once a global model has been calibrated, only the model itself, which is then re-trained locally, needs to be shared. This is a potential solution for the issue of data privacy. It is important to keep in mind that the proposed method is also applicable if the global dataset is available to a local user.
Citation
More details about this work can be found in the corresponding paper.
Padarian, J., Minasny, B. and McBratney, A.B., 2019. Transfer learning to localise a continental soil vis-NIR calibration model. Geoderma (in press).