Data for Change
Why is Agronomics Optimization important in Agriculture? Every year, 9 million people die from hunger, which is about 25,000 people per day.³ This staggering statistic reveals not only the need for more equitable food distribution but also the need for more sustainable practices in agriculture.
From a small farm to a large corporation, it is becoming increasingly important to make the most of land and other resources. Thanks to agronomics and data collection, we are starting to learn more about how agricultural practices can create a better future. How can we use data such as fertilizer use, geodata, soil data, and yield volume to gain a better understanding of best practices? If a farmer can learn more about his particular plot of land through data analysis, data-driven best practices can be applied alongside traditional farming techniques.
John Deere Dataset
This dataset comes from the John Deere Challenge, part of the Hackdays Rhein-Neckar 2021 hackathon. My talented hackathon team included Caroline Pereira, Andrew Fessler, and Ileskhan Kalysh.
The extensive dataset by John Deere contains 18 million data points spanning over a decade of data collection. In this dataset, we found data pertaining to three parts of the year, over 10 years, for a plot of land in Oregon, Illinois. We originally focused on the most recent data, from the 2019 season. This included the initial seedings, done in April; the applications of products (fertilizers), done in April and June; and the harvestings, done in October.
The variables included in this dataset were as follows:
Crop Variety, Wet Mass, Moisture, Date and Time, Elevation, Machine, Distance, Swath Width, Section ID, and Yield Volume.
In the above maps, we used the GeoPandas Python package² to map where the seedings, applications, and harvestings were done for this particular field. The seedings and the applications were performed in the same areas of the field, yet the harvestings were only done in a portion of it. The data doesn’t reveal why part of the field was not harvested. One of the first conclusions that arose from the analysis is that not everything in agricultural data can be explained by the data itself. In this case, there were external factors (either the data was simply not collected, or a problem occurred in this particular area of the field) that only the farmer could shed light on.
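As a rough illustration of the mapping step, here is a minimal GeoPandas sketch. The column names (`operation`, `longitude`, `latitude`) and the coordinates are made up for the example and are not taken from the John Deere dataset, so treat it as one possible setup rather than the code we actually ran.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical sample rows; the real dataset's column names and values may differ.
records = pd.DataFrame(
    {
        "operation": ["seeding", "application", "harvest"],
        "longitude": [-89.3340, -89.3335, -89.3330],
        "latitude": [42.0140, 42.0145, 42.0150],
    }
)

# Turn the longitude/latitude columns into point geometries (WGS 84).
points = gpd.GeoDataFrame(
    records,
    geometry=gpd.points_from_xy(records["longitude"], records["latitude"]),
    crs="EPSG:4326",
)

# One layer per operation, so each can be drawn on its own map.
layers = {
    name: points[points["operation"] == name]
    for name in points["operation"].unique()
}

# With matplotlib installed, a layer could then be drawn with, for example:
# layers["harvest"].plot(markersize=2)
```

Keeping each operation in its own layer makes it easy to compare the footprints side by side, which is how the unharvested portion of the field becomes visible.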
The graphs above show where each quartile of yield volume falls on the field. For example, the first map shows the lowest quartile, which seems to be concentrated in the northern part of the field, whereas the third and fourth quartiles tend to be concentrated in the southern part. We can begin to hypothesize whether this has any correlation with soil type or other conditions in the different parts of the field.
The above graphs show where the products were distributed during the 2019 applications. The first map shows the high-rate use in red, which was sprayed around the borders of the field. The second map shows the low-rate use in yellow, which was sprayed everywhere except the borders. The third map shows where the ammonia was applied. This is interesting in relation to the previous maps because we can start to ask whether where the fertilizers were sprayed correlates with the yield volume. Nitrogen fertilizers were the products used in the dataset. Ammonia is compressed from a gas into a liquid for application; it then reacts with water in the soil and changes to the ammonium form.¹
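The quartile split behind these maps can be sketched with the standard library alone. The readings below are made-up numbers, and `quartile_of` is a hypothetical helper name used only for this example.

```python
from statistics import quantiles

# Hypothetical yield-volume readings; values and units are made up.
yield_volume = [112.0, 98.5, 143.2, 157.8, 120.4, 88.1, 165.0, 131.7]

# Three cut points split the readings into four quartiles.
q1, q2, q3 = quantiles(yield_volume, n=4, method="inclusive")


def quartile_of(value):
    """Return 1 for the lowest quartile up to 4 for the highest."""
    if value <= q1:
        return 1
    if value <= q2:
        return 2
    if value <= q3:
        return 3
    return 4


# One quartile label per reading, in the original order.
labels = [quartile_of(v) for v in yield_volume]
```

Each reading keeps its position, so the labels can be joined back to the coordinates and one map drawn per quartile.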
Weather Data
Weather data contributes to the understanding of how crops performed in a season. The first graph below shows the 2019 season in relation to the weather for the year. Such data can help farmers decide when in the year to plant and harvest. In this particular season we see that the rainy season followed the first seedings in April, which is very good for the crops. We also see that the harvest was interrupted by a frost, which might have affected the harvest for the year.
The second graph shows the weather data overlaid with the data points from all 10 years of the collected data. The timing of the seedings and harvestings is generally outside the periods of frost and snow for the region.
Soil Data
By using data from the United States Department of Agriculture⁴, we are able to determine which areas of the field consist of which specific soils. Soil is a major component in the proper growth of a crop. The type of soil determines how quickly water and nutrients flow into and around crops. The variety of silty loam soils in this case may, over time, affect the concentration of nutrients as well as the shape of the land through runoff and similar processes. In fact, we see that there are subtle differences in some areas, especially those with higher slopes.
Plotting Data Points
We began to think about specific questions that the data could answer:
What is the amount of seeded area?
What were the total seeds planted each year?
What were the seeds planted per acre?
What was the harvest area in acres?
What was the total harvest yield each year?
What was the harvest yield in pounds?
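Questions like these come down to summing records per year and dividing. The sketch below does this for the seeding questions, assuming each record carries a year, an area in acres, and a seed count; the record layout and all numbers are invented for illustration.

```python
from collections import defaultdict

# Hypothetical per-pass records: (year, acres covered, seeds planted).
seeding_passes = [
    (2018, 1.5, 51_000),
    (2018, 2.5, 85_000),
    (2019, 2.0, 66_000),
    (2019, 3.0, 99_000),
]

# Accumulate seeded area and seed count for each year.
totals = defaultdict(lambda: {"acres": 0.0, "seeds": 0})
for year, acres, seeds in seeding_passes:
    totals[year]["acres"] += acres
    totals[year]["seeds"] += seeds

# Derive seeds per acre from the yearly totals.
summary = {
    year: {
        "seeded_acres": t["acres"],
        "total_seeds": t["seeds"],
        "seeds_per_acre": t["seeds"] / t["acres"],
    }
    for year, t in sorted(totals.items())
}
```

The harvest questions would follow the same pattern, with harvested area and yield in place of seeded area and seed count.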
By having this information in numbers and in graphs, a farmer can look quickly at the data to start to see trends over the years in his farming techniques.
Was there a problem with the weather in a particular year? Was there a plant disease? Does a certain applied product not perform well? Was the data just not recorded?
The farmer can consider all the factors affecting his particular field and begin to predict future years' yield by taking these factors into account.
Probabilistic Forecasts
These boxplots show the distribution of yield volume over applied product types and seeded corn varieties respectively. This can be useful to see which product types or corn varieties tend to give more yield. From the first boxplot we can see that NH3 tends to give more yield volume on average than PP high-rate and PP low-rate types.
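The numbers a boxplot draws (minimum, quartiles, median, maximum) can be computed per group as in the sketch below. The product labels are taken from the text, but the yield values are made up and do not reproduce our results.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical (product type, yield volume) pairs; values are made up.
observations = [
    ("NH3", 170.0), ("NH3", 182.0), ("NH3", 176.0), ("NH3", 188.0), ("NH3", 164.0),
    ("PP high-rate", 150.0), ("PP high-rate", 166.0), ("PP high-rate", 160.0),
    ("PP high-rate", 172.0), ("PP high-rate", 156.0),
    ("PP low-rate", 140.0), ("PP low-rate", 158.0), ("PP low-rate", 152.0),
    ("PP low-rate", 162.0), ("PP low-rate", 148.0),
]

# Collect the yield values for each product type.
groups = defaultdict(list)
for product, value in observations:
    groups[product].append(value)

# Five-number summary per product type: the values a boxplot displays.
box_stats = {}
for product, values in groups.items():
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    box_stats[product] = {
        "min": min(values),
        "q1": q1,
        "median": med,
        "q3": q3,
        "max": max(values),
    }
```

Comparing the `median` entries across groups gives the same reading as comparing the center lines of the boxes.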
We propose probabilistic forecasting instead of deterministic forecasting because it gives information on the uncertainty of the forecast. In agriculture, the difficult problem is controlling the unpredictable. The output is expressed as probability ranges rather than single points, so that farmers can better plan their operations.
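To make the difference concrete, the sketch below turns a handful of past yields into a range instead of a single number by taking empirical quantiles. This is a deliberately simple stand-in for a probabilistic forecast, not the model we built, and the yields are made up.

```python
from statistics import quantiles

# Hypothetical yields from ten past seasons; values are made up.
past_yields = [150, 162, 148, 171, 139, 155, 166, 158, 144, 160]

# Nine cut points (deciles) of the historical yields.
deciles = quantiles(past_yields, n=10, method="inclusive")

# Use the 10th and 90th percentiles as an 80% range around the median.
lower, median, upper = deciles[0], deciles[4], deciles[8]

forecast = {"p10": lower, "p50": median, "p90": upper}
```

A deterministic forecast would report only the middle value; the range tells the farmer how far a season could plausibly fall on either side of it, under the assumption that future seasons resemble past ones.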
Conclusions and Moving Forward
By analyzing data with readable charts and graphs, a small farm will be able to come to conclusions about which problems to address for higher yield volume in future seasons. As I mentioned before, we originally focused on the 2019 season and then moved on to applying our findings to the other years. One possibility moving forward would be to look more closely at the particular problem years where the harvest was much lower than the seeding. Since the data doesn’t tell the full story for a few of these scenarios, it could be helpful to interview the farm management to learn whether there were internal problems in those seasons, and also to check the weather data for a particularly dry season.
References
1. De Ferrante, B., Van Der Vorm, P. & Van Diest, A. Comparative studies on the usefulness of ammonium sulphate and urea as fertilizers for lowland rice. Fertilizer Research 10, 119–133 (1986). https://doi.org/10.1007/BF01074367
2. Kelsey Jordahl, Joris Van den Bossche, Martin Fleischmann, Jacob Wasserman, James McBride, Jeffrey Gerard, … François Leblanc. (2020, July 15). geopandas/geopandas: v0.8.1 (Version v0.8.1). Zenodo. http://doi.org/10.5281/zenodo.3946761
3. United Nations. "Losing 25,000 to Hunger Every Day." https://www.un.org/en/chronicle/article/losing-25000-hunger-every-day. Accessed March 4, 2021.
4. Web Soil Survey – Home. https://websoilsurvey.sc.egov.usda.gov/App/HomePage.htm. Accessed March 4, 2021.