Early Corn Yields Prediction Using Satellite Images

Joe Phongpreecha
Towards Data Science
7 min read · Aug 7, 2018


Growing up in a family whose business is primarily the distribution of agricultural produce, I know it is always a challenge to decide when to sell the product and for how much, since these decisions ultimately depend on how much produce will be harvested at the end of the season. If there were a way to predict that harvest, these decisions would become much easier. The implications of this project go well beyond my family's business, of course: large businesses can use such a model to optimize prices and inventory, governments can prepare for food shortages, and even farmers can set appropriate selling prices if they know the regional yields.

Previous studies have shown that satellite images can be used to identify the areas where each type of crop is planted [1]. This leaves the question of estimating the yields in those planted areas. To that end, this project uses data from several satellites to predict the yield of a crop, with corn as the example. Corn is usually harvested in October; therefore, I ultimately aim to predict corn yields using only data collected before October.

In this Medium post, I will give a brief summary of the project. For a more detailed analysis please visit here, and for the code used in this project, please see my GitHub page.

Data Sources

I queried images from 4 satellites in the Google Earth Engine data catalog, from March to December (146 GB in total), at the county level. These satellites include:

  1. MODIS Terra Surface Reflectance
  2. MODIS Surface Temperature
  3. USDA-FAS Surface and Subsurface Moisture
  4. USDA-NASS (for masking)

For each county, 38 images were collected per year (March to December), each with a total of 11 bands (the combined number of bands from the first three satellites). The corresponding yield for each year and county was collected at the county level from USDA QuickStats.

I collected data from 2010–2016, using 2010–2015 for training and 2016 for testing. If we treat the data from each year and county as a video (of 38 frames), there are 7,709 videos in the training set and 1,353 videos in the test set.
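The year-based split described above can be sketched as follows (the array names and the loading step are illustrative placeholders; the actual pipeline lives in the linked GitHub repo):

```python
import numpy as np

def split_by_year(videos, yields, years, test_year=2016):
    """Split county-level 'videos' and their yields by harvest year.

    videos: array of shape (n_samples, 38, ...); years: one year per sample.
    Years before test_year go to training; test_year is held out for testing.
    """
    years = np.asarray(years)
    train = years < test_year
    test = years == test_year
    return (videos[train], yields[train]), (videos[test], yields[test])
```

For the data set described here, this yields the 7,709 training and 1,353 test videos mentioned above.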

Exploratory Data Analysis

Again, a detailed analysis can be found on my webpage; here I will include only the important figures and a brief analysis. Let’s first see what the images look like in Fig. 1.

Fig. 1. Example of the image from the first band from each satellite (Scott County, Iowa, 2010).

Now let’s see whether the values in each channel from each satellite correlate with the annual yield of that county (Fig. 2 & 3). It turns out they do!

Fig. 2. The correlation between yearly-averaged values in each band of MODIS Terra Land Reflectance and corn yields.
Fig. 3. The correlation between yearly-averaged values in each band of MODIS Land Surface Temperature and USDA-FAS Land Moisture and corn yields.

Modeling Results

Even though the images have been preprocessed, training on all 9,062 videos (38 frames each, 11 bands per frame, and an average frame size of about 100 × 100 pixels) would be extremely slow. Therefore, we further engineer the images before feeding them into the model. Specifically, we bin the values in each channel into 128 bins, i.e. a single row. For example, for MODIS Terra Land Reflectance band 1, we bin the pixel values of that band into 128 bins equally spaced between 0 and 4000 and then normalize the counts by the total number of non-zero pixels in that band. This shrinks an entire image of about 100 × 100 pixels to just 128 elements. The logic behind this is that each farm’s yield is not related to its surroundings; therefore, the average yield in each county should only be correlated with the distribution of yields of the farms within that county. Figure 4 summarizes the process from data collection to model input, with the data dimensions shown. This technique is derived from a previous study [2].

Fig. 4. Summary of the workflow from data collection, preprocessing, binning, and model concept.
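The binning step can be sketched in NumPy. The 0–4000 value range follows the MODIS reflectance example above, and the function name is my own:

```python
import numpy as np

def bin_band(band, n_bins=128, value_range=(0, 4000)):
    """Collapse one 2-D band (~100x100 px) into a normalized 128-bin histogram.

    Zero pixels (e.g. masked non-cropland) are excluded, and the counts are
    normalized by the number of non-zero pixels, as described in the text.
    """
    pixels = band[band != 0]
    counts, _ = np.histogram(pixels, bins=n_bins, range=value_range)
    total = pixels.size
    return counts / total if total else counts.astype(float)
```

Applying this to each of the 11 bands of each frame turns a 100 × 100 × 11 image into an 11 × 128 array.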

After the data has been fully preprocessed, it is fed into the model. We can view these data as video or audio files, where each year generates a maximum of 38 frames (each with a height of 1 and a width of 128). Therefore, we chose 5 models that could be used for video classification and modified them for the regression problem in this study. These include:

  1. A self-constructed convolutional neural network (CNN) followed by a recurrent neural network (RNN). Here, long short-term memory (LSTM) is used as the RNN, since it is commonly used to avoid the vanishing/exploding gradient issues of vanilla RNNs.
  2. Same as 1, but with separable convolutions instead.
  3. ConvLSTM as defined by Xingjian et al. [3]
  4. 3-dimensional (3D) CNN
  5. CNN-RNN followed by 3D CNN
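As an illustrative sketch of model 4, a minimal Keras 3D CNN regressor over the (time, height, bins) volume might look like the following. The layer widths and kernel sizes are my own placeholders, not the study's configuration:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv3D, Dense, Flatten, MaxPooling3D

def cnn_3d(input_shape=(38, 1, 128, 11)):
    """3D convolution over (frames, height=1, bins) with bands as channels."""
    inp = Input(shape=input_shape)
    # Convolve jointly over time and the histogram-bin axis.
    x = Conv3D(32, (3, 1, 3), activation='relu', padding='same')(inp)
    x = MaxPooling3D((2, 1, 2))(x)
    x = Conv3D(64, (3, 1, 3), activation='relu', padding='same')(x)
    x = Flatten()(x)
    x = Dense(64, activation='relu')(x)
    out = Dense(1, activation='relu')(x)   # single regressed yield value
    return Model(inp, out)
```

The other models follow the same input convention, differing only in how the temporal axis is handled.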

The concept of a single-layer CNN-RNN is shown in Fig. 4: a CNN is applied to every input prior to the RNN to encode the spatial data, and the RNN then takes each frame (time step) as an input. The sequence output from the RNN is then fed to another CNN-RNN layer (i.e. stacked layers) or to a fully connected layer (with appropriate dropout and regularization), and finally to an activation layer that yields the predicted corn yield for that county in a given year.

For example, the code for model 1 in Keras looks like this:

```python
from keras import regularizers
from keras.layers import (Input, Conv2D, BatchNormalization, MaxPooling2D,
                          Flatten, TimeDistributed, LSTM, Dense, Dropout)
from keras.models import Model, Sequential

def CNN_LSTM(self):
    frames_input = Input(shape=self.input_shape)
    # Per-frame CNN encoder, applied to every time step via TimeDistributed.
    vision_model = Sequential()
    vision_model.add(Conv2D(64, (1, 2),
                            activation='relu',
                            padding='same',
                            input_shape=self.image_dim))
    vision_model.add(BatchNormalization())
    vision_model.add(MaxPooling2D((1, 2)))
    vision_model.add(Flatten())
    vision_model.add(BatchNormalization())
    encoded_frame_sequence = TimeDistributed(vision_model)(frames_input)
    # LSTM over the encoded frame sequence.
    encoded_video = LSTM(256,
                         activation='tanh',
                         return_sequences=True)(encoded_frame_sequence)
    fc1 = Dense(64, activation='relu',
                kernel_regularizer=regularizers.l2(0.05))(encoded_video)
    out = Flatten()(fc1)
    out = Dropout(0.5)(out)
    output = Dense(1, activation='relu')(out)
    return Model(inputs=frames_input, outputs=output)
```
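Such a model is then compiled with an MAE-style loss and fit on the binned sequences. A self-contained sketch with a simplified stand-in network (the layer size, optimizer, and random data here are illustrative assumptions, not the project's actual setup):

```python
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import LSTM, Dense

# Stand-in sequence regressor: 38 frames, each flattened to 128 bins x 11 bands.
inp = Input(shape=(38, 128 * 11))
x = LSTM(32)(inp)                       # encode the temporal sequence
out = Dense(1, activation='relu')(x)    # single regressed yield value
model = Model(inp, out)

model.compile(optimizer='adam', loss='mean_absolute_error')
X = np.random.rand(8, 38, 128 * 11)     # 8 dummy county-year "videos"
y = np.random.rand(8) * 200.0           # dummy yields
model.fit(X, y, epochs=1, batch_size=4, verbose=0)
```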

Each model was designed to have between 4,500,000 and 5,200,000 trainable parameters and was roughly tuned by varying the dropout rates and the number of hidden layers. The results of each model are shown in Table 1. Note that the percent error from mean is the percent error relative to the mean yield of the test set.

Table 1. Summary of model performance in term of mean absolute error and percent error from the mean of the test set (yields from year 2016).
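One plausible reading of the "percent error from mean" metric in Table 1 is the MAE expressed as a percentage of the mean test-set yield, which can be sketched as:

```python
import numpy as np

def percent_error_from_mean(y_true, y_pred):
    """MAE as a percentage of the mean true yield of the test set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.abs(y_true - y_pred).mean()
    return 100.0 * mae / y_true.mean()
```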

Next, we want to identify the counties where the model does best and where it does poorly. This helps reveal weaknesses in the model and lets us make better decisions about whether we can trust a prediction. Figure 5 shows the ground truth corn yields in 2016 across the U.S., and Fig. 6 shows the percent error of the predicted values from the ground truth.

Fig. 5. The ground truth corn yields across U.S. in 2016 (test set).
Fig. 6. The percent error of the predicted value from the ground truth.

So far we have been using all 38 frames per video in each county for annual yield prediction. In this last section, we investigate how early we can predict the yields, i.e. how far we can reduce the number of frames per year. Figure 7 shows the MAE of predicted corn yields for 2016 using different numbers of frames. Note that frame 0 is in March and frame 38 is at the end of the year. As one would expect, the error decreases as the number of frames increases. Notably, using just 20 frames (roughly the 2nd week of August), we can already achieve a percent error as low as 14.57% (compared to 10.46% if we use the images from the entire year). This is about 2 months before corn is typically harvested in October (or even later in warmer states). This model would therefore allow users to predict corn yields at the county level early in the season.

Fig. 7. The mean absolute error (MAE) of the model using different numbers of input frames per year; just 20 frames per year (through August) is sufficient to predict with only about 15% error from the mean yield in 2016.
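The frame-reduction experiment can be sketched as truncating each county-year sequence to its first n frames. Zero-padding back to the full length is an assumption on my part (the study may handle shorter inputs differently), but it keeps the model input shape fixed:

```python
import numpy as np

def truncate_frames(videos, n_frames):
    """Keep the first n_frames of each (total_frames, features) sequence.

    Remaining frames are zeroed so the model's input shape is unchanged.
    """
    out = np.zeros_like(videos)
    out[:, :n_frames] = videos[:, :n_frames]
    return out
```

Evaluating the trained model on `truncate_frames(test_videos, 20)` then simulates a prediction made in mid-August.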

In conclusion, we have shown the correlations between different satellite measurements, including reflectance, land temperature, and land moisture, and corn yields in the U.S. We leveraged these correlations to construct a model that captures both the spatial and temporal information in these data to predict corn yields for a given year. The best performing model on the test set (corn yields in 2016) is ConvLSTM, with a percent error from the mean yield of only 10.46%. To enable early prediction, we lowered the number of frames required per year (from the maximum of 38). The results show that we can still obtain good model performance (14.57% error) even when just 20 frames are used. The 20th frame of the year corresponds to August, 2 months before most corn is harvested. This could have strong implications for the business models of agricultural distribution and related industries.

References

  1. Rustowicz, Rose M. “Crop Classification with Multi-Temporal Satellite Imagery.”
  2. Sabini, Mark, Gili Rusak, and Brad Ross. “Understanding Satellite-Imagery-Based Crop Yield Predictions.” (2017).
  3. Xingjian, S. H. I., et al. “Convolutional LSTM network: A machine learning approach for precipitation nowcasting.” Advances in neural information processing systems. 2015.
