
Wave Height Prediction Using ARIMA, Prophet, and XGBoost


Testing three time series prediction methods and evaluating them using the mean absolute error and r-squared metrics.

Wave buoys placed around the world record ocean properties over time. This time series data includes information such as wave height, wave period, and wind conditions. Wave height, shown below in Figure 1, is the vertical distance from the trough of a wave to its crest. The wave period is the time it takes for a wave to complete one full cycle.

Figure 1: Wave properties (Image by Author)

The buoy analyzed throughout this article is located northwest of the Hawaiian Islands. The data comes from National Data Buoy Center buoy number 51101 and covers the beginning of 2013 through the end of 2017, with measurements taken every hour. It includes wave height, wave direction, wave period, and wind direction, as well as general ocean properties such as water temperature.

The goal of this study is to use historical data to predict the wave height one hour into the future, so wave height is our target. The baseline is a constant prediction of 2.47 meters, which is the average wave height over the first three years of data collected (2013–2015). Evaluated on the 2016 data, this baseline has a mean absolute error of 0.84 meters. Several models were constructed with the goal of outperforming the baseline.
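A constant-mean baseline like this one can be sketched in a few lines of pandas. This is an illustration rather than the original notebook code: the function name `baseline_mae` and the tiny demo series are made up, and the only assumption about the real data is that the wave heights are stored in a Series with a datetime index.

```python
import pandas as pd

def baseline_mae(heights, fit_years, eval_years):
    """Constant-mean baseline: average over fit_years, scored on eval_years.

    `heights` is assumed to be a Series of wave heights (meters) with a
    DatetimeIndex. Returns the baseline value and its mean absolute error.
    """
    years = heights.index.year
    baseline = heights[years.isin(fit_years)].mean()
    actual = heights[years.isin(eval_years)]
    mae = (actual - baseline).abs().mean()
    return baseline, mae

# Tiny synthetic hourly series (made-up values, not buoy data).
idx = pd.date_range("2015-12-31 21:00", periods=6, freq=pd.Timedelta(hours=1))
demo = pd.Series([2.0, 3.0, 2.5, 3.5, 1.5, 2.5], index=idx)

baseline, mae = baseline_mae(demo, fit_years=[2015], eval_years=[2016])
print(f"baseline = {baseline:.2f} m, MAE = {mae:.2f} m")
```

On the buoy data, the equivalent call would use `fit_years=[2013, 2014, 2015]` and `eval_years=[2016]`.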


ARIMA

The first model built to predict future wave heights was an ARIMA (autoregressive integrated moving average) model. For this model, the data frame was reduced to just the date and the wave height. The model performed best when it used the previous five wave height measurements to predict the target wave height, which was one measurement into the future. The output from the ARIMA model is shown below in Figure 2.
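To make the order (5,1,0) concrete: the series is differenced once (d = 1), each hourly change is regressed on the five changes before it (p = 5), and there is no moving-average term (q = 0). The sketch below fits that structure with plain NumPy least squares. It is a simplified stand-in for illustration, not the author's code; in practice a library such as statsmodels would be used, for example `ARIMA(series, order=(5, 1, 0))`, which estimates the coefficients differently. The function names and the synthetic series are hypothetical, and the sketch assumes no intercept term.

```python
import numpy as np

def fit_ar_on_differences(heights, p=5):
    """Least-squares fit of an AR(p) model to the first differences."""
    diffs = np.diff(np.asarray(heights, dtype=float))  # d = 1: hourly changes
    # Row t holds the p differences that precede diffs[t], newest first.
    X = np.column_stack(
        [diffs[p - k - 1 : len(diffs) - k - 1] for k in range(p)]
    )
    y = diffs[p:]
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

def forecast_next(heights, coefs):
    """Predict the next height: last height plus the predicted change."""
    heights = np.asarray(heights, dtype=float)
    recent = np.diff(heights)[-len(coefs):][::-1]  # newest difference first
    return heights[-1] + recent @ coefs

# Synthetic series whose hourly changes follow an AR(1) process with
# coefficient 0.6 (made-up data, not buoy measurements).
rng = np.random.default_rng(0)
steps = np.zeros(5000)
for t in range(1, len(steps)):
    steps[t] = 0.6 * steps[t - 1] + rng.normal(scale=0.05)
demo_heights = 2.5 + np.cumsum(steps)

demo_coefs = fit_ar_on_differences(demo_heights, p=5)
next_height = forecast_next(demo_heights, demo_coefs)
print("lag coefficients:", np.round(demo_coefs, 2))
print("one-step-ahead forecast:", round(float(next_height), 3))
```

The first coefficient should land near 0.6 and the rest near zero, since that is how the synthetic changes were generated.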

Figure 2: Results from the ARIMA(5,1,0) model (Image by Author)

The ARIMA model was fairly accurate when using the previous five data points to make predictions. For this model, the mean absolute error was 0.20 meters and the r-squared value was 0.86.
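For reference, both metrics are simple to compute by hand. The mean absolute error is the average size of the prediction errors in meters, and r-squared is one minus the ratio of the squared prediction errors to the squared deviations of the actual values from their own mean. The helper names and numbers below are illustrative only; scikit-learn provides equivalent `mean_absolute_error` and `r2_score` functions in `sklearn.metrics`.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot; 1.0 is a perfect fit, 0.0 matches the mean."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up wave heights in meters, for illustration only.
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]

demo_mae = mean_absolute_error(actual, predicted)
demo_r2 = r_squared(actual, predicted)
print(f"MAE = {demo_mae:.3f} m, r-squared = {demo_r2:.2f}")
```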


PROPHET

The next model used to make predictions was Facebook's Prophet model. This model did not perform well on short-term wave predictions, since hour-to-hour wave heights can be highly irregular and Prophet is typically used for business data with stronger seasonal patterns. This model also required a data frame that contained only the date and the wave height measurement. The test data frame contained only the dates to avoid leakage. The results from the Prophet model can be seen in Figure 3 below.
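Prophet expects its input in a fixed layout: a `ds` column holding the timestamps and a `y` column holding the values to forecast. A minimal sketch of preparing those frames is shown below, with the held-out frame containing dates only. The column names `time` and `wave_height`, the split date, and the helper names are assumptions for illustration, not the original code. Prophet itself is imported inside the fitting function so the data preparation can run without it installed.

```python
import pandas as pd

def to_prophet_frames(df, time_col, height_col, split_date):
    """Build Prophet-style frames: train has ds and y, future has ds only."""
    frame = pd.DataFrame({
        "ds": pd.to_datetime(df[time_col]),
        "y": df[height_col].astype(float),
    }).sort_values("ds")
    split = pd.Timestamp(split_date)
    train = frame[frame["ds"] < split].reset_index(drop=True)
    held_out = frame[frame["ds"] >= split].reset_index(drop=True)
    future = held_out[["ds"]]  # dates only, so no heights leak into predict
    return train, future, held_out["y"]

def prophet_forecast(train, future):
    """Fit Prophet with default settings and return its predictions."""
    from prophet import Prophet  # third-party package, imported lazily
    model = Prophet()
    model.fit(train)
    return model.predict(future)["yhat"]

# Made-up readings; the extra column is dropped by the helper.
raw = pd.DataFrame({
    "time": pd.date_range("2016-12-31 22:00", periods=4,
                          freq=pd.Timedelta(hours=1)),
    "wave_height": [2.1, 2.3, 2.2, 2.4],
    "wind_direction": [40, 45, 50, 55],
})
train, future, actual = to_prophet_frames(
    raw, "time", "wave_height", split_date="2017-01-01"
)
print(train)
print(future)
```

With the real data, `prophet_forecast(train, future)` would return one predicted height per held-out timestamp, which can then be compared against `actual`.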

Figure 3: Results from Facebook's Prophet model (Image by Author)

As seen in Figure 3 above, there is a lot of variability in our wave data. One thing the model does well is identify seasonality: it predicts larger wave heights in the winter months and smaller wave heights in the summer. However, swell-generating storms do not arrive at exactly the same time each year, so using the historical data from the previous year to predict current conditions is not very accurate. This model had a testing mean absolute error of 0.3 meters. Although this is still better than our baseline, we would like more accuracy in our model.


XGBOOST

The final model used to predict wave heights was an XGBoost model. Unlike the previous two models, the XGBoost model allowed us to input many features. All of the wave properties, such as wave period and wind conditions, were used to train the model. Three new features were also engineered prior to training: the wave heights from the prior three buoy readings. The current wave height column was removed from the features to avoid leakage, and the model was trained. Shown below, in Figure 4, is the performance of the XGBoost model.
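The lag features described here can be built with pandas `shift`. The sketch below is one possible version, not the original code: the column names, helper names, and hyperparameters are hypothetical. The current wave height becomes the target and is dropped from the feature frame, while the three previous readings are added as new columns. XGBoost is imported inside the fitting function so the feature step can run on its own.

```python
import pandas as pd

def build_features(df, target="wave_height", n_lags=3):
    """Add lagged target columns and separate features from the target."""
    df = df.sort_index()
    X = df.drop(columns=[target])  # current height is the target, not a feature
    for lag in range(1, n_lags + 1):
        X[f"{target}_lag{lag}"] = df[target].shift(lag)
    # The first n_lags rows have no complete history, so skip them.
    return X.iloc[n_lags:], df[target].iloc[n_lags:]

def fit_xgboost(X_train, y_train):
    """Fit a gradient-boosted regressor (illustrative hyperparameters)."""
    from xgboost import XGBRegressor  # third-party package, imported lazily
    model = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
    model.fit(X_train, y_train)
    return model

# Made-up hourly readings, for illustration only.
readings = pd.DataFrame(
    {
        "wave_height": [2.0, 2.2, 2.1, 2.4, 2.6],
        "wave_period": [10.0, 11.0, 11.0, 12.0, 12.0],
        "wind_speed": [5.0, 6.0, 5.5, 7.0, 7.5],
    },
    index=pd.date_range("2017-01-01", periods=5, freq=pd.Timedelta(hours=1)),
)
X, y = build_features(readings)
print(X)
print(y)
```

Each remaining row pairs the conditions at one timestamp with the three wave heights recorded before it, so the model never sees the value it is asked to predict.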

Figure 4: Results from the XGBoost model (Image by Author)

The XGBoost model was the best-performing model based on our metrics. Its mean absolute error was 0.11 meters and its r-squared value was 0.94. Compared with the baseline mean absolute error of 0.84 meters, this model performed far better.


CONCLUSION

Overall, the XGBoost model performed the best based on our evaluation metrics. Forecasts like these could be useful for many purposes. The National Data Buoy Center has buoys deployed all around the globe that constantly collect similar ocean properties. While ocean conditions are challenging to predict, the methods shown above, combined with this large network of buoys, could help forecast future wave conditions and track the location of storms. In the future, predictions could also be made for other properties such as wave and wind direction, which would help forecast ocean currents.

GitHub

