
Have you ever checked the weather forecast for the weekend and made plans, only to be surprised by rain? Then you may have wondered about the reliability of weather forecasts.
Predicting the weather is hard. The atmosphere is a chaotic system with many unknowns. Traditional weather models require complex numerical methods to predict the future state of the atmosphere.
Over the past five decades, meteorological services have built up a weather archive of reanalysis data. Driven by satellite observations, ground-based weather stations, and numerical weather prediction, this archive represents the past weather to the best of our knowledge.
The AI revolution in weather forecasting
Five years ago, the AI revolution in weather prediction began. Using reanalysis weather archives from the past 50 years, researchers developed AI models to forecast weather.
These models learn weather patterns from training data. At inference time, they take the current state of the atmosphere and output the state one time step later. Feeding each output back in as the next input lets the model forecast several days ahead.
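The iterative procedure can be sketched in a few lines of Python. This is only a toy illustration: `toy_model` is a hypothetical stand-in for a trained network, the grid is tiny, and the six-hour step and ten-day lead time are assumptions chosen for the example. Real models differ in their inputs and in how they step through time.

```python
import numpy as np

def toy_model(state: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a trained network: state at t -> state at t + 6 h."""
    # Shift the field one grid cell along the last axis and damp it slightly.
    return 0.99 * np.roll(state, shift=1, axis=-1)

def rollout(model, initial_state: np.ndarray, n_steps: int) -> np.ndarray:
    """Apply the model repeatedly, feeding each output back in as the next input."""
    states = [initial_state]
    for _ in range(n_steps):
        states.append(model(states[-1]))
    return np.stack(states)

STEP_HOURS = 6        # assumed time step
LEAD_TIME_DAYS = 10   # assumed forecast horizon
n_steps = LEAD_TIME_DAYS * 24 // STEP_HOURS  # 40 model calls

rng = np.random.default_rng(0)
initial_state = rng.normal(size=(4, 8))  # toy (latitude, longitude) grid
forecast = rollout(toy_model, initial_state, n_steps)
print(forecast.shape)  # (41, 4, 8): the initial state plus 40 forecast steps
```

Because every step consumes the previous output, small errors can accumulate over the rollout, which is one reason forecasts become less reliable at longer lead times.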

AI weather models typically operate at a spatial resolution of 25 kilometers, which is the spatial resolution of the training data. In contrast, high-resolution numerical weather models operate at a spatial resolution of 9 kilometers.
For medium-range AI weather forecasts, up to 10 days ahead, the typical temporal resolution is six hours.
Current AI weather models
The last two years have seen an explosion of AI weather models. The WeatherBench project comprehensively evaluates them. We focus here on those models that have been evaluated in an operational setting, i.e., initialized with the same data as a numerical weather forecast.
- PanguWeather is developed by Huawei. It uses a 3D Earth-specific transformer architecture.
- GraphCast is developed by Google DeepMind. It is based on a graph neural network (GNN), using an encoder-processor-decoder architecture.
WeatherBench lists a number of further models, such as NVIDIA’s FourCastNet, Ryan Keisler’s Graph Network, and the cascade-like FuXi model, that have only been initialized with reanalysis data. Because reanalysis data is published with a delay of about five days, forecasts initialized this way would arrive too late, so these models are not yet ready for operational use.
Metrics for weather forecasts
As machine learning experts and data scientists, we are used to metrics such as accuracy, precision, or recall. Machine learning expert Andrew Ng recommends using a single metric to evaluate performance.
But weather requires more than one metric. Different users are interested in different quantities. Farmers want to know how much rain to expect in a 24-hour period, tourists are interested in surface temperature, and solar energy forecasts rely on cloud cover.
Weather is evaluated using scorecards. The panel shows a scorecard from the WeatherBench 2 project, focusing on important surface weather variables that affect our daily lives.

Different meteorological quantities are compared by root mean squared error (RMSE) to the "ground truth," which includes actual weather observations as well as a weather model.
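As a rough sketch, RMSE over a global grid could be computed as follows. The cosine-of-latitude weighting is an assumption on my part: it is a common way to account for grid cells shrinking toward the poles on a regular latitude-longitude grid, but the exact weighting behind the scorecard is not stated here. The grid and the temperature values are made up.

```python
import numpy as np

def latitude_weights(lat_deg: np.ndarray) -> np.ndarray:
    """Weights proportional to cos(latitude), normalized to a mean of 1."""
    w = np.cos(np.deg2rad(lat_deg))
    return w / w.mean()

def weighted_rmse(forecast: np.ndarray, truth: np.ndarray, lat_deg: np.ndarray) -> float:
    """Area-weighted RMSE over a (latitude, longitude) grid."""
    w = latitude_weights(lat_deg)[:, None]  # broadcast the weights over longitude
    return float(np.sqrt(np.mean(w * (forecast - truth) ** 2)))

lat = np.linspace(-90, 90, 19)  # toy grid with 10-degree spacing
lon = np.arange(0, 360, 10)
rng = np.random.default_rng(1)
truth = 280 + rng.normal(size=(lat.size, lon.size))  # made-up temperatures in K
forecast = truth + 0.5                               # constant 0.5 K error everywhere

print(weighted_rmse(forecast, truth, lat))  # 0.5 up to floating-point rounding
```

A constant error is a convenient sanity check: whatever the weights are, the result must equal the size of that error.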
The current gold standard is the ECMWF high resolution forecast, the IFS HRES. The RMSE is calculated for this reference model as well as for the AI weather models. The color of the scorecard indicates whether a model performed better (blue) or worse (red) than the reference.
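One plausible way to turn two RMSE values into a scorecard cell is the relative difference to the reference. The function names and the numbers below are hypothetical; only the color convention (blue for better, red for worse) follows the description above.

```python
def relative_rmse_difference(rmse_model: float, rmse_reference: float) -> float:
    """Percentage difference in RMSE; negative values mean the model has lower error."""
    return 100.0 * (rmse_model - rmse_reference) / rmse_reference

def scorecard_color(diff_percent: float) -> str:
    """Blue if the model beats the reference, red if it is worse."""
    if diff_percent < 0:
        return "blue"
    if diff_percent > 0:
        return "red"
    return "neutral"

# Hypothetical RMSE values for one variable at one lead time.
diff = relative_rmse_difference(rmse_model=1.8, rmse_reference=2.0)
print(f"{diff:.1f} % -> {scorecard_color(diff)}")
```

A scorecard repeats this comparison for every combination of variable and lead time, which is why it can show a model winning on temperature while losing on precipitation.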
Focusing on the operational GraphCast model, we observe that it predicts temperature, surface pressure, and near-surface wind speed better than the reference model. Only for short-term precipitation does IFS HRES outperform GraphCast.
In addition to surface variables, meteorologists like to compare weather models at higher levels in the atmosphere. Getting them right is especially important for medium-range forecasts out to 15 days.
The panel shows geopotential at an atmospheric pressure level of 500 hPa (about 5.5 kilometers above ground). GraphCast consistently outperforms the IFS high-resolution model.

Users can try and compare different models directly on the WeatherBench project website.
Extreme events
One area where AI weather models perform surprisingly well is in tracking typhoons. Predicting where a typhoon will make landfall is difficult, and early warning is critical for planning evacuations of coastal areas.
PanguWeather was the first AI weather model to claim superior performance to numerical weather prediction. The panel shows two typhoon tracks, and we observe that the typhoon track predicted by PanguWeather matches the ground truth well, while the high-resolution forecast deviates.

On the other hand, deterministic AI weather models may not be able to capture some extreme events that occur with low probability. Traditional weather forecasts are run in an ensemble of up to 50 members, which allows for a probabilistic model of future weather conditions.
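To illustrate what an ensemble adds, here is a minimal sketch that turns ensemble members into a probability. The precipitation values are randomly generated, not real forecasts, and the 20 mm threshold is an arbitrary choice for the example.

```python
import numpy as np

N_MEMBERS = 50
rng = np.random.default_rng(42)

# Made-up 24-hour precipitation forecasts (mm) at one location, one value per member.
members = rng.gamma(shape=0.8, scale=6.0, size=N_MEMBERS)

def exceedance_probability(members: np.ndarray, threshold: float) -> float:
    """Fraction of ensemble members that exceed the threshold."""
    return float(np.mean(members > threshold))

p_heavy = exceedance_probability(members, threshold=20.0)
print(f"ensemble mean: {members.mean():.1f} mm")
print(f"P(rain > 20 mm): {p_heavy:.2f}")
```

A single deterministic forecast yields one number per location. The ensemble can still flag a small but non-zero chance of heavy rain even when most members stay dry.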
Cost of AI weather models
Traditional numerical weather prediction requires massive amounts of computing time. The European Centre for Medium-Range Weather Forecasts (ECMWF) operates supercomputers in Reading, UK, and Bologna, Italy, to produce high-resolution forecasts four times a day.
In contrast, inference with an AI weather model takes less than a minute on a single TPU or GPU. This reduces the energy required for a forecast by a factor of 12,000.
While the cost of inference is small, we should not neglect the computational resources required to generate the training data and train the AI weather model.
The cost of generating the training data is hard to quantify. One would have to add up all the energy that has gone into producing reanalysis data for the last few decades. To be fair, this dataset has already been generated and is simply reused for training.
PanguWeather [was trained](https://www.science.org/doi/10.1126/science.adi2336) for 16 days on 192 NVIDIA Tesla V100 GPUs. GraphCast was trained for 28 days on a cluster of 32 Cloud TPU v4 devices. The price for this would be about US$38,000 according to the Google Cloud calculator.
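These figures are easier to compare as device-hours. The short calculation below uses only the numbers quoted above. It assumes that the quoted price refers to the GraphCast training run on TPUs, which the text does not spell out, and the resulting hourly rate is merely what these numbers imply, not an official price.

```python
HOURS_PER_DAY = 24

# Training effort in device-hours, from the figures quoted above.
pangu_gpu_hours = 16 * HOURS_PER_DAY * 192      # 73,728 V100 GPU-hours
graphcast_tpu_hours = 28 * HOURS_PER_DAY * 32   # 21,504 TPU v4 device-hours

# Assumption: the quoted price covers the GraphCast run.
quoted_cost_usd = 38_000
implied_rate_usd = quoted_cost_usd / graphcast_tpu_hours  # about 1.77 USD per device-hour

print(f"PanguWeather: {pangu_gpu_hours:,} GPU-hours")
print(f"GraphCast:    {graphcast_tpu_hours:,} TPU-hours")
print(f"Implied rate: {implied_rate_usd:.2f} USD per TPU-hour")
```

GPU-hours and TPU-hours are not directly comparable, since the devices differ in speed and memory, so the two totals should be read as orders of magnitude.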
What does this mean for the future of numerical weather prediction?
AI weather models outperform numerical weather prediction on a number of metrics. However, there are good reasons to invest in numerical weather prediction:
- High-resolution numerical weather forecasts are the foundation of AI weather models. They provide training data and initialization.
- Meteorologists work hard to better understand the science of weather. In the past, this has improved weather prediction.
- Climate change affects the weather patterns that were learned by the AI weather models. Numerical weather prediction is less affected by this data drift.
- Numerical weather prediction provides better spatial and temporal resolution than AI weather models.
- Localized extreme events, such as the severe flooding in Western Europe in 2021, could be washed out in AI weather models that were trained with mean squared error loss.
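The last point can be made concrete with a toy example. Suppose a storm will hit one of two places with equal probability. The forecast that minimizes the expected squared error is the average of both outcomes, which predicts moderate rain in both places and the extreme value in neither. The numbers below are invented for illustration.

```python
import numpy as np

# Two equally likely outcomes on a strip of five grid cells (rain in mm):
# the storm hits either cell 1 or cell 3.
outcome_a = np.array([0.0, 100.0, 0.0, 0.0, 0.0])
outcome_b = np.array([0.0, 0.0, 0.0, 100.0, 0.0])
outcomes = np.stack([outcome_a, outcome_b])

def expected_mse(prediction: np.ndarray) -> float:
    """Mean squared error averaged over both outcomes and all grid cells."""
    return float(np.mean((outcomes - prediction) ** 2))

blurred = outcomes.mean(axis=0)  # [0, 50, 0, 50, 0]: the MSE-optimal forecast
sharp = outcome_a                # commits to one scenario

print(expected_mse(blurred))  # 1000.0
print(expected_mse(sharp))    # 2000.0
print(blurred.max())          # 50.0, only half of the true peak
```

The blurred forecast scores better on mean squared error even though it never predicts the 100 mm that actually falls, which is the washing-out effect described above.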
Future directions for AI weather models
Now that a number of labs are working on their own AI weather model, we can expect to see improvements in the near future.
As high-resolution training data becomes available, it will be a technical challenge to train a new generation of AI weather models. Increasing the horizontal resolution by a factor of 2, from 25 kilometers to 12.5 kilometers, means roughly four times as many grid points per time step, or eight times as many data points if the temporal resolution is doubled as well.
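The scaling can be checked by counting grid points. The sketch below assumes a regular latitude-longitude grid in which 25 kilometers corresponds to about 0.25 degrees and 12.5 kilometers to about 0.125 degrees. Doubling the horizontal resolution alone gives close to four times as many points per time step. A factor of eight is reached only if another dimension, for example the time step, is refined as well.

```python
def grid_points(resolution_deg: float) -> int:
    """Number of points on a regular latitude-longitude grid that includes both poles."""
    n_lat = round(180 / resolution_deg) + 1
    n_lon = round(360 / resolution_deg)
    return n_lat * n_lon

coarse = grid_points(0.25)    # 721 * 1440 = 1,038,240
fine = grid_points(0.125)     # 1441 * 2880 = 4,150,080

spatial_factor = fine / coarse            # just under 4
with_finer_time = spatial_factor * 2      # just under 8 if the time step is halved too

print(f"{coarse:,} -> {fine:,} grid points per time step")
print(f"spatial factor: {spatial_factor:.2f}, with halved time step: {with_finer_time:.2f}")
```

These counts are per variable and vertical level, so the total data volume also grows with the number of variables and levels that the model ingests.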
Current AI weather models offer fewer meteorological variables than traditional weather forecasts. For example, GraphCast does not predict cloud cover, even though it is contained in reanalysis data. This variable would be useful for predicting photovoltaic power.
Already today, weather services run AI weather models on a daily basis. The models are continuously being evaluated and compared with established methods. On the ECMWF homepage, users can view forecasts generated by a variety of AI weather models.
Summary
AI weather models are now competitive with numerical weather models. They can make use of the extensive weather archives of the last 50 years and produce reliable forecasts.
It is likely that AI weather models will be integrated into operational weather forecasting in the near future. They are being closely monitored and further improved.
This does not mean that traditional numerical weather prediction is dead – accurate physical models of the atmosphere are critical and cannot be replaced by AI.
Finally, cheap and accessible AI weather models, many of which are open source, can democratize access to weather forecasting.

Links
- WeatherBench project: https://sites.research.google/weatherbench/
- GraphCast: Lam et al., Learning skillful medium-range global weather forecasting. Science 382, 1416–1421 (2023).
- PanguWeather: Bi et al., Accurate medium-range global weather forecasting with 3D neural networks. Nature 619, 533–538 (2023).