
Analyze Tornado Data with Python and GeoPandas

Insights from NOAA's public domain database

QUICK SUCCESS DATA SCIENCE

Reported tornadoes with an assigned EF magnitude from 1950–2023 (by the author)

If you’re interested in studying tornadoes, the National Oceanic and Atmospheric Administration (NOAA) has a wonderful public-domain database to make your day. Extending back to 1950, it covers tornado starting and ending locations, magnitudes, injuries, fatalities, financial costs, and more.

The data is accessible through NOAA’s Storm Prediction Center. In addition to the raw data, this site supplies many charts and maps that slice and dice the data in multiple ways. One of my favorite maps is the 20-year average of severe weather watches. It makes it look like Oklahoma’s borders are mostly based on bad weather!

Oklahoma (circled) experiences more than its share of bad weather (NOAA)

These prefab maps are useful, but they can’t address every situation. There will be times when you need to make custom maps, either to answer a specific question or to incorporate outside data.

In this Quick Success Data Science project, we’ll use Python, pygris, GeoPandas, Plotly Express, and part of NOAA’s extensive database to map tornado occurrences, graph their frequencies and magnitudes, and chart the loss of life.


The Dataset

The CSV-format dataset, highlighted in yellow in the following figure, covers the period 1950–2023. Don’t bother downloading it. For convenience, I’ve provided a link in the code to access it programmatically.

From the NOAA Severe Weather Database page (by the author)

This CSV file contains 29 columns. You can find a key to the column names here. We’ll work primarily with the tornado starting locations (slat, slon), the year (yr) and month (mo), the storm magnitudes (mag), and the number of fatalities (fat).

As noted, this data is in the public domain. Anything credited to NOAA Climate.gov can be freely re-used with proper attribution.


Installing Libraries

You’ll want to install Matplotlib, GeoPandas, Plotly Express, and Pygris. The previous links provide installation instructions.

You may also need to install nbformat for Plotly Express. You can do this with either:

pip install nbformat

or

conda install nbformat

or

conda install anaconda::nbformat

NOTE: pygris must be installed with pip. If you’re working in a conda environment, best practice is to install pip-only packages like pygris last.


Map Code

The following code, written in JupyterLab, plots the starting location of tornadoes across the Lower 48 states. We’ll plot the full dataset first (almost 69,000 tornadoes!) and then plot subsets by year and magnitude.

Importing Libraries

The imported libraries support the maps and charts we’ll produce in this project. The calendar module will help us convert numerical months to month name abbreviations.
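For example, calendar.month_abbr is a sequence indexed by month number (index 0 is an empty string):

```python
import calendar

# month_abbr[1] through month_abbr[12] give 'Jan' through 'Dec'
abbrs = [calendar.month_abbr[m] for m in (1, 4, 12)]
print(abbrs)  # ['Jan', 'Apr', 'Dec']
```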

We’ll use the popular matplotlib library to make static plots and geopandas to handle the geospatial data. Pandas will help with data loading and manipulation. (Because geopandas includes pandas as a dependency, there’s no need for a separate install.) GeoPandas includes shapely, which we’ll use to create a geospatially aware GeoDataFrame from the CSV file.

Finally, the pygris module provides easy access and loading of US Census Bureau TIGER/Line cartographic boundary shapefiles. This lets us plot both US state and county boundaries.

# Plot all tornadoes (1950-2023) with a known EF magnitude:

import calendar
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point, box
from pygris import states, counties

Loading Data and Preparing the GeoDataFrame

The NOAA CSV file is stored on a Gist for convenience. We’ll start by loading it as a pandas DataFrame and then filter out the non-contiguous US components.

# Load the CSV file into a DataFrame 
df_raw = pd.read_csv('https://bit.ly/40xJCMK')

# Filter out Alaska, Hawaii, Puerto Rico, and Virgin Islands:
df = df_raw[~df_raw['st'].isin(['AK', 'HI', 'PR', 'VI'])]

# Create a GeoDataFrame for the data points
geometry = gpd.array.from_shapely([Point(xy) for xy in zip(df['slon'], 
                                                           df['slat'])])
gdf = gpd.GeoDataFrame(df, geometry=geometry, crs="EPSG:4326")

# Filter out rows where magnitude is -9 (unknown):
gdf = gdf[(gdf['mag'] != -9)]

Note that the tilde (~) in the second line negates the boolean Series. So, we’re telling pandas to give us a DataFrame without the listed states.
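Here’s a tiny, self-contained illustration of the isin()/tilde pattern (the state codes are stand-ins for the real column):

```python
import pandas as pd

states = pd.Series(['TX', 'AK', 'OK', 'HI'])
mask = states.isin(['AK', 'HI'])  # True where the state IS in the list
kept = states[~mask]              # tilde inverts the mask: keep the rest
print(list(kept))  # ['TX', 'OK']
```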

The primary geospatial data in the CSV file is the latitude (slat) and longitude (slon) columns. We’ll use GeoPandas and Shapely to turn these into a geometry column, which is what distinguishes GeoDataFrames from DataFrames. We’ll also assign a coordinate reference system (crs) so GeoPandas knows how to project the data onto a flat map. (For more on GeoDataFrames and projection systems, check out this article):

https://towardsdatascience.com/comparing-country-sizes-with-geopandas-2ce027282ba0
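As an aside, GeoPandas also provides a points_from_xy() helper that builds the same geometry column without an explicit list comprehension. A minimal sketch with made-up coordinates:

```python
import pandas as pd
import geopandas as gpd

# Made-up coordinates; column names match the NOAA dataset
demo = pd.DataFrame({'slon': [-97.5, -90.1], 'slat': [35.4, 32.3]})
gdf_points = gpd.GeoDataFrame(
    demo,
    geometry=gpd.points_from_xy(demo['slon'], demo['slat']),
    crs='EPSG:4326')
```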

The final step filters out tornadoes with no assigned magnitude value (unknown = -9). We do this so that we can work with magnitude values later. This removes about 1,000 tornadoes from the dataset:

# Count the number of occurrences of -9 in the 'mag' column:
num_negative_nines = (df['mag'] == -9).sum()
print(f"Number of -9 values in the 'mag' column: {num_negative_nines}")
Number of -9 values in the 'mag' column: 1024

Preparing the County and State Borders

The pygris library provides easy access to official US Census shapefiles for political boundaries. The following code uses pygris and GeoPandas to prepare the map onto which we’ll plot the tornado data.

# Load the U.S. census state boundaries using pygris:
states_df = states(year=2020)  
states_gdf = gpd.GeoDataFrame(states_df)

# Load the U.S. census county boundaries using pygris:
counties_df = counties(year=2020)
counties_gdf = gpd.GeoDataFrame(counties_df)

# Filter for the contiguous US:
states_gdf = states_gdf[~states_gdf['STUSPS'].isin(['AK', 'HI', 'PR', 'VI'])]
counties_gdf = counties_gdf[~counties_gdf['STATEFP'].isin(
    ['02', '15', '72', '78'])]  # FIPS state codes for AK, HI, PR, VI

# Create a bounding box for the specified map bounds:
bounds_box = box(-127, 23, -67, 50)  # (min_lon, min_lat, max_lon, max_lat)

# Clip the GeoDataFrames to the bounding box:
clipped_states = gpd.clip(states_gdf, bounds_box)
clipped_counties = gpd.clip(counties_gdf, bounds_box)

We first call the pygris states() and counties() functions, which return DataFrames. Each of these is passed to the GeoPandas GeoDataFrame() class to make a GeoDataFrame.

Here’s an example of the first few rows of the states_gdf GeoDataFrame:

First 5 rows of the states_gdf GeoDataFrame (by the author)

Next, we use the "STUSPS" column of states_gdf to filter out states and territories not in the Lower 48. Then we use the FIPS state codes in the "STATEFP" column to filter out the counties in these excluded states from the counties_gdf GeoDataFrame.

A FIPS state code is a numeric code defined in U.S. Federal Information Processing Standard Publication ("FIPS PUB") 5–2 to identify U.S. states and other associated areas. You can find these codes here.

The last two steps create a bounding box that encompasses the contiguous US and then use it to clip the GeoDataFrames. This produces a better-looking map, in my opinion.

Plotting the Map

Now we use Matplotlib and GeoPandas to plot the data. GeoPandas can plot on its own, but its plotting is built on Matplotlib, and calling Matplotlib directly gives you more refined control.

# Plot the filtered data:
fig, ax = plt.subplots(1, 1, figsize=(15, 12))

# Plot the county and state borders:
clipped_counties.plot(ax=ax, 
                      color='none', 
                      edgecolor='gainsboro', 
                      linewidth=0.5)

clipped_states.plot(ax=ax, 
                    color='none', 
                    edgecolor='blue',
                    linewidth=1)

# Plot the tornado locations:
gdf.plot(ax=ax, 
         color='maroon', 
         marker='.', 
         markersize=1, 
         alpha=0.5,
         label='Tornado Start Location')

plt.title('Reported Tornadoes with Known Magnitude 1950-2023 (NOAA)', 
          fontsize=20)
plt.xlabel('Longitude', fontsize=15)
plt.ylabel('Latitude', fontsize=15)
plt.legend()

plt.show()

This code is annotated and straightforward. One thing to note, however, is that we are only plotting the first reported location of the tornadoes (the "s" in slat and slon stands for "start"). The ending locations (elat and elon) are also included in the NOAA data, and you can use them to draw tornado tracks.
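As a sketch of that idea, you could build a LineString per row from the start and end coordinates. This is an illustration with made-up coordinates, not the article’s code; in the real data you’d first drop rows whose end coordinates are missing or zero:

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import LineString

# Made-up start/end coordinates; column names match the NOAA dataset
demo = pd.DataFrame({'slon': [-97.5], 'slat': [35.4],
                     'elon': [-97.2], 'elat': [35.6]})
tracks = gpd.GeoDataFrame(
    demo,
    geometry=[LineString([(r.slon, r.slat), (r.elon, r.elat)])
              for r in demo.itertuples()],
    crs='EPSG:4326')
# tracks.plot(ax=ax, color='maroon', linewidth=0.5) would draw the paths
```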

Here’s the result, showing almost 70,000 tornadoes:

The starting location of all tornadoes of known magnitude from 1950–2023 (by the author)

Look carefully at this map. You should be able to identify major cities (like Houston, Dallas, and Oklahoma City) and major roadways (like I-35 between San Antonio and Dallas, and I-20 across central Mississippi and northern Louisiana). This is because it’s easier to detect tornadoes where there are more people and radar stations.

Another interesting aspect of this map is that "Tornado Alley", historically centered on Great Plains and midwestern states like Texas, Oklahoma, Kansas, South Dakota, Iowa, and Nebraska, is expanding eastward. In fact, a new term, "Dixie Alley," is now applied to southeastern states, especially Mississippi and Alabama. To see this, we can plot the data in two 30-year tranches, from 1950 to 1979 and from 1980 to 2009:

Comparison of reported tornadoes from 1950–1979 with those from 1980–2009 show an increase in reports and the formation of "Dixie Alley." (by the author)
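One way to sketch the tranche comparison is with a small helper that slices the DataFrame by year (synthetic rows stand in for the real dataset here):

```python
import pandas as pd
import matplotlib.pyplot as plt

def tranche(df, start, end):
    """Return tornadoes reported between start and end, inclusive."""
    return df[(df['yr'] >= start) & (df['yr'] <= end)]

# Synthetic rows standing in for the filtered NOAA DataFrame:
demo = pd.DataFrame({'yr': [1955, 1975, 1985, 2005],
                     'slon': [-97.0, -95.0, -90.0, -88.0],
                     'slat': [35.0, 33.0, 32.0, 34.0]})

# Side-by-side scatter plots, one per 30-year tranche:
fig, axes = plt.subplots(1, 2, figsize=(12, 5), sharex=True, sharey=True)
for ax, (start, end) in zip(axes, [(1950, 1979), (1980, 2009)]):
    data = tranche(demo, start, end)
    ax.scatter(data['slon'], data['slat'], s=2, color='maroon')
    ax.set_title(f'{start}-{end}')
```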

NOTE: The widespread use of Doppler weather radar networks, especially the NEXRAD network, greatly improved tornado prediction and detection starting around 1990.

Dixie Alley is even more apparent in this KDE map from the NOAA website:

Mean number of EF1 and stronger tornado days per year within 25 miles of a point (1986–2015) (NOAA)

Mapping a Single Year’s Tornado Outbreaks

Next, we’ll leverage the previous code to look at the tornadoes occurring in 2023 (the 2024 dataset isn’t finalized yet). This involves copying the original GeoDataFrame to a new one named gdf_2023 and filtering out all rows where the "yr" column is not 2023:

# 2023 tornadoes with county boundaries:

# Copy the original GDF:
gdf_2023 = gdf.copy()

# Filter for rows where the year is 2023:
gdf_2023 = gdf_2023[gdf_2023['yr'] == 2023]

# Plot the filtered data:
fig, ax = plt.subplots(1, 1, figsize=(15, 12))
clipped_counties.plot(ax=ax, 
                      color='none', 
                      edgecolor='gainsboro', 
                      linewidth=0.5)

clipped_states.plot(ax=ax, 
                    color='none', 
                    edgecolor='dimgrey', 
                    linewidth=1)

gdf_2023.plot(ax=ax, 
              color='maroon', 
              marker='v', 
              markersize=14, 
              label='Tornado Start Location')

plt.title('Reported Tornadoes with Known Magnitude 2023 (NOAA)', 
          fontsize=20)
plt.xlabel('Longitude', fontsize=15)
plt.ylabel('Latitude', fontsize=15)
plt.legend()

plt.show()

Here’s the result. Note how the inverted triangle marker style ('v') resembles little tornadoes. Never give up the opportunity for a zestful life!

Tornadoes with assigned EF magnitudes in 2023 (by the author)

The "Dixie Alley" really shows up in this plot. Oklahoma and Mississippi appear to be battling for the title of "stormiest state."

2023 tornadoes in south-central US (by the author)

Also of interest are the tornadoes dotted along the Gulf of Mexico and Atlantic coastlines. These tornadoes may have been spawned from hurricanes or tropical storms.

Mapping Magnitudes

In 2007 the Enhanced Fujita Scale, or EF Scale, became operational. This scale uses estimated wind speeds and related damage to assign a rating of 0 to 5 to each tornado, with 5 being the strongest.
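If you want the ratings to be self-documenting in plots, a small lookup helps. The wind ranges below are the standard operational EF-scale estimates (3-second gusts); they’re included as background and aren’t stored in the NOAA file:

```python
# EF rating to label; -9 marks an unknown magnitude in the dataset
EF_LABELS = {
    -9: 'Unknown',
    0: 'EF0 (65-85 mph)',
    1: 'EF1 (86-110 mph)',
    2: 'EF2 (111-135 mph)',
    3: 'EF3 (136-165 mph)',
    4: 'EF4 (166-200 mph)',
    5: 'EF5 (>200 mph)',
}

def ef_label(mag):
    """Return a readable label for a 'mag' value from the dataset."""
    return EF_LABELS.get(mag, 'Unknown')
```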

The following code breaks out the weakest (EF0–1) and strongest (EF2–5) storms and maps them using different colors.

# Plot EF0-1 and EF2-5 tornadoes separately:

# Ensure gdf_2023 is a copy, not a view:
gdf_2023_colors = gdf_2023.copy()

# Plot tornadoes with dynamic coloring and z-order:
fig, ax = plt.subplots(1, 1, figsize=(15, 12))

# Plot counties and states:
clipped_counties.plot(ax=ax, 
                      color='none', 
                      edgecolor='gainsboro', 
                      linewidth=0.5, 
                      zorder=1)

clipped_states.plot(ax=ax, 
                    color='none', 
                    edgecolor='dimgrey', 
                    linewidth=1, 
                    zorder=2)

# Plot magnitude 2+ tornadoes (maroon) with a higher z-order (on top):
gdf_2023_colors[gdf_2023_colors['mag'] >= 2].plot(ax=ax,
                                                  color='maroon',
                                                  marker='v',
                                                  markersize=14,
                                                  zorder=4,
                                                  label='EF 2+')

gdf_2023_colors[gdf_2023_colors['mag'] <= 1].plot(ax=ax,
                                                  color='goldenrod',
                                                  alpha=0.6,
                                                  marker='v',
                                                  markersize=14,
                                                  zorder=3,
                                                  label='EF 0-1')

# Add titles and labels:
plt.title('Tornadoes Reported in 2023 by Magnitude (NOAA)', 
          fontsize=20)
plt.xlabel('Longitude', fontsize=15)
plt.ylabel('Latitude', fontsize=15)

# Add legend with title:
plt.legend(loc='lower left', 
           title="Tornado Starting Location", 
           shadow=True, 
           fancybox=True)

plt.show()
2023 tornadoes color-coded by magnitude groups (by the author)

Note how the more severe storms (colored maroon) are often near — or east of — the Mississippi River, rather than in the historical "Tornado Alley" in the Midwest.

Bonus: Plotting 2023 Tornadoes with Plotly Express

So far, we’ve been plotting with Matplotlib/GeoPandas. For a more interactive plot, you can use the Plotly Express library. This also obviates the need for GeoPandas, as Plotly Express works with pandas DataFrames with latitude and longitude columns.

The following code plots the location of tornadoes reported in 2023.

# Plot 2023 tornadoes with Plotly Express:

import pandas as pd
import plotly.express as px
from IPython.display import display

# Load the CSV file into a DataFrame:
df_raw = pd.read_csv('https://bit.ly/40xJCMK')

# Filter out rows where magnitude (F-scale) is -9 (unknown):
df = df_raw[df_raw['mag'] != -9]

# Filter for rows where the year is 2023:
df_2023 = df.query('yr == 2023').copy()

# Convert 'mag' to a string (object) column for discrete color mapping:
df_2023['mag'] = df_2023['mag'].astype(object)

# Specify a discrete color scale:
color_scale = px.colors.qualitative.Set1

# Make plot:
fig = px.scatter_geo(df_2023,
                     lat='slat',
                     lon='slon',
                     color='mag',  # Color points by magnitude
                     size=df_2023['mag'].astype(float),
                     size_max=12,
                     hover_name='st',  # state abbreviation
                     hover_data={'yr': True,  # year
                                 'mag': True,  # magnitude
                                 'fat': True},  # fatalities
                     title="Tornadoes Reported in 2023 (NOAA)",
                     scope="usa",
                     projection="albers usa",
                     labels={'mag': 'Magnitude'},
                     color_discrete_sequence=color_scale)

# Customize layout for better display in Jupyter Notebook:
fig.update_layout(geo=dict(showland=True,
                           landcolor="lightgray",
                           showlakes=True,
                           lakecolor="lightblue",
                           showcountries=True,
                           countrycolor="white"),
                  width=1200,
                  height=800,
                  autosize=True,
                  margin=dict(l=50, r=50, t=100, b=50),
                  title=dict(y=0.95,
                             x=0.5,
                             xanchor="center",
                             yanchor="top",
                             font=dict(size=18)))

# Use an upside-down triangle to resemble tornadoes:
fig.update_traces(marker=dict(symbol="triangle-down"))

# Display the plot
display(fig)

Here’s the output:

The Plotly Express figure showing the hover window in action (by the author)

While I prefer the appearance of the Matplotlib-based maps, Plotly Express permits many useful features like zooming, panning, and viewing detailed point data with the hover window.


Charting Code

You can visualize the NOAA data with more than just maps. In this section, we’ll make various graphs to analyze tornado frequency, EF magnitude, and fatalities per EF magnitude.

Number of Tornadoes per Year

First, we’ll look at the number of tornadoes per year. This time, we’ll include the roughly one thousand tornadoes with no recorded EF magnitude.

# Make bar chart of number of tornadoes per year:

# Load the CSV file into a DataFrame:
df = pd.read_csv('https://bit.ly/40xJCMK')

# Filter out Alaska, Hawaii, Puerto Rico, and Virgin Islands:
df = df[~df['st'].isin(['AK', 'HI', 'PR', 'VI'])]

# Group by the year column and count the number of tornadoes per year:
tornadoes_per_year = df.groupby('yr').size()

# Create the bar chart
fig, ax = plt.subplots(figsize=(15, 8))
ax.bar(tornadoes_per_year.index, tornadoes_per_year.values, 
       color='dimgrey', 
       alpha=0.8)

# Add a horizontal line with arrows from 1990 to 1995:
start_year = 1990
end_year = 1995
y_position = tornadoes_per_year.max() * 0.81

# Plot the horizontal line with arrows:
ax.annotate('', 
            xy=(end_year, y_position), 
            xytext=(start_year, y_position),
            arrowprops=dict(arrowstyle='<->', 
                            color='firebrick', 
                            lw=2))

# Add annotation text above the line:
ax.text((start_year + end_year) / 2,
        y_position + tornadoes_per_year.max() * 0.03,
        "NEXRAD Doppler Radar Deployed", 
        color='firebrick', 
        fontsize=13, 
        ha='center')

# Add labels and title:
ax.set_title('Number of Tornadoes per Year (1950-2023)', 
             fontsize=18)
ax.set_xlabel('Year', fontsize=16)
ax.set_ylabel('Number of Tornadoes', fontsize=16)
ax.grid(axis='y', linestyle='--', alpha=0.7)

plt.show()
Bar chart of the number of tornadoes per year (1950–2023) (by the author)

The increase in the number of tornadoes starting around 1990 is partly due to the deployment of the NEXRAD weather surveillance radar network in the early-to-mid 1990s. While climate change has surely influenced the number of tornadoes, it’s important to be cognizant of sampling issues when interpreting data.

Number of Tornadoes per Month

Now, we’ll look at the monthly frequency of tornadoes over the full timeframe.

# Plot reports per month for the full dataset:

# Add a column for month abbreviations:
df.loc[:, 'month_abbr'] = df['mo'].apply(lambda x: calendar.month_abbr[x])

# Plot using pandas:
df['month_abbr'].value_counts().reindex(calendar.month_abbr[1:]).plot(
    kind='bar',
    color='firebrick',
    figsize=(10, 6),
    xlabel='Month',
    ylabel='Count',
    title='Reports of Tornadoes by Month (1950-2023)')

plt.xticks(rotation=45) 
plt.grid(axis='y', linestyle='--', alpha=0.7)

plt.show()
Number of tornadoes reported per month (1950–2023) (by the author)

As expected, tornadoes are more common in the spring, when the weather transitions from winter to summer.

NOTE: you can make charts like these directly from a GeoDataFrame by changing the df to gdf in the previous code. The syntax is the same.

Annual Counts of Tornadoes: Cumulative Plot

Here’s a question: "Have the number of tornadoes in the last 15 years increased or decreased relative to the long-term average?"

To answer this, we’ll use the NOAA data to make a cumulative plot of tornadoes per month for each year since 2008 and overlay it with the cumulative average per month for the full dataset (1950–2023).

Here’s the annotated code:

# Plot cumulative tornado counts vs. month:

# Load the CSV file into a DataFrame:
df_raw = pd.read_csv('https://bit.ly/40xJCMK')

# Filter out Alaska, Hawaii, Puerto Rico, and Virgin Islands:
df = df_raw[~df_raw['st'].isin(['AK', 'HI', 'PR', 'VI'])]

# Define function to group, pivot, and calculate cumulative sums:
def prepare_data(df, start_year=None, end_year=None):
    if start_year and end_year:
        df = df[(df['yr'] >= start_year) & (df['yr'] <= end_year)]
    grouped = df.groupby(['yr', 'mo']).size().reset_index(name='tornado_count')
    pivot = grouped.pivot(index='mo', 
                          columns='yr', 
                          values='tornado_count').fillna(0)
    return pivot.cumsum()

# Calculate cumulative sums for all years and average:
cumulative_all_years = prepare_data(df)
mean_all_years = cumulative_all_years.mean(axis=1)

# Calculate cumulative sums for 2009-2023:
cumulative_counts = prepare_data(df, start_year=2009, end_year=2023)

# Plot the data:
fig, ax = plt.subplots(figsize=(12, 8))
cmap = plt.cm.tab20  # Define the colormap

# Plot each year's cumulative count:
for idx, year in enumerate(cumulative_counts.columns):
    linestyle = 'dashed' if 2009 <= year <= 2016 else 'solid'
    ax.plot(cumulative_counts.index, cumulative_counts[year], 
            label=str(year), 
            color=cmap(idx / len(cumulative_counts.columns)), 
            linestyle=linestyle, alpha=0.7)

# Plot the mean of all years:
ax.plot(mean_all_years.index, mean_all_years, 
        color='k', 
        linewidth=3, 
        label='Mean (1950-2023)')

# Define function to customize the plot:
def customize_plot(ax, title, xlabel, ylabel):
    ax.set_title(title, fontsize=16)
    ax.set_xlabel(xlabel, fontsize=14)
    ax.set_ylabel(ylabel, fontsize=14)
    ax.set_xticks(range(1, 13))
    ax.set_xticklabels(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
                        'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
    ax.grid(axis='y', linestyle='--', alpha=0.7)

customize_plot(ax=ax, 
               title='Cumulative Tornado Counts vs. Month (2009-2023)', 
               xlabel='Month', 
               ylabel='Cumulative Tornado Count')

# Reverse the order of the legend:
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles[::-1], 
          labels[::-1], 
          loc='upper left', 
          fontsize=10, 
          title="Year")

# Mirror the y-axis on the right side of the plot:
ax2 = ax.twinx()
ax2.set_yticks(ax.get_yticks())
ax2.set_ylim(ax.get_ylim())
ax2.set_yticklabels([int(label) for label in ax.get_yticks()])

# Show the plot:
plt.tight_layout()
plt.show()

And here’s the result; the older curves use dashed lines for clarity:

Cumulative tornado counts vs. month with 1950–2023 mean (by the author)

For most of the last 15 years, the number of tornadoes per month and year has been greater than the long-term average. However, this is partly due to the use of modern weather surveillance radar over this period. If we look at the mean for 1990–2023, when these new systems became widespread, the relationship is more reasonable:

Cumulative tornado counts vs. month with 1990–2023 mean (by the author)
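Generating that 1990–2023 mean needs only a different year range passed to prepare_data(). The helper is repeated below so the snippet stands alone, with a tiny synthetic DataFrame in place of the real one:

```python
import pandas as pd

def prepare_data(df, start_year=None, end_year=None):
    """Group by year and month, pivot, and return cumulative counts."""
    if start_year and end_year:
        df = df[(df['yr'] >= start_year) & (df['yr'] <= end_year)]
    grouped = df.groupby(['yr', 'mo']).size().reset_index(name='tornado_count')
    pivot = grouped.pivot(index='mo', columns='yr',
                          values='tornado_count').fillna(0)
    return pivot.cumsum()

# Synthetic rows standing in for the filtered NOAA DataFrame:
demo = pd.DataFrame({'yr': [1991, 1991, 1995], 'mo': [4, 5, 4]})

# Mean cumulative curve restricted to the radar era:
mean_radar_era = prepare_data(demo, 1990, 2023).mean(axis=1)
```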

Based on preliminary reports, 2024 will be the second most active year for tornadoes, with at least 1,777 tornadoes confirmed out of 1,880 reports. Only 2004 had more tornadoes with 1,817.

Number of Tornadoes by EF Magnitude

Now we’ll count tornadoes by EF magnitude, where 0 is the weakest and 5 is the strongest. We’ll do this for the full dataset (1950–2023).

# Calculate number of reported tornadoes by EF magnitude:
magnitude_counts = df['mag'].value_counts().sort_index()

# Chart tornadoes by magnitude:
plt.figure(figsize=(10, 6))
bars = plt.bar(magnitude_counts.index, magnitude_counts.values,
               color='maroon', 
               alpha=0.8)

# Annotate each bar with the value:
for bar in bars:
    height = bar.get_height()  # Get the height of the bar
    plt.text(bar.get_x() + bar.get_width() / 2,  # X-coordinate (bar center)
             height + 5,                         # Y-coordinate above bar
             f'{int(height):,}',                 # Text to display
             ha='center',                        # Center text
             va='bottom',                        # Align text below Y-coordinate
             fontsize=10,                        
             color='maroon')                     

# Customize the plot:
plt.xlabel('Tornado Magnitude (EF-scale)')
plt.ylabel('Number of Reported Tornadoes')
plt.title('Tornadoes by EF Magnitude (1950-2023)')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.xticks([-9, 0, 1, 2, 3, 4, 5])

# Show the plot:
plt.tight_layout()
plt.show()
Number of tornadoes by EF magnitude (-9 = Unknown magnitude) (by the author)

A magnitude value of -9 represents an unknown EF magnitude. For magnitudes 0–5, the behavior is logical. The weakest tornadoes are the most abundant, and the strongest (EF4–5) are thankfully rare.

Number of Fatalities by EF Magnitude

Here’s an interesting question: do you expect more fatalities from EF1 storms, due to their abundance, or from EF5 storms, due to their intensity?

Let’s find out:

# Calculate number of fatalities by EF magnitude:
fatalities_by_magnitude = df.groupby('mag')['fat'].sum()

# Chart fatalities by magnitude:
plt.figure(figsize=(10, 6))
plt.bar(fatalities_by_magnitude.index, fatalities_by_magnitude.values,
        color='orange', 
        alpha=0.8)
plt.xlabel('Tornado Magnitude (EF-value)')
plt.ylabel('Total Fatalities')
plt.title('Total Fatalities by Tornado EF Magnitude')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.xticks([-9, 0, 1, 2, 3, 4, 5])

plt.show()
Fatalities by EF magnitude (1950–2023) (by the author)

Tornadoes become increasingly deadly up through EF4. EF5 tornadoes are deadlier per storm than EF4 tornadoes, but they have killed fewer people in total because of their extreme rarity.
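You can check the per-storm claim directly by averaging, rather than summing, fatalities within each magnitude group (synthetic rows stand in for the real DataFrame here):

```python
import pandas as pd

# Synthetic rows standing in for the filtered NOAA DataFrame:
demo = pd.DataFrame({'mag': [1, 1, 4, 5], 'fat': [0, 1, 10, 20]})

# Mean fatalities per tornado at each EF magnitude:
fatalities_per_tornado = demo.groupby('mag')['fat'].mean()
```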

We’ll stop here, but be aware that we’ve only scratched the surface. To see more tornado-related visualizations, visit the Severe Weather Maps, Graphics, and Data page and explore elsewhere on NOAA’s Storm Prediction Center website.


Tornado Alley and the Hundredth Meridian

As previously noted, the famous "Tornado Alley" has been migrating eastward. Interestingly, another famous climatic feature, the "Hundredth Meridian," has also been marching eastward. This vertical boundary traces the demarcation of the arid west from the humid east. Scientists now believe it is closer to the 98th meridian, just west of the Dallas-Fort Worth metroplex in Texas. This represents an eastward shift of about 140 miles (224 km) since it was first recognized in the late 1800s.

You can read more about it here:

Tell a Climate Story with Plotly Express


Summary

In this project, we used free NOAA data to investigate tornado statistics in the contiguous USA from 1950 to 2023. While we generated static maps and charts using Matplotlib, GeoPandas, and pandas, we also looked at Plotly Express as a dynamic mapping alternative.


Thanks!

Thanks for reading and please follow me for more Quick Success Data Science projects in the future.


Related Articles