Visualizing the 2017 wildfire season

Jared Whalen
Towards Data Science
3 min read · Apr 23, 2019


2017 was a devastating year for wildfires in the United States, especially in California, where more than 9,000 fires burned over 1.3 million acres.

Big Local News, a project of the Stanford Journalism and Democracy Initiative, developed a database that tracks large wildfires (100+ acres) managed by Federal authorities. According to that data, more than 23,000 wildfires occurred in the U.S. in 2017, up 21 percent from the year before and nearly 80 percent from 2014, the year the data starts.
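The percentage figures above work out roughly as follows. The yearly totals below are hypothetical stand-ins chosen to be consistent with the cited percentages, not exact values from the dataset:

```r
# Approximate yearly totals (hypothetical, consistent with the cited figures)
counts <- c("2014" = 12800, "2016" = 19000, "2017" = 23000)

# percent change between two years, rounded to the nearest whole percent
pct_change <- function(new, old) round(100 * (new - old) / old)

pct_change(counts[["2017"]], counts[["2016"]])  # ~21 percent
pct_change(counts[["2017"]], counts[["2014"]])  # ~80 percent
```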

While the database isn’t a comprehensive look at wildfires in the country, it illustrates their increase, which experts have attributed to global warming.

Data from Big Local News / Chart by Jared Whalen.

Methodology

Starting with the Big Local News dataset (The Costs of U.S. Wildfires, 2014–2017), I counted records by unique identifier and grouped them by date. Records without an incident date were removed.

Tools: R (dplyr, lubridate, ggplot2), Illustrator

Process

I came across this dataset through the Data is Plural newsletter, a weekly collection of datasets by Jeremy Singer-Vine of BuzzFeed. After poking around, it looked like something that would be interesting to chart.

Going with the topic of wildfires, I wanted to pick a chart type that visually evoked flames. This reminded me of Nadieh Bremer’s beautiful visualization in her piece The Baby Spike, which used a radial area chart and vivid coloring. Heavily inspired by Bremer’s piece, I wanted to incorporate a radial area chart design that uses the average as its baseline.

As for the data, I did all the wrangling and analysis in R. My main bit of code simply whittled down the massive dataset and summed by date.

library(tidyverse)
library(lubridate)

# keep only records whose identifier appears exactly once
# (drops duplicated incidents entirely)
fireData_unique <- fireData %>%
  group_by(INC_IDENTIFIER) %>%
  filter(n() == 1) %>%
  ungroup()

# make field selections and convert dates
fireData_sel <- fireData_unique %>%
  select(INCIDENT_NAME,
         DISCOVERY_DATE) %>%
  mutate(day = yday(ymd_hms(DISCOVERY_DATE)),
         week = week(ymd_hms(DISCOVERY_DATE)),
         year = year(ymd_hms(DISCOVERY_DATE))) %>%
  # remove records with missing or erroneous dates
  filter(!year %in% c("2011", NA)) %>%
  # get count by day
  group_by(day, year) %>%
  summarise(count = n())

# create average df
fireData_avg <- fireData_sel %>%
  group_by(day) %>%
  summarise(mean = mean(count))

While I did a decent amount of work on this in Illustrator, most of the heavy lifting came from the following code using ggplot.

# function to shift baseline by the overall mean
shiftBase <- function(x) {
  x - mean(fireData_avg$mean)
}

# make the plot
ggplot() +
  geom_area(data = filter(fireData_sel, year == 2017),
            aes(x = day, y = count - mean(fireData_avg$mean)),
            fill = "#FFBF3F", alpha = 1) +
  facet_grid(~year) +
  geom_area(data = fireData_avg,
            aes(x = day, y = mean - mean(fireData_avg$mean)),
            fill = "#547C8E", alpha = 0.2) +
  theme_minimal() +
  geom_hline(yintercept = 0, color = "#FF9843") +
  coord_polar(start = 0.1) +
  scale_y_continuous(
    breaks = shiftBase(c(0, 100, 200, 300)),
    labels = function(x) {
      round(x + mean(fireData_avg$mean), -2)
    },
    limits = c(-150, max(fireData_sel$count))
  )
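The breaks/labels pairing above is a round trip: counts are shifted down by the mean so the average sits at zero, then the axis labels add the mean back so readers still see real counts. A tiny sketch of that round trip, using a made-up mean of 150 in place of mean(fireData_avg$mean):

```r
avg <- 150  # hypothetical stand-in for mean(fireData_avg$mean)

shiftBase <- function(x) x - avg             # data space -> shifted space
unshift   <- function(x) round(x + avg, -2)  # shifted space -> label space

breaks <- shiftBase(c(0, 100, 200, 300))  # -150, -50, 50, 150
unshift(breaks)                           # 0, 100, 200, 300
```

Because the baseline now sits at the mean, the flame-like area above the ring shows days running hotter than average, while the gaps below it show days running cooler.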

Standalone chart


data journalist + developer at The Delaware News Journal / prev: Inquirer, Billy Penn / R, JS + D3, QGIS / Temple alum / USAR officer