
5 Data Plots I Made That Are Completely Useless


Using the cartography package in R to create plots without a purpose

Your analysis is supposed to reveal something new. It should reveal an insight into a trend, or an abnormal pattern. Employers for data science jobs like to see that your analysis is useful and impactful.

But that’s boring. Instead, we can make graphs that are completely, totally useless, which is what I spent the last couple of hours doing.

The cartography package in R helps you turn your data into eye-catching maps. For example, R comes with a classic dataset of sudden infant death syndrome (SIDS) deaths in North Carolina. Using a shapefile, which gives geometric data on North Carolina’s county borders, you can create a map of the counties with the most SIDS deaths, the lowest proportion of SIDS deaths, and so on.
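A minimal sketch of that kind of useful map, assuming the sf, ggplot2 and ggthemes packages are installed. It uses the North Carolina county shapefile bundled with sf, whose SID74 and BIR74 columns hold SIDS deaths and live births for 1974 to 1978, and maps the SIDS rate per 1,000 births with the same ggplot2 recipe used for the California maps below. The SIDS_RATE column name is my own choice.

```r
library(sf)
library(ggplot2)
library(ggthemes)  # provides theme_map()

# North Carolina county shapefile that ships with the sf package
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# SIDS deaths per 1,000 live births, 1974 to 1978
nc$SIDS_RATE <- 1000 * nc$SID74 / nc$BIR74

# Choropleth: darker counties have a higher SIDS rate
p <- ggplot(data = nc) +
  geom_sf(aes(fill = SIDS_RATE)) +
  scale_fill_gradient(low = "#98BAFF", high = "#354057") +
  labs(title = "north carolina counties by SIDS deaths per 1,000 births, 1974-78",
       fill = "rate") +
  theme_map()
print(p)
```

Unlike the maps that follow, this one actually answers a question: which counties had unusually high SIDS rates relative to their number of births.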

Maps like those are useful. I’m about to do the opposite and show you a bunch of useless graphs I made with the R cartography package.

1. California counties by number of letters in the county’s name

Getting a shapefile for California (my home state) is very easy. It’s on the California government website, along with some useful datasets on everything from crime to COVID. But we’re not interested in those datasets right now. Thankfully, we can use this shapefile to make useless plots as well, such as a graph of counties colored by the length of the county’s name. All we need is to make a new column for name length:

# add new column based on county name length
library(stringi)
ca$NAME_LENGTH <- stri_length(ca$NAME) - stri_count_fixed(ca$NAME, " ")
#using stri_length and stri_count_fixed removes spaces from the count. For example, "Santa Barbara" is counted as 12 letters instead of 13.
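If you would rather not add the stringi dependency, a base-R sketch of the same letter count strips the spaces with gsub() and counts what remains with nchar(). The helper name name_length is hypothetical:

```r
# Count the letters in each county name, ignoring spaces
# (base-R equivalent of stri_length() minus stri_count_fixed(" "))
name_length <- function(names) {
  nchar(gsub(" ", "", names, fixed = TRUE))
}

county_names <- c("Santa Barbara", "San Bernardino", "Yolo")
letter_counts <- name_length(county_names)
print(letter_counts)
```

Applied to the shapefile, this would read ca$NAME_LENGTH <- name_length(ca$NAME).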

From there, I used ggplot2 to visualize the data. The geom_sf() layer draws and fills in the counties, and theme_map() from the ggthemes package makes our map a little nicer to look at:

library(ggplot2)
library(ggthemes)  # provides theme_map()
ggplot(data=ca) +
  geom_sf(aes(fill=NAME_LENGTH), linewidth=.5) +  # on ggplot2 < 3.4.0, use size=.5
  scale_fill_gradient(low="#98BAFF", high="#354057") +
  labs(title="california counties by number of\nletters in the county's name", fill="length") +
  theme_map() +
  theme(panel.background = element_rect(fill = '#F0F0F0', colour = 'white'))

Now we have our plot, and we can see that San Bernardino County is killing it, while Yolo County needs to pick up the slack on its name length.

2. California counties by number of San Diego Zoos

Finally, we have an answer. For decades, we Californians have wanted to know which county has the most San Diego Zoos, and it turns out the answer is San Diego County, with one San Diego Zoo.

For this graph I made a new column called SD_ZOOS, gave every county a value of 0, and then manually changed San Diego county’s value to 1:

library(dplyr)
#initialize all SD_ZOOS values to 0
ca$SD_ZOOS <- 0
#set SD_ZOOS value to 1 for san diego county
ca <- ca %>%
  mutate(SD_ZOOS = if_else(NAME == "San Diego", 1, 0))

As far as I am aware, this is the first time data on San Diego Zoo locations has been visualized, and hopefully if you’re thinking of travelling to California, this will help you figure out where to find your nearest San Diego Zoo.

3. California counties by number of Olive Garden restaurants

This plot took the longest to make, because I couldn’t find a raw dataset of all the Olive Garden locations in California. Instead, I went to one of those "find the nearest Olive Garden near you!" websites and tracked down the cities of all 72 California locations myself. I then added up the number of Olive Gardens per county, saved the counts as a CSV file so they were easier to bring into R, added them to the data frame as a new column called OG, and made the plot.
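A minimal sketch of the join step in base R. The data frames and counts are hypothetical toy stand-ins: counties plays the role of the shapefile’s attribute table, and og_counts plays the hand-built CSV (normally loaded with read.csv(); the file name in the comment is made up). Counties missing from the CSV come out of the left join as NA, and replacing those with 0 is what lets a count like sum(ca$OG == 0) work afterwards.

```r
# Hypothetical toy data: in practice og_counts would come from something like
# read.csv("olive_garden_counts.csv") (made-up file name)
counties  <- data.frame(NAME = c("County A", "County B", "County C"))
og_counts <- data.frame(NAME = c("County A", "County C"), OG = c(3, 1))

# Left join: keep every county, even those with no row in the CSV
counties <- merge(counties, og_counts, by = "NAME", all.x = TRUE)

# A county missing from the CSV has no Olive Garden, so NA becomes 0
counties$OG[is.na(counties$OG)] <- 0

print(counties)
# Number of counties without a single Olive Garden
print(sum(counties$OG == 0))
```

sf also provides a merge() method, so the same call should work with the sf object as the first argument and keep its geometry.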

Still, it was a tough decision to put this plot in here, because it’s almost useful. If nothing else, we now know that Los Angeles County is brimming with breadsticks (it has 17 Olive Gardens). And running sum(ca$OG == 0) shows that 32 California counties don’t have a single Olive Garden.

4. California counties by number of Olive Gardens plus number of letters in the county’s name

Since the Olive Garden graph was a little too impactful, I decided to make it useless by adding in arbitrary math. Thankfully, R lets you create new columns based on some simple arithmetic. So if you have a column for Olive Garden locations (ca$OG) and a column for name length (ca$NAME_LENGTH) you can do this:

ca$OG_LENGTH <- ca$OG + ca$NAME_LENGTH

Los Angeles County still leads thanks to its 17 Olive Garden locations, but the boost from the name "San Bernardino" is nothing to sleep on.

5. California counties where each county is given a random color except Alameda County’s color is from a completely different palette

Quite simply, there is no reason this data plot would ever be needed. This plot is absolutely useless. It’s exactly what I wanted.

I found the index of Alameda County in my data table using the which() function:

which(ca$NAME=="Alameda")

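R comes with some built-in color palettes, such as heat.colors(), that generate a sequence of colors you can hand out to the counties. Assigned in the order the counties appear in the table, they look more or less random on the map. I used the heat palette, reassigned the color for Alameda County, and plotted accordingly: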

library(ggplot2)
library(ggthemes)  # provides theme_map()
# one heat-palette color per county (58 in California)
cols <- heat.colors(nrow(ca), alpha = 1)
# reassign alameda county's color, using the index from which()
cols[which(ca$NAME == "Alameda")] <- "#0D26FF"
#plot
ggplot(data=ca) +
  geom_sf(fill=cols) +
  labs(title="california counties where each county is assigned a random\ncolor except alameda county's color is from a completely\ndifferent palette") +
  theme_map() +
  theme(panel.background = element_rect(fill = '#F0F0F0', colour = 'white'))
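If you want the colors to be genuinely random rather than following table order, one option is to shuffle the palette with sample() before overriding Alameda’s entry. A minimal sketch, assuming a hypothetical 58-county setup where the placeholder alameda_idx stands in for which(ca$NAME == "Alameda"):

```r
set.seed(42)      # make the shuffle reproducible
n_counties <- 58  # number of California counties
alameda_idx <- 1  # hypothetical placeholder for which(ca$NAME == "Alameda")

# sample() with no size argument returns a random permutation,
# so each county gets a heat-palette color in random order
cols <- sample(heat.colors(n_counties, alpha = 1))

# Alameda County gets a blue from outside the heat palette
cols[alameda_idx] <- "#0D26FF"
```

The shuffled cols vector can then be passed to geom_sf(fill = cols) exactly as before.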

This graph purposely makes Alameda County look out of place. There is no reason why you would ever need to do this. This graph serves no purpose.

I’d be glad if this post brightens your day, or even helps you learn a little about the cartography package, although I sincerely hope you use your own data science skills to make graphs more useful than these. Perhaps you can even find a reason why I would need to make Alameda County blue. I can certainly say that would have an impact on me. And who knows, maybe one day someone will need to know where all the Olive Gardens are. We’ve got them covered.

