Data Curious 22.05.2017: A roundup of data stories, datasets and visualisations from last week

Welcome back to my weekly roundup of data-driven things I noticed on the web last week. This is week 6 (last week’s post is here, and special thanks to Towards Data Science for publishing the post!).

Every week I clip, save and bookmark tons of cool things I find on the web that relate to telling stories with data. So here’s what caught my eye the week of May 8. In typical newsletter fashion, I’ll include a bunch of links for you to click on, save for later and then never return to again (it’s ok, we all do it). To catch next week’s post, follow me here on Medium for an update. I’m also on Twitter.


Good Reads, Analysis and Tutorials

I found this brilliant article called “Design Better Data Tables” in my Medium Daily Digest last week. If companies (and governments especially) followed some of these guidelines, the job of a data journalist and/or scientist would be so much easier. As the author writes: “data is meaningless without the ability to visualize and act upon it.” Must read this week.

In UK-based data news, Britain is set to fall to 3rd place in global open data rankings, behind Australia and Taiwan. The National Association of Estate Agents recently said that data on UK air quality may soon be a mandatory part of property adverts. And finally, this data analysis by Buzzfeed shows exactly why most people in the UK don’t bother entering the housing market. The charts speak for themselves, but I also thought that a bit more creativity could have been put into visualising this data than a series of the same bar chart over and over.

Here’s a really interesting debate-starter. This data analysis by Bloomberg brings to mind lots of questions from a single hypothesis: “Amazon Doesn’t Consider the Race of Its Customers. Should It?” But I think a good data analysis sometimes brings up more questions than it answers. The maps in the piece are interactive, and allow the user to filter between races in major cities with Amazon one-day delivery. Spoiler: the areas with one-day delivery are almost entirely populated by white people. But back to the additional questions: is Amazon targeting areas of wealth or race? Or both? There are lots of great starting points in this article, but it would be interesting to do some more analysis.

From Bloomberg analysis piece.

Here’s an interesting post from the band Pomplamoose describing how much money they made (and lost) on a 28-day tour. It’s a great peek into the data behind touring as an indie band and could be a start to some insights behind the economics of being a touring musician.

I found this Quora thread on the best resources for data scientists to be interesting. It’s a mix of tools and advice, but most answers included some sort of recommendation to work on practical projects rather than theoretical exercises.

Mike Bostock just released d3.express, something he is calling an “integrated discovery environment”. I haven’t managed to dive into all the docs yet, but since it’s coming from one of the godfathers of data vis, I’m sure it won’t disappoint!

This is a great data analysis piece from FiveThirtyEight on life expectancy in the US by county. Since 1980, most states in the US have raised overall life expectancy levels. But at the county level, it becomes clear certain areas are going against this trend. I also really liked the US-map-shaped line charts as a style of data vis presentation.

FiveThirtyEight

Open Data Institute Leeds published a blog post promoting the use of hexagon-based maps for visualising election data. It’s a great introduction to the tool and it’s also quite timely (t-minus 16 days to the UK General Election!).

Lots of people have been talking about this NYT Upshot article showing how many people can’t find North Korea on the map. But even more importantly, surveys found a key difference between those who could and could not find the country geographically: the 36% of Americans that could successfully locate North Korea were “much more likely to disagree with the proposition that the United States should do nothing about North Korea”. Geography is important (although, arguably so are other things that come along with common geography knowledge, so let’s not get into the correlation/causation debacle just now).

How happy are people today? This recent Happiness and Life Satisfaction report from Our World in Data seeks to find out. The full report includes lots of quick charts and maps, plus you can download all the raw data yourself if you like.

Did you know there’s a new zine devoted entirely to data visualisation? Pretty cool.

Datasets and other resources

There were some really interesting data sources coming out last week. Let’s go through them in order of appearance.

EU exports to the rest of the world are up 13% compared to the same time last year, at 202.3 billion euros. This means a 30.9 billion euro surplus in trade in goods. The Eurostat news release contains more historical data for international trade in PDF format.

Have you been reading about the #WannaCry ransomware at all? This may not seem like a dataset at first, but there’s a Twitter bot watching the bitcoin wallets tied to the #WannaCry ransomware attack and tweeting the transactions. Really, it’s begging to be mined and analysed. A good start might be scraping the account using Python, or trying a quick Google Sheets add-on tool like TAGS or Twitter Archiver.
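If you go the Python route, here is a minimal sketch of the analysis step once the tweets are collected. It assumes each tweet's text contains an amount followed by "BTC"; the sample tweets, the regex and the function name total_btc are all hypothetical and not the bot's real output format, so adjust the pattern to whatever the account actually posts.

```python
import re
from decimal import Decimal

# Assumed pattern: a decimal number followed by "BTC" (hypothetical tweet format).
BTC_AMOUNT = re.compile(r"(\d+(?:\.\d+)?)\s*BTC", re.IGNORECASE)


def total_btc(tweets):
    """Sum every BTC amount found in a list of tweet texts."""
    total = Decimal("0")
    for text in tweets:
        for amount in BTC_AMOUNT.findall(text):
            total += Decimal(amount)
    return total


# Made-up example tweets, for illustration only.
sample_tweets = [
    "Someone just paid 0.16 BTC to a wallet tied to the ransomware.",
    "Someone just paid 0.33 BTC to a wallet tied to the ransomware.",
    "The three wallets have received 10 payments so far.",
]

print(total_btc(sample_tweets))
```

Decimal is used instead of float so that small payment amounts add up without rounding noise.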

Data.world hosts a dataset of UNESCO languages classified by how endangered (or extinct) they have become. Did you know that in the UK and former British colonies, 906 endangered languages are spoken?

Bird watchers will be interested in this dataset of bird species in the UK since 1970.

Google is hosting a dataset of doodles from QuickDraw that can be downloaded below.

On a less light-hearted note, this database of missile tests from North Korea would make for some interesting mapping. The data includes the missile launch site, highest altitude, distance traveled, landing location, success/failure of launch and more.

Here’s a dataset of Global Food Prices from the UN World Food Programme. It contains data on food prices from 1,000 towns in 70 different countries, and is updated monthly.

Climate hawks, this is a dataset and report with historical data on global sea levels. You can download the data on this page or preview it on a map here.

If you’re interested in analysing gun violence, this bombshell of a database uncovered by the Chicago Sun-Times shows nearly 400,000 people scored from 10 to 500 on how likely (or not) they are to be involved in gun violence.

Did anyone else know about the WikiPlots corpus database? It contains a one sentence summary of over 120,000 Wikipedia entries that contain “plot” in a subheading. That means movies, books, plays, TV shows — you name it. The dataset requires some language processing and analysis, but here’s a good blog post from David Robinson displaying how it can be explored.

I couldn’t make it to #EIJC2017 this year (European Investigative Journalism Conference and Dataharvest), but I managed to gather some resources on Twitter from the talks over the week. Here are two data-tool highlights: a presentation of “Data tools for everyday journalism” from Maarten Lambrechts and a tool called Map Stack to design better-looking maps.

Did you know there’s a tool to search and download Wikipedia pageviews? I stumbled across this handy tool after an editor from Motherboard tweeted a screenshot of the increase in searches for impeachment-related Wikipedia pages recently.
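For anyone who would rather pull these numbers in code, here is a rough sketch. It does not use the tool above; it assumes the Wikimedia REST API's per-article pageviews endpoint and the shape of its JSON response (an "items" list with "timestamp" and "views" fields). The article title, the dates and the sample payload are made up for illustration, and no network request is made here.

```python
from urllib.parse import quote

# Assumed endpoint root of the Wikimedia REST API for per-article pageviews.
API_ROOT = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article, start, end, project="en.wikipedia"):
    """Build a daily pageviews URL; start and end are YYYYMMDD strings."""
    title = quote(article.replace(" ", "_"), safe="")
    return f"{API_ROOT}/{project}/all-access/user/{title}/daily/{start}/{end}"


def daily_views(payload):
    """Map each timestamp to its view count from a parsed JSON response."""
    return {item["timestamp"]: item["views"] for item in payload.get("items", [])}


# Made-up payload in the shape the API is assumed to return.
sample = {
    "items": [
        {"timestamp": "2017051500", "views": 100},
        {"timestamp": "2017051600", "views": 250},
    ]
}

print(pageviews_url("Impeachment in the United States", "20170501", "20170521"))
print(daily_views(sample))
```

To fetch real data you would request the URL with any HTTP client and pass the parsed JSON to daily_views.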

Data visualisations

Not quite as much data vis to speak of from last week. Apparently I had my head buried too far in the backend of data, but that’s not a bad thing I suppose. Here are a few things that caught my eye.

I liked this graphic from the Washington Post showing how the U.S. could leave the Paris climate agreement. As the second-highest producer of CO2 emissions, America would be joining Syria and Nicaragua as countries not taking part in the climate agreement.

Shout out to Elijah Meeks for this lightbulb data vis moment on Twitter:

Here’s an animated pie chart FTW. Last week I wrote a bit about how pie charts get a bad rap, but I think showing change over time with an animated pie chart is a really smart choice.

I love this Tableau dashboard of Beer in Europe. Using a tap for the top of the columns is a brilliant design choice, and the filtering features let you explore the data with ease (you can also download the raw data by clicking the “Download” button at the bottom!).

Solar jobs are beating coal, ICYMI.

OK this data vis isn’t at all impressive, but I got a kick out of the message nonetheless:

Here’s a nice Tableau interactive of the classic poverty bubble chart, but with a twist: instead of global poverty measured against country GDP, it compares US county-level poverty with levels of internet access.

That’s it from last week. Did you see something I missed? Or maybe you just want to give me a digital head-nod/high-five? Tweet me or leave a comment below. More data stuff coming in hot next week.