Data Curious 07.08.2017: A roundup of data stories, datasets and visualizations from last week

Welcome to my weekly roundup of data-driven things I noticed on the web last week. This is week 16 (last week’s post is here).

Every week I clip, save and bookmark tons of cool things I find on the web that relate to telling stories with data. So here’s what caught my eye the week of July 31. In typical newsletter fashion, I’ll include a bunch of links for you to click on, save for later and then never return to again (it’s ok, we all do it). To catch next week’s post, follow me here on Medium for an update. I’m also on Twitter.


Good Reads, Analysis and Tutorials

Last week, as I scanned the web for interesting and innovative data-driven things, I felt reassured that I’m in the right field. I found tons of great content, visualizations, stories and academic insights to share with you all — so let’s dive in.

First insights from the Global Data Journalism Survey were published last week. Some key takeaways I walked away with: data journalists educated in journalism are slowly shifting closer to the fields of data analysis and data science, but without the formal training. Perhaps the order should be reversed? Data science → Journalist.

Grouped bar chart showing level of formal training in survey respondents by topic. Full post here.

The AP’s data editor, Troy Thibodeaux, talked to the Global Investigative Journalism Network on why data skills are essential for every reporter.

After news broke of Justin Gatlin beating out Usain Bolt in his last 100m race, John Burn-Murdoch tweeted out a chart he made back in 2015 that reveals Gatlin as an interesting outlier compared to his competitors. You can read his full analysis on the fastest men in the world here.

Apparently, “shrinkflation” is an actual term coined by a government agency. The International Business Times revealed through this data investigation that many food products are slowly shrinking in size, but not getting any cheaper.

Paul Bradshaw wrote a post on the 10 principles for data journalism in its second decade. The guidelines are inspired by the 2007 book The Elements of Journalism by Bill Kovach and Tom Rosenstiel. Here’s a preview of the 10 principles in a quick graphic.

Full post here

Bradshaw also has another recent write-up on the next wave of data journalism. The blog post takes a retrospective look at the early days of data journalism in the era of Computer Assisted Reporting (CAR). Moving forward, Bradshaw sees potential in the use of ‘robot journalism’, augmented journalism and computational thinking within the newsroom.

I was both surprised and relieved to see this blog post from pudding.cool on how many people actually resize their browser windows while reading a story. For one of their visual essays, they embedded a script to measure how many people resize mid-story. TL;DR: only 2–3% of users resize windows. If this holds true across other media formats, this could have huge implications for designing web graphics. Instead of focusing so much on fluid responsiveness, perhaps we should be designing data visualizations for specific devices to appear onLoad().

I really liked this Medium post on how ‘data’ has become such an overloaded term. This can be true especially in the world of content marketing and creative agencies. Key takeaway: data is not a buzzword. No matter what it is being used for, we always need to consider things like origin, application, treatment and sensitivity.

The Pudding published a text analysis of every line from The Office series: ‘The Office’ Dialogue in Five Charts.

If you’re on Twitter at all, you probably saw the flurry of outrage over the Google diversity memo that’s been circulating. There were plenty of good responses as to why the memo on how computing/software “abilities between men and women differ in part due to biological causes” is complete bullshit, but this one was my favorite (and not just because it uses data). Remember Ada Lovelace?

I was emailed a few stories last week that used some impressive data visualizations. This one from the Hindustan Times on death sentences in India starts with a brilliant use of slide-based narrative storytelling. I love how the waffle chart corresponds with the colored text above. It really sets the story in the context of the opening number.

I was reminded last week of the landmark academic essay from Stanford University researchers Edward Segel and Jeffrey Heer, ‘Narrative Visualization: Telling Stories with Data’. If you haven’t read it before, I highly suggest you do so. The above Hindustan Times piece uses one of their techniques (single-frame interactivity), and the essay cites other examples of different ways to visualize data strategically.

Nadieh Bremer wrote a blog post on how to use her brand new D3.js plugin to create a loom chart.

Here’s a nice tutorial on combining shapefiles in Tableau and QGIS for mapping:

If you’re looking for inspiration on how to find stories using Python, I highly recommend starting with this Medium post as an example: How I Used Python to Publish a Nice Article on My Niche Website. The post explains how the author wrote a Python script to access the Yummly API and analyze all of their smoothie recipes to find the most common ingredients for smoothies.
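The counting step at the heart of that kind of analysis can be sketched in a few lines. This is only a rough illustration, not the author's script: the `recipes` list below is made-up sample data standing in for whatever the API returns, and `most_common_ingredients` is a hypothetical helper name.

```python
from collections import Counter

# Made-up sample data standing in for recipes fetched from an API.
# The field names ("name", "ingredients") are assumptions for illustration.
recipes = [
    {"name": "Green Start", "ingredients": ["Banana", "spinach", "almond milk"]},
    {"name": "Berry Blast", "ingredients": ["banana", "strawberries", "yogurt"]},
    {"name": "Tropical", "ingredients": ["mango", "banana", "yogurt"]},
]


def most_common_ingredients(recipes, top_n=3):
    """Count how many recipes each ingredient appears in."""
    counts = Counter()
    for recipe in recipes:
        # Normalize case and whitespace, and use a set so an ingredient
        # listed twice in one recipe is only counted once.
        counts.update({item.strip().lower() for item in recipe["ingredients"]})
    return counts.most_common(top_n)


top = most_common_ingredients(recipes, top_n=2)
print(top)
```

With real data, most of the work is usually in cleaning the ingredient names (plurals, brand names, quantities) before counting.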

Here’s a brilliant (if depressing) example of how framing survey questions can produce very different responses.

Datasets and other resources

In an independent study, Director of Marketing at data.world Ian Greenleigh published a post claiming that “78% of U.S. adults would have more trust in online news if they could easily access the data behind its claims”. You can explore the data behind his analysis for yourself on data.world here.

This dataset lists all immigrants apprehended, removed, or returned by the U.S. from 1925–2015. Questions worth considering: has enforcement increased or decreased? Is this amount normalized to account for population size?
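To make the normalization question concrete, here is a minimal sketch of turning raw counts into a rate per 100,000 people. The figures are invented purely for illustration and do not come from the dataset; `rate_per_100k` is a hypothetical helper.

```python
def rate_per_100k(count, population):
    """Express a raw count as a rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# Invented figures for illustration only (NOT taken from the dataset).
years = {
    "Year A": {"count": 50_000, "population": 100_000_000},
    "Year B": {"count": 90_000, "population": 300_000_000},
}

rates = {
    label: rate_per_100k(row["count"], row["population"])
    for label, row in years.items()
}

for label, rate in rates.items():
    print(f"{label}: {rate:.1f} per 100,000 people")
```

In this made-up case the raw count nearly doubles between the two years while the per-capita rate actually falls, which is exactly the kind of reversal worth checking for before writing a headline.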

The Utrecht Data School has created a Data Ethics Decision Aid (DEDA) tool to help journalists, data analysts and policy makers recognize ethical issues in data projects. Users can fill out either a PDF or an interactive questionnaire to take a systematic approach to screening data for ethical problems.

The Seattle public library allows you to view every checkout of every physical item since 2005. Warning: the dataset contains over 90 million rows, but you can also access it through querying or using its API.

I discovered the website LobbyFacts.eu last week (h/t Jeremy Singer-Vine) and have mentally bookmarked it as a great place to start an investigation. This site takes public data from the European Parliament and makes it available through a public API. You can also find data on lobbying meetings (related site: IntegrityWatch.eu has data focused on these meetings).

Remember in 2014 when news broke of Malaysia Airlines flight MH370 disappearing? Well, the Australian government just released its first batch of data from its ambitious seafloor mapping exploration in an effort to find the remains. The data covers 278,000 square kilometers of seafloor topography.

August 4 was International Beer Day. Have a belated celebration by cracking a cold one and diving into this craft beer dataset.

Felipe Hoffa analyzed 3 billion Reddit comments to find the most mentioned Reddit users on the web. You can read how he did it, and query the dataset for yourself on BigQuery, by checking out his Medium post.

Data visualizations

Remember all the solar eclipse visualizations from the past two weeks? Seems a bit overdone at this point. Well, maybe not entirely…you haven’t seen a #sunsquatch map yet.

Naturally, Twitter was quick to jump in on the joke. Here are a few of my other favorite spoof eclipse maps:

Data viz doesn’t always have to be live election maps and political coverage. The NYT published a nice longform piece on London’s new Crossrail scheme last week, with an interesting map choice for mobile phones:

Flipping the east/west running Crossrail vertically allowed the NYT to still use annotations on mobile screens. But the choice seemed to spark a bit of controversy among the Twitter cartography and data viz bubble.

Naturally, examples of disorienting maps followed:

My take: the NYT map flip was absolutely the right call. We should challenge viewers to rethink maps, even if it takes some directional rethinking.

And while we are on maps, I found this beautiful annotated map last week of one refugee’s journey from Yemen to Austria.

Axios produced a great interactive flow map showing exports of goods between U.S. states.

Best data viz of the week goes to: Antti Lipponen and his animated chart of temperature anomalies.

BBC Graphics showed that Sankey diagrams are still the best choice for showing how voters change their party allegiance between elections.

The Guardian introduced the rotated scatter plot to their data viz repertoire last week in this interactive data analysis on Usain Bolt. The annotations here are really sharp, and I think that rotating the scatter plot in this case actually makes the chart’s meaning easier to understand.

The Financial Times again pushed the boundaries of chart annotation (with great success, I believe) by adding quotes to this vertical line chart of economic bubbles.

‘Despacito’ is everywhere. I mean, everywhere.

Another data viz Twitter discussion worth mentioning is this thread started by Guardian U.S. Data Editor Mona Chalabi. The full discussion is worth a look to see why these charts are terrible, but the highlights include Mona debunking studies that use incredibly small sample sizes and distort statistics in order to back up an ideological/racial bias. Some people think of stats and data visualization as inherently truthful just because they are based on numbers. These charts are evidence that this is a dangerously inaccurate way to think about it.
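As a rough illustration of why tiny samples deserve suspicion, here is a sketch of the textbook margin of error for a survey proportion. It assumes a simple random sample and the normal approximation at 95% confidence; neither the numbers nor the method come from the thread itself.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from n responses.

    Assumes a simple random sample and the normal approximation, which
    is itself shaky when n is very small.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    return z * math.sqrt(p * (1 - p) / n)


small = margin_of_error(0.5, 30)     # a "study" with 30 respondents
large = margin_of_error(0.5, 1000)   # a more typical poll size

print(f"n=30:   +/- {small * 100:.1f} percentage points")
print(f"n=1000: +/- {large * 100:.1f} percentage points")
```

A result of "50%" from 30 people could plausibly sit anywhere from the low 30s to the high 60s, so a chart built on it is mostly showing noise.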

That’s it from last week. Did you see anything I should have included? Or maybe you just want to give me a digital head-nod/high-five? Tweet me or leave a comment below.


If you appreciate this weekly roundup, slap a ❤️️ on it or share with your friends. I’d also love to see what you’ve been working on lately so get in touch. More data stuff coming in hot next week.