Make Your Data Science Projects Meaningful

(also, a guide to geoscatter plots)

mattmecoli
8 min read · Aug 30, 2018

Making Projects Meaningful

I’m going to talk briefly about the value of pro bono to a profession both internally and externally. In doing so, I’ll focus on a project involving building a geo-scatter plot, so if that’s what you’re here for, stay tuned.


Anyway, I come from a background in law before joining the tech world. In the legal field, pro bono service is considered fundamental. While not usually required, it is strongly encouraged, and many attorneys do participate.

The American Bar Association’s Model Rule 6.1 even states that “a lawyer should aspire to render at least (50) hours of pro bono publico legal services per year.” Most large law firms have similar requirements, with good reason.

Putting your professional skills to work pro bono both improves them, by giving you a chance to work on a variety of projects you likely wouldn’t otherwise tackle (thereby improving the profession internally), and delivers the value of your profession to those who might not otherwise be able to afford it (thereby improving the profession externally).

Specifically, I would point out several advantages of tackling a real-world project for a nonprofit over artificially constructing one:

  • You build value: At the end of the day, you don’t just have a nice repo for your GitHub. Your work will be used. It has value as more than just a demonstration of coding skill.
  • You build contacts: There’s no better form of networking than building relationships with people who have firsthand experience with your passion, skill, and work ethic.
  • You build experience: The problems you’ll encounter in the real world do not have tailored solutions. You’ll be significantly more challenged by what a real nonprofit asks of you than by what a cleaned dataset does.
  • You build skills: Not only the hard skills like the actual programming techniques and libraries you use, but also the soft skills of managing a project, communicating with a team, and understanding a boss or client’s vision.
  • You build a better world: Cheesy? Yes. True? Also yes.

Groups like DataKind can connect you with existing needs or existing projects, but you can always reach out on your own. My classmate Paulina Zheng also wrote a fantastic blog post on this exact topic here, so I won’t cover it in much more depth.

Much of tech’s ‘pro bono,’ so to speak, takes the form of contributing to open source projects, and I applaud that. I offer this method as an alternative for those interested in using their skills to do some good.

Onward we go!

By far the best Google Image search result for “onward.” Source.

The Project

One of the best ways to find a project is to look for one. When I (quickly) got tired of examining the sepal lengths of irises or working on contrived academic exercises that would likely never see a world outside of my GitHub page, I decided to seek out my own project with a worthy nonprofit. If I was going to build something, it should be useful.

I reached out to a few friends at various nonprofits. One of those friends was a former law professor of mine who works at the National Law Center on Homelessness & Poverty (NLCHP). Did he have a project he needed help on? The answer was a resounding “Yes! …Two actually.”

The first project, and the one this blog post covers, involved the NLCHP’s Panhandling Campaign. The NLCHP partnered with about two dozen other nonprofits on a coordinated push to overturn anti-panhandling laws that, in essence, criminalize homelessness. The campaign draws on new, favorable Supreme Court precedent, under which 25 of 25 challenged anti-panhandling ordinances (laws) have been struck down as unconstitutional, to encourage municipalities to repeal these ordinances voluntarily.

At the time of this writing, this campaign is still active and you can find the link (and a link to the map I made) below:

As I parenthetically mentioned above, I made a map (sort of). More precisely, I made a geo-scatter plot. I’ll cover what this is in “The Tool” section and how I did it in “The Code” section.

Essentially, the NLCHP wanted a map that made it easy to visualize the cities they were targeting in the campaign and that would provide basic information on whether their efforts had been successful. To that end, the cities were mapped out with some hover text information. Additionally, the color of the markers would be updated to indicate the status of the law: active with no response enforced (red), active with response indicating no immediate change (pink), active with commitment to review (orange), halted with commitment to review (yellow), and repealed (green).

They also wanted media covering the campaign to be able to use the map easily. If the NLCHP receives media requests that suggest a desire for a certain type of information (like ordinance citation and name), we’ll update the values of all cities on the map with that information.

The Tool

For this project I used the Plotly library for Python.

For your reference, docs are here. More docs on layout are here. Sample code is here. An additional example is here.

Geo-scatter plots tend to be much simpler and more accessible, although less powerful, than full GIS (Geographic Information System) programs.

The Map

Here is the map itself. I’ll explain a bit about how we got here in the code.

The live map can be found here if you want to see it in action: https://nlchp.org/images/panhandling_cities

Each city is plotted by latitude and longitude (generated thanks to the excellent service Geocodio). The current status is displayed and will appear on mouseover. When I update the status, the text and color change accordingly. The legend categories can be toggled on and off so you could display just those cities that have repealed, for instance. It’s a fairly simple map, but an interesting one.

The Code

Here’s the full code. If you’re not comfortable with Plotly, don’t panic, we’ll break it down a bit.

Plotly and Dash are “leading open source software for Web-based data visualization and analytical apps.” This means they allow you to build beautiful data visualizations and analytical tools (respectively) that are readily shareable, primarily through their ability to generate web-ready html.

In the above code, data is our actual information and layout is how we want to present it. Each “type” of plot or graph on Plotly has different attributes you can set. Here, we’re plotting our points (using lat/lon) as markers (using all the marker attributes like opacity of the marker and color of the marker) with text that hovers over them. The text is generated right from the data as below:

Pulling city and state from the DataFrame. Status text will update as we update the status of each city.
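As a rough illustration of that step, here is a minimal sketch of how hover text might be assembled with pandas. The column names (`city`, `state`, `status_text`, `hover_text`) and the sample rows are assumptions made up for this example, not the project's actual schema.

```python
import pandas as pd

# Hypothetical sample rows; names and values are placeholders, not campaign data.
df = pd.DataFrame({
    "city": ["Example City", "Sampleville"],
    "state": ["ST", "XX"],
    "status_text": ["Active, no response", "Repealed"],
})

# Build one hover string per row. Plotly hover text accepts "<br>" as a line break.
df["hover_text"] = (
    df["city"] + ", " + df["state"] + "<br>" + "Status: " + df["status_text"]
)

print(df["hover_text"].tolist())
```

Because the string is derived from the other columns, re-running this line after a status change is enough to refresh what appears on mouseover.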

Additionally, since we want to show the success of this campaign, we want to be able to update the status of a city’s ordinance, as I mentioned earlier. To do that, we do the following:

Above you can see an example of a row of the data. I then create a “status” column and initialize it with 0 values, since all of the ordinances are currently in effect. For both the text I want to display and the color I want the marker to be, I create a dictionary and then “map” those dictionary values to the status codes 0 through 4. The outcome is the bottom row with the newly added columns.

This way, when I update my underlying data (say Aurora repeals their law, making their status a “4”), the text and color associated with that marker will automatically change to reflect that when I re-generate the html.
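The status-to-text and status-to-color mapping described above could look something like the following sketch. The dictionary wording, the rgba values, the column names, and the `apply_status` helper are all assumptions chosen to match the color scheme in this post; the real project may differ.

```python
import pandas as pd

# Hypothetical rows; only "Aurora" is mentioned in the post, the rest is invented.
df = pd.DataFrame({"city": ["Aurora", "Example City"]})

# Every ordinance starts out in effect, so initialize the status column with 0.
df["status"] = 0

# Status code -> display text and marker color (assumed wording and rgba values).
STATUS_TEXT = {
    0: "Active, no response",
    1: "Active, response indicates no immediate change",
    2: "Active, commitment to review",
    3: "Halted, commitment to review",
    4: "Repealed",
}
STATUS_COLOR = {
    0: "rgba(255, 0, 0, 0.8)",      # red
    1: "rgba(255, 105, 180, 0.8)",  # pink
    2: "rgba(255, 165, 0, 0.8)",    # orange
    3: "rgba(255, 255, 0, 0.8)",    # yellow
    4: "rgba(0, 128, 0, 0.8)",      # green
}


def apply_status(frame):
    """Recompute the derived text and color columns from the status codes."""
    frame["status_text"] = frame["status"].map(STATUS_TEXT)
    frame["color"] = frame["status"].map(STATUS_COLOR)
    return frame


df = apply_status(df)

# Later: Aurora repeals its ordinance, so update the code and re-derive.
df.loc[df["city"] == "Aurora", "status"] = 4
df = apply_status(df)
```

Keeping a single integer code as the source of truth means one edit per city; the text and color follow from the dictionaries the next time the map is generated.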

If you look back at the original code, you’ll notice that the marker color attribute is set to equal the value (the rgba code) in that column of that row of the DataFrame. Nifty.
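For readers without the original code in front of them, here is a condensed sketch of the data/layout structure, written as plain dictionaries (which Plotly accepts). The function names `build_figure` and `render`, the row keys, and the specific layout values are assumptions for illustration and not the project's actual code; `render` additionally assumes a recent version of the `plotly` package is installed.

```python
def build_figure(rows):
    """Return a Plotly-style figure dict for a geo-scatter plot.

    `rows` is a list of dicts with "lat", "lon", "hover_text", and "color"
    keys (hypothetical names chosen for this sketch).
    """
    trace = dict(
        type="scattergeo",
        lat=[r["lat"] for r in rows],
        lon=[r["lon"] for r in rows],
        text=[r["hover_text"] for r in rows],
        hoverinfo="text",  # show only our own text on mouseover
        mode="markers",
        marker=dict(
            size=8,
            opacity=0.8,
            color=[r["color"] for r in rows],  # one rgba string per point
        ),
    )
    layout = dict(
        title=dict(text="Example geo-scatter plot"),
        geo=dict(scope="usa", projection=dict(type="albers usa"), showland=True),
    )
    return dict(data=[trace], layout=layout)


def render(fig_dict, path="map.html"):
    """Write the figure to a standalone HTML file (requires plotly)."""
    import plotly.graph_objects as go  # imported lazily; build_figure needs no plotly

    go.Figure(fig_dict).write_html(path)


# Made-up coordinates and colors, purely for demonstration.
example_rows = [
    {"lat": 40.0, "lon": -100.0, "hover_text": "Example City, ST",
     "color": "rgba(255, 0, 0, 0.8)"},
    {"lat": 35.0, "lon": -90.0, "hover_text": "Sampleville, XX",
     "color": "rgba(0, 128, 0, 0.8)"},
]
fig = build_figure(example_rows)
```

Calling `render(fig)` would then produce a shareable HTML file, with each marker colored by the value stored for its row.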

The best way to learn how these maps work is to build one (feel free to grab the code from my repo and make it your own; it’s under an MIT license) and play around with the attributes to see what happens. Rather than generating HTML, you can have the map display inline in Jupyter (the notebook in the repo explains how; it’s very easy).

The key takeaway here is that not only did I get a chance to learn a lot about how to construct a plot like this (gained a skill), but this plot provided a concrete benefit to the NLCHP instead of sitting stagnant in my GitHub.

An Appeal To Scrappiness

One final note. In the above map, you’ll notice I’ve added something that appears to be a text box where it says “Note:…”. This is, interestingly, not a text box. I could not get a text box to show up the way I wanted it to for the life of me, so I switched all my rgb values to rgba values (the “a” is the alpha channel, which sets the opacity of the color, where 0.0 is fully transparent and 1.0 is fully opaque). I then created an invisible ‘phantom’ trace with only one point that had its opacity set to 0. There’s a point on that map that you can’t see and it only exists for one purpose:

To let me place the ‘subtext’ I wanted on the side of the map, as the “legend” label of that invisible trace.
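The phantom-trace trick can be sketched roughly as follows, again using plain dictionaries. The helper name `add_note_trace`, the placeholder coordinates, and the `hoverinfo="skip"` setting are assumptions added for this illustration rather than a copy of the original code.

```python
def add_note_trace(fig_dict, note):
    """Return a copy of the figure with an invisible one-point trace whose
    legend entry carries `note`."""
    phantom = dict(
        type="scattergeo",
        lat=[0],  # placeholder coordinates; the point is never visible
        lon=[0],
        mode="markers",
        marker=dict(color="rgba(0, 0, 0, 0)", opacity=0),  # fully transparent
        hoverinfo="skip",  # assumption: also suppress mouseover on the hidden point
        name=note,         # this string is what shows up in the legend
        showlegend=True,
    )
    # Build a new figure dict so the caller's original is left untouched.
    return dict(fig_dict, data=list(fig_dict["data"]) + [phantom])


base = {"data": [], "layout": {}}
with_note = add_note_trace(base, "Note: example subtext")
```

Since the legend simply lists trace names, the note text lands in the legend area while the marker itself stays invisible.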

Point being, don’t be afraid to be scrappy and find a way to make things work the way you want them to.

Plotly is also available for R and JavaScript, in addition to Python.

The full project code and resources can be found on my GitHub:

Conclusion

If you’re looking to build new skills, I encourage you to reach out to your network or to a nonprofit you particularly like and see if they have any use for your skills. Not only can you end up with a great project for yourself, you can end up with a meaningful and impactful project that resonates well outside the annals of your GitHub repos.

Additional Resources

Working with geographic data in Matplotlib using the Basemap toolkit:


mattmecoli

Data nerd. Science geek. Data science grad from Flatiron School.