Community Spotlight

Adapting Data Science Tools for Social Impact in Philanthropy

Data Science at the Rockefeller Foundation

Published in Towards Data Science

Sep 8, 2021

In the Community Spotlight series, TDS Editors chat with members of the data science community about the exciting initiatives that help push the field forward. Today, we’re thrilled to share Elliot Gunn’s conversation with Madeline Lisaius, Lead Data Scientist, and the Statistics and Machine Learning team at the Rockefeller Foundation, a philanthropic organization founded in 1913 that works to solve global challenges related to health, food, power, and economic mobility.

What does data science look like at the Rockefeller Foundation? What does a typical day look like for the team?

Data has actually been at the core of the Rockefeller Foundation’s approach to realizing impact since its founding in 1913. At the time, taking a hypothesis- and data-driven approach was called “scientific philanthropy.” The Rockefeller Foundation Statistics and Machine Learning team was originally envisioned to apply the newest analytical techniques to our existing philanthropic activities, strengthening our data-driven approach, and to offer data science as a consultation service or tool alongside monetary resources.

Building on that vision, the team now sits under the broad umbrella of the Innovation Team and is only about two years old. Although that might sound like a while, we are still figuring out what we want data science in philanthropy to look like for the Foundation. Having a stable source of funding and a mission to improve the well-being of humanity creates an incredibly privileged space to test, fail, and imagine in our work.

Today, the team is made up of three full-time data scientists as well as a group of consultants who support us; we do everything from supporting data-related inquiries at the Foundation to leading independent projects that are aligned with the work of our initiatives.

Because our work can vary so much, there is no typical day — I’ve had days crunching with consultants to get materials ready for an Initiative’s project launch, but also days without a single meeting where I am debugging my model. Currently, our tech stack includes Domino as well as specific platforms for niche work — for example, I use Google Earth Engine primarily for my remote sensing work.

How does the team select and proceed with projects to work on?

There are two main avenues through which we identify projects: a proposal or request from an internal initiative, or an idea from within our own team. We discuss a potential project internally and with trusted partners, and if it is approved, we move forward with a projected timeline and budget. From there, every project is very different.

For projects that others pitch to us or ask us to help with, we usually join at a stage where we already know the work is a reasonably good idea, and we hand off the results immediately; some of these projects take less than a day.

For projects that we envision and develop ourselves, pitching, developing, and redeveloping can take anywhere from a few months to a year. Because we are such a small team serving such a big organization, everyone associated with the team juggles many roles, so these workflows and timelines also absorb a lot of unrelated work and figuring things out along the way.

Image courtesy of the Rockefeller Foundation

Could you tell us about the different kinds of data science initiatives or projects at the Rockefeller Foundation?

I am extremely proud of the hand-harvested crop mapping project that I recently wrote about on Towards Data Science. Last spring, as Covid-19 reached the U.S., it became clear that farm workers faced increasing barriers and, by extension, that an already difficult labor shortage was about to get worse. I launched the hand-harvested crop mapping project to detect, at the level of 10 m by 10 m plots, every hand-harvested crop in the Central Valley of California, so that we could track how and where producers might change their growing decisions in response to the labor shortage and other stressors.

The Rockefeller Foundation thinks a lot about access to protective foods (foods that help prevent diet-related disease), and understanding shifts in the production of produce is essential to that conversation. The largest barrier, and also the largest opportunity, was that the Foundation did not yet have any platform or policy around remote sensing. I had the chance to shape our approach to remote sensing, including our values, priorities, and goals for remote sensing work, and also to lead our tech stack development. Aside from the challenges and joys of all this organizational development work, it was incredible to see new, simpler methods succeed on an under-addressed task. Very few organizations would create an environment for the amount of experimentation needed to produce the hand-harvested crop mapping model, and it was such a privilege to bring our original vision and hope to fruition.

Other projects from our team include mapping areas of economic opportunity in New York using satellite imagery, creating indices to capture the diverse challenges to well-being in U.S. cities (submitted to TDS as a separate blog post), and supporting a model that predicts the locations of lead-containing pipes in cities.

What challenges has the team faced?

One of the greatest challenges that the Statistics and Machine Learning team faces is that there are no established models of applied statistics or data science teams in philanthropy for us to draw from as we create our frameworks and approaches to the work. It’s very exciting to pull from the best parts of the public, private, and academic realms to create a new way of thinking about and doing data science, but it takes time and intentionality.

Second, we think and work a lot on the question of data science for social impact. Most data science frameworks are oriented towards profit, but “social good” is not so clearly defined, nor is it measured except by proxy. The associated challenge (and opportunity) is that not all approaches developed in data science apply to the social questions we are interested in exploring, so we are left investigating the possibilities of proxy datasets and new ways of imagining applied statistics. In these ways, our challenges often veer into the philosophical realm as much as the technical.

What made you decide to write about the Satellite-Based Mapping project? What do you hope for readers to take away from it?

One way the Statistics and Machine Learning team thinks about the impact of our technical work is in terms of reaching as many relevant people as possible. Unlike academics, we aren’t compelled to publish exclusively (or at all) through journals, which means we can get work out more quickly and more informally. That allows active discussion of our projects as they happen rather than years later, and, we hope, puts useful tools into the hands of people who can use them now.

By publishing on Towards Data Science, we see a pathway to reaching technical peers and starting conversations; as we explore the role of data science in philanthropy, two-way communication about our approach and work is essential. For that reason, writing and communicating about our work is a core part of how and why we do what we do. In the hand-harvested crop mapping work, it has always been clear that what makes the work exciting is not only its possible applications to questions of labor, agriculture, land use, and more, but also the particular technical approaches used. Towards Data Science was a natural home for this discussion.

One consideration I grappled with, both in my work and in preparing a piece for Towards Data Science, was how to make sure the work was rigorous and informally peer-reviewed without the framework of an academic journal. Building an approach to transparency and rigor is something that I believe will be essential to the future of data science in philanthropy. Overall, I hope readers are inspired to engage with, learn from, and challenge my work.

What kind of writing in data science do you enjoy, and what would you like to see more of?

Our team enjoys engaging with all sorts of sources, written and otherwise, for our data science and machine learning work. We are broadly interested and invested in ways to amplify previously underrepresented and silenced voices in data science and machine learning, and we always want to see more of these perspectives and opinions. We appreciate reading about what hasn’t worked and wish there were more conversation about failures and disappointments on the way to success. Finally, our team believes that fancier isn’t always better, and we enjoy learning how “simple” approaches are used skillfully and elegantly to address a challenge.

What are your hopes for the data science community in the next couple of years? What role(s) do you see the Rockefeller Foundation playing as a leader in the data science non-profit space?

I have many hopes for the data science and machine learning communities around the world.

My first and greatest hope concerns the newness of data science as a field and the calls for “data-driven” work: I see a lot of room for people working in data science and machine learning to coalesce around a shared identity and set of principles. Across industries, the term “data science” is used to refer to data visualization, applied statistics, some software engineering, and much more. In the future, I hope that we can help define, and then share publicly, what is and isn’t part of data science.

Additionally, well-meaning colleagues sometimes ask people in the field to “prove” or “show” relationships that don’t exist in the available data. I’m hopeful that, with time, there will be community standards for which kinds of requests we should and should not respond to.

More aspirationally, I am hopeful that AI advancements fueled by for-profit companies can be adapted and repurposed for social questions. As for the Rockefeller Foundation’s role in these futures, it’s hard to say at the moment. We are excited by some projects that are realizing meaningful impact, but we are still in a learning and listening mode and are still thinking a lot about how to lead data science and machine learning in philanthropy, let alone the world.

It is also worth noting that our applied data science shop sits within the broader Innovation Team, which has provided funding to leverage data science for social impact. Some of our funded partnerships have included Atlas AI, which advances AI methods to estimate population characteristics, economic conditions, agricultural productivity, and infrastructure access across emerging markets, and data.org, a platform that harnesses the power of data to tackle society’s greatest challenges.

Curious to learn more about data science at the Rockefeller Foundation? Follow them on LinkedIn and Twitter. Here are other articles that share case studies of projects using machine learning for social good.

“Using Satellites to Map Economic Opportunity in New York City” (May 2020, TDS): Madeline Lisaius shares how the team used satellite imagery to identify economic opportunity zones in New York City.
“A More Accessible and Replicable Method for Satellite-Based Mapping of Hand-Harvested Crops in California” (August 2021, TDS): Madeline Lisaius shares how the team created an accessible and replicable early mapping tool for hand-harvested crops.
“The Race to Eliminate Lead-Contaminated Drinking Water” (May 2021, the Rockefeller Foundation): The data science team built a Lead Pollution Dashboard in partnership with BlueConduit, whose mapping algorithms accurately predict the locations of lead pipes, supporting lead removal efforts.

Written by TDS Editors