
My Story as an Online Volunteer with UNV on COVID-19 Data Projects

#Data for Social Impact: Reflections on Open Data and Crowdsourcing for Good

Data for Change

By Hua Deng, Online Volunteer with the UNV online volunteering service for Regional Innovation Centre, UNDP Asia-Pacific.

COVID-19, the defining topic of 2020, has weighed on everyone’s mind, and I am no exception. As a graduate student with expertise in data science, I was eager to contribute to this worldwide battle against the epidemic, but I did not know how to get involved. Fortunately, through the UN Online Volunteering service, I became engaged with the Regional Innovation Centre at the UNDP Bangkok Regional Hub, providing support on COVID-19 data analysis for Southeast Asia.

I wondered whether there were other students and professionals like me who would love to volunteer their data science expertise for social good but found it hard to get an opportunity.

When I was searching for opportunities, I did find a few: interning at non-profit organizations; joining academic research; DrivenData, which holds data science competitions in areas of social impact; and DataKind, which provides volunteering opportunities on a project basis. Still, overall, there were not enough opportunities available, or the ways to find them were not widely known.

Why?

Data sharing is a real obstacle when there is no established data governance system. In that case, projects using open data, or projects that start by collecting their own data, raise fewer concerns. For example, according to a report in The Telegraph, Wiku Adisasmito, spokesperson for the Indonesian government’s national COVID-19 task force, said the government "appreciates" the reports from independent parties like Report Covid-19, a citizen-led data science campaign that asks citizens to report suspected COVID-19 deaths and produces analysis based on those reports.

In addition, nonprofits may lack experience organizing such "data for good" opportunities for people who have data science expertise and the passion to devote their time. They may have seen few past cases of engaging volunteers on data science projects, and hence have little or no idea of what could be done or how it could be arranged. This motivated me to write this blog to share my experience working as a volunteer on data-oriented projects, along with my reflections on the core steps of "data for good" projects. You can find the blog about my project here and the Tableau dashboard here.



Reflection #1 – How to scope a data project?

To be honest, it took us a very long time to scope the project. We were very flexible at first and had no fixed deliverable, as the project was essentially exploratory. As a result, we spent a lot of time deciding what to do, vacillating among multiple potential proposals, rather than actually doing it.

We finally decided to build an interactive dashboard presenting our analysis of quarantine policies in Southeast Asia. We believed the dashboard could act as an amplifier that inspires further work, and our analysis could serve as a demo or a spark. Moreover, considering our potential audience, we reckoned data visualization was the most widely accessible form.

To reflect upon my experience: when the scope of a data project is vague, it helps to first pin down any of the following aspects: the question, the data source, and the category of analytics. Scoping then becomes an iterative process of evaluating feasibility and adjusting the plan accordingly.

First, a good data project starts from a good question, especially a question emerging from real work. Identifying a clear "question owner" works best; at minimum, you should understand your audience and the topics they care about. Second, if someone on the team is familiar with the available data sources, it saves searching time and helps you quickly assess the feasibility of your proposal. Third, recognize the category of analytics the question falls into: descriptive, diagnostic, predictive, or prescriptive. These four categories respectively answer what happened in the past, why it happened, what is likely to happen in the future, and what action should be taken to affect the outcome. The nature of the question already largely determines the methodologies to use and the form of the deliverables.

Sometimes the question is not clearly defined: the client or stakeholder may not have a specific question in mind, and they may not know what data analytics can help with. In this circumstance, the project can start from understanding the context, the objectives, and existing projects, and in-depth communication is needed to spark brainstorming and develop actionable proposals.


Reflection #2 – How to search for open data effectively and efficiently?

When typing in "COVID-19 open data," one is likely to be excited, and then overwhelmed, by the number of data sources. Unlike academic publishing, which has a very mature process in which published papers are well documented and easily searched, data publishing is still quite decentralized, with no widely accepted standards. Open data sources are scattered across the internet, and you have to find and understand them case by case. I sincerely hope good practices will emerge so that we suffer less from looking for data by "boiling the ocean."

Here are five tips for searching for open data effectively and efficiently.

  • Targeted Keyword Search – Instead of a general term, use targeted keywords to get sources that meet your needs. For example, "COVID-19 data" + topic + other specifications.
  • Collection Repository – When you don’t have a clear target, look for a repository or data portal, where you gain immediate exposure to abundant sources. The challenge is finding the source that fits your project among so many choices. Examples include #Data4COVID19, Our World in Data, AMiner, etc.
  • Linkage and Citation – You can also find good data sources by tracing references. Authors of graphs, analysis blogs, and papers usually cite their data sources, and some data owners recommend other good data sources on their sites.
  • Relevant Institutions – If you know specifically what data you want and who might be its first-hand collector, go directly to their website and look for it. The major provider types are governments, NGOs, private sector companies, and academic institutions. It also helps to know what types of data they generally provide, which is intuitive if you consider where their data comes from.
  • Social Network – I didn’t expect this to be helpful until we actually found some good resources there. I then realized that data owners also need a channel to advertise their data for better reach, especially the data-for-good programs of private sector companies, projects from academic institutions, and NGOs committed to promoting the use of open data. Once you follow those institutions or their representatives, it is easy to stay updated on new, innovative data sources and on good projects built on them.

Reflection #3 – How to make a differentiated analysis?

COVID-19 is a global crisis, and the community is working together to combat the pandemic as a whole. To prevent redundant work, it is important to check what has been done and what remains to be done. Here are some aspects you can start from.

1. Combine different data sources.

Data publishers usually accompany their data with simple visualizations or reports, but they rarely associate their own data with other data sources. If you can find associations between different data sources, you are likely to find angles that others haven’t considered before.
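As a minimal sketch of this idea, two hypothetical open-data extracts (a policy-stringency table and a mobility table, with made-up column names and values, not our project’s actual data) can be joined on their shared keys with pandas:

```python
import pandas as pd

# Two illustrative open-data extracts (columns and values are hypothetical)
policy = pd.DataFrame({
    "country": ["Philippines", "Philippines", "Vietnam"],
    "date": ["2020-04-01", "2020-05-01", "2020-04-01"],
    "stringency": [96.0, 90.0, 70.0],
})
mobility = pd.DataFrame({
    "country": ["Philippines", "Philippines", "Vietnam"],
    "date": ["2020-04-01", "2020-05-01", "2020-04-01"],
    "mobility_change_pct": [-65, -50, -30],
})

# Join on the shared keys so policy stringency and mobility
# can be compared side by side for each country and date
combined = policy.merge(mobility, on=["country", "date"], how="inner")
print(combined)
```

Once the sources are aligned on common keys, questions that neither dataset can answer alone (e.g., how mobility responded to policy changes) become straightforward to explore.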

2. Drill-down or roll-up for different granularity levels.

Depending on the context of the analysis, different granularity levels may turn out to be appropriate. For example, in our project it was less appropriate to look at the Philippines’ mobility index at the country level, because the Philippines enforced its quarantine policy differently at the sub-region, province, and city levels.
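Rolling finer-grained records up to a coarser level is a one-line aggregation in pandas. The city-level figures below are invented for illustration:

```python
import pandas as pd

# Hypothetical city-level mobility records
df = pd.DataFrame({
    "province": ["Metro Manila", "Metro Manila", "Cebu"],
    "city": ["Manila", "Quezon City", "Cebu City"],
    "mobility_change_pct": [-70, -60, -40],
})

# Roll up: aggregate city-level values to the province level
by_province = (
    df.groupby("province", as_index=False)["mobility_change_pct"].mean()
)
print(by_province)
```

Drilling down works the other way: keep the finest-grained table and filter or group to the level that matches how the policy was actually applied.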

3. Narrow the scope to make meaningful comparisons.

Many data sources are extremely comprehensive, which provides flexibility for different analysis purposes. In fact, it is not necessary to use all available data. Rather, narrowing the scope can clear out the noise and make patterns obvious. It is important to make only meaningful comparisons, based on considered judgment.
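In practice this is often just a filter before any comparison. In the sketch below (hypothetical figures), restricting a global table to one region keeps a very different context from dominating the picture:

```python
import pandas as pd

# Hypothetical country-level figures from a comprehensive global source
df = pd.DataFrame({
    "country": ["Philippines", "Vietnam", "Thailand", "Germany"],
    "region": ["Southeast Asia", "Southeast Asia", "Southeast Asia", "Europe"],
    "cases_per_million": [120.0, 3.0, 45.0, 2000.0],
})

# Narrow the scope: compare only Southeast Asian countries, so that
# a country with a very different context does not skew the comparison
sea = df[df["region"] == "Southeast Asia"]
print(sea)
```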

4. Introduce context information and cross-disciplinary domain knowledge.

This is the most important lesson I learned from our project. Data is very informative, but it also has limitations. Certain insights can only be derived with an in-depth understanding of the context and domain knowledge, which provide hints and clues that help structure your exploration of the data and make sense of the results. For example, in our project, without understanding the Philippines’ quarantine policy, we could never have explained the mobility trends. Solid analysis depends on both rigorous quantitative assessment and insightful qualitative judgment.


Thank you for reading! We hope this blog helps people learn from our experience of utilizing open data and encourages more people to join the community working on data for social impact. You can find the previous blog about the project here and the Tableau dashboard here.

