The real meaning and process of data democratization.

How to do it and why to do it right now…

Simonas Žilinskas — Inta
Towards Data Science

The scene: a small village in rural India. The whole village has gathered to listen as public records are read out. A villager is listed in the public record as having rented out his plough to the government-sponsored irrigation project. “No,” he says, “I did not do that. I was away in Delhi at my cousin’s wedding at that time.” There is laughter, as well as outrage, as people immediately discover how they have been tricked and how public money has been siphoned away from them and their village. More false information is revealed, such as bills for the transport of materials over 6 km when, in fact, the real distance is just 1 km. A worker, employed according to government records on the construction of a new canal, stands up and asks: “What canal?” Workers involved in the building of houses confirm that fifty bags of cement, not one hundred, were supplied and used. At the end of the public hearing the chant goes up: “What do we want? Information. What do we want? Information.” (A story told by Dr. Richard Calland.)

Over the last fifty years, many states and their governments have formally acknowledged a right of access to information in response to such claims. More and more nations, South Africa for example, have entrenched a right of access in their constitutions. Formal recognition of access rights is a no-brainer if institutions strive to maintain popular legitimacy.

The road to more democracy is the road of data democratization. A strong correlation exists between access to information and the existence of democracy. Simply said: to make the right decision and to tick the right box on the ballot, the citizen must be informed. Access to information is essential for people to realize their basic right to participate in the governing of their country and to live under a system built on the informed consent of the citizenry. In any state, and particularly in states where the policy analysis capabilities of civil society are poorly developed, political participation rights cannot be exercised effectively without access to government information.

Sounds obvious? Great, let’s move on. Apart from talking about that “ambitious ambition” of mine, I would like to cover a little theory on data democratization in general, answering two main questions, “why now?” and “how?”, in the opposite order.

How to democratize data?

Well, the solution I found is building a search engine. It is obviously not the only way to do it, far from it. But it is one that is flexible and at the same time powerful, thanks to the emergence of data science.

I see three main steps in the process I chose, which I’ll cover one by one:

1. Making data easy to find

Most of the data that could be democratized is available somewhere. But the problem is that “somewhere” is more or less the same as “nowhere” when it comes to data usefulness. If only political campaign managers see the data, you cannot call this democratization. The problem lies in the wrong KPIs (key performance indicators) of the people who publish data. Most of them focus on pushing as much data as possible onto the internet instead of focusing on how many people will see this data.

If we finally agree on changing this focus, how do we get more people to see the data? This needs a good search engine with good search engine optimization. A good search engine is an intelligent one. In contrast to a simple search engine, which only looks for keyword matches, an intelligent search engine interprets the query: it identifies or deduces the nature of the result the user wants, identifies important parameters such as the time and location mentioned in the query, and finally takes the context of the search into account. Once the engine is usable, it also needs to be used. Reaching users can be done in numerous ways. Statistics show that data is mostly searched for through search engines like Google. Thus, decent SEO is a crucial step towards the goal of making data easy to find.
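The query interpretation described above can be sketched in a few lines of Python. This is only a minimal illustration under loud assumptions, not the engine I am building: the location list, the intent hints, the stopword list and the names `ParsedQuery` and `parse_query` are all hypothetical, and a real system would rely on proper language-processing tools rather than hand-written rules.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, tiny location list; a real engine would use a full gazetteer.
KNOWN_LOCATIONS = {"india", "south africa", "france", "lithuania"}

# Hypothetical trigger phrases mapped to the kind of result the user wants.
INTENT_HINTS = {
    "how many": "count",
    "average": "statistic",
    "gdp": "statistic",
    "who": "entity",
}

# Hypothetical stopword list, just large enough for the examples below.
STOPWORDS = {"the", "of", "in", "a", "an", "what", "was", "is", "for", "how", "many"}


@dataclass
class ParsedQuery:
    raw: str
    intent: str = "unknown"
    years: list = field(default_factory=list)
    locations: list = field(default_factory=list)
    keywords: list = field(default_factory=list)


def contains_phrase(text, phrase):
    """True if `phrase` occurs in `text` as whole words."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def parse_query(raw, user_location=None):
    """Turn a free-text query into structured search parameters."""
    text = raw.lower()
    parsed = ParsedQuery(raw=raw)

    # 1. Deduce the nature of the result wanted (first matching hint wins).
    for hint, intent in INTENT_HINTS.items():
        if contains_phrase(text, hint):
            parsed.intent = intent
            break

    # 2. Identify time parameters: four-digit years from 1900 to 2099.
    parsed.years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", text)]

    # 3. Identify location parameters mentioned in the query.
    parsed.locations = sorted(
        loc for loc in KNOWN_LOCATIONS if contains_phrase(text, loc)
    )

    # 4. Use the search context: fall back to the user's location if none is given.
    if not parsed.locations and user_location:
        parsed.locations = [user_location.lower()]

    # 5. Whatever remains is treated as plain keywords for matching.
    location_words = {w for loc in parsed.locations for w in loc.split()}
    parsed.keywords = [
        t for t in re.findall(r"[a-z]+", text)
        if t not in STOPWORDS and t not in location_words
    ]
    return parsed


query = parse_query("What was the GDP of South Africa in 2015?")
print(query.intent, query.years, query.locations, query.keywords)
```

A keyword-only engine would treat “2015” and “South Africa” as just more words to match. Here they become filters, and the optional `user_location` argument stands in for the search context when the query itself names no place.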

2. Easy, understandable and high-quality data representation

Once the search engine is working and accessible, the next step is to display the results. This part seems simple, and at its core it is simple. But try to design and build something that looks and feels user-friendly yet is complete enough to convey the whole picture, and you’ll see that the task is easier said than done. In fact, most of the would-be solutions for data democratization are far from meeting these criteria. Here are the main problems that solutions face in displaying data in the best possible manner:

  • Lack of directness, caused by the way solutions treat their data: many group data in tables and thus fail to display individual values;
  • Factual imprecision kills the whole deal. Full precision can be tough to guarantee, but falling into this trap is fatal;
  • Not answering the question is among the most frustrating experiences a user can have;
  • Sources are necessary;
  • Data context is a must;
  • Representative visuals (like infographics) are a big plus for comprehension.

An example of poor quality data representation

An article with a more in-depth look at the data representation experience is coming soon…

3. Repeating the first two steps while increasing the quantity of data

The third step is the one that gets ignored all the time, in one of two ways.

Either a solution rushes in, adds all the possible data, and can’t handle it, or, in the opposite case, it adds too little data and users don’t get enough value from the product to use it.

Illustration by Katerina Limpitsouni

The first option is the most common one. We see sites like Statista.com that have lots and lots of data piled up and mismanaged, and thus poor search functionality and UX. They have data ranging from GDP per capita to Fortnite frag counts by esports professionals. And as is common with almost everything (with a few exceptions), choosing quantity hurts quality.

The opposite problem is self-explanatory.

Data democratization needs to find its speed equilibrium: the point where the quality of search and data is still high and the quantity of data is large enough to attract users.

Why democratize data now?

This sounds like an absurd question, knowing that in the introduction I advocated for data democratization. And, let me assure you, I did not change my mind. I just believe that right now is the best time to accelerate the process of making information spread faster and reach more eyes.

First of all, we have an immense amount of data coming out every day, progressively readier to be used. Governments are collecting more data than ever and opening it up to the public, and the quality of this data is increasing, although the speed and extent of this positive trend vary greatly by country and region. Old-fashioned, corrupt mayors and government officials may create a short-term blockade, but in the long term and on a large scale this does not cause problems.

Data interpretation is also on the rise thanks to the progress in data science. I will publish an article (in collaboration with a few great data science and open data experts) on how AI can be used to get interesting insights from data.

Photo by Elena Koycheva on Unsplash

Second, we have the resources to make people’s eyes follow this data. The progress in user experience (in its broadest definition) makes me believe that it is possible for “boring” data that is important for decision-making to reach large audiences. In the last 10 years, the collaboration of designers and engineers has resulted in new types and new designs of interfaces that are more convenient, better looking and, in general, deliver better user experiences. Although attention spans are decreasing, we are learning how to get this attention more frequently. Almost no one spends 2 straight hours on Instagram, yet if you sum up the daily time spent on it, you might discover a different story. Tell me if I’m naive, but I think that Instagram has no monopoly on a wonderful user experience, and that other products, like ones presenting statistical information, can compete equally for attention.

Last, but probably most importantly, now is the time to do it. Not tomorrow. We have upcoming elections, we have numerous crises and an enormous number of decisions to be made. If we follow our gut and not our data, we are going to lose in most of these situations.

Here’s a question for you. If two candidates are running for reelection, would the more rational decision be based on their new campaign promises, or on their track record of participation in sessions during the last mandate and their voting history?

I would choose the second option. And I believe that if more people chose this second option, society would be better represented.

I think that’s a cause worth attention. I even think that’s a cause worth action.

In a public diary that I recently started writing, I announced the kickoff of a brand new project I needed a team for. At the moment of writing this, I already have a substantial part of the team formed and I’m in the process of clearing the dust off my idea in my own and in others’ minds. If you feel like you can contribute, feel free to write to me at simonas.zilinskas@sciencespo.fr.
