Air-pollution monitoring

Air pollution is responsible for 4.2 million deaths per year, according to the World Health Organization (WHO). It is no wonder, then, that we should dedicate resources to understanding and monitoring air quality in our cities and neighbourhoods. Doing so can help authorities with urban planning, as they can decide where to plant trees, build green spaces and manage traffic. It can also make us all aware of the impact of air pollution on our everyday lives, which is critical to our health.
In this article, we approach air pollution from a different angle. We want to introduce and discuss the concept of crowdsourced air-quality monitoring. For those unfamiliar with crowdsourcing, it is about engaging the public to achieve a common goal, by dividing the work among participants into small tasks; in this case, collecting air-quality measurements. Our aim is to use crowdsourcing in a smart way to build an accurate air-pollution heatmap with the help of Artificial Intelligence (AI).
We argue that crowdsourcing can potentially be a better approach than the static air-quality sensors currently placed in cities. First of all, not every city or town has one, and when they do, the sensors are typically placed to capture the average air quality of an area. This means they do not necessarily reflect the pollutants we breathe in when, for example, we are walking through the city centre. We also need to consider the cost of acquiring, maintaining and operating those static sensors.
In contrast, the crowdsourcing proposal relies on our willingness to participate. It would, however, also require low-cost mobile air-quality devices that take readings wherever we happen to be.
Importantly, we do not want to be actively taking measurements everywhere at all times, for practical reasons as well as data-privacy ones. For starters, the sensor would have to operate continuously. It would also have to track our location and take timestamped air-quality measurements, even when we are indoors or the device is in a bag or pocket. As a consequence, the battery would be quickly depleted. To top it all off, this tiny sensor would know more about your movements than your significant other. Not great.
Thus, the challenge is to identify when and where air-quality measurements should be taken to monitor our city efficiently.
The optimisation problem should be coming into focus by now. Given constraints on how often we are willing to contribute and how much battery the device has available, we want to identify the best places to take measurements, such that those measurements are the most useful for exploring the environment efficiently and, consequently, for improving our understanding of it.
To solve the problem, we first need to answer some key questions. Specifically: how is our environment represented? How much does each measurement contribute to the overall picture? How do different measurements affect our understanding of air quality?
In essence, we need a model of the environment. This will help us quantify the information contained in each reading, and understand how each measurement affects the overall information collected over time.
A good candidate approach is the use of Gaussian Processes. For the more technical readers, a Gaussian Process is a regression technique that naturally provides a predictive mean and variance for its estimates. In practice, this means we can use it to interpolate air quality across the environment and over time, given that some measurements have been taken. Put differently, we can predict the air quality at unobserved locations (locations where no measurements were taken), as well as predict the state of the environment in the future. Importantly, a Gaussian Process also quantifies how certain we are about the air quality at each location, via its predictive variance.
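To make this concrete, here is a minimal sketch of Gaussian Process regression in Python using scikit-learn. The sensor coordinates and PM2.5 readings below are made up for illustration, and the kernel choice is an assumption of this sketch, not the exact model from the paper.

```python
# A minimal sketch of Gaussian Process regression for air-quality
# interpolation, using scikit-learn. All coordinates and readings
# are made up for illustration.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical sensor locations (longitude, latitude) and PM2.5 readings.
X_train = np.array([[116.39, 39.91], [116.32, 39.99], [116.46, 39.92]])
y_train = np.array([85.0, 60.0, 72.0])  # e.g. PM2.5 in ug/m^3

# An RBF kernel models smooth spatial correlation; WhiteKernel adds sensor noise.
kernel = 1.0 * RBF(length_scale=0.05) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X_train, y_train)

# Predict air quality (mean) and uncertainty (std) at an unobserved location.
X_new = np.array([[116.40, 39.95]])
mean, std = gp.predict(X_new, return_std=True)
print(f"Predicted PM2.5: {mean[0]:.1f} +/- {std[0]:.1f}")
```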
[Figure: Gaussian Process interpolation of air quality across Beijing, based on static sensor readings]
In statistics and information theory, variance is a measure of uncertainty. Our goal is to drive the uncertainty out of our heatmap: we want to know, as confidently as we can, the air quality in our area. To understand visually what we mean by interpolation, have a look at the figure above, which shows a Gaussian Process applied as a regression technique to interpolate air quality across the city of Beijing.
[Figure: predictive uncertainty (variance) of the Gaussian Process around the static sensors]
As we noted, however, a Gaussian Process also gives you an understanding of the uncertainty across the area. The figure above shows that uncertainty is low around the static sensors, starts to rise between them, and becomes very high further away from them.
Now, back to our crowdsourcing approach. How does this modelling approach even help us?
Let’s think about it for a moment. How would the equivalent figures look if, instead of static sensors, we modelled measurements taken by volunteers? As we said, it is hard to take measurements at all times; we are more likely to take them when we are outside, or simply when we feel like it. Probabilistically, more people are clustered in the city centre and perhaps near popular attractions. No need to overthink this part. Here is an example of such a setup below, with some volunteers in the city centre and some near the Summer Palace.
[Figure: predictive uncertainty when measurements come from volunteers clustered in the city centre and near the Summer Palace]
What we can tell from this figure, however, is that maybe the static-sensor setup was not that bad after all. The variance, or uncertainty, about the environment is high almost everywhere (shown in yellow in the figure). On the bright side, people in the city centre or near popular attractions can be confident they know what the air quality is like where they are. But this is not very useful for urban planning.
We can do better. We have not really solved the problem yet, as we have not guided or alerted people about when and where to take readings. This was merely a plausible default setup, given how people move and where they are more likely to be willing to take measurements.
Nevertheless, we have managed to evolve our understanding, and we can now formulate the problem in the context of our modelling method. Specifically, we need to take a set of measurements such that the information gained, as measured by the Gaussian Process, is maximised.
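One simple way to quantify "information gained" is sketched below: score a candidate set of measurement points by how much it shrinks the model's total predictive variance over the locations we care about. The function name information_value and its interface are my own illustrative choices for this sketch, not the paper's formulation.

```python
# A hedged sketch of scoring a candidate set of measurement points by
# the reduction in summed predictive variance over a grid of locations.
# Builds on the GP sketch above; all names here are illustrative.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def information_value(kernel, X_obs, y_obs, X_candidates, X_grid):
    """Variance reduction over X_grid if we also measured at X_candidates."""
    # Fit with fixed kernel hyperparameters (optimizer=None) so the
    # before/after comparison is apples to apples.
    gp = GaussianProcessRegressor(kernel=kernel, optimizer=None)
    gp.fit(X_obs, y_obs)
    _, std_before = gp.predict(X_grid, return_std=True)

    # With fixed hyperparameters, GP predictive variance depends only on
    # *where* we measure, not on the observed values, so placeholder
    # zeros suffice for the new points.
    X_aug = np.vstack([X_obs, X_candidates])
    y_aug = np.concatenate([y_obs, np.zeros(len(X_candidates))])
    gp_aug = GaussianProcessRegressor(kernel=kernel, optimizer=None)
    gp_aug.fit(X_aug, y_aug)
    _, std_after = gp_aug.predict(X_grid, return_std=True)

    return float(np.sum(std_before**2) - np.sum(std_after**2))
```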
A solution to this problem requires algorithms from the broad area of Artificial Intelligence. Specifically, we need an intelligent system to decide when and where measurements should be taken so as to maximise the information gained about air quality while minimising the number of readings needed. The system can employ greedy search techniques combined with meta-heuristics such as stochastic local search, unsupervised learning (clustering) and random simulations. These are just some potential techniques; rather than expanding on each of them, I will give an overview of how such an algorithm may look.
The main idea is to simulate the environment over time, asking what-if questions. What if I take a measurement now versus one at night? Which one would give the best possible outcome? Are both necessary? What if I take a measurement downtown, or what if I take it near my home?
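Building on the information_value sketch above, here is a hedged example of answering one such what-if question: comparing a hypothetical reading downtown against one near home. In the full problem, time would be an extra input dimension alongside longitude and latitude; this sketch keeps things spatial for brevity, and all coordinates are made up.

```python
# Comparing two hypothetical what-if options using the
# information_value sketch above. All coordinates are illustrative.
import numpy as np
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

kernel = 1.0 * RBF(length_scale=0.05) + WhiteKernel(noise_level=1.0)
X_obs = np.array([[116.39, 39.91], [116.32, 39.99]])  # existing readings
y_obs = np.array([85.0, 60.0])

# A coarse grid over the area whose uncertainty we want to reduce.
lons, lats = np.meshgrid(np.linspace(116.25, 116.50, 10),
                         np.linspace(39.85, 40.05, 10))
X_grid = np.column_stack([lons.ravel(), lats.ravel()])

downtown = np.array([[116.40, 39.91]])
near_home = np.array([[116.30, 39.98]])

# Whichever option shrinks uncertainty over the grid more is preferred.
print("downtown:", information_value(kernel, X_obs, y_obs, downtown, X_grid))
print("near home:", information_value(kernel, X_obs, y_obs, near_home, X_grid))
```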
Think about all the possible what-if questions we can ask: one for each person, at every moment. This problem is very hard to solve, even for the most powerful computer; it would require running many millions of simulations to cover all possible scenarios for each of our volunteers. Fortunately, we have more algorithmic tools in our pocket, as mentioned above. Clustering, for instance, can be used to group people in the same location and essentially treat them as a single entity, so the intelligent system does not need to run separate simulations for each volunteer, but only for each entity. We can group people not only in space but also in time: people who took measurements at a similar location at a similar time can likewise be treated as a single entity.
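As an illustration of this grouping step, the sketch below clusters volunteers by location and time of day with k-means. The choice of k-means and of these particular features is an assumption made for the example; the paper's exact clustering method may differ.

```python
# A minimal sketch of grouping volunteers in space and time so the
# planner reasons about clusters ("entities") rather than individuals.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical volunteer observations: (longitude, latitude, hour of day).
volunteers = np.array([
    [116.40, 39.91, 9], [116.41, 39.90, 10],   # city centre, morning
    [116.27, 40.00, 15], [116.28, 40.01, 16],  # Summer Palace, afternoon
])

# Scale so a degree of longitude and an hour are comparable, then cluster.
features = StandardScaler().fit_transform(volunteers)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)  # volunteers with the same label are treated as one entity
```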
Now that we have reduced the number of simulations required, we can use meta-heuristic approaches that evaluate candidate configurations in an order that progressively gets us closer to a better configuration, though not necessarily the best one. Even a configuration that is merely better than the default above can be useful for both the participants and urban planning. For instance, a greedy algorithm first iterates over all the possible entities over time and evaluates how the information collected changes if a single measurement is taken by any one of them. Proceeding in the same fashion, the algorithm can keep selecting the best measurement points without having to evaluate every possible setup.
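Here is a minimal sketch of that greedy loop, reusing the information_value helper from earlier: at each step it evaluates one extra measurement per remaining entity and keeps the one with the largest variance reduction. The function name and budget interface are illustrative, not the paper's exact algorithm.

```python
# A hedged sketch of greedy measurement selection: repeatedly pick the
# entity location whose single extra measurement yields the largest
# variance reduction, until the measurement budget is spent.
import numpy as np

def greedy_plan(kernel, X_obs, y_obs, entity_locations, X_grid, budget):
    """Pick up to `budget` measurement points, one greedy step at a time."""
    chosen, remaining = [], list(map(tuple, entity_locations))
    for _ in range(min(budget, len(remaining))):
        # Score each remaining entity by the value of adding it to the plan.
        scores = [
            information_value(kernel, X_obs, y_obs,
                              np.array(chosen + [loc]), X_grid)
            for loc in remaining
        ]
        best = int(np.argmax(scores))
        chosen.append(remaining.pop(best))
    return np.array(chosen)
```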
[Figure: predictive uncertainty under the measurement locations recommended by the intelligent system]
Finally, given the algorithmic process above, the intelligent system gives us a good possible solution. This means we have a recommended set of locations at which to take air-quality readings. As we can observe in the figure above, it is a much better setup than the one we previously had, and it is comparable to the static-sensor placement we discussed earlier. Just imagine having more volunteers: things can only get better, making the map above greener. Overall, air-quality monitoring can become easier and more efficient with the use of AI algorithms and crowdsourcing.
This article and its figures are based on a journal article: Zenonos, Alexandros, Sebastian Stein, and Nicholas R. Jennings. "Coordinating measurements in uncertain participatory sensing settings." Journal of Artificial Intelligence Research 61 (2018): 433–474.