Hands-on Tutorials
Analyzing Singapore’s Traffic with Python and OpenCV

COVID-19 profoundly impacted every aspect of our lives, from global supply chains to the way we work. During the pandemic, most countries implemented some form of lockdown, with varying levels of restriction. An interesting aspect of these lockdowns is their effect on day-to-day life in different parts of the world, with one of the most common preventive measures being a stay-at-home order. Singapore, a densely populated island city-state, provides an interesting test case for observing these impacts. Those two factors, combined with a strong government presence, should make the effects easy to see. Using the government’s traffic camera database and computer vision, we will attempt to understand how traffic near the city’s center was, and still is, affected by the pandemic.
The focus of this post is to answer the questions:
Did COVID-19 have an impact on the traffic in Singapore? If so, what was that impact, and what may have contributed to it?
The code in this article is meant as a supplementary tool for explanatory purposes that has been abbreviated and modified slightly from the original; it is not a walk-through but a possible scaffolding for how to approach the problem. If you are just interested in the data, feel free to skip to the results section.
DATA COLLECTION
Singapore has an incredible public database covering all aspects of the city and its functions. You can visit it at: https://data.gov.sg/. This project leverages the traffic cameras portion, managed by the Land Transport Authority. The API lets the user specify a timestamp and returns the data from every camera around the city. We can use the requests library in Python to pass a timestamp and receive a JSON object containing the location and a video frame for each camera.
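A minimal sketch of such a call might look like this (the endpoint below is data.gov.sg’s traffic-images API; the timestamp is just an example):

```python
import requests

# data.gov.sg traffic-images endpoint; expects an ISO 8601 date_time parameter
url = "https://api.data.gov.sg/v1/transport/traffic-images"
params = {"date_time": "2019-03-01T08:00:00"}

response = requests.get(url, params=params)
response.raise_for_status()
data = response.json()  # JSON object with an "items" list of camera entries
```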
The JSON object returned is a listing of each camera and various data including location, link to the video frame, and some metadata about the image. We are most interested in three aspects:
- camera_id: the camera’s identifier, nested in the "cameras" array of the response
- image: a URL to the frame the camera captured at that time
- location: GPS coordinates for the camera, nested in each camera object
In order to work with this, the JSON response needs to be normalized and parsed. This can be done with the pandas library, which we will use to manage our data throughout the project.
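One way this might look, flattening the nested "cameras" array into one row per camera:

```python
import pandas as pd

# Flatten the nested "cameras" array into one row per camera
cameras = data["items"][0]["cameras"]
df = pd.json_normalize(cameras)

print(df.shape)    # e.g. (87, 8)
print(df.columns)  # camera_id, image, location.latitude, location.longitude, ...
```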
This will leave us with a Pandas DataFrame of 87 rows x 8 columns.
This DataFrame contains all of the cameras around the city for a given timestamp. To reduce the amount of data collected and control the scope, we are going to limit the selection to just cameras 1702 and 1711. These are located on the CTE (Central Expressway) between the districts of Bishan and Serangoon. Being on a major road, any significant changes in traffic should be easy to spot. Below is an example of what each camera sees.


The next step is to establish a time frame in which to measure the traffic patterns. A two-month period from March 1st to May 1st for 2019, 2020, and 2021 in increments of every three minutes was chosen. This was selected for three major reasons:
- The inclusion of a three-year timeframe enables a pre-, mid-, and post-COVID evaluation, with 2019 serving as an example of a normal traffic pattern.
- This time frame is also important in terms of lockdown, as the strictest measures were introduced in April 2020 and restrictions were lifted over 2021.
- Sampling every three minutes keeps the dataset of images to label at a manageable size.
Using pandas, the framework for the bulk request can be established. First, make a DataFrame of timestamps with the date_range function; second, format them into strings that can be passed to the API.
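A sketch of both steps, assuming the 2019 window (the column names are mine):

```python
import pandas as pd

# Every three minutes from March 1st to May 1st, 2019
timestamps = pd.DataFrame(
    {"timestamp": pd.date_range("2019-03-01", "2019-05-01", freq="3min")}
)

# The API expects ISO 8601 strings, e.g. "2019-03-01T00:03:00"
timestamps["formatted"] = timestamps["timestamp"].dt.strftime("%Y-%m-%dT%H:%M:%S")
```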
Now that we have dates to loop through, we can leverage the concurrent.futures module from the standard library to make requests in parallel. It makes a bulk request for data a lot easier by spinning up a pool of workers that can all issue calls concurrently. A great guide on how to use futures can be found here.
Below is a basic outline of the function that will be used. Fair warning: each call for this still takes about 2 hours to complete.
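A possible shape for that function and the concurrent framework around it, assuming the endpoint from earlier (the worker count and timeout are arbitrary choices):

```python
import concurrent.futures
import requests

def api_imageRequest(date_time):
    """Request all camera data for one timestamp; return None on any failure."""
    try:
        response = requests.get(
            "https://api.data.gov.sg/v1/transport/traffic-images",
            params={"date_time": date_time},
            timeout=10,
        )
        response.raise_for_status()
        return response.json()["items"][0]["cameras"]
    except Exception:
        # Swallow failed calls so one bad timestamp does not stop the whole run
        return None

# Fan the ~29,000 timestamps out across a pool of worker threads
with concurrent.futures.ThreadPoolExecutor(max_workers=16) as executor:
    results = list(executor.map(api_imageRequest, timestamps["formatted"]))

# Drop the failed calls before moving on
responses = [r for r in results if r is not None]
```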
This will produce a massive list containing all of the responses. The next step is to filter the responses by the camera_id column and save each camera as its own .csv. I would also recommend saving the original list as a DataFrame so it can be referenced later if needed (so we do not have to redo the API calls for 2019 if we want to look at other cameras in the future).
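One way to do the filtering and saving, assuming the flattened responses from above (the file names are my own):

```python
import pandas as pd

# Flatten to one row per camera per timestamp and keep the full pull around
all_cams = pd.json_normalize([cam for cams in responses for cam in cams])
all_cams.to_csv("all_cameras_2019.csv", index=False)

# Split the two cameras of interest into their own files
# (camera_id comes back from the API as a string)
for cam_id in ["1702", "1711"]:
    all_cams[all_cams["camera_id"] == cam_id].to_csv(
        f"camera_{cam_id}_2019.csv", index=False
    )
```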
These steps were repeated for 2020 and 2021, leaving a total of 6 separate files, 3 for each camera. The total runtime for the request section was about 6 hours. We now have the links to the images and can move on to grabbing the images themselves from the database. We will use the same structure as before to make concurrent calls, but first, load in the DataFrames made in the last step.
With the DataFrames loaded in, we can write our new request function and concurrent request framework.
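A sketch of that downloader, assuming the CSV produced above and a local images/ directory (the image and timestamp column names come from the API response):

```python
import concurrent.futures
import os
import requests
import pandas as pd

cam_1702 = pd.read_csv("camera_1702_2019.csv")
os.makedirs("images/1702_2019", exist_ok=True)

def image_request(row):
    """Download one frame and save it, named after its timestamp."""
    try:
        response = requests.get(row.image, timeout=10)
        response.raise_for_status()
        filename = row.timestamp.replace(":", "-")  # filesystem-safe name
        with open(f"images/1702_2019/{filename}.jpg", "wb") as f:
            f.write(response.content)
    except Exception:
        pass  # skip frames that fail to download

with concurrent.futures.ThreadPoolExecutor(max_workers=16) as executor:
    list(executor.map(image_request, cam_1702.itertuples()))
```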
There are a couple of things to note about the above code block and its output. First, we are grabbing A LOT of images, so make sure there is enough space to store them. Second, each call takes about 1.5 to 2 hours depending on the number of workers defined.
With our images gathered, we are ready to start generating data. We will be using OpenCV and leveraging the GPU power that Google offers to Colab users. I suggest following this tutorial for setting up OpenCV with GPU support within the Colab environment to speed up the labeling of images.
After setup, thanks to OpenCV, labeling images is fairly straightforward.
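For example, labeling a single frame with cvlib (which provides detect_common_objects on top of OpenCV; the file path is hypothetical, and model="yolov4" assumes a cvlib version with YOLOv4 support):

```python
import cv2
import cvlib as cv
from cvlib.object_detection import draw_bbox

# Load one frame and run the detector; enable_gpu assumes the Colab GPU setup above
img = cv2.imread("images/1702_2019/2019-03-01T08-00-00.jpg")
bbox, labels, conf = cv.detect_common_objects(img, model="yolov4", enable_gpu=True)

# Draw the boxes back onto the frame for a quick visual sanity check
cv2.imwrite("labeled_example.jpg", draw_bbox(img, bbox, labels, conf))
```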

A couple of important notes on how the function detect_common_objects is set up:
- You may have noticed that it did not label the blurry car at the front of the image, nor the cars off to the side. This is due to the default filtering level of .5: any object with a confidence value lower than .5 is discarded to prevent false positives. This level can be adjusted to whatever level is wanted via the nms_thresh keyword in the function call. More information can be found here.
- You can swap the model, and therefore the dataset it was trained on, using the model keyword. However, YOLOv4 is extremely well suited for this task. If you are interested in learning more, here is a link to the YOLOv4 structure and design.
Our basic workflow is now set up, and we can write the function that will handle all the images in our 2019 dataset as well as establish a method for keeping track of labels and adding them back to DataFrames.
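A sketch of that function, reusing the get_label_count name referenced later (the directory layout follows the download step; counting every label rather than just cars is deliberate, as discussed below):

```python
import glob
import cv2
import cvlib as cv
import pandas as pd

def get_label_count(image_dir):
    """Run the detector over every image in a directory and count its labels."""
    records = []
    for path in sorted(glob.glob(f"{image_dir}/*.jpg")):
        img = cv2.imread(path)
        if img is None:
            continue  # skip corrupt or unreadable files
        bbox, labels, conf = cv.detect_common_objects(
            img, model="yolov4", enable_gpu=True
        )
        records.append({"file": path, "num_cars": len(labels)})
    return pd.DataFrame(records)
```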
Next, we can use this function to loop through all of the data gathered and generate a label DataFrame that can be used for analysis. Below is what a call would look like for camera 1702’s 2019 images.
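Assuming the directory naming from the download step, that call might be as simple as:

```python
labels_1702_2019 = get_label_count("images/1702_2019")
labels_1702_2019.to_csv("labels_1702_2019.csv", index=False)
```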
We then repeat this step for each year and camera. FINALLY, we are ready to start doing some analysis! If you have followed along through the data acquisition process, I would like to clarify a few questions that may have popped up.
- The dataset is limited to whatever is returned by calling the api_imageRequest function. I used a try-except block so that failed calls would not throw an error, and not all timestamps have a valid image to pull. So your dataset may vary from the one I generated, but the overall trends in traffic should still be relatively similar. Also, grabbing images from 2021 resulted in ~35% fewer images, most likely due to an issue on the database’s side.
- Why did we not define what labels to count in the get_label_count block? We wanted to keep track of motorcycles, trucks, and any vehicle that might be on the road. While there is a chance a person or other object may be counted, it should only have a minimal effect on the overall trends in traffic as each dataset contains thousands of images.
- When downloading the images, the 2019 data for camera 1711 on March 17th and 18th is incomplete. There was a severe lack of daytime images, leading to inaccurate data for those days. After investigating, there was scheduled maintenance during that time, which may explain the anomalies. We account for this in the data analysis portion by replacing those values.
DATA ANALYSIS
When conducting exploratory data analysis, it’s important to remember the "big picture" questions we had at the start of this project and use those questions to guide the process.
Did COVID have an impact on the traffic in Singapore? If so, what was that impact, and what may have contributed to it?
The first step is to load all of the previously collected data and begin to aggregate it into something we can work with.
After the data is loaded in, we can combine the labels DataFrame with the original by adding a new column named "num_cars" – representing the labels that were generated.
We can also go ahead and modify the "timestamp" column, which will be important for resampling later on.
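A compact sketch of these three prep steps for one camera-year (file names carry over from earlier; note that the positional join assumes the label file lines up row-for-row with the camera file):

```python
import pandas as pd

cam_1702_2019 = pd.read_csv("camera_1702_2019.csv")
labels_1702_2019 = pd.read_csv("labels_1702_2019.csv")

# Attach the generated counts as a new column
cam_1702_2019["num_cars"] = labels_1702_2019["num_cars"]

# Convert the timestamp strings to real datetimes so we can resample later
cam_1702_2019["timestamp"] = pd.to_datetime(cam_1702_2019["timestamp"])
```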
By running the .describe() on each of our DataFrames, we can get an idea of the data we are working with.
The next step is to get an idea of the shape of our data, which we can do by looking at the frequency of the labels. This gives us a general understanding of traffic patterns in terms of how they are distributed. We’ll use the value_counts() function from pandas to generate this data.
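For one camera-year, that might look like this (the plot labels are mine):

```python
import matplotlib.pyplot as plt

# Frequency of each per-frame car count
counts = cam_1702_2019["num_cars"].value_counts().sort_index()

counts.plot(kind="bar")
plt.xlabel("Cars detected in frame")
plt.ylabel("Number of frames")
plt.title("Camera 1702, 2019")
plt.show()
```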


We can see from the graphs that both 2020 and 2021 saw a sharp decline in heavily trafficked frames and a substantial increase in empty or sparsely populated ones. These graphs provide a great overview and show a strong difference. However, to draw meaningful conclusions, it is important to handle some of the issues with the current dataset.
The difference in the number of data points between years and the skewed distributions make it difficult to know exactly what is happening. We can resample the data to get a better idea of the population as a whole and conduct a time-series analysis. Since our data is time-stamped, pandas’ built-in resample tool is an effective way to build a time series. To handle the discrepancy in counts between years, we will avoid any measures that rely on the total count and instead use measures such as the mean, max, and median to describe the data.
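A sketch of the daily resample for one camera-year:

```python
# Daily summaries that do not depend on how many frames each year contains
daily_1702_2019 = (
    cam_1702_2019.resample("D", on="timestamp")["num_cars"]
    .agg(["mean", "median", "max"])
)
```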
As mentioned above, there is an issue with the 2019 dataset: March 17th and 18th show abnormally low values. We can use column indexing to compare them to other days within the dataset.
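For example, assuming a daily_1711_2019 frame built the same way as above for camera 1711:

```python
# Pull out the two suspect days and compare them against the rest of March
print(daily_1711_2019.loc["2019-03-17":"2019-03-18"])
print(daily_1711_2019.loc["2019-03-01":"2019-03-31"].describe())
```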
An important part of determining whether an outlier is a true data point is looking at the elements that generated it – in this context, the images. After some examination, it was apparent that on those days a significant number of images were missing, and several were actually maintenance messages instead.

Since we established that the two data points are most likely not representative, this missing data can be addressed in a couple of different ways. One is to try requesting the images again for just those days, but if maintenance is the reason they are missing, this will not work. Another is to use the other Sundays and Mondays within the dataset to generate stand-in values. Taking the average of those days along each measure gives a decent approximation in place of the missing data.
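A sketch of that replacement (March 17th, 2019 fell on a Sunday and the 18th on a Monday; the two bad days are excluded before averaging):

```python
import pandas as pd

# Average the remaining days by weekday (Monday = 0, Sunday = 6)
clean = daily_1711_2019.drop(pd.to_datetime(["2019-03-17", "2019-03-18"]))
weekday_means = clean.groupby(clean.index.dayofweek).mean()

# Stand in the weekday averages for the two unrepresentative days
daily_1711_2019.loc[pd.Timestamp("2019-03-17")] = weekday_means.loc[6]
daily_1711_2019.loc[pd.Timestamp("2019-03-18")] = weekday_means.loc[0]
```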
With the datasets prepped, the next step is to visualize the data. As above, we should get an idea of the shape of the data, as it will direct future steps and statistical tests. We are going to use box plots to understand the shape of the data, along with pyplot’s subplots function, to generate an informative, compact visual for both cameras. With the visualizations generated, it’s important to apply the context back to the data to really understand what each graph is saying.
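A sketch of those box plots for one camera (daily_1702_2020 and daily_1702_2021 are assumed to be built the same way as the 2019 frame):

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, measure in zip(axes, ["median", "mean", "max"]):
    ax.boxplot(
        [
            daily_1702_2019[measure].dropna(),
            daily_1702_2020[measure].dropna(),
            daily_1702_2021[measure].dropna(),
        ],
        labels=["2019", "2020", "2021"],
    )
    ax.set_title(f"Daily {measure}, camera 1702")
plt.tight_layout()
plt.show()
```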


- Median – A good measure of how many cars you’d expect to see at any given time on the road.
- Mean – also a measure of what you’d expect to see on the road at any given time, but it is more susceptible to outliers.
- Max – The maximum number of cars in a frame on that day, a good measure of the peak traffic.
We can gather that something (COVID-19) caused a dramatic change in 2020, spreading out the data and producing an overall decreasing trend in each measure. A line graph of the mean can be used to observe this trend.


Understanding the Data
It is clear that both locations show a sharp decrease in 2020 around April 7th. This coincides with the prime minister’s "circuit breaker", which greatly limited travel and called on all individuals to avoid gathering in public places. Each camera shows a massive 50% reduction in the average amount of traffic that lasted the rest of the month, with the exception of Monday, April 27th; while no significant news or events can be found for that date, its values are in line with other Mondays. This reduction supports the idea that the pandemic, and the policy around it, greatly reduced the amount of traffic.
It also appears that traffic had not returned to pre-pandemic levels in 2021. This hypothesis can be tested with a one-sided Welch’s t-test, which will help us understand whether the levels of traffic in each year have in fact changed, rather than just varying within the normal range. We will test both the mean (the average number of cars on the road) and the max (the peak busyness) of the datasets to get a full picture. The SciPy library’s ttest_ind() function provides a compact way of doing this.
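A sketch of the test on the daily means for camera 1702 (equal_var=False selects Welch’s version; the alternative keyword for a one-sided test needs SciPy 1.6 or newer):

```python
from scipy import stats

# One-sided Welch's t-test: is 2021 traffic still below 2019?
t_stat, p_value = stats.ttest_ind(
    daily_1702_2021["mean"].dropna(),
    daily_1702_2019["mean"].dropna(),
    equal_var=False,
    alternative="less",
)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```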
Running the t-test gives us two outputs: a t-statistic and a p-value. The t-statistic indicates the direction and magnitude of the difference between the two sample groups, while the p-value indicates the level of certainty. In this context, the negative t-statistic means that the 2021 group is lower than the 2019 group. The certainty is one minus the p-value, or the degree to which we are confident that there is a difference; a certainty > 95% is considered significant. Putting this together, for each measure the two groups show a statistically significant difference, in terms of both the average cars per frame and the max amount.
In this project, two traffic cameras were used in combination with OpenCV to understand the effect of the COVID-19 pandemic on Singapore’s traffic, revealing a 50% reduction in traffic for April 2020 and, through statistical testing, evidence that Singapore’s traffic still had not returned to pre-pandemic levels. While it’s impossible to attribute everything to the pandemic, there is strong evidence that policy and the public response around the pandemic changed the shape of traffic on one of Singapore’s busiest roads.
Mentions and Final Notes
This project was inspired by an Uplevel coding project that explored a more limited range of dates and time frames. The projects are guided but do not give the user any of the code, making them a great way to practice while still providing a decent challenge. I would also like to say that if you do this project on your own, your dataset will most likely differ from mine, simply due to the number of calls and the communication between the end user and the database. That being said, the overall trends and distribution of the data should be similar, as I tried to stay away from measures that depend on counts. Also, for those curious about using Singapore’s database, you can find the link to the terms of service here! Finally, thank you for taking the time to read this article, and I hope that it inspires others.