
Doing Data Science from Scratch

Building measurement devices, measuring, analysing and drawing conclusions

Doing Data Science from scratch task by task

A Raspberry Pi 3B based sensor pack - Image by Author - Author's workshop, Dec 5th, 2020

I am an avid reader of Data Science articles, especially here on Towards Data Science, and I really love doing tutorials and learning new things. Over many years I have observed a tendency in such articles, blogs, or tutorials to start with specific data sets. Consider, as an example, how many pieces are written about the Titanic or the Iris dataset. Rarely do I find articles where measurement devices are used, measurements are taken, and a dataset is collected and then used in the generic Data Science workflows that are the subject of all those articles. Rather than bemoan this perceived lack of richness, I decided to contribute some articles and to advocate for a more in-depth discussion.

Doing Data Science from Scratch is a journey. Any such journey starts with defining a real-world event we want to measure. Next, we describe our objective and measures of success. Based on those goals, we can select or build a measurement instrument. Testing the strategy and validating the process is always extremely important. Planning the running of the experiment and the eventual collection of the data should never be oversimplified. All going well, we arrive at a data set, which feeds into the traditional Data Science workflow.

For me, as a writer, there are some risks in this strategy. Defining the objectives and measures of success is relatively straightforward, and you should do this for every project regardless of the complexity involved. Instrumentation of an observable event can include electronics and hardware and could become complicated. Those sorts of complications are challenging to write about and often do not interest many audiences. Similarly, signalling between sensors, edge devices, and network controllers, and building up the dataset from the wild, is always involved, challenging to write about, and again will not appeal to all readers. Naturally, there is less risk in working with the collected data. Many great writers produce good work with Time Series data. Sensors mostly provide us with real-time streaming measurements. Since real-time streaming is one of my passions (I eat, drink, and sleep it), I offer this article as an illustration of doing it all from scratch.

Objectives and measures of success

The first step is always to clarify and define our objectives, goals, and how we will measure our success. I doubt anyone could stress this one enough. Whether you are driving across states, camping in the wilderness, or visiting a friend, you will always have a map, a target arrival time, and some expectations for the adventure. Doing Data Science from scratch is no different. Our first step aligns with the CRISP-DM method step of "Business Understanding".

The CRISP-DM Process Diagram - from Wikipedia

Since there are many books written on Data Mining, I will only say that you should always do the Business Understanding stage really well. Avoid the temptation to skip it, or to rush it in order to get started. Many projects are resolved, adding significant business value, just by doing the Business Understanding phase well and honestly.

For the purpose of this article, I have established a small project. I wish to measure the amount of traffic passing by my house. One of my other passions is going for a stroll in the scenic area that I live in, so I would like to understand what time of the day has the fewest cars and trucks on the road. Declaring success requires me to be able to plot the frequency of traffic, map daylight times, and find the slot that is safest for my stroll. I could then visit places like Fore Abbey.

Fore Abbey

or Lough Lene

Lough Lene
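The measure of success described above (traffic frequency per hour, compared against daylight, to find the quietest slot) could be sketched roughly as follows. This is only a minimal illustration: the function names, the daylight window, and the timestamps are all invented for the example and are not measurements from the experiment.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps):
    """Count events for each hour of the day (0-23), including empty hours."""
    counts = Counter(ts.hour for ts in timestamps)
    return {hour: counts.get(hour, 0) for hour in range(24)}


def quietest_daylight_hour(counts, first_light_hour, last_light_hour):
    """Return the daylight hour with the fewest events (earliest hour wins ties)."""
    daylight = range(first_light_hour, last_light_hour + 1)
    return min(daylight, key=lambda hour: (counts[hour], hour))


# Hypothetical vehicle timestamps, invented purely for illustration.
events = [
    datetime(2020, 12, 5, 8, 15),
    datetime(2020, 12, 5, 8, 40),
    datetime(2020, 12, 5, 9, 5),
    datetime(2020, 12, 5, 11, 30),
    datetime(2020, 12, 5, 11, 45),
    datetime(2020, 12, 5, 17, 10),
]

counts = hourly_counts(events)
# The daylight window (08:00 to 11:59) is an assumed value, not a real sunrise table.
best = quietest_daylight_hour(counts, first_light_hour=8, last_light_hour=11)
print(f"Quietest daylight hour starts at {best}:00")
```

With real data, the daylight window would come from sunrise and sunset times for the location and date rather than from fixed hours.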

Instrumenting and measuring traffic

Given that we want to instrument the traffic flow in front of my house so that I can measure the frequency of passing vehicles, the first step is to research the general approaches to this activity. Similar to business understanding, it is worth doing the research really well. Some simple and obvious solutions probably exist, and you might not be aware of how researchers generally tackle the task. A careful analysis of approaches is critical with Data Science from scratch.

Research

An interesting article [Retail Sensing] was helpful in providing a summary of the main methods available for counting passing traffic. Those are broadly:-

  • Manual counts. The manual process is based on sampling a specific timeframe, either on-site or via recorded video feeds, and counting the passing traffic. The count provides an estimate which is then used to extrapolate traffic flows for more extended periods. As a sample-based method, the accuracy will depend on how good the selected sample is relative to the expected flows of traffic. Doing a manual count is an option.
  • Computer Vision. Machine learning models, such as YOLO, can detect vehicles and do the count automatically. Because trained models already exist, we can process many more samples and therefore get better long-term accuracy. Running a Computer Vision algorithm is an option, and most of the code already exists. See [Bansal].
  • Pneumatic Road tube counting. "Here one or more rubber hoses are stretched across the road and connected at one end to a data logger. The other end of the tube is sealed. When a pair of wheels hits the tube, air pressure in the squashed tube activates the data logger which records the time of the event." My local authorities, and police, would not allow me to use such a method.
  • Piezoelectric Sensor. "Piezoelectric sensors collect data by converting mechanical energy into electrical energy. The piezoelectric sensor is mounted in a groove cut into road’s surface." I doubt I would ever get permission to cut grooves in the public highway.
  • Inductive Loop. "An inductive loop is a square of wire embedded into or under the road." Digging up the road is going to be a hard sell.
  • Magnetic Sensor. "This detects vehicles by measuring the change in the earth’s magnetic field as the vehicles pass over the detector. The sensor is either buried in the road, or enclosed in a box by the side of the road." Digging up the road is going to be a really hard sell.
  • Acoustic detector. "This detects vehicles by the sound created as the vehicle passes. The sensor is mounted on a pole pointing down towards the traffic. It can collect counts for one or more travel lanes." This might be an option, but mounting equipment on a pole by the roadside would require permission.
  • Passive Infrared. "Passive infrared devices detect vehicles by measuring the infrared energy radiating from the detection zone. When a vehicle passes the energy radiated changes and the count is increased." Creating an infrared zone would require permission from the authorities.
  • Doppler and Radar Microwave Sensors. "Doppler microwave detection devices transmit a continuous signal of low-energy microwave radiation at a target area and then analyze the reflected signal. The detector registers a change in the frequency of waves occurring when the microwave source and the vehicle are in motion relative to one another. This allows the device to detect moving vehicles." Pointing such a device at vehicles might cause public health concerns and would require a permit.
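To make the extrapolation step of the manual count method in the list above concrete, here is a small sketch of the arithmetic. The function name and the numbers are hypothetical, and the calculation assumes the sampled window is representative of the longer period, which is exactly the accuracy caveat noted for that method.

```python
def extrapolate_count(sample_count, sample_minutes, period_minutes):
    """Scale a sampled vehicle count up to a longer period at the same rate."""
    return sample_count * period_minutes / sample_minutes


# Hypothetical example: 12 vehicles counted in a 15-minute sample,
# scaled up to a one-hour estimate.
estimate = extrapolate_count(sample_count=12, sample_minutes=15, period_minutes=60)
print(estimate)
```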

Papers and articles from [Airpix], [Qiao Meng et al.], [Pancharatnam], and [Bouaich] demonstrate the wealth of research and implementation of real-world projects in Computer Vision for Traffic planning and management. Their combined work tends more towards Traffic management and Smarter cities. The articles make for interesting reading and certainly help to inform me about this popular topic.

Based on the research, two options appear viable. Neither would require permission, and neither would involve me in digging up the road outside the house. Those are:-

  • Do some form of the manual count;
  • Record some footage of the traffic and then process the video with a YOLOv3 model to detect and count the traffic. There are plenty of code examples and videos made about YOLO. See [Yolo3 YouTube], [Airpix].

After completing the research, you should have a better idea about how others have tackled your specific challenge, and that provides you with a great starting point for your own work.

Method

If I wished to do a manual count, I could simply stand at the window and count each passing car for a day. It would be incredibly dull, but it can be done. The steps would be:-

  • Stand at the window and wait for something to go by;
  • If something goes by that is a car, mini-van, van, tractor, or truck, increment the count by 1; otherwise, ignore it;
  • Record the date and time of the event.

There would be an audible cue (traffic noise) and a visual signal (the vehicle passing through my field of view) to help me trigger the count increment. The process seems simple enough, but could we replicate the human activity with a measurement instrument? I believed so and resolved to reproduce the process with a device.

Instrumenting the event

Having resolved to build a measurement device for my project, I drew heavily on my previous work. Any such measurement process requires two components:-

  • The sensor(s). A simple camera could replace the human eye and the human ear.
  • The data logger or collector. We could stream events to a central server and then call some Object Detection scripts to perform the count.

The Central Server

I have written several articles and made a video series about my central server concept. You can get the full details from my Blog.

A Raspberry Pi 4b on steroids

For this project, the central server will provide a Samba server giving us a place to stream the live video clips and collect all the motion events. The server also has a RabbitMQ broker service, a Redis key-value database, and a MongoDB instance. We have all the collection services we need right on one server.

The Sensor

As it turns out, I already have a Smart Doorbell I built for another project. You can read the full detail of the entire build process directly on my blog. I choose to write on my blog because some of the pieces are hardware, purchasing, and assembly, and those details might not interest all readers. There is an excellent explanation on the blog for those interested.

PIR Camera: Raspberry Pi 3 Doorbell

Here is an example of what the doorbell normally shows.

An illustration of what my Smart Door Bell provides. Image by author Nov 28th, 2020

The doorbell is a motion detector based on the Raspberry Pi 3b and a Camera. The little device connects to the Samba share (central server) and uses the Linux Motion library to begin monitoring the driveway. When the camera detects movement, a video stream is created, and that is stored directly on the central server as a clip. During a typical day, there can be hundreds of small video clips showing movement. The movement can range from a tree blowing in the wind, to a delivery man, or just me coming and going. But how do we get a single movie that could be processed by a model such as YOLO? Thankfully I found a solution to this one as well.

A Python script, using the moviepy package, allows me to collect all the clips and create one large video file of an entire day of events. You can find a full explanation of the process in the article I wrote.
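For readers who want a feel for the idea without reading the other article, the following is a minimal sketch of joining clips with moviepy. It is not the script from that article. The helper names and file names are hypothetical, and it assumes the clip file names sort into chronological order, which depends on how Motion is configured to name its files.

```python
from pathlib import Path


def sort_clips(paths):
    """Sort clip paths by file name (assumes names sort chronologically)."""
    return sorted((Path(p) for p in paths), key=lambda p: p.name)


def join_clips(clip_paths, output_path):
    """Concatenate the clips, in the order given, into a single video file."""
    try:
        from moviepy import VideoFileClip, concatenate_videoclips  # moviepy 2.x
    except ImportError:
        from moviepy.editor import VideoFileClip, concatenate_videoclips  # moviepy 1.x
    clips = [VideoFileClip(str(path)) for path in clip_paths]
    try:
        combined = concatenate_videoclips(clips)
        combined.write_videofile(str(output_path))
    finally:
        # Release the file readers even if writing fails.
        for clip in clips:
            clip.close()


# Hypothetical clip names, invented for illustration.
ordered = sort_clips([
    "clips/02-20201205101500.mp4",
    "clips/01-20201205093000.mp4",
])
print([p.name for p in ordered])
# join_clips(ordered, "clips/2020-12-05-full-day.mp4")  # needs real clips and moviepy
```

The import is kept inside `join_clips` so that the ordering helper can be tried out on a machine without moviepy installed.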

Video clip processing: Motion detection

A method already exists, from my work, to capture movement in a camera view, to record that activity, and then combine all the activity into a single video file. We start to see the benefit of building your own measurement devices, taking measurements, and setting yourself up for analysing and drawing conclusions that result in real-world insights.

The process also generates a log of all events providing even further sources of data for our experiment.

An example of the Motion detection log - Front doorbell device December 2020

Strategy

Given the availability of the Smart doorbell, the central server, the video processing script, and the log file, I considered that pointing the doorbell towards the road would generate a video of passing vehicles which could be counted either by hand or by a detection algorithm. Further, the Motion log can be parsed and counted. For each event we have the start time and the end time, which together give us its duration and timeframe.
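Parsing the log into events with durations might look something like the sketch below. The line format used here (date, time, "start" or "end", event number) is a simplified, hypothetical stand-in, not the actual format of the Motion log, so the splitting logic would need adapting to the real file.

```python
from datetime import datetime


def parse_events(lines):
    """Pair 'start' and 'end' lines by event id; return (start, end, seconds) tuples."""
    starts = {}
    events = []
    for line in lines:
        date, time, kind, event_id = line.split()
        stamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
        if kind == "start":
            starts[event_id] = stamp
        elif kind == "end" and event_id in starts:
            begun = starts.pop(event_id)
            events.append((begun, stamp, (stamp - begun).total_seconds()))
    return events


# Hypothetical log lines in the assumed format, invented for illustration.
sample_log = [
    "2020-12-05 09:00:01 start 12",
    "2020-12-05 09:00:09 end 12",
    "2020-12-05 09:04:30 start 13",
    "2020-12-05 09:04:42 end 13",
]

events = parse_events(sample_log)
durations = [seconds for _, _, seconds in events]
print(len(events), durations)
```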

Experiment design

The value to me in building my own measurement devices is that I am free to design experiments to answer questions, which, when analysed, allow me to draw conclusions about real-world observations.

Executing the experiment and doing the Data Science workflow

My next step is to execute my experiment, collect the data, and then move into the regular Data Science workflow. We will be working with Time Series data, and as you would expect, it will be messy and will require the usual Exploratory Data Analysis, Cleaning, and plotting.

If you are interested in reading the results of the experiment, tune in next time. I plan to publish a detailed article once the investigation concludes. Meanwhile, I would love to discuss my design and approach with you, the readers. So feel free to contact me or leave your perspectives in the comments. Feedback drives ideas, and that leads to refinement and breakthrough.

References

[Retail Sensing] Vehicle Detection: Ten Ways to Count Traffic

[Yolo3 YouTube] An example using Yolo3 Demo of vehicle tracking and speed estimation at the 2nd AI City Challenge Workshop in CVPR 2018

[Airpix] Traffic Counting ATCC system provider Airpix Solutions uses computer vision applications for vehicle counting and vehicle numbers.

[Qiao Meng et al.] Qiao Meng, Huansheng Song, Yu’an Zhang, Xiangqing Zhang, Gang Li, Yanni Yang, "Video-Based Vehicle Counting for Expressway: A Novel Approach Based on Vehicle Detection and Correlation-Matched Tracking Using Image Data from PTZ Cameras", Mathematical Problems in Engineering, vol. 2020, Article ID 1969408, 16 pages, 2020. https://doi.org/10.1155/2020/1969408

[Pancharatnam] Pancharatnam, M. & Sonnadara, Upul. (2008). Vehicle Counting and Classification from a Traffic Scene. https://www.researchgate.net/publication/234136865_Vehicle_Counting_and_Classification_from_a_Traffic_Scene

[Bouaich] S. Bouaich, M. A. Mahraz, J. Riffi and H. Tairi, "Vehicle counting system in real-time," 2018 International Conference on Intelligent Systems and Computer Vision (ISCV), Fez, 2018, pp. 1–4, doi: 10.1109/ISACV.2018.8354033.

[Bansal] object-detection-using-yolov3-and-opencv

