Doing Data Science from Scratch, task by task
Log file analysis with Python 3

This is the next instalment in my series Doing Data Science from Scratch. We are doing a project to count the number of passing vehicles on the road where I live. The question to be answered is fundamental: when is the best time to go for a walk without being mowed down by speeding cars, trucks, or agricultural vehicles? The research is done, the method and strategy are worked out, and now I am starting the project work itself. This article summarises the work necessary to extract the data from the motion detection log files.
Parsing a plain text log file with Python
Log file analysis is now a commercial offering, especially for the web logs generated by Apache or Nginx sites. The topic is popular with learners, and there are many articles, tutorials, and courses available. The interesting thing is that each application creates its own unique format, so each one is a fresh code puzzle.
Before you reach for the programming spanner, you should search for previous work. Whilst it is satisfying to solve a code puzzle, re-inventing the wheel is just goal displacement. The goal is to answer the question, not to solve programming challenges. In my case, I was unable to find anything that specifically covers the motion log file.
Options
As I mentioned, I was unable to find previous work, but there are options:
- Examine the plain text file and create an algorithm to read the required values from it. Write the recovered records to a database and then use Pandas to do some analysis. The Python job can be converted to a daemonised service under systemd, or scheduled with crontab (a sample crontab entry appears below).
- The motion-project provides database support. We can choose to compile the motion binaries on our target system and include SQLite or PostgreSQL.
Indeed, it is possible to collect the source and header files for both motion and a database. Building and installing the binaries might take a little time; there might be compiler errors, or everything might compile and link cleanly. That might be a worthwhile activity for a well-funded commercial service, but it is an involved and risky step for my research project. We want to avoid getting goal displaced and going down a rabbit hole.
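For the programming option, at least the scheduling part is straightforward. A crontab entry along the following lines, where the script path is purely hypothetical, would run the parser at the top of every hour:

```
0 * * * * /usr/bin/python3 /home/pi/motion-parser/parse_motion_log.py
```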

My little camera is a low-powered Raspberry Pi 3B, as shown in the photo below, so I certainly want to avoid getting ambitious. I wrote an article about the doorbell, so you can read all the details there if you wish.

Having ruled out forcing the log entries into an SQL table, I am left with the programming option. So let’s get it on!
Planning the parse script
Once you commit to applying a programming solution to a specific project task, the first step should be planning. Next, perform a careful review and examination of the log, ideally choosing a sample file that is complete. Finally, try to write the algorithm by hand, check it, and only then code it. Coding is not, and should never be, the first step. I created a GitHub repository for this project and uploaded some samples of the log file. Here is one such example. Let us examine a detection session together by walking through a few log entries.

Consider the image above, where I drew three lines on the file. The black lines denote the start and end of a specific monitoring session; the doorbell might have been switched on, or the service restarted. The red line separates the service start-up logging from the event logging. The filename ‘motion5.log’ tells us which camera is generating the log data, and that provides the context for the set of entries. Camera 5 is my smart doorbell.
Looking between the red line and the lower black line, we can see two distinct events. Each event spans three lines: an [EVT] entry followed by two [ALL] entries.

The [1:ml1] and [NTC] tags carry no information content. [EVT] is the event start tag, and an [ALL] entry combined with "End of event" marks the end of the event. We can recover a data record from the log:
- Camera: map the file name to the string Camera;
- Evt_Start: map the [EVT] record date entry to the datetime object Evt_Start;
- Evt_End: map the [ALL] "End of event" date entry to the datetime object Evt_End;
- FileName: map the "saved to: %" path in the [EVT] record to the string FileName;
- Duration: use the datetime difference to compute the event duration (a short example follows this list).
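As a small illustration of that last mapping, the duration falls out of a simple datetime subtraction. The timestamps and their layout here are assumptions; adjust the format string to whatever your motion log actually contains:

```python
from datetime import datetime

# Hypothetical timestamps; the real layout in the motion log may differ.
TS_FORMAT = "%b %d %H:%M:%S"
evt_start = datetime.strptime("Jun 10 09:40:00", TS_FORMAT)
evt_end = datetime.strptime("Jun 10 09:40:12", TS_FORMAT)

duration = evt_end - evt_start    # a timedelta object
print(duration.total_seconds())   # 12.0
```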
As BSON or JSON, the record looks like this:
{"Camera": "Motion5", "Evt_Start": xxxxxxx, "Evt_End": xxxxxx, "FileName": "xxxxxxxxxxx", "Duration": xxxxxxxx}
Once we format the record as BSON, we can write the entry to the MongoDB database on my central server. So the algorithm involves opening the file, reading it line by line, detecting [EVT] entries, and doing a bit of parsing.
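Here is a minimal sketch of that loop. Only the [EVT], [ALL], "saved to:" and "End of event" markers come from the walkthrough above; the timestamp layout, the regular expressions, and the MongoDB server address are my assumptions, so treat this as a starting point rather than the finished script:

```python
import re
from datetime import datetime

TS_FORMAT = "%b %d %H:%M:%S"                    # assumed, e.g. "Jun 10 09:40:00"
TS_RE = re.compile(r"\[(\w{3} +\d+ [\d:]+)\]")  # assumed timestamp bracket
FILE_RE = re.compile(r"saved to: (\S+)")

def parse_log(path, camera):
    """Collect one record per [EVT] ... "End of event" pair."""
    records, current = [], None
    with open(path) as fh:
        for line in fh:
            ts = TS_RE.search(line)
            if "[EVT]" in line and "saved to:" in line and ts:
                # Event start: grab the timestamp and the video file name.
                current = {
                    "Camera": camera,
                    "Evt_Start": datetime.strptime(ts.group(1), TS_FORMAT),
                    "FileName": FILE_RE.search(line).group(1),
                }
            elif "[ALL]" in line and "End of event" in line and current and ts:
                # Event end: close the record and compute the duration.
                current["Evt_End"] = datetime.strptime(ts.group(1), TS_FORMAT)
                current["Duration"] = (
                    current["Evt_End"] - current["Evt_Start"]
                ).total_seconds()
                records.append(current)
                current = None
    return records

records = parse_log("motion5.log", "Motion5")

# Writing to MongoDB; the server address and database/collection
# names are placeholders for my central server.
from pymongo import MongoClient

client = MongoClient("mongodb://my-central-server:27017/")
if records:
    client["motion"]["events"].insert_many(records)
```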
Writing the code and testing
Based on my initial review, I was able to write the script, and I added it to the repository. You can examine my code here. Let’s review some output from my early code execution. I still have to do the testing and add database support. My script creates a list of dictionaries, which I refer to as records; for this sample log it contains 118 entries.
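If you want a quick sanity check outside of Spyder, the records list drops straight into Pandas; this assumes the parse_log sketch above (or my script) has already produced records:

```python
import pandas as pd

# 'records' is the list of dictionaries produced by the parsing script.
df = pd.DataFrame(records)

print(len(df))                     # 118 entries for this sample log
print(df.head())                   # eyeball the first few records
print(df["Duration"].describe())   # rough feel for the event lengths
```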

Using the Spyder Variable Explorer, I can double-click on the records list and see all the records. A convenient feature.

If you double-click a specific record, Spyder will show you the contents of that entry.

The script is creating the records that I had anticipated. I had a lot of fun with this one, and you can examine my code and the log files in my GitHub repository.
Closing
Having defined my project, I was excited to get on and do some project work. Writing a script to parse a log file is a very satisfying activity and helps to keep the coding skills sharp. Since I am doing Data Science from Scratch, I have a lot of project work ahead of me before the first trial runs. Please tune in to my next article, where I will explain other aspects of the project work. If you enjoyed my article, have questions, or see areas for improvement, then please feel free to reach out. We only learn through feedback.