The beauty of a hybrid approach is that it offers different methods depending on the circumstances. If Python Pandas is the petrol or diesel engine, then a complete no-code solution is the electric car. We know the coding limitations of Pandas, but are we comfortable with the trade-offs of a no-code approach such as Mito? Many operations that are trivial in code might prove time-consuming to build through a user interface. Read on, let us enter the hybrid world, and see how we get on.
Recap
I am on a mission to expose NLP routines to job hunters, and security is my top priority. If you are tuning in for the first time, the catch-up is that I used FastAPI and Vue.js to build an application and hosted it on AWS. The application's base structure is in place, and I have the security sorted; reasonably well, I believe! I worked out the NLP strategy some time ago. Several months have passed since I created the server and hosted the site, so I wanted to see what sort of traffic patterns have developed. Since that is a data analysis challenge, I committed to using Mito in the work as a trial.
Retrieving server logs, parsing, and creating the data frame
In my last article, I installed the AWS command line and figured out a command which copies log files from the server to an S3 bucket. So that is the first step – retrieving the logs from the virtual server to the S3 bucket.

By issuing the command from the /var/log directory, I copy over all files where "log" is part of the filename, a quick and dirty approach but sufficient in a hybrid world. A simple command-line call can save a lot of drag-and-drop operations.
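The exact command is in the last article, but it is roughly of the following shape; treat the bucket name as a placeholder for your own.
aws s3 cp . s3://my-log-bucket/ --recursive --exclude "*" --include "*log*"
Here is a quick screenshot if you are not familiar with S3 buckets.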

Now that the files are in the S3 bucket, I can run my script to transfer them to my Mac Mini M1. Since the last article, I have refined the script a little.
import os
import boto3
from s3 import AmazonS3

# List, download, and then empty the S3 bucket holding the server logs
s3worker = AmazonS3()
s3worker.show_contents_s3_bucket()
s3worker.download_contents_s3_bucket()
s3worker.empty_bucket()
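AmazonS3 is a small wrapper class of my own rather than anything from the boto3 library itself. The exact code is not important here, but a minimal sketch of what it could look like, with the bucket name and local download folder as placeholders, is below.
import boto3

class AmazonS3:
    def __init__(self, bucket="my-log-bucket", store="./logs/"):
        # Placeholder bucket name and local download folder
        self.bucket = boto3.resource("s3").Bucket(bucket)
        self.store = store

    def show_contents_s3_bucket(self):
        # Print the key of every object currently in the bucket
        for obj in self.bucket.objects.all():
            print(obj.key)

    def download_contents_s3_bucket(self):
        # Copy each object in the bucket to the local store folder
        for obj in self.bucket.objects.all():
            self.bucket.download_file(obj.key, self.store + obj.key)

    def empty_bucket(self):
        # Delete every object; no point paying for storage that is no longer needed
        self.bucket.objects.all().delete()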
Since all the mechanics of moving files from S3 to a local machine are dull, I use a class to abstract them away. I create an instance of AmazonS3 as s3worker and just call the methods:
- download_contents_s3_bucket()
- empty_bucket()
These pull the log files over to the local machine and clear the bucket. Well, why pay for storage that you do not need? There were no changes to the log parsing strategy and code, but I did introduce a bit of tidying up.
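For anyone who has not read the earlier article, the parsing walks the downloaded files and turns each line of the standard Nginx combined log format into a dictionary, collecting them in a list called records. A minimal sketch of that step, with the local folder path as a placeholder, might look like this.
import os
import re

store = "./logs/"           # placeholder folder holding the downloaded log files
logs = os.listdir(store)    # the files pulled down from the S3 bucket

# Nginx combined format: remote_addr - remote_user [time_local] "request" status bytes "referrer" "agent"
NGINX_LINE = re.compile(
    r'(?P<remote_addr>\S+) \S+ (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytesSent>\d+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"')

records = []
for log in logs:
    with open(store + log) as handle:
        for line in handle:
            match = NGINX_LINE.match(line)
            if match:
                records.append(match.groupdict())
With records in hand, the tidying up I mentioned is just the following.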
import pandas as pd

# Persist the parsed records as a CSV and clear out the copied log files
df = pd.DataFrame(records)
out = store + "data.csv"
df.to_csv(out)

for log in logs:
    fullpath = store + log
    # Keep the CSV and any pickle files; remove everything else
    if 'data.csv' not in log and 'pickle' not in log:
        os.remove(fullpath)
Once the log parsing is complete, I still build the data frame, but then I store the data as a CSV and delete all the files replicated over to the local machine. Keeping the environment clean all the way along is central to security and privacy. There is now a CSV file in my local environment which I can use to explore Mito. Let’s go there now!
Exploring with Mito
Mito has a proper home page, and there are open source, pro, and enterprise versions of the tool available. Feel free to check out the site here.
Installation is via the traditional PIP approach, but there is an extra step.
python -m pip install mitoinstaller
python -m mitoinstaller install
On the Mac, I had trouble using Mito in Jupyter Notebooks, but it does work smoothly in Jupyter Lab. I use Anaconda, and Jupyter Lab is just a ‘click’ from the Anaconda Navigator.

Cells 7 and 8 deal with some housekeeping. Using Mito requires loading the mitosheet package, while the ip2geotools package allows the translation of IP addresses to cities. The Nginx logs only give me an IP address, which is not easy to work with in EDA.
Cell 8 defines the variables that point to the CSV file from the previous section. Cell 9, below, reads the CSV into a data frame called df.
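Roughly, those housekeeping cells amount to something like the following; the folder path is a placeholder for wherever the data.csv from the previous section landed.
import pandas as pd
import mitosheet

store = "./logs/"                 # placeholder path to the folder holding data.csv
csv_file = store + "data.csv"
df = pd.read_csv(csv_file)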

Moving to the spreadsheet-like interface is as simple as calling the sheet method and passing it the data frame object.
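In the notebook, that is a one-liner.
mitosheet.sheet(df)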

As in Excel, we can work with different sheets at the bottom of the screen. Hitting the plus button brings up the file import wizard. I have shown the df and df_pivot sheets loaded. It is straightforward to create visuals.

I can easily see that most of my web traffic, or visitors, comes from US-registered IP addresses. Setting a filter on the US, I can drill into the regions and note the Californian traffic, with perhaps Silicon Valley wanting to buy my site; well, we can pray!

Filtering on California allows me to see whether I have a chance or not!

Indeed people from San Francisco have been visiting my site. Hooray! Creating these visuals is super handy, and I love the no-code approach.
But hang on a minute! That all seemed too easy.

In the open-source version, the operations are minimal. Here is the toolbar showing the functions, similar to the Microsoft Excel ribbon.

I had to do most of the traditional cleaning and enrichment work in code, as the spreadsheet interface doesn’t support it. Some examples from the code below:
- Converting a string containing a date to date and time columns
- Calling functions to retrieve the country, region, and city from the IP address
- Dealing with missing values
- Splitting fields into components to pull out the HTTP method
# Keep only the columns of interest from the parsed logs
df = df[['remote_addr', 'remote_user', 'time_local', 'request',
         'status', 'bytesSent', 'referrer', 'agent']]

# Convert the Nginx timestamp, e.g. 05/Mar/2022:00:06:55 +0000, into proper date and time columns
df['date_time'] = pd.to_datetime(df['time_local'], format='%d/%b/%Y:%H:%M:%S %z')
del df['time_local']
df['date'] = df.date_time.dt.date
df['time'] = df.date_time.dt.time

# Deal with missing values before splitting strings
df['request'] = df['request'].fillna(" ")
df = df.fillna(" ")

# The HTTP method is the first token of the request field
df['action'] = df.request.apply(lambda x: x.split(" ")[0])

# Split the user agent, e.g. Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36
df['browser'] = df.agent.apply(lambda x: x.split(" "))
df['elems'] = df.agent.apply(lambda x: splitter(x))
df['count'] = 1

# Translate each unique IP address into country, region, and city
ips = df['remote_addr'].unique()
print(len(ips))
res = iped(ips, ip_country)
df['country'] = df.remote_addr.apply(lambda x: country(x, res, 'country'))
df['region'] = df.remote_addr.apply(lambda x: country(x, res, 'region'))
df['city'] = df.remote_addr.apply(lambda x: country(x, res, 'city'))
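The splitter, iped, and country functions are small helpers defined earlier in the notebook. Purely as an illustration, iped and country could be built on ip2geotools roughly like this, using the free DbIpCity database and caching each lookup so an address is only queried once.
from ip2geotools.databases.noncommercial import DbIpCity

def iped(ips, ip_country):
    # Look each unique IP address up once, caching the result in ip_country
    for ip in ips:
        if ip not in ip_country:
            try:
                hit = DbIpCity.get(ip, api_key='free')
                ip_country[ip] = {'country': hit.country,
                                  'region': hit.region,
                                  'city': hit.city}
            except Exception:
                ip_country[ip] = {'country': ' ', 'region': ' ', 'city': ' '}
    return ip_country

def country(ip, res, field):
    # Pull one field (country, region, or city) from the cached lookups
    return res.get(ip, {}).get(field, ' ')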
So using Python, Jupyter Lab, Pandas, and Mito is a true hybrid approach. The cleaning and preparation of the data set suits the code approach, whilst the exploration and visualisation of the clean data suits Mito very well.
Still, I know that people from San Francisco are visiting my site, so perhaps it is time for me to put my elevator pitch together and hang out at some elevators in San Francisco.

If you spend time away from coding, it becomes challenging to remember the syntax, and hence these hybrid tools are convenient. I like these hybrid approaches!