
Disclaimer: This is not a sponsored article. I don’t have any affiliation with Mito or the creators of the library. The article shows an unbiased overview of the library, intending to make Data Science tools accessible to the broader masses.
Trying to keep up with the most recent data science libraries is like trying to read with your eyes closed. Nothing new can gain traction without solving a specific problem really well. That’s where Mito will grab your attention.
I wrote about Mito a couple of months back, but the library has since gained new features and updated some existing ones. Those are the focus today.
First things first, let’s install Mito on your local machine.
Get started – Install Mito locally
The Mito package has two prerequisites:
Assuming you have both installed, I’ll continue by creating and activating a new virtual environment with Anaconda:
conda create --name mito_env python=3.8
conda activate mito_env
Then install Mito through its installer package:
python -m pip install mitoinstaller
python -m mitoinstaller install
Once done, you can launch Jupyter Lab:
jupyter lab
Create a new notebook and you’re ready to proceed!
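Before going further, it can help to confirm from the first notebook cell that the package actually landed in the active environment. The snippet below is a minimal sketch using only the standard library; is_installed is a hypothetical helper name, not part of Mito.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found in the current environment."""
    # find_spec returns None when no top-level module with that name is importable
    return importlib.util.find_spec(package_name) is not None


# mitosheet is the module name imported later in this article
print("mitosheet installed:", is_installed("mitosheet"))
```

If this prints False, the likely culprit is that Jupyter Lab was launched outside the mito_env environment.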
New design – The 90s are over
I love it when a tool does its job well. But I won’t use it if it looks like shit. That might not be the case for you, but a modern design beats unnecessary borders, shadows, and the rest of the Windows XP look every day of the week.
The new Mito version comes with an updated design that you’ll see in a second. However, it’s not technically a new feature, so I’m not counting it toward the list.
First, let’s create a new Mito sheet. Execute the following code to do so, assuming you have a blank notebook open:
import mitosheet
mitosheet.sheet()
You should see something similar:

You’ll have to enter your email to continue, but they won’t bother you with too many emails. Even if they do, there’s always an option to unsubscribe.
Once done, you’ll see a blank sheet in the notebook:

Yep – Mito definitely looks better than before, but that’s not why you’re reading this. Let’s continue with the first improved feature – easier data management.
Easier data management – Connecting to a local file system
One thing I dislike about Pandas is guessing how many times I have to write ../ to get to the correct data folder. That’s not the case with Mito.
Mito can now connect directly to your local file system, making dataset loading and management that much easier. We’ll use the Titanic dataset throughout the article, so make sure to download it if you’re following along.
You’ll see the option to import files after creating a new Mito sheet, as shown below:

You can select your dataset and hit the import button down below. That will instantly load the dataset:

The library will automatically generate the Python code for you in the cell below. Here’s what it looks like for now:
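In plain pandas terms, the import step boils down to a read_csv call. The exact generated code depends on your Mito version, so treat the following as a hand-written approximation. It assumes the DataFrame is named titanic_csv (the name used in the generated snippets later on) and uses a few made-up rows in place of the real file so that it runs on its own.

```python
import io

import pandas as pd

# Tiny stand-in for titanic.csv so the snippet runs without the download;
# the rows are made up for illustration and are not real Titanic records.
sample_csv = io.StringIO(
    "PassengerId,Survived,Sex,Age,Embarked\n"
    "1,0,male,22,S\n"
    "2,1,female,38,C\n"
    "3,1,female,26,S\n"
)

# With the real file this would be: pd.read_csv("titanic.csv")
titanic_csv = pd.read_csv(sample_csv)
print(titanic_csv.head())
```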

Neat, isn’t it? Let’s see how to calculate summary statistics next.
Summary statistics – One click away
Calculating summary statistics (mean, median, quartiles, and so on) would typically mean calling the describe() function on every column, and that doesn’t even account for a graphical representation of the variable.
Mito does it in a single click.
Just click on a column of interest and explore the Summary Stats tab on the right side. It visualizes the data with the most appropriate chart type and tells you everything the describe() function would:

Needless to say, exploring data this way is a must for any first encounter with a dataset.
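For comparison, here is roughly what the manual route looks like in pandas. It’s a minimal sketch with made-up ages standing in for the Age column of the Titanic dataset.

```python
import pandas as pd

# Made-up ages for illustration; swap in titanic_csv["Age"] for the real data
ages = pd.Series([22, 38, 26, 35, 35, 54, 2, 27], name="Age")

# describe() returns count, mean, std, min, quartiles, and max in one Series
stats = ages.describe()
print(stats)
```

Calling describe() on a whole DataFrame summarizes every numeric column at once, but you’d still have to write the plotting code yourself.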
Change data types – Just select from a dropdown
Data isn’t always formatted correctly by default. To solve this problem, you can either change the data type or create a derived column. Mito does both with ease.
You can click on the little icon in the header column to open properties and change the data type from there:

For anything more complex, you’re better off creating a derived column. The example below shows how to convert the Sex attribute into a binary column, where males have a value of 1:

The previous operation generates the following Python code:
# Set M in titanic_csv to =IF(Sex == 'male', 1, 0)
titanic_csv['M'] = IF(titanic_csv['Sex'] == 'male', 1, 0)
# Renamed M to IsMale in titanic_csv
titanic_csv.rename(columns={"M": "IsMale"}, inplace=True)
This should feel familiar to anyone with a basic Excel background.
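The IF function in the generated code comes from Mito, so the snippet won’t run in an environment without the library. If you’d rather stay in plain pandas and NumPy, both steps from this section can be sketched as follows. The rows are made up, and casting Survived to a boolean is just an assumed example of a type change.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration, not real Titanic records
titanic_csv = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Survived": [0, 1, 1, 0],
})

# Change a data type: store the 0/1 Survived flag as a boolean
titanic_csv["Survived"] = titanic_csv["Survived"].astype(bool)

# Derived column: np.where plays the role of the spreadsheet-style IF
titanic_csv["IsMale"] = np.where(titanic_csv["Sex"] == "male", 1, 0)
print(titanic_csv)
```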
Pivot tables – Create and edit with ease
One of the easiest ways to summarize data quickly is through pivot tables. In Mito, creating a pivot table creates a new Pandas DataFrame which you can then further modify (e.g., sort).
The best way to explain the concept is through a demonstration. The one that follows creates a DataFrame containing the number of passengers who survived, grouped by port of embarkation:

Here’s the code generated by the previous operation:
unused_columns = titanic_csv.columns.difference(set(['Embarked']).union(set([])).union(set({'Survived'})))
tmp_df = titanic_csv.drop(unused_columns, axis=1)
pivot_table = tmp_df.pivot_table(
    index=['Embarked'],
    values=['Survived'],
    aggfunc={'Survived': ['sum']}
)
# Flatten the column headers
pivot_table.columns = [make_valid_header(col) for col in pivot_table.columns.values]
# Reset the column name and the indexes
df2 = pivot_table.rename_axis(None, axis=1).reset_index()
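The make_valid_header helper belongs to Mito, and its exact behavior isn’t shown here. Judging by the Survived_sum column name, it flattens the two-level headers that pandas produces when aggfunc is given a list. The sketch below reproduces that step under this assumption, with an underscore join standing in for the helper and made-up rows standing in for the dataset.

```python
import pandas as pd

# Made-up rows for illustration, not real Titanic records
titanic_csv = pd.DataFrame({
    "Embarked": ["S", "C", "S", "Q", "C", "S"],
    "Survived": [0, 1, 1, 0, 1, 1],
})

pivot_table = titanic_csv.pivot_table(
    index=["Embarked"],
    values=["Survived"],
    aggfunc={"Survived": ["sum"]},
)
# A list of aggregations produces two-level headers such as ("Survived", "sum")
print(pivot_table.columns.tolist())

# Assumed stand-in for make_valid_header: join the two levels with "_"
pivot_table.columns = ["_".join(col) for col in pivot_table.columns.values]

# Move Embarked from the index back into a regular column
df2 = pivot_table.reset_index()
print(df2)
```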
Easy, right? As mentioned before, you can modify the pivot table after creation. Here’s how to sort it and rename a column:

The previous operation generated the following code:
# Sorted Survived_sum in df2 in descending order
df2 = df2.sort_values(by='Survived_sum', ascending=False, na_position='first')
df2 = df2.reset_index(drop=True)
# Renamed Survived_sum to Total_survived in df2
df2.rename(columns={"Survived_sum": "Total_survived"}, inplace=True)
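If you were writing this by hand rather than clicking through Mito, the whole pivot, sort, and rename sequence could be condensed into a single method chain. This is one possible hand-written alternative, not what Mito generates, and it again runs on made-up rows.

```python
import pandas as pd

# Made-up rows for illustration, not real Titanic records
titanic_csv = pd.DataFrame({
    "Embarked": ["S", "C", "S", "Q", "C", "S", "S"],
    "Survived": [0, 1, 1, 0, 1, 1, 1],
})

total_survived = (
    titanic_csv
    .groupby("Embarked", as_index=False)["Survived"]
    .sum()                                          # survivors per port
    .sort_values(by="Survived", ascending=False)    # largest count first
    .reset_index(drop=True)
    .rename(columns={"Survived": "Total_survived"})
)
print(total_survived)
```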
Finally, let’s cover data visualization.
Graphing – Interactivity included
I like to inspect data visually, but I’m not the biggest fan of writing visualization code. As you would assume, Mito has you covered.
All you have to do is click on the Graph option, select the visualization type and select the columns for X and Y axes – the library covers everything else.
Here’s how to draw a boxplot of the Age column:

It looks like Plotly is used behind the scenes, so visualizations are interactive by default. Neat!
Final words
And that does it for the top five new/upgraded features in the most recent Mito release. The official documentation isn’t updated yet – judging by the old design – but it’s just a matter of time until it is.
The question remains – should you use Mito?
My answer is the same as in the previous article. As a data scientist, I don’t see why you shouldn’t, especially if you’re skilled in Excel and want to get started with Python and Pandas. Mito can make the transition process that much easier.
To conclude – give Mito a try. It’s free, and you have nothing to lose. I’d love to hear your opinion on the library in the comment section below.