
Data Analysis and Visualization Just Got Better – Mito adds 5 Features You Should Try

No-code data analysis and visualization – made possible with this GUI for Pandas.

Photo by Mitchell Y on Unsplash

Disclaimer: This is not a sponsored article. I don’t have any affiliation with Mito or the creators of the library. The article shows an unbiased overview of the library, intending to make Data Science tools accessible to the broader masses.

Trying to keep up with the most recent data science libraries is like trying to read with your eyes closed. Nothing new can gain traction without solving a specific problem really well. That’s where Mito will grab your attention.

I wrote about Mito a couple of months back, but the library has since gained some new features and updated the existing ones. These will be covered today.

First things first, let’s install Mito on your local machine.


Get started – Install Mito locally

The Mito package has two prerequisites:

Assuming you have both installed, I’ll continue by creating and activating a new virtual environment with Anaconda:

conda create --name mito_env python=3.8
conda activate mito_env

Then install Mito through its installer package:

python -m pip install mitoinstaller
python -m mitoinstaller install

Once done, you can launch JupyterLab:

jupyter lab

Create a new notebook and you’re ready to proceed!


New design – the ’90s are over

I love when a tool does its job well. But I won’t use it if it looks like shit. That might not be the case for you, but modern design beats unnecessary borders, shadows, and other Windows-XP-looking software every day of the week.

The new Mito version comes with an updated design that you’ll see in a second. However, it’s not technically a new feature, so I’m not counting it in the list.

First, let’s create a new Mito sheet. Execute the following code to do so, assuming you have a blank notebook open:

import mitosheet
mitosheet.sheet()

You should see something similar:

Image 1 – Mito sign up screen (image by author)

You’ll have to enter your email to continue, but they won’t bother you with too many emails. Even if they do, there’s always an option to unsubscribe.

Once done, you’ll see a blank sheet in the notebook:

Image 2 – Blank Mito sheet (image by author)

Yep – Mito definitely looks better than before, but that’s not why you’re reading this. Let’s continue with the first improved feature – easier data management.


Easier data management – Connecting to a local file system

One thing I dislike about Pandas is guessing how many times I have to write ../ to reach the correct data folder. That’s not the case with Mito.

Mito can now connect directly to your local file system, making dataset loading and management that much easier. We’ll use the Titanic dataset throughout the article, so make sure to download it if you’re following along.

You’ll see the option to import files after creating a new Mito sheet, as shown below:

Image 3 – Import files option (image by author)

You can select your dataset and hit the import button down below. That will instantly load the dataset:

Image 4 – Titanic dataset in Mito (image by author)

The library will automatically generate the Python code for you in the cell below. Here’s what it looks like for now:

Image 5 – Generated Python code (image by author)

Neat, isn’t it? Let’s see how to calculate summary statistics next.


Summary statistics – One click away

Calculating summary statistics – like mean, median, and quartiles – would typically mean calling describe() on every column of interest, and that still leaves you without a graphical representation of the variable.
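For reference, here is a minimal sketch of that manual route in plain pandas. The rows are made up to mimic a few Titanic columns; they are not taken from the real dataset.

```python
import pandas as pd

# Made-up rows that mimic a few Titanic columns (not the real dataset)
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, None],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05],
    "Sex": ["male", "female", "female", "female", "male"],
})

# Numeric column: count, mean, std, min, quartiles, max
age_stats = df["Age"].describe()
print(age_stats)

# Text column: count, unique, top, freq
sex_stats = df["Sex"].describe()
print(sex_stats)
```

Missing values are skipped in the count, and you would still need a separate plotting call to see the distribution.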

Mito does it in a single click.

Just click on a column of interest and explore the Summary Stats tab on the right side. It visualizes the data with the most appropriate chart type and tells you everything the describe() function would:

Image 6 – Summary statistics with Mito (image by author)

Needless to say, exploring data this way is a must for any first encounter with a dataset.


Change data types – Just select from a dropdown

Data isn’t always formatted correctly by default. To solve this problem, you can either change the data type or create a derived column. Mito does both with ease.

You can click on the little icon in the column header to open properties and change the data type from there:

Image 7 – Changing data types with Mito (image by author)

For anything more complex, you’re better off creating a derived column. The example below shows you how to convert the Sex attribute into a binary column, where males have a value of 1:

Image 8 – Creating derived columns with Mito (image by author)

The previous operation generates the following Python code:

# Set M in titanic_csv to =IF(Sex == 'male', 1, 0)
titanic_csv['M'] = IF(titanic_csv['Sex'] == 'male', 1, 0)
# Renamed M to IsMale in titanic_csv
titanic_csv.rename(columns={"M": "IsMale"}, inplace=True)

This should feel familiar to anyone with a basic Excel background.
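The IF function in the generated code is not part of pandas, so the snippet depends on Mito being imported. For comparison, here is a minimal sketch of the same two ideas, a data type change and a derived column, in plain pandas. The sample rows and the choice of Fare as the column to convert are assumptions made for illustration.

```python
import pandas as pd

# Made-up rows standing in for the Titanic data
titanic_csv = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Fare": ["7.25", "71.28", "7.92", "8.05"],  # numbers stored as text
})

# Change a data type: parse the text column into floats
titanic_csv["Fare"] = titanic_csv["Fare"].astype(float)

# Derived column: 1 for males, 0 otherwise
titanic_csv["IsMale"] = (titanic_csv["Sex"] == "male").astype(int)

print(titanic_csv.dtypes)
print(titanic_csv)
```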


Pivot tables – Create and edit with ease

One of the easiest ways to summarize data quickly is through pivot tables. In Mito, creating a pivot table creates a new Pandas DataFrame which you can then further modify (e.g., sort).

The best way to explain the concept is through a demonstration – the one that follows creates a DataFrame containing the number of passengers who survived, grouped by port of embarkation:

Image 9 – Pivot tables with Mito (image by author)

Here’s the code generated by the previous operation:

unused_columns = titanic_csv.columns.difference(set(['Embarked']).union(set([])).union(set({'Survived'})))
tmp_df = titanic_csv.drop(unused_columns, axis=1)
pivot_table = tmp_df.pivot_table(
    index=['Embarked'],
    values=['Survived'],
    aggfunc={'Survived': ['sum']}
)
# Flatten the column headers
pivot_table.columns = [make_valid_header(col) for col in pivot_table.columns.values]
# Reset the column name and the indexes
df2 = pivot_table.rename_axis(None, axis=1).reset_index()
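Note that make_valid_header is not a pandas function; it presumably comes from Mito. As a rough stand-alone sketch, the same pivot can be reproduced in plain pandas by joining the two header levels with an underscore. The sample rows below are made up, and the underscore join is an assumption about what the helper does, based on the Survived_sum column name that appears later.

```python
import pandas as pd

# Made-up rows standing in for the Titanic data
titanic_csv = pd.DataFrame({
    "Embarked": ["S", "C", "S", "Q", "C", "S"],
    "Survived": [0, 1, 1, 0, 1, 0],
})

pivot_table = titanic_csv.pivot_table(
    index=["Embarked"],
    values=["Survived"],
    aggfunc={"Survived": ["sum"]},
)

# Columns are two-level tuples such as ('Survived', 'sum'); join the parts
pivot_table.columns = ["_".join(col) for col in pivot_table.columns.values]

# Turn the Embarked index back into a regular column
df2 = pivot_table.reset_index()
print(df2)
```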

Easy, right? As mentioned before, you can modify the pivot table after creation. Here’s how to sort it and rename a column:

Image 10 – Pivot tables with Mito (2) (image by author)

The previous operation generated the following code:

# Sorted Survived_sum in df2 in descending order
df2 = df2.sort_values(by='Survived_sum', ascending=False, na_position='first')
df2 = df2.reset_index(drop=True)
# Renamed Survived_sum to Total_survived in df2
df2.rename(columns={"Survived_sum": "Total_survived"}, inplace=True)

Finally, let’s cover data visualization.


Graphing – Interactivity included

I like to inspect data visually, but I’m not the biggest fan of writing visualization code. As you would assume, Mito has you covered.

All you have to do is click on the Graph option, select the visualization type and select the columns for X and Y axes – the library covers everything else.

Here’s how to draw a boxplot of the Age column:

Image 11 – Data visualization with Mito (image by author)

It looks like Plotly is used behind the scenes, so visualizations are interactive by default. Neat!


Final words

And that does it for the top five new/upgraded features in the most recent Mito release. The official documentation isn’t updated yet – judging by the old design – but it’s just a matter of time until it is.

The question remains – should you use Mito?

My answer is the same as in the previous article. As a data scientist, I don’t see why you shouldn’t, especially if you’re skilled in Excel and want to get started with Python and Pandas. Mito can make the transition process that much easier.

To conclude – give Mito a try. It’s free, and you have nothing to lose. I’d love to hear your opinion on the library in the comment section below.


Loved the article? Become a Medium member to continue learning without limits. I’ll receive a portion of your membership fee if you use the following link, with no extra cost to you.

Join Medium with my referral link – Dario Radečić

