
How to Learn Data Science Interactively with Python?

A JupyterLab extension generates code on the fly while you work on your analysis.

Photo by Jamie Fenn on Unsplash

As the need for data literacy grows, Python’s popularity grows with it. One annoyance of Python, whether you are a Ph.D. data scientist or just starting to learn, is that the syntax can take a long time to get right. People often spend much of their analysis time on Stack Overflow or Google looking up the correct code.

Is there a better way?

In case you’ve missed my previous articles about Mito, have a look here.


Mito generates code on the fly while you work on your analysis

Photo by Alex Knight on Unsplash

Mito is a JupyterLab extension that enables exploring and transforming datasets with the ease of Excel.

Mito cuts down "search for the right code snippet" time significantly with its automatic code generation.

With Mito, you call the spreadsheet interface into your Python environment and each edit you make generates the equivalent Python in the code cell below.

Each click operation generates a code snippet on the fly (Visualization by author)

Most Python users know how they want to manipulate their data, whether it be:

  • merging datasets together
  • filling missing values
  • creating charts
  • adding columns, etc.

Mito lets users complete tasks like these without knowing the exact code for them. You perform the manipulation in Mito, and Mito generates the code for you.

Mito autogenerates the code on the fly (Image by author)
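
For reference, the tasks listed above map to short pandas snippets like the ones below. This is a minimal sketch, not Mito's actual output: the sample data, the column names, and the 1.2 multiplier are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
orders = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "amount": [20.0, None, 35.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "country": ["US", "DE", "JP"],
})

# Merge the two datasets on their shared key.
merged = orders.merge(customers, on="customer_id", how="left")

# Fill missing values in a single column.
merged["amount"] = merged["amount"].fillna(0.0)

# Add a new column derived from an existing one.
merged["amount_with_tax"] = merged["amount"] * 1.2
```

Each step is one line of pandas, but remembering the exact method and argument names is the part that usually sends people to a search engine.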

Installing Mito

Mito can be installed with these two commands:

python -m pip install mitoinstaller
python -m mitoinstaller install

Once you open the Jupyter notebook, render a Mitosheet:

import mitosheet
mitosheet.sheet()

Here are the full install instructions.

Mito supports pandas DataFrames

To get data into Mito, you can either pass in a DataFrame as an argument,

mitosheet.sheet(df)

or you can use the import option to select a dataset from your local files.

Importing a dataset into Mito (image by author)
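
For the first option, the DataFrame has to exist before the sheet is rendered. The sketch below assumes a small in-memory CSV in place of a real file; the data and the `open_in_mito` helper are hypothetical, and only `mitosheet.sheet(df)` comes from the instructions above.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a real file on disk.
csv_text = io.StringIO(
    "brand,type,stars\n"
    "Nissin,Cup,3.5\n"
    "Maruchan,Pack,4.0\n"
)
df = pd.read_csv(csv_text)


def open_in_mito(frame):
    """Render a Mitosheet for `frame` (needs mitosheet and a Jupyter session)."""
    import mitosheet  # imported lazily so the rest of the sketch runs without it
    mitosheet.sheet(frame)


# In a notebook cell you would then call: open_in_mito(df)
```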

Mito Functionalities

Photo by Patrick on Unsplash

Mito’s functionalities focus on data cleaning, data manipulation/exploration, and data visualization. Mito’s data cleaning functionality includes:

  • Filling missing values
  • Removing rows
  • Changing column data types
  • Editing specific values
  • Deleting columns
  • Renaming columns
  • and more

Mito takes the ease of Excel and applies those features to Python and pandas DataFrames.

For example, if you need to change a specific value in your DataFrame, you simply edit the value directly in the Mitosheet.

Interactive editing of DataFrames values in the Mito spreadsheet (image by author)

The generated code for this operation looks like this:

ramen_ratings_csv.at[10, 'type'] = "Nissin"
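
The other cleaning operations listed above have similarly compact pandas equivalents. The following is a rough sketch on a hypothetical stand-in for the ramen ratings data; the column names and values are assumptions, and the code Mito generates for the same steps may look different.

```python
import pandas as pd

# Hypothetical stand-in for the ramen ratings data used above.
ramen = pd.DataFrame({
    "Brand": ["Nissin", "Maruchan", None],
    "type": ["Cup", "Pack", "Bowl"],
    "stars": ["3.5", "4.0", "2.75"],
    "notes": ["", "", ""],
})

# Edit one specific value: row label 1, column 'type'.
ramen.at[1, "type"] = "Cup"

# Fill missing values in a column.
ramen["Brand"] = ramen["Brand"].fillna("Unknown")

# Change a column's data type from string to float.
ramen["stars"] = ramen["stars"].astype(float)

# Rename one column and delete another.
ramen = ramen.rename(columns={"Brand": "brand"}).drop(columns=["notes"])

# Remove rows that fail a condition.
ramen = ramen[ramen["stars"] >= 3.0]
```

Note that `.at` addresses a single cell by row label and column name, which is why it suits one-off edits like the one shown in the generated code.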

Exploratory Data Analysis

Photo by Franki Chamaki on Unsplash

Data exploration and manipulation are among Mito’s most prominent features. These include:

  • Joining/Merging datasets
  • Creating pivot tables
  • Filtering datasets
  • Sorting columns
  • Looking at summary statistics
  • and more

Creating a pivot table with Mito (image by author)
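
To make the exploration steps concrete, here is a minimal pandas sketch of a pivot table, a filter, a sort, and summary statistics. The sample data and column names are hypothetical, and this is not the code Mito itself generates.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
titles = pd.DataFrame({
    "country": ["US", "US", "IN", "IN", "JP"],
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show"],
    "title": ["A", "B", "C", "D", "E"],
})

# Pivot table: count titles per country, split by type.
pivot = titles.pivot_table(
    index="country",
    columns="type",
    values="title",
    aggfunc="count",
    fill_value=0,
)

# Filter: keep only the movies.
movies = titles[titles["type"] == "Movie"]

# Sort: order the pivot by movie count, largest first.
by_movies = pivot.sort_values("Movie", ascending=False)

# Summary statistics for every column, including non-numeric ones.
summary = titles.describe(include="all")
```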

Visualizations

Mito also integrates with Plotly, an open-source graphing library that allows for interactive charts. Plotly charts allow for features like zooming in on portions of the visualization, exporting to PNG and more.

Mito allows the user to generate these charts without having to write the code for them. Once the visualization has been configured, the user can export the equivalent code by clicking the "Copy Graph Code" button.

The generated code looks like this:

# Import plotly and create a figure
import plotly.graph_objects as go
fig = go.Figure()

# Add the bar chart traces to the graph
for column_header in ['country']:
    fig.add_trace(
        go.Bar(
            x=df2[column_header],
            y=df2['type count Movie'],
            name=column_header
        )
    )

# Update the title and stacking mode of the graph
# See Plotly documentation for customizations: https://plotly.com/python/reference/bar/
fig.update_layout(
    xaxis_title='country',
    yaxis_title='type count Movie',
    title='country, type count Movie bar chart',
    barmode='group',
)
fig.show(renderer="iframe")

Conclusion

Photo by Jason Blackeye on Unsplash

Mito is a powerful JupyterLab extension for exploratory data analysis and for quickly generating Python code. Its main limitation is that it does not yet cover the full range of data science tasks. To be more than an exploratory data tool, Mito still needs more comprehensive visualization options and more statistical and modeling features. But it’s an extension worth trying.


