The No-Code Pandas Alternative That Data Scientists Have Been Waiting For

A step towards simplifying data analysis for all

Avi Chawla
Towards Data Science

--

Photo by Robert Anasch on Unsplash

Motivation

Story-telling is immensely critical to the workflow of all data science projects.

In this regard, drawing valuable insights from data is a fundamental skill every organization looks for in a data scientist.

Thankfully, over the past few years, developers across the globe have profoundly contributed towards developing reliable and sophisticated tools that make a data scientist’s job relatively easier.

The most popular open-source tools for Python include Pandas, NumPy, Matplotlib, Seaborn, and many more.

Essentially, these tools allow the users to perform various data analysis operations using coded instructions.

While their immense utility makes them almost indispensable today to the workflow of a data science project, I believe that:

→ #1 Beginners without prior experience often get overwhelmed in an attempt to get hold of these tools.

→ #2 What’s even more concerning is that Experts spend a considerable amount of time and energy daily writing the same code repeatedly to perform data analysis across different projects.

  • To get some perspective here, try remembering the number of times you have explicitly written df.sort_values(), pd.merge(), df.value_counts(), or created different scatter plots by writing the same code over and over.
  • In simple words, redundancy is more frequent than you think, which inhibits work output.

Hence, both groups particularly look for time-saving, no-code, and GUI-based tools that:

  1. Have extremely low entry barriers for beginners.
  2. Help experts eradicate redundant work and do what matters to them.

One may argue that Excel can be a potential option in such cases. I partly agree with that, as the biggest issue with Excel is its max row limit. This inhibits working on projects involving data analytics at scale.

To this end, what I am especially interested in discussing in this blog is a potential no-code assistive tool for data analysis using Pandas, called Gigasheet.

To make tabular data analysis relatively easier, I will perform 15 typical operations in Pandas and demonstrate how you can do them with just a few clicks of a button using Gigasheet.

Let’s begin 🚀!

Prerequisites

To use Pandas, you should import the library first. This is shown below:

To use Gigasheet, you should have a Gigasheet account, and everything comes pre-installed.

Dataset

I will use a self-created dataset of 300K rows and nine columns for this blog. The first five rows are shown below:

Five five rows of the Dataset (Image by Author)

#1 Reading a CSV

Pandas

You can use the pd.read_csv() method to read a CSV file and create a Pandas DataFrame.

Gigasheet

Reading a CSV is pretty simple here too. Just upload the CSV file, and you are good to go.

Reading a CSV (Gif by Author)

You can also upload other file formats such as JSON, XLSX, TSV, GZIP, and many more.

Alternatively, you can leverage data connectors such as Amazon S3, Google Drive, Dropbox, etc., to upload your dataset. This saves time in uploading the file from the local machine.

#2 Dimensions of the DataFrame

Pandas

If you want to print the shape of the DataFrame (number of rows and columns), you can use the shape attribute of the DataFrame.

Gigasheet

Here, the shape is displayed once you upload the file.

Dimensions of the dataset (Image by Author)

Note: It counts one extra column that accounts for the index.

#3 Viewing Top N Rows

Typically, in real-world datasets, you will have many rows to deal with.

In such situations, one is usually interested in viewing just the first n rows of the DataFrame.

Pandas

You can use the df.head(n) method to print the first n rows:

Gigasheet

Once you open the sheet, it shows the top 100 rows by default. This gives you a quick glimpse into the dataset.

Viewing top rows of the DataFrame (Gif by Author)

#4 Viewing the Datatype of a column

Pandas

You can view the datatype of a column with the dtypes argument.

Gigasheet

To view the datatype of a column, click on the specific column header and select “change data type.”

The datatype appears as highlighted text, “Plain Text” in this case for the Company_Name column.

Viewing the datatype of column (Gif by Author)

#5 Modifying the Datatype of a column

Pandas

To change the datatype of a column, you can use the astype() method as follows:

Gigasheet

To change the datatype of a column, click on the specific column header and select “change data type.”

Changing the datatype of column (Gif by Author)

As you may have noticed, the modification is not inplace. Simply put, it automatically creates a new column with the desired data type and hides the original column for future reference.

#6 Deleting a Column

Pandas

If you want to delete a column, use the df.drop() method:

Gigasheet

There are two ways to delete a column from the workspace.

The first approach is temporarily hiding the columns from the sidebar on the right.

Deleting a column (Gif by Author)

The second method is to delete the column permanently. To achieve this, click on the specific column header and select “Delete.”

Deleting a column (Gif by Author)

#7 Printing Descriptive Info about the DataFrame

Pandas

df.info() anddf.describe() are two popularly used methods to generate statistical information about a DataFrame.

Gigasheet

You can view the above information using various aggregations available at the bottom of the sheet.

Printing descriptive statistics of a column (Gif by Author)

#8 Sorting a DataFrame

Pandas

You can use the df.sort_values() method to sort a DataFrame.

Gigasheet

Sorting a DataFrame (Gif by Author)

#9 Renaming Column(s)

Pandas

If you want to rename the column headers, use the df.rename() method, as demonstrated below:

Gigasheet

To change the name of a column, click on the specific column header and select “Rename.”

Renaming a column (Gif by Author)

#10 Filtering a DataFrame

Pandas

There are various ways to filter a DataFrame. These include Boolean filtering, selecting a column, Selecting by Label, Selecting by Position, etc.

Gigasheet

To filter a DataFrame, head over to the “Filter” tab. Select the column and specify the condition you want to filter on.

Filtering the DataFrame based on a condition (Gif by Author)

Additionally, it shows the number of rows after filtering at the bottom of the sheet.

#11 Split a Column

Pandas

If you want to split a column into multiple columns (say Name to First_Name and Last_Name), you can use thesplit() method for a string column.

Gigasheet

To split a column, head over to “Tools” → “Columns” → “Split.”

Splitting columns (Gif by Author)

#12 Grouping a DataFrame

Pandas

You can use the groupby() method in Pandas to group a DataFrame and perform aggregations:

Gigasheet

To group the DataFrame, head over to the “Group” button in the top bar.

After grouping, you can perform all sorts of common aggregations here.

Grouping the DataFrame (Gif by Author)

#13 Adding New Column

Pandas

You can use the assignment operator to add a new column:

Gigasheet

Here, you can head over to “Insert” → “Calculations” and perform the above operation as shown below:

Adding New Column to the DataFrame (Gif by Author)

#14 Merging DataFrame

Pandas

If you want to merge two DataFrames with a joining key, use the pd.merge() method:

Gigasheet

To demonstrate this, I will merge the following CSV file. The merge column is Employment_Status.

The steps are demonstrated below. We will use the “Cross File VLOOKUP” tool to merge dataframes.

Merging two DataFrames (Gif by Author)

#15 Storing a DataFrame to a CSV

Pandas

You can use the df.to_csv() method to dump a DataFrame to a CSV, as shown below:

Gigasheet

The steps to save the DataFrame are shown below (File → Export).

Storing the DataFrame (Gif by Author)

Conclusion

In this blog, I demonstrated how you can leverage Gigasheet to perform the 15 most common Pandas operations without writing any code.

I am a big fan of no-code solutions. In my opinion, they are literally game-changers when it comes to eliminating redundant work, thereby making life easier.

Of course, I agree that coded solutions offer customization (and much more), which is one of its most significant benefits. Thus, to reiterate, I am not claiming that Gigasheet is (or will be) the ultimate replacement for Pandas.

However, as per my experience, I believe that Gigasheet is extremely handy for beginners as it lowers the barriers to starting with fundamental operations in data science.

This blog will help beginners to learn how to back reference operations in Gigasheet to Pandas.

At the same time, this blog can also help experts in the field to translate common Pandas operations to Gigasheet. This will help them work faster and effortlessly by avoiding the redundancy of writing the same code repeatedly.

Another potential set of users that can take advantage of Gigasheet is Excel users. One may argue that most of the operations demonstrated in this blog can be easily performed in Excel.

However, the biggest issue with Excel is its max row limit. This inhibits working on large-scale data analytics projects, which Excel does not support.

To conclude, while Gigasheet is not yet in the realm of killing off Pandas (or Excel), the trajectory certainly exists. I’m eager to see how they continue!

As always, thanks for reading! I would love to read your responses :)

--

--

👉 Get a Free Data Science PDF (550+ pages) with 320+ tips by subscribing to my daily newsletter today: https://bit.ly/DailyDS.