Pandas Exercise for Data Scientists — Part 1

A set of challenging Pandas Questions

Avi Chawla
Towards Data Science


Photo by Olav Ahrens Røtne on Unsplash

The Pandas library has long been a favorite among Data Scientists, and it is undoubtedly the go-to tool for handling, manipulating, and processing tabular data.

Therefore, to scale your expertise, challenge your existing knowledge, and introduce you to numerous Pandas functions that are popular among Data Scientists, I am presenting Part 1 of the Pandas Exercise. The objective is to strengthen your logical muscle and to help you internalize data manipulation with one of the best Python packages for data analysis.

Find the notebook with all questions for this quiz here: GitHub.

Table of Contents:

1. Sort DataFrame based on another list
2. Insert a column at a specific location in a DataFrame
3. Select columns based on the column’s Data Type
4. Count the number of Non-NaN cells for each column
5. Split DataFrame into equal parts
6. Reverse DataFrame row-wise or column-wise
7. Rearrange columns of a DataFrame
8. Get alternate rows of a DataFrame
9. Insert a row at an arbitrary position
10. Apply function to every cell of DataFrame

As an exercise, I recommend you attempt the questions yourself first and then look at the solution I have provided.

Note that the solutions I have provided here may not be the only way to solve the problem. You may come up with something different and still be correct. However, if that happens, do drop a comment, and I’ll be interested to know your approach.

Let’s begin!

1. Sort DataFrame based on another list

Prompt: You are given a DataFrame. You also have a list that contains all the unique values of a particular column of the DataFrame. Sort the DataFrame such that the values in the column appear in the same order as they do in the given list.

Input and Expected Output:

Solution:

The idea here is to generate a series from the given list. Each index will denote the character, and the corresponding value will indicate the position. Using this, we can map the original DataFrame to the generated series and pass it to the sort_values() method for reference, as shown below:
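Since the original snippet is not reproduced here, the following is a minimal sketch of the idea. The sample DataFrame, the column names, and the variable names `sort_list` and `position` are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data (not the article's original input).
df = pd.DataFrame({"col1": ["C", "A", "B", "A", "C"],
                   "col2": [1, 2, 3, 4, 5]})
sort_list = ["B", "A", "C"]

# Series whose index holds the values and whose data holds their
# position in the list: B -> 0, A -> 1, C -> 2.
position = pd.Series(range(len(sort_list)), index=sort_list)

# The key callable receives the column as a Series and returns the
# ranks to sort by; a stable sort keeps ties in their original order.
sorted_df = df.sort_values("col1",
                           key=lambda col: col.map(position),
                           kind="stable")
print(sorted_df)
```

The `key` argument of `sort_values()` requires pandas 1.1 or later.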

P.S. We can also solve this using merge. Do let me know in the comments if you can figure that out.

2. Insert a column at a specific location in a DataFrame

Prompt: Assume that you again have a similar DataFrame as used above. Additionally, you are given a list whose size is the same as the number of rows in the given DataFrame. The task is to insert the given list as a new column at a given position of the DataFrame.

Input and Expected Output:

Solution:

Here, we can use the insert() method and pass the position, column_name, and the values as arguments as shown below:
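A minimal sketch of this call, assuming a small hypothetical DataFrame and a new column named `col3`:

```python
import pandas as pd

# Hypothetical sample data (not the article's original input).
df = pd.DataFrame({"col1": ["A", "B", "C"], "col2": [1, 2, 3]})
new_values = [10, 20, 30]

# insert() modifies the DataFrame in place and returns None.
# loc is the zero-based position of the new column.
df.insert(loc=1, column="col3", value=new_values)
print(df.columns.tolist())
```

Because `insert()` works in place, assigning its return value back to `df` would replace the DataFrame with `None`.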

3. Select columns based on the column’s Data Type

Prompt: We all are familiar with row-based filtering, aren’t we? Well, let’s try something else. Your task is to filter all the columns from a DataFrame whose entries adhere to a given data type.

Input and Expected Output:

Solution:

Here, we can use the select_dtypes() method and pass the data type we want to keep through its include argument (or the type we want to drop through exclude).
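As a rough illustration, the sketch below selects columns by data type from a hypothetical DataFrame. The column names are assumptions.

```python
import pandas as pd

# Hypothetical sample data (not the article's original input).
df = pd.DataFrame({"name": ["A", "B"],
                   "count": [1, 2],
                   "score": [1.5, 2.5]})

# Keep only the columns with a numeric dtype.
numeric_cols = df.select_dtypes(include="number")

# Keep only the float columns.
float_cols = df.select_dtypes(include="float64")

# Drop the numeric columns and keep everything else.
non_numeric_cols = df.select_dtypes(exclude="number")

print(numeric_cols.columns.tolist())
```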

4. Count the number of Non-NaN cells for each column

Prompt: Next, given a DataFrame (with NaN values in one or more columns), you need to print the number of Non-NaN cells for each column.

Input and Expected Output:

Solution:

Here, we can use the count() method, which returns the number of non-NaN cells in each column.
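A minimal sketch, assuming a hypothetical DataFrame with a few missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the article's original input).
df = pd.DataFrame({"col1": [1, np.nan, 3],
                   "col2": [np.nan, np.nan, 6],
                   "col3": ["a", "b", None]})

# count() returns a Series indexed by column name; NaN and None
# are both treated as missing and are not counted.
non_nan_counts = df.count()
print(non_nan_counts)
```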

5. Split DataFrame into equal parts

Prompt: Given a DataFrame, your task is to split the DataFrame into a given number of equal parts.

Input and Expected Output:

Solution:

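Here, we will use NumPy's split() function and pass the DataFrame and the number of parts as arguments. Note that split() raises an error unless the number of rows is evenly divisible by the number of parts.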
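The sketch below shows the same equal-parts split written with plain iloc slicing instead of NumPy. This is a swapped-in alternative, not the article's original snippet, chosen because it depends only on pandas. The sample data and the name `n_parts` are assumptions.

```python
import pandas as pd

# Hypothetical sample data (not the article's original input).
df = pd.DataFrame({"col1": list("ABCDEF"), "col2": range(6)})
n_parts = 3

# Mirror the equal-division requirement: refuse to split when the
# rows cannot be divided evenly.
if len(df) % n_parts != 0:
    raise ValueError("number of rows is not divisible by n_parts")

size = len(df) // n_parts

# Each part is a contiguous block of `size` rows.
parts = [df.iloc[i * size:(i + 1) * size] for i in range(n_parts)]

for part in parts:
    print(part, end="\n\n")
```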

6. Reverse DataFrame row-wise or column-wise

Prompt: Next, consider that you have a DataFrame similar to the one we used above. Your task is to flip the entire DataFrame row-wise or column-wise.

Input and Expected Output:

Solution:

We can use iloc (or loc) and apply the reverse slice “::-1” to the row axis or to the column axis.
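A minimal sketch of both directions, assuming a small hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical sample data (not the article's original input).
df = pd.DataFrame({"col1": ["A", "B", "C"], "col2": [1, 2, 3]})

# Reverse the order of the rows (the original index labels are kept).
rows_reversed = df.iloc[::-1]

# Reverse the order of the columns.
cols_reversed = df.iloc[:, ::-1]

print(rows_reversed)
print(cols_reversed)
```

If a fresh 0-based index is wanted after reversing the rows, `reset_index(drop=True)` can be chained onto the result.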

7. Rearrange columns of a DataFrame

Prompt: In this exercise, you are given a DataFrame. Additionally, you have a list that specifies the order in which the columns should appear in the DataFrame. Given the list and the DataFrame, print the columns in the order specified in the list.

Input and Expected Output:

Solution:

Similar to above, we can select all the rows and pass the list as the column selector: use loc if the list holds column names, or iloc if it holds integer positions.
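The sketch below covers both cases, a list of column names and a list of integer positions. The sample data and list names are assumptions.

```python
import pandas as pd

# Hypothetical sample data (not the article's original input).
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]})

# Case 1: the list holds column labels, so loc is the right indexer.
label_order = ["col3", "col1", "col2"]
by_label = df.loc[:, label_order]

# Case 2: the list holds integer positions, so iloc is the right indexer.
position_order = [2, 0, 1]
by_position = df.iloc[:, position_order]

print(by_label)
```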

8. Get alternate rows of a DataFrame

Prompt: Next, given a DataFrame, you need to print every alternate row starting from the first row of the DataFrame.

Input and Expected Output:

Solution:

This solution is also similar to the two above. Here, while defining the slicing part, we can specify the step of slicing as 2, which is shown below:
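A minimal sketch of slicing with a step of 2, assuming a hypothetical five-row DataFrame:

```python
import pandas as pd

# Hypothetical sample data (not the article's original input).
df = pd.DataFrame({"col1": list("ABCDE"), "col2": [1, 2, 3, 4, 5]})

# start:stop:step with only the step given: rows 0, 2, 4, ...
alternate_rows = df.iloc[::2]
print(alternate_rows)
```

Starting from the second row instead would be `df.iloc[1::2]`.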

9. Insert a row at an arbitrary position

Prompt: Similar to earlier tasks, you are given the same DataFrame. Your task is to insert a given list at a specific index of the DataFrame and reassign the indexes.

Input and Expected Output:

Solution:

Given an insert position, first assign the new row to a fractional index that falls between the given index and the one before it (for example, 0.5 to insert at position 1). Next, we sort the DataFrame on the index. Finally, we reassign the indexes to eliminate float-based index values.
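The three steps can be sketched as follows, assuming a DataFrame with a default 0-based integer index. The sample data and the names `new_row` and `insert_pos` are assumptions.

```python
import pandas as pd

# Hypothetical sample data (not the article's original input).
df = pd.DataFrame({"col1": ["A", "B", "C"], "col2": [1, 2, 3]})
new_row = ["X", 99]
insert_pos = 1  # the new row should end up at this position

# Step 1: assign the row to a fractional label between
# insert_pos - 1 and insert_pos (here 0.5).
df.loc[insert_pos - 0.5] = new_row

# Step 2: sort on the index so the new row moves into place.
# Step 3: rebuild a clean 0-based integer index.
df = df.sort_index().reset_index(drop=True)
print(df)
```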

10. Apply function to every cell of DataFrame

Prompt: Lastly, you need to apply a given function to the entire DataFrame. The given DataFrame consists of just integer values. The task is to increase each entry by 1 through a function.

Input and Expected Output:

Solution:

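Instead of using the apply() method, which operates on whole rows or columns, here we shall use the applymap() method, which applies a function to every cell. Note that since pandas 2.1, applymap() is deprecated in favor of the equivalent DataFrame.map() method.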
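A minimal sketch of the element-wise call. The sample data and the helper `add_one` are assumptions, and the version check is only there so the snippet runs on both older and newer pandas releases.

```python
import pandas as pd

def add_one(x):
    """Hypothetical helper: increase a single cell value by 1."""
    return x + 1

# Hypothetical sample data (not the article's original input).
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

# DataFrame.map exists from pandas 2.1 onward; older releases only
# provide the equivalent applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap

result = elementwise(add_one)
print(result)
```

For simple arithmetic like this, the vectorized expression `df + 1` gives the same result and is typically faster. The element-wise method is more useful when the function cannot be expressed as a vectorized operation.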

This brings us to the end of this quiz, and I hope you enjoyed attempting it. Let me know how many you got correct. Also, if you didn’t notice, this entire quiz is available in a Jupyter Notebook which you can download from here.

Also, stick around as I intend to release many more practice exercises soon. Thanks for reading.
