
Introduction
Learning something from scratch has never been an easy task. One of the reasons was that we don’t know what we don’t know. The same went for me when I first learned Python. I was frustrated with its logic, syntaxes, etc, because it was so difficult to memorize those. In the end, I discovered my own approach of learning Python, which was through SQL. As SQL is my frequently-used tool in everyday jobs, I tried to apply everything I learned in SQL to Python. This way of learning helped me research, practice and memorize at the same time.
In this article, I’ll show you some of the important SQL queries and their equivalents in Python, Pandas to be specific. This is considered as an instruction to those who want to learn Python using the same approach as I did. Hopefully, it can be a useful cheat sheet for those who want to pick up Pandas.
Data
I will create a simple dataset as below:
data = {'product_name': ['pencil', 'eraser', 'pen', 'desk', 'laptop'],
'price': [2, 1, 1, 50, 678],
'owner': ['Chi','Joe','Dan','Lisa','Rose']}
df = pd.DataFrame(data)

Now, let’s get started!
Select
Here are some simple select
statements from SQL and its equivalent commands in Python.
You can easily select all columns and rows by calling the dataset’s name (df in my example).
In another case, you only need to extract a specific column from the data, you can consider a few simple approaches such as slicing and indexing using loc
& iloc.
In my example below, I showed you three different ways to get column owner from df dataset.
If I want to see the unique products existing in my data? It’s very simple, just extracting the _product_name column from df_ dataset and apply unique()
function.
Select with conditions
There are several ways to select specific rows with conditions in Pandas. We can use Python’s slicing method, indexing, applying query function or lambda.
In the example below, I’ll show you some solutions to get desired data with one or more constraints to the column values. Usually, when working with numerical data in conditional argument, we have to deal with different comparisons (E.g: col_number > 2, total_people ≥ 10, etc). For your reference, in Python, comparison operators on numerical data are described as below:
- Equal to
==
- Not equal to
!=
- Greater than
>
- Less than
<
- Greater than or equal to
>=
- Less than or equal to
<=
Additionally, in case you want to extract rows whose specific category equals to a string or object, you should specify it with ==
operator. My circumstance with selecting all rows whose _product_name_ is pen is demonstrated in the table below. Let’s take a look.
Select rows whose values are similar to a specific pattern
Imagine you have thousands of products listing in a data, but you only want to select data whose product name is related to pen. Using contains()
function is a great choice in this case.
Sometime, when you know exactly the starting and ending characters of your desired category, you can specify it with the help of function startswith()
or endswith()
.
Select rows whose values are in a determined set
If you want to select rows whose specific column values are within a range or in a set. isin()
command can help you to do it. For example, I want to select data whose _product_name_ are pen or pencil. I will call the command as below:
Calculating functions
Similar to Sql, Python also provides different functions to calculate aggregation or produce descriptive summary.
In my example, I can easily find the total number of products, total prices of all the products, the range of product prices (i.e: max, min, median, etc) with just a line of code.
Group by
Group by is simple. Just use groupby()
function on the column you want to group. For instance, from my df data, in order to calculate the money each person has paid for his/her products, I group data by each person’s name and then calculate the corresponding total amount of product price.
Sort values
Basically, sort_values
has default of sorting values in ascending order. Therefore, if you want to order value in descending order, just state ascending = 'FALSE'
in sort_values()
argument.
In order to see who paid the most money, I calculate the total money each person paid with groupby()
and sum()
functions, and then sort the sum of price in a descending order.
Join
I will create another data frame named df2 to describe join
command.
data1 = {'class': ['A','A','C','B','E'],
'teacher':['Mona','Mila','Laurel','Wes','Connor'],
'owner': ['Chi','Joe','Dan','Lisa','Rose']}
df2 = pd.DataFrame(data1)

The basic JOIN includes left join, right join, outer join and inner join. In different cases depending on our need, we can select the suitable kind of join.
Let’s see how we can transform our data with four types of join.
Conclusion
By connecting with something you know well, I think you can learn and memorize new things better. I hope these tips can act as your useful cheat sheet in the future when you start to dig in data manipulation in Python. Good luck!
In order to receive updates regarding my upcoming posts, kindly subscribe as a member using the provided Medium Link.