The world’s leading publication for data science, AI, and ML professionals.

Guide To SQL And Its Equivalent Commands In Python

Quick but detailed cheat sheet for those who want to convert their SQL code to its equivalent command in Python!

Photo by Lopez Robin on Unsplash

Introduction

Learning something from scratch has never been an easy task. One reason is that we don't know what we don't know. That was true for me when I first learned Python: I was frustrated by its logic and syntax because they were so difficult to memorize. In the end, I discovered my own approach to learning Python, which was through SQL. Since SQL is a tool I use every day at work, I tried to map everything I knew in SQL onto Python. Learning this way helped me research, practice, and memorize at the same time.

In this article, I'll show you some important SQL queries and their equivalents in Python, Pandas to be specific. Consider it an introduction for those who want to learn Python using the same approach I did. Hopefully, it can serve as a useful cheat sheet for anyone picking up Pandas.

Data

I will create a simple dataset as below:

import pandas as pd

data = {'product_name': ['pencil', 'eraser', 'pen', 'desk', 'laptop'],
        'price': [2, 1, 1, 50, 678],
        'owner': ['Chi', 'Joe', 'Dan', 'Lisa', 'Rose']}
df = pd.DataFrame(data)
Figure 1: df Dataset

Now, let’s get started!

Select

Here are some simple select statements from SQL and their equivalent commands in Python.

You can easily select all columns and rows by calling the dataset’s name (df in my example).

In other cases, when you only need to extract a specific column from the data, you can consider a few simple approaches such as slicing and indexing with loc and iloc. In the example below, I show three different ways to get the owner column from the df dataset.

What if I want to see the unique products in my data? It's very simple: just extract the product_name column from the df dataset and apply the unique() function.

| SQL | Pandas |
| --- | --- |
| `Select * from df` | `df` |
| `Select owner from df` | `df[['owner']]` |
| | `df.iloc[:, 2]` |
| | `df.loc[:, ['owner']]` |
| `Select * from df limit 2` | `df.head(2)` |
| `Select distinct product_name from df` | `df.product_name.unique()` |
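To make the mappings above concrete, here is a runnable sketch using the same toy df from the Data section:

```python
import pandas as pd

# Same toy dataset as in the Data section
data = {'product_name': ['pencil', 'eraser', 'pen', 'desk', 'laptop'],
        'price': [2, 1, 1, 50, 678],
        'owner': ['Chi', 'Joe', 'Dan', 'Lisa', 'Rose']}
df = pd.DataFrame(data)

owners = df[['owner']]                          # Select owner from df
owners_iloc = df.iloc[:, 2]                     # same column by position
first_two = df.head(2)                          # Select * from df limit 2
unique_products = df['product_name'].unique()   # Select distinct product_name
```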

Select with conditions

There are several ways to select specific rows with conditions in Pandas. We can use slicing, indexing, the query() function, or a lambda.

In the example below, I'll show you some solutions to get the desired data with one or more constraints on the column values. Usually, when working with numerical data in a conditional expression, we have to deal with different comparisons (e.g., col_number > 2, total_people >= 10). For your reference, the comparison operators on numerical data in Python are described as below:

  • Equal to: ==
  • Not equal to: !=
  • Greater than: >
  • Less than: <
  • Greater than or equal to: >=
  • Less than or equal to: <=

Additionally, in case you want to extract rows whose value in a column equals a string or object, you should specify it with the == operator. My example of selecting all rows whose product_name is pen is demonstrated in the table below. Let's take a look.

| SQL | Pandas |
| --- | --- |
| `Select * from df where product_name = "pen"` | `df[df['product_name'] == 'pen']` |
| | `df.loc[df['product_name'] == 'pen']` |
| | `df.query('product_name == "pen"')` |
| | `df[df.apply(lambda x: x['product_name'] == 'pen', axis=1)]` |
| `Select * from df where product_name = "pen" and price = 2` | `df[(df['product_name'] == 'pen') & (df['price'] == 2)]` |
| | `df.loc[(df['product_name'] == 'pen') & (df['price'] == 2)]` |
| | `df.query('product_name == "pen" and price == 2')` |
| | `df[df.apply(lambda x: x['product_name'] == 'pen' and x['price'] == 2, axis=1)]` |
| `Select owner, product_name from df where product_name = "pen"` | `df[df['product_name'] == 'pen'][['owner','product_name']]` |
| | `df.loc[df['product_name'] == 'pen', ['owner','product_name']]` |
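As a quick sanity check, here are a few of the conditional selections above as runnable code (with price == 1 in the combined condition, since pen costs 1 in the toy data):

```python
import pandas as pd

data = {'product_name': ['pencil', 'eraser', 'pen', 'desk', 'laptop'],
        'price': [2, 1, 1, 50, 678],
        'owner': ['Chi', 'Joe', 'Dan', 'Lisa', 'Rose']}
df = pd.DataFrame(data)

pens = df[df['product_name'] == 'pen']            # boolean mask
pens_query = df.query('product_name == "pen"')    # same filter via query()
# Two conditions combined with & (note the parentheses around each condition)
pen_price_1 = df[(df['product_name'] == 'pen') & (df['price'] == 1)]
```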

Select rows whose values are similar to a specific pattern

Imagine you have thousands of products listed in a dataset, but you only want to select rows whose product name is related to pen. The str.contains() function is a great choice in this case.

Sometimes, when you know exactly the starting or ending characters of your desired category, you can specify them with the startswith() or endswith() functions.

| SQL | Pandas |
| --- | --- |
| `Select * from df where product_name like "pen%"` | `df[df['product_name'].str.startswith('pen')]` |
| `Select * from df where product_name like "%pen"` | `df[df['product_name'].str.endswith('pen')]` |
| `Select * from df where product_name like "%pen%"` | `df[df['product_name'].str.contains('pen')]` |
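On the toy data, the three string matchers behave as follows (note that contains() matches the substring anywhere, so it is the equivalent of LIKE "%pen%", not LIKE "pen%"):

```python
import pandas as pd

data = {'product_name': ['pencil', 'eraser', 'pen', 'desk', 'laptop'],
        'price': [2, 1, 1, 50, 678],
        'owner': ['Chi', 'Joe', 'Dan', 'Lisa', 'Rose']}
df = pd.DataFrame(data)

starts = df[df['product_name'].str.startswith('pen')]   # like "pen%"
ends = df[df['product_name'].str.endswith('pen')]       # like "%pen"
anywhere = df[df['product_name'].str.contains('pen')]   # like "%pen%"
```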

Select rows whose values are in a determined set

If you want to select rows whose values in a specific column fall within a set, the isin() method can help you do it. For example, to select rows whose product_name is pen or pencil, I call the command as below:

| SQL | Pandas |
| --- | --- |
| `Select * from df where product_name in ('pen','pencil')` | `df[df['product_name'].isin(['pen','pencil'])]` |
| `Select * from df where product_name not in ('pen','pencil')` | `df[~df['product_name'].isin(['pen','pencil'])]` |
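Both the membership test and its negation (the ~ operator) are one-liners:

```python
import pandas as pd

data = {'product_name': ['pencil', 'eraser', 'pen', 'desk', 'laptop'],
        'price': [2, 1, 1, 50, 678],
        'owner': ['Chi', 'Joe', 'Dan', 'Lisa', 'Rose']}
df = pd.DataFrame(data)

in_set = df[df['product_name'].isin(['pen', 'pencil'])]       # IN (...)
not_in_set = df[~df['product_name'].isin(['pen', 'pencil'])]  # NOT IN (...)
```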

Calculating functions

Similar to SQL, Pandas also provides different functions to calculate aggregations or produce descriptive summaries.

In my example, I can easily find the total number of products, the total price of all products, and the range of product prices (e.g., max, min, median) with just a line of code.

| SQL | Pandas |
| --- | --- |
| `Select count(product_name) from df` | `df['product_name'].count()` |
| `Select sum(price) from df` | `df['price'].sum()` |
| `Select min(price) from df` | `df['price'].min()` |
| `Select max(price) from df` | `df['price'].max()` |
| `Select count(price), count(distinct price), sum(price), min(price), max(price) from df` | `df.agg({'price': ['count','nunique','sum','min','max']})` |
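Running the aggregations on the toy data gives the totals you would expect from the price column (2 + 1 + 1 + 50 + 678 = 732, with four distinct prices):

```python
import pandas as pd

data = {'product_name': ['pencil', 'eraser', 'pen', 'desk', 'laptop'],
        'price': [2, 1, 1, 50, 678],
        'owner': ['Chi', 'Joe', 'Dan', 'Lisa', 'Rose']}
df = pd.DataFrame(data)

total = df['price'].sum()        # Select sum(price) from df
cheapest = df['price'].min()     # Select min(price) from df
# Several aggregations at once, one row per function name
summary = df.agg({'price': ['count', 'nunique', 'sum', 'min', 'max']})
```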

Group by

Group by is simple: just use the groupby() function on the column you want to group on. For instance, to calculate the money each person has paid for his or her products in df, I group the data by each person's name and then calculate the corresponding total product price.

| SQL | Pandas |
| --- | --- |
| `Select owner, sum(price) from df group by owner` | `df.groupby('owner').agg({'price':'sum'}).reset_index()` |
| `Select owner, product_name, sum(price) from df group by owner, product_name` | `df.groupby(['owner','product_name']).agg({'price':'sum'}).reset_index()` |
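The per-owner total looks like this in practice (reset_index() turns the group keys back into an ordinary column):

```python
import pandas as pd

data = {'product_name': ['pencil', 'eraser', 'pen', 'desk', 'laptop'],
        'price': [2, 1, 1, 50, 678],
        'owner': ['Chi', 'Joe', 'Dan', 'Lisa', 'Rose']}
df = pd.DataFrame(data)

# Select owner, sum(price) from df group by owner
spend = df.groupby('owner').agg({'price': 'sum'}).reset_index()
```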

Sort values

By default, sort_values() sorts in ascending order. Therefore, if you want to order values in descending order, just pass ascending=False to sort_values().

In order to see who paid the most money, I calculate the total money each person paid with the groupby() and sum() functions, and then sort the sum of price in descending order.

| SQL | Pandas |
| --- | --- |
| `Select owner, sum(price) from df group by owner order by sum(price)` | `df.groupby('owner').agg({'price':'sum'}).sort_values(by=['price']).reset_index()` |
| `Select owner, count(product_name), sum(price) from df group by owner order by sum(price), count(product_name)` | `df.groupby(['owner']).agg({'price':'sum','product_name':'count'}).sort_values(by=['price','product_name']).reset_index()` |
| `Select owner, sum(price) from df group by owner order by sum(price) desc` | `df.groupby('owner').agg({'price':'sum'}).sort_values(by=['price'], ascending=False).reset_index()` |
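Putting the group-and-sort pipeline together, the biggest spender comes out on top:

```python
import pandas as pd

data = {'product_name': ['pencil', 'eraser', 'pen', 'desk', 'laptop'],
        'price': [2, 1, 1, 50, 678],
        'owner': ['Chi', 'Joe', 'Dan', 'Lisa', 'Rose']}
df = pd.DataFrame(data)

# Total per owner, sorted from the highest payer down
ranked = (df.groupby('owner')
            .agg({'price': 'sum'})
            .sort_values(by='price', ascending=False)  # note: False, not 'FALSE'
            .reset_index())
```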

Join

I will create another data frame named df2 to demonstrate the join commands.

data1 = {'class': ['A','A','C','B','E'],
        'teacher':['Mona','Mila','Laurel','Wes','Connor'], 
        'owner': ['Chi','Joe','Dan','Lisa','Rose']}
df2 = pd.DataFrame(data1)
Figure 2: Data frame df2

The basic joins are left join, right join, inner join, and outer join. Depending on our needs, we can select the suitable kind of join.

Let’s see how we can transform our data with four types of join.

| SQL | Pandas |
| --- | --- |
| `Select df.owner, product_name, class, teacher from df left join df2 on df.owner = df2.owner` | `pd.merge(df[['owner','product_name']], df2, how="left", on=["owner"])` |
| `Select df2.*, product_name, price from df right join df2 on df.owner = df2.owner` | `pd.merge(df, df2, how="right", on=["owner"])` |
| `Select df2.*, product_name from df inner join df2 on df.owner = df2.owner` | `pd.merge(df[['product_name','owner']], df2, how="inner", on=["owner"])` |
| `Select df2.*, product_name from df outer join df2 on df.owner = df2.owner` | `pd.merge(df[['product_name','owner']], df2, how="outer", on=["owner"])` |
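Here is the left join from the table above as runnable code. Since every owner appears in both frames in this toy example, the left and inner joins happen to return the same five rows:

```python
import pandas as pd

data = {'product_name': ['pencil', 'eraser', 'pen', 'desk', 'laptop'],
        'price': [2, 1, 1, 50, 678],
        'owner': ['Chi', 'Joe', 'Dan', 'Lisa', 'Rose']}
df = pd.DataFrame(data)

data1 = {'class': ['A', 'A', 'C', 'B', 'E'],
         'teacher': ['Mona', 'Mila', 'Laurel', 'Wes', 'Connor'],
         'owner': ['Chi', 'Joe', 'Dan', 'Lisa', 'Rose']}
df2 = pd.DataFrame(data1)

# df left join df2 on owner
left = pd.merge(df[['owner', 'product_name']], df2, how='left', on=['owner'])
inner = pd.merge(df[['owner', 'product_name']], df2, how='inner', on=['owner'])
```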

Conclusion

By connecting new material with something you know well, I think you can learn and memorize it better. I hope these tips can act as a useful cheat sheet in the future when you start to dig into data manipulation in Python. Good luck!


