Enhance Your Pandas Skills with SQLite

An easy trick of python’s built-in database, SQLite, to make your data manipulation more flexible and effortless.

James Ho
Towards Data Science

--

Pandas is a powerful Python package to wrangle your data. However, have you ever encountered some tasks that just make you think ‘if only I could use SQL query here!’? I personally found it particularly annoying when it comes to joining multiple tables and extracting only those columns you want in pandas. For example, you’d like to join 5 tables. You absolutely can do this with only one query in SQL. But in pandas, you have to do 4 times merge, a+b, (a+b)+c, ((a+b)+c)+d,….What’s worse, every time you merge, pandas will keep all columns, despite you probably only need one or two columns from the other table. If you have the same or similar issue as mine, you come to the right place.

This issue brought me to SQLite, a built-in, lightweight database for Python. Built-in means it comes with Python, you don’t have to run any more pip/conda install to install this package. How amazing is it?! Let’s see how this magic works. There are 2 ways to implement this, either use the built-in sqlite3 package, or you can also use sqlalchemy. I will illustrate them both in the following.

Simply import the package without any other installations (you might need to install sqlalchemy):

import sqlite3 as sl
import sqlalchemy as sa
from sqlalchemy import create_engine

Don’t worry about drivers or connection strings etc. You can create a connection as easy as:

# SQLite
sl_conn = sl.connect(‘{your db name}.db’)
# SQLAlchemy
engine = create_engine('sqlite:///PATH/TO/DB/{your db name}.db',echo=True)
sa_conn = engine.connect()

If the database does not exist, this will create the database automatically and then connect to it. Otherwise, it will connect to the database directly. How convenient is that?

Note that this will create a db file. In sqlalchemy’s create_engine, setting the echo attribute to True will print out the log. Set to False if you don’t want to see the whole log printed out.

Once you have created the connection, it’s even easier to interact with the database via pandas’s to_sql and read_sql function.

Say you’ve already had some pandas dataframe that you want to use SQL query on. The first step is to import that dataframe to the database we just created.

# sa_conn and sl_conn can be used interchangeably
df_1.to_sql(‘table_1’, sa_conn, if_exists='replace')
df_2.to_sql(‘table_2’, sl_conn, if_exists='replace')
df_3.to_sql(‘table_3’, sa_conn, if_exists='replace')

You can specify the table name in the database, it doesn’t have to be the same as the dataframe’s name. Use if_exists to specify what would you like to do if that table has already existed in the database. Options are ‘fail’, ‘replace’, ‘append’.

  • ‘fail’: Raise a ValueError. (Default)
  • ‘replace’: Drop the table before inserting new values.
  • ‘append’: Insert new values to the existing table.

Once having the tables in the database, you can start using SQL to query them.

query = ```
SELECT
*
FROM table_1
INNER JOIN table_2 ON table_1_key=table_2_key
INNER JOIN table_3 ON table_1_key=table_3_key
```
joined_table = pd.read_sql(query, sa_conn)

Define your query as a string first, use ``` if the query is lengthy and has multiple lines, and then use pandas’s functionread_sql to pull the data from the database and convert it into a pandas dataframe.

Interacting with the database with SQL client tools is a complete no-brainer too. You can use tools like DBeaver or DB Browser for SQLite (DB4S).

DBeaver

Open your DBeaver and choose SQLite as the database driver.

Browse to your DB file generated before and click finish.

DB4S

DB Browser for SQLite is no different. Click on the ‘Open Database’ then navigate to the DB file and open it.

Congratulations!! Now you know how to utilise Python’s built-in database tool, SQLite, to bring your pandas skills to another level! By integrating pandas and SQL queries, you’ve made your data processing skills more flexible and powerful.

--

--