
Relational database management systems (RDBMS) are ubiquitous.
They provide an easy way to store vast quantities of information, and with the aid of SQL, accessing, maintaining, and modifying that data is straightforward.
That being said, such an arrangement comes with certain caveats. If you wish to draw insights from the information stored in databases, you are limited to the commands provided by SQL.
While SQL allows you to conduct joins or utilize aggregation functions, it does not provide the means to perform more advanced techniques such as conducting statistical tests or building predictive models. If you wish to carry out such operations, you’ll need the aid of Python.
This raises the question: how do you run Python code against databases that respond to SQL queries?
Database Adapter
To access databases in Python, you’ll need to use a database adapter.
Python offers database adapters through modules such as mysql-connector-python for MySQL, psycopg2 for PostgreSQL, pyodbc for SQL Server, and the built-in sqlite3 for SQLite.
Furthermore, all of these modules conform to Python’s database API specification (DB-API 2.0, defined in PEP 249). As a result, the code used for managing databases is consistent across all database adapters.
Once you are able to successfully communicate with one relational database, you should be able to communicate with them all.
Leveraging database adapters requires an understanding of the key objects and methods that will be used to facilitate interaction with the database in question.
Case Study
A simple demonstration can showcase the features of a database adapter.
Since I use PostgreSQL, I’ll need to rely on the psycopg2 module.
The study will use a database called "test_db". This database contains a table called "Person", which contains fake data generated with Mockaroo.
Here’s a preview of the data:

1. Connecting to the database
First, we need to establish a connection to a database so that we can communicate with it using Python. We can achieve this by creating a connection object.
When creating a connection object, you provide it with the information needed to locate and access the database. This includes the host, username, password, and database name.
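As a minimal sketch: the psycopg2 call appears in the comments with placeholder credentials (the host, user, password, and database name below are not real), and since DB-API code is consistent across adapters, the runnable line uses the standard library’s sqlite3 adapter with an in-memory database:

```python
import sqlite3

# With psycopg2, connecting to the PostgreSQL database would look like
# (placeholder credentials, not real values):
#   import psycopg2
#   conn = psycopg2.connect(host="localhost", user="postgres",
#                           password="secret", dbname="test_db")

# The identical DB-API pattern with the built-in sqlite3 adapter,
# using an in-memory database for demonstration:
conn = sqlite3.connect(":memory:")
print(conn)  # a live Connection object
```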
2. Writing queries to the database from Python
Now that we have connected to the database, we should be able to write queries to the database directly from Python.
Although we are working in Python now, we’ll still need to write SQL queries to retrieve any information.
Before writing any query, we need to create a cursor object.
The cursor object uses the execute method to carry out the given query. The resulting output of the query can be retrieved with the fetchall method.
Let’s use this procedure to select the first 5 rows in the Person table.
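A sketch of that procedure, with a few made-up rows standing in for the Mockaroo data and the stdlib sqlite3 adapter standing in for psycopg2 (only the connect call would differ), might look like:

```python
import sqlite3

# In-memory stand-in for test_db; the rows and column names are illustrative.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()  # the cursor object issues queries to the database
cur.execute(
    "CREATE TABLE Person (first_name TEXT, last_name TEXT, gender TEXT, income REAL)"
)
cur.executemany(
    "INSERT INTO Person VALUES (?, ?, ?, ?)",
    [("Ada", "Smith", "Female", 72000), ("Ben", "Jones", "Male", 65000),
     ("Cara", "Lee", "Female", 80000), ("Dan", "Kim", "Male", 58000),
     ("Eve", "Wong", "Female", 69000), ("Finn", "Diaz", "Male", 61000)],
)

# execute runs the query; fetchall retrieves the resulting rows
cur.execute("SELECT * FROM Person LIMIT 5")
rows = cur.fetchall()
print(rows)  # each row comes back as a tuple
```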

As shown in the output, each row is presented as a tuple.
To obtain the column names of the table, you can use the description attribute on the cursor object.
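A short sketch of that attribute in action (again using sqlite3 in place of psycopg2; in DB-API adapters, the first item of each description entry is the column name):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE Person (first_name TEXT, last_name TEXT, gender TEXT, income REAL)"
)

# description is populated after a query runs; entry[0] is the column name
cur.execute("SELECT * FROM Person")
columns = [entry[0] for entry in cur.description]
print(columns)  # → ['first_name', 'last_name', 'gender', 'income']
```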

3. Performing advanced analysis with the database
Although we can write SQL queries to databases from Python, there’s little point in using an adapter if that’s all we can do.
Let’s perform an operation with the data in the "Person" table that we wouldn’t be able to do with SQL queries alone.
To do this, we should first store the table in Python as a pandas data frame.
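One way to sketch this step is with pandas, whose read_sql_query function runs a query through a DB-API connection and returns a data frame in one call (made-up rows stand in for the real table here):

```python
import sqlite3
import pandas as pd

# Illustrative stand-in for the real Person table
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE Person (first_name TEXT, last_name TEXT, gender TEXT, income REAL)"
)
cur.executemany(
    "INSERT INTO Person VALUES (?, ?, ?, ?)",
    [("Ada", "Smith", "Female", 72000), ("Ben", "Jones", "Male", 65000),
     ("Cara", "Lee", "Female", 80000), ("Dan", "Kim", "Male", 58000),
     ("Eve", "Wong", "Female", 69000), ("Finn", "Diaz", "Male", 61000)],
)

# Run the query and build a data frame in a single step
df = pd.read_sql_query("SELECT * FROM Person", conn)
print(df.head())
```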

Now, we have carte blanche to do whatever we want with the data.
As an example, let’s examine the average income of the men and women in this dataset.
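Assuming the table has gender and income columns (names chosen for illustration), a groupby on the data frame gives the per-group averages:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Person (gender TEXT, income REAL)")
cur.executemany(
    "INSERT INTO Person VALUES (?, ?)",
    [("Female", 72000), ("Male", 65000), ("Female", 80000),
     ("Male", 58000), ("Female", 69000), ("Male", 61000)],
)
df = pd.read_sql_query("SELECT * FROM Person", conn)

# Average income for each gender
avg_income = df.groupby("gender")["income"].mean()
print(avg_income)
```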

We can see that women, on average, earn a higher income than men. However, there is no indication that this difference is statistically significant.
We can utilize Python’s SciPy module to conduct a t-test to see if there is evidence that men and women don’t earn the same amount of income.
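A sketch of that test with scipy.stats.ttest_ind, using Welch’s variant (which does not assume equal variances) on the same made-up rows:

```python
import sqlite3
import pandas as pd
from scipy import stats

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Person (gender TEXT, income REAL)")
cur.executemany(
    "INSERT INTO Person VALUES (?, ?)",
    [("Female", 72000), ("Male", 65000), ("Female", 80000),
     ("Male", 58000), ("Female", 69000), ("Male", 61000)],
)
df = pd.read_sql_query("SELECT * FROM Person", conn)

women = df.loc[df["gender"] == "Female", "income"]
men = df.loc[df["gender"] == "Male", "income"]

# Welch's t-test: is the difference in mean income statistically significant?
t_stat, p_value = stats.ttest_ind(women, men, equal_var=False)
print(t_stat, p_value)
```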

Thanks to Python’s utility, we are able to learn so much more from our data! We now know that there is a statistically significant gap in income between men and women.
Of course, since the data is fake, any analysis we conduct is about as useful as a sauna in a desert, but you get the idea.
4. Modifying the database directly from Python
There may come a need to create, drop, or modify tables in the database during the study. Such operations can also be done from Python.
In our case, let’s say that instead of having 2 separate columns with the first name and last name, we only want one column storing the full name.
We can accomplish this with the following steps:
- Create a new column called "name"
- Add values to the "name" column by concatenating the values in the "first_name" and "last_name" columns
- Drop the "first_name" and "last_name" columns
We can carry out these steps with the following code:
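A sketch of those three steps (the SQL is standard and matches PostgreSQL; the runnable demonstration again uses sqlite3, where DROP COLUMN needs SQLite 3.35 or newer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Person (first_name TEXT, last_name TEXT)")
cur.execute("INSERT INTO Person VALUES ('Ada', 'Smith')")

# 1. Create the new "name" column
cur.execute("ALTER TABLE Person ADD COLUMN name TEXT")
# 2. Fill it by concatenating first_name and last_name
cur.execute("UPDATE Person SET name = first_name || ' ' || last_name")
# 3. Drop the old columns
cur.execute("ALTER TABLE Person DROP COLUMN first_name")
cur.execute("ALTER TABLE Person DROP COLUMN last_name")

conn.commit()  # without this, the changes never reach the database

cur.execute("SELECT * FROM Person")
print(cur.fetchall())  # → [('Ada Smith',)]
```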
Note: A very important inclusion in the code is the call to the commit method. Any modification made to the database from Python is not automatically committed to the database, so this method must be called for the change to take effect.
You can see the applied changes when you go back to the database.

A very helpful feature of database adapters is that they allow you to undo any unwanted modifications (as long as they are not committed).
With the rollback method, you can revert your database to its state at the last commit.
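A small sketch of rollback in action (sqlite3 again standing in for psycopg2; the behavior is part of the DB-API pattern):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Person (first_name TEXT)")
cur.execute("INSERT INTO Person VALUES ('Ada')")
conn.commit()  # this state becomes the last commit

cur.execute("DELETE FROM Person")  # an uncommitted (and unwanted) change
conn.rollback()  # revert to the state at the last commit

cur.execute("SELECT COUNT(*) FROM Person")
print(cur.fetchone())  # → (1,)  the deleted row is back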
5. Terminating the connection to the database
Once you’ve finished communicating with the database, you can end your connection to the database with the close method.
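A final sketch: after close is called, the connection’s resources are released, and any further use of it raises an error:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("SELECT 1")

conn.close()  # end the connection and release its resources

# Any further use of the closed connection raises ProgrammingError
try:
    conn.cursor()
except sqlite3.ProgrammingError as err:
    print(err)
```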
Conclusion

SQL and Python come with their own unique sets of benefits and drawbacks.
Thankfully, with database adapters, you get to leverage both tools in your studies and get the best of both worlds.
For a more thorough understanding of what you can do with a specific database adapter, I suggest you spend some time reading its documentation to learn more about the tools that are at your disposal.
I wish you the best of luck in your Data Science endeavors!