The world’s leading publication for data science, AI, and ML professionals.

Connecting Python to Oracle, SQL Server, MySQL, and PostgreSQL

Learn how to connect Python to most commonly used databases

Databases. Often ignored to a degree by Data Scientists, at least by new ones. I get it, you have your Boston housing dataset stored in CSV file, so why even bother with databases?

Photo by panumas nikhomkhai from Pexels
Photo by panumas nikhomkhai from Pexels

I’ve also ignored them for a good part of my career, but then I needed to learn SQL and PL/SQL for my daily job.

If that sounds like you but you don’t have the time, maybe it’s best to leave it to the experts.

Nevertheless, SQL is a great skill to have – would even say a must for your CV, especially as data engineering is getting more traction and increases the barrier to entering year after year.


Oracle Connection

To connect to the Oracle database you will, of course, need the database installed on your machine. My machine has 12c version, so there are no guarantees everything will work on older or newer versions. To test everything I’ve unlocked the famous HR schema and set the password to hr.

Once you do so too, clicking on Connections will show you every detail you need to connect Python to your Oracle database instance:

Before jumping into code, you will need to install _cx_Oracle_ library through pip, like this:

pip install cx_oracle

The connection process is the lengthiest from the four I’ll cover today but is really simple if you read it line by line. I’ve chosen to fetch the first 10 rows from the famous employees table (note how Oracle uses ROWNUM, instead of TOP, or LIMIT):

import cx_Oracle
dsn = cx_Oracle.makedsn(
    'localhost', 
    '1521', 
    service_name='orcl'
)
conn = cx_Oracle.connect(
    user='hr', 
    password='hr', 
    dsn=dsn
)
c = conn.cursor()
c.execute('SELECT * FROM employees WHERE ROWNUM <= 10')
for row in c: print(row)
conn.close()

Executing this code block will output the following to your Notebook:

That wasn’t so hard, was it? the following ones will be easier, or shorter to write at least, I promise!


SQL Server

Alongside with Oracle Database, SQL Server by Microsoft is also a pretty common database system to see in your typical work environment. To connect to it you will first need to grab the server name (the selected string):

Then, when you are connected to the database engine you will need to find the database name and table name, so you know what to put in the connection string:

Oh, almost forgot. You also need to install pyodbc library through pip:

pip install pyodbc

As promised, the code is a bit shorter than the one for connection to Oracle:

import pyodbc
conn = pyodbc.connect(
    'Driver={Sql Server};'
    'Server=DESKTOP-TLF7IMQSQLEXPRESS;'
    'Database=retail;'
    'Trusted_Connection=yes;'
)
cursor = conn.cursor()
cursor.execute('SELECT TOP 5 * FROM dbo.table_transactions')
for row in cursor: print(row)
conn.close()

Executing this code cell will output the following:

Up next, MySQL.


MySQL

Although maybe not used so many in work environments as the previous two, MySQL is still very popular, and you’ve probably used it before if you learned web development for example.

By default, MySQL instance comes with this sakila database, which looks really familiar to the dvdrental database. The connection I’ll make is to sakila database, and actor table.

My Python installed through Anaconda already came with necessary libraries built-in, so there’s no need for additional installations. To connect from Python, I’ll use mysql library:

import mysql.connector
conn = mysql.connector.connect(
    host='localhost',
    user='root',
    passwd='1234'
)
cursor = conn.cursor()
cursor.execute('SELECT * FROM sakila.actor LIMIT 5')
for row in cursor: print(row)
conn.close()

Once you execute this block of code you will get this as an output:

3 down, 1 to go!


PostgreSQL

Last but not least is the Postgres database. To connect to it you will need to install psycopg2 library:

pip install psycopg2

Strange name, I know. So easy to make a typo upon import.

Nevertheless, the connection process is fairly simple. My database instance contains the earlier mentioned dvdrental database, so I’ll connect to it, once again to the actor table:

import psycopg2
conn = psycopg2.connect(
    user='postgres',
    password='1234',
    host='127.0.0.1',
    port='5432',
    database='dvdrental'
)
cursor = conn.cursor()
cursor.execute('SELECT * FROM actor LIMIT 10')
for row in cursor: print(row)
conn.close()

Executing that code block will output the following:


Conclusion

And that pretty much concludes this article. I’ve seen 15+ minutes article for connecting to only one database, covering all the nitty-gritty details. Personally, I don’t see a point in that, because come on, it’s only a database connection! There are a lot more important things to focus on.

If you are a data scientist you’ll be using databases mostly for fetching stuff, so this article is enough for you. For any more advanced stuff, ask your best friend, Google.

Thanks for reading…


Loved the article? Become a Medium member to continue learning without limits. I’ll receive a portion of your membership fee if you use the following link, with no extra cost to you.

Join Medium with my referral link – Dario Radečić


Related Articles

Some areas of this page may shift around if you resize the browser window. Be sure to check heading and document order.