How I use Cx Oracle and a few other Python libraries to automate small ETL tasks
data:image/s3,"s3://crabby-images/a35c8/a35c8549afed1f636000b5ffb8eefcf99dcd9ed7" alt="Image from Pexels"
I’ve been using cx_Oracle Python extension module to access my Oracle Database and I was able to automate most of my small tasks because of it.
There are documentations available online provided by Oracle which can help you directly access Oracle database, but in this article I’ll be demonstrating how I use Cx Oracle and a few other python libraries to automate small ETL tasks from local computer to Oracle Database.
The goal is to write a Python script that reads a file in a local folder, perform data cleaning and transformations, then insert into the Oracle Database. This python script file can then be scheduled via task scheduler so it can be run automatically on a set schedule. Extra logging codes will be added to keep track of what happens within the script.
Prerequisite:
Before you can write Cx Oracle in your python scripts, you need to install Oracle Client Libraries, when you’ve successfully installed those you’re all set and ready to write your python scripts. To begin the script I imported these libraries:
import pandas as pd
import cx_Oracle
import math
import logging
Logger Info Chunks:
The logger info chunks of code generates a log file that logs if your script ran successfully or not. These logs are useful if the scripts will be automated, so you are notified on the status of the commands.
Data Loading and Cleaning:
The first python function reads the file from a server or a local folder, cleans and transform the data if necessary and then returns the desired data frame output which will be used to insert into an Oracle Database Table. The code is pretty straight forward and any additional command to transform the dataset can be added.
Cx Oracle Connection
The second python function is where the Database Connection and Oracle Commands take place. The first part of the chunk requires the connection details like the username, password, hostname, port and SID so python can communicate to Oracle. The second part is where the commands or tasks are defined through short SQL queries. I’ve added 3 queries in the function, Truncate (Clear the table but keep the table schema) , Insert (Injecting data in the table) and Grant Access (Giving public access so users can query your table) – more queries can be added if desired.
Script Runner
The last part of the python script to call the two functions. If you’ve noticed logger.info is called on most of the lines across the whole script, this makes it easier to track the status of every command.
This python script file can now be used as part of a batch file that can be uploaded via task scheduler or run via command prompt (logger info prints notifications via cmd too).
If you’re curious about the import speed, I’ve tested it on small to big dataset. For 1 million rows of code for example, the python script took about 12 minutes to successfully import on my Oracle table. This of course, will vary on several other factors on your local machine.
Overall, this simple python script and automation process will help you save time on doing small tasks that takes up some of your time.
Website: https://www.kathleenlara.com/
Twitter: https://twitter.com/itskathleenlara