The world’s leading publication for data science, AI, and ML professionals.

How To Create a SQL Practice Database with Python

Finally, start practicing SQL with your own database

Hands-on

People illustrations by Storyset
People illustrations by Storyset

Writing SQL is important. Being able to efficiently query a Database is often considered one of the most essential skills to develop as an aspiring data analyst/scientist.

SQL is not only important but also quite commonly used. According to the Stackoverflow Developer Survey 2021, SQL ranks among the top five used programming languages.

So, we should probably invest some time to learn SQL.

But there is just one problem: How to practice querying a database when there is no database, to begin with?

In the following sections, we are going to address this fundamental problem and learn how to create our own MySQL database from scratch. With the help of Python and some external libraries, we will create a simple script that automatically creates and populates our tables with randomly generated data.

But before diving straight into the implementation details, we have to cover some prerequisites first.

Note: There are of course other ways to get a SQL database for practicing purposes (e.g. simply by downloading). However, using Python and some external libraries provides us with additional and valuable practice opportunities.


Prerequisites

Let’s start with the basics.

First of all, we need to install the MySQL Workbench and set up a connection. Once, we have a connection we can set up our database with:

CREATE DATABASE IF NOT EXISTS your_database_name;

Now, we simply need to install the necessary python libraries and we are done with the basic setup. The libraries we are going to use are the following and can easily be installed via the terminal.

  1. NumPy pip install numpy
  2. sqlalchemy pip install sqlalchemy
  3. faker pip install faker

Creating the Script

Having finished the basic setup, we can finally start writing our python script.

Let’s create a single class with some boilerplate code first, providing us with a blueprint, guiding us through the rest of the implementation.

Nothing fancy so far.

We basically just created a class, stored the database credentials for later use, imported the libraries, and defined some methods.

Establishing a connection

The first thing we have to accomplish is to create a database connection.

Luckily, we can utilize the python library sqlalchemy that pretty much does most of the job for us.

The method creates and stores 3 objects as instance attributes.

First of all, we create an engine that acts as the starting point for any sqlalchemy application, describing how to talk to a specific kind of database / DBAPI combination.

In our case, we specify a MySQL database and pass in our credentials.

Next, we create a connection that simply allows us to execute SQL statements and a MetaData object which is a container, keeping together different features of the database, allowing us to associate and access the database tables.

Create the tables

Now, we need to create our database tables.

We create 3 tables and store them inside a dictionary for later reference.

Creating a table in sqlalchemy is pretty straightforward as well. We simply instantiate a new table class, provide a table name, the MetaData object, and specify the different columns.

In our case, we create a job, a company, and a person table. The person table also references the other tables by foreign keys which makes the database more interesting in terms of practicing SQL joins.

Once we defined the tables, we simply need to invoke the create_all() method of the MetaData object.

Generate some random data

We created the database tables, but we still don’t have any data to work with. Thus, we need to generate and insert some random data into our tables.

Now, we utilize the Faker library to generate random data.

We simply create a new record, represented by a dictionary, inside a for loop with randomly generated data. The single record is then appended to a list that can be used for a (multiple) insert statement.

Next, we invoke theexecute()method from our connection object and pass our list of dictionaries as an argument.

And this is it! We finished our class implementation – we simply need to instantiate the class and call its methods to create our database.


Make a query

The only thing left – we need to verify that our database is up and running and actually contains some data.

Starting with a basic query:

SELECT *
FROM jobs
LIMIT 10;
Basic query result [Image by Author]
Basic query result [Image by Author]

Looks like our script worked and we have a database with actual data.

Now, let’s try a more involved SQL statement:

SELECT
  p.first_name,
  p.last_name,
  p.salary,
  j.description
FROM
  persons AS p
JOIN
  jobs AS j ON
  p.job_id = j.job_id
WHERE 
  p.salary > 130000
ORDER BY
  p.salary DESC;
More involved query result [Image by Author]
More involved query result [Image by Author]

This looks also very promising – our database is alive and well.


Conclusion

In this article, we learned how to leverage Python and some external libraries to create our own practice database with randomly generated data.

Although one can easily download an existing database to start practicing SQL, using Python to create our own database from scratch provides additional learning opportunities. Since SQL and Python often go hand in hand those learning opportunities can prove especially useful.

You can find the full code here on my GitHub.

Feel free to adjust the code and create a bigger, better, and more intricate database.

#30DaysOfNLP


Enjoyed the article? Become a Medium member and continue learning with no limits. I’ll receive a portion of your membership fee if you use the following link, at no extra cost to you.

Join Medium with my referral link – Marvin Lanhenke


Related Articles