A beginner’s guide to SQL

Learn one of the most desired tech skills in under 10 minutes

Veronica Nigro
Towards Data Science

--

Photo by Unsplash

Introduction

Databases are computer system that allow for storage and easy access of data. Depending on how their data is organised, databases can be relational (RDBMS) or non-relational (NoSQL). Relational databases are structured in a table form, similar to csv files or excel spreadsheets related to one another. Although, relational databases are much more robust than csv or json files. Unlike single files, if multiple users tried to interact with the data stored in a database, the actions taken would not affect each other and would only be saved if valid according to certain rules. Moreover, if one part of the action failed, the whole action would fail and the database would be left unchanged; once it is finalised though, it will be stored permanently (even if database crashes occurred).

The illustration below shows an example of relational database where the tables have a column with a primary key, which contains unique values for each row. Tables can also have foreign keys, which serve as a link to other tables. These links define the relational database.

Some examples of relational databases are MySQL and PostgreSQL (both free and open-source) or Oracle and Microsoft SQL (mainly used by companies).

All these databased can be accessed and manipulated with SQL, a programming language which stands for Structured Query Language. Learning SQL is a lot easier than learning other programming languages because it reads as a natural language. In the next sections I will show how to create, manipulate and query data in a SQL database.

Set up a database with SQLite

For this exercise we will be using SQLite, which is similar to other SQL databases, except that it is a single disk file, so there is no intermediary server process. These feature makes SQLite more suited for small projects and because it is file-based it doesn’t require any installation. In your command line you can simply type sqlite3 followed by the name of the database you want to access or create exercise.sqlite.

You will notice that the terminal prompt changes to sqlite> . Here you can type your SQL commands followed by a semicolon (;) to execute them.

Creating tables

We will now try to recreate the illustration above, which shows the customers table and the loans table. The syntax of SQL commands follows simple patterns: CREATE TABLE table_name (column_name data_type constrains) to create tables and ALTER TABLE table_name ADD/DROP COLUMN column_name to modify them.

It’s important to specify which kind of data will be inserted into the tables. SQL supports different field types, including strings, numeric, date and time. For more details on the different data types you can check out this link.

When creating a table you can also add several constraints. The most common ones are NOT NULL , which prevents columns from having null values, and PRIMARY KEY or FOREIGN KEY, which uniquely identify records in a table or to another table.

CREATE TABLE customers (
customer_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
name VARCHAR(50) NOT NULL,
email VARCHAR(50) NOT NULL
);
CREATE TABLE loans (
loan_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
customer_id INTEGER NOT NULL,
requested DOUBLE NOT NULL,
approved DOUBLE NOT NULL,
FOREIGN KEY (customer_id) REFERENCES customers (customer_id) ON DELETE CASCADE ON UPDATE CASCADE
);

So far our database has a structure but no data yet. You can add data to the tables by typing the code below. Notice that the first column of both tables only supports primary keys so, if we were to use an existing entry, we would get an error.

/*Data for the table customers */INSERT INTO customers (customer_id,name,email) VALUES (1,'Grace','grace@gmail.com');
INSERT INTO customers(customer_id,name,email) VALUES (2,’James’,’james@gmail.com’);
INSERT INTO customers(customer_id,name,email) VALUES (3,’Tony’,’tony@gmail.com’);
INSERT INTO customers(customer_id,name,email) VALUES (4,’Alex’,’alex@gmail.com’);
/*Data for the table loans */INSERT INTO loans(loan_id,customer_id,requested,approved) VALUES (1,1,80.00,80.00);
INSERT INTO loans(loan_id,customer_id,requested,approved) VALUES (2,4,100.00,80.00);
INSERT INTO loans(loan_id,customer_id,requested,approved) VALUES (3,4,50.00,50.00);
INSERT INTO loans(loan_id,customer_id,requested,approved) VALUES (4,2,140.00,0.00);
INSERT INTO loans(loan_id,customer_id,requested,approved) VALUES (5,1,200.00,0.00);
INSERT INTO loans(loan_id,customer_id,requested,approved) VALUES (6,1,100.00,100.00);

To update or remove records you can use the UPDATE or DELETE command.

Example: UPDATE loans SET customer_id = 3 WHERE loan_id = 1; changes the customer_id to 3 where the loan_id is equal to 1 in the table loans.

Example: DELETE FROM customers WHERE name like 'Leonard'; deletes all customers names “Leonard” from the table customers.

Querying a database in a Python environment

Personally, I prefer to use to use Jupyter Notebook for querying a database rather than the terminal because it has a nicer visual. There are several ways to run SQL directly in the Notebooks.

One option that doesn’t require any library installation is using the Python SQLite3 package. Simply import the library and run the following code to establish a connection with the database.

import sqlite3                              # import library
sqlite_db = 'exercise.sqlite' # select database
conn = sqlite3.connect(sqlite_db) # connect
c = conn.cursor() # create cursor object

It is necessary to create the cursor object to iterate over the rows of data and to fetch information when using SQLite3. The syntax to create a table is similar to the console, only now we need to write the SQL code within brackets and execute it using the .execute() method of the cursor object c .

c.execute("INSERT INTO customers VALUES (5,’Ann’,’ann@gmail.com’);")# commit changes
conn.commit()

SELECT columns FROM table

Now that we are all set up we can run our first data query. The two main clauses that must be present in every data query are SELECT and FROM .

SELECT allows to select the columns that we want to pull from the database.

FROM specifies from which table we are pulling the columns.

To run a query using the SQLite3 package, first we need to store our results and then extract them using the .fetchall() method.

results = c.execute("SELECT * FROM loans")    
results.fetchall()

An alternative would be to use pandas function .read_sql() to read the SQL query into a DataFrame.

sql_query = (“SELECT * FROM loans”) 
pd.read_sql(sql_query, con=conn)

Notice that to select all the columns you can use the asterisk (*). SQL is also case-insensitive, which means that there’s no difference between typing SELECT and select .

Aggregate functions

Aggregate functions take a list of values and return a single value. They can be used in SELECT statements and often in combination with GROUP BY functions.

The most common ones are: AVG() , COUNT() , MAX() , MIN() , SUM() .

sql_query = (“SELECT AVG(requested) FROM loans”) 
pd.read_sql(sql_query, con=conn)

In this example, the query returns the average amount of requested loans.

Filtering data

The WHERE clause is the most common way to filter results in SQL. It appears after the FROM clause and contains one or more logical expressions that evaluate each row and return them if the condition is true.

The most common comparison operators to use in WHERE clauses are: = (equal to), <> or != (not equal to), < (less than), > (greater than), <= (less than or equal), >= (greater than or equal).

Logical operators, such as AND, OR, LIKE, can also be used to test for the truth of a condition. For a complete list of all logical operators, I would recommend checking out this link.

sql_query = ("""SELECT * FROM customers
WHERE name = 'Tony' OR name = 'Grace'
""")
pd.read_sql(sql_query, con=conn)

The query above returns customers whose name is either “Tony” or “Grace”.

Sorting data

Results of a query can be sorted using the ORDER BY clause. This clause appears after the FROM clause and the WHERE clause (if present). To sort the results, you need to specify the column on which you want to sort the values and the order (ascending ASC or descending DESC).

sql_query = ("""SELECT loan_id, customer_id, requested 
FROM loans
WHERE approved != 0
ORDER BY customer_id ASC
""")
pd.read_sql(sql_query, con=conn)

In this example, the query selects three columns from the table “loans”, filters the data so that the amount approved is different from 0 and then sorts the results by customer_id in an ascending order.

Grouping data

The GROUP BY clause is used after the SELECT statement to combine rows into groups. If you want to filter the rows before grouping, the WHERE clause is applied before the GROUP BY clause, whereas, to filter groups, you need to use the HAVING clause after grouping.

sql_query = ("""SELECT customer_id, AVG(requested) AS avg_requested 
FROM loans
GROUP BY customer_id
HAVING avg_requested > 100
ORDER BY customer_id ASC
""")
pd.read_sql(sql_query, con=conn)

Here the data is grouped by customer_id and filtered so that the average requested amount per customer is greater than 100. Notice that the ORDER BY clause is added at the end.

Join tables

So far we have been querying tables one at a time, but we could also link tables together and query them at the same time. This process is called “joining” which is similar to the idea of intersecting tables. The most common types of joins are:

INNER JOIN : will keep only the rows common to both tables.

LEFT JOIN : will keep the rows of the first table.

RIGHT JOIN : will keep the rows of the second table.

FULL OUTER JOIN : will keep rows from both tables.

sql_query = ("""SELECT * FROM customers
LEFT JOIN loans ON customers.customer_id = loans.customer_id
""")
pd.read_sql(sql_query, con=conn)

In this query we are joining the two tables on the column “customer_id”. Because we used LEFT JOIN , the result includes all rows from the table “customers” even if Ann has not requested a loan.

If instead we used INNER JOIN , the results would have only included customers that also requested a loan.

Additional resources and considerations

That’s it for this tutorial! If you enjoyed it, stay tuned because there’ll be many more to come. In the meantime, you can check out sqltutorial.org for more details about SQL syntax and clauses.

Notice that SQL has many dialects depending on the database that you use, so make sure to always check for possible differences.

--

--