An Introduction to the Relational Database

Peter Bell
Towards Data Science
7 min read · Oct 18, 2019


As a data scientist, a big part of your job is obtaining the data that you need to operate on. In many companies, that data will be stored in one or more relational databases. The goal of this article is to introduce relational databases — what they are and why they’ve historically been so popular.

Different types of data

It’s important to understand that there are different types of data, each of which lends itself to different storage mechanisms. One common, and important, characterization is the distinction between structured and unstructured data.

Structured data is data that would fit well into a spreadsheet: lots of small chunks of information with consistent labels. Addresses, for example, are structured data. Most US addresses consist of the same five fields: the street address, the second address field (e.g. a PO box), the city, the state, and the ZIP code.
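To make "consistent labels" concrete, here is a minimal Python sketch of the five address fields described above. The class name `Address`, its field names, and the sample rows are all made up for illustration; they are assumptions, not something the article prescribes.

```python
from dataclasses import dataclass, fields


@dataclass
class Address:
    """One structured record: the same five labeled fields for every address."""
    street: str
    line2: str      # second address field, e.g. a PO box; empty if unused
    city: str
    state: str
    zip_code: str   # kept as text so leading zeros are preserved


# Illustrative sample rows (invented data), like rows in a spreadsheet.
addresses = [
    Address("123 Main St", "", "Springfield", "IL", "62701"),
    Address("456 Oak Ave", "PO Box 12", "Portland", "OR", "97201"),
    Address("789 Pine Rd", "", "Springfield", "IL", "62704"),
]

# Every record exposes the same labels, so the column names are known up front.
field_names = [f.name for f in fields(Address)]

# Because the fields are consistent, selecting by city and state is a simple filter.
in_springfield = [
    a for a in addresses if a.city == "Springfield" and a.state == "IL"
]

print(field_names)
print(len(in_springfield))
```

Because every record carries the same fields, picking out the addresses in one city takes a single line. That regularity is what makes this kind of data a natural fit for rows and columns.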

Unstructured data is the kind of information you’re more likely to store in a Word or Google doc: a dissertation, a phone call transcription, the contents of a book. Unstructured data doesn’t really break down into small, consistent chunks of information that can be usefully separated or queried (as opposed to addresses, where you might want to retrieve all the addresses within a given city, state, or…

Senior Dir @Flatiron / @WeWork, Founder/CTO @ctoconnection & @learn2speakgeek, ex-SVP Eng @GA cofounder @CTOSchool & @ctosummit, author @pearson & @oreillymedia