Understanding Statistical Data Types

“Data is the new oil” — but just like several types of oil exist, so do several types of data

Rohan Vij
Towards Data Science

Random letters and numbers representing data.
Photo by Markus Spiske on Unsplash

Introduction

“Data is the new oil” — a phrase coined in 2006 which took the world by storm.

Want some shocking facts? Over 90% of all data in the world was created in the last two years. If you burned all the data generated in a day onto CDs, that stack could reach the moon twice. Data is big and valuable, so knowing how to operate on it is crucial. To do so, you first need to learn about the different types of data and what they represent. Let’s get started!

Qualitative vs Quantitative

Now, you have probably gone over this several times throughout your life, so I will keep this short. Qualitative data (also known as categorical data) is data that cannot be measured using numbers. When sorting categorical data, one can only group it into categories. Common examples of categorical data are sex (male/female), race, and educational level.

Quantitative data is exactly what you might guess: quantifiable (numerical) data. Quantitative data can be sorted (think greatest to least), graphed, and used in mathematical analyses. Some common examples of quantitative data are time, weight, temperature, and grade level.

We can treat these two types of data, qualitative and quantitative data, as the root of the other four data types we will be exploring.
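To make the split concrete, here is a minimal Python sketch using the standard library's statistics module. The sample values are invented purely for illustration. It shows that arithmetic such as an average only makes sense for quantitative data, while qualitative data can only be grouped and counted.

```python
from statistics import mean, mode

# Hypothetical sample values, made up for this illustration.
weights_kg = [61.5, 72.0, 68.3, 80.1]             # quantitative: numbers
eye_colors = ["brown", "blue", "brown", "green"]  # qualitative: labels

# Arithmetic is meaningful for quantitative data.
average_weight = mean(weights_kg)

# For qualitative data we can only ask which label occurs most often.
most_common_color = mode(eye_colors)

print(average_weight)
print(most_common_color)
```

Calling mean(eye_colors) would raise an error, because averaging labels has no meaning.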

Types of Qualitative Data

Nominal Data

Nominal data is a type of categorical data in which the values cannot be ranked against one another. While each value is different from the others, none is greater or lesser than another. For instance, eye color is an example of nominal data. While several eye colors exist (black, brown, green, blue), we cannot say that one comes before or after another; they are simply labels describing an attribute. The meaning of the aforementioned list of eye colors would not change if we were to change its order.
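As a rough sketch of what "order does not matter" means in practice, the snippet below counts a made-up list of eye colors with collections.Counter. The observations are hypothetical. The counts are identical whether we read the list forwards or backwards.

```python
from collections import Counter

# Hypothetical observations, invented for this example.
observed = ["brown", "blue", "brown", "green", "black", "brown"]

counts = Counter(observed)
reversed_counts = Counter(reversed(observed))

# Reordering nominal data changes nothing about what it tells us.
same_counts = counts == reversed_counts

# The only "summary" available is how often each label appears.
most_common_color, frequency = counts.most_common(1)[0]

print(same_counts)
print(most_common_color, frequency)
```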

Ordinal Data

Ordinal data is categorical data whose values have a natural order. Each value is relatively different from the others, whether it be in terms of size, length, duration, etc. For example, education level (in this case college degrees) is a type of ordinal data. We can say that associate, bachelor’s, master’s, and doctoral degrees are all relatively different from one another because each requires a different amount of time. Theoretically, we could quantify ordinal data (associate=2 years, bachelor’s=4 years, etc.) and perform mathematical operations on it, so it is sometimes considered to be in a gray area between qualitative and quantitative data.

While ordinal data is also simply labels, the background information behind the labels can be compared to each other. As a result, if we were to reverse the order of the aforementioned list of college degrees, its order would change from least time → most time to most time → least time.
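Here is one possible way to express that ordering in Python. The DEGREE_ORDER list, the RANK lookup, and the respondents are all assumptions made for this sketch. Once each label has a rank, the data can be sorted in either direction, and we can even pick a middle value.

```python
from statistics import median_low

# Assumed ordering of the labels, from least time to most time.
DEGREE_ORDER = ["associate", "bachelor's", "master's", "doctoral"]
RANK = {degree: position for position, degree in enumerate(DEGREE_ORDER)}

# Hypothetical survey responses.
respondents = ["master's", "associate", "doctoral", "bachelor's", "associate"]

# Sorting uses the rank behind each label, not the label's spelling.
least_to_most = sorted(respondents, key=RANK.__getitem__)
most_to_least = sorted(respondents, key=RANK.__getitem__, reverse=True)

# A median is meaningful because the labels are ordered.
median_degree = DEGREE_ORDER[median_low(RANK[d] for d in respondents)]

print(least_to_most)
print(most_to_least)
print(median_degree)
```

Note that an average of the ranks would be harder to justify, since the gaps between neighboring degrees are not necessarily equal.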

Types of Quantitative Data

Discrete Data

You have probably heard about discrete data during your middle and high school math classes. Chances are, you visualize discrete data through a graph like this:

A series of disconnected points plotted at random on a graph.
Image by the author.

Discrete data is data that only involves integers, which are discrete (or separate) from one another. For example, the number of people in a room is an example of discrete data. It can only be expressed in whole numbers. After all, you cannot have a fraction of a person! Discrete values can be counted because there is an exact set of them, but they cannot be measured.
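The idea of "an exact set" of values can be sketched in a couple of lines. The room capacity below is a made-up number. Because head counts are whole numbers, every possible value can be listed out and counted.

```python
# Hypothetical room that holds at most 5 people.
capacity = 5

# Every head count the room can have, listed one by one.
possible_head_counts = list(range(capacity + 1))
number_of_possibilities = len(possible_head_counts)

print(possible_head_counts)     # [0, 1, 2, 3, 4, 5]
print(number_of_possibilities)  # 6
```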

Continuous Data

Continuous data is data that can take fractional (non-whole) values. You most likely visualize it through a line:

A linear line with connected points.
Image by the author.

Continuous data consists of values such as time, height, and the price of an item. Each value can be divided into smaller parts and still remain valid. For example, we can divide the time a person took to complete a race by two and it will still be valid, even if the number goes into milliseconds and microseconds. On the other hand, we cannot always divide the number of people in a room by two. Again, you cannot have a fraction of a person! You can measure any continuous value, but you cannot count it (there are infinitely many points to count).
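The divide-by-two test can be tried directly. In this small sketch, the race time and the head count are invented values. Halving the time gives another perfectly good time, while halving the head count gives a number that cannot describe people in a room.

```python
# Hypothetical measurements, made up for this example.
race_time_s = 12.8   # continuous: seconds taken to finish a race
people_in_room = 7   # discrete: a head count

# Half of a time is still a valid time.
half_time_s = race_time_s / 2

# Half of a head count may not be a whole number.
half_people = people_in_room / 2
is_valid_head_count = half_people.is_integer()

print(half_time_s)
print(half_people, is_valid_head_count)
```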

Conclusion

A diagram representing the tree of statistical data types.
Image by the author.

While the four data types covered in this article form the backbone of statistical data types, several more subtypes exist under them. If you want to read more, I highly recommend this article by Niklas Donges, which goes into more depth.

Thank you for reading! I hope you thoroughly enjoyed the article and now are more comfortable with statistical data types.

Hi! 👋 I’m a high school student who enjoys writing about technology and astronautics.