Beginner's guide: Data types and their measurement scales

A brief introduction to data types

Anshul Kanthaliya
Towards Data Science


Data have an important story to tell. They rely on you to give them a voice.

Before you give them a voice, you have to understand the different data types. There are different ways to categorize data based on the way it has been collected or its structure.

Based on data collection: Data can be categorized into three types according to how it was collected.

  1. Cross-Sectional Data: Data points captured on multiple variables over one specific time period are termed cross-sectional data. Ex: attributes of employees such as age, salary, level, and team for the year 2019.
  2. Time-Series Data: Data points captured on a single variable over multiple periods are called time-series data. Ex: sales of smartphones recorded on a monthly, quarterly, or yearly basis.
  3. Panel Data: A combination of cross-sectional and time-series data is known as panel data. Ex: GDP of various countries over different periods.
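
The relationship between the three types can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the country labels, the GDP figures, and the helper names `cross_section` and `time_series` are all made up for the example. For simplicity the panel holds one variable (GDP) for several entities, so the cross-section here runs across countries rather than across several variables.

```python
# Illustrative, made-up numbers: (country, year) -> GDP in arbitrary units.
panel = {
    ("A", 2018): 100.0,
    ("A", 2019): 104.0,
    ("B", 2018): 250.0,
    ("B", 2019): 255.0,
}

def cross_section(panel, year):
    """All entities observed in one period (hypothetical helper)."""
    return {entity: value for (entity, y), value in panel.items() if y == year}

def time_series(panel, entity):
    """One entity observed over several periods, in chronological order."""
    return sorted((y, value) for (e, y), value in panel.items() if e == entity)

gdp_2019 = cross_section(panel, 2019)  # slice the panel at a fixed year
gdp_of_a = time_series(panel, "A")     # slice the panel at a fixed country
print(gdp_2019)
print(gdp_of_a)
```

Fixing the year yields cross-sectional data, fixing the country yields a time series, and keeping both dimensions is what makes the full table a panel.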

Based on structure: Another important way to classify data is by its structure. It can be categorized into two types.

  1. Structured Data: Data points that have a specific structure and can be arranged in tabular form (also known as a matrix), with rows and columns, are called structured data. Ex: salaries of employees listed against employee IDs.
  2. Unstructured Data: Data points that are not arranged in any tabular format are unstructured data. Ex: emails, videos, clickstream data, etc.

Around 70% of the available data is unstructured, and before analyzing it or building any analytics model, one has to convert the unstructured data into a structured form.
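
As a rough idea of what such a conversion can look like, the sketch below turns free-text emails into rows with fixed columns. The message layout (`From:` and `Subject:` header lines followed by a body), the sample messages, and the helper name `to_row` are assumptions made for illustration; real emails are considerably messier.

```python
import re

# Hypothetical raw messages; the "From:"/"Subject:" layout is an assumption.
raw_emails = [
    "From: asha@example.com\nSubject: Quarterly report\nPlease find the report attached.",
    "From: ravi@example.com\nSubject: Lunch?\nAre you free at noon today?",
]

def to_row(text):
    """Extract a fixed set of columns from one free-text message."""
    sender = re.search(r"^From:\s*(.+)$", text, flags=re.MULTILINE)
    subject = re.search(r"^Subject:\s*(.+)$", text, flags=re.MULTILINE)
    # Every line that is not a header line counts as the body.
    body = "\n".join(
        line for line in text.splitlines()
        if not line.startswith(("From:", "Subject:"))
    )
    return {
        "sender": sender.group(1) if sender else None,
        "subject": subject.group(1) if subject else None,
        "body_word_count": len(body.split()),
    }

# Each message becomes one row; the dictionary keys act as column names.
rows = [to_row(text) for text in raw_emails]
for row in rows:
    print(row)
```

Once every message has the same columns, the result can be treated like any other table.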

Another problem most beginners in data analytics face is that, even when structured data is available, they do not know what to do with it: how to use it, how it can be measured, and how to infer insights from it.

This is where measurement scales become important. Once structured data is available, one must know how each variable is measured and how variables differ based on their measurement scale.

Data can be divided into four types based on measurement scale.

  1. Nominal Scale: Data points that are qualitative labels with no inherent order fall in this category. These are also referred to as categorical variables. Ex: marital status (single, married, etc.). No arithmetic operation (addition, subtraction, multiplication, or division) can be performed on such variables.
  2. Ordinal Scale: Data points that come from an ordered set fall in this category. Ex: ratings on a 1–5 scale (5 being the highest and 1 the lowest). The order of the set is fixed, but no arithmetic operation can be performed: we know a rating of 4 is better than a rating of 2, but two ratings of 2 do not add up to a rating of 4.
  3. Interval Scale: Data points measured on a scale with equal intervals but no true zero fall in this category. Ex: temperature (in degrees Celsius), IQ level. Addition and subtraction can be performed on such variables, but division doesn't make sense. You can say that Mumbai is 10 degrees Celsius warmer than Bangalore, but saying that Mumbai is twice as hot as Bangalore is not right; ratios don't make sense here.
  4. Ratio Scale: Data points that are quantitative and have a true zero fall in this category. Ex: sales of a product, the salary of an employee, etc. All arithmetic operations can be performed here, and comparisons such as "Ram earns twice what Shyam earns" are valid; ratios make sense.
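
The difference between the interval and ratio scales can be checked numerically. The sketch below uses made-up temperatures and salaries (none of the figures come from real data) and re-expresses each quantity in another unit: a statement that is meaningful on a given scale should survive the change of unit.

```python
import math

def celsius_to_kelvin(celsius):
    """Shift Celsius readings onto the Kelvin scale, whose zero is absolute."""
    return celsius + 273.15

# Made-up readings, chosen so the Celsius ratio comes out as exactly 2.
mumbai_c, bangalore_c = 20.0, 10.0
mumbai_k = celsius_to_kelvin(mumbai_c)
bangalore_k = celsius_to_kelvin(bangalore_c)

# Differences survive the change of unit: interval data supports + and -.
diff_c = mumbai_c - bangalore_c
diff_k = mumbai_k - bangalore_k
print("same difference:", math.isclose(diff_c, diff_k))

# Ratios do not: "twice as hot" holds in Celsius but not in Kelvin.
ratio_c = mumbai_c / bangalore_c
ratio_k = mumbai_k / bangalore_k
print("same ratio:", math.isclose(ratio_c, ratio_k))

# Salary has a true zero, so its ratio is the same in rupees or in thousands.
ram, shyam = 60_000.0, 30_000.0
salary_ratio = ram / shyam
salary_ratio_thousands = (ram / 1_000) / (shyam / 1_000)
print("salary ratios:", salary_ratio, salary_ratio_thousands)
```

The temperature difference is the same in both units, while the "twice as hot" ratio disappears once the zero point moves. The salary ratio is unaffected by the unit, which is what the true zero of a ratio scale guarantees.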

Thus, by looking at the data, one can infer which kind of variable is present (nominal, ordinal, etc.). This helps a data analyst or data scientist who is building an analytics model to understand the variables, carry out exploratory data analysis, impute missing data, and apply one-hot encoding.
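
To make the link between scale and preprocessing concrete, here is a small sketch of imputation and one-hot encoding driven by the measurement scale. The helper names `impute` and `one_hot`, the sample values, and the rule of thumb used (mode for nominal, median for ordinal, mean for interval and ratio) are illustrative assumptions, one common convention rather than the only valid choice.

```python
from statistics import mean, median_low, mode

def impute(values, scale):
    """Fill None with a statistic that is meaningful for the given scale."""
    observed = [v for v in values if v is not None]
    if scale == "nominal":
        fill = mode(observed)        # most frequent label
    elif scale == "ordinal":
        fill = median_low(observed)  # a middle value that actually occurs
    else:                            # "interval" or "ratio"
        fill = mean(observed)
    return [fill if v is None else v for v in values]

def one_hot(values):
    """One-hot encode a nominal column: one 0/1 indicator per category."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

# Made-up columns with missing entries.
marital = impute(["single", "married", None, "married"], "nominal")
ratings = impute([4, None, 2, 5, 4], "ordinal")
salaries = impute([30.0, None, 50.0], "ratio")
encoded = one_hot(marital)

print(marital)
print(ratings)
print(salaries)
print(encoded[0])
```

One-hot encoding is applied only to the nominal column because its labels have no order or magnitude that a model could use directly.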

This matters not only in predictive analytics but also in descriptive analytics. You can't do exploratory data analysis without knowing the type of data. Once you identify the type, you can perform univariate and bivariate analysis, visualization, and calculations such as mean, median, and mode to infer insights from the data.
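
One way to read this is that the scale decides which summary statistics are worth computing. The sketch below, with a hypothetical `describe` helper and made-up sample values, reports the mode for every scale, adds the median from the ordinal scale upward, and adds the mean only for interval and ratio data.

```python
from collections import Counter
from statistics import mean, median_low

# Scales listed from least to most permissive.
SCALE_ORDER = ["nominal", "ordinal", "interval", "ratio"]

def describe(values, scale):
    """Return only the summary statistics that make sense for the scale."""
    level = SCALE_ORDER.index(scale)
    summary = {"mode": Counter(values).most_common(1)[0][0]}
    if level >= 1:   # ordered values: a middle element exists
        summary["median"] = median_low(values)
    if level >= 2:   # equal intervals: averaging is meaningful
        summary["mean"] = mean(values)
    return summary

# Made-up sample columns, one per scale used here.
team_summary = describe(["data", "sales", "data", "hr"], "nominal")
rating_summary = describe([5, 3, 4, 3, 1], "ordinal")
salary_summary = describe([30, 50, 40, 50], "ratio")

print(team_summary)
print(rating_summary)
print(salary_summary)
```

`median_low` is used instead of `median` so that the reported middle value is always one that actually occurs in the data, which avoids averaging two ordinal ratings.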
