
Big Data refers to large amounts of data from areas such as the internet, mobile telephony, the financial industry, the energy sector and healthcare, and from sources such as intelligent agents, social media, smart metering systems and vehicles, which are stored, processed and evaluated using special solutions [1]. The term Big Data has been present in science and practice for several years. The challenges of Big Data are often summarized as the 3Vs: the constant growth of data (Volume), the increasing diversity of data types and sources (Variety), and the increasing speed at which data is generated and changes (Velocity) [2]. But how can Big Data, or the ability to process large amounts of data, be turned into Smart Data and winning insights?
Being capable of working with Big Data
When do we actually talk about Big Data? How much data must be accumulated before we can speak of Big Data? Conventional data storage technologies usually work with megabytes or gigabytes; we speak of Big Data when the volume reaches terabytes or petabytes. The reason for this rule of thumb is that traditional systems are no longer powerful enough at this scale and also become significantly more expensive. Typical characteristics of Big Data storage technologies are listed below (a small sketch of one of them, data partitioning, follows the list):
- Distributed Storage
- Data Replication
- Local Data Processing
- High Availability
- Data Partitioning
- Denormalized Data
- Support for structured and unstructured data
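
To make one of these characteristics concrete, here is a minimal Python sketch of hash-based data partitioning with replication. The node names, record key and replication factor are made-up illustration values, not tied to any particular product:

    import hashlib

    NODES = ["node-a", "node-b", "node-c"]  # hypothetical storage nodes
    REPLICATION_FACTOR = 2                  # each record lives on two nodes

    def partition(key: str) -> list[str]:
        """Map a record key to a primary node plus replicas via hashing."""
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
        primary = digest % len(NODES)
        # Replicas go to the next nodes in the (circular) node list.
        return [NODES[(primary + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

    print(partition("customer-42"))  # e.g. ['node-b', 'node-c']

Distributed databases such as Cassandra use far more sophisticated variants of this idea (consistent hashing), but the principle of mapping a key to a set of nodes is the same.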

Gaining Insights from Data (Smart Data)
However, this does not yet mean that concrete knowledge has been extracted from the data or that corresponding recommendations for action have been derived on this basis. In order to turn Big Data into Smart Data and thus generate added value for companies, analytics processes are necessary. To do this, companies typically need analytics capabilities in one or more of the three common disciplines (a small illustration follows the list):
- Descriptive analytics – What happened?
- Predictive analytics – What will happen?
- Prescriptive analytics – What can I do to make something better?
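
As a toy illustration of the first two disciplines, the following Python sketch runs descriptive statistics and a naive trend forecast over a made-up monthly revenue series. All figures are hypothetical, and a real predictive setup would use a proper forecasting or ML model:

    import pandas as pd

    # Hypothetical monthly revenue figures, for illustration only.
    df = pd.DataFrame({"month": [1, 2, 3, 4, 5, 6],
                       "revenue": [100, 110, 125, 130, 150, 160]})

    # Descriptive analytics: what happened?
    print(df["revenue"].describe())

    # Predictive analytics: what will happen? A naive linear trend as a
    # stand-in for a real forecasting or ML model.
    slope = (df["revenue"].iloc[-1] - df["revenue"].iloc[0]) / (len(df) - 1)
    print(f"Naive forecast for month 7: {df['revenue'].iloc[-1] + slope:.0f}")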
In order to make the data quickly and flexibly analyzable and accessible for BI tools or machine learning, the right data platform is needed.
With big cloud providers like Amazon and Google, IT services such as data warehouses, databases and much more come via plug and play. A data warehouse service like BigQuery [3] or Redshift [4] can be provisioned with a few clicks. Public clouds also provide far more computing power than a self-hosted data center ever will. Especially for smaller companies and startups, these services are an interesting opportunity, since they are cost efficient and easy to set up. Most of the big cloud providers offer free tiers, so interested teams can easily test the solutions.
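
As an example of how little glue code such a managed service requires, here is a minimal sketch using the official google-cloud-bigquery Python client. It assumes the package is installed and credentials are configured; the project, dataset and table names below are hypothetical:

    from google.cloud import bigquery

    # Assumes the google-cloud-bigquery package is installed and
    # application default credentials are configured.
    client = bigquery.Client()

    # The project, dataset and table names below are hypothetical.
    sql = """
        SELECT product, SUM(amount) AS total
        FROM `my-project.sales.orders`
        GROUP BY product
        ORDER BY total DESC
        LIMIT 10
    """

    for row in client.query(sql).result():
        print(row["product"], row["total"])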
For decades, data engineers, software engineers and data analysts have built up Data Warehouses with an ETL process, focusing on implementing an architecture that strictly follows data models like Star or Snowflake. Often, the focus was more on technical details than on business needs. In a Data Lake, by contrast, all data is first stored raw in a staging area. Afterwards, the data is processed into a Data Warehouse (hybrid models where the Data Warehouse is part of the Data Lake are also common) or into Data Marts, or used directly for analytics and reporting. This makes a Data Lake far more flexible than a Data Warehouse. It also enables new use cases like machine learning and provides storage for unstructured data. A minimal sketch of this pattern follows.
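
The following Python sketch illustrates the "load raw first, transform later" pattern with local directories standing in for a Data Lake's staging and curated zones. The paths and the record schema are invented for illustration:

    import csv
    import json
    import pathlib

    # Hypothetical local folders standing in for a Data Lake's raw
    # staging zone and its curated, analytics-ready zone.
    STAGING = pathlib.Path("lake/staging/orders")
    CURATED = pathlib.Path("lake/curated")
    CURATED.mkdir(parents=True, exist_ok=True)

    # Step 1: all data has already landed raw (as JSON files) in staging.
    # Step 2: transform the raw records into a curated tabular format.
    rows = []
    for raw_file in STAGING.glob("*.json"):
        for record in json.loads(raw_file.read_text()):
            rows.append({"order_id": record["id"],
                         "amount": float(record["amount"])})

    with open(CURATED / "orders.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["order_id", "amount"])
        writer.writeheader()
        writer.writerows(rows)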

If we look at the simplified architecture, it becomes clear that no further interfaces to third-party systems are necessary if the services can already communicate with each other in a cloud environment or are integrated in a single service. This significantly shortens the setup and maintenance of these environments. Another important factor is that the data science process can be remarkably streamlined: every data scientist and engineer knows how time-consuming this process can be, so having everything you need in one cloud environment, or even in one service, simplifies it significantly.
Summary
With the help of ready-to-use cloud services, new paradigms like Data Lakes and self-service BI tools, combined with agile methods, companies (especially small and medium-sized ones) can build up a data analytics platform in a shorter span of time and focus more on business needs.

In order to generate Smart Data from Big Data, the first step should be to create the technical prerequisites. The second step is to enable analyses with the help of integrated and scalable services. The data can then be further processed, for example via a self-service BI tool for descriptive tasks or via machine learning services. Thanks to this interplay, the focus can be placed on business-related activities instead of the operation of technology.
Sources
[1] Google, "What is Big Data?", https://cloud.google.com/what-is-big-data (2021)
[2] McKinsey & Company, "Artificial Intelligence: The Next Digital Frontier?" (2017)
[3] Google, BigQuery (2020)
[4] AWS, Amazon Redshift (2020)