
Data-driven decision-making – the sales pitch
Stating that data radically transforms the way businesses operate is hardly a revelation. Whether it is high-frequency sensor data, real-time stock market prices or detailed user logs – we track, collect and store data at a scale unprecedented in history. The reason: these vast piles of data hide value. Data-driven decision-making is not just a buzzword – quality data truly allows us to take evidence-backed actions that propel businesses forwards.
Developments in data analytics follow one another at a breakneck pace. We will only store more data: higher frequencies, richer records, additional sources. Not only do data volumes increase – we also keep finding new ways to use them. Sophisticated analytics tools pop up like mushrooms, new patterns keep being derived, and we find connections between data sets that were never previously conceived. To see how bustling the field of data analytics is, just check the daily stream of Medium articles.

Data-driven decision-making – the reality
Unfortunately, company systems evolve at a very different pace. Obscure programming languages and tools – years or even decades old – are scattered across teams and departments. These so-called legacy systems are hard to upgrade and near-impossible to access for those not directly involved. We might be talking about hundreds or even thousands of applications. Such a collection of systems cannot be redesigned and modernized overnight; it is a massive undertaking that likely takes years.
Consequently, most companies experience a great mismatch between what they want to do and what they can do with their data. The data may be found somewhere in a company, but only by the lucky few who are aware of it and happen to know their way around the system. This is what is known as a silo structure – for anyone residing in one company silo, retrieving insights from another is a slow and painful process.
So how do we navigate the revolutionary world of data analytics, while simultaneously maintaining (and perhaps gradually updating) our core legacy systems? The conceptual solution is as simple as it is brilliant: segregate the data layer from the IT systems. That way, we harness the potential of Big Data and advanced analytics, without radically uprooting daily business. This new home for our data is called a data platform.
What is a Data Platform?

Splunk gives a comprehensive definition of what a data platform is:
"A data platform is a complete solution for ingesting, processing, analyzing and presenting the data generated by the systems, processes and infrastructures of the modern digital organization."
As seen, it is not merely a giant data lake that holds all raw data of the company. No, it is an ecosystem of its own, a platform that encompasses everything from retrieving data from applications to presenting it to the end-user.
In all likelihood, such a data platform resides in the cloud rather than on-premises, driven by considerations of scalability and elasticity. Cloud services are easy to set up, storage is relatively cheap, tools are constantly updated, and services can often be used with a pay-as-you-go model. To take advantage of developments in the years to come, build in resilience to change and respond flexibly to new opportunities, the cloud is probably the way to go.
The benefits
The potential benefits of data platforms are abundant:
- Data from all kinds of (segregated) systems and sources can be pooled.
- Up-to-date information is continuously retrieved from inflexible legacy systems.
- The end-user is in control. Whether a data scientist or a business manager, you can use the tools you desire to gain the insights you want.
- IT departments are no longer a bottleneck for retrieving data. No more raising tickets, no more dependency on the priorities set by IT.
- In-house IT talent can be deployed where they add most value, rather than manually extracting data from old systems.
- State-of-the-art services – such as new artificial intelligence techniques and Digital Twins – can be deployed without the need to integrate with existing legacy systems.
- Future data sources can be connected to the platform. Even mergers and acquisitions can potentially be handled by a well-designed data platform.
- A platform prepares for data needs of the future. Big data brings challenges like velocity, variety, volume and veracity that must be handled.
To summarize, a data platform enables every company stakeholder to access and process all data relevant to decision-making, whenever and wherever they want. Only then can a truly data-driven enterprise be realized.
How to do it?

As you may have surmised from its capabilities, establishing a data platform is not a matter of simply installing a software kit that magically scrapes all data from company systems. BCG claims that – compared to transforming the legacy systems – setting up a data platform can be done in half the time and at half the cost. Mind you, that is still a substantial operation.
The platform might be viewed as a collection of tools and data operations, which combined lay the foundation for a truly data-driven enterprise. Naturally, such a design requires serious thought on data strategy and a clear roadmap.
It would stretch too far to list everything needed to build a data platform, but here is an overview of common building blocks and examples of corresponding tools:
- Data ingestion tools: Data must be gathered from a variety of sources, with different volumes, formats and frequencies. Whether it is sensor data, user logs or a third-party database, all must be ingested somehow. (Apache Airflow, Singer)
- Data storage: Storage is traditionally the domain of data warehouses, providing a structured yet inflexible data representation. Data lakes, by contrast, store varied and unstructured data. Data platforms may require something in between (the emerging data lakehouse), handling a variety of data while preserving as much structure as possible. (Redshift, Amazon S3, Google Cloud Storage)
- Data transformation: When working with traditional data warehouses, the transformation boils down to selecting the right data in the right format. For more exotic data, orchestration applications might be needed. (dbt, Apache Airflow)
- Business intelligence: At the managerial and executive level, the platform should report relevant insights, visualize KPIs and project trends. Dashboards are common for these purposes. (Power BI, Tableau)
- Data science: To obtain non-predefined insights from the data, custom scripts and tests may be deployed. However, cloud platforms nowadays also host many analytical tools. (Python, R, Amazon SageMaker)
These blocks merely cover the functional aspect of the platform. There are additional building blocks and topics that require attention, such as:
- Security: Naturally, bundling all company data on a single platform entails a considerable risk. Throughout the entire architecture, maintaining a high standard of security is crucial. (IBM Security Guardium)
- Data governance: To establish trust in the system, all data (and the transformations performed on it) should be traceable back to its origin. Aspects like responsibilities, privacy, and data cataloging should be well-designed. (Apache Atlas)
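To make the building blocks a bit more tangible, here is a deliberately tiny sketch in Python. It is an illustration under stated assumptions, not a reference architecture: an in-memory SQLite database stands in for the storage layer, the two "legacy exports" are made-up sample data, and all function and table names (`ingest`, `raw_events`, `orders`, and so on) are hypothetical. A real platform would rely on dedicated tools such as the ones listed above.

```python
import csv
import io
import json
import sqlite3

# Hypothetical exports from two siloed legacy systems (made-up sample data).
ORDERS_CSV = "order_id,customer,amount\n1,acme,120.50\n2,globex,80.00\n3,acme,42.25\n"
SENSOR_JSON = '[{"machine": "press-1", "temp_c": 71.5}, {"machine": "press-2", "temp_c": 68.0}]'


def ingest(conn, source, records):
    """Ingestion + storage: land raw records unchanged, tagged with their origin."""
    conn.executemany(
        "INSERT INTO raw_events (source, payload) VALUES (?, ?)",
        [(source, json.dumps(record)) for record in records],
    )


def transform_orders(conn):
    """Transformation: shape raw order payloads into a structured table."""
    rows = conn.execute(
        "SELECT id, payload FROM raw_events WHERE source = 'orders_csv' ORDER BY id"
    ).fetchall()
    for raw_id, payload in rows:
        record = json.loads(payload)
        conn.execute(
            "INSERT INTO orders (customer, amount, raw_id) VALUES (?, ?, ?)",
            (record["customer"], float(record["amount"]), raw_id),
        )


def revenue_per_customer(conn):
    """Business intelligence: the kind of aggregate a dashboard might show."""
    return dict(
        conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer").fetchall()
    )


def lineage(conn, customer):
    """Governance: trace a customer's structured rows back to their raw source."""
    return conn.execute(
        "SELECT DISTINCT r.source FROM orders o "
        "JOIN raw_events r ON r.id = o.raw_id WHERE o.customer = ?",
        (customer,),
    ).fetchall()


def build_platform():
    """Wire the blocks together: create storage, ingest both sources, transform."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_events (id INTEGER PRIMARY KEY, source TEXT, payload TEXT)")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL, raw_id INTEGER)")
    ingest(conn, "orders_csv", csv.DictReader(io.StringIO(ORDERS_CSV)))
    ingest(conn, "sensor_json", json.loads(SENSOR_JSON))
    transform_orders(conn)
    return conn


if __name__ == "__main__":
    platform = build_platform()
    print(revenue_per_customer(platform))
    print(lineage(platform, "acme"))
```

The design choice worth noting is that raw records are stored as-is next to their source label, and the structured `orders` table keeps a pointer (`raw_id`) back to them. That is a toy version of the traceability that data governance asks for: the legacy systems are only read from, never modified.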
Bottom line: a data platform is not some gimmick to try – building one is a strategic commitment that transforms data into a separate company asset.
![Conceptual layout of a data platform and underlying architecture [image by author]](https://towardsdatascience.com/wp-content/uploads/2021/10/1r1yaR9WidO0SuJns3WG38A.png)
Is it worth it?
Data analytics is no longer a nice-to-have – the power of informed decision-making is essential to remain competitive in the long run. For the foreseeable future, data availability and analytics will continue growing at a much faster pace than the systems themselves. Remember that the primary purpose of legacy systems is not to generate data; they are the systems that schedule employees, process transactions and handle customer orders. These systems form the electronic heart of the company. Nonetheless, data analytics should no longer be held back by VBA scripts from the ’90s, either. There is only one solution – to liberate data from the legacy cages.
Takeaways
- A data platform segregates data from operational systems. It creates a separate layer that explicitly treats data as a corporate asset.
- Readily accessible real-time data is essential for successful analytics. End-users need to be able to make data-driven decisions based on relevant information, unhampered by inflexible legacy systems and silo structures.
- Developments in data analytics far outpace the changes of internal systems. Building a separate data layer takes advantage of advances in analytics, while preserving the working of operational systems.
- A data platform is a long-term strategic commitment. Before moving forwards, considerable thought should be put into business alignment and platform architecture.