Definition
A Data Hub is a data exchange with frictionless data flow at its core. It can be described as a solution that combines several technologies: Data Warehouse, Engineering, Data Science. It is less a technology than an approach to determining more effectively where, when, and for whom data needs to be mediated, shared, and then linked and/or persisted. Endpoints, which can be applications, processes, people, or algorithms, interact with the hub, potentially in real time, to provide data to it or receive data from it [1].
Distinction from the Data Warehouse and Lake
While Data Warehouses and Data Lakes are understood to be endpoints for data collection that exist to support an organization’s analytics, Data Hubs serve as points of intermediation and data exchange. A summary of the characteristics of each solution can be seen below.

Benefits of a Data Hub
A Data Hub enables data sharing by connecting producers of data with consumers of data. Endpoints interact with the Data Hub by providing data into it or receiving data from it, and the hub provides a mediation and management point, making visible how data flows across the enterprise [2].
A Data Hub connects many different systems in real time, which makes it a suitable tool for today’s challenges: exchanging large volumes of data as quickly and in as standardized a way as possible, and making that data available to consumers such as other systems, Machine Learning, or reporting.
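The producer/consumer mediation described in this section can be illustrated with a minimal in-memory sketch. It is not how any particular product works: the `DataHub` class, its `subscribe` and `publish` methods, the `flow_log` attribute, and the endpoint names (`webshop`, `reporting`, `ml_pipeline`) are all hypothetical and chosen only for illustration. The sketch shows the two ideas from the text: endpoints provide data to or receive data from the hub, and the hub records which data flowed from whom to whom.

```python
from typing import Callable


class DataHub:
    """Toy in-memory hub that routes records from producers to consumers."""

    def __init__(self) -> None:
        # topic -> list of (consumer name, handler) pairs
        self._subscribers = {}
        # One (producer, topic, consumer) entry per delivered record,
        # so that data flows through the hub stay visible.
        self.flow_log = []

    def subscribe(self, topic: str, consumer: str,
                  handler: Callable[[dict], None]) -> None:
        """Register a consuming endpoint for a topic."""
        self._subscribers.setdefault(topic, []).append((consumer, handler))

    def publish(self, topic: str, producer: str, record: dict) -> int:
        """Deliver a record to every subscriber; return the delivery count."""
        delivered = 0
        for consumer, handler in self._subscribers.get(topic, []):
            handler(record)
            self.flow_log.append((producer, topic, consumer))
            delivered += 1
        return delivered


# Hypothetical endpoints: one producer, two consumers.
hub = DataHub()
reporting_rows = []
ml_features = []

hub.subscribe("orders", "reporting", reporting_rows.append)
hub.subscribe("orders", "ml_pipeline",
              lambda record: ml_features.append(record["amount"]))

delivered = hub.publish("orders", "webshop", {"order_id": 1, "amount": 42.0})

for producer, topic, consumer in hub.flow_log:
    print(f"{producer} -> [{topic}] -> {consumer}")
```

The producer never references its consumers directly. Adding a further consumer is a single `subscribe` call, and `flow_log` gives one place to inspect how data moves between endpoints. A real hub would add persistence, schemas, access control, and asynchronous delivery, all of which this sketch leaves out.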
Examples for Data Hub Technologies
Even though, as described, a Data Hub is an approach rather than a technology in itself, there are products on the market that are sold as Data Hubs. These examples also show clearly that a Data Hub is a combination of several technologies.
Examples: Cumulocity IoT DataHub [3], Cloudera Enterprise Data Hub [4], and Google Ads Data Hub [5].
Another good example is SAP’s description of its Data Hub [6]. It shows quite well how the technologies interact and what the actual idea behind a Data Hub can be.
![SAP DATA HUB - Source SAP [6]](https://towardsdatascience.com/wp-content/uploads/2021/01/0TrMKcYWCTfVFHTdR.png)
Conclusion
A Data Hub brings together enterprise data from different sources and formats to extract valuable knowledge. It is better understood as an approach or a platform than as a single technology. Hopefully, this article gives you a first idea of what a Data Hub is. For a deep dive, however, I recommend reading further into the topic. The sources below may help.
Sources and Further Readings
[1] Eckerson, Data Hubs – What’s Next in Data Architecture? (2019)
[2] A. Awadallah, The Platform for Big Data, Cloudera White Paper (2013)
[3] Cumulocity IoT, DataHub overview (2021)
[4] Cloudera, Enterprise Data Hub (2020)
[5] Google, Ads Data Hub (2021)
[6] SAP, I have SAP HANA, when would I need SAP Data Hub (2019)