
Data lakes and data warehouses are both established terms for storing Big Data, but they are not synonymous. A data lake is a large pool of raw data for which no use has yet been determined. A data warehouse, on the other hand, is a repository for structured, filtered data that has already been processed for a specific purpose [1].
Common Grounds
The data warehouse and the data lake both represent a central database system that can be used for analytical purposes in a company. Such a system extracts, collects, and stores relevant data from various heterogeneous data sources and supplies downstream systems.
Data warehousing can be divided into four sub-processes:
- Data acquisition: Acquisition and extraction of data from various data repositories.
- Data storage: Storage of data in the data warehouse including long-term archiving.
- Data supply: Supply of downstream systems with the required data, provision of data marts.
- Data evaluation: Analysis and evaluations of the data stocks.
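The four sub-processes can be sketched in a few lines of Python. This is a minimal illustration only, with hypothetical in-memory sources (`crm_source`, `shop_source`) standing in for real data repositories:

```python
# Data acquisition: extract records from heterogeneous (hypothetical) sources.
crm_source = [{"customer": "A", "revenue": "120.50"}]
shop_source = [{"customer": "B", "revenue": "80.00"}]

def acquire():
    return crm_source + shop_source

# Data storage: persist records in the central repository (a dict here).
warehouse = {"sales": []}

def store(records):
    warehouse["sales"].extend(records)

# Data supply: provide a data mart, i.e. a purpose-built subset of the stock.
def data_mart(min_revenue):
    return [r for r in warehouse["sales"] if float(r["revenue"]) >= min_revenue]

# Data evaluation: analyze the stored data stock.
def total_revenue():
    return sum(float(r["revenue"]) for r in warehouse["sales"])

store(acquire())
print(data_mart(100.0))  # [{'customer': 'A', 'revenue': '120.50'}]
print(total_revenue())   # 200.5
```

In a real system each step would be a separate component (extraction jobs, an archival database, mart views, BI queries), but the flow of data through the four stages is the same.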
Differences
While data warehouses use the classic ETL process in combination with structured data in a relational database, a data lake relies on paradigms such as ELT and schema-on-read, and often holds unstructured data [2].
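The difference between schema-on-write (warehouse/ETL) and schema-on-read (lake/ELT) can be shown with a small sketch. The event and field names are made up for illustration:

```python
import json

# A raw event as it would land in a data lake: stored as-is.
raw_event = '{"user": "42", "amount": "19.99", "extra": {"device": "ios"}}'

# Schema-on-write (ETL): validate and cast types BEFORE storing,
# discarding fields the warehouse schema does not know.
def etl_load(raw):
    rec = json.loads(raw)
    return {"user": int(rec["user"]), "amount": float(rec["amount"])}

# Schema-on-read (ELT): store the raw event untouched and apply a
# schema only at query time, so new fields stay available.
lake = [raw_event]

def read_with_schema(raw, fields):
    rec = json.loads(raw)
    return {f: rec.get(f) for f in fields}

print(etl_load(raw_event))                         # typed warehouse row
print(read_with_schema(lake[0], ["user", "extra"]))  # flexible lake read
```

Note how the lake read can still access `extra`, which the warehouse schema dropped; that flexibility is the main argument for schema-on-read.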

The comparison above shows the main differences. The technologies also differ: for a data warehouse you will use SQL and relational databases, while for a data lake you will probably use NoSQL technologies or a mixture of both.
Combine Both in a Hybrid Data Lake
So how can both concepts be combined? The figure below shows the architecture from a high-level view.
The process is as follows: unstructured, untransformed data is loaded into the data lake. From there, the data can be used, on the one hand, for ML and data science tasks. On the other hand, it can also be transformed and loaded into the data warehouse in structured form. From there, the classic data warehouse distribution of the data via data marts and (self-service) BI tools can be realized.
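The two consumption paths can be sketched as follows. This is a toy model with illustrative names (`data_lake`, `warehouse_table`, `order_id`), not a production pipeline:

```python
import json

data_lake = []        # raw, unstructured storage (lake)
warehouse_table = []  # structured, typed rows (warehouse)

def ingest_raw(event: str):
    """Load data into the lake untransformed (the EL of ELT)."""
    data_lake.append(event)

def transform_to_warehouse():
    """Transform raw lake events into a structured warehouse table (the T)."""
    for raw in data_lake:
        rec = json.loads(raw)
        warehouse_table.append((int(rec["order_id"]), float(rec["total"])))

ingest_raw('{"order_id": "1", "total": "9.99", "clickstream": [1, 2]}')

# Path 1: ML / data science reads the raw lake data directly,
# including fields (like "clickstream") the warehouse would drop.
raw_for_ml = [json.loads(e) for e in data_lake]

# Path 2: structured load into the warehouse, then a BI-style aggregation
# as a data mart or self-service tool would run it.
transform_to_warehouse()
revenue = sum(total for _, total in warehouse_table)
print(revenue)  # 9.99
```

The key point of the hybrid architecture is that both paths read from the same lake, so the warehouse never limits what the data science side can see.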

The main technologies you can use for this architecture are, for example: [3][4]
- ELT/ETL Process via – talend, Google Dataflow, AWS Data Pipeline
- Data Lake via – HDFS, AWS Athena & S3, Google Cloud Storage
- Data Warehouse via – Google BigQuery, AWS Redshift, Snowflake
Note: Technologies like Google’s BigQuery or AWS Redshift are often considered a mixture of data warehouse and data lake technologies, because they already exhibit some characteristics of NoSQL systems.
Conclusion
This article explained how you can use a hybrid data lake. A data lake gives your company the flexibility to capture every aspect of business operations in data form while keeping the traditional data warehouse alive.
Sources and Further Readings
[1] talend, Data Lake vs. Data Warehouse
[2] IBM, Charting the data lake: Using the data models with schema-on-read and schema-on-write (2017)
[3] Google, Help secure the pipeline from your data lake to your data warehouse
[4] AWS, Hybrid-Data Lake in AWS