Data Engineer, Patterns & Architecture The future
Deep-dive into Microservices Patterns with Stream Processing
TL;DR
With Industry 4.0, several technologies are used to analyze data in real time. Building, organizing, and maintaining such systems, however, is complex work. Over the past 30 years, companies have implemented several ideas for centralizing data in one place as the single source of truth, such as the Data Warehouse, NoSQL, the Data Lake, and the Lambda & Kappa architectures.
Software engineering, on the other hand, has been applying ideas such as microservices to split applications apart, which makes them easier to maintain and improves their performance.
The idea is to apply microservice patterns to data and divide the model into several smaller ones. A good way to split it up is to follow DDD principles. That is how I try to explain and define Data Mesh & Data Fabric.
It is worth mentioning here that I simplified the concept and idea of the Data Mesh & Data Fabric just as I simplified the concept and idea of Streaming & Kafka.
Idea
I was invited to give a talk at a Data Engineer meetup, and that is where I came up with the idea of presenting my view of Data Mesh.
Industry 4.0
In recent years, several ideas and architectures have been put in place, such as the Data Warehouse, NoSQL, the Data Lake, the Lambda & Kappa architectures, Big Data, and others. They all share the idea that data should be consolidated and grouped in one place: a single place that acts as the one true source of the data.
The image here shows the concept of gathering all data in a single place as its final destination.
Software engineering, in recent years, has shown that applications should be isolated to improve performance and ease maintenance. One of the approaches proposed for this split is the use of DDD and microservices.
If we compare the data area with the development area, we see that the two are saying just the opposite: one wants to unify and the other wants to divide.
This is exactly the idea that Data Mesh presents: we should separate the data using the ideas of DDD and microservices to produce smaller and simpler applications that are easier to maintain and perform better.
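As a rough illustration of this split, the sketch below routes records from one shared dataset into small, domain-owned stores, one per bounded context. The domain names (`orders`, `customers`), the `DomainDataProduct` class, and the record layout are all hypothetical and chosen only for illustration; a real Data Mesh involves much more than this (ownership, governance, self-serve infrastructure).

```python
from dataclasses import dataclass, field


@dataclass
class DomainDataProduct:
    """A small data set owned by a single domain (hypothetical illustration)."""
    domain: str
    records: list = field(default_factory=list)

    def publish(self, record: dict) -> None:
        # Each domain only stores the records it owns.
        self.records.append(record)


def split_by_domain(central_records: list) -> dict:
    """Divide one centralized record list into one data product per domain."""
    products = {}
    for record in central_records:
        domain = record["domain"]
        if domain not in products:
            products[domain] = DomainDataProduct(domain)
        products[domain].publish(record)
    return products


# A "central" dataset mixing records from two assumed domains.
central = [
    {"domain": "orders", "id": 1, "total": 30.0},
    {"domain": "customers", "id": 7, "name": "Ana"},
    {"domain": "orders", "id": 2, "total": 12.5},
]
products = split_by_domain(central)
```

Each resulting product can then be maintained and scaled by the team that owns that domain, instead of everything living in one central model.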
Looking at the Microservice Patterns we see that there are some Data-driven Patterns. And if we analyze them in more detail, we see that they all use or are linked to Stream Processing.
The idea here is that we can apply streaming to all the design patterns related to data, and that tools like Apache Spark, Apache Flink, and Apache Kafka are the most widely used today. There is an ecosystem around them with several other technologies as well.
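To make the link between data patterns and streaming more concrete, here is a minimal sketch of two patterns often listed among the data-related microservice patterns, Event Sourcing and CQRS; picking these two is my assumption, since the text above does not name specific patterns. The `EventLog` class is a plain in-memory stand-in for an append-only topic, not a real Kafka, Flink, or Spark API, and the `Event` layout and account names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An immutable fact appended to the stream (hypothetical layout)."""
    account: str
    kind: str      # "deposited" or "withdrawn"
    amount: int


class EventLog:
    """In-memory stand-in for an append-only topic; not a real Kafka client."""

    def __init__(self):
        self._events = []

    def append(self, event: Event) -> int:
        self._events.append(event)
        return len(self._events) - 1   # offset of the new event

    def read_from(self, offset: int):
        # Consumers read sequentially from an offset, as in a log-based broker.
        return iter(self._events[offset:])


def project_balances(log: EventLog, offset: int = 0) -> dict:
    """Fold the event stream into a read model (the query side of CQRS)."""
    balances = {}
    for event in log.read_from(offset):
        sign = 1 if event.kind == "deposited" else -1
        balances[event.account] = balances.get(event.account, 0) + sign * event.amount
    return balances


log = EventLog()
log.append(Event("acc-1", "deposited", 100))
log.append(Event("acc-1", "withdrawn", 30))
log.append(Event("acc-2", "deposited", 50))
balances = project_balances(log)
```

The write side only appends events, and the read side is rebuilt by replaying the stream. That replay is a small stream-processing job, which is why these patterns map naturally onto streaming tools.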
I created a table with the main streaming options on the market. It is not a silver bullet, just my own view, and it is subjective.
Books that I used to form my idea (keep in mind that there are also many articles, papers, and videos around these ideas):
- Domain-Driven Design by Eric Evans
- Microservice Architecture: Aligning Principles, Practices, and Culture by Irakli Nadareishvili, Ronnie Mitra, Matt McLarty & Mike Amundsen
- Kubernetes Patterns: Reusable Elements for Designing Cloud Native Applications by Bilgin Ibryam & Roland Huß
- Designing Data-Intensive Applications by Martin Kleppmann
- The Fourth Industrial Revolution by Klaus Schwab
- The Inevitable by Kevin Kelly
Articles that I used to compose my idea:
- Streaming as a Database
https://yokota.blog/2019/09/23/building-a-relational-database-using-kafka/
https://yokota.blog/2020/01/13/building-a-graph-database-using-kafka/
- Event Driven & Data Mesh
http://jacekmajchrzak.com/event-driven-data-mesh-introduction/
- Serverless era
https://blogs.oracle.com/cloud-infrastructure/serverless-big-data-pipelines-architecture
- Martin Kleppmann | Kafka Summit SF 2018 Keynote (Is Kafka a Database?)
https://www.youtube.com/watch?v=v2RJQELoM6Y
- The Kubernetes native
https://medium.com/@graemecolman/the-new-kubernetes-native-d19dd4ae75a0
https://developers.redhat.com/blog/2020/05/11/top-10-must-know-kubernetes-design-patterns/
- Microservices Patterns with GoldenGate
https://www.slideshare.net/jtpollock/microservices-patterns-with-goldengate
- Webinar: Future Data Integration, Data Mesh, and GoldenGate Kafka
https://www.slideshare.net/jtpollock/webinar-future-dataintegrationdatameshandgoldengatekafka
- Pondering Distributed Data Lakes Idea
https://www.youtube.com/watch?v=mnvxeU3oDyQ
- What is a service mesh?
https://www.youtube.com/watch?v=QiXK0B9FhO0
Twitter Influencers:
@KaiWaehner
@bibryam
@gschmutz
Conclusion
- Divide and conquer is the best way to start: it is easier, cheaper, and will save you time and money
- Several companies failed when they tried to implement Big Data, a Data Lake, or a Data Warehouse because they tried to build something big and complex
- Streaming is everywhere
- We are in the multi-cloud and hybrid-cloud era
- Serverless architecture is increasingly popular
- We shouldn’t be concerned with the name, but with the goal
- I simplified things to get the following ideas across:
Yes, we are already in the Serverless era;
Yes, we are already in the era of native Kubernetes;
Yes, Streaming is a database;
Yes, Streaming is everywhere;
Yes, divide to conquer;
Yes, Microservice patterns;
Yes, Kubernetes Patterns are a plus.
My slide deck:
Appendix:
- Big Data
http://www.igfasouza.com/blog/what-is-big-data/
- Apache Spark
http://www.igfasouza.com/blog/what-is-apache-spark/
- Apache Kafka
http://www.igfasouza.com/blog/what-is-kafka/
- Stream processing
http://www.igfasouza.com/blog/what-is-stream-processing/
- Data warehouse
https://www.oracle.com/ie/database/what-is-a-data-warehouse/
- Data lake
http://www.igfasouza.com/blog/what-is-data-lake/
- Data lakehouse
https://databricks.com/blog/2020/01/30/what-is-a-data-lakehouse.html
- Industry 4.0
- Data Fabric
https://www.forrester.com/report/Now+Tech+Enterprise+Data+Fabric+Q2+2020/-/E-RES157315#
- DataMesh
https://martinfowler.com/articles/data-monolith-to-mesh.html
- Microservice