Data Engineer, Patterns & Architecture: The Future

Deep-dive into Microservices Patterns with Stream Processing

Igor De Souza

Published in Towards Data Science, Jun 16, 2020

Image created by me http://www.igfasouza.com/

TL;DR

With Industry 4.0, companies use several technologies to analyze data in real time; building, maintaining, and organizing these systems, however, is complex work. Over the past 30 years, companies have implemented several ideas for centralizing data in a single place as the unified, true source of data, such as the Data Warehouse, NoSQL, the Data Lake, and the Lambda & Kappa Architectures.

Software engineering, on the other hand, has been applying ideas such as microservices, which split applications apart to make them easier to maintain and to improve their performance.

The idea is to apply microservice patterns to the data and divide the data model into several smaller ones. A good way to split it up is to follow Domain-Driven Design (DDD) principles. That is how I try to explain and define Data Mesh & Data Fabric.

It is worth mentioning that I simplified the concepts of Data Mesh & Data Fabric, just as I simplified the concepts of Streaming & Kafka.

Idea

I was invited to give a talk at a Data Engineering meetup, and for it I came up with this idea to present my vision of Data Mesh.

https://www.meetup.com/engenharia-de-dados/events/271280539/

Industry 4.0

In recent years, several ideas and architectures have been adopted, such as the Data Warehouse, NoSQL, the Data Lake, the Lambda & Kappa Architectures, Big Data, and others. They all present the idea that data should be consolidated and grouped in one place: a single, unified, true source of the data.

The image here shows the concept of grouping all data in a single place as its final destination.

In recent years, the software engineering field has shown that applications should be isolated to improve performance and simplify maintenance. One of the approaches proposed for this division is the use of DDD and microservices.

If we compare the data field with the development field, we see that they say exactly the opposite: one wants to unify and the other wants to divide.

This is exactly the idea behind Data Mesh: we should separate the data using the ideas of DDD and microservices to produce smaller, simpler applications that are easier to maintain and perform better.
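To make the domain split more concrete, here is a minimal Python sketch, assuming an in-memory list stands in for a streaming platform such as Kafka. The `OrdersDomain` and `CustomerAnalyticsDomain` classes, the topic name `orders.placed`, and the event fields are all hypothetical. Each domain owns its own data; the analytics domain never reads the orders store directly and instead builds its own view from the events the orders domain publishes.

```python
from collections import defaultdict
from dataclasses import dataclass, field


class EventLog:
    """In-memory stand-in for a streaming platform (assumption for illustration)."""

    def __init__(self):
        self.topics = defaultdict(list)  # topic name -> ordered list of events

    def publish(self, topic, event):
        self.topics[topic].append(event)

    def read(self, topic, offset=0):
        # Return every event in the topic from the given offset onwards.
        return self.topics[topic][offset:]


@dataclass
class OrdersDomain:
    """Hypothetical 'orders' domain: owns its store and publishes domain events."""
    log: EventLog
    store: dict = field(default_factory=dict)

    def place_order(self, order_id, customer_id, amount):
        self.store[order_id] = {"customer_id": customer_id, "amount": amount}
        self.log.publish("orders.placed", {
            "order_id": order_id, "customer_id": customer_id, "amount": amount,
        })


@dataclass
class CustomerAnalyticsDomain:
    """Hypothetical consumer domain: builds its own view from the event stream."""
    log: EventLog
    total_spent: dict = field(default_factory=lambda: defaultdict(float))
    offset: int = 0  # position of the next unread event

    def poll(self):
        events = self.log.read("orders.placed", self.offset)
        for event in events:
            self.total_spent[event["customer_id"]] += event["amount"]
        self.offset += len(events)
```

Tracking an offset per consumer mirrors how stream consumers resume from their last position, so calling `poll()` again does not count the same event twice.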

Jeffrey T. Pollock, webinar: Future Data Integration, Data Mesh and GoldenGate Kafka

Image from https://martinfowler.com/articles/data-monolith-to-mesh.html

Looking at the microservice patterns, we see that some of them are data-driven. If we analyze them in more detail, we see that they all use, or are linked to, stream processing.
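Event sourcing and CQRS are two patterns commonly listed in this data-driven group, and both rest on a stream of events. The sketch below is a simplified, in-memory illustration; the account example, the `Event` fields, and the `deposited`/`withdrawn` event kinds are hypothetical. The write side only appends immutable events, state can be rebuilt by replaying them, and a separate projection (the read side) consumes the same stream to keep a query-friendly view.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    account_id: str
    kind: str    # hypothetical event types: "deposited" or "withdrawn"
    amount: int


def delta(event: Event) -> int:
    # Signed effect of one event on a balance.
    return event.amount if event.kind == "deposited" else -event.amount


class AccountEventStore:
    """Write side (event sourcing): state changes are appended, never updated."""

    def __init__(self):
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def events_since(self, offset: int) -> List[Event]:
        return self._events[offset:]


def current_balance(store: AccountEventStore, account_id: str) -> int:
    """Rebuild one account's state by replaying its events from the start."""
    return sum(delta(e) for e in store.events_since(0) if e.account_id == account_id)


class BalanceProjection:
    """Read side (CQRS): consumes the event stream into a query-optimized view."""

    def __init__(self, store: AccountEventStore):
        self.store = store
        self.offset = 0
        self.balances: Dict[str, int] = {}

    def catch_up(self) -> None:
        for e in self.store.events_since(self.offset):
            self.balances[e.account_id] = self.balances.get(e.account_id, 0) + delta(e)
            self.offset += 1
```

In a real deployment the event store would typically be a durable log and the projection a stream-processing job (for example in Kafka Streams, Flink, or Spark); here both are plain Python objects.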

The idea for this image came from Microservice Architecture: Aligning Principles, Practices, and Culture, although the image itself is not in the book. I created it, inspired by https://medium.com/@madhukaudantha/microservice-architecture-and-design-patterns-for-microservices-e0e5013fd58a

The idea here is that we can apply streaming to all of the data-related design patterns, and that tools like Apache Spark, Apache Flink, and Apache Kafka are the most widely used for this today. There is also an ecosystem of several other technologies around them.
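One way a streaming platform can take over the storage role (the "Kafka as a database" idea) is the duality between a keyed changelog stream and a table. The following Python sketch is a simplification loosely modeled on Kafka's log compaction, assuming records are `(key, value)` pairs and a `None` value marks a deletion; the function names `materialize` and `compact` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (key, value); a value of None marks a deletion ("tombstone")
Record = Tuple[str, Optional[str]]


def materialize(changelog: Iterable[Record]) -> Dict[str, str]:
    """Replay a keyed changelog into a table: the latest value per key wins."""
    table: Dict[str, str] = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)  # tombstone removes the key
        else:
            table[key] = value
    return table


def compact(changelog: List[Record]) -> List[Record]:
    """Simplified compaction: keep only the last record per key, in log order.

    Unlike Kafka, this drops tombstones immediately instead of after a delay.
    """
    last_index = {key: i for i, (key, _) in enumerate(changelog)}
    return [
        record
        for i, record in enumerate(changelog)
        if last_index[record[0]] == i and record[1] is not None
    ]
```

Replaying the compacted log yields the same table as replaying the full log, which is why a compacted topic can act as the source of truth for a table. Kafka itself keeps deletion markers for a configurable period before removing them; this sketch drops them at once.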

I created a table with the main streaming options on the market, but it is not a silver bullet; it is just my own, subjective view.

Books that I used to develop this idea (there are also many articles, papers, and videos on these topics):

Domain-Driven Design, by Eric Evans
Microservice Architecture: Aligning Principles, Practices, and Culture, by Irakli Nadareishvili, Ronnie Mitra, Matt McLarty & Mike Amundsen
Kubernetes Patterns: Reusable Elements for Designing Cloud-Native Applications, by Bilgin Ibryam & Roland Huß
Designing Data-Intensive Applications, by Martin Kleppmann
The Fourth Industrial Revolution, by Klaus Schwab
The Inevitable, by Kevin Kelly

Articles that I used to develop this idea:

Streaming as a Database

https://yokota.blog/2019/09/23/building-a-relational-database-using-kafka/

https://yokota.blog/2020/01/13/building-a-graph-database-using-kafka/

https://www.kai-waehner.de/blog/2020/03/12/can-apache-kafka-replace-database-acid-storage-transactions-sql-nosql-data-lake/

Event Driven & Data Mesh

http://jacekmajchrzak.com/event-driven-data-mesh-introduction/

Serverless era

https://blogs.oracle.com/cloud-infrastructure/serverless-big-data-pipelines-architecture

Martin Kleppmann | Kafka Summit SF 2018 Keynote (Is Kafka a Database?)

https://www.youtube.com/watch?v=v2RJQELoM6Y

The New Kubernetes Native

https://medium.com/@graemecolman/the-new-kubernetes-native-d19dd4ae75a0

https://developers.redhat.com/blog/2020/05/11/top-10-must-know-kubernetes-design-patterns/

Microservices Patterns with GoldenGate

https://www.slideshare.net/jtpollock/microservices-patterns-with-goldengate

Webinar: Future Data Integration, Data Mesh and GoldenGate Kafka

https://www.slideshare.net/jtpollock/webinar-future-dataintegrationdatameshandgoldengatekafka

Pondering Distributed Data Lakes Idea

https://www.youtube.com/watch?v=mnvxeU3oDyQ

What is a service mesh?

https://www.youtube.com/watch?v=QiXK0B9FhO0

Twitter Influencers:

@KaiWaehner
@bibryam
@gschmutz

Conclusion

Divide and conquer is the best way to start: it is easier and cheaper, and it will save you time and money
Several companies failed when implementing Big Data, Data Lake, and Data Warehouse projects because they tried to build something big and complex
Streaming is everywhere
We are in the multi-cloud and hybrid-cloud era
Serverless architecture is more and more trending
We shouldn’t be concerned with the name, but with the goal
I simplified things to get across the following ideas:
Yes, we are already in the Serverless era;
Yes, we are already in the Kubernetes-native era;
Yes, Streaming is a database;
Yes, Streaming is everywhere;
Yes, divide and conquer;
Yes, microservice patterns;
Yes, Kubernetes patterns are a plus.

My slide deck:

https://www.slideshare.net/IgorSouza137/data-engineer-patterns-architecture-the-future-deepdive-into-microservices-patterns-with-stream-process

Appendix: