Data Engineer, Patterns & Architecture: The Future

Deep Dive into Microservices Patterns with Stream Processing

Igor De Souza
Towards Data Science


Image created by me http://www.igfasouza.com/

TL;DR

With Industry 4.0, several technologies are used to analyze data in real time; building, maintaining, and organizing them, on the other hand, is a complex job. Over the past 30 years, companies have implemented several ideas for centralizing data in a single place as the unified, true source of data, such as the Data Warehouse, NoSQL, the Data Lake, and the Lambda & Kappa architectures.

Software engineering, on the other hand, has been applying ideas such as microservices, which split applications apart to make them easier to work with and to improve their performance.

The idea is to apply microservice patterns to data and divide the data model into several smaller ones. A good way to split it up is to follow Domain-Driven Design (DDD) principles. That is how I try to explain and define Data Mesh & Data Fabric.

It is worth mentioning that I simplified the concept of Data Mesh & Data Fabric, just as I simplified the concepts of Streaming & Kafka.

Idea

I was invited to give a talk at a Data Engineering meetup, and I came up with this idea to show my vision of Data Mesh.

https://www.meetup.com/engenharia-de-dados/events/271280539/

Industry 4.0

Image inspired by https://aethon.com/mobile-robots-and-industry4-0/

In recent years, several ideas and architectures have been put in place, such as the Data Warehouse, NoSQL, the Data Lake, the Lambda & Kappa architectures, Big Data, and others. They all present the idea that data should be consolidated and grouped in one place: a single place as the unified, true source of the data.

Image created by me

The image here shows the concept of grouping all data in a single place as its final destination.

In recent years, the software engineering field has shown that applications should be isolated to improve their performance and make them easier to maintain. One of the proposed ways to divide them is to use DDD and microservices.

Image created by me

If we compare the data field with the software development field, we see that they say exactly the opposite: one wants to unify, and the other wants to divide.

This is exactly the idea behind Data Mesh: we should separate the data using the ideas of DDD and microservices to create smaller, simpler applications that are easier to maintain and perform better.

Jeffrey T. Pollock — Webinar future data integration datamesh and Golden Gate Kafka
Image from https://martinfowler.com/articles/data-monolith-to-mesh.html
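Data Mesh is mostly an organizational idea, but its ownership model can be sketched in a few lines. The toy example below assumes two hypothetical domains ("orders" and "shipping") and uses an in-memory publish/subscribe mechanism as a stand-in for a streaming platform such as Kafka. Each domain owns a private store and shares data only through the events it publishes; `DomainDataProduct` and `on_order_event` are illustrative names, not part of any library.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical "data product": a domain that owns its data instead of
# writing into one shared, central database.
@dataclass
class DomainDataProduct:
    name: str
    _store: dict = field(default_factory=dict)        # private to this domain
    _subscribers: list = field(default_factory=list)  # other domains listening

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        self._subscribers.append(callback)

    def write(self, key: str, record: dict) -> None:
        # The owning domain updates its own store...
        self._store[key] = record
        # ...and publishes the change as an event for other domains.
        event = {"domain": self.name, "key": key, **record}
        for callback in self._subscribers:
            callback(event)


orders = DomainDataProduct("orders")
shipping_view: dict = {}  # the shipping domain's own store


def on_order_event(event: dict) -> None:
    # Shipping keeps only the fields it needs, in its own model.
    shipping_view[event["key"]] = {"address": event["address"]}


orders.subscribe(on_order_event)
orders.write("o-1", {"address": "Dublin", "total": 42.0})
print(shipping_view)  # {'o-1': {'address': 'Dublin'}}
```

The design choice being illustrated is that no domain reads another domain's store directly; each consumer builds the smaller model it needs from the published events.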

Looking at the microservice patterns, we see that some of them are data-driven patterns. If we analyze them in more detail, we see that they all use, or are linked to, stream processing.

The idea for the image came from Microservice Architecture: Aligning Principles, Practices, and Culture, but the image itself is not in the book. Created by me, inspired by https://medium.com/@madhukaudantha/microservice-architecture-and-design-patterns-for-microservices-e0e5013fd58a

The idea here is that we can apply streaming to all of the data-related design patterns, and that tools like Apache Spark, Apache Flink, and Apache Kafka are the most widely used for this today. There is also an ecosystem of several other technologies around them.

Image created by me
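Data-management patterns such as Event Sourcing and CQRS (both listed on microservices.io) show why these patterns fit stream processing so naturally. The minimal sketch below is hypothetical: a plain Python list stands in for a Kafka topic, the "account" events are invented for the example, and `publish`, `delta`, and `replay_balance` are illustrative names. The log is append-only, a stream consumer keeps a CQRS read model current, and the state can always be rebuilt by replaying the log.

```python
from collections import defaultdict

# Hypothetical stand-in for a Kafka topic: an append-only list of events.
event_log: list[dict] = []

# CQRS read side: a query-optimised view kept current by a stream consumer.
balances_view: dict[str, int] = defaultdict(int)


def delta(event: dict) -> int:
    """Signed amount an event contributes to an account balance."""
    return event["amount"] if event["type"] == "deposited" else -event["amount"]


def publish(event: dict) -> None:
    # Write side: the log is append-only; past events are never modified.
    event_log.append(event)
    # Stream processing: each new event incrementally updates the read model.
    balances_view[event["account"]] += delta(event)


def replay_balance(account: str) -> int:
    # Event Sourcing: the current state can always be rebuilt from the full log.
    return sum(delta(e) for e in event_log if e["account"] == account)


publish({"type": "deposited", "account": "A", "amount": 100})
publish({"type": "withdrawn", "account": "A", "amount": 30})
publish({"type": "deposited", "account": "B", "amount": 50})

print(dict(balances_view))  # {'A': 70, 'B': 50}
print(replay_balance("A"))  # 70
```

In a real deployment the consumer side would typically be a Kafka Streams, Flink, or Spark job rather than a function call inside `publish`; the sketch only shows the shape of the pattern.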

I created a table with the main streaming options on the market, but it is not a silver bullet; it is just my own, subjective view.

Image created by me

Books I used to develop my idea (keep in mind that there are also many articles, papers, and videos on these ideas):

  • Domain-Driven Design by Eric Evans
  • Microservice Architecture: Aligning Principles, Practices, and Culture by Irakli Nadareishvili, Ronnie Mitra, Matt McLarty & Mike Amundsen
  • Kubernetes Patterns: Reusable Elements for Designing Cloud-Native Applications by Bilgin Ibryam & Roland Huß
  • Designing Data-Intensive Applications by Martin Kleppmann
  • The Fourth Industrial Revolution by Klaus Schwab
  • The Inevitable by Kevin Kelly
Image created by me

Articles I used to develop my idea:

  • Streaming as a Database

https://yokota.blog/2019/09/23/building-a-relational-database-using-kafka/

https://yokota.blog/2020/01/13/building-a-graph-database-using-kafka/

https://www.kai-waehner.de/blog/2020/03/12/can-apache-kafka-replace-database-acid-storage-transactions-sql-nosql-data-lake/

  • Event Driven & Data Mesh

http://jacekmajchrzak.com/event-driven-data-mesh-introduction/

  • Serverless era

https://blogs.oracle.com/cloud-infrastructure/serverless-big-data-pipelines-architecture

  • Martin Kleppmann | Kafka Summit SF 2018 Keynote (Is Kafka a Database?)

https://www.youtube.com/watch?v=v2RJQELoM6Y

  • The new Kubernetes native

https://medium.com/@graemecolman/the-new-kubernetes-native-d19dd4ae75a0

https://developers.redhat.com/blog/2020/05/11/top-10-must-know-kubernetes-design-patterns/

  • Microservices Patterns with GoldenGate

https://www.slideshare.net/jtpollock/microservices-patterns-with-goldengate

  • Webinar: Future Data Integration, Data Mesh and GoldenGate Kafka

https://www.slideshare.net/jtpollock/webinar-future-dataintegrationdatameshandgoldengatekafka

  • Pondering Distributed Data Lakes Idea

https://www.youtube.com/watch?v=mnvxeU3oDyQ

  • What is a service mesh?

https://www.youtube.com/watch?v=QiXK0B9FhO0

Twitter Influencers:

@KaiWaehner
@bibryam
@gschmutz

Conclusion

  • Divide and conquer is the best way to start: it's easier and cheaper, and it will save you time and money
  • Several companies failed when they tried to implement Big Data, Data Lake, and Data Warehouse projects because they tried to build something big and complex
  • Streaming is everywhere
  • We are in the multi-cloud and hybrid-cloud era
  • Serverless architecture is increasingly popular
  • We shouldn’t be concerned with the name, but with the goal
  • I simplified things to get the following ideas across:
    Yes, we are already in the Serverless era;
    Yes, we are already in the Kubernetes-native era;
    Yes, Streaming is a database;
    Yes, Streaming is everywhere;
    Yes, divide and conquer;
    Yes, microservice patterns;
    Yes, Kubernetes patterns are a plus.
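The "Streaming is a database" point rests on the stream-table duality discussed in Martin Kleppmann's Kafka Summit keynote and the Kafka-as-a-database articles listed above: a table is what you get by folding a changelog stream and keeping the latest value per key. A minimal Python sketch under that assumption, where an in-memory list stands in for a Kafka topic and `None` plays the role of a tombstone (Kafka's delete marker); `materialize` and `compact` are illustrative names.

```python
from typing import Optional

# A changelog stream: (key, value) records in offset order.
# value=None plays the role of a Kafka "tombstone" (a delete marker).
Changelog = list[tuple[str, Optional[str]]]


def materialize(changelog: Changelog) -> dict[str, str]:
    """Fold a changelog into a table: the latest value per key wins."""
    table: dict[str, str] = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)  # tombstone removes the row
        else:
            table[key] = value    # upsert
    return table


def compact(changelog: Changelog) -> Changelog:
    """Keep only the last record per key, in the spirit of log compaction."""
    last_offset = {key: i for i, (key, _) in enumerate(changelog)}
    return [rec for i, rec in enumerate(changelog) if last_offset[rec[0]] == i]


log: Changelog = [
    ("user-1", "Dublin"),
    ("user-2", "Lisbon"),
    ("user-1", "Cork"),  # update
    ("user-2", None),    # delete
]
print(materialize(log))  # {'user-1': 'Cork'}
print(compact(log))      # [('user-1', 'Cork'), ('user-2', None)]
```

Compacting the log does not change the table it materializes to, which is why a compacted topic can serve as durable storage. Real Kafka compaction runs in the background on older log segments and eventually drops tombstones too, so treat this as a model of the idea rather than of the broker's behaviour.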

My slide deck:

https://www.slideshare.net/IgorSouza137/data-engineer-patterns-architecture-the-future-deepdive-into-microservices-patterns-with-stream-process

Appendix:

  • Big Data

http://www.igfasouza.com/blog/what-is-big-data/

  • Apache Spark

http://www.igfasouza.com/blog/what-is-apache-spark/

  • Apache Kafka

http://www.igfasouza.com/blog/what-is-kafka/

  • Stream processing

http://www.igfasouza.com/blog/what-is-stream-processing/

  • Data warehouse

https://www.oracle.com/ie/database/what-is-a-data-warehouse/

  • Data lake

http://www.igfasouza.com/blog/what-is-data-lake/

  • Data lakehouse

https://databricks.com/blog/2020/01/30/what-is-a-data-lakehouse.html

  • Industry 4.0

https://www.kai-waehner.de/blog/2020/04/21/apache-kafka-as-data-historian-an-iiot-industry-4-0-real-time-data-lake/

https://www.forbes.com/sites/bernardmarr/2018/09/02/what-is-industry-4-0-heres-a-super-easy-explanation-for-anyone/#166bad289788

  • Data Fabric

https://www.forrester.com/report/Now+Tech+Enterprise+Data+Fabric+Q2+2020/-/E-RES157315#

  • DataMesh

https://martinfowler.com/articles/data-monolith-to-mesh.html

  • Microservice

https://microservices.io/


Developer Advocate, Big Data Evangelist. #BigData #DataLake #IoT #AppDev Hadoop & Spark Developer