MirrorMaker 2, recently included in Apache Kafka and introduced in my previous blog, is the official open-source tool for replicating data between two Kafka clusters across datacenters.
To get first-hand experience with the new MirrorMaker, in this article we will walk through an end-to-end deployment on a local Kubernetes cluster.
As a prerequisite, Minikube and a hypervisor (e.g. VirtualBox or VMware Fusion) need to be installed locally before following the steps below.
Note: the scripts below may be used in a Kubernetes cluster, but they do not guarantee a production-quality deployment.
Step 1: start local Kubernetes
minikube start --driver=<driver_name> --kubernetes-version=v1.15.12 --cpus 4 --memory 8192
If you use VirtualBox, <driver_name> will be "virtualbox".
Step 2: clone Kubernetes deployment scripts and spin up Kafka
Clone the repo (https://github.com/ning2008wisc/minikube-mm2-demo) and run the following commands to create the namespace and the two Kafka clusters:
kubectl apply -f 00-namespace
kubectl apply -f 01-zookeeper
kubectl apply -f 02-kafka
kubectl apply -f 03-zookeeper
kubectl apply -f 04-kafka
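The manifests in the repo boil down to a headless Service plus a StatefulSet per cluster. A minimal sketch of that shape is shown below; the resource names match the pod names seen later, but the image and port details are illustrative and the repo's actual manifests differ:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kafka
  namespace: kafka
spec:
  clusterIP: None        # headless: gives each broker a stable DNS name
  selector:
    app: kafka
  ports:
    - port: 9092
      name: broker
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
  namespace: kafka
spec:
  serviceName: kafka
  replicas: 3            # three broker pods: kafka-0, kafka-1, kafka-2
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: kafka    # placeholder; see the repo for the actual image
          ports:
            - containerPort: 9092
```

A StatefulSet (rather than a Deployment) is what gives the brokers the stable, ordinal pod identities (kafka-0, kafka-1, ...) that Kafka needs.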
Then verify that the two Kafka clusters are running, each with 3 broker nodes:
kubectl config set-context --current --namespace=kafka
kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
kafka-0                  1/1     Running   0          2m5s
kafka-1                  1/1     Running   0          86s
kafka-2                  1/1     Running   0          84s
kafka2-0                 1/1     Running   0          119s
kafka2-1                 1/1     Running   0          84s
kafka2-2                 1/1     Running   0          82s
zookeeper-<hash>         1/1     Running   0          2m8s
zookeeper-backup-<hash>  1/1     Running   0          2m2s
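If you want to script this check instead of eyeballing the table, a small awk filter over the `kubectl get pods --no-headers` output does the job. A sketch, run here against the sample output captured above so it is reproducible (in a live cluster you would pipe `kubectl get pods --no-headers` into it instead):

```shell
# Count Running pods whose name matches a prefix pattern (column 1 = NAME, column 3 = STATUS).
count_running() {
  awk -v prefix="$1" '$1 ~ "^" prefix && $3 == "Running" { n++ } END { print n + 0 }'
}

# Sample of the `kubectl get pods --no-headers` output shown above.
pods='kafka-0 1/1 Running 0 2m5s
kafka-1 1/1 Running 0 86s
kafka-2 1/1 Running 0 84s
kafka2-0 1/1 Running 0 119s
kafka2-1 1/1 Running 0 84s
kafka2-2 1/1 Running 0 82s'

echo "$pods" | count_running "kafka-[0-9]"   # prints 3 (source cluster brokers)
echo "$pods" | count_running "kafka2-"       # prints 3 (target cluster brokers)
```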
Step 3: deploy new MirrorMaker
MirrorMaker will be deployed via Helm (v2). Install Helm locally, then initialize it as follows:
helm init --tiller-namespace kafka
kubectl create serviceaccount --namespace kafka tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kafka:tiller
kubectl patch deploy --namespace kafka tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
To minimize the footprint, MirrorMaker is deployed as a distributed, standalone Kubernetes service, rather than by setting up a Kafka Connect cluster and deploying MirrorMaker through the Kafka Connect REST interface.
cd kafka-mm
helm --tiller-namespace kafka install ./ --name kafka-mm
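Under the hood, standalone MM2 is driven by a properties file passed to `connect-mirror-maker.sh`. A minimal sketch of such a config is shown below; the cluster aliases (`primary`, `backup`) and the in-cluster bootstrap addresses are assumptions for this demo, and the chart's actual values may differ:

```
# mm2.properties (illustrative)
clusters = primary, backup
primary.bootstrap.servers = kafka-0.kafka:9092,kafka-1.kafka:9092,kafka-2.kafka:9092
backup.bootstrap.servers = kafka2-0.kafka2:9092,kafka2-1.kafka2:9092,kafka2-2.kafka2:9092

# Enable replication from primary to backup, mirroring all topics.
primary->backup.enabled = true
primary->backup.topics = .*

# Keep internal and mirrored topics small for a local demo.
replication.factor = 1
```

The `primary` alias in this config is what will show up as a topic prefix on the target cluster in Step 4.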
Check the logs of MM2 to make sure it is running properly:
kubectl logs -f kafka-mm-<hash> -c kafka-mm-server
Step 4: test MirrorMaker with Kafka instances
Now, let’s produce something on the source Kafka cluster (kafka-{0,1,2}) and consume from the target cluster (kafka2-{0,1,2}) to verify the data is mirrored in near real time.
Open a new terminal, switch to the kafka namespace, log in to a broker node of the source Kafka cluster, then start the console producer:
kubectl exec -i -t kafka-0 -- /bin/bash
bash-4.4# unset JMX_PORT
bash-4.4# /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Open another terminal, switch to the kafka namespace, log in to a broker node of the target Kafka cluster, then start the console consumer:
kubectl exec -i -t kafka2-0 -- /bin/bash
bash-4.4# unset JMX_PORT
bash-4.4# /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic primary.test
Now type some random characters into the console producer. You should see the same characters appear at the console consumer almost immediately.
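Why does the topic `test` show up as `primary.test` on the target cluster? MM2's default replication policy prefixes each mirrored topic with its source cluster's alias, so consumers can tell remote topics from local ones. The naming rule can be sketched as:

```shell
# MM2's DefaultReplicationPolicy names a remote topic "<source-alias>.<topic>".
source_alias="primary"
topic="test"
remote_topic="${source_alias}.${topic}"
echo "$remote_topic"   # prints: primary.test
```

This prefixing is also what prevents infinite replication loops when mirroring is configured in both directions.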
Step 5: monitor MirrorMaker
To track its performance and health, MirrorMaker exposes many metrics via JMX beans. Here is how to quickly view them in raw form via port forwarding:
kubectl port-forward kafka-mm-<hash> 8081:8081
Open a local web browser and go to http://localhost:8081/ to view the relevant metrics in raw, plain-text format.
Conclusion
In the next few blogs, I plan to cover more interesting topics around the new MirrorMaker, including:
- exactly-once message delivery guarantee across datacenters
- tools to migrate from existing mirroring solutions to MM2
Please stay tuned for more articles!