
Fault Tolerance & Redundant System with Seamless Integration to Development on AWS


Making Sense of Big Data

Inspired by the Netflix Simian Army, this post builds a well-managed, fault-tolerant system using AWS Auto Scaling Groups, with CI/CD through CodeDeploy and CodePipeline and GitHub as the source control.

Photo by Carl Oldenbourg on Unsplash

Every architect dreams of building a sophisticated system for their development environment. A fault-tolerant system provides high availability and redundancy and protects the system from a single point of failure. Nowadays many tools, such as AWS Auto Scaling Groups, do this automatically, and we should take advantage of them. I read this piece from the Netflix Technology Blog and was motivated to write this post. It covers a complete end-to-end system, starting at GitHub and ending in your browser. I have sketched a diagram to give you a clearer picture of what this post builds.

Shrestha, Sulabh. AWS UML Diagram. 2020. JPEG file.

Pitfalls: This blog has separate sections called pitfalls, which show where I stumbled and where you might slip and trip too. While writing this blog I fell into many of these pitfalls and spent many hours getting out of them. Knowing where you can fall is an important part of avoiding it.


Netflix created a tool called Chaos Monkey, "a tool that randomly disables our production instances to make sure we can survive this common type of failure without any customer impact." [1] (The Netflix Simian Army, 2011)

We are going to build a similar tool and run it against our own system. The name Chaos Monkey comes from the unpredictability of a monkey, which causes disruption wherever it goes among the servers. We do the complete opposite and choose the Lethargic Panda, a being so lazy that it forgot to update some servers on the cloud. We will let this Panda loose on our servers and fix the problems it creates.


For this project, I will use several AWS services: EC2 instances, Launch Templates, Elastic IP, and Auto Scaling Groups for the redundant, fault-tolerant system, CodeDeploy and CodePipeline for seamless code integration to the instances, and GitHub as our code repository. Before starting, we need to know what exactly these services are.

  • Amazon Elastic Compute Cloud (Amazon EC2) is a virtual machine service from Amazon that makes cloud computing easier, faster, and scalable, with cost that follows usage. [2] (Amazon EC2, n.d)
  • An Auto Scaling Group is a group of Amazon EC2 instances that scales automatically according to usage parameters. It is mostly used to provide fault tolerance by replacing instances that fail. [3] (Auto Scaling groups, n.d)
  • A Launch Template is a configuration template that is used by Auto Scaling, or on its own, to launch EC2 instances. [4] (Launch configurations, n.d)
  • An Elastic IP is a static public IPv4 address that you allocate to your account and associate with an EC2 instance in place of its dynamically assigned address. In an Auto Scaling Group, the Elastic IP is re-associated with the currently running instance for low-to-no downtime. [5] (Elastic IP addresses, n.d)
  • AWS CodeDeploy is a deployment service that automatically deploys code to compute targets such as Amazon EC2, Lambda, and on-premises servers. [6] (AWS CodeDeploy, n.d)
  • AWS CodePipeline is a Continuous Integration and Continuous Deployment service from Amazon that brings code from different sources into AWS services. CodePipeline can use CodeDeploy as its deploy stage to push code to the targets configured in CodeDeploy. [7] (AWS CodePipeline, n.d)
  • GitHub is a hosting service for your code, built on the open-source version control system Git. [8] (GitHub, n.d)

Elastic IP

Let’s start building this system. We will first allocate an Elastic IP address. Open your AWS console, go to the EC2 service, and scroll down the left-hand menu until you see Elastic IPs. Select it and then click on Allocate Elastic IP address.

Shrestha, Sulabh. Allocating an Elastic IP address 1. 2020. JPEG file.

After clicking on it you will be redirected to the next page.

Shrestha, Sulabh. Allocating an Elastic IP address 2. 2020. JPEG file.

Here, you have to enter the region where you want the system to live. My region is North Virginia, so it’s ‘us-east-1’ for me. Change this to the region you are in or want to use. Click on Allocate.

Shrestha, Sulabh. The elastic IP address is allocated. 2020. JPEG file.

You have now allocated an Elastic IP address. The important parts to note are the Allocated IPv4 Address and the Allocation ID, because we need them to associate the address with the instances in our Auto Scaling Group.

Pitfall 1: It turns out that we cannot simply attach the Elastic IP address to an Auto Scaling Group or a Launch Template through the AWS console. It can only be done through the AWS CLI.

Now, we need a bash script for our instances that associates the address dynamically. You will need to make some small changes, so make sure to copy the code given below into a text editor.

After getting the Allocated IPv4 Address and Allocation ID, copy them into their appropriate places on lines 10 and 11. If you are in a region other than us-east-1, change the region on line 6 (twice), line 10, and line 11. Do not worry about lines 2 to 9, as they install the CodeDeploy agent on the instances.

Pitfall 2: I have tested the above code and it works perfectly to this date. If something stops working later, the only way to fix it is to launch an instance and run the commands one by one. If a command fails, fix it. If not, move on to the next command, and if everything works, then bravo. Also, you will get an error message on line 10 because it tries to disassociate an IP that is not yet associated, so you can ignore that.
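As a rough sketch of the kind of user-data script described above (not the tested script itself, and with different line numbering), the two jobs can be split into small functions: install the CodeDeploy agent, then claim the Elastic IP. The region, the `eipalloc-EXAMPLE` Allocation ID, the function names, and the `RUN_USER_DATA` guard are all assumptions for illustration. Note one deliberate swap: instead of a disassociate call followed by an associate call, this sketch uses the `--allow-reassociation` flag of `aws ec2 associate-address`, which moves the address from whichever instance currently holds it.

```shell
#!/bin/bash
# Sketch of an EC2 user-data script. All values below are hypothetical placeholders.
REGION="us-east-1"                  # assumption: change to your own region
ALLOCATION_ID="eipalloc-EXAMPLE"    # placeholder: the Allocation ID of your Elastic IP

# Build the regional CodeDeploy installer URL (the region appears twice in it).
codedeploy_installer_url() {
  local region="$1"
  printf 'https://aws-codedeploy-%s.s3.%s.amazonaws.com/latest/install\n' "$region" "$region"
}

# Install the CodeDeploy agent on Amazon Linux 2 (user data runs as root).
install_codedeploy_agent() {
  yum -y update
  yum -y install ruby wget
  cd /home/ec2-user || return 1
  wget "$(codedeploy_installer_url "$REGION")"
  chmod +x ./install
  ./install auto
}

# Look up this instance's ID through the instance metadata service (IMDSv2).
current_instance_id() {
  local token
  token="$(curl -s -X PUT 'http://169.254.169.254/latest/api/token' \
    -H 'X-aws-ec2-metadata-token-ttl-seconds: 300')"
  curl -s -H "X-aws-ec2-metadata-token: ${token}" \
    'http://169.254.169.254/latest/meta-data/instance-id'
}

# Move the Elastic IP to this instance, taking it over from any previous holder.
claim_elastic_ip() {
  aws ec2 associate-address \
    --instance-id "$(current_instance_id)" \
    --allocation-id "$ALLOCATION_ID" \
    --allow-reassociation \
    --region "$REGION"
}

# Guard (an assumption of this sketch) so the file can be sourced without side effects.
if [ "${RUN_USER_DATA:-0}" = "1" ]; then
  install_codedeploy_agent
  claim_elastic_ip
fi
```

When pasting something like this into real user data, you would drop the guard or set `RUN_USER_DATA=1` at the top, since nothing sets that variable for you on a fresh instance.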


Launch Templates

So far so good. Now we need to make a template for our Auto Scaling Group to launch instances from. For this system, we will use a t2.micro instance with a minimal setup, just enough to run a web page. In the same EC2 service, select Launch Templates and then click on Create launch template.

You can also use a Launch Configuration, but AWS recommends Launch Templates, so it is better to use the latter.

Shrestha, Sulabh. Creating Launch Template 1. 2020. JPEG file.

On the next page you will be given a form to fill in. Fill it in following the images that I have provided. Some fields are straightforward and need no explanation. For the others, I will explain alongside the images.

Shrestha, Sulabh. Creating Launch Template 2. 2020. JPEG file.

This is a basic name and description with Tags. Fill those up accordingly.

Shrestha, Sulabh. Creating Launch Template 3. 2020. JPEG file.

I am selecting Amazon Linux 2 AMI on t2.micro with my default keypair.

Shrestha, Sulabh. Creating Launch Template 4. JPEG file.

Make a habit of adding tags. As in the figure, add name tags for your instances.

Pitfall 3: If you are not going to use graphics in the template, do not add Elastic Graphics tags, as they will cause an error in the CodeDeploy stage. It is crucial to choose the instance tags according to your requirements.

Shrestha, Sulabh. Creating Launch Template 5. JPEG file.

Now, expand the Advanced details and click on Create a new IAM profile. You will be redirected to the IAM service. Make a policy with this JSON.

The EC2 part of the policy is there so that the instances can associate the Elastic IP from the command line.

Pitfall 4: You can see that the policy includes S3 get and list access, and you might be wondering why. That’s exactly what I wondered. It turns out that the CodeDeploy agent requires S3 get and list actions to work. So where can you see the errors when something goes wrong? If your instance has problems with the CodeDeploy agent, you can always check the logs here:

  • less /var/log/aws/codedeploy-agent/codedeploy-agent.log

From here you can debug any future problems you might face.

After you make the policy, attach it to a new role and save it. Click refresh and you will see the new role appear in the selection. The role name for me is EC2AddressRole.
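For reference, a policy along the lines described above could look like the following sketch, written to a file from the shell. This is an assumption based on the description (EC2 address actions plus S3 get and list), not the exact JSON used for this post. The file name, the `EC2AddressPolicy` name in the comment, and the broad `"Resource": "*"` are illustrative, and you would likely want to narrow the resources in practice.

```shell
#!/bin/bash
# Write a hypothetical instance policy document to the file given as $1.
write_instance_policy() {
  local out_file="$1"
  cat > "$out_file" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:AssociateAddress",
        "ec2:DisassociateAddress",
        "ec2:DescribeAddresses"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:Get*", "s3:List*"],
      "Resource": "*"
    }
  ]
}
EOF
}

# Assumption: write next to the script unless POLICY_FILE is set.
POLICY_FILE="${POLICY_FILE:-./ec2-address-policy.json}"
write_instance_policy "$POLICY_FILE"
echo "Wrote $POLICY_FILE"

# To register it from the CLI instead of the console (not run here):
#   aws iam create-policy --policy-name EC2AddressPolicy \
#     --policy-document "file://$POLICY_FILE"
```

The first statement covers the Elastic IP calls made by the instance, and the second covers the S3 reads that the CodeDeploy agent needs.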

Shrestha, Sulabh. Creating Launch Template 6. 2020. JPEG file.

Remember the script that you copied earlier? Now it’s time to paste that script into the User data field of the template.

Shrestha, Sulabh. Creating Launch Template 7. 2020. JPEG file.

Click on Create template version and View launch templates. Here you have successfully made a Launch template. Now onto the Auto Scaling Groups.


Auto Scaling Groups

Let’s make the Auto Scaling Group. Go to Auto Scaling Groups in the same EC2 service and then click on Create an Auto Scaling group.

Shrestha, Sulabh. Creating Auto Scaling Group 1. 2020. JPEG file.

Now add a name and select the template that we just made.

Shrestha, Sulabh. Creating Auto Scaling Group 2. 2020. JPEG file.

Select the subnets from the given region.

Shrestha, Sulabh. Creating Auto Scaling Group 3. JPEG file.

Pitfall 5: Be very careful while selecting the subnets. In my case, the t2.micro instance type was not available in us-east-1e, which caused the Auto Scaling Group to fail. You can always see the errors in the Activity section of the Auto Scaling Group. Read them carefully, as they are essential for debugging your system and for knowing what is happening overall.
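One way to check this up front is to ask EC2 which Availability Zones offer the instance type before picking subnets. The sketch below wraps `aws ec2 describe-instance-type-offerings` in a helper and adds a small membership check. The function names are made up for this example, and the first function needs AWS CLI credentials to run.

```shell
#!/bin/bash
# List the Availability Zones in a region that offer a given instance type.
# Requires configured AWS CLI credentials.
zones_offering() {
  local instance_type="$1" region="$2"
  aws ec2 describe-instance-type-offerings \
    --location-type availability-zone \
    --filters "Name=instance-type,Values=${instance_type}" \
    --region "$region" \
    --query 'InstanceTypeOfferings[].Location' \
    --output text
}

# Succeed if zone $1 appears in the whitespace-separated list $2.
zone_in_list() {
  local wanted="$1" zone
  for zone in $2; do
    [ "$zone" = "$wanted" ] && return 0
  done
  return 1
}
```

A possible use is `zone_in_list us-east-1e "$(zones_offering t2.micro us-east-1)" || echo "skip the us-east-1e subnet"`.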

Shrestha, Sulabh. Creating Auto Scaling Group 4. JPEG file.

Leave these settings at their defaults.

Shrestha, Sulabh. Creating Auto Scaling Group 5. 2020. JPEG file.

Here, add your desired capacity, i.e., how many instances you want up and running. Always set the minimum capacity equal to or less than the desired capacity, and the maximum equal to or greater than the desired capacity. Under scaling policies, you can add CPU utilization as the metric to auto-scale on, but for now I will leave this at the default.
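The capacity rule above (minimum ≤ desired ≤ maximum) is easy to check before touching the group. The sketch below validates the three numbers and only prints the matching `aws autoscaling update-auto-scaling-group` command rather than running it. The function names and the group name used in the example are hypothetical.

```shell
#!/bin/bash
# Succeed only if min <= desired <= max.
valid_capacity() {
  local min="$1" desired="$2" max="$3"
  [ "$min" -le "$desired" ] && [ "$desired" -le "$max" ]
}

# Print (not run) the CLI call that would apply the sizes to a group.
print_resize_command() {
  local group="$1" min="$2" desired="$3" max="$4"
  if ! valid_capacity "$min" "$desired" "$max"; then
    echo "invalid capacity: need min <= desired <= max" >&2
    return 1
  fi
  printf 'aws autoscaling update-auto-scaling-group --auto-scaling-group-name %s --min-size %s --desired-capacity %s --max-size %s\n' \
    "$group" "$min" "$desired" "$max"
}
```

For example, `print_resize_command my-asg 1 1 2` prints a command for one running instance with room to grow to two, while `print_resize_command my-asg 3 1 2` is rejected.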

Shrestha, Sulabh. Creating Auto Scaling Group 6. 2020. JPEG file.

Add notifications to your email or phone if you want.

Shrestha, Sulabh. Creating Auto Scaling Group 7. 2020. JPEG file.

Pitfall 6: Do not depend on Launch Template tags, as sometimes the tags are not written to the instances. It is safer to define them on the Auto Scaling Group as well.

Add the tags as selected above. Click Next and then Create the auto-scaling group. You will see your instance popping up on your EC2 Dashboard.


CodeDeploy

Now the next step is to set up CodeDeploy and CodePipeline to deploy your code from GitHub to these instances. Before that, you need to create another role, this time for CodeDeploy. Go to IAM and create a role for CodeDeploy. This is straightforward: Go to Roles -> Create role -> Select CodeDeploy -> Select the first one (Allows CodeDeploy to call AWS services such as Auto Scaling on your behalf.) -> Next -> Give a Name -> Review and Done. Now, go to CodeDeploy from the AWS services and click on Create application under Applications.

Shrestha, Sulabh. Setting up Code Deploy 1. 2020. JPEG file.

Write the Name and choose EC2 on the Compute Platform and then hit Create application.

Shrestha, Sulabh. Setting up Code Deploy 2. 2020. JPEG file.

Now, click on Create deployment group.

Shrestha, Sulabh. Setting up Code Deploy 3. 2020. JPEG file.

Add the Name and select the role that you just created from above. The role name for me is CodeDeployUnit4.

Shrestha, Sulabh. Setting up Code Deploy 4. 2020. JPEG file.

Select the Auto Scaling Groups as your setup for Deployment.

Shrestha, Sulabh. Setting up Code Deploy 5. 2020. JPEG file.

Disable Load balancing and Create a deployment group.

Shrestha, Sulabh. Setting up Code Deploy 6. 2020. JPEG file.

Your Deployment Group is now ready. Now let’s add some sample code to your Git repository.


CodePipeline

AWS provides its own sample application that is ready to deploy with CodeDeploy, and you can download it from this bucket.

aws s3 cp s3://aws-codedeploy-us-east-1/samples/latest/SampleApp_Linux.zip . --region us-east-1

Pitfall 7: Do not try to upload your own code right now. As you will see later, the sample comes with its own deployment hooks, which we can dig into in another blog, so for now stick with the default code given by AWS. Also take care to use the right region when downloading from the bucket.

Add this code to your repository. It doesn’t matter whether the repository is private or public. Now, in the Settings of the Developer Tools, select Connections.

Shrestha, Sulabh. Setting up Code Pipeline 1. 2020. JPEG file.

Click on Create connection

Shrestha, Sulabh. Setting up Code Pipeline 2. 2020. JPEG file.

It will redirect you to the next page where you give the connection name. Add any name there.

Shrestha, Sulabh. Setting up Code Pipeline 3. 2020. JPEG file.

That will redirect you to the GitHub page. Sign in, or if you are already signed in, allow access. After that, you will see your connection on the Connections tab. Nice. Now go to CodePipeline and click on Create pipeline.

Shrestha, Sulabh. Setting up Code Pipeline 4. 2020. JPEG file.

Add a name for the pipeline, leave everything else at its default, and click Next.

Shrestha, Sulabh. Setting up Code Pipeline 5. 2020. JPEG file.

Select GitHub (Version 2) and select the connection that you just made in Settings. Also, give the repository name and the branch name.

Shrestha, Sulabh. Setting up Code Pipeline 6. 2020. JPEG file.

Skip the Build stage as we are using vanilla HTML code. Now, select the Application Name and Deployment group that you just made and click Next and Create Pipeline.

Shrestha, Sulabh. Setting up Code Pipeline 7. 2020. JPEG file.

Result

Now, the only thing left to do is test. Edit some lines on the HTML page and push the code. As soon as you push, the pipeline will run and deploy the code to the Auto Scaling Group, and the result will be served through the single Elastic IP.

Pitfall 8: This sample code will run, but you will probably run into deployment problems later when you deploy your own code. For that I have come prepared. CodeDeploy has its own error codes, and you can look them up here: https://docs.aws.amazon.com/codedeploy/latest/userguide/error-codes.html Also check Pitfall 4, as the errors can be found in the agent log as well.

Pitfall 9: Sometimes everything is right from start to finish, yet CodeDeploy still fails. You may laugh, but the solution here is:

sudo service codedeploy-agent restart

Yup, a restart sometimes fixes the error, so try this too while debugging.

Shrestha, Sulabh. Result of Code Pipeline 1. 2020. JPEG file.

Now let’s check the Elastic IP too. I will commit my name into the index.html file.

Shrestha, Sulabh. Result of Code Pipeline 2. 2020. JPEG file.

I added my name below the boilerplate text and it seems to work. Yayyy!


Lethargic Panda

Without further ado, let’s call our Lethargic Panda into action. As prerequisites, you need the AWS CLI installed and an AWS IAM user with EC2 access, configured on the machine that you will be working on.

Run the command and you can see that this panda caused one of your instances to be terminated. But fear not, our Auto Scaling Group has got it covered. Now check your Elastic IP to see if your site is down or not. It’s running as it should be.
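A minimal sketch of what such a panda could look like with the AWS CLI is shown below: list the instances in the group, pick one at random, and terminate it. This is an illustration rather than the exact command used for this post. The group name, the function names, and the `UNLEASH_PANDA` guard are assumptions, and the AWS calls need credentials with EC2 and Auto Scaling access.

```shell
#!/bin/bash
# Hypothetical defaults: override them with your own group name and region.
ASG_NAME="${ASG_NAME:-my-auto-scaling-group}"
REGION="${REGION:-us-east-1}"

# Print the instance IDs that currently belong to the Auto Scaling Group.
list_group_instances() {
  aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$ASG_NAME" \
    --region "$REGION" \
    --query 'AutoScalingGroups[0].Instances[].InstanceId' \
    --output text
}

# Print one of the given arguments, chosen at random. Fails with no arguments.
pick_random() {
  [ "$#" -gt 0 ] || return 1
  local skip=$(( RANDOM % $# ))
  shift "$skip"
  printf '%s\n' "$1"
}

# Terminate one random instance of the group.
unleash_panda() {
  local victim
  # Unquoted on purpose: the ID list is split into separate arguments.
  victim="$(pick_random $(list_group_instances))" || return 1
  echo "Lethargic Panda is terminating ${victim}"
  aws ec2 terminate-instances --instance-ids "$victim" --region "$REGION"
}

# Guard (an assumption of this sketch) so nothing is terminated by accident.
if [ "${UNLEASH_PANDA:-0}" = "1" ]; then
  unleash_panda
fi
```

Running it with `UNLEASH_PANDA=1 ASG_NAME=<your group> bash panda.sh` (the file name is arbitrary) would terminate one instance, after which the Activity section of the Auto Scaling Group should show a replacement being launched.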

Shrestha, Sulabh. The result from Lethargic Panda. 2020. JPEG file.

Conclusion

I really had fun building this system, which delivers code smoothly and is fault-tolerant. This is just the beginning, as we can employ other technologies on top of it: running machine learning code through this and using S3 for saving models, hosting a Node/React app while making use of the build step in CodePipeline, using CodePipeline to push GitHub code to targets other than CodeDeploy, or using a deployment group on AWS Lambda. But let’s talk specifically about the use of this in data science.

Pitfall 10: This is more of a story of what I used to do. While deploying machine learning code to build models, I used to push code to GitHub, then open the instances, pull the latest code, run it, and generate the models. Sometimes, when making small changes, I edited the code directly on the instances, which caused discrepancies between the two local repositories. With a pipeline like this in place, not only websites but also big data science and machine learning projects can move faster. These steps let you focus more on the research side of the model and less on configuration, because data science is all about experimenting and delivering.

Pitfall 11: After generating machine learning models, you can automatically sync them to S3 by adding this command to a cron job on the Linux system: aws s3 sync s3:// This will make your models highly available under a single path. No need to go into the instances to search for and download them.
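As a sketch of what such a cron entry might look like, the helper below prints a crontab line that runs `aws s3 sync` from a local model directory to an S3 prefix. The schedule, the local directory, and the bucket name are made-up placeholders, and the instance role would also need write access to that bucket.

```shell
#!/bin/bash
# Print a crontab entry that syncs a local directory to an S3 location.
#   $1 = cron schedule, $2 = local directory, $3 = S3 URI
model_sync_cron_line() {
  local schedule="$1" local_dir="$2" s3_uri="$3"
  printf '%s aws s3 sync %s %s\n' "$schedule" "$local_dir" "$s3_uri"
}

# Hypothetical example: sync every 15 minutes.
model_sync_cron_line '*/15 * * * *' '/home/ec2-user/models' 's3://my-model-bucket/models'
```

The printed line can be added with `crontab -e` on the instance.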

Still, the choices are endless, and what you make of it is up to you. If you encounter any problems or have difficulty following the steps, comment below on this post or message me at [email protected]. You can also connect with me on LinkedIn and GitHub or subscribe to my newsletter for awesome blogs.

Sulabh Shrestha


References

[1] Netflix Technology Blog, The Netflix Simian Army (2011), Medium.

[2] Amazon EC2, Retrieved from https://aws.amazon.com/ec2 (n.d), Amazon Official Website.

[3] Auto Scaling groups, Retrieved from https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroup.html (n.d), Amazon Official Website.

[4] Launch configurations, Retrieved from https://docs.aws.amazon.com/autoscaling/ec2/userguide/LaunchConfiguration.html (n.d), Amazon Official Website.

[5] Elastic IP addresses, Retrieved from https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html (n.d), Amazon Official Website.

[6] AWS CodeDeploy, Retrieved from https://aws.amazon.com/codedeploy/ (n.d), Amazon Official Website.

[7] AWS CodePipeline, Retrieved from https://aws.amazon.com/codepipeline/ (n.d), Amazon Official Website.

[8] GitHub, Retrieved from https://github.com/ (n.d), GitHub Inc.

