“My data scientist doesn’t know how to properly start an EC2 instance”.

VPCs, subnets, and Internet Gateways are components at the heart of any AWS setup. However, data scientists often assume that someone else will configure them. Let’s introduce the most important concepts and gain a clear understanding of their roles.

Louis Douge
Towards Data Science


Singapore’s CBD rendered with Neural Style Transfer, using this Code and Paper.

In my previous post, we set up a cloud infrastructure for a resilient and data-intensive app, at a high level as shown below. I recommend going through that post if you don’t have a good understanding of AWS regions, Availability Zones (AZs), or High Availability.

Example of a cloud architecture from our previous post

Let’s use this example to explain the underlying Networking and Security concepts. Feel free to go to the AWS console and start the resources (most are free to start) as we present them. After reading this article, you’ll be able to understand what you are actually doing when setting up an EC2 instance. More than that, it will also enable you to configure other AWS services and link them together. Also, parts are reasonably independent so feel free to jump straight to the concepts as the need arises.

I. Isolating your infrastructure

  1. VPC
  2. Subnet

II. Communicating between your instances and networks

  1. Elastic Network Interfaces
  2. Internet Gateway
  3. Route table

III. Security matters

  1. Security groups
  2. NACLs

Let’s get started!

Isolating your infrastructure

First, remember that the Internet is fundamentally a huge network where nodes are made of servers or computers like yours, identified by a public Internet Protocol address or IP (like 104.16.122.127). These nodes communicate between themselves using the Internet Protocol Suite (TCP/IP).

Make sure to familiarize yourself a bit with these concepts before continuing.

When creating an app in the cloud such as the one presented above, we have exclusive access to a network residing inside a larger public network, created by the cloud provider. This is called a Virtual Private Cloud (VPC) and is the first thing to set up. This is the network inside which all our AWS resources will be running.

Inside the huge network that is the Internet, you get your own private network through a cloud provider.

VPC

Initially and by default, a VPC is isolated from any other network. It must reside in one AWS region. Then, you need to select some IP addresses that will be used to identify your instances running inside the VPC. This must be done when you start building your infrastructure. For instance, using the Classless Inter-Domain Routing (CIDR) notation, you could set aside these IP addresses (or CIDR block) for your future use: 176.13.0.0/16. This syntax designates all the IP addresses from 176.13.0.0 to 176.13.255.255.
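You can check what a CIDR block actually covers with Python’s standard ipaddress module. A quick sketch with the block above:

```python
import ipaddress

# The VPC's CIDR block from the example above.
vpc_cidr = ipaddress.ip_network("176.13.0.0/16")

# A /16 leaves 16 bits for host addresses: 2**16 = 65,536 of them.
print(vpc_cidr.num_addresses)      # 65536
print(vpc_cidr.network_address)    # 176.13.0.0
print(vpc_cidr.broadcast_address)  # 176.13.255.255
```

Handy for sanity-checking a block before committing to it in the console.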

To make sure these addresses do not overlap with public IP addresses currently in use on the Internet (like the one of medium.com, for instance), you should choose them from the RFC1918 ranges (10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16). Your block should also not overlap with an existing VPC’s if you intend to connect the two! Choosing your IP address range is important as you won’t be able to modify the primary block once your VPC is created. So think about it carefully!
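You can verify this programmatically too. A small sketch (the helper name is mine): note that, strictly speaking, our toy block 176.13.0.0/16 sits outside the RFC1918 ranges and is kept here for illustration only.

```python
import ipaddress

def is_rfc1918(cidr: str) -> bool:
    """True if every address in the block falls inside an RFC1918 range."""
    net = ipaddress.ip_network(cidr)
    rfc1918 = [ipaddress.ip_network(r)
               for r in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
    return any(net.subnet_of(r) for r in rfc1918)

print(is_rfc1918("10.0.0.0/16"))    # True: safe to use for a VPC
print(is_rfc1918("176.13.0.0/16"))  # False: our toy block is actually public space
```

In practice you would pick something like 10.0.0.0/16 for a real VPC.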

Note that we’ve used IPv4 addresses in our examples; similar concepts exist for the newer (1998…) IPv6 addresses.

Subnet

It is not yet time to start our EC2 instances in the VPC we’ve just created: we need another layer. Each instance must be associated with a subnet, which is simply a logical subnetwork inside a VPC. It is defined using a CIDR block, which has to be a subset of the VPC’s CIDR block. For instance, we could define a subnet in the previously created VPC using this CIDR block: 176.13.0.0/24. It is indeed a subset of 176.13.0.0/16.
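The containment check is easy to express with ipaddress as well:

```python
import ipaddress

vpc = ipaddress.ip_network("176.13.0.0/16")
subnet = ipaddress.ip_network("176.13.0.0/24")

# A valid subnet CIDR must be contained in the VPC's CIDR block.
print(subnet.subnet_of(vpc))  # True
print(subnet.num_addresses)   # 256 addresses in a /24
```

(AWS itself reserves a handful of addresses in each subnet, so the number of usable hosts is slightly lower than the raw count.)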

Why do we need such subnetworks? There are three main reasons a cloud architect keeps in mind when building an app:

  • Isolating instances from each other: one public subnet could be used for webservers communicating directly with the Internet, while a private subnet could host instances (databases, for example) that communicate only with those webservers.
  • Controlling traffic flows between instances: it is easier to monitor your network and prevent attacks when you have a segmented architecture. It can also boost performance by allowing traffic to only reach some subnets.
  • Keeping things organized by assigning specific functions for each subnet: one subnet could be dedicated to storage for instance with services like RDS.

Last but not least, you should remember that each subnet can only span one Availability Zone (AZ). As we’ve seen in the previous post, that means resources inside a single subnet are vulnerable to a local failure (as an AZ corresponds to a physical data center). To make your application Highly Available (HA), you then need to replicate subnets across Availability Zones but still inside the same VPC.

So let’s see below what it could look like in the app we previously designed.

Basic setup with a single VPC covering one IPv4 block and one IPv6 block, with two subnets spanning two AZs.

We now have a proper setup for our network. So how do we make our instances communicate with each other?

Communication between your instances and networks

Elastic Network Interfaces

An Elastic Network Interface (ENI) is what enables communication between the instances of a network. Every instance must therefore have at least one: it is the link between your instance and a subnet. IP addresses are actually bound to ENIs, not instances!

Most of the time, we omit these ENIs because each one links a single instance to a single subnet. So our previous architecture should actually look like this (viewing only one AZ for simplicity):

IPs are bound to ENIs, which in turn enable communication with instances.

ENIs are nonetheless useful to redirect traffic when an instance fails. Consider the case below with two instances:

In the case of instance failure, the same IP address can be used while a different instance is used to do the processing job.

If instance 1 fails, its ENI can be attached to instance 2 as a secondary ENI. From the outside, the traffic still reaches the same IP address, but it is actually processed by another instance under the hood.

So now we can communicate inside our subnets, how about communicating with the Internet?

Internet Gateway

For this, we use an Internet Gateway (IG) as a middleman, attached to the VPC. It gives an instance:

  • a public IP address, different from the private IP address identifying our instance inside the subnet (so instances can have two IPs — public and private!),
  • a connection to the Internet to query web clients, for instance,
  • the ability to receive requests from the Internet.

An IG is associated with a single VPC. You recognize it thanks to its ID on AWS starting with igw-xxx.

Now, note that we’ve never mentioned how exactly traffic should be flowing between instances, between subnets, between instances and the Internet, etc. That’s where the route table intervenes. It will specify routes from your instances to other instances and to the outside world.

Route Tables

The first thing you should keep in mind about route tables is that they are linked to one or more subnets. Route tables describe how traffic from the instances of the subnet(s) should be redirected. By route, we mean the following two things:

  • A destination expressed as an IP address in CIDR notation
  • A target, which is an AWS network resource such as an IG, an ENI, local, or a NAT device (read on to find out what that is ;)

At first, it seemed confusing to me that a route specifies a destination but no origin. That’s because the origin is implicitly the whole subnet associated with the route table.

Equally confusing: what is the difference between a destination and a target? The destination is the final IP of where you want your packet (data) to end up, while the target is where the packet should go next in order to get closer to this final destination.

There are in particular two routes that are important and that you will see almost all the time:

  • The local route allows instances from one subnet to communicate with instances in the other subnets of your VPC.
  • The default route pointing to the Internet Gateway, giving Internet access to the subnet associated with the route table.

In our app, we could typically configure the following table for our first subnet:

----- Example of a route table for a webserver -----
+-------------------------+-----------------------+
| Destination             | Target                |
+-------------------------+-----------------------+
| 176.13.0.0/16           | local                 |
| 2001:db8:1234:1a00::/56 | local                 |
| 0.0.0.0/0               | igw-0e533011g0frrd318 |
| ::/0                    | igw-0e533011g0frrd318 |
+-------------------------+-----------------------+

The local routes let traffic destined for any address inside the VPC’s own CIDR blocks flow between your subnets. The last two lines give our first subnet an Internet connection by routing all remaining addresses (“0.0.0.0/0” in IPv4 and “::/0” in IPv6 match everything) through the IG. The subnet becomes a public subnet thanks to these two lines. Without these connections to the Internet Gateway, it would be a private subnet.
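Conceptually, the route table picks the most specific route whose destination contains the packet’s destination IP (longest-prefix match). Here is a toy sketch of that logic in Python, using hypothetical IPv4 routes in the spirit of the table above (the function name and the local route for the VPC’s block are my own illustration):

```python
import ipaddress

# Hypothetical IPv4 routes: the VPC's own block stays local,
# everything else goes out through the Internet Gateway.
routes = {
    "176.13.0.0/16": "local",
    "0.0.0.0/0": "igw-0e533011g0frrd318",
}

def next_hop(dest_ip: str) -> str:
    """Longest-prefix match: the most specific matching route wins."""
    ip = ipaddress.ip_address(dest_ip)
    matching = [ipaddress.ip_network(cidr) for cidr in routes
                if ip in ipaddress.ip_network(cidr)]
    best = max(matching, key=lambda net: net.prefixlen)
    return routes[str(best)]

print(next_hop("176.13.0.42"))     # "local": stays inside the VPC
print(next_hop("104.16.122.127"))  # the IGW: heads out to the Internet
```

This is why a packet addressed inside the VPC never leaks out through the IG even though 0.0.0.0/0 technically matches it: the /16 local route is more specific.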

Let’s update our app’s cloud architecture with these new elements:

You can see that the upper subnet is public while the one containing the databases is private. Note that the private subnet can still indirectly access the Internet through the public subnet.

What if now you want to control the traffic going to your instances in a detailed manner? Like which protocol is allowed to reach your instances, on what port, etc.?

Security matters

Fine-grained control of what is reaching your infrastructure is achieved mainly through the use of Security groups at the instance level, and Network Access Control Lists (NACL) at the subnet level.

Security groups

A Security group is basically a firewall controlling traffic to and from an instance that has been linked to this Security group. It does this by specifying inbound and outbound rules, made of the following:

  • the source for inbound rules (respectively destination for outbound rules) of the packet expressed as a CIDR block or another security group’s ID
  • the Protocol used to transfer packets to and from the instance (such as TCP)
  • Port range, which specifies through which ports packets are transiting

Let’s take the example of an EC2 instance that you want to use as a web server. In this case, you would allow every IP address as a source, since you want any user to access your app, right? The protocol used to establish a connection between an Internet user and the instance would be TCP, on port 443, typically used for HTTPS data transfers. Now, you may also want to take control of your instance from a remote terminal to do some maintenance work. You will do this over SSH, which uses TCP port 22, and from a specific source (197.52.101.10, for instance). With this in mind, the inbound rules of your security group look like this:

---- Example of Security group Inbound rules ----
+------------------+----------+------------+
| Source           | Protocol | Port range |
+------------------+----------+------------+
| 0.0.0.0/0        | TCP      | 443        |
| ::/0             | TCP      | 443        |
| 197.52.101.10/32 | TCP      | 22         |
+------------------+----------+------------+

Note that by default, all inbound traffic is denied. You need to explicitly whitelist the sources that are going to communicate with your instance.
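The evaluation logic is simple: a packet is allowed if any rule matches it, otherwise it is dropped. A minimal sketch of that allow-list behaviour (rule set and function name are my own, mirroring the inbound table above):

```python
import ipaddress

# Inbound rules: (source CIDR, protocol, port).
inbound_rules = [
    ("0.0.0.0/0", "TCP", 443),        # HTTPS from anywhere
    ("197.52.101.10/32", "TCP", 22),  # SSH from one admin host only
]

def allows(src_ip: str, protocol: str, port: int) -> bool:
    """Security groups are allow-lists: no matching rule means deny."""
    ip = ipaddress.ip_address(src_ip)
    return any(ip in ipaddress.ip_network(cidr)
               and protocol == proto and port == p
               for cidr, proto, p in inbound_rules)

print(allows("8.8.8.8", "TCP", 443))       # True: HTTPS is open to all
print(allows("8.8.8.8", "TCP", 22))        # False: SSH only from the admin IP
print(allows("197.52.101.10", "TCP", 22))  # True
```

Note there is no explicit “deny” anywhere: anything not matched is simply dropped.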

On the other hand, all addresses are allowed by default when it comes to outbound communications. It means that your default outbound rules look like this:

---- Default Security group Outbound rules ----
+-------------+----------+------------+
| Destination | Protocol | Port range |
+-------------+----------+------------+
| 0.0.0.0/0   | All      | All        |
| ::/0        | All      | All        |
+-------------+----------+------------+

Finally, there is an important concept to remember about Security groups: they are stateful. This means that if some traffic is allowed in one direction, the corresponding reply traffic is automatically allowed in the opposite direction. This matters when communicating with the Internet: our instance may send a request to a web server (like a GET request) and expect an answer. Our Security group will let the answer back in.

While Security groups control traffic at the instance level, another service is used to more conveniently control traffic between subnets: NACL.

NACLs

Network Access Control Lists are comparable to Security groups in the sense that they control inbound and outbound traffic using rules. However, one NACL is attached to a subnet and controls traffic that may enter or exit the entire subnet. For traffic between instances of a subnet, Security groups are preferred.

Similarly to a Security group, NACL implements rules characterized by the following elements:

  • Rule number: an integer used to determine the order in which rules are applied. Lower rule numbers are applied first.
  • Protocol: same field as for Security groups
  • Port range: same field as for Security groups
  • Source for inbound rules (or destination for outbound rules): same field as for Security groups
  • Action: either ALLOW or DENY communications with the source or destination

When creating a subnet, a default NACL is attached to it and can be modified further. Its default inbound rules look like this:

---------------- Default NACL Inbound rules -----------------
+-------------+----------+------------+-----------+--------+
| Rule number | Protocol | Port range | Source    | Action |
+-------------+----------+------------+-----------+--------+
| 100         | All      | All        | 0.0.0.0/0 | ALLOW  |
| *           | All      | All        | 0.0.0.0/0 | DENY   |
+-------------+----------+------------+-----------+--------+

Beware that all inbound traffic is allowed by default, which was not the case with Security groups! Note also the last row with an asterisk as a rule number: it is a default rule that cannot be modified or removed. It denies any traffic not explicitly allowed by the previous rules.
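The rule-number ordering is what makes NACLs more expressive than Security groups: you can put a DENY in front of an ALLOW. A toy sketch (the deny rule for 203.0.113.0/24, a documentation range, is a hypothetical “blocked attacker” I added for illustration):

```python
import ipaddress

# NACL rules as (number, protocol, port range, source CIDR, action).
# Rules are evaluated in ascending rule-number order; first match wins.
nacl_rules = [
    (90, "All", "All", "203.0.113.0/24", "DENY"),   # block a known bad range
    (100, "All", "All", "0.0.0.0/0", "ALLOW"),      # then allow everyone else
]

def nacl_decision(src_ip: str) -> str:
    """First matching rule (lowest number) wins; default is DENY."""
    ip = ipaddress.ip_address(src_ip)
    for _num, _proto, _ports, cidr, action in sorted(nacl_rules):
        if ip in ipaddress.ip_network(cidr):
            return action
    return "DENY"  # the immutable '*' catch-all rule

print(nacl_decision("203.0.113.5"))    # "DENY": rule 90 fires before rule 100
print(nacl_decision("104.16.122.127")) # "ALLOW" via rule 100
```

Swap the two rule numbers and the attacker gets in: ordering is everything with NACLs.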

There is one characteristic of NACLs that can lead to bugs and is therefore worth remembering: NACLs are stateless. They won’t automatically allow return traffic: you need to add an outbound rule matching the inbound rule that allowed the communication to happen in the first place. Or you could use ephemeral ports, but that’s another story!

NACLs cover entire subnets while Security Groups operate at the instance level (note that DynamoDB does not have a Security Group, it uses IAM roles instead…)

Conclusion

We went through the fundamental concepts needed to properly start AWS services. We saw how to arrange instances into subnets depending on their utilization. We also enabled communication between instances and communication with the Internet as well (IG, Route Table). Finally, we protected our infrastructure by specifying which traffic should be allowed to reach our instances (Security groups, NACL).

Next steps

Bear in mind that this post only scratched the surface of what it takes to understand cloud architectures. As a next step, it would be interesting for you to play around with the AWS console, creating your own resources (using free-tier services). Then, there are numerous subtleties that you can learn about by reading the documentation or following the new services regularly launched by AWS:

  • For instance, you can still assign IP addresses to a VPC using a secondary CIDR block after creating the VPC (as announced here). This is to allow scaling on-the-go when you need more resources.
  • It is better to restrict outbound traffic using Security groups: as services get updated, ports may change, and stateless NACL rules do not adapt to that dynamically.
  • You can have exclusive access and use of a range of public IP addresses using Elastic IP addresses.
  • Route outbound traffic from a private subnet through a NAT device so that your sensitive instances (databases, etc.) can reach the Internet without being reachable from it.
  • and much more…

As always thanks for reading, and please do not hesitate to provide feedback or comments for me to also improve my understanding of the subject.
