Many people know that Amazon Web Services is one of the big players in the cloud computing business, and its Infrastructure-as-a-Service offering EC2 in particular is becoming increasingly popular. Few people know that EC2 is probably one of the biggest Xen installations deployed. But how many know how EC2 actually works and how the underlying architecture is constructed? I was curious and needed that kind of insight for my Master’s thesis, which deals with EC2 from a security perspective. The following notes were gathered out of plain curiosity and for academic purposes only. The notes are not complete and a lot of guessing is involved, so I might be wrong (please leave a comment if you think so).
As I said, Amazon EC2 is probably one of the biggest Xen installations deployed. Amazon is said to use a heavily modified and adapted version of Xen, but unfortunately I was not able to gather information about exact version numbers. Dom0, the Xen management domain, can be based on either Linux, NetBSD, or OpenSolaris. Based on the information I have gathered about the storage setup, I am very certain that it is Linux-based. I do not know the version number of the kernel used. Amazon seems to be fond of RedHat Linux, so the Dom0 might be RedHat-based.
Amazon EC2 uses two different kinds of storage. One is local storage, known as Instance Storage, which is non-persistent: its data is lost when an instance terminates. The other kind is persistent, network-based storage called Elastic Block Store (EBS), which can be attached to running instances or used as a persistent boot medium. I have excluded Amazon S3 here. The information in this section was gathered from XenStore, which holds configuration information about all domains. A domain can read its own configuration information from XenStore using
xenstore-ls, which is part of the xen utils.
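To illustrate how the values quoted in this post can be extracted, here is a small Python sketch that parses xenstore-ls style output. The sample tree below is reconstructed from the entries mentioned in these notes, not captured verbatim from EC2, and the exact tree layout there may differ:

```python
import re

# Reconstructed sample of `xenstore-ls` output for one virtual block
# device (VBD); the tree layout is an assumption based on stock Xen.
SAMPLE = """\
backend = ""
 vbd = ""
  51713 = ""
   node = "/dev/loop13"
   params = "/mnt/instance_image_store_3/262768"
"""

def parse_entries(text):
    """Extract key = "value" pairs from xenstore-ls style output."""
    return dict(re.findall(r'(\w+) = "([^"]*)"', text))

entries = parse_entries(SAMPLE)
print(entries["node"])    # the loopback device backing the root disk
print(entries["params"])  # the image file behind the loopback device
```

On a real instance one would feed the actual output of xenstore-ls into parse_entries instead of the canned sample.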
Instance Storage appears as three partitions to the VM: sda1 for root, sda2 for extra storage space (/mnt), and sda3 for swap. Typically, the backends of these virtual block devices are loopback devices and/or LVM logical volumes. Logical volumes are considered to offer better performance and reliability than loopback devices. I was therefore surprised that sda1 uses a loopback backend, noted as
node = "/dev/loop13" in the XenStore VBD entry. The actual file used by the loopback device is
params = "/mnt/instance_image_store_3/262768". The suffix “_3” of the directory is probably based on the local domain id. The numerical filename of the image is not the same as the AMI ID, which is surprising. I also do not know whether
/mnt/instance_image_store_3 is actually a locally stored directory or whether the images are mounted via e.g. NFS. The latter would make sense, because then they only need to create a copy of the AMI image on the image storage server, maybe even using copy-on-write, and do not have to transfer the image over the network to the node.
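If the directory suffix really encodes the local domain id, both it and the image number can be pulled out of the backend path mechanically. A small sketch; the meaning of the two numbers is only a guess based on the naming scheme observed above:

```python
import re

def split_image_path(params):
    """Split a backend path like /mnt/instance_image_store_3/262768 into
    (directory suffix, image id). The interpretation of both parts is a
    guess derived from the observed naming scheme, not documented fact."""
    m = re.match(r"/mnt/instance_image_store_(\d+)/(\d+)$", params)
    if not m:
        return None
    return int(m.group(1)), int(m.group(2))

print(split_image_path("/mnt/instance_image_store_3/262768"))  # (3, 262768)
```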
The swap device is an LVM logical volume, denoted as
params = "/dev/VolGroupDomU/instance_swap_store_3" in XenStore. The extra storage space /mnt is also using a logical volume backend:
params = "/dev/mapper/cow-VolGroupDomU-instance_ephemeral_store_3". An interesting aspect is that they seem to use copy-on-write (cow) for this volume, but I don’t know why.
The characteristics of the Elastic Block Store (EBS) lead to the conclusion that it is probably a SAN-based setup. I had guessed that they export iSCSI volumes to the nodes, and was surprised to find that they use something different: Global Network Block Device (GNBD). The backend device of an EBS volume is listed as something like
params = "/dev/gnbd89". Of course I don’t know how the block device exporter is designed or what kind of storage system they use on that side. GNBD is part of RedHat’s cluster suite, which could also lead to the conclusion that Amazon uses that suite.
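Putting the observed backends together, the storage type behind a virtual block device can be guessed from its params value alone. A rough heuristic, based solely on the device paths quoted in this post:

```python
def backend_type(params):
    """Guess the storage backend from a XenStore `params` value,
    based only on the device paths observed on EC2 in this post."""
    if params.startswith("/dev/gnbd"):
        return "gnbd"   # network block device, i.e. an EBS volume
    if params.startswith("/dev/mapper/") or params.startswith("/dev/VolGroup"):
        return "lvm"    # logical volume (swap, ephemeral /mnt)
    if params.startswith("/"):
        return "file"   # plain file served via a loopback device (root)
    return "unknown"

for p in ("/mnt/instance_image_store_3/262768",
          "/dev/VolGroupDomU/instance_swap_store_3",
          "/dev/mapper/cow-VolGroupDomU-instance_ephemeral_store_3",
          "/dev/gnbd89"):
    print(p, "->", backend_type(p))
```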
The networking setup of EC2 is quite unorthodox, and I haven’t figured it out completely yet. Amazon uses a routed Xen network setup with DHCP providing private IP addresses to the VMs. A traceroute shows the private IP address of the router in Dom0, as well as the external IP address of Dom0. Furthermore, the network setup script is named
script = "/etc/xen/scripts/ec2-vif-route-dhcpd" in XenStore. A VM has only one interface with a private IP address, so we have to assume that EC2 uses NAT to translate the external IP address to the internal one.
At layer 2 they also seem to do address translation, because the source MAC address of all incoming and outgoing packets is
FE:FF:FF:FF:FF:FF, the well-known Xen dummy MAC. They also prevent IP spoofing and ARP poisoning, which suggests that they filter each virtual interface in Dom0 based on the L2/L3 addresses of that particular VM. The Security Groups of EC2 are probably implemented in a similar way. I would not be surprised if Amazon uses ebtables and iptables in their Dom0.
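A per-VM filtering setup of the kind speculated about here could look roughly like the rules generated below. The interface name, MAC, and IP address are made up, and there is no evidence that Amazon’s actual rules resemble these; this is only a sketch of the general ebtables/iptables anti-spoofing technique:

```python
def antispoof_rules(vif, mac, ip):
    """Build hypothetical ebtables/iptables commands that drop traffic
    from a virtual interface unless it carries that VM's own L2/L3
    addresses. A sketch of the technique, not Amazon's actual setup."""
    return [
        # Drop Ethernet frames whose source MAC is not the VM's MAC.
        f"ebtables -A FORWARD -i {vif} -s ! {mac} -j DROP",
        # Drop ARP packets claiming a source IP other than the VM's
        # (ARP poisoning prevention).
        f"ebtables -A FORWARD -i {vif} -p ARP --arp-ip-src ! {ip} -j DROP",
        # Drop IP packets with a spoofed source address, matched on the
        # bridge port the VM is attached to.
        f"iptables -A FORWARD -m physdev --physdev-in {vif} ! -s {ip} -j DROP",
    ]

# All three values below are invented for illustration.
for rule in antispoof_rules("vif13.0", "aa:00:00:11:22:33", "10.252.1.42"):
    print(rule)
```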
Via XenStore one can also determine the name of a VM, which looks like
domain = "dom_32504936" in Amazon EC2. Based on a limited sample of domain names, the numeric suffix seems to be incremental. Assuming the domain name is unique throughout the entire lifetime of EC2, this would mean that Amazon has started about 32.5 million VMs in EC2 so far. The difference between the domain name suffixes of two instances started 24 hours apart was around 82,000, which could mean that around 82,000 VMs were started in that period (again assuming the suffix is actually incremental). It would be interesting to monitor the domain name suffixes and thereby track the utilization of EC2 over time, e.g. how many VMs are started in specific time frames.
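Under the assumption that the suffix is a global, monotonically increasing counter, the launch rate falls out of simple arithmetic. The second suffix below is invented to reproduce the ~82,000 difference observed over 24 hours:

```python
def launch_rate(suffix_a, suffix_b, hours):
    """Estimate VM launches per hour from two domain-name suffixes
    observed `hours` apart, assuming the suffix is incremental."""
    return (suffix_b - suffix_a) / hours

# First value is the suffix quoted in this post; the second is
# hypothetical, chosen to match the ~82,000/day difference.
a, b = 32504936, 32504936 + 82000
print(launch_rate(a, b, 24))  # roughly 3400 launches per hour
```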
So far I am satisfied with the information about the storage setup, but I will definitely need to gain a better understanding of the networking. I will keep this post updated as I gain new insights. If anyone has more information about any of the components, please leave a comment.