Introduction to containers and Docker
In recent times, containers have become a common lingua franca in the technology world, and it's difficult to imagine a world where, just a mere few years ago, only a small portion of the technology community had even heard about containers.
To trace the origins of containers, you need to rewind way back to 1979, when Unix V7 introduced the chroot system call. The chroot system call provided the ability to change the root directory of a running process to a different location in the file system, and was the first mechanism available to provide some form of process isolation. chroot was added to the Berkeley Software Distribution (BSD) in 1982 (this is an ancestor of the modern macOS operating system), and not much more happened in terms of containerization and isolation for a number of years, until a feature called FreeBSD Jails was released in 2000, which provided separate environments called "jails" that could each be assigned their own IP address and communicate independently on the network.
Later, in 2004, Solaris launched the first public beta of Solaris Containers (which eventually became known as Solaris Zones), which provided system resource separation by creating zones. This was a technology I remember using back in 2007 to help overcome a lack of expensive physical Sun SPARC infrastructure and run multiple versions of an application on a single SPARC server.
In the mid 2000s, a lot more progress in the march toward containers occurred, with Open Virtuozzo (Open VZ) being released in 2005, which patched the Linux kernel to provide operating system level virtualization and isolation. In 2006, Google launched a feature called process containers (which was eventually renamed to control groups or cgroups) that provided the ability to restrict CPU, memory, network, and disk usage for a set of processes. In 2008, a feature called Linux namespaces, which provided the ability to isolate different types of resources from each other, was combined with cgroups to create Linux Containers (LXC), forming the initial foundation to modern containers as we know them today.
In 2010, as cloud computing was starting to gain popularity, a number of Platform-as-a-Service (PaaS) start-ups appeared, which provided fully managed runtime environments for specific application frameworks such as Java Tomcat or Ruby on Rails. One start-up called dotCloud was quite different, in that it was the first "polyglot" PaaS provider, meaning that you could run virtually any application environment you wanted using their service. The technology underpinning this was Linux Containers, and dotCloud added a number or proprietary features to provide a fully managed container platform for their customers. By 2013, the PaaS market had well and truly entered the Gartner hype cycle (https://en.wikipedia.org/wiki/Hype_cycle) trough of disillusionment, and dotCloud was on the brink of financial collapse. One of the co-founders of the company, Solomon Hykes, pitched an idea to the board to open source their container management technology, sensing that there was huge potential. The board disagreed, however Solomon and his technical team proceeded regardless, and the rest, as they say, is history.
After announcing Docker as a new open source container management platform to the world in 2013, Docker quickly rose in prominence, becoming the darling of the open source world and vendor community alike, and is likely one of the fastest growing technologies in history. By the end of 2014, during which time Docker 1.0 was released, over 100 million Docker containers had been downloaded – fast forward to March 2018, and that number sat at 37billion downloads. At the end of 2017, container usage amongst Fortune 100 companies sat at 71%, indicating that Docker and containers have become universally accepted for both start-ups and enterprises alike. Today, if you are building modern, distributed applications based upon microservice architectures, chances are that your technology stack will be underpinned by Docker and containers.
Why containers are revolutionary
The brief and successful history of containers speaks for itself, which leads to the question, why are containers so popular? The following provides some of the more important answers to this question:
- Lightweight: Containers are often compared to virtual machines, and in this context, containers are much more lightweight that virtual machines. A container can start up an isolated and secure runtime environment for your application in seconds, compared with the handful of minutes a typical virtual machine takes to start. Container images are also much smaller than their virtual machine counterparts.
- Speed: Containers are fast – they can be downloaded and started within seconds, and within a few minutes you can test, build, and publish your Docker image for immediate download. This allows organizations to innovate faster, which is critical in today's ever increasing competitive landscape.
- Portable: Docker makes it easier than ever to run your applications on your local machine, in your data center, and in the public cloud. Because Docker packages are complete runtime environments for your application complete with operating system dependencies and third-party packages, your container hosts don't required any special prior setup or configuration specific to each individual application – all of these specific dependencies and requirements are self-contained within the Docker image, making comments like "But it worked on my machine!" relics of the past.
- Security: There has been a lot of debate about the security of containers, but in my opinion, if implemented correctly, containers actually offer greater security than non-container alternative approaches. The main reason for this is that containers express security context very well – applying security controls at the container level typically represents the right level of context for those controls. A lot of these security controls are provided by "default" – for example, namespaces are inherently a security mechanism in that they provide isolation. A more explicit example is that they can apply SELinux or AppArmor profiles on a per container basis, making it very easy to define different profiles depending on specific security requirements of each container.
- Automation: Organizations are adopting software delivery practices such as continuous delivery, where automation is a fundamental requirement. Docker natively supports automation – at its core, a Dockerfile is an automation specification of sorts that allows the Docker client to automatically build your containers, and other Docker tools such as Docker Compose allow you express connected multi-container environments that you can automatically create and tear down in seconds.
Docker architecture
As discussed in the preface of this book, I assume that you have at least a basic working knowledge of Docker. If you are new to Docker, then I recommend that you supplement your learning in this chapter by reading the Docker overview at https://docs.docker.com/engine/docker-overview/, and running through some of the Docker tutorials at https://docs.docker.com/get-started/.
The Docker architecture includes several core components, as follows:
- Docker Engine: This provides several server code components for running your container workloads, including an API server for communications with Docker clients, and the Docker daemon that provides the core runtime of Docker. The daemon is responsible for the complete life cycle of your containers and other resources, and also ships with built-in clustering support to allow you to build clusters or swarms of your Docker Engines.
- Docker client: This provides a client for building Docker images, running Docker containers, and managing other resources such as Docker volumes and Docker networks. The Docker client is the primary tool you will work with when using Docker, and interacts with both the Docker Engine and Docker registry components.
- Docker registry: This is responsible for storing and distributing Docker images for your application. Docker supports both public and private registries, and the ability to package and distribute your applications via a Docker registry is one of the major reasons for Docker's success. In this book, you will download third-party images from Docker Hub, and you will store your own application images in the private AWS registry service called Elastic Container Registry (ECR).
- Docker Swarm: A swarm is a collection of Docker Engines that form a self-managing and self-healing cluster, allowing you to horizontally scale your container workloads and provide resiliency in the event of Docker Engine failures. A Docker Swarm cluster includes a number of master nodes that form the cluster control plane, and a number of worker nodes that actually run your container workloads.
When you work with the preceding components, you interact with a number of different types of objects in the Docker architecture:
- Images: An image is built using a Dockerfile, which includes a number of instructions on how to build the runtime environment for your containers. The result of executing each of these build instructions is stored as a set of layers and is distributed as a downloadable and installable image, and the Docker Engine reads the instructions in each layer to construct a runtime environment for all containers based on a given image.
- Containers: A container is the runtime manifestation of a Docker image. Under the hood, a container is comprised of a collection of Linux namespaces, control groups, and storage that collectively create an isolated runtime environment form which you can run a given application process.
- Volumes: By default, the underlying storage mechanism for containers is based upon the union file system, which allows a virtual file system to be constructed from the various layers in a Docker image. This approach is very efficient in that you can share layers and build up multiple containers from these shared layers, however this does have a performance penalty and does not support persistence. Docker volumes provide access to a dedicated pluggable storage medium, which your containers can use for IO intensive applications and to persist data.
- Networks: By default, Docker containers each operate in their own network namespace, which provides isolation between containers. However, they must still provide network connectivity to other containers and the outside world. Docker supports a variety of network plugins that support connectivity between containers, which can even extend across Docker Swarm clusters.
- Services: A service provides an abstraction that allows you to scale your applications by spinning up multiple containers or replicas of your service that can be load balanced across multiple Docker Engines in a Docker Swarm cluster.
Running Docker in AWS
Along with Docker, the other major technology platform we will target in this book is AWS.
AWS is the world's leading public cloud provider, and as such offers a variety of ways to run your Docker applications:
- Elastic Container Service (ECS): In 2014, AWS launched ECS, which was the first dedicated public cloud offering that supported Docker. ECS provides a hybrid managed service of sorts, where ECS is responsible for orchestrating and deploying your container applications (such as the control plane of a container management platform), and you are responsible for providing the Docker Engines (referred to as ECS container instances) that your containers will actually run on. ECS is free to use (you only pay for the ECS container instances that run your containers), and removes much of the complexity of managing container orchestration and ensuring your applications are always up and running. However, this does require you to manage the EC2 infrastructure that runs your ECS container instances. ECS is considered Amazon's flagship Docker service and as such will be the primary service that we will focus on in this book.
- Fargate: Fargate was launched in late 2017 and provides a fully managed container platform that manages both the ECS control plane and ECS container instances for you. With Fargate, your container applications are deployed onto shared ECS container instance infrastructures that you have no visibility of which AWS manages, allowing you to focus on building, testing, and deploying your container applications without having to worry about any underlying infrastructure. Fargate is a fairly new service that, at the time of writing this book, has limited regional availability, and has some constraints that mean it is not suitable for all use cases. We will cover the Fargate service in Chapter 14, Fargate and ECS Service Discovery.
- Elastic Kubernetes Service (EKS): EKS launched in June 2018 and supports the popular open source Kubernetes container management platform. EKS is similar to ECS in that it is a hybrid managed service where Amazon provides fully managed Kubernetes master nodes (the Kubernetes control plane), and you provide Kubernetes worker nodes in the form of EC2 autoscaling groups that run your container workloads. Unlike ECS, EKS is not free and at the time of writing this book costs 0.20c USD per hour, plus any EC2 infrastructure costs associated with your worker nodes. Given the ever growing popularity of Kubernetes as a cloud/infrastructure agnostic container management platform, along with its open source community, EKS is sure to become very popular, and we will provide an introduction to Kubernetes and EKS in Chapter 17, Elastic Kubernetes Service.
- Elastic Beanstalk (EBS): Elastic Beanstalk is a popular Platform as a Service (PaaS) offering provided by AWS that provides a complete and fully managed environment that targets different types of popular programming languages and application frameworks such as Java, Python, Ruby, and Node.js. Elastic Beanstalk also supports Docker applications, allowing you to support a wide variety of applications written in the programming language of your choice. You will learn how to deploy a multi-container Docker application in Chapter 15, Elastic Beanstalk.
- Docker Swarm in AWS: Docker Swarm is the native container management and clustering platform built into Docker that leverages the native Docker and Docker Compose tool chain to manage and deploy your container applications. At the time of writing this book, AWS does not provide a managed offering for Docker Swarm, however Docker provides a CloudFormation template (CloudFormation is a free Infrastructure as Code automation and management service provided by AWS) that allows you to quickly deploy a Docker Swarm cluster in AWS that integrates with native AWS offerings include the Elastic Load Balancing (ELB) and Elastic Block Store (EBS) services. We will cover all of this and more in the chapter Docker Swarm in AWS.
- CodeBuild: AWS CodeBuild is a fully managed build service that supports continuous delivery use cases by providing a container-based build agent that you can use to test, build, and deploy your applications without having to manage any of the infrastructure traditionally associated with continuous delivery systems. CodeBuild uses Docker as its container platform for spinning up build agents on demand, and you will be introduced to CodeBuild along with other continuous delivery tools such as CodePipeline in the chapter Continuously Delivering ECS Applications.
- Batch: AWS Batch provides a fully managed service based upon ECS that allows you to run container-based batch workloads without needing to worry about managing or maintaining any supporting infrastructure. We will not be covering AWS Batch in this book, however you can learn more about this service at https://aws.amazon.com/batch/.
With such a wide variety of options to run your Docker applications on AWS, it is important to be able to choose the right solution based upon the requirements of your organization or specific use cases.
If you are a small to medium sized organization that wants to get up and running quickly with Docker on AWS, and you don't want to manage any supporting infrastructure, then Fargate or Elastic Beanstalk are options that you may prefer. Fargate supports native integration with key AWS services, and is a building block component that doesn't dictate how your build, deploy, or operate your applications. At the time of writing this book, Fargate is not available in all regions, is comparatively expensive when compared to other solutions, and has some limitations such as not being able to support persistent storage. Elastic Beanstalk provides a comprehensive end-to-end solution for managing your Docker applications, providing a variety of integrations out of the box, and includes operational tooling to manage the complete life cycle of your applications. Elastic Beanstalk does require you to buy into a very opinionated framework and methodology of how to build, deploy, and run your applications, and can be difficult to customize to meet your needs.
If you are a larger organization that has specific requirements around security and compliance, or just wants greater flexibility and control over the infrastructure that runs your container workloads, then you should consider ECS, EKS, and Docker Swarm. ECS is the native flagship container management platform of choice for AWS, and as such has a large customer base that has been running containers at scale for a number of years. As you will learn in this book, ECS is integrated with CloudFormation, which allows you to define all of your clusters, application services, and container definitions using an Infrastructure as Code approach that can be combined with other AWS resources to provide you with the ability to deploy complete, complex environments with the click of a button. That said, the main criticism of ECS is that it is a proprietary solution specific to AWS, meaning that you can't use it in other cloud environments or run it on your own infrastructure. Increasingly larger organizations are looking to infrastructure and cloud agnostic cloud management platforms, and this is where you should consider EKS or Docker Swarm if these are your goals. Kubernetes has taken the container orchestration world by storm, and is now one of the largest and most popular open source projects. AWS now offers a managed Kubernetes service in the form of EKS, which makes it very easy to get Kubernetes up and running in AWS, and leverage core integrations with CloudFormation, and the Elastic Load Balancing (ELB) and Elastic Block Store (EBS) services. Docker Swarm is a competitor to Kubernetes, and although it seems to have lost the battle for container orchestration supremacy to Kubernetes, it does have the advantage of being a native out-of-the-box feature integrated with Docker which is very easy to get up and running using familiar Docker tools. Docker does currently publish CloudFormation templates and support key integrations with AWS services that makes it very easy to get up and running in AWS. However, there are concerns around the longevity of such solutions given that Docker Inc. is a commercial entity and the ever growing popularity and dominance of Kubernetes may force Docker Inc. to focus solely on its paid Docker Enterprise Edition and other commercial offerings in the future.
As you can see, there are many considerations when it comes to choosing a solution that is right for you, and the great thing about this book is that you will learn how to use each of these approaches to deploy and run your Docker applications in AWS. Regardless of which solution you think might sounds more suited to you right now, I encourage you to read through and complete all of the chapters in this book, as much of the content you will learn for one specific solution can be applied to the other solutions, and you will be in a much better position to tailor and build a comprehensive container management solution based upon your desired outcomes.