Native Docker Clustering with Swarm

By: Fabrizio Soppelsa, Chanwit Kaewkasi

Overview of this book

Docker Swarm serves as one of the crucial components of the Docker ecosystem and offers a native solution for you to orchestrate containers. It’s turning out to be one of the preferred choices for Docker clustering thanks to its recent improvements. This book covers Swarm, Swarm Mode, and SwarmKit. It gives you a guided tour of how Swarm works and how to work with it. It describes how to set up local test installations and then moves on to huge distributed infrastructures. You will be shown how Swarm works internally, what’s new in SwarmKit, how to automate big Swarm deployments, and how to configure and operate a Swarm cluster on the public and private cloud. This book will teach you how to meet the challenge of deploying massive production-ready applications and a huge number of containers on Swarm. You'll also cover advanced topics that include volumes, scheduling, a Libnetwork deep dive, security, and platform scalability.

Similar projects


Docker Swarm is not the only tool out there for clustering containers. For completeness, we will briefly review the most widely known open source alternatives before diving completely into Swarm.

Kubernetes

Kubernetes (http://kubernetes.io), also known as k8s, has the same goal as Docker Swarm: it's a manager for clusters of containers. It grew out of Google's experience with its internal Borg system, was open sourced, and reached a stable release in 2015, supporting Google Cloud Platform, CoreOS, Azure, and vSphere.

Kubernetes so far runs containers in Docker, which is commanded via API by the so-called Kubelet, a service that registers and manages Pods. Architecturally, Kubernetes logically divides its clusters not into bare containers but into Pods. A Pod is the smallest deployable unit and represents an application made of a group of one or more containers, usually colocated, that share resources such as storage and networking (users can simulate Pods in Docker using Compose and, starting from Docker 1.12, create Docker DABs (Distributed Application Bundles)).

Kubernetes includes the basic clustering features you would expect, such as labels, health checkers, a Pods registry, and configurable schedulers, as well as services such as ambassadors or load balancers.

In practice, the Kubernetes user uses the kubectl client to interface with the Kubernetes master, the cluster's controlling unit, which commands the Kubernetes nodes that do the actual work, called minions. Minions run Pods, and everything is glued together by Etcd.

On a Kubernetes node, you will find a running Docker Engine, which runs a kube-api container, and a system service called kubelet.service.

There are a number of kubectl commands that are pretty intuitive, such as:

  • kubectl cluster-info, kubectl get pods, and kubectl get nodes to retrieve information about the cluster and its health

  • kubectl create -f cassandra.yaml and any derivative Pod commands, to create, manage, and destroy Pods

  • kubectl scale rc cassandra --replicas=2 to scale Pods and applications

  • kubectl label pods cassandra env=prod to configure Pod labels

This is just a high-level overview of Kubernetes. The main differences between Kubernetes and Docker Swarm are:

  • Swarm has an architecture that is more straightforward to understand. Kubernetes requires more focus just to grasp its fundamentals. But studying is always good!

  • Again on architecture: Kubernetes is based on Pods, Swarm on containers and DABs.

  • You need to install Kubernetes. Whether you deploy on GCE, use CoreOS, or run on top of OpenStack, you must take care of it yourself: you must deploy and configure a Kubernetes cluster, which takes some extra effort. Swarm is integrated into Docker and requires no extra installation.

  • Kubernetes has the additional concept of Replication Controllers, a technology that ensures that all the Pods described by some templates are running at a given time.

  • Both Kubernetes and Swarm rely on Etcd technology. But while in Kubernetes Etcd is treated as an external facility service, in Swarm Mode the store (built on Etcd's Raft library) is integrated and runs on manager nodes.
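
The Replication Controller idea from the list above can be pictured as a reconciliation loop: compare the desired number of Pods with the running ones and correct the difference. The following is only a minimal Python sketch under that assumption. The class, its fields, and the Pod naming scheme are hypothetical and are not part of the Kubernetes API.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ReplicationController:
    """Toy model: keeps `replicas` Pods built from one template running."""
    template: str   # hypothetical Pod template name, e.g. "cassandra"
    replicas: int   # desired number of running Pods
    running: list = field(default_factory=list)
    _ids: count = field(default_factory=count, repr=False)

    def reconcile(self):
        """Start or stop Pods until the running set matches the desired count."""
        while len(self.running) < self.replicas:
            # Too few Pods: start a new one from the template.
            self.running.append(f"{self.template}-{next(self._ids)}")
        while len(self.running) > self.replicas:
            # Too many Pods: stop the most recently started one.
            self.running.pop()
        return list(self.running)


rc = ReplicationController(template="cassandra", replicas=2)
rc.reconcile()                    # two Pods are started
rc.running.remove("cassandra-0")  # simulate a Pod crash
after_crash = rc.reconcile()      # a replacement Pod is started
```

Changing `rc.replicas` and calling `reconcile()` again is the toy equivalent of the kubectl scale command shown earlier.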

A performance comparison between Kubernetes and Swarm can easily turn into a holy war, and we prefer to stay out of this practice. There are benchmarks showing how fast Swarm is in starting containers and other benchmarks showing how fast Kubernetes is in running its workloads. We are of the opinion that benchmark results must always be taken cum grano salis. That said, both Kubernetes and Swarm are suitable for running big, fast, and scalable container clusters.

CoreOS Fleet

Fleet (https://github.com/coreos/fleet) is another possible choice among container orchestrators. It comes from the family of CoreOS container products (which includes CoreOS, Rocket, and Flannel) and is fundamentally different from Swarm, Kubernetes, and Mesos in that it's architected as an extension to systemd. Fleet operates through schedulers to distribute resources and tasks across the cluster nodes. Hence, its goal is not only to provide pure container clustering but rather to be a more general distributed processing system. It's possible, for example, to run Kubernetes on top of Fleet.

A Fleet cluster is made of engines, which are responsible for scheduling jobs and other management operations, and agents, which run on each host, physically execute the jobs they're assigned, and continuously report the status to the engines. Etcd is the discovery service that keeps everything glued together.
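
The split between engines and agents described above can be illustrated with a small Python sketch. It is not Fleet code: the class names, the host and job names, and in particular the least-loaded placement policy are assumptions chosen only to make the example concrete.

```python
class Agent:
    """Runs on one host, executes the jobs it is assigned, reports their status."""

    def __init__(self, host):
        self.host = host
        self.jobs = {}  # job name -> state

    def run(self, job):
        self.jobs[job] = "running"

    def report(self):
        return {job: (self.host, state) for job, state in self.jobs.items()}


class Engine:
    """Schedules jobs onto agents and collects the status they report."""

    def __init__(self, agents):
        self.agents = agents

    def schedule(self, job):
        # Assumed policy: pick the agent with the fewest jobs (first one on ties).
        target = min(self.agents, key=lambda agent: len(agent.jobs))
        target.run(job)
        return target.host

    def status(self):
        merged = {}
        for agent in self.agents:
            merged.update(agent.report())
        return merged


engine = Engine([Agent("core-1"), Agent("core-2")])
placements = [engine.schedule(job)
              for job in ("web.service", "db.service", "cache.service")]
```

In a real cluster the engine and the agents would not share memory like this; they would coordinate through the discovery service.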

You interact with a Fleet cluster through its main command, fleetctl, which has options to list, start, and stop containers and services.

So, to summarize, Fleet differs from Docker Swarm in the following ways:

  • It's a higher-level abstraction that distributes tasks; it's not a mere container orchestrator.

  • Think of Fleet as more of a distributed init system for your cluster. Systemd is for one host, Fleet for a cluster of hosts.

  • Fleet specifically clusters a bunch of CoreOS nodes.

  • You can run Kubernetes on top of Fleet to exploit Fleet's resiliency and high availability features.

  • There are no known stable and robust ways to integrate Fleet and Swarm v1 automatically.

  • Currently, Fleet is not tested to run clusters with more than 100 nodes and 1000 containers (https://github.com/coreos/fleet/blob/master/Documentation/fleet-scaling.md) while we were able to run Swarms with 2300 and later 4500 nodes.

Apache Mesos

While you can see Fleet as a distributed init system for your cluster, you can think of Mesos (https://mesos.apache.org/) in terms of a distributed kernel. With Mesos, you can make all your nodes' resources available as if they were one and, for the scope of this book, run container clusters on them.

Mesos, originally started at the University of California, Berkeley in 2009, is a mature project and has been used successfully in production, for example by Twitter.

It's even more general purpose than Fleet, being multi-platform (you can run it on Linux, OS X, or Windows nodes) and capable of running heterogeneous jobs. You can typically have clusters of containers running on Mesos right alongside pure Big Data jobs (Hadoop or Spark) and others, including continuous integration, real-time processing, web applications, data storage, and more.

A Mesos cluster is made of one master, slaves, and frameworks. As you would expect, the master allocates resources and tasks on the slaves, is responsible for system communications, and runs a discovery service (ZooKeeper). But what are frameworks? Frameworks are applications. A framework is made of a scheduler and an executor: the first distributes tasks and the second executes them.
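
To make the roles of master, scheduler, and executor more tangible, here is a deliberately simplified Python sketch. It does not use the Mesos API: all class names, the CPU-only resource model, and the slave and task names are assumptions for illustration. The master offers each slave's free CPUs to the framework's scheduler, the scheduler decides which pending tasks fit, and the executor runs them.

```python
class Executor:
    """Framework half that actually runs a task on a slave."""

    def execute(self, task, slave):
        return f"{task} finished on {slave}"


class Scheduler:
    """Framework half that decides which tasks to launch on offered resources."""

    def __init__(self, tasks):
        self.pending = list(tasks)  # list of (task name, CPUs needed)

    def on_offer(self, slave, cpus):
        """Accept every pending task that still fits in the offered CPUs."""
        launched = []
        for task in list(self.pending):
            name, need = task
            if need <= cpus:
                cpus -= need
                self.pending.remove(task)
                launched.append(name)
        return launched


class Master:
    """Tracks free resources on the slaves and offers them to the framework."""

    def __init__(self, slaves):
        self.slaves = slaves  # slave name -> free CPUs

    def run(self, scheduler, executor):
        results = []
        for slave, cpus in self.slaves.items():
            for name in scheduler.on_offer(slave, cpus):
                results.append(executor.execute(name, slave))
        return results


master = Master({"slave-1": 2, "slave-2": 4})
scheduler = Scheduler([("web", 1), ("spark-job", 3), ("ci-build", 1)])
results = master.run(scheduler, Executor())
```

Here the "spark-job" task does not fit on "slave-1", so the scheduler leaves it pending until the larger offer from "slave-2" arrives. Writing your own framework essentially means supplying your own versions of these two halves.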

For our purposes, containers are typically run on Mesos through a framework named Marathon (https://mesosphere.github.io/marathon/docs/native-docker.html).

A comparison between Mesos and Docker Swarm does not make sense here, since they may very well run complementarily: Docker Swarm v1 can run on Mesos, and a portion of the Swarm source code is dedicated to just this. Swarm Mode and SwarmKit, instead, are very similar to Mesos, since they abstract jobs into tasks and group them into services to distribute loads on the cluster. We'll discuss SwarmKit features in more detail in Chapter 3, Meeting Docker Swarm Mode.

Kubernetes versus Fleet versus Mesos

Kubernetes, Fleet, and Mesos try to address a similar problem: they provide a layer of abstraction over your resources and let you interface with a cluster manager. You can then launch jobs and tasks, and the project of your choice will sort them out. The differences lie in the features provided out of the box and in how precisely you can customize the allocation and scaling of resources and jobs. Of the three, Kubernetes is the more automatic and Mesos the more customizable and so, from a certain point of view, the more powerful (if you need all that power, of course).

Kubernetes and Fleet abstract away and provide defaults for many details that need to be configured on Mesos, for example a scheduler. On Mesos, you can use the Marathon or Chronos scheduler or even write your own. If you don't need to, don't want to, or simply can't dig deep into those technicalities, you can pick Kubernetes or Fleet. It depends on your actual and/or forecasted workload.

Swarm versus all

So, what solution should you adopt? As always, you have a problem, and open source is generous enough to make available many technologies, often overlapping with each other, to help you successfully reach a goal. The difficulty is how and what to choose to solve your problem. Kubernetes, Fleet, and Mesos are all powerful and interesting projects, and so is Docker Swarm.

In a hypothetical ranking of how automatic and simple to understand these four projects are, Swarm is the winner. This is not always an advantage, but in this book we'll show how Docker Swarm can help you make real things work, bearing in mind that in one of the DockerCon keynotes Solomon Hykes, CTO and Founder of Docker, suggested that Swarm would be a tier that could provide a common interface onto the many orchestration and scheduling frameworks.