Deploying Stateful Applications at Scale

We'll use go-demo-3 application throughout this book. It consists of a backend API written in Go that uses MongoDB to store its state. With time, we'll improve the definition of the application. Once we're happy with the way the application is running inside the cluster, we'll work on continuous deployment processes that will fully automate everything from a commit to the vfarcic/go-demo-3 GitHub repository, all the way until it's running in production.

We need to start somewhere, and our first iteration of the go-demo-3 application is in the sts/go-demo-3-deploy.yml directory.

 1  cat sts/go-demo-3-deploy.yml

Assuming that you are already familiar with Namespaces, Ingress, PersistentVolumeClaims, Deployments, and Services, the definition is fairly straightforward. We defined a Namespace go-demo-3 in which all the other resources will reside. Ingress will forward external requests with the base path /demo to the Service api. The PersistentVolumeClaim does not have a storageClassName, so it will claim a PersistentVolume defined through the default StorageClass.

There are two Deployments, one for the API and the other for the database. Both have a Service associated, and both will create three replicas (Pods).

Now that we have a very high-level overview of the go-demo-3 definition, we can proceed and create the resources.

 1  kubectl apply \
 2      -f sts/go-demo-3-deploy.yml \
 3      --record
 4  
 5  kubectl -n go-demo-3 \
 6      rollout status deployment api

We created the resources defined in sts/go-demo-3-deploy.yml and retrieved the rollout status of the api Deployment. The API is designed to fail if it cannot access its databases, so the fact that it rolled out correctly seems to give us a reasonable guarantee that everything is working as expected. Still, since I am skeptical by nature, we'll double-check that all the Pods are running. We should see six of them (three for each Deployment) in the go-demo-3 Namespace.

 1  kubectl -n go-demo-3 get pods

The output is as follows.

NAME    READY STATUS           RESTARTS AGE
api-... 1/1   Running          2        55s
api-... 1/1   Running          2        55s
api-... 1/1   Running          2        55s
db-...  1/1   Running          0        55s
db-...  0/1   CrashLoopBackOff 2        55s
db-...  0/1   CrashLoopBackOff 1        55s

A disaster befell us. Only one of the three db Pods is running. We did expect a few restarts of the api Pods. They tried to connect to MongoDB and were failing until at least one db Pod started running. The failure of the two db Pods is a bit harder to explain. They do not depend on other Pods and Services so they should run without restarts.

Let's take a look at the db logs. They might give us a clue what went wrong.

A note to GKE users
GKE will not be able to mount a volume to more than one Pod. Two of the db- Pods with have ContainerCreating status. If you describe one of those Pods, you'll see Multi-Attach error for volume "pvc-..." Volume is already exclusively attached to one node and can't be attached to another. That's a typical behavior since the default PersistentVolumeClass used in GKE creates volumes that can be attached only to one Pod at a time. We'll fix that soon. Until then, remember that you will not be able to see the logs of those Pods.

We need to know the names of the Pods we want to peek into, so we'll use a bit of "creative" formatting of the kubectl get pods output.

 1  DB_1=$(kubectl -n go-demo-3 get pods \
 2      -l app=db \
 3      -o jsonpath="{.items[0].metadata.name}")
 4
 5  DB_2=$(kubectl -n go-demo-3 get pods \
 6      -l app=db \
 7      -o jsonpath="{.items[1].metadata.name}")

The only difference between the two commands is in jsonpath. The first result (index 0) is stored in DB_1, and the second (index 1) in DB_2. Since we know that only one of the three Pods is running, peeking into the logs of two will guarantee that we'll look into at least one of those with errors.

 1  kubectl -n go-demo-3 logs $DB_1

The last lines of the output of the first db Pod are as follows.

...
2018-03-29T20:51:53.390+0000 I NETWORK  [thread1] waiting for connections on port 27017
2018-03-29T20:51:53.681+0000 I NETWORK  [thread1] connection accepted from 100.96.2.7:46888 #1 (1 
connection now open)
2018-03-29T20:51:55.984+0000 I NETWORK  [thread1] connection accepted from 100.96.2.8:49418 #2 (2 connections now open)
2018-03-29T20:51:59.182+0000 I NETWORK  [thread1] connection accepted from 100.96.3.6:43940 #3 (3 connections now open)

Everything seems OK. We can see that the database initialized and started waiting for connections. Soon after, the three replicas of the api Deployment connected to MongoDB running inside this Pod. Now that we know that the first Pod is the one that is running, we should look at the logs of the second. That must be one of those with errors.

1  kubectl -n go-demo-3 logs $DB_2

The output, limited to the last few lines, is as follows.

...
2018-03-29T20:54:57.362+0000 I STORAGE  [initandlisten] exception in initAndListen: 98 Unable to lock file: /data/db/mongod.lock 
Resource temporarily unavailable. Is a mongod instance already running?, terminating
2018-03-29T20:54:57.362+0000 I NETWORK  [initandlisten] shutdown: going to close listening sockets...
2018-03-29T20:54:57.362+0000 I NETWORK  [initandlisten] shutdown: going to flush diaglog...
2018-03-29T20:54:57.362+0000 I CONTROL  [initandlisten] now exiting
2018-03-29T20:54:57.362+0000 I CONTROL  [initandlisten] shutting down with code:100

There's the symptom of the problem. MongoDB could not lock the /data/db/mongod.lock file, and it shut itself down. Let's take a look at the PersistentVolumes.

 1  kubectl get pv

The output is as follows.

NAME    CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM           
STORAGECLASS REASON AGE
pvc-... 2Gi      RWO          Delete         Bound  go-demo-3/mongo 
gp2                3m

There is only one bound PersistentVolume. That is to be expected. Even if we'd want to, we could not tell a Deployment to create a volume for each replica. The Deployment mounted a volume associated with the claim which, in turn, created a PersistentVolume. All the replicas tried to mount the same volume.

MongoDB is designed in a way that each instance requires exclusive access to a directory where it stores its state. We tried to mount the same volume to all the replicas, and only one of them got the lock. All the others failed.

If you're using kops with AWS, the default StorageClass is using the kubernetes.io/aws-ebs provisioner. Since EBS can be mounted only by a single entity, our claim has the access mode set to ReadWriteOnce. To make things more complicated, EBS cannot span multiple availability-zones, and we are hoping to spread our MongoDB replicas so that they can survive even failure of a whole zone. The same is true for GKE which also uses block storage by default.

Having a ReadWriteOnce PersistentVolumeClaims and EBS not being able to span multiple availability zones is not a problem for our use-case. The real issue is that each MongoDB instance needs a separate volume or at least a different directory. Neither of the solutions can be (easily) solved with Deployments.

Figure 1-1: Pods created through the deployment share the same PersistentVolume (AWS variation)

Now we have a good use-case that might show some of the benefits of StatefulSet controllers.

Before we move on, we'll delete the go-demo-3 Namespace and all the resources running inside it.

 1  kubectl delete ns go-demo-3

The DevOps 2.4 Toolkit

By : Viktor Farcic

The DevOps 2.4 Toolkit

By: Viktor Farcic

Overview of this book

Using Deployments to run Stateful applications at scale

The DevOps 2.4 Toolkit

By : Viktor Farcic

The DevOps 2.4 Toolkit

By: Viktor Farcic

Overview of this book

Confirmation

Buy this book with your credits?

Submit Your Feedback

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access