Mastering Kubernetes - Second Edition

By: Gigi Sayfan

Overview of this book

Kubernetes is an open source system that is used to automate the deployment, scaling, and management of containerized applications. If you are running many containers or want to automate their management, you need Kubernetes at your disposal. Mastering Kubernetes walks you through the advanced management of Kubernetes clusters. To start with, you will learn the fundamentals of both Kubernetes architecture and Kubernetes design in detail. You will discover how to run complex stateful microservices on Kubernetes, including advanced features such as horizontal pod autoscaling, rolling updates, resource quotas, and persistent storage backends. Using real-world use cases, you will explore the options for network configuration, and understand how to set up, operate, and troubleshoot various Kubernetes networking plugins. In addition to this, you will get to grips with custom resource development and utilization in automation and maintenance workflows. To scale up your knowledge of Kubernetes, you will encounter some additional concepts based on the Kubernetes 1.10 release, such as Prometheus, role-based access control, API aggregation, and more. By the end of this book, you’ll know everything you need to graduate from an intermediate to an advanced level of understanding of Kubernetes.
Table of Contents (16 chapters)

Hardware failure

Hardware failures in Kubernetes can be divided into two groups:

  • The node is unresponsive
  • The node is responsive

When the node is unresponsive, it can be difficult to determine whether the cause is a networking issue, a configuration issue, or an actual hardware failure. You can't inspect logs or run diagnostics on the node itself. What can you do? First, consider whether the node was ever responsive. If the node was just added to the cluster, a configuration issue is more likely. If the node was previously a working part of the cluster, you can look at its historical data in Heapster or central logging and check for errors in the logs or degradation in performance that may indicate failing hardware.
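The triage reasoning above can be sketched as a small decision helper. This is a minimal illustration, not part of the book's code: the function names `ready_status` and `triage` are hypothetical, the input is assumed to be a node object shaped like the JSON returned by `kubectl get node <name> -o json`, and the `was_ever_ready` flag is assumed to come from whatever historical monitoring data you keep.

```python
from typing import Optional


def ready_status(node: dict) -> Optional[str]:
    """Return the status of the node's Ready condition.

    The value is "True", "False", or "Unknown"; None means the node
    object carries no Ready condition at all.
    """
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status")
    return None


def triage(node: dict, was_ever_ready: bool) -> str:
    """Suggest a first line of investigation (hypothetical helper).

    was_ever_ready is an assumption: it stands for what your historical
    monitoring data says about this node, and is not a Kubernetes field.
    """
    if ready_status(node) == "True":
        return "responsive: run diagnostics on the node itself"
    if not was_ever_ready:
        return "unresponsive, never healthy: suspect configuration"
    return "unresponsive, previously healthy: review historical metrics and logs"


if __name__ == "__main__":
    # A node whose kubelet has stopped reporting shows Ready=Unknown.
    sample = {"status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}}
    print(triage(sample, was_ever_ready=True))
```

The helper only narrows down where to look first; distinguishing a networking issue from failing hardware still requires inspecting the historical data itself.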

When the node is responsive, it may still suffer from the failure of redundant hardware, such as non...