Learning Nagios 3.0


Introduction to Nagios


According to Wikipedia (http://en.wikipedia.org/wiki/System_Monitoring), Nagios is a tool for system monitoring. This means that it constantly checks the status of machines and of the various services running on those machines. The main purpose of system monitoring is to detect and report any system that is not working properly as soon as possible, so that you are aware of the problem before users run into it.

Nagios does not perform any host or service checks on its own. It uses plugins to perform the actual checks. This makes it a very modular and flexible solution for performing machine and service checks.

Objects monitored by Nagios are split into two categories: hosts and services. Hosts are physical machines (servers, routers, workstations, printers, and so on), while services are particular functionalities; for example, a web server (an httpd process on a machine) can be defined as a service to be monitored. Each service is associated with the host it runs on. In addition, hosts and services can be grouped into host groups and service groups, respectively. We will look into the details of each of these types of objects in the next section.
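The relationships described above can be sketched as a small data model. This is only an illustration in Python, not the Nagios configuration format; the class names, the host name web01, and the address 192.0.2.10 are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Host:
    """A monitored machine (server, router, printer, ...)."""
    name: str
    address: str


@dataclass(frozen=True)
class Service:
    """A functionality that runs on exactly one host."""
    host: Host
    description: str


@dataclass
class HostGroup:
    name: str
    members: List[Host] = field(default_factory=list)


@dataclass
class ServiceGroup:
    name: str
    members: List[Service] = field(default_factory=list)


# Hypothetical example objects: one web server and its HTTP service.
web01 = Host(name="web01", address="192.0.2.10")
httpd = Service(host=web01, description="HTTP")

web_servers = HostGroup(name="web-servers", members=[web01])
web_services = ServiceGroup(name="web-services", members=[httpd])

print(f"{httpd.description} runs on {httpd.host.name}")
```

The point of the sketch is that a service always carries a reference to its host, while groups are simply named collections of either kind of object.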

Nagios has two major strengths when it comes to monitoring. First of all, instead of exposing raw monitored values, it uses only four states to describe status: OK, WARNING, CRITICAL, and UNKNOWN. Offering only abstract states allows administrators to stop watching individual values and simply decide what the warning and critical limits are. Having a strict limit to watch out for is much better, because you always catch a problem regardless of whether it takes 15 minutes or a week to move from the warning limit to the critical limit. For example, system administrators tend to ignore things such as a slow decline in free storage space, often until a critical process runs out of disk space. With Nagios, if you are monitoring a numeric value such as the amount of free disk space or CPU usage, you can define thresholds for the values that are considered correct, a warning, or a failure.
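The idea of reducing a numeric reading to one of four states can be sketched in a few lines of Python. This is a minimal illustration, not code from Nagios itself; the function name classify and the limits of 80 and 90 percent are assumptions chosen for the example, and it assumes that higher values are worse.

```python
def classify(value, warning, critical):
    """Map a numeric reading to a state name; higher values are worse.

    A reading of None stands for "the value could not be obtained".
    """
    if value is None:
        return "UNKNOWN"
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"


# Hypothetical limits: warn at 80% disk usage, go critical at 90%.
WARNING_LIMIT = 80.0
CRITICAL_LIMIT = 90.0

for used_percent in (42.0, 85.0, 97.0, None):
    state = classify(used_percent, WARNING_LIMIT, CRITICAL_LIMIT)
    print(used_percent, "->", state)
```

Once the limits are set, nobody has to watch the raw percentage; a slow climb from 42 to 85 percent is reported as a WARNING whether it takes minutes or weeks.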

Another benefit is that reports state how many services are up and running, how many are in a warning state, and how many are in a critical state. Such a report offers a good overview of your infrastructure status. Nagios also offers similar reports for host groups and service groups, showing, for example, when any critical service or database server is down. Such a report can also help prioritize what needs to be dealt with first, and which problems can be handled later.
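As a rough illustration of what such a report boils down to, the following Python sketch counts services per state and lists the most urgent ones first. The host and service names are hypothetical, and the severity order used for sorting (CRITICAL, then WARNING, then UNKNOWN, then OK) is an assumption made for this example rather than something Nagios prescribes here.

```python
from collections import Counter

STATES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")

# Assumed urgency order for this example: lower number means handle first.
SEVERITY = {"CRITICAL": 0, "WARNING": 1, "UNKNOWN": 2, "OK": 3}


def summarize(statuses):
    """Count how many services are in each of the four states."""
    counts = Counter(statuses.values())
    return {state: counts[state] for state in STATES}


def prioritize(statuses):
    """Return (host, service) keys ordered from most to least urgent."""
    return sorted(statuses, key=lambda key: SEVERITY[statuses[key]])


# Hypothetical current states, keyed by (host name, service description).
service_states = {
    ("web01", "HTTP"): "OK",
    ("web01", "Disk space"): "WARNING",
    ("db01", "MySQL"): "CRITICAL",
    ("db01", "Disk space"): "OK",
}

summary = summarize(service_states)
print(summary)
print(prioritize(service_states)[0])
```

The same counting could be applied to the members of a single host group or service group to get a per-group overview.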

Nagios performs all of its checks using plugins. These are external components to which Nagios passes information on what should be checked and what the warning and critical limits are. Plugins are responsible for doing the checks and analyzing the results. The output from such a check is a status (OK, WARNING, CRITICAL, or UNKNOWN) and additional text providing information on the service in detail. This text is primarily intended for system administrators to be able to read a detailed status of a service.
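To make the division of labor concrete, here is a minimal sketch of what a plugin might look like in Python. It relies on the usual plugin convention that the status is reported through the process exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) and the descriptive text through standard output. The function names, the argument order, and the "DISK" label are assumptions made for this example; it is not one of the standard plugins.

```python
import shutil

# Conventional plugin exit codes for the four states.
EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def check_disk(path, warning, critical):
    """Return (state, text) for disk usage on path; limits are percentages."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return "UNKNOWN", f"cannot read {path}: {exc}"
    if usage.total == 0:
        return "UNKNOWN", f"{path} reports zero total size"

    used_percent = 100.0 * usage.used / usage.total
    if used_percent >= critical:
        state = "CRITICAL"
    elif used_percent >= warning:
        state = "WARNING"
    else:
        state = "OK"
    return state, f"{used_percent:.1f}% used on {path}"


def main(argv):
    """Hypothetical command line: <script> PATH WARNING CRITICAL."""
    if len(argv) != 4:
        print("DISK UNKNOWN - usage: PATH WARNING CRITICAL")
        return EXIT_CODES["UNKNOWN"]
    try:
        warning, critical = float(argv[2]), float(argv[3])
    except ValueError:
        print("DISK UNKNOWN - limits must be numbers")
        return EXIT_CODES["UNKNOWN"]

    state, text = check_disk(argv[1], warning, critical)
    print(f"DISK {state} - {text}")  # the text an administrator reads
    return EXIT_CODES[state]         # the status the monitoring core reads
```

When run as a standalone script, the file would end by calling sys.exit(main(sys.argv)) so that the returned code becomes the process exit status. Note that the plugin, not the monitoring core, receives the limits and decides which state applies.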

Nagios offers not only a core system for monitoring, but also a set of standard plugins in a separate package (see http://nagiosplugins.org/ for more details). These plugins provide checks for almost all of the services your company might have. Refer to Chapter 4, Overview of Nagios Plugins, for detailed information on plugins that are developed along with Nagios. If you need to perform a specific check (for example, to connect to a web service and invoke methods), it is very easy to write your own plugins. They can be written in any language, and a complete check command takes very little time to write. Chapter 11, Extending Nagios, talks about this in more detail.