Learning Nagios 3.0

Nagios' main strength is its flexibility — it can be configured to monitor your IT infrastructure in the way you want. It also has a mechanism to automatically react to problems, and a powerful notification system. All of this is based on a clear object definition system and on a few object types:

1. Commands are definitions of how Nagios should perform particular types of checks; they form an abstraction layer on top of the actual plugins that lets you group similar types of operations.
2. Time periods are date and time spans within which an operation should or should not be performed; for example: Monday to Friday between 09:00 and 17:00.
3. Contacts and contact groups are people who should be notified, along with information on how and when they should be contacted. Contacts can be grouped and a single contact can be a member of more than one group.
4. Hosts are physical machines, along with information on who should be contacted, how checks should be performed, and when. Hosts can be grouped into host groups; each host may be a member of more than one host group.
5. Services are the various functionalities or resources to monitor on a specific host, along with information on who should be contacted, how the checks should be performed, and when. Services can be grouped into service groups; each service may be a member of more than one service group.
6. Host and service escalations define the specific time period after which additional people should be notified of certain events. For example, a critical server being down for more than 4 hours should alert IT management so that they start tracking the issue. These people are defined in addition to the normal notifications configured in the host and service objects.
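
To make the object types above more concrete, here is a minimal sketch of how they might look in a Nagios 3 object configuration file. All object names, addresses, and e-mail addresses are hypothetical. The sketch assumes that `$USER1$` points at the plugin directory, and that the `24x7` time period, the `check-host-alive` command, the `notify-host-by-email` and `notify-service-by-email` commands, and the `it-management` contact group are defined elsewhere.

```cfg
# Command: wraps a plugin call so that services can refer to it by name.
define command {
    command_name    check_http_simple
    command_line    $USER1$/check_http -H $HOSTADDRESS$
}

# Time period: Monday to Friday between 09:00 and 17:00.
define timeperiod {
    timeperiod_name workhours
    alias           Monday to Friday, 09:00 to 17:00
    monday          09:00-17:00
    tuesday         09:00-17:00
    wednesday       09:00-17:00
    thursday        09:00-17:00
    friday          09:00-17:00
}

# Contact: who is notified, when, and by which commands.
define contact {
    contact_name                    jdoe
    alias                           John Doe
    email                           jdoe@example.com
    host_notifications_enabled      1
    service_notifications_enabled   1
    host_notification_period        workhours
    service_notification_period     workhours
    host_notification_options       d,u,r
    service_notification_options    w,u,c,r
    host_notification_commands      notify-host-by-email
    service_notification_commands   notify-service-by-email
}

define contactgroup {
    contactgroup_name   admins
    alias               System administrators
    members             jdoe
}

# Host: a machine, how to check it, and who to tell.
define host {
    host_name               web1
    alias                   Intranet web server
    address                 192.0.2.10
    check_command           check-host-alive
    max_check_attempts      3
    check_period            24x7
    contact_groups          admins
    notification_interval   60
    notification_period     24x7
}

# Service: one thing to monitor on that host.
define service {
    host_name               web1
    service_description     HTTP
    check_command           check_http_simple
    max_check_attempts      3
    check_interval          5
    retry_interval          1
    check_period            24x7
    contact_groups          admins
    notification_interval   60
    notification_period     24x7
}

# Escalation: with one notification every 60 minutes, notification 5 goes out
# roughly 4 hours after the first one. Both groups are listed so that the
# regular administrators keep receiving notifications as well.
define hostescalation {
    host_name               web1
    contact_groups          admins,it-management
    first_notification      5
    last_notification       0
    notification_interval   60
}
```

The interval values are expressed in time units, which are minutes when `interval_length` is left at its default of 60 seconds. A `last_notification` of `0` keeps the escalation active until the host recovers.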

An important benefit that you will gain by using Nagios is a mature dependency system. For any administrator, it is obvious that if your router is down, all machines accessed through it will appear to fail. Some systems do not take this into account, and in such a case you would get a long list of failing machines and services. Nagios allows you to define dependencies between hosts to reflect your actual network topology. For example, if a switch that connects you to a router is down, Nagios will not perform any checks on the router or on the machines that are dependent on the router. This is illustrated in the following example:
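
A minimal sketch of such a topology in Nagios 3 configuration, using the `parents` directive, could look as follows. The host names and addresses are hypothetical, and the `sketch-host` template, the `24x7` time period, the `check-host-alive` command, and the `admins` contact group are assumptions made only to keep the fragment short.

```cfg
# Template holding the settings shared by all three hosts (not a real host).
define host {
    name                    sketch-host
    check_command           check-host-alive
    max_check_attempts      3
    check_period            24x7
    contact_groups          admins
    notification_interval   60
    notification_period     24x7
    register                0
}

# The switch has no parents: it is reached directly from the Nagios machine.
define host {
    use         sketch-host
    host_name   switch1
    alias       Core switch
    address     192.0.2.1
}

# The router is reached through the switch.
define host {
    use         sketch-host
    host_name   router1
    alias       Border router
    address     192.0.2.2
    parents     switch1
}

# The server is reached through the router.
define host {
    use         sketch-host
    host_name   web1
    alias       Intranet web server
    address     198.51.100.10
    parents     router1
}
```

With these definitions, a failure of `switch1` is reported as the actual problem, while `router1` and `web1` are treated as unreachable behind it rather than as separate outages.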

You can also define that one particular service depends on another service, either on the same host or on a different host. If one of the services is down, a check for a service that depends on it is not performed. For example, for your company's intranet application to function properly, both an underlying web server and a database server must be running. So, if the database service is not working properly, Nagios will not perform checks on your application. The database server might be on the same host or on a different host. In the latter case, if the machine hosting the database is down or not accessible, notifications for all services dependent on the database service will not be sent out either.
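
As a sketch of the intranet example, a service dependency in Nagios 3 might be written as below. The host names `db1` and `web1` and the service descriptions are hypothetical, and both services are assumed to be defined elsewhere.

```cfg
# "Intranet application" on web1 depends on "Database" on db1.
define servicedependency {
    host_name                       db1
    service_description             Database
    dependent_host_name             web1
    dependent_service_description   Intranet application
    # Skip checks of the dependent service while the database service is in
    # a WARNING, UNKNOWN, or CRITICAL state.
    execution_failure_criteria      w,u,c
    # Suppress notifications for the dependent service in the same states.
    notification_failure_criteria   w,u,c
}
```

The two criteria are independent, so it is possible to keep checking the dependent service while only suppressing its notifications, or the other way round.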

Nagios offers a consistent system of macro definitions. Macros are variables that can be used in object definitions, and their values depend on the context. They can be put inside commands, and their values are substituted according to the host, the service, and many other parameters. For example, a command definition might use the IP address of the host it is currently checking in all remote tests. This also makes it possible to put information such as the previous and current statuses of a service in a notification email. Nagios 3 also offers various extensions to macro definitions, which makes it an even more powerful mechanism. This is described in detail in the last section of this chapter.
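
The two uses mentioned above can be sketched with standard Nagios 3 macros. The command names, the service, and the port number are hypothetical; `$USER1$` is assumed to point at the plugin directory, and the paths to `printf` and `mail` are assumptions that vary between systems.

```cfg
# $HOSTADDRESS$ is replaced with the address of the host being checked,
# $ARG1$ with the first argument passed by the service definition.
define command {
    command_name    check_tcp_port
    command_line    $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$
}

# The argument after "!" becomes $ARG1$ in the command above.
define service {
    host_name               db1
    service_description     Database port
    check_command           check_tcp_port!5432
    max_check_attempts      3
    check_interval          5
    retry_interval          1
    check_period            24x7
    contact_groups          admins
    notification_interval   60
    notification_period     24x7
}

# Notification command that reports the previous and the current state.
define command {
    command_name    notify-service-state-change
    command_line    /usr/bin/printf "%b" "Host: $HOSTNAME$\nService: $SERVICEDESC$\nState: $LASTSERVICESTATE$ -> $SERVICESTATE$\nOutput: $SERVICEOUTPUT$\n" | /bin/mail -s "$NOTIFICATIONTYPE$: $HOSTNAME$/$SERVICEDESC$ is $SERVICESTATE$" $CONTACTEMAIL$
}
```

Because the values are substituted at run time, one command definition can serve every host and service that refers to it.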

Nagios also offers mechanisms for scheduling planned downtimes. You can specify that a particular host or service is planned to be unavailable for a given period. This prevents Nagios from notifying the configured contacts about problems related to these objects during that time. Nagios can also notify people of planned downtimes automatically. This is mainly used when maintenance of the IT infrastructure is to be carried out and the servers, or the services they provide, are unavailable for a long time. It allows the creation of an integrated process of scheduling downtimes that also handles informing the users.
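
One way to receive these automatic downtime notifications is the `s` notification option, sketched below on a contact. The contact name and e-mail address are hypothetical, and the `24x7` time period and the two notification commands are assumed to be defined elsewhere.

```cfg
define contact {
    contact_name                    ops-oncall
    alias                           Operations on call
    email                           ops@example.com
    host_notifications_enabled      1
    service_notifications_enabled   1
    host_notification_period        24x7
    service_notification_period     24x7
    # "s" requests a notification when scheduled downtime starts and ends.
    host_notification_options       d,u,r,s
    service_notification_options    w,u,c,r,s
    host_notification_commands      notify-host-by-email
    service_notification_commands   notify-service-by-email
}
```

If a host or service sets its own `notification_options`, that list needs to include `s` as well for downtime notifications to be sent.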
