Nagios Core Administration Cookbook - Second Edition

By: Tom Ryder
Overview of this book

Nagios Core is an open source monitoring framework suitable for any network. It checks that both internal and customer-facing services are running correctly, and manages notification and reporting behavior so that outages can be diagnosed and fixed promptly. It allows very fine configuration of exactly when, where, what, and how to check network services to meet both the uptime goals of your network and systems team and the needs of your users. This book shows system and network administrators how to use Nagios Core to its fullest as a monitoring framework for checks on any kind of network services, from the smallest home network to much larger production multi-site services. You will discover that Nagios Core is capable of doing much more than pinging a host or checking whether websites respond. The recipes in this book will demonstrate how to leverage Nagios Core's advanced configuration, scripting hooks, reports, data retrieval, and extensibility to integrate it with your existing systems, and to make it the rock-solid center of your network monitoring world.

Creating a new network host


In this recipe, we'll start with the default Nagios Core configuration and set up a host definition for a server that responds to PING on our local network. The end result is that Nagios Core adds our new host to its internal tables when it starts up and automatically checks it (probably using PING) on a regular basis. As an example, I'll use my Nagios Core monitoring server with the DNS name olympus.example.net and add a host definition for a web server with the DNS name sparta.example.net. Both are on the example network 192.0.2.0/24.

Getting ready

You'll need a working Nagios Core 4.0 or greater installation with a web interface and all the Nagios Core plugins installed. If you have not yet installed Nagios Core, you should start with the quick start guide at http://nagios.sourceforge.net/docs/nagioscore/4/en/quickstart.html that is appropriate to your operating system.

We'll assume that the configuration file Nagios Core reads on startup is at /usr/local/nagios/etc/nagios.cfg, as is the case with the default installation. It shouldn't matter where you include this new host definition in the configuration, as long as Nagios Core is going to read the file at some point. However, it might be a good idea to give each host its own file in a separate objects directory, which we'll do here. You should have access to a shell on the server and be able to write text files using an editor of your choice; I'll use vi. You will need root privileges on the server via su or sudo.

You should know how to reload Nagios Core on the server so that the configuration you're going to add gets applied. It shouldn't be necessary to restart the whole machine to do this! A common location for the startup/shutdown script on Unix-like hosts is /etc/init.d/nagios, which I'll use here. On modern GNU/Linux systems that use systemd, it may be better practice to use systemctl reload nagios.

You should also get the hostname or IP address of the server you'd like to monitor ready. We'll use IP addresses rather than DNS hostnames here, which means that our checks will keep working even if DNS is unavailable. You may prefer to use hostnames if your addresses change regularly. You shouldn't need the subnet mask or anything like that; Nagios Core will only need whatever information the ping(8) tool would need for its own check_ping command.

Finally, you should test things first; confirm that you're able to reach the host from the Nagios Core server using ping(8) by checking directly from the shell, to make sure your network stack, routes, firewalls, and netmasks are all correct:

user@olympus:~$ ping 192.0.2.21
PING sparta.example.net (192.0.2.21) 56(84) bytes of data.
64 bytes from sparta.example.net (192.0.2.21): icmp_req=1 ttl=64 time=0.149 ms
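
If you'd rather script this pre-flight test than read the output by eye, the exit status of ping(8) can be used instead. The following is only a minimal sketch: the function names and the PING_CMD override are hypothetical helpers introduced for illustration, not part of Nagios Core.

```shell
# reachable HOST: succeed if HOST answers at least one of three echo requests.
# PING_CMD is a hypothetical override hook (useful for dry runs); it defaults to ping.
reachable() {
    ${PING_CMD:-ping} -c 3 "$1" >/dev/null 2>&1
}

# report HOST: print a one-line verdict and preserve the result of the check.
report() {
    if reachable "$1"; then
        echo "$1 answers PING"
    else
        echo "$1 does not answer PING" >&2
        return 1
    fi
}
```

Running report 192.0.2.21 from the monitoring server should then print a verdict and return a non-zero status if no reply came back.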

How to do it...

We can create the new host definition for sparta.example.net as follows:

  1. Change the directory to /usr/local/nagios/etc/objects and create a new file called sparta.example.net.cfg:

    # cd /usr/local/nagios/etc/objects
    # vi sparta.example.net.cfg
    
  2. Write the following code into the file, changing the values as appropriate for your own setup:

    define host {
        host_name              sparta.example.net
        alias                  sparta
        address                192.0.2.21
        max_check_attempts     3
        check_period           24x7
        check_command          check-host-alive
        contacts               nagiosadmin
        notification_interval  60
        notification_period    24x7
    }
  3. Change the directory to /usr/local/nagios/etc and edit the nagios.cfg file:

    # cd ..
    # vi nagios.cfg
    

    At the end of the file, add the following line:

    cfg_file=/usr/local/nagios/etc/objects/sparta.example.net.cfg
  4. Reload the configuration:

    # /etc/init.d/nagios reload
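
The four steps above can also be sketched as a small shell script. This is an illustration under stated assumptions, not the recipe's own method: the function names add_host and verify_and_reload are hypothetical, the paths are the default-install locations assumed in this recipe (overridable through environment variables), and the extra nagios -v call, which asks Nagios Core to verify the configuration, is an added precaution before reloading.

```shell
#!/bin/sh
# Default-install paths assumed by this recipe; override them if yours differ.
NAGIOS_ETC="${NAGIOS_ETC:-/usr/local/nagios/etc}"
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
RELOAD_CMD="${RELOAD_CMD:-/etc/init.d/nagios reload}"

# add_host NAME ALIAS ADDRESS: write objects/NAME.cfg and register it in nagios.cfg.
add_host() {
    host_file="$NAGIOS_ETC/objects/$1.cfg"
    cat > "$host_file" <<EOF
define host {
    host_name              $1
    alias                  $2
    address                $3
    max_check_attempts     3
    check_period           24x7
    check_command          check-host-alive
    contacts               nagiosadmin
    notification_interval  60
    notification_period    24x7
}
EOF
    # Append the cfg_file line only if that exact line is not already present.
    line="cfg_file=$host_file"
    grep -qxF "$line" "$NAGIOS_ETC/nagios.cfg" || echo "$line" >> "$NAGIOS_ETC/nagios.cfg"
}

# verify_and_reload: reload only if "nagios -v" accepts the configuration.
verify_and_reload() {
    if "$NAGIOS_BIN" -v "$NAGIOS_ETC/nagios.cfg"; then
        $RELOAD_CMD
    else
        echo "configuration check failed; not reloading" >&2
        return 1
    fi
}
```

As root, the equivalent of this recipe would be add_host sparta.example.net sparta 192.0.2.21 followed by verify_and_reload. Checking for the cfg_file line before appending means the script can be run twice without duplicating the entry.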
    

If Nagios Core reloaded successfully, the web interface should now show a brand new host in the hosts list, in a PENDING state as it waits to verify that the host is alive.

In the next few minutes, the host's background should change to green to show that the verification was complete and the host status should change to UP, assuming that the checks succeeded.

If the test failed and Nagios Core was not able to get a PING response from the target machine after three tries, for whatever reason, the host would be shown as DOWN instead.

How it works...

The configuration we added in the preceding section adds a host to Nagios Core's list of hosts to check. Nagios Core will periodically send a PING request to 192.0.2.21, check whether it receives a reply, and update the status shown in the Nagios Core web interface accordingly. We have not yet defined any services to check for this host, nor have we specified what action Nagios Core should take if the host is down. However, the host itself will be automatically checked at regular intervals by Nagios Core and we can view its state in the web interface at any time.

The directives we defined in the preceding configuration are as follows:

  • host_name: This defines the hostname of the machine that is used internally by Nagios Core to refer to this host. It will end up being used in other parts of the configuration.

  • alias: This defines a more recognizable human-readable name for the host; this appears in the web interface. It could also be used for a full-text description of the host.

  • address: This defines the IP address of the machine. This is the actual value that Nagios Core will use to contact the server; using an IP address rather than a DNS name is generally a best practice, so the checks continue to work even if DNS is not functioning. In Nagios 4.0 or newer, if you leave this field blank, the value of host_name will be used instead. In versions earlier than 4.0, you must define it.

  • max_check_attempts: This defines the number of times Nagios Core should try to run the check if the checks fail. Here, we've defined a value of 3, meaning that Nagios Core will make a total of three attempts to contact the host before flagging it as DOWN.

  • check_period: This references the time period during which this host should be checked. The 24x7 time period is defined in the default configuration for Nagios Core. This is a sensible value for hosts, as it means the host will always be checked. Note that this controls when Nagios Core will check the host, not when it will notify anyone.

  • check_command: This references the command that will be used to check whether the host is UP, DOWN, or UNREACHABLE. In this case, a standard Nagios Core configuration defines check-host-alive as a PING check, which serves as a good test of basic network connectivity and is a sensible default for most hosts. This directive is actually not required to make a valid host, but you will want to include it under most circumstances; without it, no checks will be run.

  • contacts: This references the contact or contacts that will be told about state changes in the host. In this instance, we've used nagiosadmin, which is defined in the default Nagios Core configuration.

  • notification_interval: This defines how regularly the host should repeat its notifications if it is having problems. Here, we've used a value of 60, which corresponds to 60 minutes, or 1 hour.

  • notification_period: This references the time period during which Nagios Core should send out notifications if there are problems. Here, we again use the 24x7 time period, but for other hosts, another time period such as workhours might be more appropriate.
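
To see what the check_command directive amounts to in practice, you can run the underlying plugin by hand. The sketch below assumes that check-host-alive is defined as in the sample commands.cfg, that is, as a check_ping call with the thresholds shown, and that the plugins live in /usr/local/nagios/libexec. Compare both against your own commands.cfg and resource.cfg before relying on them. The function name host_check is a hypothetical helper.

```shell
# Assumed plugin directory for a default source install; override if yours differs.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/local/nagios/libexec}"

# host_check ADDRESS: run check_ping with the thresholds assumed for check-host-alive
# (warning at 3000 ms or 80% loss, critical at 5000 ms or 100% loss, 5 packets).
host_check() {
    "$PLUGIN_DIR/check_ping" -H "$1" -w 3000.0,80% -c 5000.0,100% -p 5
    status=$?
    # Plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
    echo "plugin exit status: $status"
    return "$status"
}
```

Calling host_check 192.0.2.21 as the nagios user prints the same one-line plugin output that Nagios Core records for the host, followed by the exit status it would interpret.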

Note that we added the definition in its own file called sparta.example.net.cfg and then referred to it in the main configuration file nagios.cfg. This is simply a convention, but keeping each definition in its own file is a tidy way to manage them.

There's more...

There are a lot of other useful parameters for hosts, but the ones we've used include everything that's required.

While this is a perfectly valid way of specifying a host, it's more typical to define a host based on a template, with definitions of how often the host should be checked, who should be contacted when its state changes and on what basis, and similar properties. Nagios Core defines a simple template host called generic-host, which a host definition can extend with the use directive:

define host {
    use                 generic-host
    name                sparta
    host_name           sparta.example.net
    address             192.0.2.21
    max_check_attempts  3
    contacts            nagiosadmin
    check_period        24x7
    check_command       check-host-alive
}

This uses all the parameters defined for generic-host and then adds on the details of the specific host that needs to be checked. If you're curious to see what's defined in generic-host, you'll find its definition in /usr/local/nagios/etc/objects/templates.cfg.
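
Rather than paging through the whole file, you could print just the one template. The following is a rough sketch that assumes each definition starts with a "define host" line and ends with a closing brace on its own line, as in the sample configuration; show_template is a hypothetical helper name.

```shell
# show_template FILE NAME: print the "define host" block whose name directive is NAME.
show_template() {
    awk -v want="$2" '
        # A new host block starts: begin collecting its lines.
        /^[ \t]*define[ \t]+host[ \t]*[{]/ { block = ""; inside = 1 }
        inside { block = block $0 "\n" }
        # Remember whether this block carries the template name we want.
        inside && $1 == "name" && $2 == want { found = 1 }
        # The block ends: print it only if the name matched.
        inside && /^[ \t]*}/ { if (found) printf "%s", block; inside = 0; found = 0 }
    ' "$1"
}
```

For the default layout, that would be show_template /usr/local/nagios/etc/objects/templates.cfg generic-host.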

See also

  • Specifying how frequently to check a host, Chapter 3, Working with Checks and States

  • Using an alternative check command for hosts, Chapter 2, Working with Commands and Plugins

  • Grouping configuration files in directories, Chapter 9, Managing Configuration

  • Using inheritance to simplify configuration, Chapter 9, Managing Configuration