Nagios Core Administration Cookbook

By: Tom Ryder

Overview of this book

Network monitoring requires significantly more than just pinging hosts. This cookbook will help you to comprehensively test your networks' major functions on a regular basis. "Nagios Core Administration Cookbook" will show you how to use Nagios Core as a monitoring framework that understands the layers and subtleties of the network for intelligent monitoring and notification behaviour. It introduces the reader to methods of extending Nagios Core into a network monitoring solution. The book begins by covering the basic structure of hosts, services, and contacts, and then goes on to discuss advanced usage of checks and notifications, and configuring intelligent behaviour with network paths and dependencies. The cookbook emphasizes using Nagios Core as an extensible monitoring framework. By the end of the book, you will have learned that Nagios Core is capable of doing much more than pinging a host or checking whether websites respond.

Creating a new network host

In this recipe, we'll start with the default Nagios Core configuration, and set up a host definition for a server that responds to PING on our local network. The end result will be that Nagios Core will add our new host to its internal tables when it starts up, and will automatically check it (probably using PING) on a regular basis. In this example, I'll use my Nagios Core monitoring server with a Domain Name System (DNS) name of olympus.naginet, and add a host definition for a webserver with a DNS name of sparta.naginet. This is all on my local network – 10.128.0.0/24.

Getting ready

You'll need a working Nagios Core 3.0 or greater installation with a web interface, with all the Nagios Core Plugins installed. If you have not yet installed Nagios Core, then you should start with the QuickStart guide: http://nagios.sourceforge.net/docs/3_0/quickstart.html.

We'll assume that the configuration file that Nagios Core reads on startup is located at /usr/local/nagios/etc/nagios.cfg, as is the case with the default install. It shouldn't matter where you include this new host definition in the configuration, as long as Nagios Core is going to read the file at some point, but it might be a good idea to give each host its own file in a separate objects directory, which we'll do here. You should have access to a shell on the server, and be able to write text files using an editor of your choice; I'll use vi. You will need root privileges on the server via su or sudo.

You should know how to restart Nagios Core on the server, so that the configuration you're going to add gets applied. It shouldn't be necessary to restart the whole server to do this! A common location for the startup/shutdown script on Unix-like hosts is /etc/init.d/nagios, which I'll use here.

You should also have ready the hostname or IP address of the server you'd like to monitor. It's good practice to use the IP address if you can, as this means your checks keep working even if DNS is unavailable. You shouldn't need the subnet mask or anything like that; Nagios Core only needs the information that the PING tool itself would need for its check_ping command.

Finally, you should test things first; confirm that you're able to reach the host from the Nagios Core server via PING by checking directly from the shell, to make sure your network stack, routes, firewalls, and netmasks are all correct:

```
tom@olympus:~$ ping 10.128.0.21
PING sparta.naginet (10.128.0.21) 56(84) bytes of data.
64 bytes from sparta.naginet (10.128.0.21): icmp_req=1 ttl=64 time=0.149 ms
```
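
Beyond a plain ping, it can be useful to run the check_ping plugin by hand, since that is the program Nagios Core itself will call. The following is a minimal sketch, not part of the original recipe: the plugin directory /usr/local/nagios/libexec is an assumption based on a default source install, the function name manual_host_check is hypothetical, and the thresholds are the ones I believe the sample check-host-alive command uses, so compare them with your own commands.cfg.

```shell
#!/bin/sh
# Assumed plugin location for a default source install; override if yours differs.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/local/nagios/libexec}"

# Run check_ping against one address, much as a host check would.
# -w and -c take "round-trip time in ms,packet loss %" for WARNING and CRITICAL;
# -p is the number of packets to send.
manual_host_check() {
    "$PLUGIN_DIR/check_ping" -H "$1" -w 3000.0,80% -c 5000.0,100% -p 5
}
```

Calling `manual_host_check 10.128.0.21` should print a one-line result, and the exit status follows the usual plugin convention: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN.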

How to do it...

We can create the new host definition for sparta.naginet as follows:

Change directory to /usr/local/nagios/etc/objects, and create a new file called sparta.naginet.cfg:
```
# cd /usr/local/nagios/etc/objects
# vi sparta.naginet.cfg
```

Write the following into the file, changing the host_name, alias, and address values as appropriate for your own setup:

```
define host {
    host_name              sparta.naginet
    alias                  sparta
    address                10.128.0.21
    max_check_attempts     3
    check_period           24x7
    check_command          check-host-alive
    contacts               nagiosadmin
    notification_interval  60
    notification_period    24x7
}
```

Change directory to /usr/local/nagios/etc, and edit the nagios.cfg file:
```
# cd ..
# vi nagios.cfg
```

At the end of the file add the following line:

```
cfg_file=/usr/local/nagios/etc/objects/sparta.naginet.cfg
```

Restart the Nagios Core server:
```
# /etc/init.d/nagios restart
```
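
Since a syntax error in the new file can stop Nagios Core from starting, it may be worth having Nagios Core verify the configuration with its -v option before restarting. The sketch below wraps both steps in one function. The binary path /usr/local/nagios/bin/nagios is an assumption based on a default source install, and the function name verify_and_restart is hypothetical.

```shell
#!/bin/sh
# Assumed locations for a default source install; override as needed.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
NAGIOS_INIT="${NAGIOS_INIT:-/etc/init.d/nagios}"

# Verify the configuration, and restart only if verification succeeds.
verify_and_restart() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        "$NAGIOS_INIT" restart
    else
        echo "Configuration check failed; not restarting." >&2
        return 1
    fi
}
```

Run as root, `verify_and_restart` leaves the running server untouched when verification fails, so a typo in sparta.naginet.cfg does not interrupt monitoring.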

If the server restarted successfully, the web interface should show a brand new host in the Hosts list, in PENDING state as it waits to run a check that the host is alive.

Within the next few minutes, assuming that the check succeeds, the host should change to green to show that it is UP.

If the check failed and Nagios Core was not able to get a PING response from the target machine after three tries, for whatever reason, then the host would instead be shown as DOWN.
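
The same state can also be read from the shell, which is handy when the web interface is not to hand. This is only a rough sketch under some assumptions: it assumes that Nagios Core writes its status file to /usr/local/nagios/var/status.dat, as a default source install usually does, and that each hoststatus block in that file carries host_name, has_been_checked, and current_state fields. The function name host_state is hypothetical.

```shell
#!/bin/sh
# Print PENDING, UP, DOWN, or UNREACHABLE for one host.
# $1 = host name, $2 = path to the status file (assumed location shown below).
host_state() {
    awk -v host="$1" '
        /^hoststatus [{]/ { in_block = 1; name = ""; state = ""; checked = ""; next }
        in_block && /^[ \t]*host_name=/        { sub(/^[ \t]*host_name=/, "");        name = $0 }
        in_block && /^[ \t]*has_been_checked=/ { sub(/^[ \t]*has_been_checked=/, ""); checked = $0 }
        in_block && /^[ \t]*current_state=/    { sub(/^[ \t]*current_state=/, "");    state = $0 }
        in_block && /^[ \t]*[}]/ {
            if (name == host) {
                if (checked == "0")    print "PENDING"      # no check has run yet
                else if (state == "0") print "UP"
                else if (state == "1") print "DOWN"
                else                   print "UNREACHABLE"
            }
            in_block = 0
        }
    ' "$2"
}
```

For example, `host_state sparta.naginet /usr/local/nagios/var/status.dat` prints the current state of the host added in this recipe.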

How it works...

The configuration we included in this section adds a host to Nagios Core's list of hosts. It will periodically check the host by sending a PING request, checking to see if it receives a reply, and updating the host's status as shown in the Nagios Core web interface accordingly. We haven't defined any other services to check for this host yet, nor have we specified what action it should take if the host is down. However, the host itself will be automatically checked at regular intervals by Nagios Core, and we can view its state in the web interface at any time.

The directives we defined in the preceding configuration are explained as follows:

host_name: This defines the hostname of the machine, used internally by Nagios Core to refer to its host. It will end up being used in other parts of the configuration.
alias: This defines a more recognizable human-readable name for the host; it appears in the web interface. It could also be used for a full-text description of the host.
address: This defines the IP address of the machine. This is the actual value that Nagios Core will use for contacting the server; using an IP address rather than a DNS name is generally best practice, so that the checks continue to work even if DNS is not functioning.
max_check_attempts: This defines the number of times Nagios Core should try to repeat the check if checks fail. Here, we've defined a value of 3, meaning that Nagios Core will try two more times to PING the target host after first finding it down.
check_period: This references the time period during which the host should be checked. 24x7 is a time period defined in the default configuration for Nagios Core. This is a sensible value for hosts, as it means the host will always be checked. This defines when Nagios Core will check the host, and not when it will notify anyone.
check_command: This references the command that will be used to check whether the host is UP, DOWN, or UNREACHABLE. In this case, a QuickStart Nagios Core configuration defines check-host-alive as a PING check, which is a good test of basic network connectivity, and a sensible default for most hosts. This directive is actually not required to make a valid host, but you will want to include it under most circumstances; without it, no checks will be run.
contacts: This references the contact or contacts that will be notified about state changes in the host. In this instance, we've used nagiosadmin, which is defined in the QuickStart Nagios Core configuration.
notification_interval: This defines how regularly the host should repeat its notifications if it is having problems. Here, we've used a value of 60, which, with the default interval_length of 60 seconds, corresponds to 60 minutes or one hour.
notification_period: This references the time period during which Nagios Core should send out notifications, if there are problems. Here, we're again using the 24x7 time period; for other hosts, another time period such as workhours might be more appropriate.

Note that we added the definition in its own file called sparta.naginet.cfg, and then referred to it in the main nagios.cfg configuration file. This is simply a conventional way of laying out hosts, and keeping definitions in their own files happens to be quite a tidy way to manage things.
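
If you end up keeping one file per host, writing each by hand gets repetitive. As a small illustration only, the following shell function prints a definition with the same directives used in this recipe, taking the host name, alias, and address as arguments. The function name host_definition is hypothetical, and the fixed values simply repeat those chosen above.

```shell
#!/bin/sh
# Print a host definition like the one in this recipe.
# $1 = host_name, $2 = alias, $3 = address
host_definition() {
    cat <<EOF
define host {
    host_name              $1
    alias                  $2
    address                $3
    max_check_attempts     3
    check_period           24x7
    check_command          check-host-alive
    contacts               nagiosadmin
    notification_interval  60
    notification_period    24x7
}
EOF
}
```

For example, `host_definition sparta.naginet sparta 10.128.0.21 > /usr/local/nagios/etc/objects/sparta.naginet.cfg` would recreate the file from this recipe; the matching cfg_file line in nagios.cfg is still needed.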

There's more...

There are a lot of other useful parameters for hosts, but the ones we've used include everything that's required.

While this is a perfectly valid way of specifying a host, it's more typical to define a host based on some template, with definitions of how often the host should be checked, who should be contacted when its state changes and on what basis, and similar properties. Nagios Core's QuickStart sample configuration defines a simple template host called generic-host, which could be used by extending the host definition with the use directive:

```
define host {
    use                 generic-host
    host_name           sparta.naginet
    alias               sparta
    address             10.128.0.21
    max_check_attempts  3
    check_command       check-host-alive
    contacts            nagiosadmin
}
```

This uses all the parameters defined for generic-host, and then adds on the details of the specific host that needs to be checked. Note that if you use generic-host, then you will need to define check_command in your host definition. If you're curious to see what's defined in generic-host, then you can find its definition in /usr/local/nagios/etc/objects/templates.cfg.
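
To look at a template without opening the whole file, a short awk filter can print just the matching block. This is a sketch only: the function name show_host_template is hypothetical, and it assumes the template is declared with a name directive inside a define host block, which is how the sample templates.cfg is usually laid out.

```shell
#!/bin/sh
# Print the "define host" block whose name directive matches the given template.
# $1 = template name, $2 = configuration file to search
show_host_template() {
    awk -v tmpl="$1" '
        /^[ \t]*define[ \t]+host[ \t]*[{]/ { block = ""; in_block = 1; found = 0 }
        in_block {
            block = block $0 "\n"                       # collect the block line by line
            if ($1 == "name" && $2 == tmpl) found = 1   # this block declares the template
            if ($0 ~ /^[ \t]*[}]/) {                    # closing brace ends the block
                if (found) printf "%s", block
                in_block = 0
            }
        }
    ' "$2"
}
```

For example, `show_host_template generic-host /usr/local/nagios/etc/objects/templates.cfg` prints only the generic-host definition, so you can see which directives your host inherits and which you still need to supply.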
