Nagios Core Administration Cookbook Second Edition

By : Tom Ryder

Overview of this book

Nagios Core is an open source monitoring framework suitable for any network. It checks that both internal and customer-facing services are running correctly, and manages notification and reporting behavior so that outages can be diagnosed and fixed promptly. It allows very fine configuration of exactly when, where, what, and how to check network services, to meet both the uptime goals of your network and systems team and the needs of your users. This book shows system and network administrators how to use Nagios Core to its fullest as a monitoring framework for checks on any kind of network service, from the smallest home network to much larger multi-site production services. You will discover that Nagios Core is capable of much more than pinging a host or seeing whether websites respond. The recipes in this book demonstrate how to leverage Nagios Core's advanced configuration, scripting hooks, reports, data retrieval, and extensibility to integrate it with your existing systems and make it the rock-solid center of your network monitoring world.

Creating a new network host

In this recipe, we'll start with the default Nagios Core configuration and set up a host definition for a server that responds to PING on our local network. The end result is that Nagios Core will add our new host to its internal tables when it starts up and will check it automatically (probably using PING) on a regular basis. As an example, I'll use my Nagios Core monitoring server with the DNS name olympus.example.net and add a host definition for a web server with the DNS name sparta.example.net. Both are on the example network 192.0.2.0/24.

Getting ready

You'll need a working Nagios Core 4.0 or greater installation with a web interface and all the Nagios Core plugins installed. If you have not yet installed Nagios Core, you should start with the quick start guide at http://nagios.sourceforge.net/docs/nagioscore/4/en/quickstart.html that is appropriate to your operating system.

We'll assume that the configuration file Nagios Core reads on startup is at /usr/local/nagios/etc/nagios.cfg, as is the case with the default installation. It shouldn't matter where you include this new host definition in the configuration, as long as Nagios Core is going to read the file at some point. However, it might be a good idea to give each host its own file in a separate objects directory, which we'll do here. You should have access to a shell on the server and be able to write text files using an editor of your choice; I'll use vi. You will need root privileges on the server via su or sudo.

You should know how to reload Nagios Core on the server so that the configuration you're going to add gets applied. It shouldn't be necessary to restart the whole machine to do this! A common location for the startup/shutdown script on Unix-like hosts is /etc/init.d/nagios, which I'll use here. On modern GNU/Linux systems that use systemd, it may be better practice to use systemctl reload nagios.
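It can be useful to validate the configuration before asking Nagios Core to reload it. The following is a minimal sketch, assuming the default paths used in this recipe. The function name safe_reload and the three environment variables are hypothetical names chosen for illustration and are not part of Nagios Core; the -v option asks the nagios binary to verify a configuration file.

```shell
#!/bin/sh
# Sketch: validate the configuration, then reload only if validation passes.
# The paths are the defaults assumed in this recipe; override them through
# the environment if your installation differs.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
NAGIOS_INIT="${NAGIOS_INIT:-/etc/init.d/nagios}"

safe_reload() {
    # "nagios -v FILE" parses the configuration and exits non-zero on errors.
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null; then
        "$NAGIOS_INIT" reload
    else
        echo "Configuration check failed; not reloading." >&2
        return 1
    fi
}

# Usage (as root): safe_reload
```

Run the verification command by hand, without the redirect to /dev/null, if you want to read the full list of errors and warnings.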

You should also get the hostname or IP address of the server you'd like to monitor ready. We'll use IP addresses rather than DNS hostnames here, which means that our checks will keep working even if DNS is unavailable. You may prefer to use hostnames if your addresses change regularly. You shouldn't need the subnet mask or anything like that; Nagios Core will only need whatever information the ping(8) tool would need for its own check_ping command.

Finally, you should test things first; confirm that you're able to reach the host from the Nagios Core server using ping(8) by checking directly from the shell, to make sure your network stack, routes, firewalls, and netmasks are all correct:

```
user@olympus:~$ ping 192.0.2.21
PING sparta.example.net (192.0.2.21) 56(84) bytes of data.
64 bytes from sparta.example.net (192.0.2.21): icmp_req=1 ttl=64 time=0.149 ms
```
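Besides ping(8), you can run the check_ping plugin itself from the shell to see what Nagios Core would see. The sketch below assumes the plugins live in /usr/local/nagios/libexec; the function names and the PLUGIN_DIR variable are hypothetical. The thresholds are what I believe the sample check-host-alive command uses, so treat them as an assumption and compare them with your own commands.cfg. Plugins report their result through the exit status: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN.

```shell
#!/bin/sh
# Assumed plugin location for a default source installation.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/local/nagios/libexec}"

# Translate a plugin exit status into its conventional state name.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Run check_ping against the address in $1 and print the resulting state.
# -w and -c take "round trip time in ms,packet loss%"; -p is the packet count.
try_ping_check() {
    status=0
    "$PLUGIN_DIR/check_ping" -H "$1" -w 3000.0,80% -c 5000.0,100% -p 5 || status=$?
    state_name "$status"
}

# Usage: try_ping_check 192.0.2.21
```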

How to do it...

We can create the new host definition for sparta.example.net as follows:

Change the directory to /usr/local/nagios/etc/objects and create a new file called sparta.example.net.cfg:
```
# cd /usr/local/nagios/etc/objects
# vi sparta.example.net.cfg
```

Write the following code into the file, changing the values as appropriate for your own setup:
```
define host {
    host_name              sparta.example.net
    alias                  sparta
    address                192.0.2.21
    max_check_attempts     3
    check_period           24x7
    check_command          check-host-alive
    contacts               nagiosadmin
    notification_interval  60
    notification_period    24x7
}
```

Change the directory to /usr/local/nagios/etc and edit the nagios.cfg file:
```
# cd ..
# vi nagios.cfg
```
At the end of the file, add the following line:
```
cfg_file=/usr/local/nagios/etc/objects/sparta.example.net.cfg
```
Reload the configuration:
```
# /etc/init.d/nagios reload
```

If the configuration reloaded successfully, the web interface should now show a brand new host in the hosts list, in a PENDING state as it waits to verify that the host is alive.

Within the next few minutes, the host's background should change to green to show that the verification is complete, and the host status should change to UP, assuming that the checks succeeded.

If the check failed and Nagios Core was not able to get a PING response from the target machine after three tries, for whatever reason, the host would be flagged as DOWN instead.
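If you expect to add several hosts this way, the definition file can be produced by a small script rather than typed by hand. This is only a sketch under the assumptions of this recipe: the function name make_host_cfg is hypothetical, and every directive other than the three arguments is fixed to the value used in the steps.

```shell
#!/bin/sh
# Print a host definition like the one in this recipe.
# $1 = host_name, $2 = alias, $3 = address
make_host_cfg() {
    cat <<EOF
define host {
    host_name              $1
    alias                  $2
    address                $3
    max_check_attempts     3
    check_period           24x7
    check_command          check-host-alive
    contacts               nagiosadmin
    notification_interval  60
    notification_period    24x7
}
EOF
}

# Usage (as root, from /usr/local/nagios/etc/objects):
#   make_host_cfg sparta.example.net sparta 192.0.2.21 > sparta.example.net.cfg
```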

How it works...

The configuration we added in the preceding section adds a host to Nagios Core's list of hosts to check. Nagios Core will periodically send a PING request to 192.0.2.21, check whether it receives a reply, and update the status shown in the Nagios Core web interface accordingly. We have not yet defined any services to check for this host, nor have we specified what action Nagios Core should take if the host is down. However, the host itself will be checked automatically at regular intervals, and we can view its state in the web interface at any time.

The directives we defined in the preceding configuration are as follows:

host_name: This defines the hostname of the machine that is used internally by Nagios Core to refer to this host. It will end up being used in other parts of the configuration.
alias: This defines a more recognizable human-readable name for the host; this appears in the web interface. It could also be used for a full-text description of the host.
address: This defines the IP address of the machine. This is the actual value that Nagios Core will use to contact the server; using an IP address rather than a DNS name is generally a best practice, so that the checks continue to work even if DNS is not functioning. In Nagios Core 4.0 or newer, if you leave this field blank, the value of host_name will be used instead. In versions before 4.0, you must define it.
max_check_attempts: This defines the number of times Nagios Core should try to run the check if the checks fail. Here, we've defined a value of 3, meaning that Nagios Core will make a total of three attempts to contact the host before flagging it as DOWN.
check_period: This references the time period during which this host should be checked. The 24x7 time period is defined in the default configuration for Nagios Core. This is a sensible value for hosts, as it means the host will always be checked. This defines when Nagios Core will check the host, not when it will notify anyone.
check_command: This references the command that will be used to check whether the host is UP, DOWN, or UNREACHABLE. In this case, a standard Nagios Core configuration defines check-host-alive as a PING check, which serves as a good test of basic network connectivity and is a sensible default for most hosts. This directive is actually not required to make a valid host, but you will want to include it under most circumstances; without it, no checks will be run.
contacts: This references the contact or contacts that will be told about state changes in the host. In this instance, we've used nagiosadmin, which is defined in the default Nagios Core configuration.
notification_interval: This defines how regularly the host should repeat its notifications if it is having problems. Here, we've used a value of 60, which corresponds to 60 minutes, or 1 hour.
notification_period: This references the time period during which Nagios Core should send out notifications if there are problems. Here, we've again used the 24x7 time period, but for other hosts, another time period such as workhours might be more appropriate.
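The effect of max_check_attempts can be pictured as a simple retry loop. The following is an illustration of the idea only, not how Nagios Core's scheduler is implemented (for one thing, the real scheduler spaces its retries out over time); the function name check_with_retries is hypothetical.

```shell
#!/bin/sh
# Run a check command up to $1 times; report UP on the first success,
# or DOWN once every attempt has failed.
# $1 = maximum number of attempts; remaining arguments = the check command
check_with_retries() {
    max="$1"
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@" >/dev/null 2>&1; then
            echo UP
            return 0
        fi
        attempt=$((attempt + 1))
    done
    echo DOWN
    return 1
}

# Usage: check_with_retries 3 ping -c 1 192.0.2.21
```

With a value of 3, a single dropped reply does not mark the host as DOWN; only three failures in a row do.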

Note that we added the definition in its own file called sparta.example.net.cfg and then referred to it in the main configuration file nagios.cfg. This is simply a conventional way of laying out hosts; keeping definitions in their own files happens to be a tidy way to manage things.
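If you script the step that refers to the new file from nagios.cfg, it helps to make it safe to run more than once. A minimal sketch, assuming the cfg_file line format shown in this recipe; the function name register_cfg_file is hypothetical.

```shell
#!/bin/sh
# Append a cfg_file line to the main configuration file unless an
# identical line is already present.
# $1 = main configuration file, $2 = object file to include
register_cfg_file() {
    line="cfg_file=$2"
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    grep -qxF "$line" "$1" || printf '%s\n' "$line" >> "$1"
}

# Usage (as root):
#   register_cfg_file /usr/local/nagios/etc/nagios.cfg \
#       /usr/local/nagios/etc/objects/sparta.example.net.cfg
```

Running it a second time leaves the file unchanged, so the same host is never included twice.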

There's more...

There are a lot of other useful parameters for hosts, but the ones we've used include everything that's required.

While this is a perfectly valid way of specifying a host, it's more typical to define a host based on a template, which holds definitions of how often the host should be checked, who should be contacted when its state changes and on what basis, and similar properties. Nagios Core defines a simple template host called generic-host, which we can build on by adding the use directive to the host definition:

```
define host {
    use                 generic-host
    name                sparta
    host_name           sparta.example.net
    address             192.0.2.21
    max_check_attempts  3
    contacts            nagiosadmin
    check_period        24x7
    check_command       check-host-alive
}
```

This uses all the parameters defined for generic-host and then adds the details of the specific host that needs to be checked. If you're curious to see what's defined in generic-host, you'll find its definition in /usr/local/nagios/etc/objects/templates.cfg.
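To read just the template's definition without paging through the whole file, something like the following could be used. It is a rough sketch, assuming each definition starts with a "define host" line and ends at the first line containing a closing brace; the function name show_template is hypothetical, and the template name is treated as a regular expression, so unusual characters in it may need escaping.

```shell
#!/bin/sh
# Print the "define host" block whose "name" directive equals $1.
# $1 = template name, $2 = configuration file to search
show_template() {
    awk -v name="$1" '
        # Start collecting at each host definition.
        /^[ \t]*define[ \t]+host[ \t]*[{]/ { block = ""; inblock = 1 }
        inblock { block = block $0 "\n" }
        # At the closing brace, print the block if its name matches.
        inblock && /[}]/ {
            if (block ~ ("[ \t\n]name[ \t]+" name "[ \t\n;]"))
                printf "%s", block
            inblock = 0
        }
    ' "$2"
}

# Usage:
#   show_template generic-host /usr/local/nagios/etc/objects/templates.cfg
```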
