Nagios Core Administration Cookbook

By: Tom Ryder

Overview of this book

Network monitoring requires significantly more than just pinging hosts. This cookbook will help you to comprehensively test your networks' major functions on a regular basis. "Nagios Core Administration Cookbook" will show you how to use Nagios Core as a monitoring framework that understands the layers and subtleties of the network for intelligent monitoring and notification behaviour. It introduces the reader to methods of extending Nagios Core into a network monitoring solution. The book begins by covering the basic structure of hosts, services, and contacts, and then goes on to discuss advanced usage of checks and notifications, and configuring intelligent behaviour with network paths and dependencies. The cookbook emphasizes using Nagios Core as an extensible monitoring framework. By the end of the book, you will have learned that Nagios Core is capable of much more than pinging a host or checking whether websites respond.

Creating a new HTTP service

In this recipe, we'll create a new service to check on an existing host. Specifically, we'll check our sparta.naginet server to see if it's responding to HTTP requests on the standard HTTP port, TCP port 80. To do this, we'll use a predefined command called check_http, which in turn uses one of the standard Nagios Plugins, also called check_http. If you don't yet have a web server defined as a host in Nagios Core, then you may like to try the Creating a new network host recipe in this chapter.

After we've done this, not only will our host be checked for a PING response by its check_command, but Nagios Core will also run a periodic check to ensure that the HTTP service on that host is responding to requests.

Getting ready

You'll need a working Nagios Core 3.0 or greater installation with a web interface, all the Nagios Plugins installed, and at least one host defined. If you need to set up a host definition for your web server first, then you might like to read the Creating a new network host recipe in this chapter, for which the requirements are the same.

It would be a good idea to test that the Nagios Core server is actually able to contact the web server first, to ensure that the check we're about to set up should succeed. The standard telnet tool is a fine way to test that a response comes back from TCP port 80 as we would expect from a web server:

tom@olympus:~$ telnet sparta.naginet 80
Trying 10.128.0.21...
Connected to sparta.naginet.
Escape character is '^]'.
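
Telnet only confirms that the port accepts connections. As a further sanity check, the plugin itself can be run by hand from the monitoring server before it's wired into the configuration. The following is a minimal sketch rather than part of the recipe: the check_web_host function name is hypothetical, and the plugin directory /usr/local/nagios/libexec is an assumption based on a default source install, so adjust it for your system.

```shell
# Assumed plugin directory for a default source install; override if needed.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/local/nagios/libexec}"

# Hypothetical helper: run check_http by hand against one host.
check_web_host() {
    host="$1"
    plugin="$PLUGIN_DIR/check_http"
    if [ ! -x "$plugin" ]; then
        echo "UNKNOWN - $plugin not found or not executable"
        return 3
    fi
    # -H sets the host name to send the HTTP request to
    "$plugin" -H "$host"
}
```

Calling check_web_host sparta.naginet should print a single status line from the plugin, and the return code follows the plugin convention of 0 for OK through 3 for UNKNOWN.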

How to do it...

We can create the service definition for sparta.naginet as follows:

Change to the directory containing the file in which the sparta.naginet host is defined, and edit it as follows:
```
# cd /usr/local/nagios/etc/objects
# vi sparta.naginet.cfg
```

Add the following code snippet to the end of the file, substituting in the value of the host's host_name directive:

define service {
    host_name              sparta.naginet
    service_description    HTTP
    check_command          check_http
    max_check_attempts     3
    check_interval         5
    retry_interval         1
    check_period           24x7
    notification_interval  60
    notification_period    24x7
    contacts               nagiosadmin
}

Restart the Nagios Core server:
```
# /etc/init.d/nagios restart
```
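
A typo in the new definition can stop Nagios Core from starting, so it may be worth checking the configuration before restarting. The sketch below is one possible way to combine the two steps. The verify_and_restart name is hypothetical, and the binary and configuration paths are assumptions based on a default source install.

```shell
# Assumed locations for a default source install; override as needed.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
NAGIOS_INIT="${NAGIOS_INIT:-/etc/init.d/nagios}"

# Hypothetical helper: restart only if the configuration check passes.
verify_and_restart() {
    # nagios -v parses the whole configuration and exits non-zero on errors
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null; then
        "$NAGIOS_INIT" restart
    else
        echo "Configuration check failed; not restarting" >&2
        return 1
    fi
}
```

Run verify_and_restart as root. If the check fails, running the nagios -v command directly shows the full list of errors and warnings.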

If the server restarted successfully, the web interface should show a new service under the Services section, in PENDING state as the service awaits its first check.

Within a few minutes, the service's state should change to OK once the check has run and succeeded with an HTTP/1.1 200 OK response, or similar.

If the check had problems, perhaps because the HTTP daemon isn't running on the target server, then the check may show CRITICAL instead. This probably doesn't mean the configuration is broken; it more likely means the network or web server isn't working.

How it works...

The configuration we've added defines a simple service check for an existing host, which checks up to three times whether the HTTP daemon on that host is responding to a simple HTTP/1.1 request. If Nagios Core can't get a response to its check, then it will flag the state of the service as CRITICAL, and will try again up to two more times before sending a notification. The service will be visible in the Nagios Core web interface and we can check its state at any time. Nagios Core will continue testing the server on a regular basis and flagging whether the checks were successful or not.
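
The retry behaviour described above can be sketched as a small shell function. This is a deliberately simplified model written for illustration, not how Nagios Core is implemented: the simulate_checks name is hypothetical, and the real scheduler also handles flapping, recoveries, and repeat notifications.

```shell
# Simplified model of the retry logic; real Nagios Core state handling also
# covers flapping, recoveries, and repeat notifications.
simulate_checks() {
    max_attempts="$1"
    shift
    attempt=0
    for result in "$@"; do
        if [ "$result" = "OK" ]; then
            attempt=0
            echo "$result: no problem, next check after check_interval"
        elif [ "$attempt" -ge "$max_attempts" ]; then
            echo "$result: problem persists, next check after check_interval"
        else
            attempt=$((attempt + 1))
            if [ "$attempt" -lt "$max_attempts" ]; then
                echo "$result: attempt $attempt of $max_attempts, retry after retry_interval"
            else
                echo "$result: attempt $attempt of $max_attempts, notify contacts"
            fi
        fi
    done
}

# One good check followed by three failures, with max_check_attempts set to 3
simulate_checks 3 OK CRITICAL CRITICAL CRITICAL
```

With these arguments, the first two failures are reported as retries, and only the third consecutive failure reaches the point where contacts would be notified.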

It's important to note that the service is like a property of a particular host; we define a service to check for a specific host, in this case, the sparta.naginet web server. That's why it's important to get the definition for host_name right.

The directives we defined in the preceding configuration are as follows:

host_name: This references the host definition to which this service applies. It must match the host_name directive of the appropriate host.
service_description: This is a name for the service itself, something human-recognizable that will appear in alerts and in the web interface for the service. In this case, we've used HTTP.
check_command: This references the command that should be used to check the service's state. Here, we're referring to a command defined in Nagios Core's default configuration called check_http, which refers to a plugin of the same name in the Nagios Plugins set.
max_check_attempts: This defines the total number of times Nagios Core should check the service, counting the first failed check, before it treats a state other than OK as confirmed and sends a notification. With a value of 3, a failing check is retried up to two more times.
check_interval: This defines how long Nagios Core should wait between checks when the service is OK, or once the number of checks given in max_check_attempts has been reached. The value is in minutes unless interval_length has been changed from its default in nagios.cfg.
retry_interval: This defines how long Nagios Core should wait between retries after first finding the service in a state other than OK.
check_period: This references the time period during which Nagios Core should run checks of the service. Here we've used the sensible 24x7 time period, as defined in Nagios Core's default configuration. Note that this can be different from notification_period; we can check the service's status without necessarily notifying a contact.
notification_interval: This defines how long Nagios Core should wait before re-sending a notification while the service remains in a state other than OK.
notification_period: This references the time period during which Nagios Core should send notifications if it finds the service in a problem state. Here we've again used 24x7, but for some less critical services it might be appropriate to use a time period such as workhours.
contacts: This references the contact or contacts that should be notified about problems with the service. Here we've used nagiosadmin.
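
To see how the three timing directives interact, the arithmetic for the values in this recipe can be worked through as follows. This is a rough estimate only: it assumes the default interval_length of 60 seconds, so that the values are minutes, and it ignores the time the checks themselves take to run.

```shell
# Values from the service definition; units are minutes, assuming the
# default interval_length of 60 seconds in nagios.cfg.
check_interval=5
retry_interval=1
max_check_attempts=3

# Retries needed after the first failed check to confirm the problem
retries=$((max_check_attempts - 1))
confirm_delay=$((retries * retry_interval))

# Worst case: the outage begins just after a successful regular check
worst_case=$((check_interval + confirm_delay))

echo "Confirmation delay after first failed check: ${confirm_delay} minutes"
echo "Approximate worst-case outage-to-notification time: ${worst_case} minutes"
```

With these values, a problem is confirmed about 2 minutes after the first failed check, and an outage could last roughly 7 minutes before anyone is notified. Lowering check_interval shortens that window at the cost of more frequent checks.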

Note that we added the service definition to the same file as the host definition, directly after it. We can actually place the definition anywhere we like, but this happens to be a good way of keeping things organized.
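
Wherever the definition lives, Nagios Core only reads it if the file is named by a cfg_file directive, or sits under a directory named by a cfg_dir directive, in the main configuration file. A quick way to list those entries might look like the following sketch, in which the list_object_sources name is hypothetical and the nagios.cfg path is an assumption based on a default source install.

```shell
# Assumed main configuration path for a default source install.
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

# Hypothetical helper: list the object files and directories that the
# given main configuration file tells Nagios Core to load.
list_object_sources() {
    grep -E '^[[:space:]]*cfg_(file|dir)=' "$1"
}
```

Running list_object_sources "$NAGIOS_CFG" prints one line per configured file or directory, which makes it easy to confirm that sparta.naginet.cfg is among them.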

There's more...

The service we've set up to monitor on sparta.naginet is an HTTP service, but that's just one of many possible services we could monitor on our network. Nagios Core defines many different commands for its core plugin set, such as check_smtp, check_dns, and so on. These commands, in turn, all point to programs that actually perform a check and return the results to the Nagios Core server to be dealt with. The important thing to take away from this is that a service can monitor pretty much anything, and there are hundreds of plugins for common network monitoring checks available on the Nagios Exchange website: http://exchange.nagios.org/.
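
The contract between Nagios Core and a plugin is small: the program prints a line of status text and signals the state through its exit code, with 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. As a rough illustration of that contract, here is a hypothetical check written as a shell function. The check_file_exists name and its messages are invented for this example and are not part of the Nagios Plugins set.

```shell
# Hypothetical check written as a function so it can be tried in a shell
# session; a real plugin would be a standalone script that uses exit.
check_file_exists() {
    if [ -z "$1" ]; then
        echo "FILE UNKNOWN - no path given"
        return 3
    fi
    if [ -e "$1" ]; then
        echo "FILE OK - $1 exists"
        return 0
    fi
    echo "FILE CRITICAL - $1 is missing"
    return 2
}
```

Because the state travels in the return code rather than in the text, the same pattern works for checks of almost any kind.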

There are many more possible directives for services, and in practice, even for simple setups, we'll more likely want to extend a service template for our service. This allows us to define values that we might want for a number of services, such as how long they should be in a CRITICAL state before a notification event takes place and someone gets contacted to deal with the problem.

One such template that Nagios Core's default configuration defines is called generic-service, and we can use it as a basis for our new service by referring to it with the use keyword:

define service {
    use                    generic-service
    host_name              sparta.naginet
    service_description    HTTP
    check_command          check_http
}

This may work well for you, as there are a lot of very sensible default values set by the generic-service template, which makes things a lot easier. We can inspect these values by looking at the template's definition in /usr/local/nagios/etc/objects/templates.cfg. This is the same file that includes the generic-host definition that we may have used earlier.
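
Since templates.cfg holds several templates, it can help to print only the one we're interested in. The function below is a small sketch of one way to do that with awk. The show_template name is hypothetical, and it assumes each definition starts with a define line and ends with a closing brace on its own line, as in the sample configuration.

```shell
# Hypothetical helper: print the definition whose "name" directive matches.
# Assumes each block starts with "define ..." and ends with a line that
# begins with "}".
show_template() {
    awk -v tpl="$1" '
        $1 == "define" { block = ""; inside = 1; wanted = 0 }
        inside { block = block $0 "\n" }
        inside && $1 == "name" && $2 == tpl { wanted = 1 }
        inside && $1 ~ /^}/ {
            if (wanted) printf "%s", block
            inside = 0
        }
    ' "$2"
}
```

For example, show_template generic-service /usr/local/nagios/etc/objects/templates.cfg should print just that template, including any directives our new service inherits from it.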
