Zabbix: Enterprise Network Montioring Made Easy

By : Rihards Olups, Patrik Uytterhoeven, Andrea Dalle Vacche
Overview of this book

Nowadays, monitoring systems play a crucial role in any IT environment. They are extensively used to not only measure your system’s performance, but also to forecast capacity issues. This is where Zabbix, one of the most popular monitoring solutions for networks and applications, comes into the picture. With an efficient monitoring system in place, you’ll be able to foresee when your infrastructure runs under capacity and react accordingly. Due to the critical role a monitoring system plays, it is fundamental to implement it in the best way from its initial setup. This avoids misleading, confusing, or, even worse, false alarms that can disrupt an efficient and healthy IT department. This course is for administrators who are looking for an end-to-end monitoring solution. It will get you accustomed to the powerful monitoring solution, starting with installation and explaining the fundamentals of Zabbix. Moving on, we explore the complex functionalities of Zabbix in the form of enticing recipes. These recipes will help you to gain control of your infrastructure. You will be able to organize your data in the form of graphs and charts along with building intelligent triggers for monitoring your network proactively. Toward the end, you will gain expertise in monitoring your networks and applications using Zabbix. This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products: Zabbix Network Monitoring, Second Edition; Zabbix Cookbook; and Mastering Zabbix, Second Edition.

Common issues


Here are a few issues you might face:

Installation

There are several common stumbling blocks in the installation process, some caused by well-hidden factors.

Compilation

  • Q: I am trying to compile Zabbix on a 64-bit system. I have the corresponding development packages installed, but Zabbix claims they are not present.

  • A: Double-check that the 64-bit development packages are installed, not just the 32-bit ones.

  • Q: I am trying to compile Zabbix from an SVN checkout, but the configuration script fails with this error:

    syntax error near unexpected token `IKSEMEL,iksemel,'
  • A: Install the pkg-config package and rerun the commands to generate the configuration script.

  • Q: I am trying to compile Zabbix, but it fails.

  • A: It is useful to reduce the number of possible causes. Verify that you are not compiling with --enable-static, which is known to cause compilation problems. If compilation fails without that flag, check the config.log file contents in the source directory. It often contains exact error details.

Frontend

  • Q: I have installed the Zabbix frontend. What's the default username and password?

  • A: The username is Admin, and the password is zabbix.

  • Q: I'm setting up Zabbix from an SVN checkout. When I switch languages in the frontend, nothing happens.

  • A: In the frontend directory, in the locale subdirectory, there's a make_mo.sh script. It compiles the needed mo files out of the translation source po files—run it. Note that it will need Gettext tools, and the webserver might have to be restarted afterwards.

Backend

  • Q: Zabbix is working correctly, but some/all graphs are not displayed.

  • A: Refer to the Apache error log for more details. Usually, this is caused by the PHP script memory limit being too low—if that is the case, increase it by setting the memory_limit parameter to a higher value and restarting the webserver. Another possible cause is a broken conf/zabbix.conf.php file—verify that it does not have any weird characters, especially at the end of the file.

  • Q: Complex views, such as screens with many elements, sometimes fail to load. What could be causing this?

  • A: Like the previous problem, check that the PHP memory limit has not been exceeded. Additionally, check the PHP script timeout (max_execution_time parameter) and increase it if necessary.

  • Q: My graphs have gaps.

  • A: It's not only graphs—data is missing in the database as well. This problem should be resolved by finding out what causes the data loss. Common reasons for this are:

    • Network problems: If the network is unreliable, data will be missing.

    • An overloaded monitored device: For example, if you have added a switch with many ports and are monitoring several items on each port very often, try increasing the intervals and disabling unneeded items.

    • An overloaded Zabbix server: It's usually the database. Check the system load on the Zabbix database server, especially iowait.

  • Q: I had Zabbix installed and running, but it is suddenly showing me the installation screen again.

  • A: Check the accessibility of the conf/zabbix.conf.php file.

  • Q: The conf/zabbix.conf.php file is there, but I still see the installation screen.

  • A: In some distribution packages, the frontend might expect the frontend configuration file to be in /etc/zabbix/web or a similar location. Check the package documentation.

  • Q: I am trying to open a large page with many elements, but refresh kicks in before the page even finishes loading. How can I solve this?

  • A: Increase the refresh period in your user profile. While the page loading speed won't be improved by that, at least the page will get a chance to load completely.

  • Q: The clock on my server is correct, but the frontend shows incorrect times.

  • A: Check that the time zone is set correctly in the PHP configuration.

  • Q: Zabbix server is running, but the frontend claims it is not.

  • A: This could be caused by multiple factors:

    • Check the conf/zabbix.conf.php file—the frontend uses the server address and port specified there to query the Zabbix server process.

    • Make sure no firewall is blocking connections from the frontend to the Zabbix server.

    • Make sure SELinux is not blocking connections from the frontend to the Zabbix server.

    • Make sure you have at least one trapper process enabled—they accept frontend connections. It is also possible that there are not enough trappers to service all requests. This is especially likely if the message about the server being unavailable appears only every now and then. Monitor the busy rate of the trapper processes like we did in Chapter 22, Zabbix Maintenance.

  • Q: I am having a problem with the frontend that is not listed here.

  • A: Check the Apache error log and PHP log—these often offer an insight into the cause. Also go to Administration | Users or User groups and add your user to the Enabled debug mode group. Afterwards, all frontend pages will have a small Debug control in the lower-right corner. Clicking on it will show a lot of detail about that specific page, including the exact API and SQL queries that were performed. Debug mode can use more resources—if some frontend pages stop working after enabling debug mode, try disabling it.

Locked out of the frontend

A common mistake, performed by both new and seasoned users, is locking oneself out of the frontend. This can happen in several ways, but we're more interested here in how to get back in.

  • Q: I forgot my password and tried to log in until the Zabbix frontend stopped responding.

  • A: By default, Zabbix denies access for 30 seconds after five failed login attempts, so just wait for 30 seconds. You can customize these values in include/defines.inc.php:

    • ZBX_LOGIN_ATTEMPTS: The number of unsuccessful attempts after which Zabbix denies access

    • ZBX_LOGIN_BLOCK: How long to deny access for, in seconds

  • Q: I have forgotten my Admin user password, or I have been tasked with managing a Zabbix installation where the Admin user's password is not known.

  • A: You can easily reset the Admin user password by directly modifying the database:

    mysql> update zabbix.users set passwd=MD5('somepassword') where alias='Admin';
    

    Note

    Of course, replace somepassword with some other string. Keep in mind that MySQL by default saves console commands in the ~/.mysql_history file, so you might want to set the password to some temporary version and update it in the frontend later.
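If you would rather not type the plain-text password into the MySQL console at all, the digest can be computed elsewhere and pasted into the statement as passwd='<digest>'. The following is a minimal sketch, assuming the frontend stores an unsalted MD5 hex digest as the statement above implies; md5_hex is a hypothetical helper name, not part of Zabbix.

```python
import hashlib


def md5_hex(password: str) -> str:
    """Return the lowercase hex MD5 digest of a UTF-8 encoded password."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # "somepassword" is a placeholder; replace it with a temporary password.
    print(md5_hex("somepassword"))
```

The printed value is a 32-character hexadecimal string, the same shape as the output of MySQL's MD5() function.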

  • Q: I changed the authentication method, but it didn't work as planned and now I can't log in anymore.

  • A: You can restore Zabbix's internal authentication method by editing the database:

    mysql> update zabbix.config set authentication_type='0' where configid='1';
    

Authentication type 0 is the internal one. For the record, other types are 1 (LDAP) and 2 (HTTP). Zabbix expects only one config table entry with a configid value of 1.

Monitoring

Sometimes monitoring something proceeds without a hitch; sometimes it just won't work.

General monitoring

  • Q: I added a host or item, but I don't see it in Monitoring | Latest data.

  • A: Check that the filter there includes the host or its group. Make sure that the Show items without data checkbox is marked and that other filter options do not exclude the item you are looking for.

  • Q: I can see my host in Latest data, and new values are coming in—but it is missing in Monitoring | Overview.

  • A: Overview is probably set to display triggers—verify that the host has triggers configured. Hosts without triggers are not displayed in trigger mode.

Monitoring with the Zabbix agent

  • Q: I am trying to monitor a host using passive Zabbix agent checks, but it doesn't work.

  • A: Common reasons why Zabbix agent items won't work include the following:

    • The Zabbix agent daemon is not running. Simple, right? Still, start by checking that it is actually running.

    • The Zabbix agent daemon is not listening on the correct port or interface. You can check which port and interface the Zabbix agent daemon is listening on by running netstat -ntpl on the monitored host. The default agent daemon port is 10050.

    • The server IP address in the agent daemon configuration file is incorrect. Check the configuration file and make sure the Server directive specifies the IP that the Zabbix server will be connecting from.

    • Network problems prevent the server from connecting to the agent daemon properly. This includes things such as local and network firewalls blocking connections, but also some network devices and setups actually changing the source IP address of the Zabbix server outgoing connections. Test connectivity by executing telnet <monitored host IP> 10050 from the Zabbix server. If you have customized the agent listen port, use that port in this command. If the connection is not opened, debug it as a network problem. If the connection is immediately closed, the Zabbix agent daemon does not see the connection as coming from the IP address set in the configuration file. Note that, in some cases, you might have to actually use the IPv6 address, as the Zabbix agent is receiving that as one of the incoming connections.
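The telnet test above distinguishes three outcomes: the connection cannot be opened, it is opened and immediately closed, or it stays open. The same classification can be sketched in Python when telnet is not installed. This is only an illustration under stated assumptions: probe_agent_port and its return strings are hypothetical names, and the three-second timeout is an arbitrary choice.

```python
import socket


def probe_agent_port(host: str, port: int = 10050, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt, roughly like a manual telnet test.

    Returns one of:
      "unreachable"    - the connection could not be opened (network problem)
      "closed_by_peer" - opened, then closed right away by the remote side
      "open"           - opened and still open when the timeout expired
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, timed out, no route, and so on.
        return "unreachable"
    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(1)
        except socket.timeout:
            # Nothing arrived and nobody hung up: the peer is waiting for us.
            return "open"
        except OSError:
            # For example, a connection reset by the remote side.
            return "closed_by_peer"
        # An empty read means the peer closed the connection.
        return "closed_by_peer" if data == b"" else "open"
```

A result of "closed_by_peer" corresponds to the case where the agent does not accept the source address, while "unreachable" should be debugged as a network problem.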

  • Q: I am trying to monitor a host using active Zabbix agent checks, but it does not work.

  • A: Active items are a bit more tricky. Here are some things to verify:

    • Check network connectivity as with normal items: from the monitored machine, execute telnet <Zabbix server IP> 10051. If you have customized the server listen port, use that port in this command.

      Note

      The Zabbix proxy IP address and port should be used in almost all commands if the host is monitored by a proxy.

    • Make sure to wait for the Zabbix server to refresh its configuration cache and that the time specified in the RefreshActiveChecks option in the agent daemon configuration file has passed before expecting results from the active items. If you want to force the agent to reload the list of items from the server, restart the agent.

    • Check whether the host name specified in the agent daemon configuration file in the Hostname option matches the one configured for the host in the frontend. Note that this is not the IP address or DNS name; only the host name will work—it is also not the visible name, but the so-called technical host name. Like nearly everything else in Zabbix, it is case-sensitive.

    • Make sure that the Zabbix server you want to send active checks to (or retrieve them from) is listed in the ServerActive option in the agent daemon configuration file.

  • Q: I am verifying that I can get the value on the monitored host, but the Zabbix agent says it is not supported or gives me wrong data.

  • A: There are several possible cases:

    • You are checking things such as the process count or using the zabbix_agentd -t syntax as root, but various permission limitations, including grsecurity and SELinux, can prevent access for the Zabbix agent. This includes the Zabbix agent showing the number of unique running processes as 0 even when with root access you can see the actual number.

    • Another case when the local process count differs from what the Zabbix agent returns: various interpreted processes, such as Python or Perl ones, can appear to the agent as interpreter processes, only with user processes as a parameter. Processes known to display this problem include amavisd and xend. In those situations, you can use a different approach, for example with the proc.num[python,,,xend] item key. This will look for Python processes having the xend string in their parameters.

    • The monitored instance is missing. For example, if you are asking for a metric with the net.if.in[eth0,bytes] key and the Zabbix agent claims it is not supported, verify that the eth0 interface actually exists.

    • Another server has an active Zabbix agent configured with the same host name and is also sending in data for this host.

  • Q: I modified a parameter in the agent daemon configuration file, but it ignores my changes.

  • A: Check several things:

    • Verify that the modified line is not commented out.

    • Make sure you are editing the file on the correct system.

    • Check that the Zabbix agent daemon uses the modified configuration file. All Zabbix daemons log the configuration file they are using when starting up.

    • Check for Include directives. Pay extra attention to ones that include all files in a directory, and nested includes.

    • Make sure you properly restarted the daemon. Note that simply running zabbix_agentd or zabbix_agentd restart will not restart the daemon.

      Note

      Some distribution packages may provide a configuration file and a convenient symlink to that file. If you use sed -i on a symlink, it does not edit the target file—it replaces the symlink with a regular file instead. Some versions of sed may provide an option called --follow-symlinks to edit the target file.
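When a parameter seems to be ignored, it can help to list every active (uncommented) occurrence of it in a configuration file, so that duplicates and commented-out lines stand out. The sketch below works on the file contents as a string; find_parameter is a hypothetical helper, the sample values are made up, and it does not follow Include directives, so included files have to be checked separately.

```python
def find_parameter(config_text: str, name: str):
    """Return (line_number, value) for every active 'name=value' line.

    Blank lines and lines starting with '#' are skipped. The parameter name
    is compared case-sensitively.
    """
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = stripped.partition("=")
        if sep and key.strip() == name:
            hits.append((number, value.strip()))
    return hits


if __name__ == "__main__":
    sample = (
        "# Server=10.0.0.1\n"
        "Server=192.0.2.10\n"
        "Hostname=web01\n"
        "Server=192.0.2.20\n"
    )
    for number, value in find_parameter(sample, "Server"):
        print(f"line {number}: Server={value}")
```

More than one hit for a single-value parameter is a hint that another line, possibly in an included file, is competing with the one you edited.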

  • Q: I see the configuration file specifying one value for a parameter, but the Zabbix agent uses a different value.

  • A: Refer to the answer to the previous question, especially the parts about making sure it's the correct file on the correct system and that Include directives do not override the first instance of the parameter.

  • Q: I'm trying to use active items or autoregistration on a Windows system, but the automatically acquired hostname is all uppercase and cut at 15 characters.

  • A: Set HostnameItem=system.hostname[host] in the agent daemon configuration file. We discussed this in Chapter 14, Monitoring Windows.

  • Q: I verified that an item works as expected when running zabbix_agentd -t or -p, but it does not work when I check the values in the frontend.

  • A: When manually running zabbix_agentd, the user and environment are likely different from a running daemon, so permissions and environment values differ. Check the detailed operations that that item is expected to perform and what could prevent it from succeeding with the Zabbix agent daemon permissions. Do not test zabbix_agentd directly as root. The best approach is testing against a running agent daemon with zabbix_get.

  • Q: I can get item values in Zabbix Server or with zabbix_get, but when I test with zabbix_agentd -t or -p, I get an error [m|ZBX_NOTSUPPORTED] [Collector is not started.].

  • A: Some items, including system.cpu.util and proc.cpu.util, have their values calculated by a running agent, as they need multiple samples before providing a useful value. Such items only work when an agent daemon is queried by Zabbix Server or zabbix_get.

User parameters

The following list details queries related to user parameters:

  • Q: My user parameter does not work.

  • A: Common causes that break user parameters are as follows:

    • A missing environment is one of the biggest stumbling blocks when setting up user parameters. The Zabbix agent does not explicitly initialize environment details, such as the HOME variable or other information. This can lead to an inability to read the required configuration files and other issues. Make sure to set the environment as required either by setting variables in the user parameter directly or in a wrapper script.

    • Again, restricted permissions for the Zabbix user will be confusing to debug if you run commands for testing as root, so always test user parameters as the Zabbix user. If you need root access for some check, configure access via sudo.

    • Returning unclean data can also easily break data retrieval. When retrieving data with user parameters, make sure it does not contain characters making it unsuitable for storage (such as returning 26.6 C for a float datatype item) or has other weird characters (such as having a CR/LF newline at the end of the data string).

    • By default, agent items will time out after three seconds. It is not suggested to increase this timeout in most cases, although it might be reasonably safe to do so if the user parameter is used as an active item. Remember that active items are not parallel: only one agent process works on them, one item at a time. Consider using zabbix_sender for such items instead.
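The unclean data problem described above (a unit suffix such as 26.6 C, or a trailing CR/LF) is usually handled in a wrapper script. As a rough sketch of that cleanup step, assuming the first decimal number in the output is the value you want, clean_numeric below is a hypothetical helper and not part of Zabbix:

```python
import re

# An optional sign, digits, and an optional fractional part.
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def clean_numeric(raw: str) -> str:
    """Extract the first decimal number from raw command output.

    Surrounding whitespace, CR/LF line endings, and unit suffixes are dropped.
    Raises ValueError when the output contains no number at all.
    """
    match = _NUMBER.search(raw.strip())
    if match is None:
        raise ValueError(f"no numeric value in {raw!r}")
    return match.group(0)
```

Raising an error on non-numeric output is a deliberate choice here: a failing check is easier to notice than a silently stored wrong value.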

SNMP devices

  • Q: My SNMP items do not work.

  • A: Double-check that the SNMP version and community string are set correctly. Specifying an incorrect SNMP version will often cause timeouts, making it harder to debug. Of course, check general connectivity and permissions by using the snmpwalk and snmpget commands from the Zabbix server.

    Note

    Additionally, make sure you are not overloading the monitored device by querying lots of values too frequently.

  • Q: My SNMP items either do not work at all or fail frequently.

  • A: Perhaps your device does not properly support SNMP GETBULK—try disabling bulk get support in the host properties for the SNMP interface.

  • Q: I imported a template, but the LLD fails with Invalid SNMP OID: pairs of macro and OID are expected.

  • A: The Zabbix SNMP LLD key syntax changed in Zabbix 2.4. Unfortunately, the XML import process was not updated accordingly, and the imported LLD rule uses the old syntax. Refer to Chapter 12, Automating Configuration, for details on the key syntax.

  • Q: I added MIB files, and they work with the command line tools, but Zabbix Server seems to be ignoring the MIB files.

  • A: Make sure to restart the server daemon—MIB definitions are loaded only upon startup.

  • Q: My SNMP items work, but some OIDs on a specific device do not, even though data appears in snmpwalk output.

  • A: Try snmpget with those OIDs. Some UPSes are known to have buggy firmware that prevents these metrics from working with GET requests, but they do work with the GETNEXT requests that snmpwalk uses. If this is the case, upgrade the firmware on the device.

  • Q: I listed all SNMP MIBs I want to use in /etc/snmp/snmp.conf, but Net-SNMP utilities do not use them all properly.

  • A: Some Net-SNMP versions silently trim lines in this file to 1024 symbols, including the newline character. Try splitting options on multiple lines so that a single line does not exceed 1023 printable characters.
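A quick way to spot affected lines is to measure them. The sketch below reports lines longer than the 1023-character limit mentioned above; overlong_lines is a hypothetical helper that takes the file contents as a string.

```python
def overlong_lines(text: str, limit: int = 1023):
    """Return (line_number, length) for lines longer than limit characters.

    The length excludes the newline character.
    """
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]
```

Passing it the contents of /etc/snmp/snmp.conf yields an empty list when no line is at risk of being trimmed.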

  • Q: I'm monitoring network traffic, but it returns incorrect data.

  • A: If it's a high-speed interface, make sure to use 64-bit counter OIDs such as ifHCInOctets and ifHCOutOctets.
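The reason is counter wraparound: a 32-bit octet counter such as ifInOctets can only count up to 2^32 bytes before starting again from zero. The short calculation below estimates how quickly that happens on a fully loaded link; wrap_seconds is a hypothetical helper used only for this arithmetic.

```python
def wrap_seconds(counter_bits: int, link_bits_per_second: float) -> float:
    """Seconds until an octet (byte) counter wraps on a fully loaded link."""
    bytes_per_second = link_bits_per_second / 8
    return (2 ** counter_bits) / bytes_per_second


if __name__ == "__main__":
    # A 32-bit counter on a saturated 1 Gbit/s link wraps in about 34 seconds.
    print(round(wrap_seconds(32, 1e9), 1))
```

If the item interval is longer than the wrap time, a wrap can go unnoticed between two polls and the computed traffic will be too low.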

  • Q: I'm adding SNMP devices to Zabbix, but adding new devices stops the monitoring of the previous devices. If I query each device with snmpget, they still respond as expected.

  • A: If it's SNMPv3, make sure all devices have a unique snmpEngineID variable.

IPMI monitoring

  • Q: I can't get the IPMI item to work.

  • A: There are several things to verify when IPMI items do not work:

    • Make sure that the Zabbix server is configured with IPMI support. Simple, but easy to miss.

    • Check whether the StartIPMIPollers option in the server's configuration file is set to the default value, 0. If it is, set it to 1 and restart Zabbix Server.

    • Make sure that the sensor names are correct. You can get the sensor names with IPMItool, and you have to use the name as it appears in the IPMItool output, with spaces and without quoting it.

    • Check using the latest OpenIPMI version. Older OpenIPMI versions are known to have various issues.

ICMP checks

  • Q: All of my ICMP checks are failing.

  • A: Here are a few possible reasons:

    • Check that fping is setuid root and can be executed by the user that Zabbix Server runs as.

    • Make sure SELinux does not prevent Zabbix from running fping. The grep fping /var/log/audit/audit.log command might reveal more information.

Problems with simple checks

Problems with zabbix_sender and trapper items

  • Q: I send in values with a timestamp, but a different timestamp is entered in the server database.

  • A: The Zabbix sender includes the current time on the host in a clock property for the whole request, and Zabbix server adjusts the timestamp for all values accordingly. It is not possible to tell the server not to do so or the sender not to send it. Either fix the time on the sending system or implement the basic protocol without sending the request timestamp.
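To see why a wrong clock on the sending host changes the stored timestamps, consider the adjustment as a simple offset. The sketch below assumes the server shifts every value by the difference between its own time and the clock sent with the request; the exact server logic is not shown in the text, and adjusted_timestamp is a hypothetical name.

```python
def adjusted_timestamp(value_clock: int, request_clock: int, server_now: int) -> int:
    """Shift a value timestamp by the sender-to-server clock difference.

    value_clock   - timestamp attached to the individual value
    request_clock - sender's current time, sent once for the whole request
    server_now    - server's own time when the request arrives
    """
    return value_clock + (server_now - request_clock)
```

Under this assumption, a sender whose clock runs 60 seconds ahead of the server has all of its values moved back by 60 seconds, regardless of the timestamps supplied for each value.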

General issues

  • Q: I am monitoring network traffic, but the numbers are unrealistically huge.

  • A: As the data is likely provided as a counter, make sure the result is stored as delta (speed per second) in Zabbix.
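Storing a value as speed per second means dividing the change in the counter by the time elapsed between two samples. A minimal sketch of that arithmetic follows; rate_per_second is a hypothetical helper, and it ignores counter wraparound.

```python
def rate_per_second(prev_value: int, prev_time: float,
                    cur_value: int, cur_time: float) -> float:
    """Convert two counter samples into an average per-second rate."""
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (cur_value - prev_value) / elapsed
```

For example, a counter that grows from 1000 to 4000 bytes over 30 seconds corresponds to 100 bytes per second, whereas the raw counter value itself would only ever grow.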

  • Q: I'm monitoring a 10G interface speed in bytes per second, and when the interface is loaded, I lose values.

  • A: Make sure Type of information is set to Numeric (unsigned). This way, you'll lose the precision of a fraction of a bit, but keep all the values.

  • Q: Zabbix does not like the formula for my calculated item.

  • A: Make sure to use proper quoting, especially if the referenced item keys have quotes. For example, if the referenced item key is key["parameter",param], in the calculated item formula, it can be used like this: last("key[\"parameter\",param]"). Notice the escaping of the inner double quotes with backslashes.
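When formulas are generated by a script, the escaping can be done programmatically. The sketch below wraps an item key in double quotes and escapes inner double quotes with backslashes, as in the example above. quote_key and last_of are hypothetical helpers, and only double quotes are escaped, which is an assumption based on that example.

```python
def quote_key(item_key: str) -> str:
    """Wrap an item key in double quotes, escaping inner double quotes."""
    return '"' + item_key.replace('"', '\\"') + '"'


def last_of(item_key: str) -> str:
    """Build a last() call for use in a calculated item formula."""
    return f"last({quote_key(item_key)})"


if __name__ == "__main__":
    print(last_of('key["parameter",param]'))
```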

  • Q: I'm trying to use an item key such as proc.num['apache'], but it does not work.

  • A: Zabbix supports only double quotes; do not use single quotes for quoting.

  • Q: I'm trying to use a trigger expression such as {host:item.LAST()=13}, but it does not work.

  • A: The cause is case sensitivity: the function name must be written in lowercase, as last(). Almost everything is case-sensitive in Zabbix: item keys, their parameters, host names, trigger functions, and so on. If you come from Windows, keep reminding yourself that case matters.

Triggers

  • Q: My trigger does not work, or Zabbix refuses to add my trigger.

  • A: Check the trigger's syntax, paying close attention to parentheses—is the correct type used? Are they all properly closed? The same goes for quotes, and remember about case sensitivity. Try splitting up complex expressions to pinpoint the error.
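Checking that brackets are properly paired is easy to automate. The sketch below is a generic bracket matcher, not the Zabbix expression parser: it only verifies that (), [] and {} are nested and closed correctly, and it skips anything inside double quotes. brackets_balanced is a hypothetical helper name.

```python
# Map each closing bracket to the opening bracket it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}


def brackets_balanced(expression: str) -> bool:
    """Return True if brackets outside double-quoted strings are balanced."""
    stack = []
    in_quotes = False
    escaped = False
    for char in expression:
        if in_quotes:
            if escaped:
                escaped = False      # character after a backslash is literal
            elif char == "\\":
                escaped = True
            elif char == '"':
                in_quotes = False    # closing quote
            continue
        if char == '"':
            in_quotes = True
        elif char in "([{":
            stack.append(char)
        elif char in PAIRS:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != PAIRS[char]:
                return False
    # Leftover openers or an unterminated quote also count as errors.
    return not stack and not in_quotes
```

A False result only says that the pairing is wrong somewhere; splitting the expression into smaller parts, as suggested above, is still the way to find the exact spot.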

Actions

  • Q: My actions do not work.

  • A: If the notifications do not appear in Reports | Action log, make sure the user you want to send notifications to has read permission to at least one of the hosts that participated in generating the event. Also check the action conditions, host maintenance settings, and action operations. Make sure your actions are not disabled—Zabbix can silently and automatically disable actions if the resources referenced in action conditions or operations are deleted. Also check the user media settings, such as severity and time filter, and whether the configured media type is enabled. If the messages do appear in the action log and there are error messages, hopefully the error is helpful. If the messages appear in the action log as successfully sent, check the logs on your MTA or other receiving system.

  • Q: My e-mail notifications are not sent, and I can see error messages such as [127.0.0.1] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA in e-mail server log files.

  • A: These messages are most likely caused by Zabbix monitoring the SMTP service, not by notification attempts. Check the permissions as mentioned in the previous question, and check the action log in Reports | Action log to find out why notifications are failing.

  • Q: Something happened, and my Zabbix server is sending out a terrible number of messages. Can I quickly stop that?

  • A: There exists a harsh method to stop runaway or excessive escalations—you can delete all the currently active escalations. Note that even when deleting the active escalations, Zabbix will create new ones—a good way to solve that is to have the action operation condition only send out messages when the trigger is not acknowledged, and acknowledge the problematic triggers. Beware: this will also remove correct escalations. In the correct database, execute this:

    mysql> delete from escalations;
    

Discoveries and autoregistration

  • Q: I remove a host from some host group, but it gets mysteriously re-added later.

  • A: Check network discovery and active agent autoregistration actions—most likely, they re-add the host.

  • Q: I move a host to be monitored by a specific proxy or Zabbix server instance, but it changes back to another proxy or Zabbix server instance later.

  • A: Check active agent autoregistration actions and the ServerActive parameter on the agent. The created host will be assigned to the proxy or server that last received the autoregistration request.

  • Q: I disable an LLD prototype, but the downstream items or triggers are not disabled.

  • A: Unfortunately, that's by design and cannot be changed. You can disable individual items and triggers in the configuration list. For changing the state of many downstream items or triggers, try using the Zabbix API.