Monitoring Hadoop

By: Aman Singh
System logging


Logging is an important part of any application or system. Logs record progress, errors, service states, security breaches, and repeated user failures, which helps you troubleshoot and analyze these events. The main aspects of log management are collecting, transporting, storing, alerting on, and analyzing events.

Collection

Logs can be generated in many ways: either through system facilities such as syslog, or by applications that write their own logs directly. In either case, log collection must be organized so that the logs can be easily retrieved when needed.

Transportation

Logs can be transferred from multiple nodes to a central location, so that instead of parsing logs on hundreds of servers individually, you can manage them in one place through central logging. The size of the logs transferred across the network, and how often they need to be transferred, are also matters of concern.
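As a rough illustration of central logging from the application side, the following sketch uses Python's standard logging.handlers.SysLogHandler to forward records over UDP to a syslog collector. The logger name, the choice of the local0 facility, and the 127.0.0.1 address are assumptions made for the example; a real deployment would point at its own collector host.

```python
import logging
import logging.handlers


def build_central_logger(host: str, port: int = 514) -> logging.Logger:
    """Return a logger that forwards records to a syslog collector over UDP."""
    logger = logging.getLogger("central-demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=(host, port),  # (host, port) tuple selects UDP by default
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0,  # assumed facility
    )
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # 127.0.0.1 stands in for the central log server in this sketch.
    log = build_central_logger("127.0.0.1")
    log.info("datanode started")
```

UDP keeps the sender simple but gives no delivery guarantee, which is one reason the volume and frequency of transfers deserve attention.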

Storage

Storage needs depend on the log retention policy, and the cost varies with the storage medium and location, such as cloud storage or local storage.
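To make the link between retention and capacity concrete, here is a back-of-the-envelope estimate. The function name, the parameters, and the sample figures (100 nodes, 512 MB per node per day, 30 days, 4:1 compression) are illustrative assumptions, not measurements from any cluster.

```python
def estimate_log_storage_gb(nodes, mb_per_node_per_day, retention_days,
                            compression_ratio=1.0):
    """Estimate the storage needed to retain logs, in gigabytes.

    compression_ratio is raw size divided by stored size (4.0 means 4:1).
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    total_mb = nodes * mb_per_node_per_day * retention_days / compression_ratio
    return total_mb / 1024


if __name__ == "__main__":
    # 100 * 512 * 30 = 1,536,000 MB raw; 384,000 MB at 4:1; 375 GB stored.
    print(estimate_log_storage_gb(100, 512, 30, compression_ratio=4))
```

Doubling the retention period doubles the estimate, so the retention policy is usually the first figure to settle.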

Alerting and analysis

The collected logs need to be parsed, and alerts should be sent for any errors. Errors need to be detected within a specified time frame, and remediation should follow.
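The idea of detecting errors within a time frame can be sketched as follows. The example assumes log lines that begin with a "YYYY-MM-DD HH:MM:SS" timestamp followed by a level such as ERROR or FATAL; the function name, the five-minute window, and the threshold of three are hypothetical choices, and the sample lines are made up.

```python
import re
from datetime import datetime, timedelta

# Levels treated as errors in this sketch.
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL)\b")


def should_alert(lines, now, window=timedelta(minutes=5), threshold=3):
    """Return True if at least `threshold` error lines fall within `window` of `now`."""
    cutoff = now - window
    recent_errors = 0
    for line in lines:
        try:
            # The first 19 characters are assumed to hold the timestamp.
            ts = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a parsable timestamp
        if cutoff <= ts <= now and ERROR_PATTERN.search(line):
            recent_errors += 1
    return recent_errors >= threshold


if __name__ == "__main__":
    sample = [
        "2015-03-01 09:00:00,000 ERROR old failure outside the window",
        "2015-03-01 10:00:01,123 ERROR disk failure",
        "2015-03-01 10:01:00,000 INFO heartbeat ok",
        "2015-03-01 10:02:00,000 FATAL service stopped",
        "2015-03-01 10:03:00,000 ERROR disk failure",
    ]
    print(should_alert(sample, datetime(2015, 3, 1, 10, 4, 0)))
```

A real alerting pipeline would send a notification instead of returning a boolean, but the windowing logic stays the same.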

Analyzing logs to identify the traffic patterns of a website is important. The logs of the Apache web server hosting a website can be analyzed to find out which IP addresses visited the site, and with which user agent or operating system. All of this information can be used to target advertisements at various segments of the internet user base.
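As a minimal sketch of this kind of analysis, the following code counts client IP addresses and user agents. It assumes the web server writes Apache's "combined" log format; the regular expression, the function name, and the sample lines are illustrative and would need adjusting for a custom LogFormat.

```python
import re
from collections import Counter

# Pattern for the Apache "combined" format (assumed here):
# ip ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Count requests per client IP and per user agent."""
    ips, agents = Counter(), Counter()
    for line in lines:
        match = COMBINED.match(line)
        if not match:
            continue  # ignore lines that do not fit the assumed format
        ips[match.group("ip")] += 1
        agents[match.group("agent")] += 1
    return ips, agents


if __name__ == "__main__":
    sample = [
        '192.0.2.10 - - [10/Oct/2015:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
        '198.51.100.7 - - [10/Oct/2015:13:56:01 +0000] "GET /a HTTP/1.1" 404 512 "-" "curl/7.40.0"',
        '192.0.2.10 - - [10/Oct/2015:13:57:12 +0000] "GET /b HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    ]
    ips, agents = summarize(sample)
    print(ips.most_common(1), agents.most_common(1))
```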

The syslogd and rsyslogd daemons

Logging on a Linux system is controlled by the syslogd daemon and, more recently, by the rsyslogd daemon. There is one more logger, klogd, which logs kernel messages.

The syslogd daemon is configured by /etc/syslog.conf (rsyslogd uses /etc/rsyslog.conf), and each rule in the file has the format facility.priority log_location.

The logging facilities and priorities are described in the following tables:

Facility      Description
authpriv      Security and authorization messages.
cron          Clock daemons (atd and crond).
kern          Kernel messages.
local[0-7]    Reserved for local use.
mail          The e-mail system.

The following table describes the priorities:

Priority      Description
debug         Debugging information.
info          General informative messages.
warning       Warning messages.
err           An error condition.
crit          A critical condition.
alert         Immediate action is required.
emerg         The system is unusable.

For example, logging for e-mail events can be configured as follows:

mail.* /var/log/mail

This rule logs messages of every priority from the mail facility to the /var/log/mail file.
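To show how a rule in the facility.priority log_location format selects messages, here is a simplified matcher. It relies on the standard syslog behavior that a rule applies to the named priority and everything more severe. It deliberately ignores richer selector syntax (semicolon-separated selectors, comma-separated facilities, and the =, !, and none modifiers), and the function names and the kern rule path are hypothetical.

```python
# Priorities from least to most severe. "notice" is a standard syslog
# priority that sits between info and warning.
PRIORITIES = ["debug", "info", "notice", "warning", "err", "crit", "alert", "emerg"]


def parse_rule(rule):
    """Split 'facility.priority log_location' into its three parts."""
    selector, action = rule.split(None, 1)
    facility, priority = selector.split(".", 1)
    return facility, priority, action.strip()


def rule_matches(rule, facility, priority):
    """Return True if a message with this facility and priority is selected."""
    rule_facility, rule_priority, _ = parse_rule(rule)
    if rule_facility not in ("*", facility):
        return False
    if rule_priority == "*":
        return True
    # A rule selects its own priority and anything more severe.
    return PRIORITIES.index(priority) >= PRIORITIES.index(rule_priority)


if __name__ == "__main__":
    print(parse_rule("mail.* /var/log/mail"))
    print(rule_matches("mail.* /var/log/mail", "mail", "info"))
    # /var/log/kern is a made-up destination for this example.
    print(rule_matches("kern.err /var/log/kern", "kern", "warning"))
```

Under these assumptions, mail.* accepts every mail message, while a kern.err rule would skip warnings but accept crit, alert, and emerg.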

As another example, restart the logging daemon so that it starts capturing logs from the various daemons and applications. Use whichever of the following commands matches the daemon installed on your system:

$ service syslog restart
$ service rsyslog restart

Note

In versions released after RHEL 5 or CentOS 5, syslogd has been replaced by rsyslogd.