Monitoring network computers with the Icinga Nagios fork

Server Observer

Article from ADMIN 01/2010
A network monitor supports administrators by displaying a full set of critical information at a central location and alerting in case of trouble.

A server can struggle for many reasons: System resources like the CPU, RAM, or hard disk space could be overloaded, or network services might have crashed. Depending on the applications that run on a server, consequences can be dire – from irked users to massive financial implications. Therefore, it is more important than ever in a highly networked world to be able to monitor the state of your server and take action immediately. Of course, you could check every server and service individually, but it is far more convenient to use a monitoring tool like Icinga.

Nagios Fork

Icinga [1] is a relatively young project that was forked from Nagios [2] when development of the popular open source network monitor stagnated. Icinga delivers improved database connectors (for MySQL, Oracle, and PostgreSQL), a more user-friendly web interface, and an API that lets administrators integrate numerous extensions without complicated modification of the Icinga core. The Icinga developers also seek to reflect community needs more closely and to integrate patches more quickly. The first stable version, 1.0, was released in December 2009, and the version counter has risen every couple of months ever since.

Icinga comprises three components: the core, the API, and the optional web interface. The core collects system health information generated by plugins and passes it via the IDOMOD interface to the Icinga Data Out Database (IDODB) or the IDO2DB service daemon. The PHP-based API accepts information from the IDODB and displays it in a web-based interface. Additionally, the API facilitates the development of add-ons and plugins. Icinga Web is designed to be a state-of-the-art web interface that is easily customized and with which administrators can keep an eye on the state of the systems they manage. At the time of writing, Icinga Web is still in beta, and it has a couple of bugs that make it difficult for me to recommend for production use.

If you only need to monitor a single host, Icinga is installed easily. Some distributions offer binaries in their repositories, but if this is not the case or if you prefer to use the latest version, the easy-to-understand documentation includes a quick-start guide (for the database via libdbi with IDOUtils), which can help you set up the network monitor in next to no time for access at http://Server/icinga. The challenges come later, because it is highly likely you will want to monitor a larger number of computers.

Icinga can monitor the private services on a computer, including CPU load, RAM, and disk usage, as well as public services like web, SSH, mail, and so on. My lab network environment consists of three computers, one of which acts as the Icinga server; the other two are a web server and a file server that send information to the monitoring server. Because no native approach lets you request information externally about CPU load, RAM, or disk space usage, you need to install an add-on, such as NRPE [3], on each monitored machine. The Icinga server then tells the add-on to execute the plugins locally on that machine and to transmit the results back.

Icinga sends the system administrator all the information needed and alerts the admin in case of an emergency. Advanced features that are a genuine help in daily work include groups, redundant monitoring environments, notification escalation, and check schedules.

Icinga differentiates between active and passive checks. Active checks are initiated by the Icinga service and run regularly at times specified by the administrator. For a passive check, an external application does the work and forwards the results to the Icinga server, which is useful if you can't actively check the computer because it resides behind a firewall, for example. A large number of plugins [4] already exist for various kinds of checks in Nagios and Icinga. But before the first check, the administrator needs to configure the computers and the services to monitor in Icinga.
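To make the passive case more concrete, the following minimal Python sketch formats a check result the way Nagios-style external commands expect it (PROCESS_SERVICE_CHECK_RESULT). The service name "Backup", the output text, and the fixed timestamp are assumptions made up for this illustration; they do not come from the lab setup described here.

```python
import time

def passive_service_result(host, service, return_code, output, timestamp=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    # Plugin return codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return code must be 0, 1, 2, or 3")
    if timestamp is None:
        timestamp = int(time.time())
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}\n")

# Hypothetical backup job on the file server reporting success.
line = passive_service_result("fileserver", "Backup", 0,
                              "OK - backup finished", timestamp=1262304000)
print(line, end="")
```

An external application would append such a line to the external command file named by the command_file setting in the main configuration file; its location depends on the installation. The matching service would also need passive_checks_enabled set to 1.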

The individual elements involved in a check are referred to as objects in Icinga. Objects include hosts, services, contacts, commands, and time periods. To simplify daily work, you can group hosts, services, and contacts. The individual objects are defined in CFG files, which reside below Icinga's etc/objects directory. The network monitor includes a number of sample definitions of various objects that administrators only need to customize.

In principle, you can define multiple objects in a CFG file, but you can just as easily create separate files for each object in a directory below /path-to-Icinga/etc/objects. Lines that start with a hash mark within an object definition are regarded as comments, as is everything within a line to the right of a semicolon.
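The comment rules and the define blocks can be illustrated with a small Python sketch. It is a simplified reader written for this example, not Icinga's own parser: it ignores templates, line continuations, and escaped semicolons, and the sample text is made up.

```python
import re

def strip_comment(line):
    """Drop '#' comment lines and anything to the right of a ';'."""
    if line.lstrip().startswith("#"):
        return ""
    return line.split(";", 1)[0].strip()

def parse_objects(text):
    """Return a list of (object_type, {directive: value}) tuples."""
    objects = []
    current = None
    for raw in text.splitlines():
        line = strip_comment(raw)
        if not line:
            continue
        start = re.match(r"define\s+(\w+)\s*\{", line)
        if start:
            current = (start.group(1), {})
        elif line == "}" and current is not None:
            objects.append(current)
            current = None
        elif current is not None:
            # Directive name and value are separated by whitespace.
            parts = line.split(None, 1)
            current[1][parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return objects

SAMPLE = """
# Sample host (illustration only)
define host{
        host_name               webserver
        max_check_attempts      3 ; retries before notifying
        }
"""

parsed = parse_objects(SAMPLE)
print(parsed)
```

Because several define blocks can follow one another in the same text, the same function handles both the one-file-per-object layout and a single file with many objects.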

Defining Hosts and Services

Listing 1 provides a sample host definition. The host is the web server at a language center (display_name) and is displayed accordingly in the web interface. To inform the administrator (contacts) when the server goes down (notification_options), I want Icinga to ping (check_command) the server every 5 minutes (check_interval). If the server is still down 60 minutes (notification_interval) after notifying the administrator, I want to send another message. Icinga is capable of deciding whether a host is down or unreachable (see Table 1). However, to determine that a host is unreachable, you have to define the nodes passed along the route to the host as parents – and this will only work if the routes for outgoing packets are known. The file server definition looks similar.

Listing 1

my_hosts.cfg

# Webserver
define host{
        host_name               webserver
        alias                   languagecenter
        display_name            Server at language center
        address                 141.20.108.124
        active_checks_enabled   1
        passive_checks_enabled  0
        max_check_attempts      3
        check_command           check-host-alive
        check_interval          5
        retry_interval          1
        contacts                spz_admin
        notification_period     24x7
        notification_interval   60
        notification_options    d
        }
# Fileserver
define host{
        host_name               fileserver
        alias                   Fileserver
        display_name            Fileserver
        address                 192.168.10.127
        active_checks_enabled   1
        passive_checks_enabled  0
        max_check_attempts      3
        check_command           check-host-alive
        check_interval          5
        retry_interval          1
        contacts                admin
        notification_period     24x7
        notification_interval   60
        notification_options    d,u,r
        }

Table 1

States

Option  Status

Server
o       OK
d       Down
u       Unreachable
r       Recovered

Services
o       OK
w       Warning
c       Critical
r       Recovered
u       Unknown
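The difference between the Down and Unreachable host states can be modeled in a few lines of Python. This is a simplified sketch of the reachability idea, not Icinga's actual implementation; the parent host "router" is a hypothetical node that does not appear in the listings.

```python
def classify(host, parents, ping_ok):
    """Return 'UP', 'DOWN', or 'UNREACHABLE' for a host.

    parents: dict mapping a host to the list of its parent hosts
    ping_ok: dict mapping a host to the result of its own host check
    """
    if ping_ok[host]:
        return "UP"
    host_parents = parents.get(host, [])
    # A failing host without parents, or with at least one parent
    # that is up, is really down.
    if not host_parents:
        return "DOWN"
    if any(classify(p, parents, ping_ok) == "UP" for p in host_parents):
        return "DOWN"
    # Every parent is down or unreachable, so the host itself
    # cannot be judged: it is unreachable.
    return "UNREACHABLE"

parents = {"webserver": ["router"]}  # "router" is an assumed parent node

router_fails = classify("webserver", parents,
                        {"router": False, "webserver": False})
only_host_fails = classify("webserver", parents,
                           {"router": True, "webserver": False})
print(router_fails, only_host_fails)
```

A host defined with notification_options d would not trigger a message in the first case, because the state to report is Unreachable rather than Down.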

Once the servers are defined, the administrator configures the services that Icinga will monitor (Listing 2), along with the matching commands (Listing 3), the time periods (Listing 4), and the responsible administrators (Listing 5). The individual configuration files have a similar structure. For each service, you need to consider the interval between checks. One useful feature is the ability to define time periods within which Icinga will perform checks and, if necessary, notify the administrator; restricted hours or holidays can be defined here. The contact configuration can include email addresses or cell phone numbers, but to integrate each contact with, for example, an Email2SMS gateway or a Text2Speech system (e.g., Festival), you need a matching command.

Listing 2

my_services.cfg (Excerpt)

# SERVICE DEFINITIONS
define service{
        host_name               webserver
        service_description     HTTP
        active_checks_enabled   1
        passive_checks_enabled  0
        check_command           check_http
        max_check_attempts      3 ;how often to perform the check before Icinga notifies
        check_interval          5
        retry_interval          1
        check_period            24x7
        contacts                spz_admin
        notifications_enabled   1
        notification_period     weekdays
        notification_interval   60
        notification_options    w,c,u,r
        }
define service{
        host_name               fileserver, webserver
        service_description     SSH
        active_checks_enabled   1
        passive_checks_enabled  0
        check_command           check_ssh
        max_check_attempts      3
        check_interval          15
        retry_interval          1
        check_period            24x7
        contacts                admin
        notifications_enabled   0
        }

Listing 3

commands.cfg (Excerpt)

# 'notify-service-by-email' command definition
define command{
        command_name    notify-service-by-email
        command_line    /usr/bin/printf "%b" "***** Icinga *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /usr/bin/mail -s "**$NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
        }
# 'check-host-alive' command definition
define command{
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
        }

Listing 4

timeperiods.cfg (Excerpt)

define timeperiod{
        timeperiod_name         24x7
        alias                   24 Hours A Day, 7 Days A Week
        sunday                  00:00-24:00
        monday                  00:00-24:00
        tuesday                 00:00-24:00
        wednesday               00:00-24:00
        thursday                00:00-24:00
        friday                  00:00-24:00
        saturday                00:00-24:00
        }
define timeperiod{
        timeperiod_name         weekdays
        alias                   Weekday Working Hours
        monday                  07:00-17:00
        tuesday                 07:00-17:00
        wednesday               07:00-17:00
        thursday                07:00-17:00
        friday                  07:00-17:00
        }

Listing 5

contacts.cfg (Excerpt)

define contact{
        contact_name                    icingaadmin
        alias                           Falko Benthin
        host_notifications_enabled      1
        service_notifications_enabled   1
        host_notification_period        24x7
        service_notification_period     24x7
        host_notification_options       d,u,r
        service_notification_options    w,u,c,r
        host_notification_commands      notify-host-by-email
        service_notification_commands   notify-service-by-email
        email                           root@localhost
        }

Icinga can use macros, which noticeably simplifies and accelerates many tasks because you can use a single command for multiple hosts and services. Listings 2 and 3 give examples of macros. All services defined for monitoring the file server (they are not part of the excerpt shown in Listing 2) include a check_nrpe instruction with an exclamation mark. Each exclamation mark can be followed by an argument, which in turn is evaluated by the macros in other definitions. Macros are enclosed in $ signs.
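How the arguments after the exclamation marks end up in a command line can be sketched in Python. This is an illustration of the substitution idea only, not Icinga's macro processor. The check_nrpe command line, the NRPE command name check_load, and the plugin directory used for $USER1$ are assumptions for this example.

```python
import re

def expand(check_command, command_lines, macros):
    """Expand a check_command such as 'name!arg1!arg2' into a command line."""
    name, *args = check_command.split("!")
    values = dict(macros)
    # The first argument becomes $ARG1$, the second $ARG2$, and so on.
    for number, arg in enumerate(args, start=1):
        values[f"ARG{number}"] = arg
    # Replace every $NAME$ that has a known value; leave the rest untouched.
    return re.sub(r"\$(\w+)\$",
                  lambda m: values.get(m.group(1), m.group(0)),
                  command_lines[name])

# Assumed command definition and macro values, for illustration only.
command_lines = {
    "check_nrpe": "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$",
}
macros = {
    "USER1": "/usr/local/icinga/libexec",
    "HOSTADDRESS": "192.168.10.127",
}

expanded = expand("check_nrpe!check_load", command_lines, macros)
print(expanded)
```

The same command definition serves every host, because $HOSTADDRESS$ is filled in from the host being checked and $ARG1$ from the service definition.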

After creating the configuration files and storing them in etc/objects, you still need to tell Icinga about them by adding a new cfg_file=/usr/local/icinga/etc/objects/object.cfg line for each file to the main configuration file, /etc/icinga.cfg. After doing so, you should verify the configuration with /path-to-Icinga/bin/icinga -v /path-to-Icinga/etc/icinga.cfg; assuming there are no errors, restart Icinga (/etc/init.d/icinga restart).

GUI and Messages

Icinga works without a graphical interface, but it's much nicer to have one. The standard interface can't deny its Nagios ancestry, but it is clear-cut and intuitive.

If everything is working, you'll see a lot of green in the user interface (Figure 1), but if something goes wrong somewhere, the color will change and move closer and closer to red to reflect the status of the hosts or services (Figures 2 and 3). Status messages are typically linked so that clicking one takes you to more detailed information.

Figure 1: If the hosts are healthy, the admin is happy.
Figure 2: Everything is working, but the NRPE plugin is causing problems.
Figure 3: A manual check of the commands defined in commands.cfg reveals the culprit.

If something is so drastically wrong that a message is necessary, Icinga will check its complex ruleset to see whether it should send a message and, if so, to whom (Figure 4). The filters through which the message passes check the following: whether notifications are required, if the problem occurred at a time when the host and service should be running, if messages should be sent for this service in the current time slot, and what the contacts linked to the service actually want. Each contact can define its own rules to stipulate when it wants to receive messages and for what status. If multiple administrators exist and belong to a single group, Icinga will notify all of them. Again, you can define individual notification periods so that each admin will be responsible for one period.

Figure 4: Mail dispatched by Icinga is short and to the point.
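The chain of notification filters can be approximated with a short Python sketch. It is a deliberately reduced model of the ruleset described above (it leaves out check periods, escalations, and host notifications), and the dictionaries simply mirror the HTTP service from Listing 2 and the contact from Listing 5.

```python
from datetime import datetime

# Time periods as {weekday index (Monday=0): (start minute, end minute)}.
ALWAYS = {day: (0, 24 * 60) for day in range(7)}
WEEKDAYS = {day: (7 * 60, 17 * 60) for day in range(5)}

def in_period(period, when):
    """True if the moment 'when' falls inside the time period."""
    window = period.get(when.weekday())
    if window is None:
        return False
    minute = when.hour * 60 + when.minute
    return window[0] <= minute < window[1]

def should_notify(service, contact, state, when):
    """Apply the service filters first, then the contact's own filters."""
    return all([
        service["notifications_enabled"],
        state in service["notification_options"],
        in_period(service["notification_period"], when),
        state in contact["service_notification_options"],
        in_period(contact["service_notification_period"], when),
    ])

http = {
    "notifications_enabled": True,
    "notification_options": {"w", "c", "u", "r"},
    "notification_period": WEEKDAYS,
}
admin = {
    "service_notification_options": {"w", "u", "c", "r"},
    "service_notification_period": ALWAYS,
}

monday_morning = should_notify(http, admin, "c", datetime(2010, 1, 4, 9, 0))
sunday_morning = should_notify(http, admin, "c", datetime(2010, 1, 3, 9, 0))
print(monday_morning, sunday_morning)
```

In this model, a critical HTTP state on a Monday morning passes every filter, whereas the same state on a Sunday is stopped by the service's notification period even though the contact accepts messages around the clock.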
