Saturday, 28 June 2014

How To : Create Nagios Plugin Using a BASH Script

In the previous article on Nagios, we learned how to install Nagios on Linux systems. Check out the article: How To : Install Nagios Core 4 on Ubuntu Linux.

Now it's time to go one step further. In this tutorial, we will learn how a simple BASH script can be turned into a Nagios plugin and run through NRPE (the Nagios Remote Plugin Executor) to monitor a remote Linux server.

Testing Environment Used:

  • Nagios Server: Red Hat Enterprise Linux 6.1
  • Nagios Client: Red Hat Enterprise Linux 6.1
  • Nagios Version: Nagios XI


Some Crucial Points:

Nagios determines the state of a check from the exit code of the script:

  • exit 0 - OK: Nagios Server highlights the check in green.
  • exit 1 - WARNING: the check is highlighted in yellow.
  • exit 2 - CRITICAL: the check is highlighted in red.
  • exit 3 - UNKNOWN: the check is highlighted in grey.
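If you write several plugins, it can help to keep the state-to-exit-code mapping in one place. The sketch below is one way to do that; the function name nagios_result is hypothetical and not part of Nagios. It prints the status line and returns the matching code, and any state name it does not recognise is reported as UNKNOWN.

```shell
#!/bin/bash
# Hypothetical helper (not part of Nagios): print "STATE- message" and
# return the exit code Nagios expects for that state.
# In a plugin:  nagios_result WARNING "load is high"; exit $?
nagios_result() {
    local state=$1 message=$2 code
    case "$state" in
        OK)       code=0 ;;
        WARNING)  code=1 ;;
        CRITICAL) code=2 ;;
        *)        state=UNKNOWN; code=3 ;;  # unrecognised states become UNKNOWN
    esac
    echo "${state}- ${message}"
    return "$code"
}
```

The function returns rather than exits so it can be tested on its own; the plugin itself should finish with exit $? so Nagios sees the code.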

Syntax of Script's Output

[STATUS]- [TEXT SHOWN ON THE NAGIOS SERVER CONSOLE] | [PERFORMANCE DATA, USED FOR GRAPHS]
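The Nagios plugin development guidelines describe each performance-data item as 'label'=value[UOM];[warn];[crit];[min];[max], with items separated by spaces rather than commas. A minimal sketch of building such a string follows; perfdata_item is a hypothetical helper name, and the warn/crit values 1 and 2 simply mirror the thresholds used in the demo script.

```shell
#!/bin/bash
# Hypothetical helper: print one performance-data item as label=value;warn;crit
perfdata_item() {
    local label=$1 value=$2 warn=$3 crit=$4
    printf '%s=%s;%s;%s' "$label" "$value" "$warn" "$crit"
}

# Items are separated by single spaces, not commas.
perf="$(perfdata_item Load_1min 0.78 1 2) $(perfdata_item Load_5min 0.76 1 2) $(perfdata_item Load_15min 0.78 1 2)"
echo "OK- Load Average: 0.78, 0.76, 0.78 | $perf"
```

Everything before the pipe is what you read in the console; graphing add-ons only parse the part after it, so a malformed item there usually means a missing graph rather than a failed check.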

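Now, create the BASH script below on the Nagios client (the remote server being monitored), because NRPE runs the plugin on that machine. Save it as /usr/local/nagios/libexec/load_average, the path referenced later in nrpe.cfg, and make it executable with chmod +x.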

Demo Script:

#!/bin/bash
# load_average: report the 1, 5 and 15 minute load averages to Nagios.
# Exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN

# Read the load averages from /proc/loadavg instead of parsing uptime:
# the field positions in uptime's output shift with its format
# (e.g. "up 5 min" versus "up 5 days,  3:02").
if ! read -r load1 load5 load15 _ < /proc/loadavg; then
    echo "UNKNOWN- could not read /proc/loadavg"
    exit 3
fi

loadavg="$load1, $load5, $load15"

# Status text, a pipe, then space-separated performance data.
output="Load Average: $loadavg | Load_1min=$load1 Load_5min=$load5 Load_15min=$load15"

# [ -le ] and [ -gt ] only handle integers, so let awk compare the decimals:
# CRITICAL if any average is above 2, WARNING if any is above 1, else OK.
state=$( awk -v a="$load1" -v b="$load5" -v c="$load15" 'BEGIN {
    if (a > 2 || b > 2 || c > 2)      print 2
    else if (a > 1 || b > 1 || c > 1) print 1
    else                              print 0
}' )

case "$state" in
    0) echo "OK- $output";       exit 0 ;;
    1) echo "WARNING- $output";  exit 1 ;;
    2) echo "CRITICAL- $output"; exit 2 ;;
    *) echo "UNKNOWN- $output";  exit 3 ;;
esac

Output:

nagiosserver root [libexec]> ./load_average
OK- Load Average: 0.78, 0.76, 0.78 | Load_1min=0.78 Load_5min=0.76 Load_15min=0.78
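Because bash's [ -le ] and [ -gt ] compare integers only, a load such as 1.9 cannot be checked against a threshold of 1 without first truncating it to 1. If you would like decimal warning and critical thresholds that can be changed without touching the logic, a sketch along these lines could work; load_state and its argument order are hypothetical, not part of Nagios or of the original script.

```shell
#!/bin/bash
# Hypothetical helper: classify load averages against decimal thresholds.
# Usage: load_state WARN CRIT LOAD [LOAD ...]
# Prints 0 (OK), 1 (WARNING) or 2 (CRITICAL). awk does the comparison
# because bash's [ -le ] / [ -gt ] only accept integers.
load_state() {
    local warn=$1 crit=$2
    shift 2
    echo "$@" | awk -v w="$warn" -v c="$crit" '{
        state = 0
        for (i = 1; i <= NF; i++) {
            if ($i + 0 > c + 0) { state = 2 }                       # any value over crit
            else if ($i + 0 > w + 0 && state < 1) { state = 1 }     # any value over warn
        }
        print state
    }'
}
```

Inside the plugin you might call state=$(load_state 1 2 "$load1" "$load5" "$load15") and then exit with "$state". Note that the stock check_load takes comma-separated lists for -w and -c; this sketch uses a single value for each.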

On Nagios Server:

Define the command in commands.cfg:

vi /usr/local/nagios/etc/commands.cfg
define command{
        command_name    load_average
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c load_average
        }
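Add the host and service definitions for the client (RemoteBox) to its configuration file. Nagios only reads object files that nagios.cfg references through a cfg_file or cfg_dir directive, so make sure this file is included.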

vi /usr/local/nagios/libexec/services/RemoteBox.cfg
define host {
        use                             linux-server
        host_name                       RemoteBox
        alias                           RemoteBox
        address                         172.22.73.15
    .
    .
    .
    .
        }
define service {
        use                             generic-service
        host_name                       RemoteBox
        service_description             Load Average on CPU
        check_command                   load_average
        }
Restart Nagios so it picks up the new command, host and service definitions.

/etc/init.d/nagios restart
# OR
service nagios restart

On Nagios Client:

Edit the NRPE configuration file on the client and add the following line:

vi /usr/local/nagios/etc/nrpe.cfg

    .
    .
    .
command[load_average]=/usr/local/nagios/libexec/load_average
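Restart the NRPE daemon on the client so it rereads nrpe.cfg; until it does, check_nrpe answers with "NRPE: Command 'load_average' not defined". Then, from the Nagios server, test whether the check works: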

nagiosserver root [root]> /usr/local/nagios/libexec/check_nrpe -H 172.22.73.15 -c load_average
OK- Load Average: 0.78, 0.76, 0.78 | Load_1min=0.78 Load_5min=0.76 Load_15min=0.78
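To see exactly which part of a result Nagios treats as status text and which as performance data, you can split a result line at the first pipe with plain bash parameter expansion. A sketch; the sample line and variable names are illustrative, and in practice you would fill out from the check_nrpe command above.

```shell
#!/bin/bash
# Split a plugin result at the first '|':
#   before it -> status text shown in the web interface
#   after it  -> performance data used for graphing
out='OK- Load Average: 0.78, 0.76, 0.78 | Load_1min=0.78 Load_5min=0.76 Load_15min=0.78'
status_text=${out%%|*}      # drop the first '|' and everything after it
perfdata=${out#*|}          # drop everything up to and including the first '|'
status_text=${status_text% }   # trim the space before the pipe
perfdata=${perfdata# }         # trim the space after the pipe
echo "status:   $status_text"
echo "perfdata: $perfdata"
```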

16 comments:

  1. Thank You, Kyiv, Ukraine

  2. Nice, but the last condition should use "or", not "and".

    1. Thanks for spotting. That will be more correct.

  3. There should be a line break here: "exit 1elif" >> "exit 1", with "elif" on the next line

    echo "WARNING- $output"
    exit 1elif [ $load1int -gt 2 -a $load5int -gt 2 -a $load15int -gt 2 ]
    then

  4. The script is developed on the Nagios server; is it later copied over to the remote machine? Does the script need to be present on both the Nagios server and the client?

  5. Very useful. I have written some scripts using this as a starting point.

  6. Cool post sir, very useful!

  7. Hi,

    Thanks, this was very much useful.

    Can we send email notifications as well for critical alerts with our custom scripts?

    1. Sure we can, this is a bash script after all. But imho this does not make sense, since Nagios itself can already send email alerts.

  8. It would be great to have a plugin that checks the uptime of all services in percentage terms; maybe a reporting tool for Nagios that outputs the uptime of every checked service as a percentage.

  9. Nagios Server:
    /etc/nagios/objects/command.cfg
    define command{
    command_name check_loadavg
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_loadavg
    }
    /usr/lib64/nagios/plugins/check_nrpe -H 192.168.1.11 -c check_loadavg
    NRPE: Command 'check_loadavg' not defined
    cat /usr/lib64/nagios/plugins/check_loadavg
    #!/bin/bash

    loadavg=$( uptime | awk -F: '{print $5}' | xargs )

    load1int=$( echo $loadavg | cut -d "." -f 1 )
    load5int=$( echo $loadavg | awk -F, '{print $2}' | xargs | cut -d "." -f 1 )
    load15int=$( echo $loadavg | awk -F, '{print $3}' | xargs | cut -d "." -f 1 )

    load1=$( echo $loadavg | awk -F, '{print $1}' )
    load5=$( echo $loadavg | awk -F, '{print $2}' )
    load15=$( echo $loadavg | awk -F, '{print $3}' )

    output="Load Average: $loadavg | Load_1min=$load1, Load_5min=$load5, Load_15min=$load15"

    if [ $load1int -le 1 -a $load5int -le 1 -a $load15int -le 1 ]
    then
    echo "OK- $output"
    exit 0
    elif [ $load1int -le 2 -a $load5int -le 2 -a $load15int -le 2 ]
    then
    echo "WARNING- $output"
    exit 1
    elif [ $load1int -gt 2 -a $load5int -gt 2 -a $load15int -gt 2 ]
    then
    echo "CRITICAL- $output"
    exit 2
    else
    echo "UNKNOWN- $output"
    exit 3
    fi
    If I execute one of the built-in Nagios plugins, I have to provide parameters:
    /usr/lib64/nagios/plugins/check_load
    check_load: Could not parse arguments
    Usage:
    check_load [-r] -w WLOAD1,WLOAD5,WLOAD15 -c CLOAD1,CLOAD5,CLOAD15

    For check_loadavg, I don't have to provide any parameters.

    /usr/lib64/nagios/plugins/check_loadavg
    OK- Load Average: 0.41, 0.47, 0.54 | Load_1min=0.41, Load_5min= 0.47, Load_15min= 0.54

    Permissions on files.

    ls -l /usr/lib64/nagios/plugins/check_loadavg
    -rwxr-xr-x. 1 root root 861 Apr 15 00:40 /usr/lib64/nagios/plugins/check_loadavg

    ls -l /usr/lib64/nagios/plugins/check_load
    -rwxr-xr-x. 1 root root 49312 Sep 12 2015 /usr/lib64/nagios/plugins/check_load


    On the client machine, where NRPE is running:
    File:
    /etc/nagios/nrpe.cfg
    command[check_loadavg]=/usr/lib64/nagios/plugins/check_loadavg
    /usr/lib64/nagios/plugins/check_loadavg (identical to the script shown above)

  10. "Now, from Nagios server, test if the check is working fine.

    nagiosserver root [root]> /usr/local/nagios/libexec/check_nrpe -H 172.22.73.15 -c load_average
    OK- Load Average: 0.78, 0.76, 0.78 | Load_1min=0.78, Load_5min= 0.76, Load_15min= 0.78"

    This command works; however, the output is not displayed in the web interface. Does anyone know why?

  11. Hi all,

    I am trying to get the Tomcat heap value. I wrote a small script for this and executed it on the remote server, where it works fine, but when I execute it from the Nagios server the details are wrong. Can anybody help? Below is my script.

    #!/bin/bash
    process=`ps -ef | grep java | grep -v grep | cut -f6 -d ' '| head -n 1`
    max=`jmap -heap $process | grep MaxHeapSize | awk '{printf $4}' | tr -s "." " " | awk '{print $1}' | tr -s "(" " " | awk '{print $1}'`
    if [ "$max" -gt 2000 ]; then

    echo "CRITICAL: $max"
    exit 2
    else
    echo "OK: $max"
    exit 0
    fi


    When I execute it from the server, it's quite the opposite: if the remote server gives OK, the server gives CRITICAL. Please help.

  12. Hello, is there any way to create a custom Nagios plugin without using the existing code?

    Kindly advise.
