


Wednesday, 17 September 2014

Nagios XI: A Bash Plugin to Monitor Load Average on Remote Linux Server

  • This plugin is written as a Bash script.
  • Its performance data is formatted so that the 'Performance Gauges' in Nagios XI display correctly, with the appropriate 'Warning' and 'Critical' regions.

See Also: How To : Create Nagios Plugin Using a BASH Script




For the Performance Gauges to display correctly, the performance data must be in the following format:

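'VarName'=CurrentValue[Unit];WarningValue;CriticalValue;MinimumValue;MaximumValue

The unit, if any, is written directly after the value with no space, and unused fields may be left empty, as in load1=0.09;6.4;7.2;; from the sample output below.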
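
As an illustration, a hypothetical Bash helper, perf_entry (not part of the plugin), can assemble one entry in this format with printf. Load average has no unit, so the unit argument is optional and empty by default; this is a sketch, not the plugin's own code.

```shell
#!/bin/bash
# Hypothetical helper, not part of the plugin: print one performance-data
# entry as label=value[unit];warn;crit;min;max.
# Arguments: label value warn crit [min] [max] [unit]; omitted ones stay empty.
perf_entry() {
    local label=$1 value=$2 warn=$3 crit=$4 min=${5:-} max=${6:-} unit=${7:-}
    printf '%s=%s%s;%s;%s;%s;%s' "$label" "$value" "$unit" "$warn" "$crit" "$min" "$max"
}

# A 1-minute load of 0.09 with thresholds 6.4 and 7.2 (8 processors)
perf_entry load1 0.09 6.4 7.2
echo    # perf_entry prints no newline of its own
```

This prints load1=0.09;6.4;7.2;;, the same shape as each entry in the plugin's output.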

Important:

1. It reads the load averages from the output of the 'uptime' command.
2. The Warning and Critical thresholds are 80% and 90%, respectively, of the number of processors listed in /proc/cpuinfo (logical CPUs, so hyper-threads count as processors).
For example, with 16 processors the Warning threshold is 80% of 16, i.e. 12.8, and the Critical threshold is 90% of 16, i.e. 14.4.
3. If any of the three values (1, 5 or 15 minute load) exceeds the Warning threshold, the status is WARNING; if any exceeds the Critical threshold, it is CRITICAL; otherwise it is OK.
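
The threshold rule can be sketched in isolation with two hypothetical functions, thresholds and load_state (names chosen for illustration, not taken from the plugin). The comparisons are done in awk because Bash arithmetic is integer-only.

```shell
#!/bin/bash
# Sketch of the rule above: Warning = 80% and Critical = 90% of the
# processor count; the state follows the worst of the three load averages.

# Print "warn crit" for a given processor count, to one decimal place.
thresholds() {
    awk -v n="$1" 'BEGIN { printf "%.1f %.1f\n", n * 0.8, n * 0.9 }'
}

# Print OK, WARNING or CRITICAL for loads $1 $2 $3 against warn $4, crit $5.
load_state() {
    awk -v l1="$1" -v l5="$2" -v l15="$3" -v w="$4" -v c="$5" 'BEGIN {
        # Highest of the three load averages, forced to numbers
        max = l1 + 0
        if (l5 + 0 > max) max = l5 + 0
        if (l15 + 0 > max) max = l15 + 0
        if (max > c + 0)      print "CRITICAL"
        else if (max > w + 0) print "WARNING"
        else                  print "OK"
    }'
}

t=$(thresholds 16)            # "12.8 14.4"
warn=${t% *} crit=${t#* }
echo "16 processors: warning=$warn critical=$crit"
load_state 13.0 9.5 7.1 "$warn" "$crit"    # 13.0 > 12.8 but not > 14.4
```

A load exactly equal to a threshold is still reported as below it, matching the "exceeds" wording above.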

Script:


#!/bin/bash
# check_loadaverage.sh: report the 1, 5 and 15 minute load averages and
# compare them with thresholds of 80% (Warning) and 90% (Critical) of the
# number of processors.

# Read the load averages from 'uptime'. Splitting on "load average: " works
# whatever the uptime duration looks like; LC_ALL=C keeps "." as the decimal
# separator.
UPTIME=$(LC_ALL=C uptime)
read -r load1 load5 load15 <<< "$(echo "$UPTIME" | awk -F'load average: ' '{ gsub(/,/, "", $2); print $2 }')"

Nprocs=$(grep -c '^processor' /proc/cpuinfo)

# Give up with UNKNOWN if anything could not be read.
for v in "$load1" "$load5" "$load15"; do
        case $v in
                ''|*[!0-9.]*) echo "UNKNOWN- No data"; exit 3 ;;
        esac
done
if ! [ "${Nprocs:-0}" -gt 0 ]; then
        echo "UNKNOWN- No data"; exit 3
fi

# Bash compares integers only, so work in hundredths: load x 100
# (rounded, because e.g. 0.29 * 100 is 28.999... in floating point).
to_int() { awk -v v="$1" 'BEGIN { printf "%d", v * 100 + 0.5 }'; }
intload1=$(to_int "$load1")
intload5=$(to_int "$load5")
intload15=$(to_int "$load15")

wthreshold=80
cthreshold=90

# Thresholds in the same hundredths, e.g. 16 processors -> 1280 and 1440.
warn=$(( wthreshold * Nprocs ))
crit=$(( cthreshold * Nprocs ))

# The same thresholds as load values for the perfdata, e.g. 12.8 and 14.4.
actwarn=$(awk -v v="$warn" 'BEGIN { printf "%.1f", v / 100 }')
actcrit=$(awk -v v="$crit" 'BEGIN { printf "%.1f", v / 100 }')

if [ "$intload1" -gt "$crit" ] || [ "$intload5" -gt "$crit" ] || [ "$intload15" -gt "$crit" ]
then
        STATUS="CRITICAL"
        EXIT=2
elif [ "$intload1" -gt "$warn" ] || [ "$intload5" -gt "$warn" ] || [ "$intload15" -gt "$warn" ]
then
        STATUS="WARNING"
        EXIT=1
else
        STATUS="OK"
        EXIT=0
fi

echo "$STATUS- Load Average: $load1, $load5, $load15 | load1=$load1;$actwarn;$actcrit;; load5=$load5;$actwarn;$actcrit;; load15=$load15;$actwarn;$actcrit;;"
exit $EXIT
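
The number of comma-separated fields in front of the load averages in uptime's output changes with how long the machine has been up ("up 5 days, 3:02", "up 2:30", "up 3 min"), so picking fields by position (such as $4, $5 and $6) only works for some uptimes. The sketch below, using a hypothetical parse_uptime function and sample lines in the Linux procps format, splits on the literal text "load average: " instead, which gives the same three values for all three shapes.

```shell
#!/bin/bash
# Hypothetical helper: print the three load averages from one line of
# Linux (procps) `uptime` output, or nothing if the line has no load data.
parse_uptime() {
    printf '%s\n' "$1" | awk -F 'load average: ' '{
        n = split($2, v, ", *")          # "0.09, 0.05, 0.01" -> 3 fields
        if (n == 3) print v[1], v[2], v[3]
    }'
}

# Sample lines in the three shapes uptime commonly produces.
parse_uptime ' 10:15:32 up 5 days,  3:02,  2 users,  load average: 0.09, 0.05, 0.01'
parse_uptime ' 10:15:32 up  2:30,  1 user,  load average: 0.44, 0.31, 0.20'
parse_uptime ' 10:15:32 up 3 min,  1 user,  load average: 0.00, 0.01, 0.05'
```

In a locale that uses a comma as the decimal separator the values themselves would contain commas, which is why running uptime under LC_ALL=C is a sensible precaution.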

How To Use:

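To monitor a remote Linux server:
1. Copy the plugin to the /usr/local/nagios/libexec directory on the remote server as check_loadaverage.sh and make it executable (chmod +x).
2. Add the following line to the nrpe.cfg file, then restart (or reload) the NRPE daemon so that it picks up the change: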

command[check_loadaverage.sh]=sudo /usr/local/nagios/libexec/check_loadaverage.sh
3. Allow the nagios user to run the plugin through sudo without a password by adding the following line to /etc/sudoers (edit it with visudo):

nagios ALL=(ALL) NOPASSWD:/usr/local/nagios/libexec/check_loadaverage.sh

Usage:

On the monitoring server (the -c argument must match the command name defined in brackets in nrpe.cfg):

./check_nrpe -H 128.9.45.13 -c check_loadaverage.sh
Output:

OK- Load Average: 0.09, 0.05, 0.01 | load1=0.09;6.4;7.2;; load5=0.05;6.4;7.2;; load15=0.01;6.4;7.2;;
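
Nagios treats everything before the first "|" in this line as the status text and everything after it as performance data, which is what feeds the gauges. The hypothetical split_output function below (an illustration, not Nagios code) splits the sample output the same way and breaks each entry into its value, Warning and Critical fields.

```shell
#!/bin/bash
# Hypothetical helper: split a plugin output line at the first "|" and
# list label, value, warn and crit for each performance-data entry.
split_output() {
    local line=$1 text perf entry label rest value warn crit
    text=${line%%|*}                    # everything before the first "|"
    perf=${line#*|}                     # everything after it
    [ "$perf" = "$line" ] && perf=""    # the line contained no "|"
    printf 'status: %s\n' "${text% }"
    for entry in $perf; do              # entries are separated by spaces
        label=${entry%%=*}
        rest=${entry#*=}
        value=${rest%%;*}; rest=${rest#*;}
        warn=${rest%%;*};  rest=${rest#*;}
        crit=${rest%%;*}
        printf '%s value=%s warn=%s crit=%s\n' "$label" "$value" "$warn" "$crit"
    done
}

split_output 'OK- Load Average: 0.09, 0.05, 0.01 | load1=0.09;6.4;7.2;; load5=0.05;6.4;7.2;; load15=0.01;6.4;7.2;;'
```

Splitting entries on spaces is enough here because this plugin's labels contain none; quoted labels with spaces would need a real parser.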
