nagios - nrpe

Last updated: 2018-05-29

Introduction

 

 

 

 


Active and Passive checks

 

Workflow diagram:

Active checks

initiated by the Nagios daemon (polling)

run on a regularly scheduled basis (check_interval and retry_interval options)

      Nagios ---> check_nrpe --> network --> nrpe --> check_??? ---> System_Tools
(Monitoring Host)                        (Remote Host)

Passive checks

initiated and performed by external applications/processes

results are submitted to Nagios for processing

       Nagios <-- NSCA <-- network <-- send_nsca <--- Program
(Monitoring Host)                    (Remote Host)
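
The passive flow above can be sketched from the remote host's side. send_nsca reads one result per line on standard input, as tab-separated host name, service description, return code and plugin output. The helper name format_passive_result and the host and service values below are illustrative assumptions, not part of the original notes.

```shell
# Build one passive service check result in the tab-separated
# format that send_nsca reads on stdin:
#   <host_name>\t<service_description>\t<return_code>\t<plugin_output>\n
# format_passive_result is a hypothetical helper name.
format_passive_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Example line. On a real host it would be piped into send_nsca, e.g.
#   format_passive_result ... | send_nsca -H <nagios-server> -c <send_nsca.cfg>
format_passive_result myserver "Backup Job" 0 "OK - backup finished"
```

The return code uses the same 0/1/2/3 convention as active plugins, so an external program can reuse an existing plugin's exit status.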

NRPE can check the following resources:

  • CPU load
  • Memory usage
  • Disk usage
  • Logged in users
  • Running processes

 

Server Side Install & Login

 

Debian6:

apt-get install nagios3

That is nearly all; the package ships with the required config files.

Panel:

http://IP/nagios3

User file:

/etc/nagios3/htpasswd.users

nagiosadmin:????????

 


Server (the host that does the monitoring)

 

Install:

Debian: apt-get install nagios-nrpe-plugin

Centos(epel): yum install nagios-plugins-nrpe

Testing:

Linux: /usr/lib/nagios/plugins/check_nrpe -H 192.168.123.41 -c check_load

Win: /usr/lib/nagios/plugins/check_nrpe -H 192.168.3.32 -c check_uptime

-H <remote host IP>

-c <command supported by the remote host>
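
When testing by hand it helps to look at the exit status as well as the printed text. The sketch below assumes the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); state_name is a hypothetical helper and is not shipped with the plugins.

```shell
# Map a Nagios plugin exit code to its state name
# (plugin convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
# state_name is a hypothetical helper, not part of the plugins package.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Typical use after a manual test (commented out: it needs a live host):
#   /usr/lib/nagios/plugins/check_nrpe -H 192.168.123.41 -c check_load
#   state_name $?
state_name 2
```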

Error:

"CHECK_NRPE: Error - Could not complete SSL handshake."

Add -n (do not use SSL) to the check_nrpe command line; the handshake fails because the remote side is not using SSL.

 


nagios check linux client setting

 

/etc/nagios/conf.d/nrpe.cfg

# 'check_nrpe' command definition
define command {
        command_name    check_nrpe
        command_line    /usr/lib/nagios/plugins/check_nrpe -H '$HOSTADDRESS$' -c $ARG1$
}

In /etc/nagios3/nagios.cfg, define a server to monitor:

cfg_file=/etc/nagios3/objects/myserver_nrpe.cfg

define host     {
        use                             linux-server
        host_name                       myserver
        alias                           myserver
        address                         202.181.196.246
}
define service{
        use                             service-http
        host_name                       myserver
}
  • check_load
  • check_users
  • check_disk
  • check_procs
  • check_zombie_procs
  • check_total_procs
# Check Load
define service{
        use                     generic-service
        host_name               yourserver
        service_description     Check Load
        check_command           check_nrpe!check_load!
        }
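
In a check_command value, the part before the first "!" selects the command definition and the remaining "!"-separated parts fill $ARG1$, $ARG2$ and so on in its command_line. The sketch below only prints that split; expand_check_command is a hypothetical name, and Nagios does the real substitution internally.

```shell
# Split a check_command value on "!" the way Nagios assigns
# the command name and the $ARG1$, $ARG2$, ... macros.
# expand_check_command is a hypothetical helper that only prints the parts.
expand_check_command() {
    old_ifs=$IFS
    IFS='!'
    set -- $1          # word-split the value on "!" only
    IFS=$old_ifs
    printf 'command_name=%s\n' "$1"
    shift
    n=1
    for arg in "$@"; do
        printf 'ARG%s=%s\n' "$n" "$arg"
        n=$((n + 1))
    done
}

expand_check_command 'check_nrpe!check_load'
```

With the check_nrpe definition above, ARG1 ends up as the value passed to -c on the remote host.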

 


Check Windows

 

Main program: check_nt

check_nt --help
check_nt -H host -v variable [-s YOUR_PW] [-p port] [-w warning] [-c critical] [-l params] [-d SHOWALL] [-u] [-t timeout]

variable:

  • CLIENTVERSION
  • CPULOAD
  • UPTIME
  • USEDDISKSPACE
  • MEMUSE
  • SERVICESTATE
  • PROCSTATE
  • COUNTER             <-- Check any performance counter
  • INSTANCES          <-- Windows Perfmon Counter object

 

First, check whether the check command is defined:

# 'check_nt' command definition
define command {
        command_name    check_nt
        command_line    /usr/lib/nagios/plugins/check_nt -H '$HOSTADDRESS$' -p 12489 -s <YOUR-PW>  -v $ARG1$ $ARG2$
}

Example object configuration:

define service{
        use                     generic-service
        host_name               Server
        service_description     Uptime
        check_command           check_nt!UPTIME
        }

Commonly used settings:

define service{
        use                             local-service
        host_name                       Your_Server
        service_description             Loading
        check_command                   check_nrpe_nossl!check_load
}
define service{
        use                             local-service
        host_name                       Your_Server
        service_description             Total Procs
        check_command                   check_nrpe_nossl!check_total_procs
}
define service{
        use                             local-service
        host_name                       Your_Server
        service_description             Zombie Procs
        check_command                   check_nrpe_nossl!check_zombie_procs
}

Other useful checks

check_nt!CPULOAD!-l 5,80,90                                     <--     -l <minutes range>,<warning threshold>,<critical threshold>

check_nt!MEMUSE!-w 80 -c 90

check_nt!USEDDISKSPACE!-l c -w 80 -c 90

check_nt!SERVICESTATE!-d SHOWALL -l Apache2,MySQL,"FileZilla Server",MSSQLSERVER

check_nt -H 192.168.1.1 -p 1248 -v COUNTER -l "\\Paging File(_Total)\\%% Usage","Paging file usage is %.2f %%" -w 80 -c 90
check_nt -H 192.168.1.1 -p 1248 -v COUNTER -l "\\Process(_Total)\\Thread Count","Thread Count: %.f" -w 600 -c 800
check_nt -H 192.168.1.1 -p 1248 -v COUNTER -l "\\Server\\Server Sessions","Server Sessions: %.f" -w 20 -c 30

Syntax: -l "\\<performance object>\\counter","<description>"

 



NRPE Linux Client (the monitored host)

 

# U16

apt-get install nagios-nrpe-server

apt-get install nagios-plugins           <-- provides the plugins, e.g. /usr/lib/nagios/plugins/check_load

# C6

yum install nagios-nrpe nagios-plugins

# C7

# Nrpe is a system daemon that will execute various Nagios plugins locally

# on behalf of a remote (monitoring) host that uses the check_nrpe plugin.

yum install nrpe

# For local testing

# /usr/lib64/nagios/plugins/check_nrpe

nagios-plugins-nrpe nagios-common

# plugin

nagios-plugins nagios-plugins-disk nagios-plugins-load

# Configuration

Config files

  • /etc/nagios/nrpe.cfg
  • /etc/nagios/nrpe_local.cfg

nrpe.cfg must contain:

include=/etc/nagios/nrpe_local.cfg

In nrpe_local.cfg, add the IPs that are allowed to connect:

# allowed_hosts=IP1,IP2        Note: no spaces around the ","

allowed_hosts=192.168.123.13,127.0.0.1
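
A stray space after a comma is easy to miss, so a quick check before restarting the daemon can help. This is a minimal sketch; check_allowed_hosts is a hypothetical helper that only tests for a space character and does not validate the addresses themselves.

```shell
# Return 0 when an allowed_hosts line contains no space character,
# 1 otherwise. check_allowed_hosts is a hypothetical helper.
check_allowed_hosts() {
    case "$1" in
        *" "*) return 1 ;;
        *)     return 0 ;;
    esac
}

if check_allowed_hosts 'allowed_hosts=192.168.123.13,127.0.0.1'; then
    echo "allowed_hosts looks fine"
else
    echo "allowed_hosts contains a space"
fi
```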

and which checks may be run:

command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_disk_sys]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
command[check_disk_bak]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sdb1
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 1900 -c 2000
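
The name inside the square brackets is what the monitoring server passes to check_nrpe -c. To see which names a config file exposes, something like the following sketch can be used; list_nrpe_commands is a hypothetical helper built on sed, and it skips commented-out definitions because they do not start with "command[".

```shell
# Print the command names defined by command[<name>]=... lines.
# Reads the files given as arguments, or stdin when none are given.
# list_nrpe_commands is a hypothetical helper name.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$@"
}

# Demo on inline sample lines; on a real host one would run
#   list_nrpe_commands /etc/nagios/nrpe_local.cfg
list_nrpe_commands <<'EOF'
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
# command[disabled]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 1900 -c 2000
EOF
```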

Remember to restart the service:

# U16

/etc/init.d/nagios-nrpe-server restart

# C7

systemctl enable nrpe

service nrpe start

In the log (log_facility=daemon) you will see:

===================================
Mar  8 17:31:39 debian1 nrpe[1532]: Allowing connections from: 127.0.0.1, 192.168.123.13
Mar  8 17:31:43 debian1 nrpe[1535]: Host 192.168.123.13 is not allowed to talk to us!
===================================

Check that the port is listening:

netstat -nlp | grep 5666

tcp        0      0 0.0.0.0:5666            0.0.0.0:*               LISTEN      8156/nrpe

Test whether the command works:

command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20

Test whether nrpe works:

/usr/lib/nagios/plugins/check_nrpe -H localhost -c check_load

# C7

/usr/lib64/nagios/plugins/check_nrpe

Firewall:

tcp port: 5666

Testing:

tail -f /var/log/daemon.log

/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20

Create a file named nagios in /etc/sudoers.d/ with the following content:

nagios          ALL=(ALL) NOPASSWD: /usr/lib/nagios/plugins/

nrpe.cfg defines which checks are available:

command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_root]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda1
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200
command[check_vbox]=/usr/lib/nagios/plugins/check_procs -c 1:1 -C vboxwebsrv

Disable SSL:

/etc/default/nagios-nrpe-server

DAEMON_OPTS="--no-ssl"
NICENESS=5

nrpe [-n] -c <config_file> <mode>

Options:

 -n             Do not use SSL

<mode>
 -i             Run as a service under inetd or xinetd
 -d             Run as a standalone daemon

 



Plugin Setting

 

Plugin - check_tcp

 

check_tcp -H host -p port [-w <warning time>] [-c <critical time>]
[-s <send string>]
[-e <expect string>] [-q <quit string>][-m <maximum bytes>] [-d <delay>]
[-t <timeout seconds>] [-r <refuse state>] [-M <mismatch state>] [-v] [-4|-6] [-j]
[-D <days to cert expiry>] [-S <use SSL>] [-E]

Testing:

check_tcp -H 172.16.2.11 -p 3306

TCP OK - 0.000 second response time on port 80|time=0.000185s;;;0.000000;10.000000
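
Plugin output has two parts separated by "|": the human-readable status text, then performance data in the form label=value[unit];warn;crit;min;max. The sketch below splits a line such as the one above; plugin_text and plugin_perfdata are hypothetical helper names.

```shell
# Split one line of plugin output at the first "|".
# plugin_text and plugin_perfdata are hypothetical helper names.
plugin_text() {
    printf '%s\n' "${1%%|*}"
}

plugin_perfdata() {
    case "$1" in
        *"|"*) printf '%s\n' "${1#*|}" ;;
        *)     printf '\n' ;;            # no perfdata present
    esac
}

line='TCP OK - 0.000 second response time on port 80|time=0.000185s;;;0.000000;10.000000'
plugin_text "$line"
plugin_perfdata "$line"
```

In this sample the warn and crit fields are empty because no -w or -c was given, while min and max are 0 and 10 seconds.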

Configuration

define command{
        command_name    check_tcp
        command_line    $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
        }

define service{
        use                             generic-service
        host_name                       host_name
        service_description             server ras
        check_command                   check_tcp!3391
}

 


Plugin - check_mysql

 

-h, --help
-V, --version

-u, --username=STRING
-p, --password=STRING
-d, --database=STRING
-H, --hostname=ADDRESS

Example

./check_mysql -H 127.0.0.1 -u test -p YOUR_PW -d test

 


Plugin - check_disk

 

Options:

-p, --path=PATH, --partition=PARTITION
    Path or partition (may be repeated)

Example: check the root filesystem, WARNING below 10% free and CRITICAL below 5% free

check_disk -w 10% -c 5% -p /

 


Plugin - check_load

 

-w, --warning=WLOAD1,WLOAD5,WLOAD15      # Exit with WARNING status if load average exceeds WLOADn

-c,

e.g.

command[check_load]=/usr/lib/nagios/plugins/check_load -w 16,12,6 -c 30,25,20
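
The three comma-separated values pair up with the 1, 5 and 15 minute load averages. The sketch below mimics that comparison for given numbers, assuming a state is raised when a load average exceeds its threshold; load_state is a hypothetical helper and not the plugin's actual code.

```shell
# load_state LOAD1 LOAD5 LOAD15 WARN CRIT
# WARN and CRIT are comma-separated triples as passed to -w and -c.
# load_state is a hypothetical helper that mimics the comparison.
load_state() {
    awk -v l="$1,$2,$3" -v w="$4" -v c="$5" 'BEGIN {
        split(l, la, ","); split(w, warn, ","); split(c, crit, ",")
        state = "OK"
        for (i = 1; i <= 3; i++) {
            if (la[i] + 0 > crit[i] + 0) { state = "CRITICAL"; break }
            if (la[i] + 0 > warn[i] + 0) state = "WARNING"
        }
        print state
    }'
}

# The 1-minute load 17.2 exceeds the warning value 16 but not 30.
load_state 17.2 9.5 4.1 16,12,6 30,25,20
```

On a live Linux host the three load values could be taken from the first three fields of /proc/loadavg.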

 



NRPE and NSClient Win32 Client Side

 

Two Windows clients are available for Nagios: NSClient++ and NC_Net.

NC_Net requires .NET Framework 3.5 to be installed on Windows,

so in the end I chose NSClient++.

Port: TCP / 12489

Download:

https://www.nsclient.org/

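Install:

The NSCA client does not need to be installed, because it is only used in "active mode".

Here "active mode" means the client contacts the server on its own initiative and reports its current state (a passive check from Nagios's point of view).

The NRPE server is also optional, because we do not need to run external scripts; check_nt is enough.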

 

Configuration:

NSC.ini must be edited for this to work (a default click-through "Next" install always crashes) [0.3.9-Win32]

In version 0.4.2 the config file is 'nsclient.ini'.

C:\Program Files\NSClient++>nscp.exe

Usage:

-version, -about, -install, -uninstall, -start, -stop, -encrypt

Checking:

Run on Linux:

/usr/lib/nagios/plugins/check_nt -H <IP> -p 12489 -s <PW> -v UPTIME

Result:

System Uptime - 99 day(s) 23 hour(s) 59 minute(s)

Test startup:

The best way to diagnose and find errors with your configuration and setup. 

d vice\logger_impl.cpp:420  Creating logger: console
d rvice\NSClient++.cpp:386  NSClient++ 0,4,1,105 2014-04-28 x64 Loading settings and logger...
d ngs_manager_impl.cpp:162  Boot.ini found in: C:/Program Files/NSClient++//boot.ini
d ngs_manager_impl.cpp:178  Boot order: ini://${exe-path}/nsclient.ini
d ngs_manager_impl.cpp:181  Activating: ini://${exe-path}/nsclient.ini
d ngs_manager_impl.cpp:73   Creating instance for: ini://${exe-path}/nsclient.ini
d mpl/settings_ini.hpp:303  Reading INI settings from: C:/Program Files/NSClient++//nsclient.ini
d mpl/settings_ini.hpp:253  Loading: C:/Program Files/NSClient++//nsclient.ini
d rvice\NSClient++.cpp:397  NSClient++ 0,4,1,105 2014-04-28 x64 booting...
d rvice\NSClient++.cpp:398  Booted settings subsystem...
d rvice\NSClient++.cpp:465  On crash: restart: NSClientpp
d rvice\NSClient++.cpp:477  Archiving crash dumps in: C:/Program Files/NSClient++//crash-dumps
d rvice\NSClient++.cpp:544  booting::loading plugins
d rvice\NSClient++.cpp:306  Found: CheckDisk
d rvice\NSClient++.cpp:306  Found: CheckEventLog
d rvice\NSClient++.cpp:306  Found: CheckExternalScripts
d rvice\NSClient++.cpp:306  Found: CheckHelpers
d rvice\NSClient++.cpp:306  Found: CheckNSCP
d rvice\NSClient++.cpp:306  Found: CheckSystem
d rvice\NSClient++.cpp:306  Found: NRPEServer
d rvice\NSClient++.cpp:306  Found: NSClientServer
d rvice\NSClient++.cpp:867  addPlugin(C:/Program Files/NSClient++//modules/CheckDisk.dll as )
d rvice\NSClient++.cpp:867  addPlugin(C:/Program Files/NSClient++//modules/CheckEventLog.dll as )
d rvice\NSClient++.cpp:867  addPlugin(C:/Program Files/NSClient++//modules/CheckExternalScripts.dll as )
d rvice\NSClient++.cpp:867  addPlugin(C:/Program Files/NSClient++//modules/CheckHelpers.dll as )
d rvice\NSClient++.cpp:867  addPlugin(C:/Program Files/NSClient++//modules/CheckNSCP.dll as )
d rvice\NSClient++.cpp:867  addPlugin(C:/Program Files/NSClient++//modules/CheckSystem.dll as )
d rvice\NSClient++.cpp:867  addPlugin(C:/Program Files/NSClient++//modules/NRPEServer.dll as )
d rvice\NSClient++.cpp:867  addPlugin(C:/Program Files/NSClient++//modules/NSClientServer.dll as )
d rvice\NSClient++.cpp:844  Loading plugin: CheckDisk
d rvice\NSClient++.cpp:844  Loading plugin: Event log Checker.
d rvice\NSClient++.cpp:844  Loading plugin: Check External Scripts
d kExternalScripts.cpp:99   No wrappings found (adding default: vbs, ps1 and bat)
d rvice\NSClient++.cpp:844  Loading plugin: Helper function
d rvice\NSClient++.cpp:844  Loading plugin: Check NSCP
d rvice\NSClient++.cpp:844  Loading plugin: CheckSystem
d stem\CheckSystem.cpp:158  Found alternate key for uptime: \2\674
d stem\CheckSystem.cpp:169  Found alternate key for memory commit limit: \4\30
d stem\CheckSystem.cpp:180  Found alternate key for memory commit bytes: \4\26
d stem\CheckSystem.cpp:191  Found alternate key for cpu: \238(_total)\6
d rvice\NSClient++.cpp:844  Loading plugin: NRPE server
d tem\PDHCollector.cpp:94   Loading counter: cpu = \238(_total)\6
d erver\NRPEServer.cpp:133  Allowed hosts definition: 127.0.0.1(255.255.255.255)

d tem\PDHCollector.cpp:94   Loading counter: memory commit bytes = \4\26
d tem\PDHCollector.cpp:94   Loading counter: memory commit limit = \4\30
d tem\PDHCollector.cpp:94   Loading counter: uptime = \2\674
d de\socket/server.hpp:126  Binding to: [::]:5666(ipv6)
d de\socket/server.hpp:162  Attempting to bind to: :5666
d de\socket/server.hpp:121  Binding to: 0.0.0.0:5666(ipv4)
d de\socket/server.hpp:162  Attempting to bind to: :5666
d rvice\NSClient++.cpp:844  Loading plugin: NSClient server
d r\NSClientServer.cpp:139  Allowed hosts definition: 127.0.0.1(255.255.255.255)

d de\socket/server.hpp:126  Binding to: [::]:12489(ipv6)
d de\socket/server.hpp:162  Attempting to bind to: :12489
d de\socket/server.hpp:121  Binding to: 0.0.0.0:12489(ipv4)
d de\socket/server.hpp:162  Attempting to bind to: :12489
d rvice\NSClient++.cpp:616  NSClient++ - 0,4,1,105 2014-04-28 Started!
l ce\simple_client.hpp:32   Enter command to inject or exit to terminate...

When a command arrives:

d rvice\NSClient++.cpp:960  Injecting: checkuptime...
d rvice\NSClient++.cpp:985  Result checkuptime: OK

# List all settings

nscp settings --list

# Testing

nrpe   (same as nscp client --module NRPEClient)
      Use a NRPE client to request information from other systems via NRPE similar to standard NRPE check_nrpe command.

  -H [ --host ] arg
  -P [ --port ] arg
  -n [ --no-ssl ]
  -q [ --query ]

Command: query:

  -c [ --command ] arg   The name of the query that the remote daemon should run
  -a [ --arguments ] arg list of arguments
  --query-command arg    The name of the query that the remote daemon should run
  --query-arguments arg  list of arguments

Configuration

[/modules]
...
CheckNSCP = 1
CheckSystem = 1
CheckDisk = 1
CheckExternalScripts = 0

# listens for incoming NRPE connection (port 5666)
NRPEServer = 0
# This is also only supported through NRPE.
CheckHelpers = 1
CheckEventLog = 1

# listens for incoming NSClient (check_nt) connection. (port 12489)
NSClientServer = 1
...

[/settings/default]
# comma-separated list ('/' and '*' may be used)
allowed hosts=10.0.0.2

; A list of aliases available. An alias is an internal command that has been "wrapped"
[/settings/external scripts/alias]
................

Restart the service

net stop nscp
net start nscp

Server configuration

define service{
    use            generic-service
    host_name        windowshost
    service_description    CPU Load
    check_command        check_nrpe!alias_cpu
}

Additional notes:

Every five seconds, NSClient queries Windows for the CPU load and stores the value in a circular buffer that keeps the measurements for the last 24 hours. It also collects the uptime, memory and disk utilization metrics every 5 seconds and stores them in global variables. When a request arrives, the client returns the results from these stored values.
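
As a rough sizing exercise, assuming the buffer holds exactly one entry per 5-second sample, the number of samples behind a given time window can be worked out as below. samples_in_minutes is a hypothetical helper used only for this arithmetic.

```shell
# Number of 5-second samples that fall inside a window of N minutes.
# samples_in_minutes is a hypothetical helper for the arithmetic only.
samples_in_minutes() {
    echo $(( $1 * 60 / 5 ))
}

samples_in_minutes 5      # window requested by "-l 5,80,90": 60 samples
samples_in_minutes 1440   # the full 24 hours: 17280 samples
```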

 


NSClient Setting

 

Check Version

C:\Program Files\NSClient++>nscp.exe --version

NSClient++, Version: 0.4.4.15 2015-11-25, Platform: x64

Find ini file location

C:\Program Files\NSClient++>nscp.exe settings --show

  INI settings: (ini://${exe-path}/nsclient.ini, C:\Program Files\NSClient++/nsclient.ini)

Testing

Running command from server

cd /usr/lib/nagios/plugins

./check_nrpe -n -H 192.168.3.32 -c check_uptime

OK: uptime: 1w 1d 07:43h, boot: 2018-Jan-21 00:36:14 (UTC)|'uptime'=719035s;172800;86400
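
The performance data value 'uptime'=719035s and the text "1w 1d 07:43h" describe the same duration. The sketch below redoes the conversion with integer arithmetic; format_uptime is a hypothetical helper, and NSClient++ may format other values differently (for example when the week count is zero).

```shell
# Convert a number of seconds to "<w>w <d>d HH:MMh".
# format_uptime is a hypothetical helper, not NSClient++ code.
format_uptime() {
    s=$1
    w=$((s / 604800)); s=$((s % 604800))   # whole weeks
    d=$((s / 86400));  s=$((s % 86400))    # whole days
    h=$((s / 3600));   s=$((s % 3600))     # whole hours
    m=$((s / 60))                          # whole minutes
    printf '%dw %dd %02d:%02dh\n' "$w" "$d" "$h" "$m"
}

format_uptime 719035   # prints: 1w 1d 07:43h
```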

./check_nrpe -n -H 192.168.3.32 -c check_memory

CRITICAL: 
 committed = 1.788GB, physical = 3.835GB|
 'committed'=1.78766GB;6.39785;7.19758;0;7.99731 
 'committed %'=22%;79;89;0;100 
 'physical'=3.83546GB;3.19965;3.5996;0;3.99956 
 'physical %'=95%;79;89;0;100

./check_nrpe -n -H 192.168.3.32 -c check_drivesize "crit=free<10%"

WARNING 
C:\: 60.826GB/72.689GB used,
Y:\: 33.943GB/39.062GB used, 
Z:\: 34.804GB/39.842GB used|
'C:\ used'=60.82619GB;58.15155;65.4205;0;72.68944 
'C:\ used %'=83%;79;89;0;100 
'D:\ used'=0B;0;0;0;0 
'Y:\ used'=33.94297GB;31.24999;35.15624;0;39.06249 
'Y:\ used %'=86%;79;89;0;100 
'Z:\ used'=34.80368GB;31.87343;35.85761;0;39.84179 
'Z:\ used %'=87%;79;89;0;100

External Scripts

[/modules]
; Load the CheckExternalScripts module
CheckExternalScripts=enabled

; Adding a script
[/settings/external scripts/scripts]
foo=scripts\foo.bat

[/settings/external scripts/scripts]
foo=scripts\\foo.bat "argument 1" "argument 2"

alias

* alias is an internal command *

An alias is an internal command that has been predefined to provide a single command without arguments.

[/settings/external scripts/alias]
alias_disk = check_drivesize "warn=free<5%" "crit=free<2%"
alias_cpu = check_cpu "warn=load > 80" "crit=load > 95" "time=5m" "time=3m"

check_memory

Alarm defaults:

warning     used > 80%
critical     used > 90%

Type: The types to check

physical = Physical memory (RAM)

committed = total memory (RAM+PAGE)

Setting:

alias_mem = check_memory "warn=free < 5%" "crit=free < 128M" type=physical

nsclient.ini

[/settings/default]
allowed hosts = 192.168.3.69

[/settings/NRPE/server]
verify mode = none
use ssl = 0
allow nasty characters = 0
allow arguments = 0

[/modules]
CheckHelpers = 1
CheckNSCP = 1
CheckDisk = 1
CheckSystem = 1
NRPEServer = 1
CheckExternalScripts = 1
CheckEventLog = 0


[/settings/external scripts/alias]
alias_disk = check_drivesize "warn=free<5%" "crit=free<2%" "drive=c:"
alias_cpu = check_cpu "warn=load > 80" "crit=load > 95" "time=5m" "time=3m"
alias_uptime = check_uptime "warn=uptime < 4h" "crit=uptime < 1h"
alias_mem = check_memory "warn=free < 5%" "crit=free < 128M" type=physical

CheckExternalScripts

CheckExternalScripts=enabled

; This option determines whether or not we will allow clients to specify arguments to commands that are executed.
allow arguments = true

; This option determines whether or not we will allow clients to specify nasty (as in |`&><'"\[]{}) characters in arguments.
allow nasty characters = false

Unit:

  • w
  • d
  • h
  • s
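
These suffixes are what thresholds such as "uptime < 4h" use. As a sketch of how a suffixed value maps to seconds, to_seconds below handles the units listed here plus "m" for minutes, which appears in the aliases as "time=5m"; it is a hypothetical helper, not NSClient++ code.

```shell
# Convert a duration such as 4h, 2d or 1w to seconds.
# A bare number is returned unchanged.
# to_seconds is a hypothetical helper, not NSClient++ code.
to_seconds() {
    n=${1%[wdhms]}                      # strip one trailing unit letter
    case "$1" in
        *w) echo $((n * 604800)) ;;
        *d) echo $((n * 86400)) ;;
        *h) echo $((n * 3600)) ;;
        *m) echo $((n * 60)) ;;
        *s) echo "$n" ;;
        *)  echo "$1" ;;
    esac
}

to_seconds 4h    # 14400
to_seconds 2d    # 172800
```

The values 172800 and 86400 that appear as thresholds in the check_uptime performance data correspond to 2d and 1d.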

 
