Last updated: 2018-05-29
Introduction
Active and Passive checks
How it works:
Active checks
initiated by the Nagios daemon (polling)
run on a regularly scheduled basis (check_interval and retry_interval options)
Nagios ---> check_nrpe --> network --> nrpe --> check_??? ---> System_Tools
(Monitoring Host)                      (Remote Host)
Passive checks
initiated and performed by external applications/processes
results are submitted to Nagios for processing
Nagios <-- NSCA <-- network <-- send_nsca <--- Program
(Monitoring Host)               (Remote Host)
NRPE can check the following resources:
- CPU load
- Memory usage
- Disk usage
- Logged in users
- Running processes
Server Side Install & Login
Debian6:
apt-get install nagios3
That is about all; the required settings are listed below.
Panel:
http://IP/nagios3
User file:
/etc/nagios3/htpasswd.users
nagiosadmin:????????
Server (the monitoring side)
Install:
Debian: apt-get install nagios-nrpe-plugin
CentOS (EPEL): yum install nagios-plugins-nrpe
Testing:
Linux: /usr/lib/nagios/plugins/check_nrpe -H 192.168.123.41 -c check_load
Win: /usr/lib/nagios/plugins/check_nrpe -H 192.168.3.32 -c check_uptime
-H <remote host IP>
-c <command supported by the remote host>
Error:
"CHECK_NRPE: Error - Could not complete SSL handshake."
Add -n to check_nrpe when checking that host. The error occurs because the remote NRPE daemon is not using SSL (-n = do not use SSL).
Nagios settings for checking a Linux client
/etc/nagios/conf.d/nrpe.cfg
# 'check_nrpe' command definition
define command {
    command_name check_nrpe
    command_line /usr/lib/nagios/plugins/check_nrpe -H '$HOSTADDRESS$' -c $ARG1$
}
In /etc/nagios3/nagios.cfg, add an object file that defines the server to monitor:
cfg_file=/etc/nagios3/objects/myserver_nrpe.cfg
define host {
    use        linux-server
    host_name  myserver
    alias      myserver
    address    202.181.196.246
}
define service{
    use        service-http
    host_name  myserver
}
Common NRPE commands:
- check_load
- check_users
- check_disk
- check_procs
- check_zombie_procs
- check_total_procs
# Check Load
define service{
    use                  generic-service
    host_name            yourserver
    service_description  Check Load
    check_command        check_nrpe!check_load
}
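How check_command reaches the plugin: Nagios splits a value such as check_nrpe!check_load at each "!", uses the first field to pick the command definition, and substitutes the remaining fields for $ARG1$, $ARG2$, ... in that definition's command_line. The Python sketch below mimics only that substitution plus $HOSTADDRESS$ (not the full Nagios macro set); the function name is hypothetical.

```python
def expand_check_command(command_line: str, check_command: str, host_address: str) -> str:
    """Hypothetical helper: expand $HOSTADDRESS$ and $ARGn$ in a command_line.

    check_command is the service's value, e.g. "check_nrpe!check_load".
    The first "!"-separated field names the command definition; the rest
    become $ARG1$, $ARG2$, ... in that definition's command_line.
    """
    _command_name, *args = check_command.split("!")
    expanded = command_line.replace("$HOSTADDRESS$", host_address)
    for index, value in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{index}$", value)
    return expanded


print(expand_check_command(
    "/usr/lib/nagios/plugins/check_nrpe -H '$HOSTADDRESS$' -c $ARG1$",
    "check_nrpe!check_load",
    "202.181.196.246",
))
# /usr/lib/nagios/plugins/check_nrpe -H '202.181.196.246' -c check_load
```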
Check Windows
Main program: check_nt
check_nt --help
check_nt -H host -v variable [-s YOUR_PW] [-p port] [-w warning] [-c critical] [-l params] [-d SHOWALL] [-u] [-t timeout]
variable:
- CLIENTVERSION
- CPULOAD
- UPTIME
- USEDDISKSPACE
- MEMUSE
- SERVICESTATE
- PROCSTATE
- COUNTER <-- Check any performance counter
- INSTANCES <-- Windows Perfmon Counter object
First, make sure a check_nt command is defined:
# 'check_nt' command definition
define command {
    command_name check_nt
    command_line /usr/lib/nagios/plugins/check_nt -H '$HOSTADDRESS$' -p 12489 -s <YOUR-PW> -v $ARG1$ $ARG2$
}
Example object configuration:
define service{
    use                  generic-service
    host_name            Server
    service_description  Uptime
    check_command        check_nt!UPTIME
}
Commonly used service settings:
define service{
    use                  local-service
    host_name            Your_Server
    service_description  Loading
    check_command        check_nrpe_nossl!check_load
}
define service{
    use                  local-service
    host_name            Your_Server
    service_description  Total Procs
    check_command        check_nrpe_nossl!check_total_procs
}
define service{
    use                  local-service
    host_name            Your_Server
    service_description  Zombie Procs
    check_command        check_nrpe_nossl!check_zombie_procs
}
Other useful checks
check_nt!CPULOAD!-l 5,80,90 <-- -l <minutes range>,<warning threshold>,<critical threshold>
check_nt!MEMUSE!-w 80 -c 90
check_nt!USEDDISKSPACE!-l c -w 80 -c 90
check_nt!SERVICESTATE!-d SHOWALL -l Apache2,MySQL,"FileZilla Server",MSSQLSERVER
check_nt -H 192.168.1.1 -p 1248 -v COUNTER -l "\\Paging File(_Total)\\%% Usage","Paging file usage is %.2f %%" -w 80 -c 90
check_nt -H 192.168.1.1 -p 1248 -v COUNTER -l "\\Process(_Total)\\Thread Count","Thread Count: %.f" -w 600 -c 800
check_nt -H 192.168.1.1 -p 1248 -v COUNTER -l "\\Server\\Server Sessions","Server Sessions: %.f" -w 20 -c 30
// -l "\\<performance object>\\counter","<description>"
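The -l value for CPULOAD is a list of <minutes range>,<warning>,<critical> triples. This Python sketch parses it and picks the worst state. It assumes you already have the average load for each minute range, treats a load strictly above a threshold as a breach (an assumption), and accepts several triples in one -l, which check_nt --help describes for CPULOAD (verify against your plugin version). The function names are hypothetical.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def parse_cpuload_spec(spec: str) -> list:
    """Split a check_nt CPULOAD -l value such as "5,80,90" into
    (minutes, warning %, critical %) triples."""
    parts = [int(item) for item in spec.split(",")]
    if not parts or len(parts) % 3 != 0:
        raise ValueError(f"expected minutes,warn,crit triples, got {spec!r}")
    return [tuple(parts[i:i + 3]) for i in range(0, len(parts), 3)]


def cpuload_state(average_by_minutes: dict, spec: str) -> int:
    """Worst state over all triples.

    average_by_minutes maps a minute range to the measured average CPU
    load in percent (assumed to be collected elsewhere).
    """
    worst = OK
    for minutes, warn, crit in parse_cpuload_spec(spec):
        load = average_by_minutes[minutes]
        if load > crit:
            state = CRITICAL
        elif load > warn:
            state = WARNING
        else:
            state = OK
        worst = max(worst, state)
    return worst
```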
NRPE Linux Client (the monitored machine)
# U16 (Ubuntu 16.04)
apt-get install nagios-nrpe-server
apt-get install nagios-plugins <-- provides the plugins, e.g. /usr/lib/nagios/plugins/check_load
# C6 (CentOS 6)
yum install nagios-nrpe nagios-plugins
# C7 (CentOS 7)
# Nrpe is a system daemon that will execute various Nagios plugins locally
# on behalf of a remote (monitoring) host that uses the check_nrpe plugin.
yum install nrpe
# For local testing
# /usr/lib64/nagios/plugins/check_nrpe
yum install nagios-plugins-nrpe nagios-common
# plugins
yum install nagios-plugins nagios-plugins-disk nagios-plugins-load
# Configuration
Config files:
- /etc/nagios/nrpe.cfg
- /etc/nagios/nrpe_local.cfg
nrpe.cfg must contain:
include=/etc/nagios/nrpe_local.cfg
In nrpe_local.cfg, add the IPs that are allowed to connect:
# allowed_hosts=IP1,IP2  Note: no spaces after the commas
allowed_hosts=192.168.123.13,127.0.0.1
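A small lint sketch for the no-spaces rule above: split the value on commas and reject any entry with surrounding whitespace or an empty item. It is a hypothetical helper, not part of NRPE, and it checks formatting only, not whether the addresses are valid.

```python
from typing import List


def parse_allowed_hosts(line: str) -> List[str]:
    """Hypothetical linter for an NRPE 'allowed_hosts=IP1,IP2' line.

    Entries must be separated by commas with no spaces, so any entry
    with surrounding whitespace (or an empty entry) is rejected.
    """
    key, sep, value = line.strip().partition("=")
    if key != "allowed_hosts" or not sep:
        raise ValueError(f"not an allowed_hosts line: {line!r}")
    hosts = value.split(",")
    for host in hosts:
        if not host or host != host.strip():
            raise ValueError(f"bad entry {host!r} in {line!r}: remove spaces and empty items")
    return hosts


print(parse_allowed_hosts("allowed_hosts=192.168.123.13,127.0.0.1"))
# ['192.168.123.13', '127.0.0.1']
```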
Then define which commands may be run:
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_disk_sys]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
command[check_disk_bak]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sdb1
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 1900 -c 2000
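To review what a client will run, the command[...] lines can be turned into a name to argument-list map. This hypothetical Python helper uses shlex.split, which only approximates how the command line gets tokenized at run time.

```python
import re
import shlex
from typing import Dict, List

# Matches e.g. command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
COMMAND_RE = re.compile(r"^command\[(?P<name>[^\]]+)\]=(?P<cmdline>.+)$")


def parse_nrpe_commands(text: str) -> Dict[str, List[str]]:
    """Map each command[...] name to its argument list (comments and blanks skipped)."""
    commands: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = COMMAND_RE.match(line)
        if match:
            # shlex.split approximates shell-style splitting of the command line.
            commands[match.group("name")] = shlex.split(match.group("cmdline"))
    return commands


sample = """
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
"""
for name, argv in parse_nrpe_commands(sample).items():
    print(name, argv)
```

Running one entry by hand on the client (for example with subprocess.run(argv)) is the same kind of manual command test used further down.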
Remember to restart the service:
# U16
/etc/init.d/nagios-nrpe-server restart
# C7
systemctl enable nrpe
systemctl start nrpe
In the log (log_facility=daemon) you should see:
===================================
Mar 8 17:31:39 debian1 nrpe[1532]: Allowing connections from: 127.0.0.1, 192.168.123.13
Mar 8 17:31:43 debian1 nrpe[1535]: Host 192.168.123.13 is not allowed to talk to us!
===================================
Check that the port is listening:
netstat -nlp | grep 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 8156/nrpe
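netstat only works on the client itself. From the Nagios server, a plain TCP connect shows whether port 5666 is reachable through the firewall. The sketch below (hypothetical function name) does nothing more than that: it does not speak the NRPE protocol or SSL, so use check_nrpe to test the actual service.

```python
import socket


def port_is_open(host: str, port: int = 5666, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connect to host:port succeeds within timeout.

    This only proves the port is reachable (daemon listening, firewall open);
    it does not speak the NRPE protocol or SSL.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example: port_is_open("192.168.123.41")
```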
Test that a command works (run its command line by hand):
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
Test that NRPE works:
/usr/lib/nagios/plugins/check_nrpe -H localhost -c check_load
# C7
/usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_load
Firewall:
tcp port: 5666
Testing:
tail -f /var/log/daemon.log
/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
Create the file /etc/sudoers.d/nagios with the following content:
nagios ALL=(ALL) NOPASSWD: /usr/lib/nagios/plugins/
In nrpe.cfg, define what may be checked:
command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_root]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda1
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200
command[check_vbox]=/usr/lib/nagios/plugins/check_procs -c 1:1 -C vboxwebsrv
Disable SSL:
/etc/default/nagios-nrpe-server
DAEMON_OPTS="--no-ssl"
NICENESS=5
nrpe [-n] -c <config_file> <mode>
Options:
-n Do not use SSL
<mode>
-i Run as a service under inetd or xinetd
-d Run as a standalone daemon
Plugin Setting
Plugin - check_tcp
check_tcp -H host -p port [-w <warning time>] [-c <critical time>]
[-s <send string>]
[-e <expect string>] [-q <quit string>][-m <maximum bytes>] [-d <delay>]
[-t <timeout seconds>] [-r <refuse state>] [-M <mismatch state>] [-v] [-4|-6] [-j]
[-D <days to cert expiry>] [-S <use SSL>] [-E]
Testing:
check_tcp -H 172.16.2.11 -p 3306
TCP OK - 0.000 second response time on port 80|time=0.000185s;;;0.000000;10.000000
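A rough Python imitation of what check_tcp does: time a TCP connect, map the time onto optional warning/critical thresholds, and return the state as the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN, the standard Nagios plugin convention) plus a one-line message with perfdata after "|". The message layout copies the sample output above; the names and the 10-second default timeout are assumptions of this sketch.

```python
import socket
import time
from typing import Optional, Tuple

# Nagios plugin exit codes: the exit status, not the text, decides the state.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(elapsed: float, warn: Optional[float], crit: Optional[float]) -> int:
    """Compare a response time in seconds with optional -w/-c thresholds."""
    if crit is not None and elapsed > crit:
        return CRITICAL
    if warn is not None and elapsed > warn:
        return WARNING
    return OK


def check_tcp(host: str, port: int, warn: Optional[float] = None,
              crit: Optional[float] = None, timeout: float = 10.0) -> Tuple[int, str]:
    """Time a TCP connect and build check_tcp-style output with perfdata."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
    except OSError as exc:
        return CRITICAL, f"TCP CRITICAL - {exc}"
    state = classify(elapsed, warn, crit)
    message = (f"TCP {STATE_NAMES[state]} - {elapsed:.3f} second response time "
               f"on port {port}|time={elapsed:.6f}s;;;0.000000;{timeout:.6f}")
    return state, message


def main(argv: list) -> int:
    """argv like ["172.16.2.11", "3306"]: print the message, return the exit code."""
    state, message = check_tcp(argv[0], int(argv[1]))
    print(message)
    return state
```

Called from a script, finish with sys.exit(main(sys.argv[1:])), because Nagios reads the exit code rather than the text to decide the state.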
Configuration
define command{
    command_name  check_tcp
    command_line  $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
}
define service{
    use                  generic-service
    host_name            host_name
    service_description  server ras
    check_command        check_tcp!3391
}
Plugin - check_mysql
-h, --help
-V, --version
-u, --username=STRING
-p, --password=STRING
-d, --database=STRING
-H, --hostname=ADDRESS
Example
./check_mysql -H 127.0.0.1 -u test -p YOUR_PW -d test
Plugin - check_disk
Options:
-p, --path=PATH, --partition=PARTITION
Path or partition (may be repeated)
Example: check the root filesystem, WARNING below 10% free and CRITICAL below 5% free
check_disk -w 10% -c 5% -p /
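check_disk -w 10% -c 5% alerts when free space falls below the percentage: WARNING under 10% free, CRITICAL under 5% free. This sketch reproduces that comparison with shutil.disk_usage; check_disk's own free-space accounting (for example reserved blocks) may differ slightly, and the function names are hypothetical.

```python
import shutil

OK, WARNING, CRITICAL = 0, 1, 2


def state_for_free_pct(free_pct: float, warn_pct: float = 10.0, crit_pct: float = 5.0) -> int:
    """check_disk-style percent thresholds: alert when free space drops BELOW them."""
    if free_pct < crit_pct:
        return CRITICAL
    if free_pct < warn_pct:
        return WARNING
    return OK


def disk_state(path: str = "/", warn_pct: float = 10.0, crit_pct: float = 5.0):
    """Return (state, free %) for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    free_pct = 100.0 * usage.free / usage.total
    return state_for_free_pct(free_pct, warn_pct, crit_pct), free_pct
```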
Plugin - check_load
-w, --warning=WLOAD1,WLOAD5,WLOAD15 # Exit with WARNING status if load average exceeds WLOADn
-c, --critical=CLOAD1,CLOAD5,CLOAD15 # Exit with CRITICAL status if load average exceeds CLOADn
e.g.
command[check_load]=/usr/lib/nagios/plugins/check_load -w 16,12,6 -c 30,25,20
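With three comma-separated values, check_load compares the 1-, 5- and 15-minute load averages one-to-one against the -w and -c lists. The sketch below follows that reading: CRITICAL if any average exceeds its critical value, otherwise WARNING if any exceeds its warning value. The strict ">" comparison is an assumption, the function names are hypothetical, and os.getloadavg() is Unix-only.

```python
import os
from typing import Sequence, Tuple

OK, WARNING, CRITICAL = 0, 1, 2


def parse_triplet(spec: str) -> Tuple[float, ...]:
    """'16,12,6' -> (16.0, 12.0, 6.0) for the 1-, 5- and 15-minute averages."""
    values = tuple(float(item) for item in spec.split(","))
    if len(values) != 3:
        raise ValueError(f"expected three comma-separated values, got {spec!r}")
    return values


def load_state(loads: Sequence[float], warn_spec: str = "16,12,6",
               crit_spec: str = "30,25,20") -> int:
    """Compare each load average with the threshold at the same position."""
    warn, crit = parse_triplet(warn_spec), parse_triplet(crit_spec)
    if any(load > limit for load, limit in zip(loads, crit)):
        return CRITICAL
    if any(load > limit for load, limit in zip(loads, warn)):
        return WARNING
    return OK


def current_load_state() -> int:
    """Unix only: os.getloadavg() returns the (1, 5, 15)-minute load averages."""
    return load_state(os.getloadavg())
```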
NRPE and NSClient: Win32 Client Side
Two Windows clients are available for Nagios: NSClient++ and NC_Net.
NC_Net requires .NET Framework 3.5 on Windows,
so in the end I chose NSClient++.
Port: TCP / 12489
Download:
https://www.nsclient.org/
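Installation:
The NSCA client is optional: it is only used in "active mode".
Here "active mode" means the client contacts the server on its own to report its current status (in Nagios terms, a passive check).
The NRPE server is optional as well, because we do not run external scripts; check_nt alone is enough.
Configuration:
NSC.ini must be edited for this to work (with the default Next-Next install it always crashes) [0.3.9-Win32]
In version 0.4.2 the config file is 'nsclient.ini'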
C:\Program Files\NSClient++>nscp.exe
Usage:
-version, -about, -install, -uninstall, -start, -stop, -encrypt
Checking:
Run on Linux:
/usr/lib/nagios/plugins/check_nt -H <IP> -p 12489 -s <PW> -v UPTIME
Result:
System Uptime - 99 day(s) 23 hour(s) 59 minute(s)
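The UPTIME result is plain text, so scripting around it (for example, noticing a recent reboot) means parsing that line. A minimal sketch, assuming the exact wording shown above; the function name is hypothetical.

```python
import re

# Assumes the wording of the check_nt UPTIME result shown above.
UPTIME_RE = re.compile(r"(\d+) day\(s\) (\d+) hour\(s\) (\d+) minute\(s\)")


def check_nt_uptime_seconds(output: str) -> int:
    """Convert 'System Uptime - 99 day(s) 23 hour(s) 59 minute(s)' into seconds."""
    match = UPTIME_RE.search(output)
    if match is None:
        raise ValueError(f"unrecognised UPTIME output: {output!r}")
    days, hours, minutes = (int(group) for group in match.groups())
    return days * 86400 + hours * 3600 + minutes * 60
```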
Starting in test mode:
The best way to diagnose and find errors with your configuration and setup.
d vice\logger_impl.cpp:420 Creating logger: console
d rvice\NSClient++.cpp:386 NSClient++ 0,4,1,105 2014-04-28 x64 Loading settings and logger...
d ngs_manager_impl.cpp:162 Boot.ini found in: C:/Program Files/NSClient++//boot.ini
d ngs_manager_impl.cpp:178 Boot order: ini://${exe-path}/nsclient.ini
d ngs_manager_impl.cpp:181 Activating: ini://${exe-path}/nsclient.ini
d ngs_manager_impl.cpp:73 Creating instance for: ini://${exe-path}/nsclient.ini
d mpl/settings_ini.hpp:303 Reading INI settings from: C:/Program Files/NSClient++//nsclient.ini
d mpl/settings_ini.hpp:253 Loading: C:/Program Files/NSClient++//nsclient.ini
d rvice\NSClient++.cpp:397 NSClient++ 0,4,1,105 2014-04-28 x64 booting...
d rvice\NSClient++.cpp:398 Booted settings subsystem...
d rvice\NSClient++.cpp:465 On crash: restart: NSClientpp
d rvice\NSClient++.cpp:477 Archiving crash dumps in: C:/Program Files/NSClient++//crash-dumps
d rvice\NSClient++.cpp:544 booting::loading plugins
d rvice\NSClient++.cpp:306 Found: CheckDisk
d rvice\NSClient++.cpp:306 Found: CheckEventLog
d rvice\NSClient++.cpp:306 Found: CheckExternalScripts
d rvice\NSClient++.cpp:306 Found: CheckHelpers
d rvice\NSClient++.cpp:306 Found: CheckNSCP
d rvice\NSClient++.cpp:306 Found: CheckSystem
d rvice\NSClient++.cpp:306 Found: NRPEServer
d rvice\NSClient++.cpp:306 Found: NSClientServer
d rvice\NSClient++.cpp:867 addPlugin(C:/Program Files/NSClient++//modules/CheckDisk.dll as )
d rvice\NSClient++.cpp:867 addPlugin(C:/Program Files/NSClient++//modules/CheckEventLog.dll as )
d rvice\NSClient++.cpp:867 addPlugin(C:/Program Files/NSClient++//modules/CheckExternalScripts.dll as )
d rvice\NSClient++.cpp:867 addPlugin(C:/Program Files/NSClient++//modules/CheckHelpers.dll as )
d rvice\NSClient++.cpp:867 addPlugin(C:/Program Files/NSClient++//modules/CheckNSCP.dll as )
d rvice\NSClient++.cpp:867 addPlugin(C:/Program Files/NSClient++//modules/CheckSystem.dll as )
d rvice\NSClient++.cpp:867 addPlugin(C:/Program Files/NSClient++//modules/NRPEServer.dll as )
d rvice\NSClient++.cpp:867 addPlugin(C:/Program Files/NSClient++//modules/NSClientServer.dll as )
d rvice\NSClient++.cpp:844 Loading plugin: CheckDisk
d rvice\NSClient++.cpp:844 Loading plugin: Event log Checker.
d rvice\NSClient++.cpp:844 Loading plugin: Check External Scripts
d kExternalScripts.cpp:99 No wrappings found (adding default: vbs, ps1 and bat)
d rvice\NSClient++.cpp:844 Loading plugin: Helper function
d rvice\NSClient++.cpp:844 Loading plugin: Check NSCP
d rvice\NSClient++.cpp:844 Loading plugin: CheckSystem
d stem\CheckSystem.cpp:158 Found alternate key for uptime: \2\674
d stem\CheckSystem.cpp:169 Found alternate key for memory commit limit: \4\30
d stem\CheckSystem.cpp:180 Found alternate key for memory commit bytes: \4\26
d stem\CheckSystem.cpp:191 Found alternate key for cpu: \238(_total)\6
d rvice\NSClient++.cpp:844 Loading plugin: NRPE server
d tem\PDHCollector.cpp:94 Loading counter: cpu = \238(_total)\6
d erver\NRPEServer.cpp:133 Allowed hosts definition: 127.0.0.1(255.255.255.255)
d tem\PDHCollector.cpp:94 Loading counter: memory commit bytes = \4\26
d tem\PDHCollector.cpp:94 Loading counter: memory commit limit = \4\30
d tem\PDHCollector.cpp:94 Loading counter: uptime = \2\674
d de\socket/server.hpp:126 Binding to: [::]:5666(ipv6)
d de\socket/server.hpp:162 Attempting to bind to: :5666
d de\socket/server.hpp:121 Binding to: 0.0.0.0:5666(ipv4)
d de\socket/server.hpp:162 Attempting to bind to: :5666
d rvice\NSClient++.cpp:844 Loading plugin: NSClient server
d r\NSClientServer.cpp:139 Allowed hosts definition: 127.0.0.1(255.255.255.255)
d de\socket/server.hpp:126 Binding to: [::]:12489(ipv6)
d de\socket/server.hpp:162 Attempting to bind to: :12489
d de\socket/server.hpp:121 Binding to: 0.0.0.0:12489(ipv4)
d de\socket/server.hpp:162 Attempting to bind to: :12489
d rvice\NSClient++.cpp:616 NSClient++ - 0,4,1,105 2014-04-28 Started!
l ce\simple_client.hpp:32 Enter command to inject or exit to terminate...
When a command arrives:
d rvice\NSClient++.cpp:960 Injecting: checkuptime...
d rvice\NSClient++.cpp:985 Result checkuptime: OK
# List all settings
nscp settings --list
# Testing
nrpe (same as nscp client --module NRPEClient)
Use a NRPE client to request information from other systems via NRPE similar to standard NRPE check_nrpe command.
-H [ --host ] arg
-P [ --port ] arg
-n [ --no-ssl ]
-q [ --query ]
Command: query:
-c [ --command ] arg The name of the query that the remote daemon should run
-a [ --arguments ] arg list of arguments
--query-command arg The name of the query that the remote daemon should run
--query-arguments arg list of arguments
Configuration
[/modules]
...
CheckNSCP = 1
CheckSystem = 1
CheckDisk = 1
CheckExternalScripts = 0
# listens for incoming NRPE connection (port 5666)
NRPEServer = 0
# This is also only supported through NRPE.
CheckHelpers = 1
CheckEventLog = 1
# listens for incoming NSClient (check_nt) connection. (port 12489)
NSClientServer = 1
...
[/settings/default]
# comma-separated list ('/' and '*' may be used)
allowed hosts=10.0.0.2
; A list of aliases available. An alias is an internal command that has been "wrapped"
[/settings/external scripts/alias]
...
Restart the service
net stop nscp
net start nscp
Server configuration
define service{
    use                  generic-service
    host_name            windowshost
    service_description  CPU Load
    check_command        check_nrpe!alias_cpu
}
Note:
Every five seconds, NSClient queries Windows for the CPU load and stores it in a circular buffer that keeps the measurements for the last 24 hours. It also collects the uptime, memory and disk utilization metrics every 5 seconds and stores them in global variables. When requested, the client returns these results from the global variables.
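The buffering described above can be pictured as a fixed-size ring buffer: at one sample every 5 seconds, 24 hours is 24 * 3600 / 5 = 17280 samples, and a CPULOAD minute range N becomes "average of the newest N * 12 samples". This Python class only illustrates the idea; it is not NSClient++'s code, and the names are hypothetical.

```python
from collections import deque
from itertools import islice


class CpuLoadBuffer:
    """Illustrative ring buffer: one CPU sample every interval_s seconds,
    keeping span_s seconds of history (24 h at 5 s = 17280 samples)."""

    def __init__(self, interval_s: int = 5, span_s: int = 24 * 60 * 60):
        self.interval_s = interval_s
        # maxlen makes the deque drop the oldest sample automatically.
        self.samples = deque(maxlen=span_s // interval_s)

    def add(self, load_pct: float) -> None:
        self.samples.append(load_pct)

    def average(self, minutes: int) -> float:
        """Average of the newest samples covering `minutes` (or all, if fewer)."""
        wanted = max(1, minutes * 60 // self.interval_s)
        count = min(wanted, len(self.samples))
        if count == 0:
            raise ValueError("no samples collected yet")
        newest = islice(reversed(self.samples), count)
        return sum(newest) / count
```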
NSClient Setting
Check Version
C:\Program Files\NSClient++>nscp.exe --version
NSClient++, Version: 0.4.4.15 2015-11-25, Platform: x64
Find ini file location
C:\Program Files\NSClient++>nscp.exe settings --show
INI settings: (ini://${exe-path}/nsclient.ini, C:\Program Files\NSClient++/nsclient.ini)
Testing
Running command from server
cd /usr/lib/nagios/plugins
./check_nrpe -n -H 192.168.3.32 -c check_uptime
OK: uptime: 1w 1d 07:43h, boot: 2018-Jan-21 00:36:14 (UTC)|'uptime'=719035s;172800;86400
./check_nrpe -n -H 192.168.3.32 -c check_memory
CRITICAL:
committed = 1.788GB, physical = 3.835GB|
'committed'=1.78766GB;6.39785;7.19758;0;7.99731
'committed %'=22%;79;89;0;100
'physical'=3.83546GB;3.19965;3.5996;0;3.99956
'physical %'=95%;79;89;0;100
./check_nrpe -n -H 192.168.3.32 -c check_drivesize "crit=free<10%"
WARNING
C:\: 60.826GB/72.689GB used,
Y:\: 33.943GB/39.062GB used,
Z:\: 34.804GB/39.842GB used|
'C:\ used'=60.82619GB;58.15155;65.4205;0;72.68944
'C:\ used %'=83%;79;89;0;100
'D:\ used'=0B;0;0;0;0
'Y:\ used'=33.94297GB;31.24999;35.15624;0;39.06249
'Y:\ used %'=86%;79;89;0;100
'Z:\ used'=34.80368GB;31.87343;35.85761;0;39.84179
'Z:\ used %'=87%;79;89;0;100
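The text after "|" in these results is Nagios perfdata: space-separated items of the form 'label'=value[UOM];warn;crit;min;max, with labels quoted when they contain spaces. A small parser sketch (hypothetical helper) that turns it into a dictionary; warn/crit are kept as strings because Nagios thresholds may be ranges.

```python
import re
from typing import Dict, Optional

# One perfdata item: 'label'=value[UOM];[warn];[crit];[min];[max]
PERF_RE = re.compile(
    r"(?:'(?P<qlabel>[^']+)'|(?P<label>[^\s=']+))="
    r"(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[A-Za-z%]*)"
    r"(?:;(?P<warn>[^;\s]*))?"
    r"(?:;(?P<crit>[^;\s]*))?"
    r"(?:;(?P<min>[^;\s]*))?"
    r"(?:;(?P<max>[^;\s]*))?"
)


def _optional_float(raw: Optional[str]) -> Optional[float]:
    return float(raw) if raw else None


def parse_perfdata(plugin_output: str) -> Dict[str, dict]:
    """Parse the perfdata after the first '|' of a plugin's output."""
    _, bar, perf = plugin_output.partition("|")
    text = perf if bar else plugin_output
    metrics = {}
    for match in PERF_RE.finditer(text):
        label = match.group("qlabel") or match.group("label")
        metrics[label] = {
            "value": float(match.group("value")),
            "uom": match.group("uom"),
            "warn": match.group("warn") or None,
            "crit": match.group("crit") or None,
            "min": _optional_float(match.group("min")),
            "max": _optional_float(match.group("max")),
        }
    return metrics
```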
External Scripts
[/modules]
; Load the CheckExternalScripts module
CheckExternalScripts=enabled
; Adding a script
[/settings/external scripts/scripts]
foo=scripts\foo.bat
; or, with arguments:
[/settings/external scripts/scripts]
foo=scripts\\foo.bat "argument 1" "argument 2"
alias
* alias is an internal command *
An alias is an internal command that has been predefined to provide a single command without arguments.
[/settings/external scripts/alias]
alias_disk = check_drivesize "warn=free<5%" "crit=free<2%"
alias_cpu = check_cpu "warn=load > 80" "crit=load > 95" "time=5m" "time=3m"
check_memory
Alarm defaults:
warning used > 80%
critical used > 90%
Type: The types to check
physical = Physical memory (RAM)
committed = total memory (RAM+PAGE)
Setting:
alias_mem = check_memory "warn=free < 5%" "crit=free < 128M" type=physical
nsclient.ini
[/settings/default]
allowed hosts = 192.168.3.69
[/settings/NRPE/server]
verify mode = none
use ssl = 0
allow nasty characters = 0
allow arguments = 0
[/modules]
CheckHelpers = 1
CheckNSCP = 1
CheckDisk = 1
CheckSystem = 1
NRPEServer = 1
CheckExternalScripts = 1
CheckEventLog = 0
[/settings/external scripts/alias]
alias_disk = check_drivesize "warn=free<5%" "crit=free<2%" "drive=c:"
alias_cpu = check_cpu "warn=load > 80" "crit=load > 95" "time=5m" "time=3m"
alias_uptime = check_uptime "warn=uptime < 4h" "crit=uptime < 1h"
alias_mem = check_memory "warn=free < 5%" "crit=free < 128M" type=physical
CheckExternalScripts
CheckExternalScripts=enabled
; This option determines whether or not we will allow clients to specify arguments to commands that are executed.
allow arguments = true
; This option determines whether or not we will allow clients to specify nasty (as in |`&><'"\[]{}) characters in arguments.
allow nasty characters = false
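With allow nasty characters = false, arguments containing any of the characters listed in that comment are refused. This sketch checks an argument against that set. Note that crit=free<10%, used in the check_drivesize test above, contains "<", so it would likely be refused under this setting if sent as an argument; keeping such thresholds in an alias avoids passing them over the wire. The function name is hypothetical.

```python
# The characters listed in the "allow nasty characters" comment above: |`&><'"\[]{}
NASTY_CHARACTERS = frozenset("|`&><'\"\\[]{}")


def has_nasty_characters(argument: str) -> bool:
    """True if argument contains any character from the nasty set."""
    return any(char in NASTY_CHARACTERS for char in argument)
```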
Time units:
- w (weeks)
- d (days)
- h (hours)
- s (seconds)
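Thresholds such as uptime < 4h in the alias_uptime setting use these units. This sketch converts such a duration into seconds, for example to compare with the 'uptime'=719035s perfdata shown earlier; it handles only the units listed above, and the function name is hypothetical.

```python
import re

# Seconds per unit, for the units listed above only.
UNIT_SECONDS = {"w": 7 * 24 * 3600, "d": 24 * 3600, "h": 3600, "s": 1}
TOKEN_RE = re.compile(r"(\d+)([wdhs])")


def to_seconds(expression: str) -> int:
    """Convert durations such as '4h', '1w 1d' or '90s' into seconds."""
    compact = expression.replace(" ", "")
    if not re.fullmatch(r"(?:\d+[wdhs])+", compact):
        raise ValueError(f"unsupported duration: {expression!r}")
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in TOKEN_RE.findall(compact))
```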