Last updated: 2024-04-18
Contents
- Monitoring a Process
- Service Timeout
- Connection Test
- Protocol Test
- Check Services(Protocol)
Monitoring a Process
If the pid-file does not exist or does not contain the PID number of a running process,
Monit will call the entry's start method if defined.
check process apache with pidfile /usr/local/apache/logs/httpd.pid
    start program = "/etc/init.d/httpd start"
    # Give Apache 60 seconds to stop (the default timeout is 30 seconds)
    stop program = "/etc/init.d/httpd stop" with timeout 60 seconds
    # Status checking
    if cpu > 60% for 2 cycles then alert
    if cpu > 80% for 5 cycles then restart
    if totalmem > 200.0 MB for 5 cycles then restart
    if children > 250 then restart
    if loadavg(5min) greater than 10 for 8 cycles then stop
    if failed host datahunter.org port 80 protocol http
        and request "/somefile.html"
        then restart
    if failed port 443 type tcpssl protocol http
        with timeout 15 seconds
        then restart
    if 3 restarts within 5 cycles then timeout
    depends on apache_bin
    group server
Start / Stop / Restart
<START | STOP | RESTART> [PROGRAM] = "program" [[AS] UID <number | string>] [[AS] GID <number | string>] [[WITH] TIMEOUT <number> SECOND(S)]
If no restart program is defined, Monit runs the stop program and then the start program.
By default the program is executed as the user under which Monit is running.
If Monit is running as root, you may optionally specify the UID and GID the executed program should switch to.
start program = "/usr/local/mmonit/bin/mmonit" as uid "mmonit" and gid "mmonit"
The simplest stop command
stop = "/bin/sh -c 'kill -9 `cat /var/run/process.pid`'"
Resource type:
IF resource operator value THEN action
resource
System only resource tests
- CPU (user|system|wait)
- SWAP (percent / Byte, kB, MB, GB)
Process only resource tests
- CPU is the CPU usage of the process itself
- TOTALCPU is the total CPU usage of the process and its children in (percent).
- CHILDREN is the number of child processes of the process.
System and process resource tests
- MEMORY is the memory usage of the system or of a process (without children)
- LOADAVG(1min|5min|15min)
operator: "<", ">", "!=", "=="
value: "%" "kB" (1024 Byte) "MB" (1024 KiloByte)
action: "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR"
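The comparison behind a resource test can be illustrated outside Monit. The following is a minimal Python sketch, not Monit's implementation: it converts a limit such as "200.0 MB" to bytes using the 1024-based units listed above and applies one of the four operators. The names `to_bytes` and `resource_test` are hypothetical helpers introduced only for this illustration.

```python
import operator

# Unit multipliers as given in the notes: kB = 1024 bytes, MB = 1024 kB.
UNITS = {"B": 1, "kB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

# The four comparison operators listed above.
OPERATORS = {
    "<": operator.lt,
    ">": operator.gt,
    "!=": operator.ne,
    "==": operator.eq,
}


def to_bytes(value: str) -> float:
    """Convert a string such as '200.0 MB' to a number of bytes."""
    number, unit = value.split()
    return float(number) * UNITS[unit]


def resource_test(current_bytes: float, op: str, limit: str) -> bool:
    """Return True when 'current op limit' holds, i.e. the action would fire."""
    return OPERATORS[op](current_bytes, to_bytes(limit))


print(to_bytes("200.0 MB"))                              # 209715200.0
print(resource_test(256 * 1024 ** 2, ">", "200.0 MB"))   # True
```

A process using 256 MB therefore satisfies "totalmem > 200.0 MB", which is the condition that would lead to the configured action.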
Service Timeout
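* The service timeout mechanism (e.g. "if 3 restarts within 5 cycles then timeout") is based on the number of service restarts and the number of poll cycles
* The default timeout for a connection test is 5 seconds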
check process apache with pidfile /var/run/httpd.pid
    start program = "/etc/init.d/httpd start" with timeout 60 seconds
    stop program = "/etc/init.d/httpd stop" with timeout 60 seconds
    if failed port 80 then restart
    if failed port 443 with timeout 15 seconds then restart
Connection Test
Port Test
IF <FAILED|SUCCEEDED> <UNIXSOCKET path> [TYPE <TCP|UDP>] [PROTOCOL protocol | <SEND|EXPECT> "string",...] [RESPONSETIME number <MILLISECONDS|SECONDS>] [TIMEOUT number SECONDS] [RETRY number] THEN action
Default value
- TIMEOUT 5
- RETRY 0 # fail on first error
RETRY
[2024-04-17T16:04:51+0800] warning : 'websocket' failed protocol test [DEFAULT] at [localhost]:4432 [TCP/IP] -- Connection refused (attempt 1/3)
[2024-04-17T16:04:51+0800] warning : 'websocket' failed protocol test [DEFAULT] at [localhost]:4432 [TCP/IP] -- Connection refused (attempt 2/3)
[2024-04-17T16:04:51+0800] error : 'websocket' failed protocol test [DEFAULT] at [localhost]:4432 [TCP/IP] -- Connection refused
e.g.
[1]
if failed port 80 then alert
[2]
if failed port 80 for 3 cycles then alert
[3]
if failed port 80 for 3 times within 5 cycles then alert
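The difference between the three forms above is easiest to see with a small simulation. The Python sketch below is an approximation of the cycle-counting idea, not Monit's actual code: it assumes a rule fires when at least `times` of the most recent `within` poll cycles failed. The function name `triggered` and the sample history are made up for this example.

```python
from typing import Sequence


def triggered(failures: Sequence[bool], times: int = 1, within: int = 1) -> bool:
    """Return True if at least `times` of the last `within` cycles failed.

    `failures` holds one entry per poll cycle, oldest first; True means
    the port test failed in that cycle.
    """
    if times > within:
        raise ValueError("times cannot exceed within")
    recent = list(failures)[-within:]
    return sum(recent) >= times


# Failed, ok, failed, ok, failed (most recent cycle last).
history = [True, False, True, False, True]

print(triggered(history))         # [1] alert on any failure      -> True
print(triggered(history, 3, 3))   # [2] 3 cycles in a row         -> False
print(triggered(history, 3, 5))   # [3] 3 times within 5 cycles   -> True
```

Form [2] tolerates short glitches but misses a flapping service, while form [3] catches intermittent failures that never last three cycles in a row.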
PROTOCOL
# Web
- APACHE-STATUS
- FTP
- HTTP HTTPS
- WEBSOCKET
- SIEVE
- SMTP SMTPS
- IMAP IMAPS
- POP POPS
# DB
- MYSQL PGSQL MONGODB
- RADIUS MEMCACHE
# Admin
- SSH RSYNC FAIL2BAN
# Other
- DNS
Protocol Test
Usage:
IF FAILED [host] <port> [type] [protocol | {send/expect}+] [timeout] [retry] THEN action
Example
if failed port 80 protocol http then ...
if failed port 25 protocol smtp then ...
if failed port 53 type udp protocol dns then alert
Action:
"ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR"
e.g.
if failed port 80 then alert
if failed port 53 type udp protocol dns then alert
Usage:
IF FAILED PING[4|6] [COUNT number] [SIZE number] [TIMEOUT number SECONDS] [ADDRESS string] THEN action
Parameter:
ADDRESS n.n.n.n | string
If a DNS host name was used in the "CHECK HOST" statement and
the host name resolves to several addresses (either IPv4 or IPv6),
Monit will ping the first available address and continue with the next address
until one connection succeeds or until there are no more addresses left to try.
COUNT (Default: 3) <-- up to 66% packet loss is tolerated (a single successful reply counts as success)
How many consecutive echo requests will be sent to the host in one cycle.
(The requests are sent one after another.)
If you require 100% ping success, set the count to 1
TIMEOUT (Default: 5s)
If no reply arrives within the TIMEOUT window, Monit reports an error. (The timeout applies to each packet individually.)
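The "66%" figure follows from the rule that one successful reply is enough. A short Python sketch of that arithmetic, assuming exactly that rule (the helper name `tolerated_loss_percent` is hypothetical):

```python
def tolerated_loss_percent(count: int) -> float:
    """Largest packet loss (in percent) that still leaves one reply.

    Assumes the ping test passes as soon as a single echo request out of
    `count` is answered, as described in the notes above.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return (count - 1) / count * 100


for count in (1, 3, 5):
    print(f"COUNT {count}: up to {tolerated_loss_percent(count):.1f}% loss tolerated")
```

With the default COUNT of 3 this gives about 66.7%, and with COUNT 1 it gives 0%, which is why a count of 1 is needed when every ping must succeed.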
Example:
# Ping 10.3.3.2 five times per cycle, waiting up to 3 seconds for each reply
CHECK HOST hk_china_vpn ADDRESS 10.3.3.2
    IF FAILED PING COUNT 5 TIMEOUT 3 SECONDS 2 TIMES WITHIN 3 CYCLES THEN alert
send & expect:
SEND/EXPECT can be used with any socket type, such as TCP sockets, UNIX sockets and UDP sockets.
The SEND statement sends a string to the server port.
(You can use non-printable characters in a SEND string if needed. Use the hex notation, \0xHEXHEX)
The EXPECT statement compares a string read from the server (Default: up to 255 bytes) against the given string.
(You can use regular expressions in the EXPECT string)
CHECK HOST localhost ADDRESS 127.0.0.1
    if failed port 3333 TYPE TCP
        and send '{"id": 1,"jsonrpc": "2.0","method": "miner_getstat1"}\r\n'
        expect ".*result.*"
        with timeout 5 seconds for 2 cycles    # 1 min
    then exec /root/scripts/eth/restart-ethminer.sh
        repeat every 6 cycles                  # 3 min
e.g.
# smtp
if failed port 25
    expect "^220.*\r\n"
    send "HELO localhost.localdomain\r\n"
    expect "^250.*\r\n"
    send "QUIT\r\n"
    expect "^221.*\r\n"
    ...
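To check an EXPECT pattern before putting it into a Monit rule, the dialogue can be replayed offline against canned replies. The Python sketch below is only an approximation: Python's `re` module is not the regex engine Monit uses, the replies are invented sample data, and `dialogue_ok` is a hypothetical helper. It mirrors the SMTP exchange above and truncates each reply to 255 bytes, following the default noted earlier.

```python
import re
from typing import Iterable, Tuple

Step = Tuple[str, str]  # ("send" | "expect", text or pattern)

# The SMTP dialogue from the example above.
SMTP_STEPS = [
    ("expect", "^220.*\r\n"),
    ("send", "HELO localhost.localdomain\r\n"),
    ("expect", "^250.*\r\n"),
    ("send", "QUIT\r\n"),
    ("expect", "^221.*\r\n"),
]


def dialogue_ok(steps: Iterable[Step], replies: Iterable[str]) -> bool:
    """Return True if every expect pattern matches the next canned reply."""
    incoming = iter(replies)
    for kind, text in steps:
        if kind == "send":
            continue  # a real test would write `text` to the socket here
        reply = next(incoming, "")
        if re.search(text, reply[:255]) is None:
            return False
    return True


good = ["220 mail.example ESMTP\r\n", "250 mail.example\r\n", "221 Bye\r\n"]
bad = ["554 No service\r\n"]
print(dialogue_ok(SMTP_STEPS, good))  # True
print(dialogue_ok(SMTP_STEPS, bad))   # False
```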
Check Services(protocol)
Service
- apache-status
- http
- ssh
- ftp
- smtp
- pop3
- imap
- mysql
- samba
apache-status
Check server performance by examining the status page
Apache Settings: http://datahunter.org/apache_server-info
- "_" Waiting for Connection
- "K" Keepalive (read)
- "S" Starting up
- "L" Logging
Usage
PROTOCOL APACHE-STATUS [PATH <path>] [USERNAME <string>] [PASSWORD <string>] [<property> <operator> <number>]+
PATH # Default: "/server-status"
property:
- (1) logging (loglimit) # "L"
- (2) closing connections (closelimit) # "C"
- (3) performing DNS lookups (dnslimit) # "D"
- (4) in keepalive with a client (keepalivelimit) # "K"
- (5) replying to a client (replylimit) # "W"
- (6) receiving a request (requestlimit) # "R"
- (7) initialising (startlimit) # "S"
- (8) waiting for incoming connections (waitlimit) # "_"
- (9) gracefully closing down (gracefullimit) # "G"
- (10) performing cleanup procedures (cleanuplimit) # "I"
Operator is one of "<", "=", ">"
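e.g.
if failed port 80 protocol apache-status
    replylimit > 60% or requestlimit > 60% or waitlimit < 10%
    then alert
# replylimit > 60%: more than 60% of the Apache child processes are replying to clients
# requestlimit > 60%: more than 60% of the child processes are receiving requests
# waitlimit < 10%: fewer than 10% of the child processes are idle, waiting for connections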
check process httpd with pidfile /var/run/httpd/httpd.pid
    start program = "/usr/bin/systemctl start httpd"
    stop program = "/usr/bin/systemctl stop httpd"
    # Resource
    if totalmem > 512 MB for 5 cycles then restart
    if cpu > 70% for 2 cycles then alert
    if cpu > 90% for 5 cycles then restart
    if loadavg(5min) greater than 10 for 8 cycles then stop
    # apache-status
    if failed host 127.0.0.1 port 80 protocol apache-status
        replylimit > 60% or keepalivelimit > 90% or waitlimit < 20%
        then alert
Setting waitlimit
/server-status
_______________________W_.........................______________ ___________..................................................... ................................................................ ..........................................................
# "_" = 49, "W" = 1, "." = 200
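Because the "." slots are also counted in the total, a setting of "waitlimit < 20%" can still trigger an alert here: 49 / 250 = 19.6%.
Apache settings:
MaxSpareThreads 75
MinSpareThreads 25
MaxRequestWorkers 250
So waitlimit should be derived from MinSpareThreads:
MinSpareThreads / MaxRequestWorkers x 100% = 25 / 250 x 100% = 10%
waitlimit < 10%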
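The calculation above can be reproduced from a scoreboard string. This Python sketch assumes, as the note observes, that percentages are taken over all slots including "."; the scoreboard is rebuilt from the counts given above rather than copied from a live server, and `scoreboard_percentages` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict


def scoreboard_percentages(scoreboard: str) -> Dict[str, float]:
    """Return the share of each state character, in percent of all slots."""
    slots = "".join(scoreboard.split())  # drop line breaks and spaces
    counts = Counter(slots)
    total = len(slots)
    return {state: n / total * 100 for state, n in counts.items()}


# Same counts as in the note: 49 waiting, 1 replying, 200 open slots.
scoreboard = "_" * 49 + "W" + "." * 200
pct = scoreboard_percentages(scoreboard)
print(f"waiting: {pct['_']:.1f}%")          # waiting: 19.6%

# Threshold suggested by the Apache settings above.
min_spare_threads, max_request_workers = 25, 250
print(f"waitlimit: {min_spare_threads / max_request_workers * 100:.0f}%")  # waitlimit: 10%
```

At 19.6% the idle share is already below a 20% threshold on a healthy, lightly loaded server, which is why the lower value of 10% is the safer limit.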
http
PROTO(COL) HTTP [USERNAME "string"] [PASSWORD "string"] [REQUEST "string"] [METHOD <GET|HEAD>] [STATUS operator number] [CHECKSUM checksum] [HTTP HEADERS list of headers] [CONTENT < "=" | "!=" > STRING]
e.g.
if failed host 192.168.1.100 port 8080 protocol http
    and request '/testing' hostheader 'datahunter.org'
    with timeout 5 seconds for 3 cycles
    then alert
ssh
check process sshd with pidfile /var/run/sshd.pid
    start program "/etc/init.d/ssh start" with timeout 30 seconds
    stop program "/etc/init.d/ssh stop"
    if failed port 22 protocol ssh then restart
    if 5 restarts within 5 cycles then timeout
    group core
ftp
check process proftpd with pidfile /var/run/proftpd.pid
    start program = "/etc/init.d/proftpd start"
    stop program = "/etc/init.d/proftpd stop"
    if failed port 21 protocol ftp then restart
    if 5 restarts within 5 cycles then timeout
smtp
check process postfix with pidfile /var/spool/postfix/pid/master.pid
    start program = "/etc/init.d/postfix start"
    stop program = "/etc/init.d/postfix stop"
    if failed port 25 protocol smtp then restart
    if 5 restarts within 5 cycles then timeout
pop3
check process qpopper with pidfile /var/run/popper.pid
    group mail
    start program = "/etc/init.d/qpopper start"
    stop program = "/etc/init.d/qpopper stop"
    if 5 restarts within 5 cycles then timeout
    if failed port 110 type TCP protocol POP then restart
imap
check process dovecot with pidfile /var/run/dovecot/master.pid
    start program = "/etc/init.d/dovecot start"
    stop program = "/etc/init.d/dovecot stop"
    group mail
    if failed host mail.yourdomain.tld port 993 type tcpssl sslauto protocol imap
        for 5 cycles
        then restart
    if 3 restarts within 5 cycles then timeout
mysql
Perform connection test with login
1) Create the user account
CREATE USER 'monit'@'127.0.0.1' IDENTIFIED BY 'mysecretpassword';
FLUSH PRIVILEGES;
Notes
- User: 'monit'@'localhost' # unixsocket
- User: 'monit'@'127.0.0.1' # port 3306
2) mysqld.monit # chmod 600 mysqld.monit
# RHEL 8
check process mysql with pidfile /run/mysqld/mysqld.pid
    start program = "/usr/bin/systemctl start mysqld"
    stop program = "/usr/bin/systemctl stop mysqld"
    if failed port 3306 protocol mysql username "monit" password "XXXX" then alert
    if 2 restarts within 6 cycles then timeout
samba
check process smbd with pidfile /opt/samba2.2/var/locks/smbd.pid
    group samba
    start program = "/etc/init.d/smbd start"
    stop program = "/etc/init.d/smbd stop"
    if failed host 192.168.1.1 port 139 type TCP then restart
    if 5 restarts within 5 cycles then timeout
# To have Monit start the server if it's not running, add a start statement:
check process nginx with pidfile /var/run/nginx.pid
    start program = "/etc/init.d/nginx start"
    stop program = "/etc/init.d/nginx stop"
# test the checksum for a document on a remote server.
check host datahunter.org with address datahunter.org
    if failed port 80 protocol http
        and request "/monit/dist/monit-5.7.tar.gz"
        with checksum ?????????????????????????????????????
        then alert
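The checksum value in the rule above has to be computed from a known-good copy of the document. To my understanding Monit accepts an MD5 or SHA1 hex digest here, but verify this against the Monit manual for your version. The Python sketch below shows one way to produce such a digest; the sample bytes stand in for the real file contents and `document_checksum` is a hypothetical helper.

```python
import hashlib


def document_checksum(data: bytes, algorithm: str = "md5") -> str:
    """Return the hex digest of `data` using MD5 or SHA1."""
    if algorithm not in ("md5", "sha1"):
        raise ValueError("algorithm must be 'md5' or 'sha1'")
    return hashlib.new(algorithm, data).hexdigest()


# Placeholder content; in practice read the downloaded file in binary mode,
# e.g. open(path, "rb").read(), and pass the bytes here.
sample = b"hello"
print(document_checksum(sample, "md5"))   # 5d41402abc4b2a76b9719d911017c592
print(document_checksum(sample, "sha1"))  # aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
```

The resulting hex string is what would replace the placeholder after "with checksum" in the rule.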