2. monit - Process & Service Check

Last updated: 2024-04-18

Monitoring a Process

 

If the pid-file does not exist or does not contain the PID number of a running process,
Monit will call the entry's start method if defined.

check process apache with pidfile /usr/local/apache/logs/httpd.pid
  start program = "/etc/init.d/httpd start"
  # Give Apache up to 60 seconds to stop. The default timeout is 30 seconds.
  stop program  = "/etc/init.d/httpd stop" with timeout 60 seconds
  # Status Checking
  if cpu > 60% for 2 cycles then alert
  if cpu > 80% for 5 cycles then restart
  if totalmem > 200.0 MB for 5 cycles then restart
  if children > 250 then restart
  if loadavg(5min) greater than 10 for 8 cycles then stop
  if failed host datahunter.org port 80 protocol http
     and request "/somefile.html"
     then restart
  if failed port 443 type tcpssl protocol http
     with timeout 15 seconds
     then restart
  if 3 restarts within 5 cycles then timeout
  depends on apache_bin
  group server

Start / Stop / Restart

<START | STOP | RESTART> [PROGRAM] = "program" [[AS] UID <number | string>] [[AS] GID <number | string>] [[WITH] TIMEOUT <number> SECOND(S)]

If no restart program is defined, Monit runs the stop program and then the start program.

By default the program is executed as the user under which Monit is running.

If Monit is running as root, you may optionally specify the UID and GID the executed program should switch to.

start program = "/usr/local/mmonit/bin/mmonit" as uid "mmonit" and gid "mmonit"

The simplest stop program

stop = "/bin/sh -c 'kill -9 `cat /var/run/process.pid`'"

Resource type:

IF resource operator value THEN action

resource

System only resource tests

  • CPU (user|system|wait)
  • SWAP (percent / Byte, kB, MB, GB)

Process only resource tests

  • CPU is the CPU usage of the process itself
  • TOTALCPU is the total CPU usage of the process and its children in (percent).
  • CHILDREN is the number of child processes of the process.

System and process resource tests

  • MEMORY is the memory usage of the system or of a process (without children)
  • LOADAVG(1min|5min|15min)

operator:  "<", ">", "!=", "=="

value: "%" "kB" (1024 Byte) "MB" (1024 KiloByte)

action:   "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR"
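
To make the "IF resource operator value THEN action" form concrete, here is a rough Python sketch of how such a test can be read. The names `to_bytes` and `condition_holds` are made up for illustration and are not part of Monit; the unit sizes follow the 1024-based values listed above.

```python
import operator

# Operators from the list above, mapped to Python comparisons.
OPERATORS = {"<": operator.lt, ">": operator.gt, "!=": operator.ne, "==": operator.eq}

# 1024-based units, as listed above ("kB" = 1024 Byte, "MB" = 1024 kB).
UNITS = {"B": 1, "kB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(value, unit):
    """Convert a value such as (200.0, "MB") to bytes."""
    return value * UNITS[unit]


def condition_holds(measured, op, threshold):
    """Return True when 'measured op threshold' holds, i.e. the action would apply."""
    return OPERATORS[op](measured, threshold)


# "if totalmem > 200.0 MB ..." with a process tree currently using 250 MB
print(condition_holds(to_bytes(250, "MB"), ">", to_bytes(200.0, "MB")))  # True
# "if children > 250 ..." with 120 child processes
print(condition_holds(120, ">", 250))  # False
```

The "for N cycles" part of a rule is not modelled here; it only decides how many poll cycles the condition must hold before the action runs.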

 


Service Timeout

 

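* The service timeout ("if N restarts within M cycles then timeout") is based on the number of service restarts within a number of poll cycles.

* The timeout of a connection test defaults to 5 seconds. The timeout of a start/stop program defaults to 30 seconds.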

check process apache with pidfile /var/run/httpd.pid
      start program = "/etc/init.d/httpd start" with timeout 60 seconds
      stop program  = "/etc/init.d/httpd stop" with timeout 60 seconds
      if failed port 80 then restart
      if failed port 443 with timeout 15 seconds then restart

 


Connection Test

 

Port Test

IF <FAILED|SUCCEEDED>
    <UNIXSOCKET path>
    [TYPE <TCP|UDP>]
    [PROTOCOL protocol | <SEND|EXPECT> "string",...]
    [RESPONSETIME number <MILLISECONDS|SECONDS>]
    [TIMEOUT number SECONDS]
    [RETRY number]
THEN action

Default value

  • TIMEOUT 5
  • RETRY 0       # fail on first error

RETRY

[2024-04-17T16:04:51+0800] warning  : 'websocket' failed protocol test [DEFAULT] at [localhost]:4432 [TCP/IP] -- Connection refused (attempt 1/3)
[2024-04-17T16:04:51+0800] warning  : 'websocket' failed protocol test [DEFAULT] at [localhost]:4432 [TCP/IP] -- Connection refused (attempt 2/3)
[2024-04-17T16:04:51+0800] error    : 'websocket' failed protocol test [DEFAULT] at [localhost]:4432 [TCP/IP] -- Connection refused
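
The log above comes from a test with three attempts. A minimal Python sketch of the same idea (retry the connection several times in a row, fail only when every attempt fails) could look like this. `port_is_up` and its `connect` parameter are hypothetical names for illustration, and the printed lines only imitate the log format; they are not produced by Monit.

```python
import socket


def port_is_up(host, port, timeout=5.0, attempts=1, connect=socket.create_connection):
    """Try a TCP connection up to 'attempts' times; return True on the first success."""
    for attempt in range(1, attempts + 1):
        try:
            conn = connect((host, port), timeout)
        except OSError as exc:
            # Imitation of the log line above, not real Monit output.
            print(f"failed at [{host}]:{port} -- {exc} (attempt {attempt}/{attempts})")
            continue
        conn.close()
        return True
    return False
```

For example, `port_is_up("localhost", 4432, attempts=3)` tries the port from the log three times before reporting a failure. The `connect` parameter exists only so the function can be exercised without a real network.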

Examples:

[1]

if failed port 80 then alert

[2]

if failed
    port 80
    for 3 cycles
then alert

[3]

if failed
    port 80
    for 3 times within 5 cycles
then alert
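
The difference between the three forms is easier to see with a small simulation. The sketch below is not Monit code; `first_alert_cycle` is a made-up helper that treats "for 3 cycles" as 3 failures in 3 consecutive cycles and "3 times within 5 cycles" as 3 failures anywhere in the last 5 cycles.

```python
from collections import deque


def first_alert_cycle(results, times, within):
    """Return the 1-based cycle at which 'times' failures have been seen in the
    last 'within' cycles, or None. 'results' holds one bool per cycle (True = failed)."""
    window = deque(maxlen=within)
    for cycle, failed in enumerate(results, start=1):
        window.append(failed)
        if sum(window) >= times:
            return cycle
    return None


# F = the test failed in that cycle, . = the test passed
history = [c == "F" for c in "F.F.F"]

print(first_alert_cycle(history, 1, 1))  # [1] alert on any failure       -> 1
print(first_alert_cycle(history, 3, 3))  # [2] "for 3 cycles"             -> None
print(first_alert_cycle(history, 3, 5))  # [3] "3 times within 5 cycles"  -> 5
```

With an intermittent failure pattern like this one, form [2] never fires, while form [3] fires at the fifth cycle.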

PROTOCOL

# Web

  • APACHE-STATUS
  • FTP
  • HTTP HTTPS
  • WEBSOCKET

# Mail

  • SIEVE
  • SMTP SMTPS
  • IMAP IMAPS
  • POP POPS

# DB

  • MYSQL PGSQL MONGODB
  • RADIUS MEMCACHE

# Admin

  • SSH RSYNC FAIL2BAN

# Other

  • DNS

 


Protocol Test

 

Usage:

IF FAILED [host] <port> [type] [protocol | {send/expect}+] [timeout] [retry] THEN action

Example

if failed port 80 protocol http then ...
if failed port 25 protocol smtp then ...
if failed port 53 type udp protocol dns then alert

Action:

"ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR"

Examples:

if failed port 80 then alert

if failed port 53 type udp protocol dns then alert

ping

Usage:

IF FAILED PING[4|6]
   [COUNT number]
   [SIZE number]
   [TIMEOUT number SECONDS]
   [ADDRESS string]
THEN action

Parameter:

PING[4|6]

  If a DNS host name was used in the "CHECK HOST" statement and the host name resolves to several addresses (either IPv4 or IPv6), Monit pings the first available address and continues with the next address until one ping succeeds or until there are no more addresses left to try.

ADDRESS n.n.n.n | string

  The source address the ping is sent from (useful on a multihomed machine).

COUNT (Default: 3) <-- up to 66% packet loss is tolerated (one successful reply is enough for the test to pass)

  How many consecutive echo requests are sent to the host in one cycle.
  (They are sent one after another.)

  If you require 100% ping success, set the count to 1

TIMEOUT (Default: 5s)

  If no reply arrives within the timeout, Monit reports an error. (The timeout applies to every packet.)
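
The tolerated packet loss follows directly from COUNT if a single reply is enough for the test to pass, as noted above. A small Python sketch of that arithmetic (`tolerated_loss` is a made-up name):

```python
def tolerated_loss(count):
    """Fraction of echo requests that may be lost while the ping test still passes,
    assuming one reply is enough."""
    return (count - 1) / count


for count in (1, 3, 5):
    print(f"COUNT {count}: up to {tolerated_loss(count):.1%} loss tolerated")
```

With the default COUNT of 3 this gives 2/3 (the "66%" mentioned above), and with COUNT 1 it gives 0, which is why a count of 1 demands 100% ping success.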

Example:

# Send 5 echo requests to 10.3.3.2 per cycle, each with a 3-second timeout; alert if the test fails 2 times within 3 cycles

CHECK HOST hk_china_vpn ADDRESS 10.3.3.2
  IF FAILED PING
    COUNT 5 TIMEOUT 3 SECONDS
    2 TIMES WITHIN 3 CYCLES
  THEN alert

send & expect:

SEND/EXPECT can be used with any socket type, such as TCP sockets, UNIX sockets and UDP sockets.

The SEND statement sends a string to the server port, and the EXPECT statement compares a string read from the server (by default the first 255 bytes) with the expected string.

You can use non-printable characters in a SEND string if needed, using the hex notation \0xHEXHEX.

You can use regular expressions in the EXPECT string.

CHECK HOST localhost ADDRESS 127.0.0.1
        if failed port 3333 TYPE TCP and
        send '{"id": 1,"jsonrpc": "2.0","method": "miner_getstat1"}\r\n'
        expect ".*result.*"
        with timeout 5 seconds for 2 cycles                    # 1 min
        then exec /root/scripts/eth/restart-ethminer.sh
        repeat every 6 cycles                                  # 3 min

Example:

# smtp

if failed port 25
   expect "^220.*\r\n"
   send "HELO localhost.localdomain\r\n"
   expect "^250.*\r\n"
   send "QUIT\r\n"
   expect "^221.*\r\n" ...
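
What a send/expect sequence does can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not Monit's implementation: `run_send_expect` and `ScriptedPeer` are hypothetical names, each EXPECT reads at most 255 bytes with a single `recv`, and the replies below are invented sample SMTP responses.

```python
import re


def run_send_expect(conn, steps, bufsize=255):
    """Run ("send", bytes) / ("expect", pattern) steps against a socket-like object.

    Returns True when every EXPECT pattern matches what the peer sent back."""
    for kind, value in steps:
        if kind == "send":
            conn.sendall(value)
        elif kind == "expect":
            data = conn.recv(bufsize)
            if re.search(value, data) is None:
                return False
        else:
            raise ValueError(f"unknown step: {kind}")
    return True


class ScriptedPeer:
    """Stand-in for a connected socket: records what was sent, replays canned replies."""

    def __init__(self, replies):
        self.replies = list(replies)
        self.sent = []

    def sendall(self, data):
        self.sent.append(data)

    def recv(self, bufsize):
        return self.replies.pop(0)[:bufsize] if self.replies else b""


# The SMTP dialogue from the example above.
smtp_steps = [
    ("expect", rb"^220.*\r\n"),
    ("send", b"HELO localhost.localdomain\r\n"),
    ("expect", rb"^250.*\r\n"),
    ("send", b"QUIT\r\n"),
    ("expect", rb"^221.*\r\n"),
]
peer = ScriptedPeer([b"220 mail.example.org ESMTP\r\n", b"250 mail.example.org\r\n", b"221 Bye\r\n"])
print(run_send_expect(peer, smtp_steps))  # True
```

Against a real server, `conn` could be the object returned by `socket.create_connection(("127.0.0.1", 25), 5)`.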

 


Check Services(protocol)

 

Service

apache-status

Check server performance by examination of the status page

Apache Settings: http://datahunter.org/apache_server-info

  • "_" Waiting for Connection
  • "K" Keepalive (read)
  • "S" Starting up
  • "L" Logging

Usage

PROTOCOL APACHE-STATUS 
[PATH <path>] [USERNAME <string>] [PASSWORD <string>] [<property> <operator> <number>]+

PATH # Default: "/server-status"

property:

  • (1) logging (loglimit)                               # "L"
  • (2) closing connections (closelimit)                 # "C"
  • (3) performing DNS lookups (dnslimit)                # "D"
  • (4) in keepalive with a client (keepalivelimit)      # "K"
  • (5) replying to a client (replylimit)                # "W"
  • (6) receiving a request (requestlimit)               # "R"
  • (7) initialising (startlimit)                        # "S"
  • (8) waiting for incoming connections (waitlimit)     # "_"
  • (9) gracefully closing down (gracefullimit)          # "G"
  • (10) performing cleanup procedures (cleanuplimit)    # "I"

Operator is one of "<", "=", ">"

Example:

if failed port 80 protocol apache-status
    replylimit > 60% or
    requestlimit > 60% or
    waitlimit < 10%
then alert

# Alert if more than 60% of the Apache child processes are replying to clients,
# or more than 60% are receiving requests,
# or fewer than 10% are idle and waiting for connections.

check process httpd with pidfile /var/run/httpd/httpd.pid
        start program = "/usr/bin/systemctl start httpd"
        stop program  = "/usr/bin/systemctl stop httpd"
        # Resource
        if totalmem > 512 MB for 5 cycles then restart
        if cpu > 70% for 2 cycles then alert
        if cpu > 90% for 5 cycles then restart
        if loadavg(5min) greater than 10 for 8 cycles then stop
        # apache-status
        if failed host 127.0.0.1 port 80 protocol apache-status
                replylimit > 60% or
                keepalivelimit > 90% or
                waitlimit < 20%
                then alert

Setting waitlimit

/server-status

_______________________W_.........................______________
___________.....................................................
................................................................
..........................................................

# "_" = 49, "W" = 1, "." = 200

Because the "." slots are counted in the total, setting "waitlimit < 20%" can trigger an alert here (49 / 250 = 19.6%).

Apache settings:

MaxSpareThreads 75
MinSpareThreads 25
MaxRequestWorkers 250

So waitlimit should follow MinSpareThreads:

MinSpareThreads / MaxRequestWorkers × 100% = 25 / 250 × 100% = 10%

waitlimit < 10%
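
The numbers above can be checked with a short Python sketch. It rebuilds the scoreboard shown earlier and, following the observation that "." is counted in the total, computes each state as a share of all 250 slots. `slot_percentages` is a made-up helper, not something Monit or Apache provides.

```python
from collections import Counter

# The /server-status scoreboard from above, row by row.
scoreboard = (
    "_" * 23 + "W" + "_" + "." * 25 + "_" * 14  # row 1
    + "_" * 11 + "." * 53                       # row 2
    + "." * 64                                  # row 3
    + "." * 58                                  # row 4
)


def slot_percentages(board):
    """Percentage of each scoreboard character, with every slot (including ".") in the total."""
    counts = Counter(board)
    return {state: 100 * n / len(board) for state, n in counts.items()}


pct = slot_percentages(scoreboard)
print(len(scoreboard), round(pct["_"], 1))  # 250 19.6

# Threshold derived from the Apache settings above.
min_spare_threads, max_request_workers = 25, 250
print(100 * min_spare_threads / max_request_workers)  # 10.0
```

An idle server already sits at 19.6% waiting slots, just under a 20% limit, whereas the 10% limit is only crossed when fewer than MinSpareThreads workers are waiting.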

 

http

PROTO(COL) HTTP
    [USERNAME "string"]
    [PASSWORD "string"]
    [REQUEST "string"]
    [METHOD <GET|HEAD>]
    [STATUS operator number]
    [CHECKSUM checksum]
    [HTTP HEADERS list of headers]
    [CONTENT < "=" | "!=" > STRING]

Example:

if failed host 192.168.1.100 port 8080 protocol http
   and request '/testing' hostheader 'datahunter.org'
   with timeout 5 seconds for 3 cycles
   then alert

ssh

check process sshd with pidfile /var/run/sshd.pid
  start program "/etc/init.d/ssh start" with timeout 30 seconds
  stop program "/etc/init.d/ssh stop"
  if failed port 22 protocol ssh then restart
  if 5 restarts within 5 cycles then timeout
  group core

ftp

check process proftpd with pidfile /var/run/proftpd.pid
  start program = "/etc/init.d/proftpd start"
  stop program  = "/etc/init.d/proftpd stop"
  if failed port 21 protocol ftp then restart
  if 5 restarts within 5 cycles then timeout

smtp

check process postfix with pidfile /var/spool/postfix/pid/master.pid
  start program = "/etc/init.d/postfix start"
  stop  program = "/etc/init.d/postfix stop"
  if failed port 25 protocol smtp then restart
  if 5 restarts within 5 cycles then timeout

pop3

check process qpopper with pidfile /var/run/popper.pid
   group mail
   start program = "/etc/init.d/qpopper start"
   stop  program = "/etc/init.d/qpopper stop"
   if 5 restarts within 5 cycles then timeout
   if failed port 110 type TCP protocol POP then restart

imap

check process dovecot with pidfile /var/run/dovecot/master.pid
   start program = "/etc/init.d/dovecot start"
   stop program = "/etc/init.d/dovecot stop"
   group mail
   if failed host mail.yourdomain.tld port 993 type tcpssl sslauto protocol imap for 5 cycles then restart
   if 3 restarts within 5 cycles then timeout

mysql

Perform connection test with login

1) Create the user account

CREATE USER 'monit'@'127.0.0.1' IDENTIFIED BY 'mysecretpassword';
FLUSH PRIVILEGES;

Notes

  • User 'monit'@'localhost'      # used for connections over the UNIX socket
  • User 'monit'@'127.0.0.1'      # used for TCP connections to port 3306

2) mysqld.monit    # chmod 600 mysqld.monit

# RHEL 8
check process mysql with pidfile /run/mysqld/mysqld.pid
        start program = "/usr/bin/systemctl start mysqld"
        stop program = "/usr/bin/systemctl stop mysqld"
        if failed
                port 3306
                protocol mysql username "monit" password "XXXX"
        then alert
        if 2 restarts within 6 cycles then timeout

samba

check process smbd with pidfile /opt/samba2.2/var/locks/smbd.pid
   group samba
   start program = "/etc/init.d/smbd start"
   stop  program = "/etc/init.d/smbd stop"
   if failed host 192.168.1.1 port 139 type TCP then restart
   if 5 restarts within 5 cycles then timeout

# To have Monit start the server if it's not running, add a start statement:

check process nginx with pidfile /var/run/nginx.pid
       start program = "/etc/init.d/nginx start"
       stop program  = "/etc/init.d/nginx stop"

# test the checksum for a document on a remote server.

check host datahunter.org with address datahunter.org
    if failed
        port 80 protocol http and
        request "/monit/dist/monit-5.7.tar.gz"
        with checksum ?????????????????????????????????????
        then alert
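
The checksum value itself has to be computed from a known-good copy of the document. A minimal Python sketch for that, assuming the test accepts an MD5 or SHA1 hex digest (`file_checksum` is a made-up helper name):

```python
import hashlib


def file_checksum(path, algorithm="md5"):
    """Return the hex digest of a file, read in chunks so large files are fine."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage, after downloading a trusted copy of the file:
# print(file_checksum("monit-5.7.tar.gz", "md5"))
# print(file_checksum("monit-5.7.tar.gz", "sha1"))
```

The printed digest is what would go after "with checksum" in the rule above.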

 

 
