Last updated: 2018-12-07
Contents
- Install
- Check version
- Main Configure File
- Host Definition
- Service Definition
- Notification
- Simple Example
- Variables
- Check Host without Ping
- Disable Flap Detection
- History
Install
Server
# CentOS 6, from the EPEL repository
yum install nagios nagios-plugins-all
Check version
nagios -V
Nagios Core 3.2.3
Main Configure File
nagios.cfg
log_file=/var/log/nagios/nagios.log
cfg_file= .....
# Every config file in this directory ends in .cfg
cfg_dir=/etc/nagios/vps_server
# $USERx$ macro (usernames, passwords, etc), restrictive permissions (600)
resource_file=/etc/nagios/resource.cfg
# The user Nagios runs as
nagios_user=nagios
nagios_group=nagios
# OBJECT CACHE FILE
# This option determines where object definitions are cached when
# Nagios starts/restarts. The CGIs read object definitions from
# this cache file (rather than looking at the object config files
# directly) in order to prevent inconsistencies that can occur
# when the config files are modified after Nagios starts.
object_cache_file=/var/spool/nagios/objects.cache
# STATUS FILE
# This is where the current status of all monitored services and
# hosts is stored. Its contents are read and processed by the CGIs.
# The contents of the status file are deleted every time Nagios
# restarts.
status_file=/var/log/nagios/status.dat
cgi.cfg
main_config_file=/etc/nagios/nagios.cfg
physical_html_path=/usr/share/nagios/html
# Controls what each user listed in htpasswd is allowed to do
# Each value is a comma-delimited list of usernames
# SYSTEM/PROCESS INFORMATION ACCESS
authorized_for_system_information=nagiosadmin
# CONFIGURATION INFORMATION ACCESS
authorized_for_configuration_information=nagiosadmin
# GLOBAL HOST/SERVICE VIEW ACCESS
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
# SYSTEM/PROCESS COMMAND ACCESS
authorized_for_system_commands=nagiosadmin
# GLOBAL HOST/SERVICE COMMAND ACCESS
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
# READ-ONLY USERS
authorized_for_read_only=user1,user2
Host Definition
define host{
# Short name of the host; it is used in host group and service definitions to reference this particular host
host_name host_name
address address
# Who is notified when a service or server has a problem (multiple contacts should be separated by commas)
contacts contacts1, contacts2
contact_groups contact_groups1, contact_groups2
hostgroups hostgroup_names
# A longer name or description
alias alias
# Displayed in the web interface for this host (Default: host_name)
# This setting has been removed
display_name display_name
check_interval N
retry_interval N
check_period timeperiod_name
max_check_attempts #
check_command command_name
initial_state [o,d,u]
active_checks_enabled [0/1]
passive_checks_enabled [0/1]
obsess_over_host [0/1]
check_freshness [0/1]
freshness_threshold #
event_handler command_name
event_handler_enabled [0/1]
low_flap_threshold #
high_flap_threshold #
flap_detection_enabled [0/1]
flap_detection_options [o,d,u]
process_perf_data [0/1]
retain_status_information [0/1]
retain_nonstatus_information [0/1]
notification_interval #
first_notification_delay #
notification_period timeperiod_name
notification_options [d,u,r,f,s]
notifications_enabled [0/1]
stalking_options [o,d,u]
notes note_string
notes_url url
action_url url
icon_image image_file
icon_image_alt alt_string
vrml_image image_file
statusmap_image image_file
2d_coords x_coord,y_coord
3d_coords x_coord,y_coord,z_coord
}
# Other Setting
# hostgroups
This directive is used to identify the short name(s) of the hostgroup(s) that the host belongs to. Multiple hostgroups should be separated by commas. This directive may be used as an alternative to (or in addition to) using the members directive in hostgroup definitions.
============================
# icon_image:
Images look best at 40x40 pixels; image files are assumed to be in /usr/local/nagios/share/images/logos
Service Definition
A host can have multiple services
define service {
use service-smtp
host_name myserver
# Override who gets notified for this particular service
contact_groups notifybyemailgroup
}
# Service check scheduling
# check_interval: (minutes)
Time to wait before scheduling the next "regular" check of the service.
# retry_interval: (minutes)
Time to wait before re-checking the service once it has changed to a non-OK state.
Once the number of retries reaches "max_check_attempts", the interval goes back to "check_interval".
# max_check_attempts:
Number of times Nagios will retry the check command if it returns any state other than OK.
Setting this to 1 makes Nagios generate an alert without retrying the service check.
# check_period:
Time period during which active checks of this service can be made.
e.g.
check_interval 5
retry_interval 1
max_check_attempts 5
check_period 24x7
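How these settings combine can be sketched with a little arithmetic. The Python sketch below assumes a simplified model: the first failed check counts as attempt 1, and each further attempt runs retry_interval minutes after the previous one until max_check_attempts is reached. The function name is hypothetical and not part of Nagios.

```python
def minutes_until_last_retry(retry_interval, max_check_attempts):
    """Minutes between the first failed check and the final retry.

    Simplified model (an assumption, not Nagios code): the first failure
    is attempt 1, and every later attempt happens retry_interval minutes
    after the one before it.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    return (max_check_attempts - 1) * retry_interval


# Values from the example: retry_interval 1, max_check_attempts 5
print(minutes_until_last_retry(1, 5))
```

With the example values, the alert comes about 4 minutes after the first failed check. Because a failure can happen right after a regular check, up to check_interval (5) more minutes may pass before that first failed check runs.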
Notification
notifications_enabled 0/1
Note: If you have state retention enabled, Nagios will ignore this setting when it (re)starts and
use the last known setting for this option (as stored in the state retention file),
unless you disable the use_retained_program_state option.
notification_interval minutes
Resend notifications every N minutes
first_notification_delay N
# "time units" to wait before sending out the first problem notification
# when this service enters a non-OK state.
notification_period timeperiod_name
- 24x7 ; Send host notifications at any time
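notification_options [d,u,r,f,s] for hosts, [w,u,c,r,f,s] for services
- w = send notifications on a WARNING state (service),
- u = send notifications on an UNREACHABLE state (host) or an UNKNOWN state (service),
- c = send notifications on a CRITICAL state (service),
- d = send notifications on a DOWN state (host),
- r = send notifications on recoveries (OK state),
- f = send notifications when the host or service starts and stops flapping,
- s = send notifications when scheduled downtime starts and ends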
Example
# When a service goes down, the alarm is only sent after 15 minutes
first_notification_delay 15
Log
# Every notification that is sent gets logged
grep NOTIFICATION /var/log/nagios/nagios.log
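For anything beyond a quick grep, the matching lines can be split into fields. This is a minimal Python sketch. It assumes each notification line looks like `[epoch] SERVICE NOTIFICATION: field;field;...` (or `HOST NOTIFICATION`). The sample line in the note below is made up for illustration, and the function name is hypothetical.

```python
import re

# Assumed layout: "[epoch] HOST|SERVICE NOTIFICATION: semicolon-separated fields"
NOTIFICATION_RE = re.compile(r"^\[(\d+)\] (HOST|SERVICE) NOTIFICATION: (.*)$")


def parse_notification(line):
    """Return a dict for a notification log line, or None for other lines."""
    match = NOTIFICATION_RE.match(line.strip())
    if match is None:
        return None
    timestamp, kind, rest = match.groups()
    # The fields are kept as a raw list; their meaning depends on the kind.
    return {"time": int(timestamp), "kind": kind, "fields": rest.split(";")}
```

For a made-up line such as `[1544140800] SERVICE NOTIFICATION: nagiosadmin;myserver;HTTP;CRITICAL;notify-service-by-email;Connection refused`, the second field would be the host name. Plugin output that itself contains ";" would be split further, so treat the trailing fields with care.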
Simple Example
# host
define host{
use linux-server
host_name ns1.mydomain.net ; Used by the web panel and by other config definitions
alias mydomain ; Shown in e-mail notifications
address x.x.x.x
#hostgroups hostgroup_name ; If not defined here, the host can be listed in the hostgroup's members instead
}
# hostgroup
define hostgroup{
hostgroup_name hostgroup_name
alias alias
members host1, host2, host3 ...
hostgroup_members hostgroups
notes note_string
notes_url url
action_url url
}
# contact
define contact{
contact_name contact_name
email email_address
host_notifications_enabled [0/1]
service_notifications_enabled [0/1]
alias alias
contactgroups contactgroup_names
use generic-contact ; Inherit default values from the generic-contact template
host_notification_period timeperiod_name
service_notification_period timeperiod_name
host_notification_options [d,u,r,f,s,n]
service_notification_options [w,u,c,r,f,s,n]
host_notification_commands command_name
service_notification_commands command_name
pager pager_number or pager_email_gateway
addressx additional_contact_address
can_submit_commands [0/1]
retain_status_information [0/1]
retain_nonstatus_information [0/1]
}
addressx
Additional addresses for the contact. They can be anything (cell phone numbers, instant messaging addresses, etc.),
depending on how the notification commands use them to send out an alert to the contact.
Up to six addresses can be defined using these directives (address1 through address6).
The $CONTACTADDRESSx$ macro will contain this value.
can_submit_commands
Whether the contact can submit external commands to Nagios from the CGIs.
External applications can submit commands by writing to the command_file,
which is processed periodically (every command_check_interval) by the Nagios daemon.
External commands format
[time] command_id;command_arguments
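The format above can be produced with a few lines of Python. This is a minimal sketch, not an official client: the helper names are hypothetical, and the location of the command file is whatever command_file is set to in nagios.cfg. PROCESS_HOST_CHECK_RESULT (host name, status code, plugin output) is used as the example command.

```python
import time


def format_external_command(command_id, *arguments, timestamp=None):
    """Build one "[time] command_id;command_arguments" line."""
    if timestamp is None:
        timestamp = int(time.time())  # seconds since the epoch
    fields = [command_id] + [str(argument) for argument in arguments]
    return "[{}] {}\n".format(timestamp, ";".join(fields))


def submit_external_command(command_file, line):
    """Write one formatted line to the Nagios command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


# Mark "myserver" as UP (status 0) with a short output text.
line = format_external_command(
    "PROCESS_HOST_CHECK_RESULT", "myserver", 0, "OK", timestamp=1544140800
)
print(line, end="")
```

Calling `submit_external_command(path, line)` with the path from your own nagios.cfg would hand the line to the daemon. The writing user needs permission on that pipe.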
retain_X_information
retain_status_information / retain_nonstatus_information: whether status-related / non-status information about the contact is retained across program restarts.
This is only useful if you have enabled state retention using the retain_state_information directive (nagios.cfg).
state_retention_file is the file that Nagios will use for storing status, downtime, and comment information before it shuts down.
# Default: 1
retain_state_information=1
state_retention_file=/var/log/nagios/retention.dat
# Unit: minutes
retention_update_interval=60
# contactgroup
define contactgroup{
contactgroup_name novell-admins ; required
members jdoe,rtobert,tzach ; required
alias Novell Administrators
}
# DOC
http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html
Variables
- $ARG1$ <-- arguments passed to the command, separated by "!" ($ARG1$ through $ARG32$)
- $HOSTNAME$ <-- host_name directive
- $HOSTALIAS$ <-- alias directive
- $HOSTADDRESS$ <-- address directive
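The way these macros are filled in can be illustrated with a small Python sketch. It is a simplified imitation, not the Nagios implementation: the function name, the example command_line, the plugin path and the address are all assumptions made for illustration.

```python
def expand_command(check_command, command_line, host_macros):
    """Split "name!arg1!arg2" and substitute macros into command_line."""
    name, *arguments = check_command.split("!")
    macros = dict(host_macros)
    for index, argument in enumerate(arguments, start=1):
        macros["$ARG{}$".format(index)] = argument
    expanded = command_line
    for macro, value in macros.items():
        expanded = expanded.replace(macro, value)
    return name, expanded


name, expanded = expand_command(
    "check_tcp!80",                                    # check_command in the object
    "$USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$",    # hypothetical command_line
    {"$USER1$": "/usr/lib64/nagios/plugins", "$HOSTADDRESS$": "192.0.2.10"},
)
print(name)
print(expanded)
```

Macros that are not supplied are left untouched in this sketch.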
Check Host without Ping
Method 1
http://xxxx/nagios/cgi-bin/extinfo.cgi?type=1&host=myserver
"Submit passive check result for this host"
Enter "check_tcp -p 80" in the "Check Output" field
Note: this only updates the status once
Method 2
Edit templates.cfg
define host {
name server-medium-noping
use generic-host
check_period 24x7
check_interval 6
retry_interval 1
max_check_attempts 6
check_command check-host-alive-noping
notification_period period-medium
notification_interval 30
notification_options d,u,r
contact_groups server-admin-group
register 0
}
commands.cfg
# Dummy SUCCESS command, used for hosts we have no means of checking
define command{
command_name check-host-alive-noping
command_line $USER1$/check_dummy 0
}
Usage:
# check_dummy <integer state> [optional text]
# 0 => OK
# 1 => WARNING
# 2 => CRITICAL
# 3 => UNKNOWN
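The same exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) applies to any plugin, so a stand-in is easy to write. The Python sketch below is in the spirit of check_dummy but is not a reimplementation of it; the function names and the exact output text are assumptions.

```python
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def dummy_check(state, text=""):
    """Return (exit_code, message) for a fixed state and optional text."""
    if state not in STATES:
        # Anything outside 0..3 is reported as UNKNOWN in this sketch.
        return 3, "UNKNOWN: invalid state {}".format(state)
    message = STATES[state]
    if text:
        message += ": " + text
    return state, message


def main(argv):
    """Entry point for script use, e.g. sys.exit(main(sys.argv))."""
    state = int(argv[1]) if len(argv) > 1 else 3
    code, message = dummy_check(state, " ".join(argv[2:]))
    print(message)
    return code
```

Nagios reads only the exit code and the first line of output, so returning `code` from `main` and passing it to `sys.exit` is enough for a plugin of this kind.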
Then define the server using the new "server-medium-noping" template
define host {
use server-medium-noping
host_name myserver
alias myserver
address x.x.x.x
}
Disable Flap Detection
Flapping occurs when a service or host changes state too frequently,
resulting in a storm of problem and recovery notifications.
# Disable globally (nagios.cfg)
enable_flap_detection=0
# Disable for a single host or service (object definition)
flap_detection_enabled 0
Nagios decides whether a service is flapping by looking at the results of its last 21 service checks.
With 21 results kept in the array, there can be at most 20 state changes.
The amount of state change over that history is calculated as a percentage to determine whether or not the service is flapping.
More recent state changes are given more weight than older state changes
when calculating the total percent state change for a particular service.
# If a host was previously not flapping and its total computed state change percentage is
# equal to or greater than "high_host_flap_threshold"
# Nagios considers the host to have just started flapping
high_host_flap_threshold=50
# If the host was previously flapping and
# its total computed state change percentage is less than or equal to "low_host_flap_threshold"
# Nagios considers the host to have just stopped flapping
low_host_flap_threshold=25
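The weighted calculation and the two thresholds can be sketched in Python. This is an illustration only: the linear weights from 0.75 (oldest change) to 1.25 (newest change) are an assumption not stated in these notes, and the function names are hypothetical.

```python
def percent_state_change(history, low_weight=0.75, high_weight=1.25):
    """Weighted percent state change for a list of states, oldest first.

    Each position where the state differs from the previous one counts as
    a change; later positions get a larger weight (assumed linear ramp).
    """
    slots = len(history) - 1          # 21 results give 20 possible changes
    if slots < 1:
        return 0.0
    weighted_changes = 0.0
    for i in range(1, len(history)):
        if history[i] != history[i - 1]:
            if slots == 1:
                weight = high_weight
            else:
                weight = low_weight + (high_weight - low_weight) * (i - 1) / (slots - 1)
            weighted_changes += weight
    return weighted_changes * 100.0 / slots


def update_flapping(is_flapping, percent, low=25.0, high=50.0):
    """Apply the high/low thresholds described above."""
    if not is_flapping and percent >= high:
        return True                   # just started flapping
    if is_flapping and percent <= low:
        return False                  # just stopped flapping
    return is_flapping                # in between: keep the previous state


steady = ["OK"] * 21
print(percent_state_change(steady))
```

The gap between the two thresholds keeps a host that hovers around a single value from toggling in and out of the flapping state.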
History
Past server Warning and Critical records
Configured in:
log_archive_path=/usr/nagios/var/archives/
# How often the log is rotated: n = none (default), h = hourly, d = daily, w = weekly, m = monthly
log_rotation_method=<n/h/d/w/m>
Doc
http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html