nagios - configure

Last updated: 2018-12-07

Contents

  • Install
  • Check version
  • Main Configure File
  • Host Definition
  • Service Definition
  • Notification
  • Simple Example
  • Variable
  • Check Host without Ping
  • Disable Flap Detection
  • History

 


Install

 

Server

# CentOS 6 (EPEL repository)

yum install nagios nagios-plugins-all

 


Check version

 

nagios -V

Nagios Core 3.2.3

 


Main Configure File

 

nagios.cfg

log_file=/var/log/nagios/nagios.log

cfg_file= .....

# Every file in this directory with a .cfg extension is loaded
cfg_dir=/etc/nagios/vps_server

# $USERx$ macro (usernames, passwords, etc), restrictive permissions (600)
resource_file=/etc/nagios/resource.cfg

# User and group that Nagios runs as

nagios_user=nagios
nagios_group=nagios


# OBJECT CACHE FILE
# This option determines where object definitions are cached when
# Nagios starts/restarts.  The CGIs read object definitions from
# this cache file (rather than looking at the object config files
# directly) in order to prevent inconsistencies that can occur
# when the config files are modified after Nagios starts.

object_cache_file=/var/spool/nagios/objects.cache

# STATUS FILE
# This is where the current status of all monitored services and
# hosts is stored.  Its contents are read and processed by the CGIs.
# The contents of the status file are deleted every time Nagios
#  restarts.

status_file=/var/log/nagios/status.dat
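
The resource_file named above is where the $USERx$ macros are defined. A minimal sketch of what /etc/nagios/resource.cfg might contain follows. The plugin path is an assumption (adjust it to wherever your packages install the plugins), and the $USER2$ entry is a made-up placeholder.

```cfg
# /etc/nagios/resource.cfg (keep it mode 600, it may hold secrets)

# $USER1$: plugin directory (assumed path, adjust to your install)
$USER1$=/usr/lib64/nagios/plugins

# $USER2$: hypothetical secret referenced by a check command
$USER2$=changeme
```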

cgi.cfg

main_config_file=/etc/nagios/nagios.cfg

physical_html_path=/usr/share/nagios/html

# Controls what each user listed in htpasswd is allowed to do
# comma-delimited list of all usernames

# SYSTEM/PROCESS INFORMATION ACCESS
authorized_for_system_information=nagiosadmin

# CONFIGURATION INFORMATION ACCESS
authorized_for_configuration_information=nagiosadmin

# GLOBAL HOST/SERVICE VIEW ACCESS
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin

# SYSTEM/PROCESS COMMAND ACCESS
authorized_for_system_commands=nagiosadmin

# GLOBAL HOST/SERVICE COMMAND ACCESS
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin

# READ-ONLY USERS
authorized_for_read_only=user1,user2
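
Since each of these options takes a comma-delimited list, access can be mixed per user. A small sketch, assuming a second htpasswd user called user1 who should see everything but change nothing:

```cfg
# user1 may view every host and service, but being listed as read-only
# is not offered host or service commands in the web interface
authorized_for_all_services=nagiosadmin,user1
authorized_for_all_hosts=nagiosadmin,user1
authorized_for_read_only=user1
```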

 


Host Definition

 

define host{
    host_name           host_name
    address             address


    # Who is notified when the host or one of its services has a problem (separate multiple contacts with commas)
    contacts            contacts1, contacts2
    contact_groups      contact_groups1, contact_groups2


    hostgroups          hostgroup_names

    alias               alias
    display_name        display_name

    check_interval      N
    retry_interval      N
    check_period        timeperiod_name
    max_check_attempts  #


    check_command    command_name

    initial_state                 [o,d,u]

    active_checks_enabled         [0/1]
    passive_checks_enabled        [0/1]
    obsess_over_host              [0/1]
    check_freshness               [0/1]
    freshness_threshold           #
    event_handler                 command_name
    event_handler_enabled         [0/1]
    low_flap_threshold            #
    high_flap_threshold           #
    flap_detection_enabled        [0/1]
    flap_detection_options        [o,d,u]
    process_perf_data             [0/1]
    retain_status_information     [0/1]
    retain_nonstatus_information  [0/1]

    notification_interval       #
    first_notification_delay    #
    notification_period         timeperiod_name
    notification_options        [d,u,r,f,s]
    notifications_enabled       [0/1]

    stalking_options            [o,d,u]
    notes                       note_string
    notes_url                   url
    action_url                  url
    icon_image                  image_file
    icon_image_alt              alt_string
    vrml_image                  image_file
    statusmap_image             image_file
    2d_coords                   x_coord,y_coord
    3d_coords                   x_coord,y_coord,z_coord
}

# Other Setting

# hostgroups

This directive is used to identify the short name(s) of the hostgroup(s) that the host belongs to. Multiple hostgroups should be separated by commas. This directive may be used as an alternative to (or in addition to) using the members directive in hostgroup definitions.

============================ Name

# host_name

Defines a short name used to identify the host.

It is used in host group and service definitions to reference this particular host.

# alias

A longer name or description

# display_name:

This directive is used to define an alternate name that should be displayed in the web interface for this host.

(Default: host_name)

============================

# icon_image:

Ideally 40x40 pixels. Image files are looked up in the logos directory (e.g. /usr/local/nagios/share/images/logos).

 


Service Definition

 

A host can have multiple services.

define service  {
        use                             service-smtp
        host_name                       myserver

        # Notify a different contact group for this particular service
        contact_groups                  notifybyemailgroup
}
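
The "service-smtp" template used above is not shown in these notes. One way it could be defined is sketched below. It assumes the generic-service template and the check_smtp command from the sample configuration shipped with Nagios are available.

```cfg
define service{
    name                 service-smtp       ; template name referenced by "use"
    use                  generic-service    ; assumed to exist (sample templates.cfg)
    service_description  SMTP
    check_command        check_smtp         ; assumed to exist (sample commands.cfg)
    register             0                  ; template only, not a real service
}
```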

 

# Service check period

# check_interval: (minutes)

The interval between "regular" checks of the service, i.e. checks made while it is in an OK state, or in a non-OK state that has already been re-checked "max_check_attempts" times.

# retry_interval: (minutes)

The interval between re-checks after the service has changed to a non-OK state.

Once the number of retries reaches "max_check_attempts", the interval reverts to "check_interval".

# max_check_attempts:

The number of times Nagios will retry the service check command if it returns any state other than OK.

Setting this to 1 makes Nagios generate an alert without retrying the service check.

# check_period:

The time period during which active checks of this service can be made.

e.g.

check_interval            5
retry_interval            1
max_check_attempts        5
check_period              24x7
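
To make the interplay of these values concrete, here is the same fragment annotated with a rough timeline. The timeline assumes the first failure is seen at minute 0 and that every re-check fails as well.

```cfg
check_interval            5     ; normal state: checked every 5 minutes
retry_interval            1     ; after a failure: re-checked every minute
max_check_attempts        5     ; 1 failed check + 4 retries before the HARD state
check_period              24x7

# minute 0   attempt 1/5  SOFT
# minute 1   attempt 2/5  SOFT
# minute 2   attempt 3/5  SOFT
# minute 3   attempt 4/5  SOFT
# minute 4   attempt 5/5  HARD -> notification, next check in 5 minutes
```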

 


Notification

 

# notifications_enabled 0/1

# notification_interval minutes

Resend notifications every N minutes

# first_notification_delay N

# "time units" to wait before sending out the first problem notification
# when this service enters a non-OK state.

# notification_period  timeperiod_name

  • 24x7            ; Send host notifications at any time
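
notification_period refers to a timeperiod object by name. The sample configuration shipped with Nagios normally defines 24x7 already. If yours does not, a definition along these lines should work:

```cfg
define timeperiod{
    timeperiod_name  24x7
    alias            24 Hours A Day, 7 Days A Week
    sunday           00:00-24:00
    monday           00:00-24:00
    tuesday          00:00-24:00
    wednesday        00:00-24:00
    thursday         00:00-24:00
    friday           00:00-24:00
    saturday         00:00-24:00
}
```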

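# notification_options [d,u,r,f,s] (host) / [w,u,c,r,f,s] (service)

  • w = send notifications on a WARNING state (services only),
  • u = send notifications on an UNREACHABLE state (hosts) or an UNKNOWN state (services),
  • c = send notifications on a CRITICAL state (services only),
  • d = send notifications on a DOWN state (hosts only),
  • r = send notifications on recoveries (OK/UP state),
  • f = send notifications when the host or service starts and stops flapping,
  • s = send notifications when scheduled downtime starts and ends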
Example

# When a service goes down, the first alert is sent only after 15 minutes

first_notification_delay 15
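
Putting the notification directives above together, a service definition might look like this sketch. The host name, the generic-service template and the check_smtp command are assumptions taken from the sample configuration and from the other examples in these notes.

```cfg
define service{
    use                       generic-service   ; assumed template
    host_name                 myserver
    service_description       SMTP
    check_command             check_smtp        ; assumed command
    notifications_enabled     1
    first_notification_delay  15                ; wait 15 minutes before the first alert
    notification_interval     60                ; then resend every 60 minutes while the problem lasts
    notification_period       24x7
    notification_options      w,u,c,r           ; WARNING, UNKNOWN, CRITICAL, recovery
}
```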

Every notification that is sent is also logged:

grep NOTIFICATION /var/log/nagios/nagios.log

 


Simple Example

 

# host

define host{
    use                linux-server
    host_name          ns1.mydomain.net       ; used in the web panel and in the configuration
    alias              mydomain               ; shown in e-mail notifications
    address            x.x.x.x
    #hostgroups         hostgroup_name        # if not defined here, it can be set through members in the hostgroup
}

# hostgroup

define hostgroup{
    hostgroup_name     hostgroup_name
    alias              alias
    members            host1, host2, host3 ...
    hostgroup_members  hostgroups
    notes              note_string
    notes_url          url
    action_url         url
}
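
A hostgroup is mostly useful because a service can be attached to the whole group at once. A minimal sketch, in which the group name "web-servers" and the hosts host1 and host2 are hypothetical and check_http is assumed to come from the sample commands.cfg:

```cfg
define hostgroup{
    hostgroup_name  web-servers            ; hypothetical group name
    alias           Web Servers
    members         host1,host2            ; both hosts must be defined elsewhere
}

define service{
    use                  generic-service   ; assumed template
    hostgroup_name       web-servers       ; applies to every member of the group
    service_description  HTTP
    check_command        check_http        ; assumed command
}
```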

 

# contact

define contact{
    contact_name                   contact_name
    email                          email_address

    host_notifications_enabled     [0/1]
    service_notifications_enabled  [0/1]

    alias                          alias

    contactgroups                  contactgroup_names

    use                            generic-contact         ; Inherit default values (defined above)

    host_notification_period       timeperiod_name
    service_notification_period    timeperiod_name
    host_notification_options      [d,u,r,f,s,n]
    service_notification_options   [w,u,c,r,f,s,n]

    host_notification_commands     command_name
    service_notification_commands  command_name

    pager                          pager_number or pager_email_gateway
    addressx                       additional_contact_address
    can_submit_commands            [0/1]
    retain_status_information      [0/1]
    retain_nonstatus_information   [0/1]
}

addressx

These addresses can be anything - cell phone numbers, instant messaging addresses, etc.
used to send out an alert to the contact.
Up to six addresses can be defined using these directives (address1 through address6).
The $CONTACTADDRESSx$ macro will contain this value.
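
As an illustration of how address1 and $CONTACTADDRESS1$ fit together, the sketch below sends service alerts through a made-up script. /usr/local/bin/send_sms, the command name and the phone number are all hypothetical, not something Nagios ships.

```cfg
define contact{
    use                            generic-contact        ; assumed template
    contact_name                   jdoe
    alias                          John Doe
    email                          jdoe@example.com
    address1                       +15550100              ; hypothetical phone number
    service_notification_commands  notify-service-by-sms
}

define command{
    command_name  notify-service-by-sms
    command_line  /usr/local/bin/send_sms "$CONTACTADDRESS1$" "$NOTIFICATIONTYPE$: $HOSTNAME$/$SERVICEDESC$ is $SERVICESTATE$"
}
```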

can_submit_commands

Determines whether the contact can submit external commands to Nagios from the CGIs.

External applications can submit commands by writing to the command_file, which the Nagios daemon processes periodically (every command_check_interval).

External commands format

[time] command_id;command_arguments
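
A sketch of the nagios.cfg settings involved, followed by one example command line. The command_file path is an assumption (check your own nagios.cfg), and the timestamp is just an arbitrary Unix time.

```cfg
# nagios.cfg: settings that enable external commands
check_external_commands=1
# -1 = check for new commands as often as possible
command_check_interval=-1
# assumed path
command_file=/var/spool/nagios/cmd/nagios.cmd

# Example line written to command_file (not part of nagios.cfg).
# It submits a passive result marking host "myserver" as UP (status 0):
# [1544140800] PROCESS_HOST_CHECK_RESULT;myserver;0;OK - checked manually
```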

retain_X_information

Determines whether status-related (retain_status_information) or non-status (retain_nonstatus_information) information about the contact is retained across program restarts.

This is only useful if you have enabled state retention using the retain_state_information directive in nagios.cfg.

# Default: 1
retain_state_information=1

# The file Nagios uses to store status, downtime, and comment information before it shuts down
state_retention_file=/var/log/nagios/retention.dat

# Unit: minutes
retention_update_interval=60

# contactgroup

define contactgroup{
    contactgroup_name  novell-admins            ; required
    members            jdoe,rtobert,tzach       ; required
    alias              Novell Administrators
}

 

# DOC

http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html

 


Variable

 

  • $ARG1$          <-- arguments separated by "!" in check_command ($ARG1$ through $ARG32$)
  • $HOSTNAME$      <-- host_name directive
  • $HOSTALIAS$     <-- alias directive
  • $HOSTADDRESS$   <-- address directive
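
How the macros are filled in is easiest to see with a command and a service side by side. In this sketch the command name check_tcp_port is hypothetical and generic-service is assumed to exist. check_tcp itself comes with the standard plugins.

```cfg
define command{
    command_name  check_tcp_port                 ; hypothetical name
    command_line  $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ -w $ARG2$ -c $ARG3$
}

define service{
    use                  generic-service         ; assumed template
    host_name            myserver
    service_description  TCP 80
    check_command        check_tcp_port!80!1!3   ; $ARG1$=80, $ARG2$=1, $ARG3$=3
}
```

Here $HOSTADDRESS$ is taken from the address directive of "myserver", and -w/-c are the warning and critical response times in seconds.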

 


Check Host without Ping

 

Method 1

http://xxxx/nagios/cgi-bin/extinfo.cgi?type=1&host=myserver

"Submit passive check result for this host"

In "Check Output", enter "check_tcp -p 80"

This only updates the status once.

Method 2

Edit templates.cfg

define host     {
    name                            server-medium-noping
    use                             generic-host
    check_period                    24x7
    check_interval                  6
    retry_interval                  1
    max_check_attempts              6
    check_command                   check-host-alive-noping
    notification_period             period-medium
    notification_interval           30
    notification_options            d,u,r
    contact_groups                  server-admin-group
    register                        0
}

commands.cfg

# Dummy SUCCESS command, used for hosts we have no means of checking
define command{
    command_name check-host-alive-noping
    command_line $USER1$/check_dummy 0
}

Usage:

# check_dummy <integer state> [optional text]
# 0 => OK
# 1 => WARNING
# 2 => CRITICAL
# 3 => UNKNOWN

Then define the server using the new "server-medium-noping" template:

define host     {
        use                             server-medium-noping
        host_name                       myserver
        alias                           myserver
        address                         x.x.x.x
}
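
A dummy check always reports UP, so a dead host goes unnoticed. As an alternative to the dummy command (not the method used above), the host check could probe a TCP port instead. The command name check-host-alive-tcp is hypothetical, and port 80 is only an example.

```cfg
define command{
    command_name  check-host-alive-tcp           ; hypothetical name
    command_line  $USER1$/check_tcp -H $HOSTADDRESS$ -p 80
}

define host{
    use            server-medium-noping
    host_name      myserver
    alias          myserver
    address        x.x.x.x
    check_command  check-host-alive-tcp          ; overrides the template's check_command
}
```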

 


Disable Flap Detection

 

Flapping occurs when a service or host changes state too frequently, resulting in a storm of problem and recovery notifications.

To disable flap detection globally, set this in nagios.cfg:

enable_flap_detection=0

To decide whether a service is flapping, Nagios looks at the results of its last 21 checks, which allows for at most 20 state changes.

It calculates the percentage of state change over that history. More recent state changes are given more weight than older ones when calculating the total percent state change for the service.

# If a host was previously not flapping and its total computed state change percentage is
# equal to or greater than "high_host_flap_threshold"
# Nagios considers the host to have just started flapping

high_host_flap_threshold=50

# If the host was previously flapping and
# its total computed state change percentage is less than or equal to "low_host_flap_threshold"
# Nagios considers the host to have just stopped flapping

low_host_flap_threshold=25
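
A worked example may help, followed by a sketch of switching flap detection off for a single host instead of globally. The arithmetic ignores the weighting of recent changes, so the figure Nagios actually computes will differ somewhat. The linux-server template is assumed from the sample configuration.

```cfg
# Unweighted example: 7 state changes seen in the last 21 checks,
# out of 20 possible changes
#   7 / 20 * 100 = 35 (percent state change)
# With low_host_flap_threshold=25 and high_host_flap_threshold=50,
# 35 lies between the two thresholds, so the flapping status does not change.

define host{
    use                     linux-server   ; assumed template
    host_name               myserver
    address                 x.x.x.x
    flap_detection_enabled  0              ; turn flap detection off for this host only
}
```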

 


History

 

Records of past server WARNING and CRITICAL events.

Configured with:

log_archive_path=/usr/nagios/var/archives/

# How often the log is rotated: n = none (default), h = hourly, d = daily, w = weekly, m = monthly
log_rotation_method=<n/h/d/w/m>
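
For instance, a daily rotation could be set up as in the sketch below. The archive path is an assumption, so use whatever directory your installation provides.

```cfg
# Rotate the main log once a day and move old logs to the archive directory
log_rotation_method=d
# assumed path
log_archive_path=/var/log/nagios/archives
```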

 


Doc

 

http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html