SmokePing - Network monitoring

Last updated: 2015-08-20

Introduction

Homepage: http://oss.oetiker.ch/smokeping/

Program: Perl

* Measures latency using fping
* Collected measurements are stored in "/var/lib/smokeping"

 

Install on CentOS 6

 

# yum packages

yum install rrdtool-perl perl-FCGI perl-CGI perl-libwww-perl perl-ExtUtils-MakeMaker setools \
  fping curl perl perl-Net-Telnet perl-Net-DNS perl-LDAP perl-IO-Socket-SSL \
  perl-Socket6 perl-CGI-SpeedyCGI gcc httpd zip unzip

yum install httpd mod_fcgid

# Compile

cd /usr/src

wget http://oss.oetiker.ch/smokeping/pub/smokeping-2.6.11.tar.gz

tar -xzvf smokeping-2.6.11.tar.gz

cd smokeping-2.6.11

# Compile modules (this takes a long time)

# 15 distributions installed

./setup/build-perl-modules.sh

mkdir /opt/smokeping

# thirdparty is about 14 MB

cp -a thirdparty /opt/smokeping/

./configure --prefix=/opt/smokeping

gmake install

# Create the folders referenced in the config file

cd /opt/smokeping

mkdir data var cache

chown apache: /opt/smokeping/cache

ln -s /opt/smokeping/cache /opt/smokeping/htdocs/cache

# Checking

/opt/smokeping/bin/smokeping --version

2.006011
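
The version string printed above uses Perl's numeric version format, where each component after the major number takes three digits. As a minimal sketch (the function name decode_version is only illustrative, it is not part of SmokePing), it can be converted back to the dotted form used in the tarball name:

```shell
# decode_version: hypothetical helper, not part of SmokePing.
# Converts a Perl-style numeric version (e.g. 2.006011) to dotted form.
decode_version() {
    awk -v v="$1" 'BEGIN {
        split(v, part, ".")
        minor = substr(part[2], 1, 3) + 0   # "006" -> 6
        patch = substr(part[2], 4, 3) + 0   # "011" -> 11
        printf "%d.%d.%d\n", part[1], minor, patch
    }'
}
```

Calling "decode_version 2.006011" prints 2.6.11, which matches the smokeping-2.6.11 tarball downloaded earlier.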

 


Start / Stop

 

Starting the Smokeping Daemon

./bin/smokeping --config=/opt/smokeping/etc/config --debug

./bin/smokeping --config=/opt/smokeping/etc/config --logfile=smoke.log

Init script:

wget http://oss.oetiker.ch/smokeping/pub/contrib/smokeping-start-script

mv smokeping-start-script /etc/init.d/smokeping

chmod 755 /etc/init.d/smokeping

# Settings to check in /etc/init.d/smokeping
# chkconfig: - 84 15
# the path to your PID file
PIDFILE=/opt/smokeping/var/smokeping.pid
# path to smokeping script
SMOKEPING=/opt/smokeping/bin/smokeping

service smokeping restart

chkconfig smokeping on

Checking

ps aux | grep smokeping

root     26789  0.0  2.8  26456 14400 ?        Ss   17:32   0:00 /opt/smokeping/bin/smokeping [FPing]

Log

Aug 19 17:32:40 iZ947akbat7Z smokeping[26789]: FPing: probing 2 targets with step 60 s and offset 46 s.

 


HTTP

 

cp /opt/smokeping/htdocs/smokeping.fcgi.dist /opt/smokeping/htdocs/smokeping.fcgi

vi /etc/httpd/conf.d/smokeping.conf

Alias /smokeping /opt/smokeping/htdocs
<Directory "/opt/smokeping/htdocs">
    Options FollowSymLinks +ExecCGI
    # Apache 2.2 (CentOS 6): deny everyone except w.x.y.z
    Order deny,allow
    Deny from all
    Allow from w.x.y.z
</Directory>

/etc/init.d/httpd restart

chkconfig httpd on

 


Configure

 

Create the config files

cd /opt/smokeping/etc/

for foo in *.dist; do cp "$foo" "$(basename "$foo" .dist)"; done

mkdir backup

mv *.dist backup

Edit /opt/smokeping/etc/config

'#' starts a comment that runs to the end of the line
'\' at the end of a line continues it on the next line

# Settings that must be changed
contact=x@y
cgiurl = http://112.74.89.205/smokeping/smokeping.fcgi
smokemail = /opt/smokeping/etc/smokemail
tmail = /opt/smokeping/etc/tmail
template = /opt/smokeping/etc/basepage.html
syslogfacility = local0

Default database storage settings

*** Database ***

step     = 60
pings    = 30

Probe settings

*** Probes ***

+ FPing

binary = /usr/bin/fping
offset = 50%
step = 60
timeout = 2
pings = 6

# pings: how many pings should be sent in each time interval.
# step: duration of time interval (in seconds) for probing. (Default: 5 minutes)
# timeout: timeout value to be used in a given probing tool.

# Send an extra ping and then discard the first answer
blazemode = true

# fping "-p" parameter, but in seconds
# sets the time that fping waits between successive packets to an individual target
hostinterval = 1.5

# fping "-i" parameter, but in seconds
# minimum amount of time between sending a ping packet to any target
mininterval = 0.001

# offset: shifts the launch time of a probe within each time interval,
# so that concurrent probes do not all start at the same moment.
# It is given as a percentage of step. With step = 300:
# offset = 0%  -> 8:00, 8:05, 8:10
# offset = 50% -> 8:02:30, 8:07:30
# ++ FPingNormal
# offset = 0%
#
# ++ FPingLarge
# packetsize = 5000
# offset = 50%

+ TCPPing

binary = /usr/bin/tcpping
forks = 5
offset = 50%
step = 60
timeout = 10
pings = 5
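
To make the offset percentage used in the probe settings concrete, the sketch below converts it into a delay in seconds from the start of each step. It assumes the offset is taken as a plain percentage of step, as the 8:02:30 example suggests; the helper name offset_seconds is hypothetical and not part of SmokePing.

```shell
# offset_seconds STEP PERCENT
# Hypothetical helper: prints how many seconds into each step a probe starts,
# assuming the offset is a plain percentage of the step length.
offset_seconds() {
    step=$1
    percent=${2%\%}                 # accept "50" as well as "50%"
    echo $(( step * percent / 100 ))
}
```

For example, "offset_seconds 300 50%" prints 150 (2 minutes 30 seconds into a 5-minute step), and "offset_seconds 60 50%" prints 30.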

Target settings

*** Targets ***

# Default probe used by the targets
probe = FPing

# There must be one menu entry at the top level
menu = Latency
title = Latency Measurement
remark = SmokePing Latency Test.

# Custom menus; every menu entry needs the "menu" keyword
 + mysite1
 menu = Site 1
 title = Hosts in Site 1

# Define a host inside the menu
# The section name (myhost1) is used in the URL link
 ++ myhost1
# Title of the host's page
 title = My myhost1 on mysite1
 host = myhost1.mysite1.example

 + mysite2
 menu = Site 2
 title = Hosts in Site 2
 
 ++ myhost3
 host = myhost3.mysite2.example
 ++ myhost4
 host = myhost4.mysite2.example

Simplified version

*** Targets ***

probe = FPing

 menu = Top
 title = Network Latency Grapher

 + DongGuan
 title = router on DongGuan
 host = 192.168.234.17

 + SiChuan
 title = router on SiChuan
 host = 192.168.234.21
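
When there are many hosts, entries in the simplified format above can be generated rather than typed. The following is only a sketch; target_stanza is a hypothetical helper, and its output still has to be pasted into the *** Targets *** section by hand.

```shell
# target_stanza NAME TITLE HOST
# Hypothetical helper: prints one first-level target entry
# in the same shape as the simplified example.
target_stanza() {
    printf '+ %s\ntitle = %s\nhost = %s\n\n' "$1" "$2" "$3"
}
```

For example, "target_stanza DongGuan 'router on DongGuan' 192.168.234.17" prints the three lines of the DongGuan entry followed by a blank line.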

 

Alert settings

The Alerts section lets you set up "loss (%)" and "rtt (ms)" pattern detectors.

After each round of polling, SmokePing examines its data and determines which detectors match.

Detectors are enabled per target and are inherited by the target's children.

Pattern detectors

A pattern is a comma-separated list of values, oldest first, each prefixed with a comparison operator (==, >=, >, <).

# Matches when a target's RTT goes from constantly below 10 ms to constantly 100 ms and more

old ------------------------------> new

<10,<10,<10,<10,<10,>10,>100,>100,>100

==S

Detectors normally act on state changes, so they fail to find conditions that were already present when SmokePing was launched. To catch those, begin the pattern with the special value ==S, which marks the startup of SmokePing.

# Detects lines that have been losing more than 20% of their packets for two periods after startup

==S,>20%,>20%

*X*

Some conditions occur at irregular intervals, yet you only want an alert if they occur several times within a certain number of samples. The operator *X* ignores up to X values and still lets the pattern match:

>10%,*10*,>10%

U

U is true for unknown data; use it together with the == and =! operators.
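
To illustrate how a pattern is compared with a series of samples, here is a deliberately simplified sketch. It is not SmokePing's own matcher: it only understands ==, >=, > and < with integer values, it does not implement ==S, U or *X*, and it expects exactly one sample per pattern element. The function names are hypothetical.

```shell
# check_element CONDITION VALUE
# True when VALUE satisfies one pattern element such as ">70%".
check_element() {
    cond=${1%\%}                    # drop a trailing % sign
    val=${2%\%}
    case $cond in
        '=='*) [ "$val" -eq "${cond#==}" ] ;;
        '>='*) [ "$val" -ge "${cond#>=}" ] ;;
        '>'*)  [ "$val" -gt "${cond#>}" ] ;;
        '<'*)  [ "$val" -lt "${cond#<}" ] ;;
        *)     return 2 ;;          # unsupported element (S, U, *X*, ...)
    esac
}

# matches_pattern PATTERN VALUE...
# True when every value (oldest first) satisfies its pattern element.
matches_pattern() {
    pattern=$1
    shift
    while [ -n "$pattern" ]; do
        elem=${pattern%%,*}         # first remaining element
        case $pattern in
            *,*) pattern=${pattern#*,} ;;
            *)   pattern= ;;
        esac
        [ $# -gt 0 ] || return 1    # ran out of samples
        check_element "$elem" "$1" || return 1
        shift
    done
    [ $# -eq 0 ]                    # no leftover samples
}
```

For example, "matches_pattern '>70%,>70%' 100% 100%" succeeds, while "matches_pattern '>70%,>70%' 100% 50%" fails. The pattern must be quoted so the shell does not treat > and < as redirections.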

log

Sep 18 15:20:05 iZ947akbat7Z smokeping[21812]: Alert lossdetect was raised for TEST
Sep 18 15:21:06 iZ947akbat7Z smokeping[21812]: Alert lossdetect was cleared for TEST

Code

*** Alerts ***

to = alertee@address.somewhere
from = smokealert@company.xy

+lossdetect
type = loss
# in percent
pattern = ==S,>70%,>70%,>70%
edgetrigger=yes
comment = suddenly there is packet loss

+anydelay
type = rtt
# in milliseconds
pattern = >100
edgetrigger=yes
comment = measurement has 100ms delay

edgetrigger=no

The alert notifications and/or the programs executed are normally triggered every time the alert matches. If this variable is set to 'yes', they are triggered only when the alert's state changes, i.e. when it is raised and when it is cleared.
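
The difference between the two edgetrigger settings can be simulated with a short sketch. It models only the behaviour described in the paragraph above; notify_events is a hypothetical function, and each STATE argument stands for one polling round (1 = the pattern matches, 0 = it does not).

```shell
# notify_events EDGETRIGGER STATE...
# Hypothetical simulation of when notifications would be sent.
notify_events() {
    edge=$1
    shift
    prev=0
    for state in "$@"; do
        if [ "$edge" = yes ]; then
            # Notify only when the state changes
            if [ "$state" = 1 ] && [ "$prev" = 0 ]; then echo raised; fi
            if [ "$state" = 0 ] && [ "$prev" = 1 ]; then echo cleared; fi
        elif [ "$state" = 1 ]; then
            # Notify on every matching round
            echo raised
        fi
        prev=$state
    done
}
```

With the rounds 0 1 1 1 0, "notify_events yes 0 1 1 1 0" prints raised once and cleared once, whereas "notify_events no 0 1 1 1 0" prints raised three times and never prints cleared.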

The target must also reference the alert

*** Targets ***

................

 + myhost3
 host = 192.168.234.21

 # Comma separated list of alert names
 # Use an empty alerts definition to remove inherited alerts from the current target and its children.
 alerts = lossdetect

Pattern

==S,>70%,>70%

lossdetect was raised on TEST

Data (old --> now)
------------------
loss: S, 100%, 100%

lossdetect was cleared on TEST ("cleared" messages are only sent with "edgetrigger=yes")

Data (old --> now)
------------------
loss: 100%, 100%, 100%

 


Checking

 

Configuration check

smokeping --check

Configuration file '/opt/smokeping/bin/../etc/config' syntax OK.

log

Sep 18 12:44:34 iZ947akbat7Z smokeping[32184]: FPing: probing 3 targets with step 60 s and offset 54 s.

reload

--reload     Reload configuration in the running process without interrupting any probes
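
The check and reload steps can be chained so that a broken config is never loaded into the running daemon. This is a minimal sketch using only the --check and --reload options shown above; the function name check_and_reload is hypothetical, and the default binary path is the one from this install.

```shell
# check_and_reload [SMOKEPING_BINARY]
# Hypothetical wrapper: reload only if the config passes the syntax check.
check_and_reload() {
    bin=${1:-/opt/smokeping/bin/smokeping}
    if "$bin" --check; then
        "$bin" --reload
    else
        echo "config check failed; not reloading" >&2
        return 1
    fi
}
```

Run "check_and_reload" after editing /opt/smokeping/etc/config; a non-zero exit status means the running daemon was left untouched.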

 


Changing colors

 

Config

*** Presentation ***

+ detail

width = 600
height = 200
unison_tolerance = 2

"Last 3 Hours"    3h
"Last 30 Hours"   30h
"Last 10 Days"    10d


# In the Detail view (valid on level 2)

++loss_colors
# Loss Color   Legend
  1    00ff00  "1"
  3    00ffff  "3/10"
  5    0000ff  "10/10"
  15   800080  "15/30"
  20   ff00ff  "20/30"
  29   ff0000  "29/30"

After changing the settings, run:

service httpd restart

P.S.

Loss => larger or equal to this number
Legend => Description

#008000   green     1
#00FFFF   cyan      3
#0000FF   blue      5
#800080   purple   10
#FFA500   orange   15
#FF00FF   pink     20
#FF0000   red      29

 

 


Troubleshooting

 

After changing the timeout / pings settings, the following error appears:

Error: RRD parameter mismatch ('Different number of data sources: /opt/smokeping/data/DongGuan/myhost1.rrd has 9, create string has 33'). You must delete /opt/smokeping/data/DongGuan/myhost1.rrd or fix the configuration parameters.
/etc/init.d/smokeping start: smokeping could not be started

The only way out was to clear the data in the following folders:

/opt/smokeping/cache
/opt/smokeping/data
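
The numbers in the error message can be reproduced with a little arithmetic. The sketch below assumes (this is not stated in the error itself) that each SmokePing RRD file holds three fixed data sources (uptime, loss, median) plus one per ping. Under that assumption pings = 6 gives 9 and pings = 30 gives 33, which matches the message. The function name expected_ds is hypothetical.

```shell
# expected_ds PINGS
# Hypothetical helper: number of RRD data sources for a given "pings" value,
# assuming 3 fixed sources (uptime, loss, median) plus one per ping.
expected_ds() {
    echo $(( $1 + 3 ))
}
```

So an RRD file created with pings = 6 no longer fits a config that asks for pings = 30, which is why the existing files had to be removed.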

 


Time period alert

 

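# Script to switch alerts on / off

#!/bin/bash
# Toggles the "alerts = ..." lines in the SmokePing config, then reloads.

CONFIG=/opt/smokeping/etc/config

case "$1" in
    on)
        echo "on"
        # Uncomment lines that start with "#alerts"
        sed -i 's/^\([[:space:]]*\)#alerts/\1alerts/' "$CONFIG"
        ;;
    off)
        echo "off"
        # Comment out lines that start with "alerts"; lines already commented are left alone
        sed -i 's/^\([[:space:]]*\)alerts/\1#alerts/' "$CONFIG"
        ;;
    *)
        echo "Usage: $0 on|off"
        exit 1
        ;;
esac

/opt/smokeping/bin/smokeping --reload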

Checking

grep -n 'alert' /opt/smokeping/etc/config

103:alerts = LossDetect,LossRecover
108:alerts = LossDetect,LossRecover
113:alerts = LossDetect,LossRecover

 

 


Reading the Graphs

 

av md: average median

av ls: average loss

av sd: the average standard deviation of the multiple measurements in each round

am/as: the ratio of average median and average standard deviation
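
As a rough illustration of the "median" behind av md: each round sends several pings, and the median of their round trip times is the value plotted for that round. The sketch below computes the median of one round; it is not SmokePing's own code, and the function name median is hypothetical.

```shell
# median VALUE...
# Hypothetical helper: prints the median of the RTT values of one round.
median() {
    printf '%s\n' "$@" | sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR % 2) print v[(NR + 1) / 2]
            else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}
```

For example, "median 12 10 11" prints 11 and "median 10 12 11 13" prints 11.5.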

Sometimes a test packet is sent out but never returns. This is called packet-loss.
The color of the median line changes according to the number of packets lost.

All this information together gives an indication of network health.
For example, packet loss is something which should not happen out of the blue.
It can mean that a device in the middle of the link is overloaded or a router configuration somewhere is wrong.

Heavy fluctuation of the RTT (round trip time) values also indicates that the network is overloaded.
This shows on the graph as smoke; the more smoke, the more fluctuation.
The dark area around the line shows the amount of variation between individual probes.

http://oss.oetiker.ch/smokeping/doc/reading.en.html