Last updated: 2015-08-20
Introduction
Homepage: http://oss.oetiker.ch/smokeping/
Written in: Perl
* measures latency using fping
* packaged installs keep collected measurements in "/var/lib/smokeping" (the source install below uses /opt/smokeping/data)
Install on CentOS 6
# yum package
yum install rrdtool-perl perl-FCGI perl-CGI perl-libwww-perl perl-ExtUtils-MakeMaker setools \
fping curl perl perl-Net-Telnet perl-Net-DNS perl-LDAP perl-libwww-perl perl-IO-Socket-SSL \
perl-Socket6 perl-CGI-SpeedyCGI gcc httpd zip unzip
yum install httpd mod_fcgid
# Compile
cd /usr/src
wget http://oss.oetiker.ch/smokeping/pub/smokeping-2.6.11.tar.gz
tar -xzvf smokeping-2.6.11.tar.gz
cd smokeping-2.6.11
# Compile modules (takes a long time)
# 15 distributions installed
./setup/build-perl-modules.sh
mkdir /opt/smokeping
# 14Mbyte
cp -a thirdparty /opt/smokeping/
./configure --prefix=/opt/smokeping
gmake install
# Create the folders referenced in the config file
cd /opt/smokeping
mkdir data var cache
chown apache: /opt/smokeping/cache
ln -s /opt/smokeping/cache /opt/smokeping/htdocs/cache
# Checking
/opt/smokeping/bin/smokeping --version
2.006011
Start / Stop
Starting the Smokeping Daemon
./bin/smokeping --config=/opt/smokeping/etc/config --debug
./bin/smokeping --config=/opt/smokeping/etc/config --logfile=smoke.log
script:
wget http://oss.oetiker.ch/smokeping/pub/contrib/smokeping-start-script
mv smokeping-start-script /etc/init.d/smokeping
chmod 755 /etc/init.d/smokeping
Settings to adjust in /etc/init.d/smokeping:
# chkconfig: - 84 15
# the path to your PID file
PIDFILE=/opt/smokeping/var/smokeping.pid
# path to smokeping script
SMOKEPING=/opt/smokeping/bin/smokeping
service smokeping restart
chkconfig smokeping on
Checking
ps aux | grep smokeping
root 26789 0.0 2.8 26456 14400 ? Ss 17:32 0:00 /opt/smokeping/bin/smokeping [FPing]
Log
Aug 19 17:32:40 iZ947akbat7Z smokeping[26789]: FPing: probing 2 targets with step 60 s and offset 46 s.
HTTP
cp /opt/smokeping/htdocs/smokeping.fcgi.dist /opt/smokeping/htdocs/smokeping.fcgi
vi /etc/httpd/conf.d/smokeping.conf
Alias /smokeping /opt/smokeping/htdocs
<Directory "/opt/smokeping/htdocs">
Options FollowSymLinks +ExecCGI
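# Apache 2.2 (CentOS 6): without Order/Deny, "Allow from" alone does not restrict access
Order deny,allow
Deny from all
Allow from w.x.y.z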
</Directory>
/etc/init.d/httpd restart
chkconfig httpd on
Configure
Create the config files
cd /opt/smokeping/etc/
for foo in *.dist; do cp "$foo" "$(basename "$foo" .dist)"; done
mkdir backup
mv *.dist backup
/opt/smokeping/etc/config
# '#' starts a comment; a trailing '\' continues the line on the next line

# settings that must be changed
contact = x@y
cgiurl = http://112.74.89.205/smokeping/smokeping.fcgi
smokemail = /opt/smokeping/etc/smokemail
tmail = /opt/smokeping/etc/tmail
template = /opt/smokeping/etc/basepage.html
syslogfacility = local0
Default database storage settings
*** Database ***

step = 60
pings = 30
Probe settings
*** Probes ***

+ FPing
binary = /usr/bin/fping
offset = 50%
step = 60
timeout = 2
pings = 6
# pings: how many pings should be sent in each time interval.
# step: duration of time interval (in seconds) for probing. (Default: 5 minutes)
# timeout: timeout value to be used in a given probing tool.

# Send an extra ping and then discard the first answer
blazemode = true

# fping "-p" parameter, but in seconds
# sets the time that fping waits between successive packets
hostinterval = 1.5

# fping "-i" parameter, but in seconds
# minimum amount of time between sending a ping packet
mininterval = 0.001

# offset
# offset: how far the launch times of concurrent probes are spread within a given time interval.
# e.g. offset 0% with a 5-minute step: 8:00, 8:05, 8:10
# e.g. offset 50% with a 5-minute step: 8:02:30, 8:07:30

# ++ FPingNormal
# offset = 0%
#
# ++ FPingLarge
# packetsize = 5000
# offset = 50%

+ TCPPing
binary = /usr/bin/tcpping
forks = 5
offset = 50%
step = 60
timeout = 10
pings = 5
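As a rough illustration of the offset setting, the sketch below converts a percentage offset into seconds within one step. The function name offset_seconds is made up for this note, and the arithmetic (step x percent / 100) is an assumption inferred from the 8:02:30 example, not taken from the SmokePing source.

```shell
#!/bin/bash
# Hypothetical helper: seconds into each interval at which a probe starts,
# assuming offset is a plain percentage of step.
offset_seconds() {
  local step="$1"      # interval length in seconds
  local percent="$2"   # offset in percent, without the % sign
  echo $(( step * percent / 100 ))
}

offset_seconds 300 50   # 150 s, i.e. 8:02:30 after an 8:00 boundary
offset_seconds 60 50    # 30 s with step = 60
```

An offset given as a percentage scales with step, so changing step also moves the launch time.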
Targets settings
*** Targets ***

# default probe used by the targets
probe = FPing

# a top-level menu must come first
menu = Latency
title = Latency Measurement
remark = SmokePing Latency Test.

# custom menus; every menu entry needs the "menu" keyword
+ mysite1
menu = Site 1
title = Hosts in Site 1

# define a host inside the menu
# (the section name is used in the URL link)
++ myhost1
# title of the host's page
title = My myhost1 on mysite1
host = myhost1.mysite1.example

+ mysite2
menu = Site 2
title = Hosts in Site 2

++ myhost3
host = myhost3.mysite2.example

++ myhost4
host = myhost4.mysite2.example
Simple version
*** Targets ***

probe = FPing
menu = Top
title = Network Latency Grapher

+ DongGuan
title = router on DongGuan
host = 192.168.234.17

+ SiChuan
title = router on SiChuan
host = 192.168.234.21
Alert settings
The Alert section lets you set up "loss(%)" and "rtt(ms)" pattern detectors.
After "each round of polling", SmokePing will examine its data and determine which detectors match.
Detectors are enabled per target and get "inherited" by the target's children.
pattern detectors
Operators: ==, >=, >, <
# target's RTT goes from constantly below 10ms to constantly 100ms and more
old ------------------------------> new
<10,<10,<10,<10,<10,>10,>100,>100,>100
==S
# Detectors normally act on state changes. The disadvantage is that they fail to find conditions which were
# already present when smokeping was launched. The special value S is inserted whenever smokeping starts up,
# so a pattern beginning with ==S can catch them.
# Example: detect lines that have been losing more than 20% of the packets for two periods after startup.
==S,>20%,>20%
*X*
Use this when an alert should be thrown if events occur several times within a certain amount of time.
The operator *X* will ignore up to X values and still let the pattern match:
>10%,*10*,>10%
U
The value U is true for unknown data; it is used together with the == and =! operators.
log
Sep 18 15:20:05 iZ947akbat7Z smokeping[21812]: Alert lossdetect was raised for TEST
Sep 18 15:21:06 iZ947akbat7Z smokeping[21812]: Alert lossdetect was cleared for TEST
Code
*** Alerts ***
to = [email protected]
from = [email protected]

+lossdetect
type = loss
# in percent
pattern = ==S,>70%,>70%,>70%
edgetrigger = yes
comment = suddenly there is packet loss

+anydelay
type = rtt
# in milliseconds
pattern = >100
edgetrigger = yes
comment = measurement has 100ms delay
edgetrigger=no
The alert notifications and/or the programs executed are normally triggered every time the alert matches. If this variable is set to 'yes', they will be triggered only when the alert's state is changed, ie. when it's raised and when it's cleared.
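To make the difference concrete, here is a toy model (not SmokePing code) that prints which notifications a series of polling rounds would produce with edgetrigger off and on. The function name notify_rounds and the words "alert", "raised" and "cleared" as output tokens are illustrative choices.

```shell
#!/bin/bash
# Toy model: each argument after the mode is one polling round,
# 1 = the alert pattern matched, 0 = it did not.
notify_rounds() {
  local mode="$1"; shift   # "yes" = edgetrigger on, "no" = off
  local prev=0 cur out=""
  for cur in "$@"; do
    if [ "$mode" = "yes" ]; then
      # only state changes produce a message
      if [ "$cur" -eq 1 ] && [ "$prev" -eq 0 ]; then out+="raised "; fi
      if [ "$cur" -eq 0 ] && [ "$prev" -eq 1 ]; then out+="cleared "; fi
    else
      # every matching round produces a message
      if [ "$cur" -eq 1 ]; then out+="alert "; fi
    fi
    prev="$cur"
  done
  echo "${out% }"
}

notify_rounds no  0 1 1 1 0   # alert alert alert
notify_rounds yes 0 1 1 1 0   # raised cleared
```

With edgetrigger=yes a long outage yields two messages instead of one per round, which is usually easier on the mailbox.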
The target must reference the alert
*** Targets ***
................
+ myhost3
host = 192.168.234.21
# Comma separated list of alert names
# Use an empty alerts definition to remove inherited alerts from the current target and its children.
alerts = lossdetect
Pattern
==S,>70%,>70%
lossdetect was raised on TEST
Data (old --> now)
------------------
loss: S, 100%, 100%
lossdetect was cleared on TEST (the "cleared" message is only sent with "edgetrigger=yes")
Data (old --> now)
------------------
loss: 100%, 100%, 100%
Checking
Configure check
smokeping --check
Configuration file '/opt/smokeping/bin/../etc/config' syntax OK.
log
Sep 18 12:44:34 iZ947akbat7Z smokeping[32184]: FPing: probing 3 targets with step 60 s and offset 54 s.
reload
--reload Reload configuration in the running process without interrupting any probes
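The check and reload steps can be chained so that a broken config never reaches the running daemon. This is a minimal sketch: safe_reload is a made-up name, and it assumes that "smokeping --check" exits with a non-zero status when the config is invalid.

```shell
#!/bin/bash
# Path from the install above; can be overridden through the environment.
SMOKEPING="${SMOKEPING:-/opt/smokeping/bin/smokeping}"

# Reload only if the syntax check passes.
# Assumption: --check returns non-zero on an invalid config.
safe_reload() {
  if "$SMOKEPING" --check; then
    "$SMOKEPING" --reload
  else
    echo "config check failed, not reloading" >&2
    return 1
  fi
}
```

Call safe_reload after editing /opt/smokeping/etc/config instead of running --reload directly.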
Changing colors
Config
*** Presentation ***

+ detail
width = 600
height = 200
unison_tolerance = 2

"Last 3 Hours" 3h
"Last 30 Hours" 30h
"Last 10 Days" 10d

# In the Detail view (valid on level 2)
++ loss_colors
# Loss Color Legend
1 00ff00 "1"
3 00ffff "3/10"
5 0000ff "10/10"
15 800080 "15/30"
20 ff00ff "20/30"
29 ff0000 "29/30"
After changing the settings, run:
service httpd restart
P.S.
Loss => larger than or equal to this number
Legend => Description
Color    Name    Loss
#008000  green   1
#00FFFF  cyan    3
#0000FF  blue    5
#800080  purple  10
#FFA500  orange  15
#FF00FF  pink    20
#FF0000  red     29
Troubleshoot
After changing the timeout / pings settings, the following error appears:
Error: RRD parameter mismatch ('Different number of data sources: /opt/smokeping/data/DongGuan/myhost1.rrd has 9, create string has 33'). You must delete /opt/smokeping/data/DongGuan/myhost1.rrd or fix the configuration parameters.
/etc/init.d/smokeping start: smokeping could not be started
The only fix was to clear the data in the following folders:
/opt/smokeping/cache
/opt/smokeping/data
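The numbers in the error message fit a layout in which each RRD holds one data source per ping plus three more (uptime, loss, median): 6 + 3 = 9 for the old file and 30 + 3 = 33 for the new create string. That layout is stated here as an assumption, not quoted from the SmokePing documentation. The sketch below also shows one way to move the old data aside instead of deleting it; the function names and the backup naming scheme are made up.

```shell
#!/bin/bash
# Assumption: data sources per RRD = pings + 3 (uptime, loss, median).
expected_ds() {
  echo $(( $1 + 3 ))
}

expected_ds 6    # 9  (the existing file)
expected_ds 30   # 33 (the new create string)

# Move a data folder aside and recreate it empty, so the old
# measurements are kept. Prints the backup location.
backup_rrd_dir() {
  local data_dir="$1"
  local backup="${data_dir}.bak.$(date +%Y%m%d%H%M%S)"
  mv "$data_dir" "$backup" && mkdir "$data_dir" && echo "$backup"
}
```

A cautious sequence would be: stop smokeping, run backup_rrd_dir /opt/smokeping/data, then start smokeping again so fresh RRD files are created.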
Time period alert
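# script to switch alerts on / off
#!/bin/bash
# Enable or disable alerts by (un)commenting the "alerts" lines, then reload.
# Patterns are anchored to the start of the line so repeated runs do not stack up '#'.
CONFIG=/opt/smokeping/etc/config
SMOKEPING=/opt/smokeping/bin/smokeping

if [ "$1" == "on" ]; then
    echo "on"
    sed -i 's/^#alerts/alerts/' "$CONFIG"
    "$SMOKEPING" --reload
elif [ "$1" == "off" ]; then
    echo "off"
    sed -i 's/^alerts/#alerts/' "$CONFIG"
    "$SMOKEPING" --reload
else
    echo "Usage: on / off"
fi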
Checking
grep -n 'alert' /opt/smokeping/etc/config
103:alerts = LossDetect,LossRecover
108:alerts = LossDetect,LossRecover
113:alerts = LossDetect,LossRecover
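An unanchored substitution such as s/alerts/#alerts/g adds another '#' every time "off" is run and also rewrites comments that mention the word. The sketch below tries an anchored variant on a throwaway file, so the live config is not touched. The function name toggle_alerts and the sample content are made up; it relies on GNU sed's -i option.

```shell
#!/bin/bash
# Hypothetical helper: comment or uncomment lines that begin with "alerts".
toggle_alerts() {
  local state="$1" file="$2"
  case "$state" in
    on)  sed -i 's/^#alerts/alerts/' "$file" ;;
    off) sed -i 's/^alerts/#alerts/' "$file" ;;
    *)   echo "Usage: on / off" >&2; return 1 ;;
  esac
}

# Exercise it on a temporary file.
tmp="$(mktemp)"
printf '%s\n' '+ myhost3' 'host = 192.168.234.21' \
  '# empty alerts definition removes inherited alerts' \
  'alerts = lossdetect' > "$tmp"

toggle_alerts off "$tmp"
toggle_alerts off "$tmp"   # the second run changes nothing
toggle_alerts on  "$tmp"
grep -n 'alerts' "$tmp"
```

After the off, off, on sequence the file is back to its original content, and the comment line is never modified.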
Reading the Graphs
av md: average median
av ls: average loss
av sd: the average standard deviation of the multiple measurements in each round
am/as: the ratio of average median and average standard deviation
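For a feel of what these figures summarise, the sketch below computes the median and the standard deviation of the RTT samples from a single round. It is only an illustration: the function names are made up, and the population formula for the standard deviation is an assumption, not necessarily what SmokePing uses internally.

```shell
#!/bin/bash
# Median of the RTT samples passed as arguments (one polling round).
round_median() {
  printf '%s\n' "$@" | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR % 2) { m = v[(NR + 1) / 2] }
      else        { m = (v[NR / 2] + v[NR / 2 + 1]) / 2 }
      print m
    }'
}

# Population standard deviation of the same samples (assumed formula).
round_sd() {
  printf '%s\n' "$@" | awk '
    { s += $1; ss += $1 * $1 }
    END {
      mean = s / NR
      var = ss / NR - mean * mean
      if (var < 0) var = 0   # guard against rounding noise
      printf "%.3f\n", sqrt(var)
    }'
}

round_median 10 12 11 50 13   # 12
round_sd 10 10 10 10          # 0.000
```

A single slow reply (50 ms here) barely moves the median, which is why the graph draws the median line and shows the spread separately as smoke.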
Sometimes a test packet is sent out but never returns. This is called packet-loss.
The color of the median line changes according to the number of packets lost.
All this information together gives an indication of network health.
For example, packet loss is something which should not happen out of the blue.
It can mean that a device in the middle of the link is overloaded or a router configuration somewhere is wrong.
Heavy fluctuation of the RTT (round trip time) values also indicates that the network is overloaded.
This shows on the graph as smoke; the more smoke, the more fluctuation.
The dark area around the line shows the amount of variation between individual probes.
http://oss.oetiker.ch/smokeping/doc/reading.en.html