Last updated: 2019-08-13
Introduction
real-time performance and health monitoring solution
Designed to:
- Solve the centralization problem of monitoring (Scales to infinity)
- Replace the console for performance troubleshooting
HomePage: http://my-netdata.io
Features
- 1s granularity
- Zero disk I/O (by default all data is kept in RAM)
Port:
19999/tcp
Operation:
Query | Collect -> Store -> Stream -> Archive | Check
positive / negative values
positive values
read, input, inbound, received
negative values
write, output, outbound, sent
monitoring agent
- metrics collector
- time-series database
- metrics visualizer
- alarms notification engine
Security Design
- Netdata daemon runs as a normal system user
- plugins perform a hard-coded data collection job
- plugins & Netdata slaves communicate unidirectionally: from the plugin towards the Netdata daemon
- dashboards are read-only
- data do not leave the server where they are collected
- Netdata servers do not talk to each other
- your browser connects to all the Netdata servers
Install
Netdata has 4 installation methods:
- Linux 64bit pre-built static binary
- Binary Packages
- Run Netdata in a Docker container
- Install Netdata on Linux manually
Static Binary
mkdir /usr/src/netdata
cd /usr/src/netdata
wget https://github.com/netdata/netdata/releases/download/v1.16.0/netdata-v1....
chmod 700 ./netdata-v1.16.0.gz.run
./netdata-v1.16.0.gz.run
systemctl enable netdata
systemctl start netdata
Remark
- Installs into /opt/netdata
- Creates User: 'netdata', Group: 'netdata'
Checking
netstat -ntlp | grep netdata
tcp        0      0 127.0.0.1:8125          0.0.0.0:*               LISTEN      22515/netdata
tcp        0      0 0.0.0.0:19999           0.0.0.0:*               LISTEN      22515/netdata
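Optional HTTP check (assumes the default port 19999; a 200 status means the dashboard is answering):
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:19999/
# 200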
Manually
Source: https://github.com/firehol/netdata.git
# Debian / Ubuntu
apt-get install zlib1g-dev uuid-dev libuv1-dev liblz4-dev libjudy-dev libssl-dev libmnl-dev \
    gcc make git autoconf autoconf-archive autogen automake pkg-config curl python
# CentOS / Red Hat Enterprise Linux
yum install autoconf automake curl gcc git make nc pkgconfig python \
    libmnl-devel libuuid-devel openssl-devel libuv-devel lz4-devel Judy-devel zlib-devel
Usage
Web: http://your.server.ip:19999/
the current charts zooming (SHIFT + mouse wheel over a chart),
the highlighted time-frame (ALT + select an area on a chart),
Auto-detection of data collection sources
This auto-detection process happens only once, when Netdata starts.
Exceptions:
containers and VMs are auto-detected forever
Configure
Get running config:
http://127.0.0.1:19999/netdata.conf
Config File Location:
/opt/netdata/etc/netdata/netdata.conf
# Create the config file
wget -O /opt/netdata/etc/netdata/netdata.conf http://localhost:19999/netdata.conf
# CPU & RAM Usage
[global]
    # Enable KSM to halve Netdata's memory requirement
    history = 3600
    update every = 1
# Memory modes
[global]
    memory mode = save
    cache directory = /var/cache/netdata
ram
data are purely in memory. Data are never saved on disk. (Supports KSM)
save (the default)
data are only in RAM while Netdata runs and are saved to / loaded from disk on Netdata restart.
It also uses mmap() and supports KSM.
map
data are in memory-mapped files. This works like swap (constant writes to your disk).
(does not support KSM)
For each chart, Netdata maps the following files:
- chart/main.db # chart information. Every time data are collected for a chart, this is updated.
- chart/dimension_name.db # round robin database
none
without a database (collected metrics can only be streamed to another Netdata).
alloc, like ram but it uses calloc() and does not support KSM. (fallback)
dbengine
The data are in database files.
There is some amount of RAM dedicated to data caching and indexing
The number of history entries is not fixed (depends on the configured disk space)
("history" configuration option is meaningless for "memory mode = dbengine")
This is the only mode that supports changing "update_every" without losing the previously stored metrics
Suggested for nodes that also run other applications
Database Engine uses direct I/O to avoid polluting the OS filesystem caches
The DB engine stores chart metric values in 4k pages in memory.
Each chart dimension gets its own page to store consecutive values generated from the data collectors.
When those pages fill up they are slowly compressed and flushed to disk.
i.e. a flush roughly every 17 min. (4096 bytes / 4 bytes per record = 1024 sec at a 1s dimension granularity; each page is 4 KiB, each record 4 bytes)
When the disk quota is exceeded the oldest values are removed from the DB engine at real time
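Rough retention estimate, a back-of-envelope sketch assuming ~4 bytes per point before compression and roughly 50% compression on disk (actual ratios vary):
# dbengine disk space = 256 MiB ≈ 268,435,456 bytes ≈ ~134 million stored points at ~2 bytes/point
# with 1000 dimensions at 1s granularity: 134,217,728 / 1000 ≈ 134,217 sec ≈ 1.5 days of history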
Files
ls /opt/netdata/var/cache/netdata/dbengine
datafile-1-0000000001.ndf
journalfile-1-0000000001.njf
datafile-1-0000000002.ndf     # more recent metric data
journalfile-1-0000000002.njf
...
Config
/opt/netdata/etc/netdata/netdata.conf
[global]
    memory mode = dbengine
    # Unit: MiB
    page cache size = 32
    dbengine disk space = 256
* There is one DB engine instance per Netdata host/node
* All DB engine instances, for localhost and all other streaming recipient nodes inherit their configuration from netdata.conf
* There are explicit memory requirements per DB engine instance
File descriptor
The Database Engine may keep a significant number of files open per instance
(at least 50 file descriptors available per dbengine instance)
systemctl edit netdata
[Service]
LimitNOFILE=65536
Remark: /etc/sysctl.conf: "fs.file-max = 65536"
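To verify the limit applied to the running daemon (assumes a single netdata process):
cat /proc/$(pidof netdata)/limits | grep 'open files'
ls /proc/$(pidof netdata)/fd | wc -l    # file descriptors currently open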
Performance
OOM Score
[global]
    OOM score = 1000
Netdata runs with OOMScore = 1000
This means Netdata will be the first to be killed when your server runs out of memory.
Scheduling Policy
[global]
    process scheduling policy = idle
By default Netdata runs with the idle process scheduling policy,
so that it uses CPU resources only when there is idle CPU to spare.
Database
Ram Usage (for DB)
The default history is 3600 entries,
thus it will need 14.4 KB for each chart dimension (4 bytes per value * the number of history entries)
If you need 1000 dimensions, they will occupy just 14.4MB.
If the data collection frequency is set to 1 second, you will have just one hour of data.
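Example sizing, using the 4 bytes/entry figure above (the history value is illustrative): to keep 24 hours at 1-second granularity,
[global]
    history = 86400
# per dimension: 86400 entries * 4 bytes ≈ 345.6 KB; 1000 dimensions ≈ 345.6 MB of RAM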
KSM
Netdata offers all its round robin database to the kernel for deduplication
KSM is a solution that will provide 60+% memory savings to Netdata.
# by default 0; 1 for the kernel to spawn ksmd
echo 1 >/sys/kernel/mm/ksm/run
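To confirm ksmd is actually deduplicating pages, check the sysfs counters:
cat /sys/kernel/mm/ksm/run            # 1 = ksmd enabled
cat /sys/kernel/mm/ksm/pages_sharing  # > 0 means memory is being saved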
Authentication
IP Level ACL
* best and the suggested way to protect Netdata
=> Expose Netdata only in a private LAN => IP Level
[web]
    bind to = 10.1.1.1:19999 localhost:19999
username & password
Use a web server to provide authentication (in front of all your Netdata servers)
Web Server Setting (nginx)
Nginx to forward requests to netdata
HTTP auth file: /etc/nginx/netdata.users
URL: https://your-server/netdata/
# Running netdata as a subfolder to an existing virtual host
server { ... include /etc/nginx/templates/netdata.tmpl; }
netdata.tmpl
location = /status {
    return 301 /status/;
}

location ~ /status/(?<ndpath>.*) {
    proxy_redirect off;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-Host $host;
    proxy_set_header X-Forwarded-Server $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_http_version 1.1;
    proxy_pass_request_headers on;
    proxy_set_header Connection "keep-alive";
    proxy_store off;
    proxy_pass http://127.0.0.1:19999/$ndpath$is_args$args;

    gzip on;
    gzip_proxied any;
    gzip_types *;

    auth_basic "Authentication Required";
    auth_basic_user_file /etc/nginx/netdata.users;
}
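Create the HTTP basic-auth file referenced above (htpasswd comes from apache2-utils / httpd-tools; the user name is just an example):
htpasswd -c /etc/nginx/netdata.users admin
nginx -t && systemctl reload nginx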
Netdata registry
registry = node menu (on top left corner of the Netdata dashboards)
Purpose:
- enables the Netdata cloud features, such as the node view
- multiple Netdata are integrated into one distributed application (distributed monitoring)
The registry keeps track of 4 entities:
- machine_guid: a random GUID generated by each Netdata (first time it starts)
- person_guid: a random GUID assigned to each web browser accessing the Netdata installations (generated the first time the registry sees a new web browser)
- URLs of Netdata installations
- accounts: i.e. the information used to sign-in via one of the available sign-in methods.
Default registry: https://registry.my-netdata.io
Who talks to the registry?
Your web browser only!
Flow
Browser --> netdata
<-URL to Registry-
Browser --> Registry
Run your own registry
# Server (registry)
* Every Netdata can be a registry
[registry]
    enabled = yes
    registry to announce = http://your.registry:19999
    allow from = 192.168.123.*
Remark
(1) The registry keeps its database in /var/lib/netdata/registry/*.db
- registry-log.db, the transaction log
- registry.db, the database
(2) IPs allowed by [registry].allow from should also be allowed by [web].allow connections from (see the example below).
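A combined sketch for the registry server, assuming the same 192.168.123.0/24 LAN as above:
[web]
    allow connections from = localhost 192.168.123.*
[registry]
    enabled = yes
    registry to announce = http://your.registry:19999
    allow from = 192.168.123.*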
# Client (netdata)
Advertise it to the registry
[registry]
enabled = no
registry to announce = http://your.registry:19999
Web
Disable Web
[web]
    mode = none
Threads
# The default number of processor threads is min(cpu cores, 6)
[web]
    web server threads = 4
    web server max sockets = 512
Access lists
Netdata supports access lists in netdata.conf:
[web]
    allow connections from = localhost *
    allow dashboard from = localhost *
    allow badges from = *
    allow streaming from = *
    allow netdata.conf from = localhost
    allow management from = localhost
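A stricter example, assuming the dashboard should only be reachable from a management subnet (the subnet is illustrative):
[web]
    allow connections from = localhost 10.0.0.*
    allow dashboard from = localhost 10.0.0.*
    allow netdata.conf from = localhost
    allow management from = localhost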
Plugins
internal data collection plugins (running inside the Netdata daemon)
external data collection plugins (independent processes, sending data to Netdata over pipes)
modular plugin orchestrators (external plugins that have multiple data collection modules)
netdata.conf
[plugin:XXX] ...
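Individual collectors can also be switched on/off; a sketch, assuming you want to disable apps.plugin to save CPU (check your generated netdata.conf for the exact section and key names):
[plugins]
    apps = no
[plugin:proc]
    /proc/net/dev = yes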
Health monitoring
Default alarms shipped with Netdata.
/opt/netdata/etc/netdata/netdata.conf
[health]
    enabled = yes
/opt/netdata/etc/netdata/edit-config health_alarm_notify.conf
SEND_EMAIL="YES"
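The notification path can be exercised manually; assuming the static install layout under /opt/netdata, 'test' sends a test alarm for the default role:
su -s /bin/bash netdata -c '/opt/netdata/usr/libexec/netdata/plugins.d/alarm-notify.sh test'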
Anonymous Statistics
Starting with v1.12, Netdata also collects anonymous usage statistics on certain events.
To opt out of sending anonymous statistics:
touch /opt/netdata/etc/netdata/.opt-out-from-anonymous-statistics
log
/opt/netdata/var/log/netdata/error.log
2019-08-29 17:32:14: netdata INFO : MAIN : /opt/netdata/usr/libexec/netdata/plugins.d/anonymous-statistics.sh 'EXIT' 'OK' '-'
anonymous-statistics.sh
if [ -f "/opt/netdata/etc/netdata/.opt-out-from-anonymous-statistics" ]; then
    exit 0
fi
Central Netdata server (streaming)
Netdata slaves streaming metrics to upstream Netdata servers
use exactly the same protocol that local plugins use.
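A minimal stream.conf sketch (the API key is a placeholder GUID; generate your own, e.g. with uuidgen):
# Slave: /opt/netdata/etc/netdata/stream.conf
[stream]
    enabled = yes
    destination = master.example.com:19999
    api key = 11111111-2222-3333-4444-555555555555
# Master: accept that key in its own stream.conf
[11111111-2222-3333-4444-555555555555]
    enabled = yes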