netdata

Last updated: 2019-08-13

Introduction

Netdata is a real-time performance and health monitoring solution.

Designed to:

  • Solve the centralization problem of monitoring (Scales to infinity)
  • Replace the console for performance troubleshooting

HomePage: http://my-netdata.io

Features

  • 1s granularity
  • Zero disk I/O (by default all data is kept in RAM)

Port:

19999/tcp

How it works:

           Query
             |
Collect -> Store -> Stream -> Archive
             |
           Check

positive / negative values

positive values

read, input, inbound, received

negative values

write, output, outbound, sent

monitoring agent

  • metrics collector
  • time-series database
  • metrics visualizer
  • alarms notification engine

Security Design

  • Netdata daemon runs as a normal system user
  • plugins perform a hard coded data collection job
  • communication with plugins & Netdata slaves is unidirectional: from the plugin (or slave) towards the Netdata daemon
  • dashboards are read-only
  • data do not leave the server where they are collected
  • Netdata servers do not talk to each other
  • your browser connects to all the Netdata servers

 


Install

 

Netdata can be installed in 4 ways:

  1. Linux 64bit pre-built static binary
  2. Binary Packages
  3. Run Netdata in a Docker container
  4. Install Netdata on Linux manually

Static Binary

mkdir /usr/src/netdata

cd /usr/src/netdata

wget https://github.com/netdata/netdata/releases/download/v1.16.0/netdata-v1....

chmod 700 ./netdata-v1.16.0.gz.run

./netdata-v1.16.0.gz.run

systemctl enable netdata

systemctl start netdata

Remark

  • Installs into /opt/netdata
  • Creates User: 'netdata', Group: 'netdata'

Checking

netstat -ntlp | grep netdata

tcp        0      0 127.0.0.1:8125          0.0.0.0:*               LISTEN      22515/netdata
tcp        0      0 0.0.0.0:19999           0.0.0.0:*               LISTEN      22515/netdata
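
The same check can be scripted. The following is a minimal sketch, not part of Netdata: the helper name is_listening is hypothetical, and it only tests whether a TCP connection can be opened to the two ports shown in the netstat output above.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable: treat all of them as "not listening".
        return False


if __name__ == "__main__":
    # 19999 and 8125 are the ports from the netstat output in these notes.
    for port in (19999, 8125):
        state = "open" if is_listening("127.0.0.1", port) else "closed"
        print(port, state)
```

Unlike netstat, this does not show which process owns the socket, so it only confirms that something answers on the port.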

Manually

Source: https://github.com/firehol/netdata.git

# Debian / Ubuntu

apt-get install zlib1g-dev uuid-dev libuv1-dev liblz4-dev libjudy-dev libssl-dev libmnl-dev \
 gcc make git autoconf autoconf-archive autogen automake pkg-config curl python

# CentOS / Red Hat Enterprise Linux

yum install autoconf automake curl gcc git make nc pkgconfig python \
 libmnl-devel libuuid-devel openssl-devel libuv-devel lz4-devel Judy-devel zlib-devel

 


Usage

 

Web: http://your.server.ip:19999/

Zoom a chart: SHIFT + mouse wheel over the chart

Highlight a time-frame: ALT + select an area on a chart

Auto-detection of data collection sources

This auto-detection process happens only once, when Netdata starts.

Exceptions:

containers and VMs, which Netdata keeps detecting for as long as it runs

 


Configure

 

Get running config:

http://127.0.0.1:19999/netdata.conf

Config File Location:

/opt/netdata/etc/netdata/netdata.conf

# Create the config file

wget -O /opt/netdata/etc/netdata/netdata.conf http://localhost:19999/netdata.conf
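
To inspect the downloaded file programmatically, a small parser is enough. This is a rough sketch under two assumptions that are not stated in the notes: options left at their default value appear commented out in the generated file, and every option has the form "name = value". The function name parse_netdata_conf and the sample text are hypothetical.

```python
def parse_netdata_conf(text: str) -> dict:
    """Parse netdata.conf text into {section: {option: (value, is_default)}}.

    is_default is True when the option line was commented out with '#'.
    Free-text comments without '=' are skipped.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        commented = line.startswith("#")
        if commented:
            line = line[1:].strip()
        if current is None or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        current[key] = (value, commented)
    return sections


# Hypothetical sample, shaped like the file fetched above.
sample = """
[global]
    # history = 3600
    update every = 2
[web]
    # bind to = *
"""
conf = parse_netdata_conf(sample)
print(conf["global"]["update every"])
```

A free-text comment that happens to contain "=" would be misread as an option, so treat the result as a quick overview rather than an exact copy of the daemon's state.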

# CPU & RAM Usage

[global]
  # Enable KSM to halve Netdata's memory requirement
  history = 3600
  update every = 1

# Memory modes

[global]
  memory mode = save
  cache directory = /var/cache/netdata

ram

data are purely in memory. Data are never saved on disk. (Supports KSM)

save (the default)

data are only in RAM while Netdata runs and are saved to / loaded from disk on Netdata restart.

It also uses mmap() and supports KSM.

map

data are in memory mapped files. This works like the swap. (constant write on your disk)

(does not support KSM)

For each chart, Netdata maps the following files:

  • chart/main.db                    # chart information. Every time data are collected for a chart, this is updated.
  • chart/dimension_name.db   # round robin database

none

without a database (collected metrics can only be streamed to another Netdata).

alloc

like ram, but it uses calloc() and does not support KSM. (fallback)

dbengine

The data are in database files.

There is some amount of RAM dedicated to data caching and indexing

The number of history entries is not fixed (depends on the configured disk space)

("history" configuration option is meaningless for "memory mode = dbengine")

This is the only mode that supports changing "update every" without losing the previously stored metrics

This mode is suggested for nodes that also run other applications

Database Engine uses direct I/O to avoid polluting the OS filesystem caches

 

The DB engine stores chart metric values in 4k pages in memory.

Each chart dimension gets its own page to store consecutive values generated from the data collectors.

When those pages fill up they are slowly compressed and flushed to disk.

That is, with 1s collection a page fills and is flushed roughly every 17 minutes: each page is 4 KiB and each record is 4 bytes, so 4096 / 4 = 1024 records = 1024 seconds per dimension.
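
The arithmetic can be checked for other collection intervals. A minimal sketch, assuming the page size and record size quoted in these notes; the function name seconds_to_fill_page is hypothetical.

```python
PAGE_SIZE_BYTES = 4096  # one dbengine page, as stated in these notes
VALUE_SIZE_BYTES = 4    # one stored value, as stated in these notes


def seconds_to_fill_page(update_every: int = 1) -> int:
    """Seconds until one dimension fills a page at the given interval."""
    values_per_page = PAGE_SIZE_BYTES // VALUE_SIZE_BYTES
    return values_per_page * update_every


print(seconds_to_fill_page(1))       # 1024
print(seconds_to_fill_page(1) / 60)  # a little over 17 minutes
```

With "update every = 5" the same page would take 5120 seconds to fill, so slower collection means less frequent flushes.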

 

When the disk quota is exceeded, the oldest values are removed from the DB engine in real time

Files

ls /opt/netdata/var/cache/netdata/dbengine

datafile-1-0000000001.ndf
journalfile-1-0000000001.njf
datafile-1-0000000002.ndf        # more recent metric data
journalfile-1-0000000002.njf
...

Config

/opt/netdata/etc/netdata/netdata.conf

[global]
    memory mode = dbengine
    # Unit: MiB
    page cache size = 32
    dbengine disk space = 256

 * There is one DB engine instance per Netdata host/node

 * All DB engine instances, for localhost and all other streaming recipient nodes inherit their configuration from netdata.conf

 * There are explicit memory requirements per DB engine instance

File descriptor

The Database Engine may keep a significant number of files open per instance
(make sure at least 50 file descriptors are available per dbengine instance)

systemctl edit netdata

[Service]
LimitNOFILE=65536

Remark: /etc/sysctl.conf: "fs.file-max = 65536"
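
To judge whether a limit is high enough for a given number of dbengine instances, the rule of thumb above can be turned into a quick check. This is only a sketch: the 50-descriptor figure comes from these notes, the helper names are hypothetical, and the script reads the limit of the Python process itself, not of the running netdata daemon.

```python
import resource

FDS_PER_INSTANCE = 50  # minimum per dbengine instance, per these notes


def required_fds(instances: int) -> int:
    """Lower bound on descriptors needed by the given number of instances."""
    return FDS_PER_INSTANCE * instances


def limit_is_enough(instances: int, soft_limit: int) -> bool:
    """True if soft_limit covers the lower bound (or is unlimited)."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= required_fds(instances)


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft:", soft, "hard:", hard)
    print("enough for 10 instances:", limit_is_enough(10, soft))
```

The bound ignores descriptors used by the web server, plugins and streaming connections, so the real requirement is higher.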

Performance

OOM Score

[global]
    OOM score = 1000

Netdata runs with OOMScore = 1000

This means Netdata will be the first to be killed when your server runs out of memory.

Scheduling Policy

[global]
  process scheduling policy = idle

By default Netdata runs with the idle process scheduling policy,

so that it uses CPU resources only when there is idle CPU to spare.

 


Database

 

RAM Usage (for DB)

The default history is 3600 entries,

thus it will need 14.4KB for each chart dimension (4 bytes for the value * the entries of its history)

If you need 1000 dimensions, they will occupy just 14.4MB.

If the data collection frequency is set to 1 second, you will have just one hour of data.
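
The sizing rule above is easy to recompute for other settings. A minimal sketch, assuming 4 bytes per stored value as stated in these notes and ignoring any per-chart overhead; the function names are hypothetical.

```python
BYTES_PER_VALUE = 4  # per stored value, as stated in these notes


def ram_bytes(history: int, dimensions: int = 1) -> int:
    """Bytes of round robin storage for the given history and dimensions."""
    return BYTES_PER_VALUE * history * dimensions


def retention_seconds(history: int, update_every: int = 1) -> int:
    """How far back the stored history reaches, in seconds."""
    return history * update_every


print(ram_bytes(3600))          # 14400 bytes = 14.4 KB
print(ram_bytes(3600, 1000))    # 14400000 bytes = 14.4 MB
print(retention_seconds(3600))  # 3600 seconds = 1 hour
```

Doubling "history" doubles both the memory use and the retention, while raising "update every" extends retention without using more memory.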

KSM

Netdata offers all of its round robin database to the kernel for deduplication

KSM is a solution that will provide 60+% memory savings to Netdata.

# by default 0; 1 for the kernel to spawn ksmd

echo 1 >/sys/kernel/mm/ksm/run
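
Once ksmd is running, the kernel exposes counters next to the run file. The sketch below estimates how much memory KSM is currently saving by reading pages_sharing from the same sysfs directory. The helper names are hypothetical, and the figure is system-wide, so it includes any other process that has opted in to KSM, not only Netdata.

```python
import os
from pathlib import Path


def read_counter(ksm_dir: Path, name: str) -> int:
    """Read one integer counter file from the KSM sysfs directory."""
    return int((ksm_dir / name).read_text().strip())


def ksm_saved_bytes(ksm_dir="/sys/kernel/mm/ksm", page_size=None) -> int:
    """Approximate bytes saved: deduplicated page references * page size."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    return read_counter(Path(ksm_dir), "pages_sharing") * page_size


if __name__ == "__main__":
    ksm = Path("/sys/kernel/mm/ksm")
    if (ksm / "pages_sharing").exists():
        print("KSM saves about", ksm_saved_bytes(ksm) // 1024, "KiB")
    else:
        print("KSM is not available on this kernel")
```

Comparing this number before and after starting Netdata gives a rough idea of whether the savings quoted above apply to a particular host.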

 


Authentication

 

IP Level ACL

 * best and the suggested way to protect Netdata

   => Expose Netdata only in a private LAN => IP Level

[web]
    bind to = 10.1.1.1:19999 localhost:19999

username & password

Use web server to provide authentication (in front of all your Netdata servers)

Web Server Setting (nginx)

Nginx to forward requests to netdata

HTTP auth file: /etc/nginx/netdata.users

URL: https://your-server/status/   # must match the location prefix used in netdata.tmpl

# Running netdata as a subfolder to an existing virtual host

server {
    ...
    include /etc/nginx/templates/netdata.tmpl;
}

netdata.tmpl

location = /status {
    return 301 /status/;
}

location ~ /status/(?<ndpath>.*) {
    proxy_redirect off;
    proxy_set_header Host $host;

    proxy_set_header X-Forwarded-Host $host;
    proxy_set_header X-Forwarded-Server $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_http_version 1.1;
    proxy_pass_request_headers on;
    proxy_set_header Connection "keep-alive";
    proxy_store off;
    proxy_pass http://127.0.0.1:19999/$ndpath$is_args$args;

    gzip on;
    gzip_proxied any;
    gzip_types *;

    auth_basic "Authentication Required";
    auth_basic_user_file /etc/nginx/netdata.users;
}
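
How the regex location turns a public URL into an upstream request can be followed step by step. The sketch below imitates the rewrite in Python and is not nginx itself: the function name upstream_url is hypothetical, and the /api/v1/... paths are only example dashboard requests.

```python
import re

# Same pattern as the location block, written with Python's named-group syntax.
LOCATION = re.compile(r"/status/(?P<ndpath>.*)")
UPSTREAM = "http://127.0.0.1:19999/"


def upstream_url(path: str, query: str = ""):
    """Return the proxied URL for a request path, or None if it does not match."""
    match = LOCATION.search(path)  # nginx regex locations are unanchored
    if match is None:
        return None
    is_args = "?" if query else ""  # mirrors $is_args
    return f"{UPSTREAM}{match.group('ndpath')}{is_args}{query}"


print(upstream_url("/status/api/v1/info"))
print(upstream_url("/status/api/v1/data", "chart=system.cpu"))
print(upstream_url("/other"))
```

A request for /status without the trailing slash does not match the regex, which is why the template also carries the exact-match location that redirects it to /status/.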

 


Netdata registry

 

The registry backs the node menu (top left corner of the Netdata dashboards).

Purpose:

  • enables the Netdata cloud features, such as the node view
  • multiple Netdata installations are integrated into one distributed application (distributed monitoring)

The registry keeps track of 4 entities:

  • machine_guid: a random GUID generated by each Netdata  (first time it starts)
  • person_guid: the web browsers accessing the Netdata installations (first time it sees a new web browser)
  • URLs of Netdata installations
  • accounts: i.e. the information used to sign-in via one of the available sign-in methods.

Default registry: https://registry.my-netdata.io

Who talks to the registry?

Your web browser only!

Flow

Browser --> netdata

 <-URL to Registry-

Browser --> Registry

Run your own registry

# Server (registry)

* Every Netdata can be a registry

[registry]
    enabled = yes
    registry to announce = http://your.registry:19999
    allow from = 192.168.123.*

Remark

(1) The registry keeps its database in /var/lib/netdata/registry/*.db

  • registry-log.db, the transaction log
  • registry.db, the database

(2) IPs allowed by [registry].allow from should also be allowed by [web].allow connection from.

# Client (netdata)

Point each client at the registry (the client does not run a registry itself)

[registry]
    enabled = no
    registry to announce = http://your.registry:19999

 

 


Web

 

Disable Web

[web]
    mode = none

Threads

# The default number of processor threads is min(cpu cores, 6)

[web]
  web server threads = 4
  web server max sockets = 512

Access lists

Netdata supports access lists in netdata.conf:

[web]
    allow connections from = localhost *
    allow dashboard from = localhost *
    allow badges from = *
    allow streaming from = *
    allow netdata.conf from = localhost
    allow management from = localhost
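
The values in these lists are space-separated patterns. The sketch below shows one way such a list could be evaluated. It rests on assumptions that are not in these notes: that "*" is the only wildcard, that a leading "!" marks a negative match, and that the first matching pattern decides. The function names are hypothetical and this is not Netdata's own matcher.

```python
import re


def _matches(pattern: str, value: str) -> bool:
    """Match value against a pattern where '*' stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value) is not None


def allowed(pattern_list: str, client: str) -> bool:
    """Evaluate patterns left to right; the first match decides the outcome."""
    for pattern in pattern_list.split():
        negative = pattern.startswith("!")
        if negative:
            pattern = pattern[1:]
        if _matches(pattern, client):
            return not negative
    return False  # nothing matched: deny


print(allowed("localhost *", "10.1.1.5"))
print(allowed("localhost", "10.1.1.5"))
print(allowed("!10.1.1.* 10.*", "10.1.1.5"))
```

Under these assumptions "localhost *" admits every client, so tightening "allow connections from" is what actually restricts access.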

 


Plugins

 

internal data collection plugins (running inside the Netdata daemon)

external data collection plugins (independent processes, sending data to Netdata over pipes)

modular plugin orchestrators (external plugins that have multiple data collection modules)

netdata.conf

[plugin:XXX]
    ...

 


Health monitoring

 

Default alarms shipped with Netdata.

/opt/netdata/etc/netdata/netdata.conf

[health]
    enabled = yes

/opt/netdata/etc/netdata/edit-config health_alarm_notify.conf

SEND_EMAIL="YES"

 


Anonymous Statistics

 

Starting with v1.12, Netdata also collects anonymous statistics on certain events.

To opt out of sending anonymous statistics:

touch /opt/netdata/etc/netdata/.opt-out-from-anonymous-statistics

log

/opt/netdata/var/log/netdata/error.log

2019-08-29 17:32:14: netdata INFO  : MAIN :
 /opt/netdata/usr/libexec/netdata/plugins.d/anonymous-statistics.sh 'EXIT' 'OK' '-'

anonymous-statistics.sh

if [ -f "/opt/netdata/etc/netdata/.opt-out-from-anonymous-statistics" ]; then
        exit 0
fi

 


Central Netdata server (streaming)

 

Netdata slaves stream metrics to upstream Netdata servers using exactly the same protocol that local plugins use.