Last updated: 2019-08-13


real-time performance and health monitoring solution

Designed to:

  • Solve the centralization problem of monitoring (Scales to infinity)
  • Replace the console for performance troubleshooting



  • 1s granularity
  • Zero disk I/O (by default all data is kept in RAM)


19999/tcp                  # Web

8125/TCP, 8125/UDP  # statsd


Collect -> Store -> Stream -> Archive

positive / negative values

positive values

read, input, inbound, received

negative values

write, output, outbound, sent

monitoring agent

  • metrics collector
  • time-series database
  • metrics visualizer
  • alarms notification engine

Security Design

  • Netdata daemon runs as a normal system user
  • plugins perform a hard coded data collection job
  • communication with plugins & Netdata slaves is unidirectional: from the plugin towards the Netdata daemon
  • dashboards are read-only
  • data do not leave the server where they are collected
  • Netdata servers do not talk to each other
  • your browser connects all the Netdata servers


  • Installation
  • Anonymous Statistics
  • Upgrade
  • Usage
  • Configure
  • Database
  • Authentication
  • Netdata registry
  • Web Setting
  • Plugins
  • Disable IPv6
  • Health monitoring
  • Stopping notifications for individual alarms
  • Central Netdata server (streaming)
     + statsd
  • Database Queries
  • Export and import a snapshot
  • Performance Tuning
  • Netdata's own info



Netdata can be installed in 4 ways

  1. Linux 64bit pre-built static binary
  2. Binary Packages
  3. Run Netdata in a Docker container
  4. Install Netdata on Linux manually

Static Binary

mkdir /usr/src/netdata

cd /usr/src/netdata


chmod 700 ./


systemctl enable netdata

systemctl start netdata


  • Installs into /opt/netdata
  • Creates User: 'netdata', Group: 'netdata'


netstat -ntlp | grep netdata

tcp        0      0*               LISTEN      22515/netdata
tcp        0      0 *               LISTEN      22515/netdata

Manually (the Static Binary is much more convenient)


# Debian / Ubuntu

apt-get install zlib1g-dev uuid-dev libuv1-dev liblz4-dev libjudy-dev libssl-dev libmnl-dev \
 gcc make git autoconf autoconf-archive autogen automake pkg-config curl python

# CentOS / Red Hat Enterprise Linux

yum install autoconf automake curl gcc git make nc pkgconfig python \
 libmnl-devel libuuid-devel openssl-devel libuv-devel lz4-devel Judy-devel zlib-devel


Anonymous Statistics


Starting with v1.12 Netdata also collects anonymous statistics on certain events

To opt-out from sending anonymous statistics

touch /opt/netdata/etc/netdata/.opt-out-from-anonymous-statistics



2019-08-29 17:32:14: netdata INFO  : MAIN :
 /opt/netdata/usr/libexec/netdata/plugins.d/ 'EXIT' 'OK' '-'


if [ -f "/opt/netdata/etc/netdata/.opt-out-from-anonymous-statistics" ]; then
        exit 0
fi




chmod 700 ./

# The process stops and starts netdata automatically

./ --accept




Web: http://your.server.ip:19999/

the current charts zooming (SHIFT + mouse wheel over a chart),

the highlighted time-frame (ALT + select an area on a chart),

Auto-detection of data collection sources

This auto-detection process happens only once, when Netdata starts.


containers and VMs are auto-detected forever




Get running config:

Config File Location:


# Create the config file

wget -O /opt/netdata/etc/netdata/netdata.conf http://localhost:19999/netdata.conf

# CPU & RAM Usage

  # Enable KSM to halve Netdata's memory requirement
  history = 3600
  update every = 1

# Memory modes

  • ram
  • alloc
  • save
  • map
  • none
  • dbengine
  memory mode = save
  cache directory = /var/cache/netdata


data are purely in memory. Data are never saved on disk. (Supports KSM)


like ram but it uses calloc() and does not support KSM. (fallback)

save (the default)

Data are only in RAM while Netdata runs and are saved to / loaded from disk on Netdata restart.

It also uses mmap() and supports KSM.


data are in memory mapped files. This works like the swap. (constant write on your disk)

(does not support KSM)

For each chart, Netdata maps the following files:

  • chart/main.db                    # chart information. Every time data are collected for a chart, this is updated.
  • chart/dimension_name.db   # round robin database


without a database (collected metrics can only be streamed to another Netdata)


The data are in database files.


ls /opt/netdata/var/cache/netdata/dbengine

datafile-1-0000000002.ndf        # more recent metric data

There is some amount of RAM dedicated to data caching and indexing

# Unit: MiB
page cache size = 32

The number of history entries is not fixed (depends on the configured disk space)

"history" configuration option is meaningless for "memory mode = dbengine"

"dbengine" is the only mode that supports changing "update_every" without losing the previously stored metrics

Suggest to use this mode on nodes that also run other applications

Database Engine uses direct I/O to avoid polluting the OS filesystem caches

# Unit: MiB
dbengine disk space = 256


The DB engine stores chart metric values in 4k pages in memory.

Each chart dimension gets its own page to store consecutive values generated from the data collectors.

When those pages fill up they are slowly compressed and flushed to disk.

 => i.e. a page is flushed roughly every 17 minutes

    # Each chart dimension's page is 4 KiB and each record is 4 bytes, so when collecting once per second the page is full after 1024 seconds

    4096 / 4 = 1024 sec (dimension: 1s)
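
The arithmetic above can be checked with a small sketch. The page size (4 KiB) and value size (4 bytes) come from these notes; the function name is hypothetical and this is not Netdata code.

```python
PAGE_SIZE_BYTES = 4096   # one dbengine page per chart dimension (from the notes)
VALUE_SIZE_BYTES = 4     # one stored metric value (from the notes)


def seconds_until_page_full(update_every: int = 1) -> int:
    """Seconds of collection needed to fill one page (hypothetical helper)."""
    values_per_page = PAGE_SIZE_BYTES // VALUE_SIZE_BYTES
    return values_per_page * update_every


print(seconds_until_page_full(1))  # 1024
```

1024 seconds is a little over 17 minutes; with "update every = 5" the same page would take five times longer to fill.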

When the disk quota is exceeded the oldest values are removed from the DB engine at real time

 * When we query the DB engine for data

    => trigger disk read I/O requests that fill the Page Cache with the requested pages

 * The Database Engine uses direct I/O to avoid polluting the OS filesystem caches.




    memory mode = dbengine
    # Unit: MiB
    page cache size = 32
    dbengine disk space = 256

 * There is one DB engine instance per Netdata host/node

 * All DB engine instances, for localhost and all other streaming recipient nodes inherit their configuration from netdata.conf

 * There are explicit memory requirements per DB engine instance

File descriptor

The Database Engine may keep a significant amount of files open per instance

(at least 50 file descriptors available per dbengine instance)

systemctl edit netdata


Remark: /etc/sysctl.conf: "fs.file-max = 65536"


OOM Score

    OOM score = 1000

Netdata runs with OOMScore = 1000

This means Netdata will be the first to be killed when your server runs out of memory.

Scheduling Policy

  process scheduling policy = idle

By default Netdata runs with the idle process scheduling policy,

so that it uses CPU resources, only when there is idle CPU to spare.




Ram Usage (for DB)

The default history is 3600 entries,

thus it will need 14.4KB for each chart dimension (4 bytes for the value * the entries of its history)

If you need 1000 dimensions, they will occupy just 14.4MB.

If the data collection frequency is set to 1 second, you will have just one hour of data.
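
A minimal sketch of the sizing rule described above (4 bytes per value, one value per history entry per dimension). The helper names are hypothetical; only the numbers come from these notes.

```python
BYTES_PER_VALUE = 4  # from the notes: 4 bytes for each stored value


def ram_bytes(history: int = 3600, dimensions: int = 1) -> int:
    """RAM needed by the round robin database for the given history."""
    return history * BYTES_PER_VALUE * dimensions


def retention_seconds(history: int = 3600, update_every: int = 1) -> int:
    """How far back the data reaches."""
    return history * update_every


print(ram_bytes(3600, 1))           # 14400
print(ram_bytes(3600, 1000))        # 14400000
print(retention_seconds(3600, 1))   # 3600
```

So 14.4 KB per dimension, 14.4 MB for 1000 dimensions, and one hour of retention at 1s granularity.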


Netdata offers all its round robin database to kernel for deduplication

KSM is a solution that will provide 60+% memory savings to Netdata.

# by default 0; 1 for the kernel to spawn ksmd

echo 1 >/sys/kernel/mm/ksm/run




IP Level ACL

 * best and the suggested way to protect Netdata

   => Expose Netdata only in a private LAN => IP Level

    bind to = localhost:19999

username & password

Use web server to provide authentication (in front of all your Netdata servers)

Web Server Setting (nginx)

Nginx to forward requests to netdata

HTTP auth file: /etc/nginx/netdata.users

URL: https://your-server/netdata/

# Running netdata as a subfolder to an existing virtual host

server {
    include /etc/nginx/templates/netdata.tmpl;


location = /status {
    return 301 /status/;

location ~ /status/(?<ndpath>.*) {
    proxy_redirect off;
    proxy_set_header Host $host;

    proxy_set_header X-Forwarded-Host $host;
    proxy_set_header X-Forwarded-Server $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_http_version 1.1;
    proxy_pass_request_headers on;
    proxy_set_header Connection "keep-alive";
    proxy_store off;

    gzip on;
    gzip_proxied any;
    gzip_types *;

    auth_basic "Authentication Required";
    auth_basic_user_file /etc/nginx/netdata.users;


Netdata registry


registry = node menu  (on top left corner of the Netdata dashboards)


  • enables the Netdata cloud features, such as the node view
  • multiple Netdata are integrated into one distributed application (distributed monitoring)

The registry keeps track of 4 entities:

  • machine_guid: a random GUID generated by each Netdata  (first time it starts)
  • person_guid: the web browsers accessing the Netdata installations (first time it sees a new web browser)
  • URLs of Netdata installations
  • accounts: i.e. the information used to sign-in via one of the available sign-in methods.

Default registry:

Who talks to the registry?

Your web browser only!


Browser --> netdata

 <-URL to Registry-

Browser --> Registry

Run your own registry

# Server (registry)

* Every Netdata can be a registry

    enabled = yes
    registry to announce = http://your.registry:19999
    allow from = 192.168.123.*


(1) The registry has its own database: /var/lib/netdata/registry/*.db

  • registry-log.db, the transaction log
  • registry.db, the database

(2) IPs allowed by [registry].allow from should also be allowed by [web].allow connection from.

# Client (netdata)

Advertise it to registry

    enabled = no
    registry to announce = http://your.registry:19999



Web Setting


Disable Web Dashboard

    mode = none


# The default number of processor threads is min(cpu cores, 6)

  web server threads = 4
  web server max sockets = 512

Access lists

Netdata supports access lists in netdata.conf:

    allow connections from = localhost *
    allow dashboard from = localhost *
    allow badges from = *
    allow streaming from = *
    allow netdata.conf from = localhost
    allow management from = localhost


allow badges from

checks if the API request is for a badge. Badges are not matched by allow dashboard from.

allow netdata.conf from

checks the IP to allow

IPs allowed by allow netdata.conf from should also be allowed by allow connections from
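
A rough sketch of how such an access list could be evaluated. It assumes the usual Netdata "simple pattern" behaviour (space-separated patterns, "*" as wildcard, "!" for a negative match, first match wins); the function is a hypothetical illustration, not Netdata's implementation, and it ignores hostname resolution.

```python
import re


def simple_pattern_match(patterns: str, text: str) -> bool:
    """Return True if text is allowed by a space-separated pattern list.

    Assumption: patterns are tried left to right and the first one that
    matches decides; a leading '!' turns the match into a denial.
    """
    for pattern in patterns.split():
        negative = pattern.startswith("!")
        if negative:
            pattern = pattern[1:]
        # Translate '*' into '.*' and treat everything else literally.
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
        if re.fullmatch(regex, text):
            return not negative
    return False  # nothing matched: not allowed


print(simple_pattern_match("192.168.123.*", "192.168.123.7"))   # True
print(simple_pattern_match("!10.0.0.1 10.*", "10.0.0.1"))       # False
```

Under this model a client must pass both lists, for example "allow connections from" first and then "allow netdata.conf from".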




Internal, External, Modular plugins

  • Internal data collection plugins (running inside the Netdata daemon)
  • External data collection plugins (independent processes, sending data to Netdata over pipes)
  • Modular plugin orchestrators (external plugins that have multiple data collection modules)


Disable a plug-in

In the config folder: node.d python.d ..

    proc = yes
    diskspace = yes
    node.d = yes

Per plug-in setting

        # update every = 5


Disable IPv6


# per plugin configuration
  /proc/net/sockstat6 = no
  /proc/net/snmp6 = no


 ipv6 TCP sockets = no
 ipv6 UDP sockets = no
 ipv6 UDPLITE sockets = no
 ipv6 RAW sockets = no
 ipv6 FRAG sockets = no
# filename to monitor = /proc/net/sockstat6


Disks Setting


Disabling performance metrics for individual device

    enable performance metrics = no

# Other items in the Disks category

    exclude space metrics on paths = /tmp /dev/* /run/* /var/*
    exclude space metrics on filesystems = *sshfs fusectl autofs


Health monitoring


# Enable monitoring


    enabled = yes

/opt/netdata/etc/netdata/edit-config health_alarm_notify.conf


 * Default alarms shipped with Netdata.

Alarm to multiple mailboxes


# to receive only critical alarms, set it to "root|critical"
# Who receives the mail when there is no "to:"


# Who receives the mail when the alert has "to: sysadmin"

Testing Notifications

# become user netdata

su -s /bin/bash netdata

# enable debugging info on the console


# send test alarms to sysadmin

/opt/netdata/usr/libexec/netdata/plugins.d/ test

--- BEGIN sendmail command ---
/usr/sbin/sendmail -t
--- END sendmail command ---
2021-01-07 18:23:17: 
 INFO: sent email notification for: test.chart.test_alarm is CLEAR to ''
# OK

Stop notifications for individual alarms (silencing the alarm)

Step1: Find the alarm configuration file



Step2: Edit the file to enable silencing

to: sysadmin


to: silent


# NIC Full Loading

    alarm: 5m_sent_traffic_overflow
       on: net.ens192
       os: linux
    hosts: *
 families: *
   #lookup: average -5m unaligned absolute of received
   lookup: average -5m unaligned absolute of sent
     calc: ($interface_speed > 0) ? ($this * 100 / (100 * 1000)) : ( nan )
    units: %
    every: 60s
     warn: $this > (($status >= $WARNING)  ? (80) : (85))
     crit: $this > (($status == $CRITICAL) ? (85) : (90))
    delay: down 1m multiplier 1.5 max 1h
     info: interface sent bandwidth usage over net device speed max
       to: sysadmin

Values in the alarm mail

[ $this = 94.079845 ] [ $status = 1 ] [ $CRITICAL = 4 ]

# CPU Usage

template: 10min_cpu_usage
      on: system.cpu
      os: linux
   hosts: *
  lookup: average -10m unaligned of user,system,softirq,irq,guest
   units: %
   every: 1m
    warn: $this > (($status >= $WARNING)  ? (75) : (85))
    crit: $this > (($status == $CRITICAL) ? (85) : (95))
   delay: down 15m multiplier 1.5 max 1h
    info: average cpu utilization for the last 10 minutes (excluding iowait, nice and steal)
      to: sysadmin

# Using a variable

 template: 1m_received_traffic_overflow
       os: linux
    hosts: *
 families: *
   lookup: average -1m unaligned absolute of received
     # $interface_speed is defined by an earlier template
     calc: ($interface_speed > 0) ? ($this * 100 / ($interface_speed * 1000)) : ( nan )
    units: %
    every: 10s
     warn: $this > (($status >= $WARNING)  ? (80) : (85))
     crit: $this > (($status == $CRITICAL) ? (85) : (90))
    delay: down 1m multiplier 1.5 max 1h
     info: interface received bandwidth usage over net device speed max
       to: sysadmin
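
The "calc:" line in the templates above can be read as the following sketch. It assumes the looked-up value is in kilobits/s and $interface_speed is in Mbit/s, which is what the "* 1000" factor suggests; the function name is hypothetical.

```python
import math


def bandwidth_percent(avg_kilobits: float, interface_speed_mbit: float) -> float:
    """Mirror of: ($interface_speed > 0) ? ($this * 100 / ($interface_speed * 1000)) : (nan)."""
    if interface_speed_mbit > 0:
        return avg_kilobits * 100 / (interface_speed_mbit * 1000)
    return math.nan  # unknown link speed: no meaningful percentage


# An average of 94079.845 kbit/s on a 100 Mbit/s link is roughly 94.08 %
print(bandwidth_percent(94079.845, 100))
```

The first alarm above hard-codes "100 * 1000" instead of using $interface_speed, which is the same formula for a fixed 100 Mbit/s link.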

alarm vs template


An alarm is attached to a specific chart (e.g. net.eth0) and uses the alarm label.

Alarms have higher precedence and will override templates.

If an alarm and template entity have the same name and attach to the same chart, Netdata will use the alarm.

Need to find the context? Hover over the date on any given chart and look at the tooltip.


A template defines rules that apply to all charts of a specific context, and uses the template label.

Templates help you apply one entity to all disks, all network interfaces, all MySQL databases, and so on.



Which chart the entity listens to


This line makes a database lookup to find a value. The result of this lookup is available as $this



  one of average, min, max, sum, incremental-sum


    Calculate the average of all the metrics collected.


    Clarify that we're calculating a percentage of RAM usage.

    of used: Specify which dimension (used) on the system.ram chart you want to monitor with this entity.


a relative number of seconds, but it also accepts a single letter for changing the units,

like -1s = 1 second in the past, -1m = 1 minute in the past, -1h = 1 hour in the past


space separated list of percentage, absolute, min2max, unaligned, match-ids, match-names


lookup: average -10m unaligned of user,system,softirq,irq,guest


The units of the value returned by "calc:"


How often to perform the lookup calculation to decide whether or not to trigger this alarm.


The value at which Netdata should trigger a warning or critical alarm.



  warn: $this > 80
  crit: $this >= 90

conditional evaluation operator "?"

The conditional evaluation operator ? is supported too.

Using this operator IF-THEN-ELSE conditional statements can be specified.

The format is: (condition) ? (true expression) : (false expression).


warn: $this > (($status >= $WARNING)  ? (75) : (85))
crit: $this > (($status == $CRITICAL) ? (85) : (95))

If the value is constantly varying between 80 and 90,
then it will trigger a warning the first time it goes above 85,
but will remain a warning until it goes below 75 (or goes above 95, when it becomes critical).

If the value is constantly varying between 90 and 100,
then it will trigger a critical alert the first time it goes above 95,
but will remain a critical alert until it goes below 85
(at which point it will return to being a warning).
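
The hysteresis produced by these two expressions can be traced with a small simulation. This is a simplified model, not Netdata's evaluator: it assumes "crit" takes precedence over "warn", and the numeric status values (CLEAR = 1, WARNING = 3, CRITICAL = 4) are an assumption apart from $CRITICAL = 4, which appears in the alarm mail above. Only their ordering matters here.

```python
CLEAR, WARNING, CRITICAL = 1, 3, 4  # assumed values; only the ordering matters


def evaluate(value: float, status: int) -> int:
    """Apply the warn/crit expressions above to one new value."""
    warn = value > (75 if status >= WARNING else 85)
    crit = value > (85 if status == CRITICAL else 95)
    if crit:
        return CRITICAL
    if warn:
        return WARNING
    return CLEAR


status = CLEAR
history = []
for value in [80, 86, 80, 76, 74, 96, 90, 86, 84, 70]:
    status = evaluate(value, status)
    history.append(status)

print(history)  # [1, 3, 3, 3, 1, 4, 4, 4, 3, 1]
```

The warning raised at 86 survives until the value drops to 74, and the critical state raised at 96 survives until the value drops to 84, where it falls back to a warning.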


instead of returning the value, calculate the percentage of the sum of the selected dimensions,
versus the sum of all the dimensions of the chart. This also sets the units to %.

absolute or abs, turn all values positive and then sum them.

min2max, when multiple dimensions are given, do not sum them, but take their max - min

special variables

$this, which is resolved to the value of the current alarm.

$status, which is resolved to the current status of the alarm

  These values can be compared with


  These values are incremental, i.e. $status > $CLEAR works as expected.

$now, which is resolved to current unix timestamp.


when data are reduced / aggregated (e.g. the request is about the average of the last minute, or hour),
Netdata by default aligns them so that the charts will have a constant shape
(so average per minute returns always XX:XX:00 - XX:XX:59).
Setting the unaligned option, Netdata will aggregate data without any alignment,
so if the request is for 60 seconds, it will aggregate the latest 60 seconds of collected data.
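
A simplified model of the difference, assuming alignment means snapping the end of the window to a multiple of its length. Netdata's real alignment logic may differ in detail; the function below is a hypothetical illustration.

```python
from typing import Tuple


def window(now: int, seconds: int, aligned: bool = True) -> Tuple[int, int]:
    """Return (start, end) unix timestamps of the aggregation window, end exclusive."""
    end = now - (now % seconds) if aligned else now
    return end - seconds, end


# now = 1000 is 40 seconds into its minute
print(window(1000, 60, aligned=True))    # (900, 960)
print(window(1000, 60, aligned=False))   # (940, 1000)
```

The aligned window always covers a whole minute (XX:XX:00 to XX:XX:59), while the unaligned one simply covers the latest 60 seconds.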


A calculation to apply to the value found via lookup or another variable.


Set the green and red thresholds of a chart.

Both are available as $green and $red in expressions.

These will eventually be visualized on the dashboard.


The script to execute when the alarm changes status.


Format: repeat: [off] [warning DURATION] [critical DURATION]

The interval for sending notifications when an alarm is in WARNING or CRITICAL mode.
This will override the default interval settings inherited from health settings in netdata.conf
(default repeat warning = DURATION and default repeat critical = DURATION)
Use 0s to turn off the repeating notification for WARNING / CRITICAL mode.


repeat: warning 600s critical 600s


delay: [[[up U] [down D] multiplier M] max X]

up U

defines the delay to be applied to a notification for an alarm that raised its status (i.e. CLEAR to WARNING, CLEAR to CRITICAL, WARNING to CRITICAL). For example, up 10s, the notification for this event will be sent 10 seconds after the actual event. This is used in hope the alarm will get back to its previous state within the duration given. The default U is zero.

multiplier M

multiplies U and D every time the alarm changes state while a notification is delayed.

The default multiplier is 1.0.


delay: down 15m multiplier 1.5 max 1h
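
A hedged sketch of how the delay in this example could grow. It assumes each further state change while a notification is still delayed multiplies the delay by M, and that "max X" caps the result; this is a simplified reading of the option, and the function name is hypothetical.

```python
def successive_delays(base_seconds: float, multiplier: float = 1.0,
                      max_seconds: float = 3600.0, changes: int = 5) -> list:
    """Delay applied to each successive state change while still delayed."""
    delays = []
    delay = base_seconds
    for _ in range(changes):
        delays.append(min(delay, max_seconds))  # never exceed "max X"
        delay *= multiplier
    return delays


# delay: down 15m multiplier 1.5 max 1h
print(successive_delays(15 * 60, 1.5, 60 * 60))
# [900, 1350.0, 2025.0, 3037.5, 3600.0]
```

Under this model a flapping alarm waits 15 minutes at first, then progressively longer, and never more than one hour.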


A description of the alarm, which will appear in the dashboard and notifications.

Reload health configuration

To make any changes to your health configuration live, you must reload Netdata's health monitoring system.

To do that without restarting all of Netdata, run the following:

killall -USR2 netdata


netdatacli reload-health


Stopping notifications for individual alarms



cd /opt/netdata/etc/netdata

./edit-config health.d/btrfs.conf        # call nano to create  health.d/btrfs.conf

# To silence this alarm, change sysadmin to silent.

to: silent

# reload

killall -USR2 netdata


Central Netdata server (streaming)


Netdata slaves streaming metrics to upstream Netdata servers (statsd) use exactly the same protocol local plugins use.



Port: 8125/TCP, 8125/UDP

statsd is a system to collect data from any application.

Applications are sending metrics to it, usually via non-blocking UDP communication,

and statsd servers collect these metrics,

perform a few simple calculations on them and push them to backend time-series databases.

 * Netdata is a fully featured statsd server.

 * Netdata statsd is inside Netdata (an internal plugin, running inside the Netdata daemon)
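
A minimal sketch of an application pushing a metric over UDP, using the common statsd line format "name:value|type". The metric name, the helper names and the default host are hypothetical; port 8125 is the one listed above.

```python
import socket


def format_metric(name: str, value, metric_type: str = "g") -> bytes:
    """Build one statsd packet, e.g. b'myapp.queue_length:42|g'.

    Common types: 'c' counter, 'g' gauge, 'ms' timer.
    """
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(packet: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; the application never waits for a reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


print(format_metric("myapp.queue_length", 42))  # b'myapp.queue_length:42|g'
```

Calling send_metric(format_metric("myapp.requests", 1, "c")) would then increment a counter on a statsd server listening on the local host.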

Disable statsd

  enabled = no


Database Queries


API: /api/v1/data and /api/v1/badge.svg

after and before define a time-frame, accepting:


  • html
  • csv

curl -Ss ''


The number of points to be returned.

If not given, the result will have the same granularity as the database


The grouping method to use when reducing the points the database has.

If not given, it defaults to average.


Only 2 options are used by the query engine: unaligned and percentage.

All the other options are used by the output formatters.

The default is to return aligned data.
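
The parameters described above combine into a query URL roughly as follows. This is a sketch: the helper name and the localhost address are hypothetical, and joining several options with "|" is an assumption.

```python
from urllib.parse import urlencode


def data_query_url(base: str, chart: str, after: int = -600, before: int = 0,
                   points=None, group: str = "average", fmt: str = "json",
                   options=()) -> str:
    """Build a /api/v1/data URL from the parameters described above."""
    params = {"chart": chart, "after": after, "before": before,
              "group": group, "format": fmt}
    if points is not None:
        params["points"] = points          # omit to keep database granularity
    if options:
        params["options"] = "|".join(options)
    return f"{base}/api/v1/data?{urlencode(params)}"


print(data_query_url("http://localhost:19999", "system.cpu",
                     after=-60, points=6, options=("unaligned",)))
```

The resulting URL can be passed to curl or opened in a browser; negative "after" values are relative to "before".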

Export and import a snapshot


Snapshots can be incredibly useful for diagnosing anomalies after they've already happened.

Let's say Netdata triggered an alarm while you were sleeping.

The generated snapshot will include all charts of this dashboard, for the visible timeframe

To export a snapshot, click on the export icon.

The snapshot will be downloaded as a file, to your computer,

that can be imported back into any netdata dashboard (no need to import it back on this server).


Performance Tuning



    enable gzip compression = no

# Disable logs

If you're not actively auditing Netdata's logs, disable them in netdata.conf

    # debug log = /opt/netdata/var/log/netdata/debug.log
    # error log = /opt/netdata/var/log/netdata/error.log
    # access log = /opt/netdata/var/log/netdata/access.log
    debug log = none
    error log = none
    access log = none


Netdata's own info


How many metrics, on average, do your Agents collect?

Shown at the bottom-right corner of the dashboard

Every 5 seconds, Netdata collects 3,387 metrics on home, 
presents them in 654 charts and monitors them with 268 alarms.

What is your compression savings ratio?

Search "dbengine_compression_ratio" on dashboard (Netdata Monitoring / dbengine)

Typical compression ratio of 80%

Disk Usage (1 day)


Collection interval (update every) = 5
Metrics collected per interval = 2000
Space per metric value = 4 bytes
 => (2000 * 3600 * 24 * 4 / 1)/(1024^2) = 659 MB at 1s granularity; with update every = 5, divide by 5: about 132 MB
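
The estimate above can be reproduced with a short sketch. It covers uncompressed data only and the function name is hypothetical.

```python
SECONDS_PER_DAY = 3600 * 24


def raw_mib_per_day(metrics: int, update_every: int = 1,
                    bytes_per_value: int = 4) -> float:
    """Uncompressed metric data written per day, in MiB."""
    samples_per_day = SECONDS_PER_DAY / update_every
    return metrics * samples_per_day * bytes_per_value / 1024 ** 2


print(round(raw_mib_per_day(2000, update_every=1), 2))  # 659.18
print(round(raw_mib_per_day(2000, update_every=5), 2))  # 131.84
```

Actual disk usage in dbengine mode would be lower, since pages are compressed before they are flushed.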