Last updated: 2021-02-10
Introduction
DRBD = Distributed Replicated Block Device = network-based RAID-1
HomePage: https://www.linbit.com/drbd/
Features
* Supports checksum-based synchronization (which also enables online verification)
  It is a common use case to have online verification managed by the local cron daemon.
  verification source
      |  calculates a cryptographic digest (MD5, SHA-1, or CRC-32C) of every block
      |  (the Linux kernel crypto API provides these); DRBD transmits just the
      |  digests, not the full blocks, so online verification uses network
      v  bandwidth very efficiently
  verification target
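  A minimal sketch of wiring this up, assuming a resource named r0 and SHA-1 as the digest (both are assumptions, not fixed by DRBD):
  # in the resource's net{} section -- enables "drbdadm verify"
  net {
      verify-alg sha1;
  }
  # weekly verification run from cron, e.g. in /etc/cron.d/drbd-verify:
  42 0 * * 0    root    /sbin/drbdadm verify r0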
* Supports fixed-rate and variable-rate synchronization (variable-rate is the default)
  (variable-rate: DRBD detects the available bandwidth on the synchronization network)
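  To pin a fixed rate instead, a sketch in 8.4-style syntax (40M is an arbitrary example value):
  disk {
      c-plan-ahead 0;     # 0 disables the dynamic (variable-rate) controller
      resync-rate 40M;    # fixed resynchronization rate of ~40 MiB/s
  }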
* Replicates synchronously or asynchronously (local / remote)
* Transparent (other programs are unaware of its existence)
* Driver code in the kernel (Linux kernel's block layer)
* User space tools (drbdadm is a front-end that calls drbdsetup, which drives the kernel driver)
Version
Linux kernel    in-tree DRBD
3.0 - 3.4       8.3.11
Hardware requirements
The disk system must support disk flushes, e.g.:
- SATA Devices
- SCSI
- MD
- LVM2
If the disks have a battery-backed cache, consider disabling device flushes to improve performance (a sketch follows below).
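A sketch of turning flushes off in 8.4-style syntax (only safe with battery-backed write caches):
disk {
    disk-flushes no;    # flushes for the replicated data set
    md-flushes no;      # flushes for DRBD's meta data
}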
Features by version
V 9
- RDMA transport is now integrated into DRBD 9 (reducing CPU load by 50%)
- More than 2-way redundancy
- Automatic Promotion of Resources
V 8.4
- Defaults to masking I/O errors
- Defaults to variable-rate synchronization
- Multiple volumes per resource (each volume corresponds to one block device)
  -> a single resource may contain multiple volumes
  -> the volumes share a single replication connection
  lrwxrwxrwx 1 root root 11 2011-07-04 09:22 /dev/drbd/by-res/nfs-root/0 -> ../../drbd2
  lrwxrwxrwx 1 root root 11 2011-07-04 09:22 /dev/drbd/by-res/nfs-root/1 -> ../../drbd3
- Boolean configuration options
no-md-flushes -> md-flushes no
- syncer section no longer exists
  (the old syncer{} settings are split between disk{} and net{}; see the sketch below)
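  A sketch of the migration, assuming an 8.3 config that only set the resync rate and the verify algorithm:
  # 8.3 style (obsolete):
  syncer {
      rate 40M;
      verify-alg sha1;
  }
  # 8.4 style -- the same settings in their new homes:
  disk {
      resync-rate 40M;    # was syncer{ rate }
  }
  net {
      verify-alg sha1;    # was syncer{ verify-alg }
  }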
- A new options{} section was added inside resource{}
- replication protocol can be changed on the fly
# temporarily switch a connection to asynchronous replication
drbdadm net-options --protocol=A <resource>
- "--force" option to kick off an initial resource synchronization
drbdadm primary --force <resource>
- Reduced bitmap I/O during resync
- Aligned the internal object model with drbd-9.0
- New on-disk format for the AL (activity log): double capacity; 4k-aligned I/O
- Multiple AL changes in a single transaction
V 8.3.10
- bandwidth management for mirroring over long distance
V 8.3.2
- Resource level fencing script, using Pacemaker's constraints
- OCF Heartbeat/Pacemaker resource agent
- Support for floating peers
- Compression of the bitmap exchange
V 8.3.0
- off-site node for disaster recovery
- Existing file systems can be integrated into new DRBD setups
  (adds redundancy to existing deployments)
- Heartbeat integration to outdate peers with broken replication links
  (avoids switchover to stale data; enables short resynchronization)
- Optional data digests to verify the data transfer over the network
- Integration scripts for LVM to automatically take a snapshot before a node becomes the target of a resynchronization
- Online data verification
- Configurable handler scripts for various DRBD events
- Dual primary support for use with GFS/OCFS2
- Automatic detection of the most up-to-date data after complete failure
- Automatic recovery after node, network, or disk failures
CentOS 7 Installation
Install
# install the ELRepo packages, because EPEL does not ship the kernel module
yum install elrepo-release -y
yum install kmod-drbd90 drbd90-utils drbd90-utils-sysvinit # V 9.0
yum install kmod-drbd84 drbd84-utils drbd84-utils-sysvinit # V 8.4
Package
drbd-utils # V9
- drbdadm ( front-end for drbdsetup and drbdmeta )
- drbdmon ( monitors DRBD resources in real time )
- drbdmeta ( create, dump, restore, and modify DRBD's meta data )
- drbdsetup ( low-level tool used to control the kernel module )
Config files:
- /etc/drbd.conf
- /etc/drbd.d/global_common.conf
- /etc/drbd.d/*.res
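A minimal sketch of a two-node *.res file (the resource name, hostnames, IPs, and backing disks here are all assumptions for illustration):
# /etc/drbd.d/r0.res
resource r0 {
    protocol C;              # fully synchronous replication
    device    /dev/drbd0;
    disk      /dev/sdb1;     # backing block device
    meta-disk internal;
    on node1 {
        address 192.168.0.1:7789;
    }
    on node2 {
        address 192.168.0.2:7789;
    }
}
# note: the "on <hostname>" names must match each node's uname -n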
Cluster Package
- drbd-pacemaker # Pacemaker resource agent for DRBD
- drbd-rgmanager # Red Hat Cluster Suite agent for DRBD
Enable Service
systemctl start drbd
lsmod | grep drbd # the drbd kernel module is loaded once the service starts
cat /proc/drbd # present after the drbd module is loaded
systemctl enable drbd
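With the service running, a sketch of bringing a brand-new resource online for the first time (r0 is the hypothetical resource from the sketch above):
drbdadm create-md r0          # initialise the meta data (run on both nodes)
drbdadm up r0                 # attach the disk and connect to the peer (both nodes)
drbdadm primary --force r0    # on ONE node only: kick off the initial full sync
drbdadm status r0             # watch progress (V9; on 8.4 use: cat /proc/drbd)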
DRBD device
Path: /dev/drbd<minor>
ls -l /dev/drbd*
major:minor = 147:N
(147 is DRBD's registered block device major number; minor N maps to /dev/drbdN)
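For example, the first device typically shows up like this (a sketch; owner and date are illustrative):
brw-rw---- 1 root disk 147, 0 Feb 10 2021 /dev/drbd0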
Role
Single-primary mode (canonical approach for high availability)
Only one node in the whole cluster is in the primary role
(only one cluster node manipulates the data at any moment)
primary role - unrestricted read and write access
secondary role - cannot be used by applications, for either read or write access
* any file system can be used (ext3, ext4, XFS, etc.)
Dual-primary mode (load-balancing clusters)
Dual-primary mode requires one of the following cluster file systems:
- GFS
- OCFS2 (distributed lock manager)
* Dual-primary mode is disabled by default; enabling it is sketched below
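A sketch of enabling it in the resource's net{} section (8.4-style syntax):
net {
    protocol C;                 # dual-primary requires fully synchronous replication
    allow-two-primaries yes;
}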
Replication modes
- Protocol A: asynchronous replication
  (a write is considered complete once it has reached the local disk and the local TCP send buffer)
- Protocol B: memory synchronous (semi-synchronous) replication
  (a write is considered complete once it has reached the local disk and the replication packet has reached the peer node)
- Protocol C: synchronous replication (fully synchronous)
  (a write is considered complete once it has been confirmed by both the local and the remote disk)
Optimizing Protocol A
before: write --> local buffer (sndbuf-size)
after: write --> DRBD Proxy --> secondary node (a proxy is inserted to buffer the replication stream)
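Even without the proxy, the send buffer can be enlarged for high-latency links (a sketch; 10M is an arbitrary example value):
net {
    sndbuf-size 10M;    # default 0 means auto-tuned
}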
Inconsistent and Outdated data
Inconsistent data
data that cannot be expected to be accessible and useful in any manner.
Outdated data
data on a secondary node that is consistent, but no longer in sync with the primary node.
(a disconnected secondary node's data is expected to be consistent, merely outdated)
Synchronization & Replication
replication occurs on any write event to a resource in the primary role
Synchronization is necessary if the replication link has been interrupted for any reason.
* attempting to promote a node to primary while <resource> is still in the primary role on the peer will result in an error.
The Split Brain problem
Scenario:
both nodes switched to the primary role
Recovery options:
- Discarding modifications made on the “younger” primary.
- Discarding modifications made on the “older” primary.
- Discarding modifications on the primary with fewer changes.
- Graceful recovery from split brain if one host has had no intermediate changes
  (an unlikely scenario: even if both hosts only mounted the file system on the DRBD block device, even read-only, the contents would typically still be modified)
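When no automatic policy is configured, recovery is manual. A sketch, assuming node2 is the split-brain victim whose local changes are to be discarded (the resource name r0 is hypothetical):
# on node2 (the victim -- its modifications since the split are lost):
drbdadm secondary r0
drbdadm connect --discard-my-data r0
# on node1 (the survivor), only if it is in StandAlone state:
drbdadm connect r0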
Doc
man drbd.conf
man drbdadm
https://www.linbit.com/drbd-user-guide/drbd-guide-9_0-en/
https://linbit.com/drbd-user-guide/users-guide-drbd-8-4/