Last updated: 2024-10-23
Table of Contents
- Pool
- Pool Import / Export
- Basic Recovery (online/offline/clear/replace)
- ZFS Filesystem(Dataset)
- Setting Quotas
- Snapshot
- Space Usage (REFER lab)
- Clone
- Replication & Incremental Backups (send, receive)
- Volume
- Memory Usage(ARC)
- Module Parameters
- zed
- N copies of each data block
- Add cache and log to an existing pool
- ZFS Special Device
- RAID-Z pool
- Virtual Devices (vdevs)
- Online Resize Pool
- Growing a Pool
- CLI History
- dRAID
- Healing Resilver (=self healing)
- Troubleshoot
- User Property
- case-insensitive
- zfs-fuse
Pool
Create a basic pool (single disk)
zpool create PoolName Device1 Device2 ...
e.g.
zpool create MyPool /dev/vdb1
Create a mirrored pool (two disks)
Usage:
zpool create PoolName mirror Disk1 Disk2 [Spare_Disk3]
i.e.
cd /home/zfs_test
dd if=/dev/zero of=disk1.img bs=1M count=100
dd if=/dev/zero of=disk2.img bs=1M count=100
zpool create MyMirrorPool mirror /home/zfs_test/disk1.img /home/zfs_test/disk2.img
* It is automatically mounted at /MyMirrorPool
mount | grep zfs
MyMirrorPool on /MyMirrorPool type zfs (rw,xattr,noacl)
List pools
zpool list [PoolName]
NAME          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
MyMirrorPool   80M   108K  79.9M        -         -     3%     0%  1.00x  ONLINE  -
* Because it is a mirror pool, the usable size is only half of the raw capacity
Pool status
zpool status [PoolName]
ie.
zpool status MyMirrorPool
  pool: MyMirrorPool
 state: ONLINE
config:

        NAME                          STATE     READ WRITE CKSUM
        MyMirrorPool                  ONLINE       0     0     0
          mirror-0                    ONLINE       0     0     0
            /home/zfs_test/disk1.img  ONLINE       0     0     0
            /home/zfs_test/disk2.img  ONLINE       0     0     0

errors: No known data errors
Listing All Properties for a Pool
zpool get all PoolName
e.g.
zpool get all MyPool | less
NAME    PROPERTY   VALUE                 SOURCE
MyPool  size       9.50G                 -
MyPool  capacity   0%                    -
MyPool  altroot    -                     default
MyPool  health     ONLINE                -
MyPool  guid       1233692729511501900   -
...
Destroy a pool
zpool destroy PoolName # no confirmation prompt, run with care
Add disk to a mirrored Pool
# disks must be added as a mirrored pair
zpool add PoolName mirror disk3 disk4
i.e.
dd if=/dev/zero of=/home/zfs_test/disk3.img bs=1M count=100
zpool add MyMirrorPool mirror /home/zfs_test/disk3.img
invalid vdev specification: mirror requires at least 2 devices
dd if=/dev/zero of=/home/zfs_test/disk4.img bs=1M count=100
zpool add MyMirrorPool mirror /home/zfs_test/disk3.img /home/zfs_test/disk4.img
zpool status MyMirrorPool
  pool: MyMirrorPool
 state: ONLINE
config:

        NAME                          STATE     READ WRITE CKSUM
        MyMirrorPool                  ONLINE       0     0     0
          mirror-0                    ONLINE       0     0     0
            /home/zfs_test/disk1.img  ONLINE       0     0     0
            /home/zfs_test/disk2.img  ONLINE       0     0     0
          mirror-1                    ONLINE       0     0     0
            /home/zfs_test/disk3.img  ONLINE       0     0     0
            /home/zfs_test/disk4.img  ONLINE       0     0     0

errors: No known data errors
zpool list MyMirrorPool
NAME          SIZE  ALLOC  FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
MyMirrorPool  160M   111K  160M        -         -     2%     0%  1.00x  ONLINE  -
iostat
zpool iostat [MyPool] [interval [count]]
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
MyPool      97.5K  9.50G      0      3  6.53K  44.0K
zpool iostat 5
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
maildata    5.66G   143G      0     45  1.29K  2.94M
maildata    5.72G   143G      3    141  26.5K  12.5M
...
zpool iostat -v [MyPool]
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
MyPool      97.5K  9.50G      0      2  3.96K  26.7K
  mirror    97.5K  9.50G      0      2  3.96K  26.7K
    sda1        -      -      0      1  1.98K  13.3K
    sdb1        -      -      0      1  1.98K  13.3K
----------  -----  -----  -----  -----  -----  -----
Pool Import / Export
Export a Pool
# Exports the given pools from the system.
zpool export [-f] <pool>
* After a pool is exported, "zpool status" only shows "no pools available"
Import a Pool
Lists pools available to import
zpool import [-d dir|device] [-c cachefile]
* If the -d or -c options are not specified, searches for devices using libblkid on Linux
- -d DIR|Device # device / directories are searched
- -c cachefile # Reads configuration from the given cachefile (instead of searching for devices)
e.g.
zpool import
   pool: lxc
     id: 1289429485874501839
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        lxc         ONLINE
          sda1      ONLINE
Import a pool
zpool import PoolName
* Only after the pool has been successfully imported will it show up in "zpool status"
e.g.
zpool import lxc
Notes: force import
If the pool was not "export"ed from the other system, importing it on this system will fail with an error.
In that case, force the import:
zpool import -f lxc
Import ALL without mount it
zpool import -a -N
- -a # Searches for and imports all pools found.
- -N # Import the pool without mounting any file systems.
- -o mntopts # Comma-separated list of mount options to use when mounting datasets within the pool.
Import & Rename
zpool import pool [newpool]
Import service
zfs-import-cache.service    enabled
zfs-import-scan.service     disabled
systemctl show zfs-import-cache | grep ExecStart
zfs-import-cache
is equivalent to
zpool import -c /etc/zfs/zpool.cache -aN
zfs-import-scan
is equivalent to
zpool import -aN -o cachefile=none
The cachefile property
- This property controls where pool configuration information is cached.
- All pools in the cache are automatically imported when the system boots.
- File will be automatically updated when your pool configuration is changed
Generating a new cache file(/etc/zfs/zpool.cache)
zpool set cachefile=/etc/zfs/zpool.cache tank
Disabled by setting
zpool set cachefile=none tank
Force a check (Scrubbing)
# Examines all data in the specified pools
zpool scrub [PoolName]
# Current / Last scrub status
zpool status
# Stop the scrub
zpool scrub -s PoolName
Tip: schedule a weekly scrub during idle hours
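A minimal cron sketch for that tip (assuming a root crontab, a pool named MyPool and zpool at /sbin/zpool; adjust for your distribution):
# /etc/cron.d/zfs-scrub (hypothetical file) - scrub MyPool every Sunday at 03:00
0 3 * * 0  root  /sbin/zpool scrub MyPool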
Remark: sysctl.conf
vfs.zfs.scan_idle
Number of milliseconds since the last operation before considering the pool is idle.
ZFS disables the rate limiting for scrub and resilver when the pool is idle.
detach and attach
Increases or decreases redundancy by attach-ing or detach-ing a device on an existing vdev (virtual device)
attach
# Attaches new device to an existing zpool device
# If device is not currently part of a mirrored configuration, device automatically transforms into a two-way mirror
# If device is part of a two-way mirror, attaching new device creates a three-way mirror
zpool attach [-s] [-o property=value] pool device new_device
Opts
-s The new_device is reconstructed sequentially to restore redundancy as quickly as possible.
(scrub is started when the resilver completes)
ie. convert a single-disk pool into a mirror
zpool status
  pool: lxc
 state: ONLINE
  scan: resilvered 2.50G in 00:05:20 with 0 errors on Thu Jan 6 17:28:53 2022
config:

        NAME        STATE     READ WRITE CKSUM
        lxc         ONLINE       0     0     0
          sdb1      ONLINE       0     0     0

errors: No known data errors
zpool attach lxc sdb1 /dev/sda1
detach
zpool detach <MyPool> /home/storage/fault_disk
Add / Remove device
# Adds the specified vdev(virtual devices) to the given pool.
zpool add [-fn] pool vdev ...
Uses:
- add spares disk
- add cache device
# Remove: currently only supports removing hot spares, cache, and log devices
zpool remove pool device ...
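A hedged example of the two uses above plus a removal (device names are illustrative):
zpool add MyPool spare /dev/sde1         # add a hot spare
zpool add MyPool cache /dev/nvme0n1p1    # add an L2ARC cache device
zpool remove MyPool /dev/nvme0n1p1       # spare/cache/log devices can be removed again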
Add vs Attach Remark
* Adding vdevs provides higher performance by distributing writes across the vdevs.
(ZFS stripes data across each of the vdevs. (two mirror vdevs = RAID 10))
* Each vdev provides its own redundancy level
* ZFS allocates space so that each vdev reaches 100% full at the same time.
* Adding a non-redundant vdev to a pool containing mirror or RAID-Z vdevs risks the data on the entire pool.
Other CLI
zpool initialize pool [device...]
This command writes a pattern of data to all unallocated regions
The default data pattern that is written is 0xdeadbeefdeadbeef
zpool sync [pool ...]
forces all in-core dirty data to be written to the primary pool storage
Basic Recovery (online/offline/clear/replace)
# -t Temporary. Upon reboot, the specified physical device reverts to its previous state
zpool online [-t] <pool> <device1> [device2 ...]
zpool offline [-t] <pool> <device1> [device2 ...] # Takes the specified physical device offline
zpool clear [-nF] <pool> [device]
zpool replace [-s] <pool> [old_device new_device]
replace
# Replaces an existing device (which may be faulted) with a new one
* new_device is required if the pool is not redundant.
* If new_device is not specified, it defaults to old_device
(useful after an existing disk has failed and has been physically replaced.)
ie.
After the failed HDD has been physically replaced, run zpool replace MyPool <old_device>
Opts
-s The new_device is reconstructed sequentially to restore redundancy as quickly as possible.
clear
Clears device errors log in a pool.
If no devices are specified, all device errors within the pool are cleared.
zpool clear [-F -n] pool [devices]
Opts
-F Initiates recovery mode for an unopenable pool.
(Attempts to discard the last few transactions in the pool to return it to an openable state)
-n Used in combination with the -F flag (not actually discard any transactions)
Example
1) Check whether any pool has problems
# -x Only display status for pools that are exhibiting errors
zpool status -x
all pools are healthy
2) If the HDD is half-dead (failing), take it offline
zpool offline MyPool vdc1
3) Without hotplug support
Power down the computer and replace vdc1
4)
zpool replace MyPool vdc1
5) After rebuild complete
zpool clear MyPool
ZFS Filesystem
FS = datasets
Each dataset has its own properties (compression, deduplication, caching, quotas, mount point, network sharing, readonly)
* child datasets will inherit properties from their ancestors
List Dataset Command
zfs list
NAME USED AVAIL REFER MOUNTPOINT
MyPool 97.5K 9.20G 24K /MyPool
* By default, a ZFS file system is automatically mounted when it is created.
* The mountpoint property is inherited.
* The used space (USED column) is hierarchical, so to see how much space a dataset itself uses, look at REFER
Create / Destroy a filesystem
Create a filesystem
zfs create Pool_NAME/FS_NAME
i.e.
zfs create MyPool/RockyMirror
zfs list MyPool/RockyMirror
NAME                USED  AVAIL  REFER  MOUNTPOINT
MyPool/RockyMirror   24K  1.76T    24K  /MyPool/RockyMirror
Destroy a filesystem
zfs destroy Pool_NAME/FS_NAME
Change Pool default mountpoint
Default: /PoolName
i.e.
zfs set mountpoint=/path/to/folder mypool
Change Dataset mountpoint
At Dataset Create Time
zfs create -o mountpoint=/lxc MyPool/lxc
zfs list
NAME         USED  AVAIL  REFER  MOUNTPOINT
MyPool       189K  9.20G    24K  /MyPool
MyPool/lxc    24K  9.20G    24K  /lxc
After Dataset Created
- zfs mount # Displays all ZFS file systems currently mounted.
- zfs unmount Dataset
- zfs set mountpoint=/Path/To/New-Mount-Point Dataset
i.e. change the mountpoint location
zfs create MyPool/lxc
zfs mount | grep MyPool/lxc # check current mountpoint location
MyPool/lxc /MyPool/lxc
zfs unmount MyPool/lxc
zfs set mountpoint=/var/lib/lxc MyPool/lxc
zfs mount -a # Mount all available ZFS file systems.
zfs mount | grep MyPool/lxc # Verify new mountpoint location
MyPool/lxc /var/lib/lxc
Remark: zfs mount
- -a # Mount all available ZFS file systems.
- -O # Perform an overlay mount. Allows mounting in non-empty mountpoint.
- -l # Load keys for encrypted filesystems as they are being mounted. (zfs load-key)
- -o options # man 8 zfsprops
Get MountPoint Status
zfs get all [MountPoint|Dataset]
i.e.
zfs get all /MyPool | less
NAME PROPERTY VALUE SOURCE
MyPool type filesystem -
MyPool creation Wed Dec 22 18:17 2021 -
MyPool used 105K -
MyPool available 9.20G -
MyPool referenced 24K -
MyPool compressratio 1.00x -
...
MyPool checksum on default
MyPool compression lz4 local
...
MyPool casesensitivity sensitive -
...
FS Tuning(Optimizing ZFS Parameters)
zfs get all | grep atime
MyPool  atime     on   default
MyPool  relatime  off  default
zfs set atime=on MyPool # Default: on
zfs set relatime=on MyPool # Default: off
zfs set xattr=off MyPool # Default: on
Remark:
relatime
This brings the default ext4/XFS atime semantics to ZFS,
where access time is only updated if the modified time or changed time changes,
or if the existing access time has not been updated within the past 24 hours.
This property only takes effect if atime is on
xattr
extended attributes
i.e.
setfattr -n user.DOSATTRIB testfile.txt
getfattr testfile.txt
Setting Quotas
Types of quotas
- Mount Point Quota
- User/Group Quota
Mount Point Quota
"quota" limits the overall size of a dataset and all of it's children and snapshots while
"refquota" applies to only to data directly referred to from within that dataset.
* Setting the refquota or refreservation property higher than the quota or reservation property has no effect.
Reference Quota
A reference quota limits the amount of space a dataset can consume by enforcing a hard limit.
This hard limit includes space referenced by the dataset alone and does not include space used by descendants,
such as file systems or snapshots.
Syntax
zfs set quota=10G ZPOOL/ZFS
Get Quota Usage
# -r recursively gets the property for all child filesystems.
zfs get [-r] refquota,quota ZPOOL/ZFS
Example
zfs set quota=1g lxc/test
zfs list lxc/test
NAME USED AVAIL REFER MOUNTPOINT
lxc/test 96K 1024M 96K /lxc/test
zfs get refquota,quota lxc/test
NAME      PROPERTY  VALUE  SOURCE
lxc/test  refquota  none   default
lxc/test  quota     1G     local
# when the quota is full
touch test.txt
touch: cannot touch 'test.txt': Disk quota exceeded
# remove the quota
zfs set quota=none lxc/test
User/Group Quota
zfs set userquota@username=size ZPOOL/ZFS # size=none removes the quota
zfs set groupquota@group=size ZPOOL/ZFS
Get Quota Usage
- zfs userspace ZPOOL/ZFS
- zfs groupspace ZPOOL/ZFS
* A user quota or group quota that is set on a parent file system is not automatically inherited by a descendent file system.
However, the user or group quota is applied when a clone or a snapshot is created from a file system that has a user or group quota.
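A short hedged example of the commands above ("tim" is a hypothetical local user):
zfs set userquota@tim=500M lxc/test    # per-user limit on this dataset
zfs userspace lxc/test                 # shows per-user usage against the quota
zfs set userquota@tim=none lxc/test    # remove the user quota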
Reservation
To guarantee that a specified amount of disk space is available to a file system
# reserve 1 GB for RootMountPoint/FS
zfs set reservation=1G RootMountPoint/FS    # to remove: reservation=none
* The USED of RootMountPoint immediately increases by 1 GB (and its AVAIL decreases by 1 GB)
Difference between reservation and refreservation
zfs get reservation RootMountPoint/FS
zfs get refreservation RootMountPoint/FS
* refreservation = space guaranteed to a dataset excluding descendants (snapshots)
* you cannot reserve disk space for a dataset if that space is not currently available in the pool.
Inherit
# To remove a custom property
zfs inherit [-rS] property filesystem|volume|snapshot
# Clears the specified property, causing it to be inherited from an ancestor
-r Recursively inherit the given property for all children.
i.e.
zfs inherit -r compression tank/home
Snapshot
* snapshot is a read-only copy of a file system or volume
Option:
-r Creates the recursive snapshot (one atomic operation)
Benefit: a consistent point in time, even across descendent file systems
Usage
zfs snap Dataset@SnapName
Take snapshot
zfs snap MyPool/lxc@`date +%s` # unix timestamp
Notes
date +%Y%m%d # 20231215
* The "written" property of a snapshot tracks the space the snapshot uses.
To get a list of all available snapshots
# Snapshots live in a hidden directory under the parent dataset
* By default, this directory does not show up, even with "ls -a"
ls -a
But if you know the path, it is still accessible:
ls .zfs/snapshot
snapshot-name ...
P.S.
* Setting: zfs get snapdir <dataset> # snapdir=hidden/visible
zfs list -t snap
NAME                   USED  AVAIL  REFER  MOUNTPOINT
MyPool/lxc@1641203769    0B      -  28.5K  -
Or
zfs list -t all
Comparing Snapshots
zfs diff Dataset@SnapName
- "+"  The path or file was added.
- "-"  The path or file was deleted.
- "M"  The path or file was modified.
- "R"  The path or file was renamed.
ie.
mkdir test; echo abc > test/test.txt
zfs snap MyPool/lxc@`date +%s`
echo 123 > test/test.txt
zfs list -t snap
NAME                    USED  AVAIL  REFER  MOUNTPOINT
MyPool/lxc@1641204584  12.5K      -  27.5K  -
zfs diff MyPool/lxc@1641204584
M /var/lib/lxc/test/test.txt
Comparing two snapshots
rm -f test/test.txt ; echo abc > abc.txt
zfs snap MyPool/lxc@`date +%s`
zfs list -t snap
NAME                   USED  AVAIL  REFER  MOUNTPOINT
MyPool/lxc@1641204584   17K      -  27.5K  -
MyPool/lxc@1641204813    0B      -  27.5K  -
zfs diff MyPool/lxc@1641204584 MyPool/lxc@1641204813
+       /var/lib/lxc/abc.txt
+       /var/lib/lxc/abc.txt/<xattrdir>
+       /var/lib/lxc/abc.txt/<xattrdir>/security.selinux
M       /var/lib/lxc/
M       /var/lib/lxc/test
-       /var/lib/lxc/test/test.txt
-       /var/lib/lxc/test/test.txt/<xattrdir>
-       /var/lib/lxc/test/test.txt/<xattrdir>/security.selinux
rollback snapshot
zfs rollback storage/home@08-30-08
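Note (a hedged addition): by default, rollback only goes back to the most recent snapshot; rolling back past newer snapshots requires -r, which destroys them.
zfs rollback -r storage/home@08-30-08   # also destroys any snapshots newer than 08-30-08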
destroy snapshot
zfs destroy storage/home@08-30-08 # no confirmation prompt, use with care
destroy snapshot with range ("%")
An inclusive range of snapshots may be specified by separating the first and last snapshots with a percent sign.
The first and/or last snapshots may be left blank,
in which case the filesystem's oldest or newest snapshot will be implied.
i.e.
zfs list -t snap | grep ispconfig
lxc/ispconfig@zfs-auto-snap_daily-2022-01-18-0748 0B - 96K -
lxc/ispconfig@zfs-auto-snap_daily-2022-01-18-0751 0B - 96K -
lxc/ispconfig@zfs-auto-snap_daily-2022-01-18-0756 0B - 96K -
lxc/ispconfig@zfs-auto-snap_daily-2022-01-18-0757 0B - 96K -
lxc/ispconfig@zfs-auto-snap_daily-2022-01-18-0758 0B - 96K -
lxc/ispconfig@zfs-auto-snap_daily-2022-01-18-0759 0B - 96K -
lxc/ispconfig@zfs-auto-snap_daily-2022-01-18-0807 0B - 96K -
# destroy the given snapshot and everything after it (up to the newest)
zfs destroy lxc/ispconfig@zfs-auto-snap_daily-2022-01-18-0756%
zfs list -t snap | grep ispconfig
lxc/ispconfig@zfs-auto-snap_daily-2022-01-18-0748   0B  -  96K  -
lxc/ispconfig@zfs-auto-snap_daily-2022-01-18-0751   0B  -  96K  -
# destroy everything from the oldest snapshot up to the given one
zfs destroy lxc@%zfs-auto-snap_daily-2022-01-18-0758
# destroy an inclusive range of snapshots
lxc/test@zfs-auto-snap_daily-2022-01-18-0751   0B  -  96K  -
lxc/test@zfs-auto-snap_daily-2022-01-18-0756   0B  -  96K  -
lxc/test@zfs-auto-snap_daily-2022-01-18-0757   0B  -  96K  -
lxc/test@zfs-auto-snap_daily-2022-01-18-0758   0B  -  96K  -
lxc/test@zfs-auto-snap_daily-2022-01-18-0759   0B  -  96K  -
lxc/test@zfs-auto-snap_daily-2022-01-18-0807   0B  -  96K  -
zfs destroy lxc/test@zfs-auto-snap_daily-2022-01-18-0756%zfs-auto-snap_daily-2022-01-18-0759
lxc/test@zfs-auto-snap_daily-2022-01-18-0751   0B  -  96K  -
lxc/test@zfs-auto-snap_daily-2022-01-18-0807   0B  -  96K  -
Restoring Individual Files from Snapshots
Snapshots live in a hidden directory under the parent dataset: .zfs/snapshot/snapshotname.
This directory does not show up even when executing a standard ls -a.
zfs set snapdir=visible mypool/var/tmp # hidden | visible
Even if the snapdir property is set to hidden, running ls .zfs/snapshot will still list the contents of that directory.
hold & release
zfs hold [-r] tag snapshot...
Hold a snapshot to prevent it being removed with the zfs destroy command.
If a hold exists on a snapshot, attempts to destroy that snapshot by using the zfs destroy command return EBUSY.
zfs holds
Lists all existing user references for the given snapshot or snapshots.
zfs release [-r] tag snapshot...
Removes a single reference, named with the tag argument, from the specified snapshot or snapshots.
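A hedged usage example (the tag name "keep" is arbitrary; the snapshot name reuses one from the earlier examples):
zfs hold keep MyPool/lxc@1641203769        # place a hold tagged "keep"
zfs holds MyPool/lxc@1641203769            # list existing holds
zfs destroy MyPool/lxc@1641203769          # fails with EBUSY while the hold exists
zfs release keep MyPool/lxc@1641203769     # remove the hold
zfs destroy MyPool/lxc@1641203769          # now succeeds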
Space Usage
zpool list
SIZE column
...
FREE column
Total free space in the pool
zfs list
USED column
The amount of disk space currently used uniquely by a specific dataset or snapshot.
It includes the actual data stored in the dataset,
as well as any metadata or overhead associated with the file system.
For a snapshot, this is how much space would be freed if that particular snapshot were deleted.
AVAIL column
Usable space after redundancy, compression and deduplication are taken into account
With RAID-Z, FREE != AVAIL
REFER column
The total amount of data that is being referenced by a specific dataset,
including the actual data stored in the dataset itself and
any shared data blocks with its snapshots or clones.
zfs list -t all -o name,written,refer,used
WRITTEN
* REFER is the size of the data currently referenced; a snapshot's REFER is not affected by later rm / destroy operations
zfs list -t all
NAME      USED  AVAIL  REFER  MOUNTPOINT
lxc/test   96K  1024M    96K  /lxc/test
dd if=/dev/zero of=test64.bin bs=1M count=64
zfs snap lxc/test@test64
lxc/test          64.1M   960M  64.1M  /lxc/test
lxc/test@test64      0B      -  64.1M  -
dd if=/dev/zero of=test128.bin bs=1M count=128
lxc/test           192M   832M   192M  /lxc/test
lxc/test@test64     64K      -  64.1M  -
zfs snap lxc/test@test128
lxc/test           192M   832M   192M  /lxc/test
lxc/test@test64     64K      -  64.1M  -
lxc/test@test128     0B      -   192M  -
rm -f test64.bin
lxc/test           192M   896M   128M  /lxc/test
lxc/test@test64     64K      -  64.1M  -
lxc/test@test128    64K      -   192M  -
dd if=/dev/zero of=test256.bin bs=1M count=256
zfs snap lxc/test@test256
lxc/test           448M   640M   384M  /lxc/test
lxc/test@test64     64K      -  64.1M  -
lxc/test@test128    64K      -   192M  -
lxc/test@test256     0B      -   384M  -
zfs destroy lxc/test@test128
lxc/test           448M  1024M    96K  /lxc/test
lxc/test@test64   64.1M      -  64.1M  -
lxc/test@test256   384M      -   384M  -
Clone
A clone is a writable version of a snapshot and has its own properties.
The snapshot on which a clone is based cannot be destroyed while the clone exists.
zfs promote
makes the clone an independent dataset.
makes the snapshot become a child of the clone, rather than of the original parent dataset.
This removes the value of the "origin" property and disconnects the newly independent dataset from the snapshot.
zfs clone camino/home/joe@backup camino/home/joenew
zfs get origin camino/home/joenew
zfs promote camino/home/joenew
Checking
zfs get origin camino/home/joenew
Replication & Incremental Backups (send, receive)
Replication
zfs snapshot mypool@backup1
# storing the backups as archive files
zfs send mypool@backup1 > /backup/backup1
zfs send -v mypool@replica1 | zfs receive backup/mypool
Incremental Backups
# zfs snapshot mypool@replica2
# zfs list -t snapshot
NAME             USED  AVAIL  REFER  MOUNTPOINT
mypool@replica1  5.72M     -  43.6M  -
mypool@replica2      0     -  44.1M  -
Using "zfs send -i" and indicating the pair of snapshots generates an incremental replica stream containing the changed data.
zfs send -v -i mypool@replica1 mypool@replica2 | zfs receive backup/mypool
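The same stream can be piped over ssh to another machine (a sketch; the host name "backuphost" and the target dataset are examples):
zfs send -v -i mypool@replica1 mypool@replica2 | ssh backuphost zfs receive backup/mypool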
Volume
# Creating and Destroying Volumes
A 'Volume' is exposed as a block device under /dev/zvol/poolname/dataset
This allows using the volume for other file systems
Create Volume
zfs create -V size PoolName/VolumeName
ie.
zfs create -V 250m -o compression=on tank/fat32
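Once created, the zvol can be formatted and mounted like any other block device (a sketch assuming Linux with udev; mkfs.vfat comes from dosfstools):
ls -l /dev/zvol/tank/fat32         # the volume appears as a block device
mkfs.vfat /dev/zvol/tank/fat32     # put a FAT filesystem on it
mount /dev/zvol/tank/fat32 /mnt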
Memory Usage(ARC)
- zfs_arc_max / zfs_arc_min
- vfs.zfs.arc_meta_limit
zfs_arc_max
To improve the random read performance, a separate L2ARC device can be used
(zpool add <pool> cache <device>)
ZFS has the ability to extend the ARC with one or more L2ARC devices
These L2ARC devices should be faster and/or lower latency than the storage pool.
ZFS uses 50% of the host memory for the Adaptive Replacement Cache (ARC) by default.
runtime setting
echo "$[100*1024*1024]" > /sys/module/zfs/parameters/zfs_arc_max # 100M
Permanent setting
/etc/modprobe.d/zfs.conf
# 100M
options zfs zfs_arc_max=104857600
* zfs_arc_min (Defaults to 1/32 of the system memory)
* The value for vfs.zfs.arc_max needs to be smaller than the value for vm.kmem_size
Checking
cat /sys/module/zfs/parameters/zfs_arc_max
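To compare the current ARC size against the limit on Linux, the kstats can be inspected (a sketch; "size" and "c_max" are fields of arcstats):
awk '$1 == "size" || $1 == "c_max"' /proc/spl/kstat/zfs/arcstats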
vfs.zfs.arc_meta_limit
Limit the amount of the ARC used to store metadata.
default: 1/4 vfs.zfs.arc_max
Module Parameters
vfs.zfs.vdev.cache.size
A preallocated amount of memory reserved as a cache for each device in the pool.
The total amount of memory used will be this value multiplied by the number of devices.
zfs_vdev_min_auto_ashift
Minimum ashift (sector size, as a power of two) used automatically at pool creation time.
The default value of 9 represents 2^9 = 512 bytes
To avoid write amplification and get the best performance,
set this value to the largest sector size used by a device in the pool.
zed
ZFS comes with an event daemon, which monitors events generated by the ZFS kernel module.
ps aux | grep zed
root 649 0.0 0.4 155208 5880 ? Ssl 13:01 0:00 /sbin/zed -F
/etc/zfs/zed.d/zed.rc
# Mail to admin
ZED_EMAIL_ADDR="root"
systemctl restart zed
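A minimal zed.rc sketch (variable names as found in the stock /etc/zfs/zed.d/zed.rc):
ZED_EMAIL_ADDR="root"            # where event mail is sent
ZED_NOTIFY_INTERVAL_SECS=3600    # rate-limit repeated notifications
ZED_NOTIFY_VERBOSE=1             # notify even when the pool is healthy (e.g. a finished resilver)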
N copies of each data block
- dataset-level feature
- These copies are in addition to any pool-level redundancy
- chance of surviving data corruption (checksum errors)
- Provides data protection, even when only a single disk is available.
(multiple copies might be placed on a single disk)
ie.
zfs create example/data
zfs set copies=2 example/data # N can only be 1 (default) to 3
Notes
- filesystem metadata is automatically stored multiple times across different disks
Add cache and log to an existing pool
zpool add -f <pool> log <device-part1> cache <device-part2>
log device
Mirroring of log devices is possible, but RAID-Z is not supported.
cache devices
Mirroring cache devices is impossible.
Since a cache device stores only new copies of existing data, there is no risk of data loss.
ZFS Special Device
A special device in a pool is used to store metadata, deduplication tables, and optionally small file blocks.
* Adding a special device to a pool cannot be undone!
# The redundancy of the special device should match the one of the pool
zpool add <pool> special mirror <device1> <device2>
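Optionally, small file blocks can also be routed to the special vdev via the special_small_blocks dataset property (a sketch; the 4K threshold is just an example):
zfs set special_small_blocks=4K <pool>/<dataset>    # blocks <= 4K go to the special vdev
zfs get special_small_blocks <pool>/<dataset>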
RAID-Z pool
zpool create MyPool raidz disk0 disk1 disk2
echo 'daily_status_zfs_enable="YES"' >> /etc/periodic.conf
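raidz uses single parity; raidz2 / raidz3 use two / three parity disks. A hedged example with four illustrative disks:
zpool create MyPool raidz2 disk0 disk1 disk2 disk3   # survives two simultaneous disk failures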
Virtual Devices (vdevs)
- disk
- file
- mirror
- raidz
- spare
- log
- cache
sysctl.conf Setting
vfs.zfs.vdev.cache.size
A preallocated amount of memory reserved as a cache for each device in the pool.
vfs.zfs.vdev.max_pending
Limit the number of pending I/O requests per device.
Online Resize Pool
# For mirror/raidz Pool
# -e Expand the device to use all available space.
zpool online -e PoolName device1 [device2 ...]
* All devices must be expanded before the new space will become available to the pool
Example
# update the device size
echo 1 > /sys/block/sdX/device/rescan
zpool status
zpool online -e lxc sdX1
P.S.
# Auto resize
zpool get autoexpand MyPool
zpool set autoexpand=on MyPool
Growing a Pool
The smallest device in each vdev limits the usable size of a redundant pool.
1. Replace or resilver operation
2. Expand each device
Start expansion by using 'zpool online -e' on each device.
After expanding all devices, the extra space becomes available to the pool.
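A hedged walk-through for a two-disk mirror (device names are illustrative; wait for each resilver to finish before the next step):
zpool replace MyPool sdb1 sdd1       # swap in the first, larger disk
zpool replace MyPool sdc1 sde1       # then the second
zpool online -e MyPool sdd1 sde1     # expand both new devices
zpool list MyPool                    # the extra space is now available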
CLI History
# ZFS records commands that change the pool
# -l Displays log records in long format
# -i Displays internally logged ZFS events
zpool history
dRAID
dRAID is a variant of raidz that provides integrated, distributed hot spares
A dRAID vdev is constructed from multiple internal raidz groups, each with D data devices and P parity devices.
Group0(raidz) | Group1(raidz) | Logical Spare
Create
zpool create <pool> draid[<parity>][:<data>d][:<children>c][:<spares>s] <vdevs...>
- parity - The parity level (1-3). Defaults to one.
- data - The number of data devices per redundancy group.
- spares - The number of distributed hot spares. Defaults to zero.
- children - Useful as a cross-check when listing a large number of devices.
i.e.
# 11 disk dRAID pool with 4+1 redundancy and 1 distributed spare
# a-k = 11
zpool create tank draid:4d:1s:11c /dev/sd[a-k]
Recovery
# draid1-0-0 = distributed spare; it shows up when running "zpool status tank"
echo offline > /sys/block/sdg/device/state
zpool replace -s tank sdg draid1-0-0
zpool status
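Once a physical replacement disk is available (sdl here is hypothetical), replacing the failed member frees the distributed spare again (a sketch):
zpool replace tank sdg sdl    # resilver onto the new physical disk
zpool status tank             # draid1-0-0 returns to the spare list afterwards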
Healing Resilver (=self healing)
There are two modes
- Traditional resilver
- Sequential resilver
Traditional resilver
scans the entire block tree (driven by a traversal of the pool block pointer tree)
each block is available while it’s being repaired and can be immediately verified.
Sequential resilver (dRAID feature)
Driven by a traversal of space allocation maps
The price to pay for this performance improvement is that the block checksums cannot be verified while resilvering.
Therefore, a scrub is started to verify the checksums after the sequential resilver completes.
Reconstructs based off of knowledge of data layout
E.g., Mirror reconstruction is just a copy
Problem: no block pointer data =>
1. Does not verify reads (has no checksums)
2. Does not work with RAIDz layout (no block boundaries to locate parity)
zfs: scrub vs resilver
A scrub reads all the data in the zpool and checks it against its parity information.
When a device is replaced, a resilvering operation is initiated to move data from the good copies to the new device.
This action is a form of disk scrubbing.
Troubleshoot
1) Boot failure after a ZFS disk failed (Rocky 8)
... A start job is running for import zfs pools by cache file
Fix
boot with rescue mode
mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.backup
reboot
2) Failed to attach a disk to a mirror
zpool attach lxc sdb1 /dev/sda1 # attach disk(/dev/sda1) to pool(lxc)
cannot attach /dev/sda1 to sdb1: can only attach to mirrors and top-level disks
Fix
zpool attach -o ashift=9 lxc sdb1 /dev/sda1
"can only attach to mirrors and top-level disks" message is in fact reported on many other reasons when disk can't be added to the pool ....
Case: physical sector size of devices different
reported as 4KB by the SSDs and not reported by the USB stick and probably defaulted to 512.
It should probably be possible to force ashift=9 for the added vdev, since 512e disks support smaller I/Os, but they will work suboptimal.
3) modprobe fails to load zfs
/sbin/modprobe zfs
modprobe: FATAL: Module zfs not found in directory /lib/modules/4.18.0-348.el8.0.2.x86_64
Fix: dnf update & reboot # the running kernel version did not match the installed zfs module
User Property
ZFS supports arbitrary user properties. User properties have no effect on ZFS behavior.
User property names must conform to the following characteristics:
- Must contain a colon (':') character to distinguish them from native properties.
- May otherwise contain only lowercase letters, numbers, and the punctuation characters '+', '.', '_' ('-' is not allowed).
- Maximum length: 256 characters.
Set user property
zfs set property=value filesystem|volume|snapshot
i.e.
zfs set user:tag=test lxc/test
Get user property
zfs get property filesystem|volume|snapshot
i.e.
zfs get user:tag lxc/test
NAME      PROPERTY  VALUE  SOURCE
lxc/test  user:tag  test   local
To clear a user property
i.e.
zfs inherit user:tag lxc/test
# "-r" applied recursively when the
zfs inherit -r user:tag lxc/test
Case-Insensitive
The casesensitivity property cannot be changed after the file system is created,
and therefore should be set when the file system is created.
zfs create lxc/test -o casesensitivity=insensitive
echo case-insensitive > /lxc/test/Tim.txt
cat /lxc/test/TIM.txt
zfs-fuse
Not recommended: it has not been maintained for a long time.