2. zfs pool & zfs volume

Last updated: 2024-10-23


Pool

 

Create a basic Pool (single disk)

zpool create PoolName Device1 Device2 ...

e.g.

zpool create MyPool /dev/vdb1

Create a Mirrored Pool (two disks)

Usage:

zpool create PoolName mirror Disk1 Disk2 [Spare_Disk3]

i.e.

cd /home/zfs_test

dd if=/dev/zero of=disk1.img bs=1M count=100

dd if=/dev/zero of=disk2.img bs=1M count=100

zpool create MyMirrorPool mirror /home/zfs_test/disk1.img /home/zfs_test/disk2.img

 * It is automatically mounted at /MyMirrorPool

mount | grep zfs

MyMirrorPool on /MyMirrorPool type zfs (rw,xattr,noacl)

List existing Pools

zpool list [PoolName]

NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
MyMirrorPool    80M   108K  79.9M        -         -     3%     0%  1.00x    ONLINE  -

* Since it is a mirror pool, the usable size is only half the raw capacity

Pool status

zpool status [PoolName]

ie.

zpool status MyMirrorPool

  pool: MyMirrorPool
 state: ONLINE
config:

        NAME                          STATE     READ WRITE CKSUM
        MyMirrorPool                  ONLINE       0     0     0
          mirror-0                    ONLINE       0     0     0
            /home/zfs_test/disk1.img  ONLINE       0     0     0
            /home/zfs_test/disk2.img  ONLINE       0     0     0

errors: No known data errors

 

Listing All Properties for a Pool

zpool get all PoolName

e.g.

zpool get all MyPool | less

NAME    PROPERTY                       VALUE                          SOURCE
MyPool  size                           9.50G                          -
MyPool  capacity                       0%                             -
MyPool  altroot                        -                              default
MyPool  health                         ONLINE                         -
MyPool  guid                           1233692729511501900            -
...

 

Destroy a Pool

zpool destroy PoolName    # no confirmation prompt; run with care

 

Add disk to a mirrored Pool

# disks must be added as a pair (a complete mirror vdev)

zpool add PoolName mirror disk3 disk4

i.e.

dd if=/dev/zero of=/home/zfs_test/disk3.img bs=1M count=100

zpool add MyMirrorPool mirror /home/zfs_test/disk3.img

invalid vdev specification: mirror requires at least 2 devices

dd if=/dev/zero of=/home/zfs_test/disk4.img bs=1M count=100

zpool add MyMirrorPool mirror /home/zfs_test/disk3.img /home/zfs_test/disk4.img

zpool status MyMirrorPool

  pool: MyMirrorPool
 state: ONLINE
config:

        NAME                          STATE     READ WRITE CKSUM
        MyMirrorPool                  ONLINE       0     0     0
          mirror-0                    ONLINE       0     0     0
            /home/zfs_test/disk1.img  ONLINE       0     0     0
            /home/zfs_test/disk2.img  ONLINE       0     0     0
          mirror-1                    ONLINE       0     0     0
            /home/zfs_test/disk3.img  ONLINE       0     0     0
            /home/zfs_test/disk4.img  ONLINE       0     0     0

errors: No known data errors

zpool list MyMirrorPool

NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
MyMirrorPool   160M   111K   160M        -         -     2%     0%  1.00x    ONLINE  -

 

iostat

zpool iostat [MyPool] [interval [count]]

              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
MyPool      97.5K  9.50G      0      3  6.53K  44.0K

zpool iostat 5

              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
maildata    5.66G   143G      0     45  1.29K  2.94M
maildata    5.72G   143G      3    141  26.5K  12.5M
...

zpool iostat -v [MyPool]

              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
MyPool      97.5K  9.50G      0      2  3.96K  26.7K
  mirror    97.5K  9.50G      0      2  3.96K  26.7K
    sda1        -      -      0      1  1.98K  13.3K
    sdb1        -      -      0      1  1.98K  13.3K
----------  -----  -----  -----  -----  -----  -----

 


Pool Import / Export

 

Export a Pool

# Exports the given pools from the system.

zpool export [-f] <pool>

 * After a pool is exported, "zpool status" reports only "no pools available"
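
A minimal example, assuming the MyMirrorPool created above is idle (add -f to force if datasets are busy):

zpool export MyMirrorPool

zpool status                  # now reports "no pools available" (if no other pool exists)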

Import a Pool

Lists pools available to import

zpool import [-d dir|device] [-c cachefile]

 * If the -d or -c options are not specified, this command searches for devices using libblkid on Linux

  • -d  DIR|Device            # device / directories are searched
  • -c cachefile      # Reads configuration from the given cachefile (instead of searching for devices)

e.g.

zpool import

   pool: lxc
     id: 1289429485874501839
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        lxc         ONLINE
          sda1      ONLINE

Import a pool

zpool import PoolName

 * A pool only shows up in "zpool status" after it has been imported successfully

e.g.

zpool import lxc

Notes: force import

If the pool was not "export"ed on the original system, importing it on another system will return an error.

In that case, a force import is needed

zpool import -f lxc

Import ALL without mounting

zpool import -a -N

  • -a               # Searches for and imports all pools found.
  • -N              # Import the pool without mounting any file systems.
  • -o mntopts  # Comma-separated list of mount options to use when mounting datasets within the pool.

Import & Rename

zpool import pool [newpool]
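
A hedged example of import-and-rename, reusing the lxc pool from above; the new name lxc_new is only for illustration:

zpool import lxc lxc_new

zpool list lxc_new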

 


 

Import service

zfs-import-cache.service      enabled
zfs-import-scan.service       disabled

systemctl show zfs-import-cache | grep ExecStart

zfs-import-cache

Equivalent to

zpool import -c /etc/zfs/zpool.cache -aN

zfs-import-scan

Equivalent to

zpool import -aN -o cachefile=none

The cachefile property

  • This property controls where pool configuration information is cached.
  • All pools in the cache are automatically imported when the system boots.
  • File will be automatically updated when your pool configuration is changed

Generating a new cache file (/etc/zfs/zpool.cache)

zpool set cachefile=/etc/zfs/zpool.cache tank

Disable it by setting

zpool set cachefile=none tank

Force a check (Scrubbing)

# Examines all data in the specified pools

zpool scrub PoolName

# Current / Last scrub status

zpool status

# Stop the scrub

zpool scrub -s PoolName

Tip: set up a weekly schedule and scrub during idle hours, as in the sketch below
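
A minimal sketch of such a schedule, assuming a Linux box with cron and a pool named MyPool; the file path, day and hour are arbitrary choices:

# /etc/cron.d/zfs-scrub (hypothetical file): scrub MyPool every Sunday at 03:00
0 3 * * 0  root  /usr/sbin/zpool scrub MyPool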

Remark: sysctl.conf

vfs.zfs.scan_idle

Number of milliseconds since the last operation before the pool is considered idle.

ZFS disables the rate limiting for scrub and resilver when the pool is idle.

detach and attach

Increases or decreases redundancy by attaching or detaching a device on an existing vdev (virtual device)

attach

# Attaches new device to an existing zpool device

# If device is not currently part of a mirrored configuration, device automatically transforms into a two-way mirror

# If device is part of a two-way mirror, attaching new device creates a three-way mirror

zpool attach [-s] [-o property=value] pool device new_device

Opts

-s The new_device is reconstructed sequentially to restore redundancy as quickly as possible.

    (scrub is started when the resilver completes)

e.g. converting a single-disk vdev into a mirror

zpool status

  pool: lxc
 state: ONLINE
  scan: resilvered 2.50G in 00:05:20 with 0 errors on Thu Jan  6 17:28:53 2022
config:

        NAME        STATE     READ WRITE CKSUM
        lxc         ONLINE       0     0     0
          sdb1      ONLINE       0     0     0

errors: No known data errors

zpool attach lxc sdb1 /dev/sda1

detach

zpool detach <MyPool> /home/storage/fault_disk

Add / Remove device

# Adds the specified vdev(virtual devices) to the given pool.

zpool add [-fn] pool vdev ...

Use cases (see the examples after the remove command below):

  • add a spare disk
  • add a cache device

# Remove: currently only supports removing hot spares, cache, and log devices

zpool remove pool device ...
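
Hedged examples of both use cases, reusing the file-backed MyMirrorPool from above; the cache device /dev/sdc1 is only illustrative:

# create one more backing file and add it as a hot spare
dd if=/dev/zero of=/home/zfs_test/disk5.img bs=1M count=100

zpool add MyMirrorPool spare /home/zfs_test/disk5.img

# add an L2ARC cache device (remove it later with: zpool remove MyMirrorPool sdc1,
# using the device name shown by zpool status)
zpool add MyMirrorPool cache /dev/sdc1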

Add vs Attach Remark

 * Adding vdevs provides higher performance by distributing writes across the vdevs.

   (ZFS stripes data across each of the vdevs. (two mirror vdevs = RAID 10))

 * Each vdev provides its own redundancy level

 * ZFS allocates space so that each vdev reaches 100% full at the same time.

 * Adding a non-redundant vdev to a pool containing mirror or RAID-Z vdevs risks the data on the entire pool.

 

Other CLI

zpool initialize pool [device...]

This command writes a pattern of data to all unallocated regions

The default data pattern that is written is 0xdeadbeefdeadbeef

zpool sync [pool ...]

forces all in-core dirty data to be written to the primary pool storage

 


Basic Recovering (online/offline/clear/replace)

 

# -t    Temporary. Upon reboot, the specified physical device reverts to its previous state

zpool online [-t] <pool> <device1> [device2 ...]

zpool offline [-t] <pool> <device1> [device2 ...]   # Takes the specified physical device offline

zpool clear [-nF] <pool> [device]

zpool replace [-s] <pool> [old_device new_device]

replace

# Replaces an existing device (which may be faulted) with a new one

 * new_device is required if the pool is not redundant.

 * If new_device is not specified, it defaults to old_device
    (useful after an existing disk has failed and has been physically replaced.)

    e.g.

    After the failed HDD has been physically replaced, run: zpool replace MyPool old_device

Opts

-s      The new_device is reconstructed sequentially to restore redundancy as quickly as possible.

clear

Clears device errors log in a pool.

If no devices are specified, all device errors within the pool are cleared.

zpool clear [-F -n] pool [devices]

Opts

-F     Initiates recovery mode for an unopenable pool.
        (Attempts to discard the last few transactions in the pool to return it to an openable state)

-n     Used in combination with the -F flag (not actually discard any transactions)

Example

1) Check whether any Pool has problems

# -x Only display status for pools that are exhibiting errors

zpool status -x

all pools are healthy

2) When the HDD is half-dead (failing)

zpool offline MyPool vdc1

3) When hotplug is not available

Power down the computer and replace vdc1

4)

zpool replace MyPool vdc1

5) After rebuild complete

zpool clear MyPool

 


ZFS Filesystem

 

FS = datasets

Every dataset has its own properties (compression, deduplication, caching, quotas, mount point, network sharing, readonly)

 * child datasets will inherit properties from their ancestors

 

List Dataset Command

zfs list

NAME     USED  AVAIL     REFER  MOUNTPOINT
MyPool  97.5K  9.20G       24K  /MyPool

 * By default, a ZFS file system is automatically mounted when it is created.

 * The mountpoint property is inherited.

 * The used space (USED column) is hierarchical, so to see how much space a dataset itself consumes, look at REFER.

 

Create / Destroy FS

Create FS

zfs create Pool_NAME/FS_NAME

i.e.

zfs create MyPool/RockyMirror

zfs list MyPool/RockyMirror

NAME                 USED  AVAIL     REFER  MOUNTPOINT
MyPool/RockyMirror    24K  1.76T       24K  /MyPool/RockyMirror

Destroy FS

zfs destroy PoolName/FS_NAME

 

Change Pool default mountpoint

Default: /PoolName

i.e.

zfs set mountpoint=/path/to/folder mypool

 

Change Dataset mountpoint

At Dataset Create Time

zfs create -o mountpoint=/lxc MyPool/lxc

zfs list

NAME         USED  AVAIL     REFER  MOUNTPOINT
MyPool       189K  9.20G       24K  /MyPool
MyPool/lxc    24K  9.20G       24K  /lxc

After Dataset Created

  1. zfs mount                                # Displays all ZFS file systems currently mounted.
  2. zfs unmount Dataset
  3. zfs set mountpoint=/Path/To/New-Mount-Point Dataset

e.g. changing the mountpoint location

zfs create MyPool/lxc

zfs mount MyPool/lxc                           # check current mountpoint location

MyPool/lxc                      /MyPool/lxc

zfs unmount MyPool/lxc

zfs set mountpoint=/var/lib/lxc MyPool/lxc

zfs mount -a                                        # Mount all available ZFS file systems.

zfs mount MyPool/lxc                            # Verify new mountpoint location

MyPool/lxc                      /var/lib/lxc

Remark: zfs mount

  • -a                # Mount all available ZFS file systems.
  • -O                # Perform an overlay mount. Allows mounting in non-empty mountpoint.
  • -l                 # Load keys for encrypted filesystems as they are being mounted. (zfs load-key)
  • -o options     # man 8 zfsprops

Get MountPoint Status

zfs get all [MountPoint|Dataset]

i.e.

zfs get all /MyPool | less

NAME    PROPERTY              VALUE                  SOURCE
MyPool  type                  filesystem             -
MyPool  creation              Wed Dec 22 18:17 2021  -
MyPool  used                  105K                   -
MyPool  available             9.20G                  -
MyPool  referenced            24K                    -
MyPool  compressratio         1.00x                  -
...
MyPool  checksum              on                     default
MyPool  compression           lz4                    local
...
MyPool  casesensitivity       sensitive              -
...

 

FS Tuning(Optimizing ZFS Parameters)

zfs get all | grep atime

MyPool      atime                 on                     default
MyPool      relatime              off                    default

zfs set atime=on MyPool                         # Default: on

zfs set relatime=on MyPool                     # Default: off

zfs set xattr=off MyPool                          # Default: on

Remark:

relatime

This brings the default ext4/XFS atime semantics to ZFS,

where access time is only updated if the modified time or changed time changes,

or if the existing access time has not been updated within the past 24 hours.

This property only takes effect if atime is on

xattr

extended attributes

i.e.

setfattr -n user.DOSATTRIB testfile.txt

getfattr testfile.txt

 


設立 Quota

 

Quota 的種類

  • Mount Point Quota
  • User/Group Quota

Mount Point Quota

"quota" limits the overall size of a dataset and all of it's children and snapshots while

"refquota" applies to only to data directly referred to from within that dataset.

* Setting the refquota or refreservation property higher than the quota or reservation property has no effect.

Reference Quota

A reference quota limits the amount of space a dataset can consume by enforcing a hard limit.

This hard limit includes space referenced by the dataset alone and does not include space used by descendants,

such as file systems or snapshots.

Syntax

zfs set quota=10G ZPOOL/ZFS

Get Quota Usage

# -r recursively gets the property for all child filesystems.

zfs get [-r] refquota,quota ZPOOL/ZFS

Example

zfs set quota=1g lxc/test

zfs list lxc/test

NAME       USED  AVAIL     REFER  MOUNTPOINT
lxc/test    96K  1024M       96K  /lxc/test

zfs get refquota,quota lxc/test

NAME      PROPERTY  VALUE     SOURCE
lxc/test  refquota  none      default
lxc/test  quota     1G        local

# when the quota is full

touch test.txt

touch: cannot touch 'test.txt': Disk quota exceeded

# remove the quota

zfs set quota=none lxc/test

User/Group Quota

zfs set userquota@username=size  ZPOOL/ZFS           # size=none removes the quota

zfs set groupquota@group=size ZPOOL/ZFS

Get Quota Usage

  • zfs userspace ZPOOL/ZFS
  • zfs groupspace ZPOOL/ZFS

 * A user quota or group quota that is set on a parent file system is not automatically inherited by a descendent file system.

However, the user or group quota is applied when a clone or a snapshot is created from a file system that has a user or group quota.
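
A hedged example on the lxc/test dataset used elsewhere on this page; the user "tim" and group "users" are only illustrative:

zfs set userquota@tim=500M lxc/test

zfs set groupquota@users=2G lxc/test

zfs userspace lxc/test                 # per-user USED / QUOTA

zfs get userquota@tim lxc/test

# remove the quota again
zfs set userquota@tim=none lxc/test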

Reservation

To guarantee that a specified amount of disk space is available to a file system

# Reserve 1 GB for RootMountPoint/FS

zfs set reservation=1G RootMountPoint/FS            (remove with: reservation=none)

* The USED of RootMountPoint immediately increases by 1 GB (and AVAIL decreases by 1 GB)

Difference between reservation and refreservation

zfs get reservation RootMountPoint/FS

zfs get refreservation RootMountPoint/FS

* refreservation = space guaranteed to a dataset excluding descendants (snapshots)

* you cannot reserve disk space for a dataset if that space is not currently available in the pool.

Inherit

# To remove a custom property

zfs inherit [-rS] property filesystem|volume|snapshot

# Clears the specified property, causing it to be inherited from an ancestor

-r    Recursively inherit the given property for all children.

i.e.

zfs inherit -r compression tank/home

 


Snapshot

 

 * snapshot is a read-only copy of a file system or volume

Option:

-r      Creates the recursive snapshot (one atomic operation)
        Benefit: consistent time, even across descendent file systems

Usage

zfs snap Dataset@SnapName

Take snapshot

zfs snap MyPool/lxc@`date +%s`    # unix timestamp

Notes

date +%Y%m%d       # 20231215

 * The "written" property of a snapshot tracks the space the snapshot uses.

To get a list of all available snapshots

# Snapshots live in a hidden directory under the parent dataset

 * By default, these directories will not show up even with "ls -a"

ls -a

If you know the path, it can still be accessed:

ls .zfs/snapshot

snapshot-name ...

P.S.

 * Setting: zfs get snapdir <dataset>    # snapdir=hidden/visible

zfs list -t snap

NAME                    USED  AVAIL     REFER  MOUNTPOINT
MyPool/lxc@1641203769     0B      -     28.5K  -

Or

zfs list -t all

Comparing Snapshots

zfs diff Dataset@SnapName

  • + Adding the path or file.
  • - Deleting the path or file.
  • M Modifying the path or file.
  • R Renaming the path or file.

ie.

mkdir test; echo abc > test/test.txt

zfs snap MyPool/lxc@`date +%s`

echo 123 > test/test.txt

zfs list -t snap

NAME                    USED  AVAIL     REFER  MOUNTPOINT
MyPool/lxc@1641204584  12.5K      -     27.5K  -

zfs diff MyPool/lxc@1641204584

M       /var/lib/lxc/test/test.txt

Comparing two snapshots

rm -f test/test.txt ; echo abc > abc.txt

zfs snap MyPool/lxc@`date +%s`

zfs list -t snap

NAME                    USED  AVAIL     REFER  MOUNTPOINT
MyPool/lxc@1641204584    17K      -     27.5K  -
MyPool/lxc@1641204813     0B      -     27.5K  -

zfs diff MyPool/lxc@1641204584 MyPool/lxc@1641204813

+       /var/lib/lxc/abc.txt
+       /var/lib/lxc/abc.txt/<xattrdir>
+       /var/lib/lxc/abc.txt/<xattrdir>/security.selinux
M       /var/lib/lxc/
M       /var/lib/lxc/test
-       /var/lib/lxc/test/test.txt
-       /var/lib/lxc/test/test.txt/<xattrdir>
-       /var/lib/lxc/test/test.txt/<xattrdir>/security.selinux

rollback snapshot

zfs rollback storage/home@08-30-08

destroy snapshot

zfs destroy storage/home@08-30-08         # no confirmation prompt; use with care

destroy snapshot with range ("%")

An inclusive range of snapshots may be specified by separating the first and last snapshots with a percent sign.

The first and/or last snapshots may be left blank,
in which case the filesystem's oldest or newest snapshot will be implied.

i.e.

zfs list -t snap | grep ispconfig

lxc/ispconfig@zfs-auto-snap_daily-2022-01-18-0748     0B      -       96K  -
lxc/ispconfig@zfs-auto-snap_daily-2022-01-18-0751     0B      -       96K  -
lxc/ispconfig@zfs-auto-snap_daily-2022-01-18-0756     0B      -       96K  -
lxc/ispconfig@zfs-auto-snap_daily-2022-01-18-0757     0B      -       96K  -
lxc/ispconfig@zfs-auto-snap_daily-2022-01-18-0758     0B      -       96K  -
lxc/ispconfig@zfs-auto-snap_daily-2022-01-18-0759     0B      -       96K  -
lxc/ispconfig@zfs-auto-snap_daily-2022-01-18-0807     0B      -       96K  -

# destroy snapshot X and everything after it

zfs destroy lxc/ispconfig@zfs-auto-snap_daily-2022-01-18-0756%

zfs list -t snap | grep ispconfig

lxc/ispconfig@zfs-auto-snap_daily-2022-01-18-0748     0B      -       96K  -
lxc/ispconfig@zfs-auto-snap_daily-2022-01-18-0751     0B      -       96K  -

# destroy everything from the oldest snapshot up to X

zfs destroy lxc@%zfs-auto-snap_daily-2022-01-18-0758

# destroy a range of snapshots (X through Y)

lxc/test@zfs-auto-snap_daily-2022-01-18-0751          0B      -       96K  -
lxc/test@zfs-auto-snap_daily-2022-01-18-0756          0B      -       96K  -
lxc/test@zfs-auto-snap_daily-2022-01-18-0757          0B      -       96K  -
lxc/test@zfs-auto-snap_daily-2022-01-18-0758          0B      -       96K  -
lxc/test@zfs-auto-snap_daily-2022-01-18-0759          0B      -       96K  -
lxc/test@zfs-auto-snap_daily-2022-01-18-0807          0B      -       96K  -

zfs destroy lxc/test@zfs-auto-snap_daily-2022-01-18-0756%zfs-auto-snap_daily-2022-01-18-0759

lxc/test@zfs-auto-snap_daily-2022-01-18-0751          0B      -       96K  -
lxc/test@zfs-auto-snap_daily-2022-01-18-0807          0B      -       96K  -

Restoring Individual Files from Snapshots

Snapshots live in a hidden directory under the parent dataset: .zfs/snapshot/snapshotname.

They do not show up even when executing a standard ls -a.

zfs set snapdir=visible mypool/var/tmp   # hidden | visible

Even if the snapdir property is set to hidden, running ls .zfs/snapshot will still list the contents of that directory.
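
A hedged example, reusing the MyPool/lxc dataset (mounted at /var/lib/lxc) and the snapshot from the diff example above; the wanted file is copied straight out of the read-only snapshot directory:

cp /var/lib/lxc/.zfs/snapshot/1641204584/test/test.txt /var/lib/lxc/test/test.txt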

hold & release

zfs hold [-r] tag snapshot...

Hold a snapshot to prevent it being removed with the zfs destroy command.

If a hold exists on a snapshot, attempts to destroy that snapshot by using the zfs destroy command return EBUSY.

zfs holds

Lists all existing user references for the given snapshot or snapshots.

zfs release [-r] tag snapshot...

Removes a single reference, named with the tag argument, from the specified snapshot or snapshots.
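
A hedged walk-through, reusing the snapshot from the diff example above; the tag name "keep" is arbitrary:

zfs hold keep MyPool/lxc@1641204584

zfs destroy MyPool/lxc@1641204584        # should now fail with EBUSY ("dataset is busy")

zfs holds MyPool/lxc@1641204584          # lists the "keep" tag

zfs release keep MyPool/lxc@1641204584   # the snapshot can be destroyed again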

 


Space Usage

 

zpool list

SIZE column

...

FREE column

Total raw space available in the pool

zfs list

USED column

The amount of disk space that is currently being used by a specific(unique) dataset.

It includes the actual data stored in the dataset,
as well as any metadata or overhead associated with the file system.

For a snapshot, this is how much space would be freed if that particular snapshot were deleted.

AVAIL column

The space available after redundancy, compression and deduplication are taken into account

With RAID-Z, FREE != AVAIL

REFER column

The total amount of data that is being referenced by a specific dataset,
    including the actual data stored in the dataset itself and
    any shared data blocks with its snapshots or clones.

 

zfs list -t all -o name,written,refer,used

WRITTEN column

The amount of space written to the dataset since the previous snapshot.

 

REFER Lab

 * REFER is the current data size; it is independent of rm / destroy (see the lab below)

zfs list -t all

NAME                                                USED  AVAIL     REFER  MOUNTPOINT
lxc/test                                             96K  1024M       96K  /lxc/test

dd if=/dev/zero of=test64.bin bs=1M count=64

zfs snap lxc/test@test64

lxc/test                                           64.1M   960M     64.1M  /lxc/test
lxc/test@test64                                       0B      -     64.1M  -

dd if=/dev/zero of=test128.bin bs=1M count=128

lxc/test                                            192M   832M      192M  /lxc/test
lxc/test@test64                                      64K      -     64.1M  -

zfs snap lxc/test@test128

lxc/test                                            192M   832M      192M  /lxc/test
lxc/test@test64                                      64K      -     64.1M  -
lxc/test@test128                                      0B      -      192M  -

rm -f test64.bin

lxc/test                                            192M   896M      128M  /lxc/test
lxc/test@test64                                      64K      -     64.1M  -
lxc/test@test128                                     64K      -      192M  -

dd if=/dev/zero of=test256.bin bs=1M count=256

zfs snap lxc/test@test256

lxc/test                                            448M   640M      384M  /lxc/test
lxc/test@test64                                      64K      -     64.1M  -
lxc/test@test128                                     64K      -      192M  -
lxc/test@test256                                      0B      -      384M  -

zfs destroy lxc/test@test128

lxc/test                                            448M  1024M       96K  /lxc/test
lxc/test@test64                                    64.1M      -     64.1M  -
lxc/test@test256                                    384M      -      384M  -

 


Clone

 

A clone is a writable version of a snapshot and has its own properties.

The snapshot on which a clone is based cannot be removed.

zfs promote

makes the clone an independent dataset.

makes the snapshot become a child of the clone, rather than of the original parent dataset.

This removes the value of the "origin" property and disconnects the newly independent dataset from the snapshot.

zfs clone camino/home/joe@backup camino/home/joenew

zfs get origin camino/home/joenew

zfs promote camino/home/joenew

Checking

zfs get origin camino/home/joenew

 


Replication & Incremental Backups (send, receive)

 

Replication

zfs snapshot mypool@backup1

# storing the backups as archive files

zfs send mypool@backup1 > /backup/backup1

zfs send -v mypool@replica1 | zfs receive backup/mypool

Incremental Backups

# zfs snapshot mypool@replica2

# zfs list -t snapshot

NAME                    USED  AVAIL  REFER  MOUNTPOINT
mypool@replica1         5.72M      -  43.6M  -
mypool@replica2             0      -  44.1M  -

Using "zfs send -i" and indicating the pair of snapshots generates an incremental replica stream containing the changed data.

zfs send -v -i mypool@replica1 mypool@replica2 | zfs receive backup/mypool
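
The same stream can be piped to another machine over ssh instead of a local pool; "backuphost" and the receiving dataset backup/mypool below are placeholders:

zfs send -v -i mypool@replica1 mypool@replica2 | ssh backuphost zfs receive backup/mypool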

 


Volume

 

# Creating and Destroying Volumes

A volume is exposed as a block device under /dev/zvol/poolname/dataset

This allows using the volume for other file systems

Create Volume

zfs create -V size PoolName/VolumeName

ie.

zfs create -V 250m -o compression=on tank/fat32
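
A hedged follow-up to the example above: the new zvol appears under /dev/zvol/tank/fat32 and can be formatted and mounted like any other block device (mkfs.vfat assumes dosfstools is installed):

mkfs.vfat /dev/zvol/tank/fat32

mount /dev/zvol/tank/fat32 /mnt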

 


Memory Usage(ARC)

 

  • zfs_arc_max / zfs_arc_min
  • vfs.zfs.arc_meta_limit

zfs_arc_max

To improve the random read performance, a separate L2ARC device can be used
(zpool add <pool> cache <device>)

ZFS has the ability to extend the ARC with one or more L2ARC devices

These L2ARC devices should be faster and/or lower latency than the storage pool.

ZFS uses 50% of the host memory for the Adaptive Replacement Cache (ARC) by default.

runtime setting

echo "$[100*1024*1024]" > /sys/module/zfs/parameters/zfs_arc_max   # 100M

Permanent setting

/etc/modprobe.d/zfs.conf

# 100M
options zfs zfs_arc_max=104857600

 * zfs_arc_min (Defaults to 1/32 of the system memory)

 * The value for vfs.zfs.arc_max needs to be smaller than the value for vm.kmem_size

Checking

cat /sys/module/zfs/parameters/zfs_arc_max
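
The live ARC size can also be read from the kstat interface (a sketch for Linux OpenZFS; field names may differ between versions):

grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats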

vfs.zfs.arc_meta_limit

Limit the amount of the ARC used to store metadata.

default: 1/4 vfs.zfs.arc_max

 


Module Parameters

 

vfs.zfs.vdev.cache.size

A preallocated amount of memory reserved as a cache for each device in the pool.

The total amount of memory used will be this value multiplied by the number of devices.

zfs_vdev_min_auto_ashift

Minimum ashift (sector size) used automatically at pool creation time.

The default value of 9 represents 2^9 = 512

To avoid write amplification and get the best performance,

set this value to the largest sector size used by a device in the pool.
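
A hedged example of pinning the ashift explicitly when creating a pool on 4 KiB-sector disks (ashift=12 means 2^12 = 4096 bytes); device names are illustrative:

zpool create -o ashift=12 MyPool mirror /dev/sdb /dev/sdc

zpool get ashift MyPool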

 


zed

 

ZFS comes with an event daemon, which monitors events generated by the ZFS kernel module.

ps aux | grep zed

root         649  0.0  0.4 155208  5880 ?        Ssl  13:01   0:00 /sbin/zed -F

/etc/zfs/zed.d/zed.rc

# Mail to admin
ZED_EMAIL_ADDR="root"

systemctl restart zed

 


N copies of each data block

 

  • dataset-level feature
  • These copies are in addition to any pool-level redundancy
  • chance of surviving data corruption (checksum errors)
  • Provides data protection, even when only a single disk is available.
    (multiple copies might be placed on a single disk)

ie.

zfs create example/data

zfs set copies=2 example/data                # N can only be 1 (default) to 3

Notes

  • filesystem metadata is automatically stored multiple times across different disks

 


Add cache and log to an existing pool

 

zpool add -f <pool> log <device-part1> cache <device-part2>

log device

Mirroring of log devices is possible, but RAID-Z is not supported.

cache devices

Mirroring cache devices is impossible.

Since a cache device stores only new copies of existing data, there is no risk of data loss.

 


ZFS Special Device

 

A special device in a pool is used to store metadata, deduplication tables, and optionally small file blocks.

* Adding a special device to a pool cannot be undone!

# The redundancy of the special device should match the one of the pool

zpool add <pool> special mirror <device1> <device2>
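
Optionally, small file blocks can also be directed to the special vdev through the special_small_blocks dataset property (a sketch; the 32K threshold and the placeholder dataset are only examples, and 0 disables it):

zfs set special_small_blocks=32K <pool>/<dataset>

zfs get special_small_blocks <pool>/<dataset>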

 


RAID-Z pool

 

zpool create MyPool raidz disk0 disk1 disk2

echo 'daily_status_zfs_enable="YES"' >> /etc/periodic.conf

 


Virtual Devices (vdevs)

 

  • disk    
  • file    
  • mirror  
  • raidz   
  • spare   
  • log     
  • cache  

sysctl.conf Setting

vfs.zfs.vdev.cache.size

A preallocated amount of memory reserved as a cache for each device in the pool.

vfs.zfs.vdev.max_pending

Limit the number of pending I/O requests per device.

 


Online Resize Pool

 

# For mirror/raidz Pool
# -e      Expand the device to use all available space.

zpool online -e PoolName device1 [device2 ...]

 * All devices must be expanded before the new space becomes available to the pool

Example

# update the device size

echo 1 > /sys/block/sdX/device/rescan

zpool status

zpool online -e lxc sdX1

P.S.

# Auto resize

zpool get autoexpand MyPool

zpool set autoexpand=on MyPool

 

 


Growing a Pool

 

The smallest device in each vdev limits the usable size of a redundant pool.

1. Replace or resilver operation

2. Expand each device

Start expansion by using 'zpool online -e' on each device.

After expanding all devices, the extra space becomes available to the pool.
 


CLI History
 

# ZFS records commands that change the pool
# -l      Displays log records in long format
# -i      Displays internally logged ZFS events

zpool history
 


dRAID

 

dRAID is a variant of raidz that provides integrated distributed hot spares

A dRAID vdev is constructed from multiple internal raidz groups, each with D data devices and P parity devices.

Group0(raidz) | Group1(raidz) | Logical Spare

Create

zpool create <pool> draid[<parity>][:<data>d][:<children>c][:<spares>s] <vdevs...>

  • parity - The parity level (1-3). Defaults to one.
  • data - The number of data devices per redundancy group.
  • spares - The number of distributed hot spares. Defaults to zero.
  • children - Useful as a cross-check when listing a large number of devices.

i.e.

# 11 disk dRAID pool with 4+1 redundancy and 1 distributed spare
# a-k = 11

zpool create tank draid:4d:1s:11c /dev/sd[a-k]

Recovery

# draid1-0-0 = the distributed spare; it appears in the output of zpool status tank

echo offline > /sys/block/sdg/device/state

zpool replace -s tank sdg draid1-0-0

zpool status

 


Healing Resilver (=self healing)

 

There are two modes

  • Traditional resilver
  • Sequential resilver

Traditional resilver

scans the entire block tree(Driven by a traversal of the pool block pointer tree)

each block is available while it’s being repaired and can be immediately verified.

Sequential resilver (a dRAID feature)

Driven by a traversal of space allocation maps

The price to pay for this performance improvement is that the block checksums cannot be verified while resilvering.

Therefore, a scrub is started to verify the checksums after the sequential resilver completes.

Reconstructs based off of knowledge of data layout

E.g., Mirror reconstruction is just a copy

Problem: no block pointer data =>

1. Does not verify reads (has no checksums)

2. Does not work with RAIDz layout (no block boundaries to locate parity)

zfs: scrub vs resilver

A scrub reads all the data in the zpool and checks it against its parity information.

When a device is replaced, a resilvering operation is initiated to move data from the good copies to the new device.

This action is a form of disk scrubbing.

 


Troubleshoot

 

1) Boot failure after a ZFS disk failed (Rocky 8)

... A start job is running for import zfs pools by cache file

Fix

Boot into rescue mode

mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.backup

reboot

2) Failed to attach a disk to a mirror

zpool attach lxc sdb1 /dev/sda1         # attach disk(/dev/sda1) to pool(lxc)

cannot attach /dev/sda1 to sdb1: can only attach to mirrors and top-level disks

Fix

zpool attach -o ashift=9 lxc sdb1 /dev/sda1

"can only attach to mirrors and top-level disks" message is in fact reported on many other reasons when disk can't be added to the pool ....

Case: the physical sector sizes of the devices differ

The sector size was reported as 4 KB by the SSDs but not reported by the USB stick, which probably defaulted to 512 bytes.

It should be possible to force ashift=9 for the added vdev, since 512e disks support smaller I/Os, but they will perform suboptimally.

3) modprobe fails to load zfs

/sbin/modprobe zfs

modprobe: FATAL: Module zfs not found in directory /lib/modules/4.18.0-348.el8.0.2.x86_64

Fix: dnf update & reboot    # cause: the running kernel version does not match the installed zfs module

 


User Property

 

ZFS supports arbitrary user properties. User properties have no effect on ZFS behavior.

User property names must conform to the following characteristics:

  • Must contain a colon (':') character to distinguish them from native properties.
  • They must contain lowercase letters or the following punctuation characters: '+', '.', '_'
    ("-" is not allowed)
  • Maximum length: 256

Set user property

zfs set property=value filesystem|volume|snapshot

i.e.

zfs set user:tag=test lxc/test

Get user property

zfs get property filesystem|volume|snapshot

i.e.

zfs get user:tag lxc/test

NAME      PROPERTY  VALUE     SOURCE
lxc/test  user:tag  test      local

To clear a user property

i.e.

zfs inherit user:tag lxc/test

# "-r"  applied recursively when the

zfs inherit -r user:tag lxc/test

 


Case-Insensitive

 

The casesensitivity property cannot be changed after the file system is created,

and therefore should be set when the file system is created.

zfs create lxc/test -o casesensitivity=insensitive

echo case-insensitive > /lxc/test/Tim.txt

cat /lxc/test/TIM.txt

 


zfs-fuse

 

Not recommended, as it has not been maintained for a long time.

 

 

 
