lvm - dynamic disk management

Updated: 2020-07-26


Introduction

In commercial environments LVM is practically indispensable; it is used for disk backup, disk replacement, capacity expansion and more.

LVM works at the block level.

Key features

  • Online data relocation
  • Flexible capacity
  • Disk striping
  • Mirroring volumes
  • Volume Snapshots

It consists of three components

  • PV (Physical Volume)
  • VG (Volume Group)
  • LV (Logical Volume)

LVM itself supports

  • RAID4/5/6
  • Linear
  • Striped
  • Mirrored

 


 


Basics: creating an LV

 

Turning a disk into a PV

pvcreate /dev/sda          # turns the whole disk into a PV

pvcreate /dev/sdb1        # turns a single partition into a PV ( Partition ID: 0x8e )

!! Warning !! This destroys the data on the disk

Create a VG and add PVs to it

vgcreate vzvg /dev/hda1 /dev/hdb1

# creates VG vzvg and adds PV /dev/hda1 and PV /dev/hdb1 to it

# if hda1 and hdb1 are 10G each, vzvg ends up with 20G

Create a 15 GB LV

lvcreate -L 15G -n mail vzvg

# the new LV lives at /dev/mapper/vzvg-mail

# naming scheme: GroupName-VolumeName

# the system also creates /dev/vzvg/mail -> /dev/mapper/vzvg-mail

# from this point on, vzvg-mail can be used like an ordinary block device

Review Info.

vgs

  VG   #PV #LV #SN Attr   VSize   VFree
  myvg   1   4   0 wz--n- 793.31g  73.31g
  vg3t   2   5   0 wz--n-   5.46t 613.00g

lvs

  LV             VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  kvm            myvg -wi-ao---- 400.00g
  backup         vg3t -wi-ao---- 500.00g

Opts

--units hHbBsSkKmMgGtTpPeE

Zero out empty space in a Volume Group

# Create a volume that consumes the empty space, then dd with zero

vgs --units m

  VG   #PV #LV #SN Attr   VSize       VFree
  myvg   1   3   0 wz--n-  812352.00m 382272.00m

lvcreate -L 382272m -n emptyspace myvg      # creates /dev/mapper/myvg-emptyspace

# don't zero it with plain dd: the unthrottled writes would hurt the other VPSes

# dd if=/dev/zero of=/dev/mapper/myvg-emptyspace bs=16M

pv -L 50m /dev/zero > /dev/mapper/myvg-emptyspace      # pv rate-limits the writes to 50 MB/s

lvremove /dev/mapper/myvg-emptyspace

 


Get lvm info

 

lvm version

  LVM version:     2.02.133(2) (2015-10-30)
  Library version: 1.02.110 (2015-10-30)
  Driver version:  4.37.0

# Displays the recognized built-in block device types

lvm devtypes

  DevType       MaxParts Description
  aoe                 16 ATA over Ethernet
  ataraid             16 ATA Raid
  bcache               1 bcache block device cache
  ...

lvm formats

  lvm1
  pool
  lvm2

lvm segtypes

striped
zero
error
snapshot
mirror
raid1
raid10
...

 


Activating / deactivating a VG

 

lvs

  LV         VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  pc_data    st3t -wi-a-----   1.00t

Deactivate the VG

vgchange -a n st3t

  Logical volume st3t/pc_data in use.
  Can't deactivate volume group "st3t" with 1 open logical volume(s)

Because LV pc_data inside VG st3t is mounted, we cannot deactivate it

After umounting it, deactivate the VG

lvs

  pc_data    st3t -wi-------   1.00t

P.S.

After the VG is deactivated, the /dev/mapper/st3t-* device nodes disappear; use "vgchange -a y st3t" to bring them back

 


Other VG operations

 

Rename VG

vgrename OldVolumeGroup NewVolumeGroup

i.e.

vgs

VG   #PV #LV #SN Attr   VSize   VFree
myvg   1   4   0 wz--n- 793.31g 323.31g
st3t   1   4   0 wz--n-   2.73t 196.52g

vgrename st3t vg3t

 


block device backup - Snapshot

 

Create a snapshot:

# the 5G of free space is used as the write buffer (COW area)
# when it fills up, the snapshot is dropped and you lose the old versions of the files !! (the files inside the snapshot)

lvcreate -L 5G -s -n my_snapshot_name /path/to/lv

P.S.

A snapshot is writable. Once the snapshot is mounted at /mnt/tmp, we can write into it

However, anything newly written to the origin or to the snapshot consumes "Allocated to snapshot"

Deleting the newly created files does not release "Allocated to snapshot"

How it works

When a change is made to the original device after a snapshot is taken,
the snapshot feature makes a copy of the changed data area as it was prior to the change
so that it can reconstruct the state of the device.

When you create a snapshot file system, full read and write access to the origin stays possible.

If a chunk on a snapshot is changed, that chunk is marked and never gets copied from the original volume.

the snapshot contains the old data, while the LV holds the current data

Overfilling the snapshot will simply mean no more old data is saved as it's changed
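
A quick way to watch the copy-on-write behaviour described above (a minimal sketch; vg0/lv0, snap1 and the mount points are hypothetical names):

# take a snapshot, then write to the mounted origin and watch Data% ("Allocated to snapshot") grow
lvcreate -s -n snap1 -L 1G /dev/vg0/lv0

dd if=/dev/urandom of=/mnt/lv0/testfile bs=1M count=100 conv=fsync

lvs vg0/snap1          # Data% rises as the changed origin blocks are copied into the COW area

# the snapshot can also be mounted (read-only here) to get at the old versions of the files
mount -o ro /dev/vg0/snap1 /mnt/snap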

Inspect the snapshot's details:

lvdisplay [-c]        <--  -c prints each LV as a single colon-separated line

i.e.

lvdisplay /dev/mapper/myraidvg-snap_mytestlw

  --- Logical volume ---
  LV Path                /dev/myraidvg/snap_mytestlw
  LV Name                snap_mytestlw
  VG Name                myraidvg
  LV UUID                yoOJsN-6ASk-fj3c-vU4H-uczN-5uLw-N51z6O
  LV Write Access        read/write
  LV Creation host, time server, 2018-01-03 14:57:32 +0800
  LV snapshot status     active destination for mytestlw
  LV Status              available
  # open                 1
  LV Size                10.00 GiB
  Current LE             2560
  COW-table size         1.00 GiB
  COW-table LE           256
  Allocated to snapshot  0.00%
  Snapshot chunk size    4.00 KiB
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:10

lvs | grep MyBackup.Snap

MyBackup.Snap myraidvg swi-a-s---   1.00t      MyBackup 0.10

 * 'Allocated to snapshot' only updates after the sync completes

When it becomes full:

mount | grep snap

the mount is gone

dmesg

[78560.181614] device-mapper: snapshots: Invalidating snapshot: Unable to allocate exception.

lvdisplay /dev/mapper/myraidvg-snap_mytestlw

  LV snapshot status     INACTIVE destination for mytestlw

Trying to mount it again will only fail

[78662.630747] Buffer I/O error on dev dm-10, logical block 2621424, async page read
[78662.630951] Buffer I/O error on dev dm-10, logical block 16, async page read
[78662.631055] EXT4-fs (dm-10): unable to read superblock
[78662.631150] EXT4-fs (dm-10): unable to read superblock
[78662.631240] EXT4-fs (dm-10): unable to read superblock

Remove the snapshot:

# expect some I/O at this point; note that lvremove simply discards the snapshot's COW data
# (merging the snapshot's changes back into the origin is what lvconvert --merge does)

# -y|--yes    Do not prompt for confirmation interactively

lvremove -y /path/to/snapshot

Remark:

[1]

 * After taking a snapshot, writes become much slower, so the snapshot state should not be kept for long

LVM first makes a copy of the original version which is stored in the snapshot,

and then the modification is written on the normal Logical-Volume.

So the normal Logical-Volume always contains the latest version of the data and the snapshot only contains a copy of the blocks which have been modified.

After a snapshot is created, file-system performance drops, because every write triggers two writes: one to back up the old version and one to write the new version

Checking

dstat -d -D sdb

--dsk/sdb--
 read  writ
   0    32M
   0    31M
   0    31M
   0    33M
   0    37M
   0    36M
  61M   19M
 124M    0
 124M    0
  73M   64M
  27M   93M
   0   107M
   0    76M
   0    41M
   0    46M
   0    37M
   0    30M

 * If the snapshot logical volume becomes full it will be dropped,
    so it is vitally important to allocate enough space.

 


Move an LV to another VG

 

# move LV "icy" from VG "myvg" to VG "vg3t"

lvdisplay --units m /dev/myvg/icy

lvcreate --name icy --size 30720.00 vg3t

# if the VPS is not running, this step can be skipped

lvcreate --snapshot --name icy-snap --size 1G /dev/myvg/icy

# stop and think before running the copy

dd if=/dev/myvg/icy-snap of=/dev/vg3t/icy bs=16M

# Cleanup

lvremove /dev/myvg/icy-snap

lvremove /dev/myvg/icy

 


Capacity expansion - vgextend / lvextend

 

Ways to expand

  • Add a new PV to the VG
  • Resize Disk

Add a new PV to the VG

pvcreate /dev/sdc1

blkid /dev/sdc1

/dev/sdc1: UUID="..." TYPE="LVM2_member" PARTLABEL="myraidvg-b" PARTUUID="..."

pvs

  PV         VG       Fmt  Attr PSize   PFree
  /dev/sdb1  myraidvg lvm2 a--    1.82t   1.33t
  /dev/sdc1           lvm2 ---    1.82t   1.82t

vgextend myraidvg /dev/sdc1

  Volume group "myraidvg" successfully extended

pvs

  PV         VG       Fmt  Attr PSize   PFree
  /dev/sdb1  myraidvg lvm2 a--    1.82t   1.33t
  /dev/sdc1  myraidvg lvm2 a--    1.82t   1.82t

Resize Disk

echo 1 > /sys/block/sda/device/rescan    # has no effect on virtio-blk

parted /dev/vda print                             # Verify

pvresize /dev/sda2                                 # Expand a PV after enlarging the partition

Enlarge the LV

lvextend -L12G /dev/vzvg/homevol               # set the absolute size directly

OR

lvextend -L+1G /dev/vzvg/homevol               # add to the current size

OR

lvextend -l+256 /dev/vzvg/homevol               # size in units of logical extents

Percentage

Only -l supports %STRING; -L does not

# %VG percentage of the total space in the VG

# %FREE remaining free space in the VG

lvextend -l100%VG /dev/vzvg/homevol # useful when the VG contains only one LV

lvextend -l+100%FREE /dev/vzvg/homevol

The "+" must be included; fortunately lvextend checks the size !!

New size given (2571 extents) not larger than existing size (2935 extents)

Growing the file system

Link
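
A minimal sketch of the usual sequence (assuming an LV /dev/vzvg/homevol as in the examples above; the mount point is hypothetical):

# ext4 can be grown online
lvextend -L+10G /dev/vzvg/homevol
resize2fs /dev/vzvg/homevol

# xfs is grown via its mount point
xfs_growfs /mnt/homevol

# or let lvextend grow the file system in the same step
lvextend -r -L+10G /dev/vzvg/homevol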

Capacity

Link
 


Disk replacement - pvmove

 

vgextend vzvg /dev/sda1

pvmove /dev/hda1

vgreduce vzvg /dev/hda1

Explanation:

pvmove SourcePV

pvmove SourcePV DestinationPV

# move allocated physical extents on Source to one or more other PVs

# moves all PEs used by simple LVs on /dev/hda1 to free PEs elsewhere in the VG

P.S.

If pvmove gets interrupted for any reason,  run "pvmove" again without any PV arguments

to restart any moves that were in progress from the last checkpoint

pvmove opt:

--atomic                   # Make the entire operation atomic.

-i, --interval sec        # Report progress as a percentage at regular intervals.

 -b, --background      # Run the daemon in the background.

More examples

# moves all allocated space off  sdd1, reports the progress (%)  every 5 sec

pvmove -i5 /dev/sdd1

# sdc1 -> sdf1 in background

pvmove -b /dev/sdc1 /dev/sdf1
 

vgreduce vzvg /dev/hda1

# remove one or more unused PV from a VG

# vgreduce VG PVPath...
 

 


Relocation (vgexport / vgimport)

 

Old machine:

# make inactive VG unknown to the system

# You can then move all the PV in that VG to  a different  system for later vgimport

# vgexport clears the VG system  ID

vgexport vzvg

New machine:

# vgimport  sets  the VG system ID

vgimport vzvg
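
A fuller sketch of the whole move (the VG name vzvg, the LV name somelv and the mount point are assumptions):

# old machine
umount /mnt/data                 # umount every LV in the VG
vgchange -a n vzvg               # deactivate the VG
vgexport vzvg                    # mark the VG as exported
# ... move the disks to the new machine ...

# new machine
pvscan                           # the PVs show up as exported
vgimport vzvg
vgchange -a y vzvg               # activate the VG
mount /dev/vzvg/somelv /mnt/data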

 


Deleting LVs and VGs

 

Delete an LV:

# lvremove removes one or more logical volumes. 

# Confirmation will be requested before deactivating any active logical volume prior to removal.

# Removing an origin logical volume will also remove all dependent snapshots.

# remove by full path

lvremove /dev/vzvg/lamp

# remove vol1 in vg00

lvremove -f vg00/vol1

# Remove all logical volumes in volume group vg00:

lvremove vg00

Delete a VG:

vgchange -a n vzvg

vgremove vzvg

 


Mounting a foreign LVM

 

1. pvscan

When called without the --cache option, pvscan lists PVs on the system

pvscan

 /dev/sda2
  Total: 1 [465.26 GiB] / in use: 1 [465.26 GiB] / in no VG: 0 [0   ]

Opts

  • -t|--test          # Run in test mode. Commands will not update metadata.
  • -v|--verbose    # Set verbose level. (-v ~ -vvvv)

# This first clears all existing PV online records,
# then scans all devices on the system,
# adding PV online records for any PVs that are found.

pvscan --cache device

If  device is present, lvm adds a record that the PV on device is online. 

If device is not present, lvm removes the online record for the PV.

2. vgscan [-v]

scans all SCSI, (E)IDE disks, multiple devices and a bunch of other disk devices in the system looking for LVM physical volumes and volume groups.

output:

  Reading all physical volumes.  This may take a while...
  Found volume group "myvg" using metadata type lvm2

3. lvscan

  inactive          '/dev/myvg/swap' [7.00 GiB] inherit
  inactive          '/dev/myvg/root' [96.00 GiB] inherit
  inactive          '/dev/myvg/data' [346.27 GiB] inherit

4. activate the volume group

vgchange -a y

This is the most important step; the previous 3 steps only inspect things.

5. mount

mount /dev/VolGroup00/LogVol00 /mnt

 


lvs

lvs - report information about logical volumes

[-a|--all]
[-o|--options    [+]Field[,Field]]
[-v|--verbose]

i.e.

lvs -a -o name,copy_percent,devices vg00

 


Mirror (legacy implementation)

 

LVM mirroring works at the PV level (the legs are synced between PVs).

Opts:

# -m 1 (2 份 copies )
# -m 2 (3 份 copies )

# 建立 Mirror, Defualt mirror sector size: 512KB (-R x , Unit: MB)(lvm.conf: mirror_region_size=512 )

i.e. 在 vg0 上建立一個 10G 的 mirror LV (vg0 至小要有 2 leg)

lvcreate -L 10G -m 1 -n mirrorlv vg0

--nosync

# when creating a new mirrored LV, the --nosync option can be used to skip the initial synchronization

Converting an LV into a mirror

When converting an LV into a mirrored LV, the "-b" option can be added

so that the conversion (data sync) runs in the background

Converting a mirror back into an ordinary LV

lvconvert -m 0 vg0/lvol1

LOG:

# LVM maintains a small log which it uses to keep track of which regions are in sync

 --mirrorlog {disk|core|mirrored}

# --corelog is equivalent to --mirrorlog core

--mirrorlog core => log kept in memory <-- resynchronized at every reboot

# Create a mirrored log (the log itself is kept in two copies)

--mirrorlog mirrored

# Default: the mirror log is placed on a disk that does not hold a mirror leg; the option below lets the log share a disk with one of the legs.

--alloc anywhere     <-- degrades performance
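
Putting the log options together (a sketch; vg0 and the LV names are hypothetical):

# in-memory log: nothing on disk, but a full resync after every reboot
lvcreate -L 10G -m 1 --mirrorlog core -n mirrorlv_core vg0

# mirrored log: the log itself is kept in two copies
lvcreate -L 10G -m 1 --mirrorlog mirrored -n mirrorlv_mlog vg0

# let the log share a disk with a mirror leg (e.g. only 2 PVs in the VG)
lvcreate -L 10G -m 1 --alloc anywhere -n mirrorlv_2pv vg0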

# inspect the mirror's structure

lvs -a -o +devices

...
  • _mimage_0
  • _mimage_1
  • _mlog

(Advanced) specify which PVs hold the data legs and which holds the log

# lvcreate ... data_leg1 data_leg2 log_leg

lvcreate -L 500M -m1 -n mirrorlv vg0 /dev/sda1 /dev/sdb1 /dev/sdc1

 


Mirror Repair

 

# brought up in degraded mode (disk failed)

vgchange -ay --partial vg0

# replace disk

* When a failure occurs, LVM converts the mirror into a single linear volume.

# suppose vg0 contains sda1 and sdb1, and sdb1 has now failed

# add sdc1 to vg0

vgextend vg0 /dev/sdc1

# When replacing devices that are no longer visible on the system, use lvconvert --repair

lvconvert --repair vg0/lv0 /dev/sdc1

vgreduce --removemissing --test

vgreduce --removemissing vg0

Check:

lvs -a -o +devices

mirror_image_fault_policy and mirror_log_fault_policy

When an LVM mirror suffers a device failure, recovery happens in two stages

first stage: the failed devices are removed, reducing the mirror to a linear device

second stage: governed by mirror_image_fault_policy / mirror_log_fault_policy

* remove

* allocate

Splitting Off a Redundant Image of a Mirrored Logical Volume

# specifying the number of redundant images to split off; this splits off a new LV named "copy" from the mirrored LV vg/lv

lvconvert --splitmirrors 2 --name copy vg/lv

# specify which legs to use: here the new LV gets the mirror legs on /dev/sdc1 and /dev/sde1

lvconvert --splitmirrors 2 --name copy vg/lv /dev/sd[ce]1

Combine striping and mirroring in a single logical volume

--mirrors X --stripes Y


Remark

* LVM is not safe in a power failure

 


RAID

 

LVM supports RAID1/4/5/6

The new mirroring implementation is raid1 (the old one is called mirror)

  • It maintains a fully redundant bitmap area for each mirror image
    (no --mirrorlog or --corelog option)
  • It supports snapshots

raid1 vs. mirror

  • raid1 does not require I/O to be blocked while handling a failure
  • raid1 is implemented on top of MD; mirror uses the DM mirror target

iotop during resync

Total DISK READ:         0.00 B/s | Total DISK WRITE:         0.00 B/s
Current DISK READ:       0.00 B/s | Current DISK WRITE:       3.76 K/s
    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
  44772 be/4 root        0.00 B/s    0.00 B/s  0.00 % 17.19 % [mdX_resync]
  44771 be/4 root        0.00 B/s    0.00 B/s  0.00 %  5.03 % [mdX_raid1]

Help

# CentOS

man 7 lvmraid

Show RAID Level

lvs -o name,segtype

  LV            Type
  proxy         linear
  data_disk     raid1

# Structure

SubLVs

SubLVs hold LV data blocks, RAID parity blocks, and RAID metadata.

SubLVs are generally hidden, so the lvs -a option is required to display them

SubLV names begin with the visible LV name, and have an automatic suffix indicating its role

(_rimage_N, _rmeta_N)

When you create a RAID logical volume, LVM creates a metadata subvolume

that is one extent in size for every data or parity subvolume in the array.

metadata subvolumes

  • lv_rmeta_0
  • lv_rmeta_1

data subvolumes

  • lv_rimage_0
  • lv_rimage_1

# Usage

# -m|--mirrors Number
# Specifies the number of mirror images in addition to the original LV image

# --type raid1 | mirror
# There are two mirroring implementations

lvcreate --type raid1 -m 1 -L 1G -n my_lv my_vg

# Common checking commands

# lvs ... VolumeGroup

lvs -a -o name,copy_percent,devices vg3t | grep vm_admin_data

  LV                       Cpy%Sync Devices
  vm_admin_data            10.62    vm_admin_data_rimage_0(0),vm_admin_data_rimage_1(0)
  [vm_admin_data_rimage_0]          /dev/sda(390144)
  [vm_admin_data_rimage_1]          /dev/sdf1(1)
  [vm_admin_data_rmeta_0]           /dev/sda(665088)
  [vm_admin_data_rmeta_1]           /dev/sdf1(0)

# Converting

# Linear -> raid1

# --type: convert a logical volume to another segment type (cache, cache-pool, raid1,  snapshot, thin,  or  thin-pool)

lvconvert --type raid1 -m 1 my_vg/my_lv

 * If the metadata image that pairs with the original LV cannot be placed on the same PV, the lvconvert will fail.

Note: place the added RAID image on a specific PV (sdd1)

lvconvert -m +1 my_vg/my_lv /dev/sdd1

 

# mirror -> raid1

lvconvert --type raid1 my_vg/my_lv

# raid1 -> Linear

lvconvert -m0 my_vg/my_lv

# specifies that you want to remove /dev/sda1

lvconvert -m0 my_vg/my_lv /dev/sda1

# Multiple RAID copies

  • -m 1 => 2-way
  • -m 2 => 3-way

# lvconvert -m new_absolute_count vg/lv [removable_PVs]

# lvconvert -m +num_additional_images vg/lv [removable_PVs]

i.e.

lvconvert -m 2 my_vg/my_lv

# Resize

lvextend -L+100G /dev/vg3t/data_disk

  Extending 2 mirror images.
  Size of logical volume vg3t/data_disk changed from 1.00 TiB (262144 extents) to 1.10 TiB (287744 extents).
  Logical volume data_disk successfully resized.

The volume group uses up 200 G in total (100 G for each of the two mirror images)

lvs -a -o name,copy_percent,devices | grep data_disk

  data_disk                 93.05    data_disk_rimage_0(0),data_disk_rimage_1(0)
  [data_disk_rimage_0]               /dev/sda(0)
  [data_disk_rimage_0]               /dev/sda(665091)       # the newly added extents
  [data_disk_rimage_1]               /dev/sdb1(12802)
  [data_disk_rimage_1]               /dev/sdb1(537091)
  [data_disk_rmeta_0]                /dev/sda(665089)
  [data_disk_rmeta_1]                /dev/sdb1(12801)

# Replacing a RAID device

# Remove the specified device PhysicalVolume and replace it with one that is available in the VG,

# or from a specific list of PVs specified on the command line following the LV name.

# for RAID types other than RAID1, removing a device would mean converting to a lower level RAID

lvconvert --replace disk_to_remove vg/lv [possible_replacements]

i.e.

# remove /dev/sdb2 from my_vg/my_lv (replaced by whatever PV is available in the VG)

lvconvert --replace /dev/sdb2 my_vg/my_lv

# replace sdb1 with sdd1

lvconvert --replace /dev/sdb1 my_vg/my_lv /dev/sdd1

Scrubbing a RAID (man 7 lvmraid)

Scrubbing assumes that RAID metadata and bitmaps may be inaccurate,

  so it verifies all RAID metadata, LV data, and parity blocks.

check mode

only report the number of inconsistent blocks, it cannot report which blocks are inconsistent. (read-only)

repair mode

make the RAID LV data consistent,

  but it does not know which data is correct => the result may be consistent but still incorrect

  when the copies disagree, it chooses the block from the device that would be used during RAID initialization.

Usage

lvchange --syncaction {check|repair} vg/raid_lv

i.e.

lvchange --syncaction check vg3t/xpenology

dmesg

... md: data-check of RAID array mdX
... md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
... md: using maximum available idle IO bandwidth (but not more than 40960 KB/sec) for data-check.
... md: using 128k window, over a total of 524288000k.

Show status

lvs -o +raid_sync_action,raid_mismatch_count vg/lv

i.e.

lvs -o +raid_sync_action,raid_mismatch_count myraidvg/kvm

  LV   VG       Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert SyncAction Mismatches
  kvm  myraidvg rwi-aor--- 500.00g                                    0.12             check               0

# SyncAction (displays the current synchronization operation)

  • idle: All sync operations complete (doing nothing)
  • resync: Initializing an array or recovering after a machine failure
  • recover: Replacing a device in the array
  • check: Looking for array inconsistencies
  • repair: Looking for and repairing inconsistencies

# Cpy%Sync (progress of any of the raid_sync_action operations)

# Mismatches (number of discrepancies found during a check operation)

# lv_attr

(m)ismatches:

    shown after a scrubbing operation has detected that portions of the RAID are not coherent

(r)efresh

    indicates that a device in a RAID array has suffered a failure and the kernel regards it as failed

Rebuild specific PV (--rebuild PV)

# If specific PVs in a RAID LV are known to have corrupt data

# The data on those PVs can be reconstructed with:

i.e.

lvchange --rebuild PV_BAD LV

Limit Sync Speed

# control the rate at which sync operations

# Default Unit: kiB/sec/device   <-- per device

--maxrecoveryrate Rate[bBsSkKmMgG]

--minrecoveryrate Rate[bBsSkKmMgG]

i.e.

lvchange --maxrecoveryrate 50m myraidvg/kvm

 

RAID1 Tuning

lvchange --writemostly PhysicalVolume[:{t|y|n}]

Marks a device(PV) in a RAID1 LV as write-mostly.

All reads to these drives will be avoided unless necessary.
(Setting this parameter keeps the number of I/O operations to the drive to a minimum.)

lvchange --writebehind IOCount

maximum number of outstanding writes that are allowed to devices in a RAID1 LV that are marked as write-mostly.

Once this value is exceeded, writes become synchronous
(causing all writes to the constituent devices to complete before the array signals the write has completed)
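
Example usage (a sketch; my_vg/my_lv is assumed to be a raid1 LV with one leg on /dev/sdb1):

# mark the leg on /dev/sdb1 as write-mostly (reads avoid it when possible)
lvchange --writemostly /dev/sdb1:y my_vg/my_lv

# allow up to 512 outstanding writes to the write-mostly leg before writes turn synchronous
lvchange --writebehind 512 my_vg/my_lv

# clear the flag again
lvchange --writemostly /dev/sdb1:n my_vg/my_lv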

[Setting a RAID fault policy]

lvchange -ay --activationmode complete|degraded|partial LV

complete

The LV is only activated if all devices are present.

degraded

The LV is activated with missing devices

  if the RAID level can tolerate the number of missing devices without LV data loss.

lvm.conf

raid_fault_policy

    allocate

    attempt to replace the failed device with a spare device from the volume group.

    warn

    produce a warning and the log (/var/log/messages) will indicate that a device has failed
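
The policies live in the activation section of lvm.conf (a sketch of the relevant lines; the values shown are typical defaults but may differ between distributions):

# /etc/lvm/lvm.conf
activation {
    # what to do when a device in a RAID LV fails
    raid_fault_policy = "warn"            # or "allocate" to auto-replace from spare space in the VG

    # legacy mirror equivalents
    mirror_image_fault_policy = "remove"
    mirror_log_fault_policy = "allocate"
}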

After the failed PV has been replaced

even though the failed device has been replaced, the display still indicates that LVM could not find the failed device.

To remove the failed device from the volume group, you can execute

vgreduce --removemissing VG

[Split]

# Temporarily split off (--trackchanges)

lvconvert --splitmirrors count --trackchanges vg/lv [removable_PVs]

# a RAID1 can only split off one image, so use "--splitmirrors 1"; "vg/lv" is the LV being split

# An image of a RAID1 array for read-only

i.e.

lvconvert --splitmirrors 1 --trackchanges vg3t/vm_admin_data

vm_admin_data_rimage_1 split from vm_admin_data for read-only purposes.
Use 'lvconvert --merge vg3t/vm_admin_data_rimage_1' to merge back into vm_admin_data

lvs -a -o name,copy_percent,devices vg3t | grep vm_admin_data

LV                       Cpy%Sync Devices
vm_admin_data            100.00   vm_admin_data_rimage_0(0),vm_admin_data_rimage_1(0)
[vm_admin_data_rimage_0]          /dev/sda(390144)
vm_admin_data_rimage_1            /dev/sdf1(1)       # no longer shown as [vm_admin_data_rimage_1]
[vm_admin_data_rmeta_0]           /dev/sda(665088)
[vm_admin_data_rmeta_1]           /dev/sdf1(0)

# Temporarily -> permanently split off

# without --trackchanges

lvconvert --splitmirrors count -n splitname vg/lv [removable_PVs]

i.e.

lvconvert --splitmirrors 1 -n newVol vg3t/vm_admin_data

# Merge

# When you merge the image,

# (only the portions of the array that have changed since the image was split are resynced)

lvconvert --merge raid_image

i.e.

# vm_admin_data_rimage_1 is the image that was split off

lvconvert --merge vg3t/vm_admin_data_rimage_1

  vg3t/vm_admin_data_rimage_1 successfully merged back into vg3t/vm_admin_data

Find the PV(s) that hold an LV in LVM

# Display the mapping of physical extents to logical volumes and logical extents.

lvdisplay -m /dev/mapper/myraidvg-mytestlw

  ....
  --- Segments ---
  Logical extents 0 to 2559:
    Type                linear
    Physical volume     /dev/sdb1
    Physical extents    128001 to 130560

lvdisplay -m /dev/mapper/myraidvg-kvm

  --- Segments ---
  Logical extents 0 to 127999:
    Type                raid1
    Monitoring          monitored
    Raid Data LV 0
      Logical volume    kvm_rimage_0
      Logical extents   0 to 127999
    Raid Data LV 1
      Logical volume    kvm_rimage_1
      Logical extents   0 to 127999
    Raid Metadata LV 0  kvm_rmeta_0
    Raid Metadata LV 1  kvm_rmeta_1

 


RAID Repair

 

Situation: one of the HDDs in an LVM RAID1 keeps hanging; it is believed to have failed.

1) Remove it at the OS level

echo 1 > /sys/block/sdb/device/delete

ls /dev/sdb

ls: cannot access '/dev/sdb': No such file or directory

2) Remove it at the logical volume level

lvs -a -o name,copy_percent,devices

  WARNING: Device for PV SYq1Im-wF7f-QsOX-voEv-kNQO-2IcP-eqe2gN not found or rejected by a filter.
  LV             Cpy%Sync Devices
  kvm            100.00   kvm_rimage_0(0),kvm_rimage_1(0)
  [kvm_rimage_0]          [unknown](1)
  [kvm_rimage_1]          /dev/sdc1(1)
  [kvm_rmeta_0]           [unknown](0)
  [kvm_rmeta_1]           /dev/sdc1(0)

vgreduce --removemissing myraidvg

  WARNING: Device for PV SYq1Im-wF7f-QsOX-voEv-kNQO-2IcP-eqe2gN not found or rejected by a filter.
  WARNING: Partial LV kvm needs to be repaired or removed.
  WARNING: Partial LV kvm_rimage_0 needs to be repaired or removed.
  WARNING: Partial LV kvm_rmeta_0 needs to be repaired or removed.
  There are still partial LVs in VG myraidvg.
  To remove them unconditionally use: vgreduce --removemissing --force.
  WARNING: Proceeding to remove empty missing PVs.

vgreduce --removemissing --force myraidvg

  WARNING: Device for PV SYq1Im-wF7f-QsOX-voEv-kNQO-2IcP-eqe2gN not found or rejected by a filter.
  Wrote out consistent volume group myraidvg.

lvs -a -o name,copy_percent,devices

  LV             Cpy%Sync Devices
  kvm            100.00   kvm_rimage_0(0),kvm_rimage_1(0)
  [kvm_rimage_0]
  [kvm_rimage_1]          /dev/sdb1(1)
  [kvm_rmeta_0]
  [kvm_rmeta_1]           /dev/sdb1(0)

3) Convert from RAID1 to linear (-m0)

lvconvert -m0 myraidvg/kvm

  LV   Cpy%Sync Devices
  kvm           /dev/sdb1(1)

4) Convert it back to RAID1

 * the VG (myraidvg) contains another PV (sdc)

lvconvert --type raid1 -m1 myraidvg/kvm

lvs -a -o name,copy_percent,devices

  LV             Cpy%Sync Devices
  kvm            10.78    kvm_rimage_0(0),kvm_rimage_1(0)
  [kvm_rimage_0]          /dev/sdb1(1)
  [kvm_rimage_1]          /dev/sdc1(1)
  [kvm_rmeta_0]           /dev/sdb1(0)
  [kvm_rmeta_1]           /dev/sdc1(0)

Remark

# When replacing  devices that are no longer visible on the system

lvconvert --repair LV [NewPVs]

# When replacing devices that are still visible

lvconvert --replace OldPV LV [NewPV]

 


Stripe Volume

 

# Striped LV across 2 PV with a stripe of 64 kilobytes

# -I|--stripesize StripeSize (Default: 64.00 KiB)

# -i|--stripes Stripes       # specify how many devices to stripe over

lvcreate -L 10G -i 2 -I 64 -n TestStripe myraidvg

Info.

-m, --maps              # Display the mapping of logical extents to physical volumes and physical extents.

lvdisplay -m /dev/mapper/myraidvg-TestStripe

  --- Segments ---
  Logical extents 0 to 2559:
    Type                striped
    Stripes             2
    Stripe size         64.00 KiB
    Stripe 0:
      Physical volume   /dev/sdb1
      Physical extents  130561 to 131840
    Stripe 1:
      Physical volume   /dev/sdc1
      Physical extents  128001 to 129280

# As with linear volumes, you can specify the extents of the physical volume that you are using for the stripe.

# i.e. 100 extents

lvcreate -l 100 -i 2 -n stripelv testvg /dev/sda1:0-49 /dev/sdb1:50-99

Striped LV to Linear LV

 * There is no direct command to convert a striped LV to a linear LV.
 * First convert it to a mirrored LV, then to a linear LV.
 * Add a new disk that is large enough

lvs -a -o +devices

pvcreate /dev/sdc1

vgextend stripevg /dev/sdc1

vgs

# using physical extents sdc1 for allocation of new extents

lvconvert -m 1 stripevg/stripelv /dev/sdc1

lvs -a -o +devices

# freeing physical extents from sda1 sdb1

lvconvert -m 0 stripevg/stripelv /dev/sda1 /dev/sdb1

lvs -a -o +devices

Linear LV to striped LV

lvconvert -m 1 vg3t/backup

# striped mirror
# The number to the --stripes option must be equal to the number of PVs listed.
# This does not apply to existing allocated space, only newly allocated space can be striped.

lvconvert -m 1 --stripes 2 /dev/vgtest/lvol0 /dev/sdb1 /dev/sdc1

 


Config Backup

 

vgcfgbackup — backup volume group descriptor area

Default: all of VG will be backed up

Backup Path: /etc/lvm/backup

 * Metadata backups and archives are automatically created on every volume group and logical volume configuration change

    unless disabled in the lvm.conf
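
Manual backup and restore (a sketch; myvg and the file path are examples):

# write the current metadata of every VG to /etc/lvm/backup/<vgname>
vgcfgbackup

# back up a single VG to a specific file
vgcfgbackup -f /root/myvg.meta myvg

# list the available backups/archives, then restore the VG metadata
vgcfgrestore --list myvg
vgcfgrestore -f /root/myvg.meta myvg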
 


Thinly-Provisioned Logical Volumes

 

Unlike a normal LV, which allocates all of its blocks when it is created, an LVM thin pool allocates blocks only when they are written.

DOC

man lvmthin

opts:

-T (--thin)  # to create either a thin pool or a thin volume.

i.e.

1. Create a thin pool named mythinpool in vg001

lvcreate -L 100M -T vg001/mythinpool

                            OR

lvcreate -L 100M --thinpool mythinpool vg001

2. Create a thin volume named thinvolume

# a virtual size for the volume that is greater than the pool that contains it

lvcreate -V1G -T vg001/mythinpool -n thinvolume

# checking

lvs

# resizes an existing thinpool

lvextend -L+100M vg001/mythinpool

P.S.

* Converting a logical volume to a thin pool volume destroys the content of the logical volume
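
If you do want to build a pool from existing LVs (a sketch; the LV names are hypothetical, and as noted above the data LV's contents are destroyed):

# turn an existing LV into a thin pool (its contents are lost)
lvconvert --type thin-pool vg001/lv_data

# or supply a separate LV to hold the pool metadata
lvconvert --type thin-pool --poolmetadata vg001/lv_meta vg001/lv_data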

 


Common LV operations

 

Rename an LV:

lvrename OldPath NewPath

Example 1:

lvrename /dev/vg02/lvold /dev/vg02/lvnew

lvrename VolumeGroupName OldLogicalVolumeName NewLogicalVolumeName

Example 2:

lvrename vg02 lvold lvnew

 


Remove PV from VG

 

vgreduce myraidvg /dev/sdb1

Physical volume "/dev/sdb1" still in use

Checking: pvs

# you can see sdb1 has "a" in the Attr column

  PV         VG       Fmt  Attr PSize   PFree
  /dev/sdb1  myraidvg lvm2 a--    1.82t   1.33t
  /dev/sdc1  myraidvg lvm2 a--    1.82t   1.33t

Solution

migrate the data to another physical volume using the pvmove command.
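
Putting it together (a sketch based on the pvs output above):

# move all allocated extents off /dev/sdb1 onto the other PVs in the VG
pvmove /dev/sdb1

# the PV is now unused and can be removed from the VG
vgreduce myraidvg /dev/sdb1

# optionally wipe the LVM label so the disk can be reused elsewhere
pvremove /dev/sdb1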

 


Attr

 

PV attributes

pvs

PV         VG     Fmt  Attr PSize PFree
/dev/vda2  centos lvm2 a--  9.51g    0

(a)llocatable      # disable allocation: "pvchange -x n /dev/sdd1"

e(x)ported      # vgexport uavg

Other Opts

pvs --units k

VG Attributes

vgs

VG     #PV #LV #SN Attr   VSize VFree
 centos   1   2   0 wz--n- 9.51g    0

r,w     (r)ead & (w)rite permissions
z        resi(z)eable
x        e(x)ported
p        (p)artial
c,l,n,a,i   allocation policy
            (c)ontiguous,
            c(l)ing,
            (n)ormal,
            (a)nywhere,
            (i)nherited
c     (c)luster

LV Attributes

lvs

LV   VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
root centos -wi-ao---- 8.51g                                       
swap centos -wi-ao---- 1.00g 

First Field

m     (m)irrored
M     (M)irrored without initial sync
o     (o)rigin
p     (p)vmove
s     (s)napshot
S     invalid (S)napshot
v     (v)irtual
i     mirror (i)mage
I     mirror (I)mage without sync
c     under (c)onstruction
–     Simple Volume

Second Field

w,r     Permissions: (w)rite, (r)ead-only

Third Field (Allocation policy)

c,l,n,a,i           (c)ontiguous, c(l)ing, (n)ormal, (a)nywhere, (i)nherited

Fourth Field

m           Fixed (m)inor

Fifth Field

a    (a)ctive
s    (s)uspended
I    (I)nvalid snapshot
S    Invalid (S)uspended snapshot
i    Mapped device present with (i)nactive table
d    Mapped (d)evice present with-out tables

Sixth Field

o         device (o)pen (Volume is in active state or may be mounted)

 


How an lvm snapshot works

 

# preparation

dd if=/dev/zero of=dummydevice bs=1M count=1024

losetup /dev/loop0 dummydevice

pvcreate /dev/loop0

vgcreate vg0 /dev/loop0

lvcreate -n lv0 -L 400M vg0

# the dm table before taking a snapshot

dmsetup table

vg0-lv0: 0 819200 linear 7:0 2048

# create a snapshot ( name: snap1, size: 200 MB )

lvcreate -s -n snap1 -L 200M /dev/vg0/lv0

# after taking the snapshot

lvs

  LV    VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv0   vg0    owi-a-s--- 400.00m                                    
  snap1 vg0    swi-a-s--- 200.00m      lv0    0.00

dmsetup table

vg0-lv0-real: 0 819200 linear 7:0 2048
vg0-snap1-cow: 0 409600 linear 7:0 821248
vg0-lv0: 0 819200 snapshot-origin 253:4
vg0-snap1: 0 819200 snapshot 253:4 253:5 P 8

# removing the snapshot

# destroy vg0-snap1-cow, vg0-lv0 and vg0-snap1, and

# rename vg0-lv0-real to vg0-lv0.

lvremove vg0/snap1

# Doc

http://www.softpanorama.org/Internals/Unix_filesystems/snapshots.shtml

https://www.clevernetsystems.com/lvm-snapshots-explained/

 


Clone LV

 

Method 1

Old: /dev/sda (10G)  --> mydata (myvgB)

New: /dev/sdc (20G)  --> mydata (myvgC)

CURRENT_LE=2000  # get exact "Current LE" value from lvdisplay

# create the new PV, VG and LV

parted -a optimal /dev/sdc mklabel gpt mkpart p1 ext4 0% 100%

pvcreate /dev/sdc1

vgcreate myvgC /dev/sdc1

lvcreate -n mydata -l $CURRENT_LE myvgC

# umount

umount /dev/mapper/myvgB-mydata

# Clone

dd if=/dev/mapper/myvgB-mydata of=/dev/mapper/myvgC-mydata bs=4M

# resize

lvresize -l +100%FREE /dev/mapper/myvgC-mydata       # grow the LV to use all of the new, larger VG

fsck.ext4 -f -y /dev/mapper/myvgC-mydata

resize2fs /dev/mapper/myvgC-mydata

 


Troubleshooting

 

[1] Whole disk as PV

pvcreate /dev/sda

Can't open /dev/sda exclusively.  Mounted filesystem?

None of these turned up anything useful:

  • mount | grep sda
  • lsof /dev/sda
  • pvcreate -vvvv /dev/sda

---------

OS Multipath is "stealing" and trying to make another failover layer/device ...

  • dmsetup ls           # nothing suspicious in the output

----------

The device was being held open by md

cat /proc/mdstat | grep sda

mdadm -S /dev/md127

 


Performance Testing

 

SATA: WDC WD2002FAEX-007BA0

Type

device  131M
linear  130M
raid1   133M
Stripe  262M
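
One way figures like these could be measured: sequential reads with dd and direct I/O (a sketch; the LV names are hypothetical and results will vary):

# raw device
dd if=/dev/sdb of=/dev/null bs=1M count=4096 iflag=direct

# linear / raid1 / striped LVs
dd if=/dev/myraidvg/linearlv of=/dev/null bs=1M count=4096 iflag=direct
dd if=/dev/myraidvg/raid1lv  of=/dev/null bs=1M count=4096 iflag=direct
dd if=/dev/myraidvg/stripelv of=/dev/null bs=1M count=4096 iflag=direct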

 


P.S.

 

  • resize2fs       # ext4
  • xfs_growfs    # xfs

 


Cheat List

 

pvs -a

lvs -v

vgs -o +devices

 

 
