Updated: 2020-07-26
Table of Contents
- Basics: Creating an LV
- Block Device Backup - Snapshot
- Capacity Expansion - Xextend
- Disk Replacement - pvmove
- Migration
- Removing an LV and a VG
- Mounting a Foreign LVM
- lvs
- Mirror
- Mirror Repair
- RAID
- RAID Repair
- Stripe Volume
- Striped LV to Linear LV
- Config Backup
- Thinly-Provisioned Logical Volumes
- Common LV Operations
- Remove PV from VG
- Attr
- LVM Snapshot Internals
- Clone LV
- Troubleshoot
- Performance Testing
- P.S.
- Cheat List
Introduction
LVM is indispensable in commercial environments; it is used for disk backup, disk replacement, capacity expansion, and more.
LVM works at the block level.
Key features
- Online data relocation
- Flexible capacity
- Disk striping
- Mirroring volumes
- Volume Snapshots
It is made up of three parts:
- PV
- VG
- LV
LVM itself supports:
- RAID4/5/6
- Linear
- Striped
- Mirrored
DOC
- https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6...
- https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7...
Basics: Creating an LV
Turn a disk into a PV
pvcreate /dev/sda # turns the whole disk into a PV
pvcreate /dev/sdb1 # turns a single partition into a PV ( Partition ID: 0x8e )
!! WARNING !! This destroys the data on the disk
Create a VG and add the PVs to it
vgcreate vzvg /dev/hda1 /dev/hdb1
# creates VG vzvg and adds PV /dev/hda1 and PV /dev/hdb1 to it
# if hda1 and hdb1 each have 10G, vzvg will have 20G
Create a 1.5GB LV
lvcreate -L1500 -n mail vzvg
# the new LV lives at /dev/mapper/vzvg-mail
# the naming scheme is GroupName-VolumeName
# the system automatically creates /dev/vzvg/mail -> /dev/mapper/vzvg-mail
# from this point on, vzvg-mail can be used as an ordinary block device
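To actually use it, a minimal follow-up sketch (assuming an ext4 file system and a hypothetical /srv/mail mount point):
mkfs.ext4 /dev/vzvg/mail          # format the new LV (ext4 assumed)
mkdir -p /srv/mail                # hypothetical mount point
mount /dev/vzvg/mail /srv/mail    # use it like any other block device
df -h /srv/mail                   # verify the size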
Review Info.
vgs
VG   #PV #LV #SN Attr   VSize   VFree
myvg   1   4   0 wz--n- 793.31g  73.31g
vg3t   2   5   0 wz--n-   5.46t 613.00g
lvs
LV     VG   Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert
kvm    myvg -wi-ao---- 400.00g
backup vg3t -wi-ao---- 500.00g
Opts
--units hHbBsSkKmMgGtTpPeE
Zero out empty space in a Volume Group
# Create a volume that consumes the empty space, then dd with zero
vgs --units m
VG   #PV #LV #SN Attr   VSize      VFree
myvg   1   3   0 wz--n- 812352.00m 382272.00m
lvcreate -L 382272m -n emptyspace myvg
# don't use dd here, because it would hurt the other VPSes
# dd if=/dev/zero of=/dev/mapper/myvg-emptyspace bs=16M
pv -L 50m /dev/zero > /dev/mapper/myvg-emptyspace
lvremove /dev/mapper/myvg-emptyspace
Get lvm info
lvm version
LVM version:     2.02.133(2) (2015-10-30)
Library version: 1.02.110 (2015-10-30)
Driver version:  4.37.0
# Displays the recognized built-in block device types
lvm devtypes
DevType  MaxParts Description
aoe            16 ATA over Ethernet
ataraid        16 ATA Raid
bcache          1 bcache block device cache
...
lvm formats
lvm1 pool lvm2
lvm segtypes
striped zero error snapshot mirror raid1 raid10 ...
Activate / Deactivate a VG
lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
pc_data st3t -wi-a----- 1.00t
Deactivate the VG
vgchange -a n st3t
Logical volume st3t/pc_data in use. Can't deactivate volume group "st3t" with 1 open logical volume(s)
Because LV pc_data inside VG st3t is mounted, we cannot deactivate it.
After umounting it, deactivate the VG
lvs
pc_data st3t -wi------- 1.00t
P.S.
After the VG is deactivated, /dev/mapper/st3t-* disappears; use "vgchange -a y st3t" to make it reappear
Other VG operations
Rename VG
vgrename OldVolumeGroup NewVolumeGroup
i.e.
vgs
VG   #PV #LV #SN Attr   VSize   VFree
myvg   1   4   0 wz--n- 793.31g 323.31g
st3t   1   4   0 wz--n-   2.73t 196.52g
vgrename st3t vg3t
Block Device Backup - Snapshot
Create a snapshot:
# the 5G of free space is used as a write buffer (COW space)
# when it fills up, the snapshot is dropped and you lose the old versions of the files (the content of the snapshot) !!
lvcreate -L 5G -s -n my_snapshot_name /path/to/lv
P.S.
Snapshots are writable. After mounting the snapshot at /mnt/tmp, we can write to it.
However, anything newly written to either the origin or the snapshot consumes "Allocated to snapshot" space.
Deleting a newly created file does not release "Allocated to snapshot" space.
How it works
When a change is made to the original device after a snapshot is taken,
the snapshot feature makes a copy of the changed data area as it was prior to the change
so that it can reconstruct the state of the device.
When you create a snapshot file system, full read and write access to the origin stays possible.
If a chunk on a snapshot is changed, that chunk is marked and never gets copied from the original volume.
the snapshot contains the old data, while the LV holds the current data.
Overfilling the snapshot will simply mean no more old data is saved as it's changed
Inspect the snapshot's details:
lvdisplay [-c] <-- -c prints each LV as a single colon-separated line
i.e.
lvdisplay /dev/mapper/myraidvg-snap_mytestlw
--- Logical volume ---
LV Path                /dev/myraidvg/snap_mytestlw
LV Name                snap_mytestlw
VG Name                myraidvg
LV UUID                yoOJsN-6ASk-fj3c-vU4H-uczN-5uLw-N51z6O
LV Write Access        read/write
LV Creation host, time server, 2018-01-03 14:57:32 +0800
LV snapshot status     active destination for mytestlw
LV Status              available
# open                 1
LV Size                10.00 GiB
Current LE             2560
COW-table size         1.00 GiB
COW-table LE           256
Allocated to snapshot  0.00%
Snapshot chunk size    4.00 KiB
Segments               1
Allocation             inherit
Read ahead sectors     auto
- currently set to     256
Block device           254:10
lvs | grep MyBackup.Snap
MyBackup.Snap myraidvg swi-a-s--- 1.00t MyBackup 0.10
* 'Allocated to snapshot' is only updated after a sync
When it is full:
mount | grep snap
the mount is gone
dmesg
[78560.181614] device-mapper: snapshots: Invalidating snapshot: Unable to allocate exception.
lvdisplay /dev/mapper/myraidvg-snap_mytestlw
LV snapshot status INACTIVE destination for mytestlw
Re-mounting will only fail again
[78662.630747] Buffer I/O error on dev dm-10, logical block 2621424, async page read
[78662.630951] Buffer I/O error on dev dm-10, logical block 16, async page read
[78662.631055] EXT4-fs (dm-10): unable to read superblock
[78662.631150] EXT4-fs (dm-10): unable to read superblock
[78662.631240] EXT4-fs (dm-10): unable to read superblock
Remove the snapshot:
# removing a snapshot simply discards its COW data; nothing is written back to the origin (that would be lvconvert --merge)
# -y|--yes Do not prompt for confirmation interactively
lvremove -y /path/to/snapshot
Remark:
[1]
* After taking a snapshot, writes become very slow, so an LV should not be kept in the snapshotted state for long
LVM first makes a copy of the original version which is stored in the snapshot,
and then the modification is written on the normal Logical-Volume.
So the normal Logical-Volume always contains the latest version of the data and the snapshot only contains a copy of the blocks which have been modified.
After a snapshot is created, file system performance drops, because every write triggers two writes: one to back up the old version (into the COW area) and one to write the new version
Checking
dstat -d -D sdb
--dsk/sdb--
 read  writ
   0    32M
   0    31M
   0    31M
   0    33M
   0    37M
   0    36M
  61M   19M
 124M     0
 124M     0
  73M   64M
  27M   93M
   0   107M
   0    76M
   0    41M
   0    46M
   0    37M
   0    30M
* If the snapshot logical volume becomes full it will be dropped
so it is vitally important to allocate enough space.
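A sketch of the typical snapshot-based backup flow, assuming the origin LV is /dev/vzvg/mail and that /mnt/snap and /root/mail-backup.tar.gz are hypothetical paths:
lvcreate -L 5G -s -n mail_snap /dev/vzvg/mail    # freeze a point-in-time view
mkdir -p /mnt/snap
mount -o ro /dev/vzvg/mail_snap /mnt/snap        # mount the snapshot read-only
tar czf /root/mail-backup.tar.gz -C /mnt/snap .  # back up the frozen view
umount /mnt/snap
lvremove -y /dev/vzvg/mail_snap                  # drop the snapshot as soon as the backup is done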
Move an LV to another VG
# move LV icy from VG myvg to VG vg3t
lvdisplay --units m /dev/myvg/icy
lvcreate --name icy --size 30720.00 vg3t
# if the VPS is not running, this step can be skipped
lvcreate --snapshot --name icy-snap --size 1G /dev/myvg/icy
# pause and double-check before continuing
dd if=/dev/myvg/icy-snap of=/dev/vg3t/icy bs=16M
# Cleanup
lvremove /dev/myvg/icy-snap
lvremove /dev/myvg/icy
Capacity Expansion - Xextend
Ways to expand
- Add a new PV to the VG
- Resize Disk
Add a new PV to the VG
pvcreate /dev/sdc1
blkid /dev/sdc1
/dev/sdc1: UUID="..." TYPE="LVM2_member" PARTLABEL="myraidvg-b" PARTUUID="..."
pvs
PV VG Fmt Attr PSize PFree
/dev/sdb1 myraidvg lvm2 a-- 1.82t 1.33t
/dev/sdc1 lvm2 --- 1.82t 1.82t
vgextend myraidvg /dev/sdc1
Volume group "myraidvg" successfully extended
pvs
PV VG Fmt Attr PSize PFree
/dev/sdb1 myraidvg lvm2 a-- 1.82t 1.33t
/dev/sdc1 myraidvg lvm2 a-- 1.82t 1.82t
Resize Disk
echo 1 > /sys/block/sda/device/rescan # has no effect for virtio-blk
parted /dev/vda print # Verify
pvresize /dev/sda2 # Expand a PV after enlarging the partition
Increase the size of an LV
lvextend -L12G /dev/vzvg/homevol # set an absolute size directly
OR
lvextend -L+1G /dev/vzvg/homevol # grow by 1G
OR
lvextend -l+256 /dev/vzvg/homevol # size in units of logical extents
Percentage
Only -l supports %STRING; -L does not.
# %VG percentage of the total space in the VG
# %FREE remaining free space in the VG
lvextend -l100%VG /dev/vzvg/homevol # useful when the VG holds only one LV
lvextend -l+100%FREE /dev/vzvg/homevol
The "+" must be included; luckily lvextend checks the size !!
New size given (2571 extents) not larger than existing size (2935 extents)
Increase the file system size
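A sketch for ext4 (use xfs_growfs for xfs), reusing the homevol example above; the file system is assumed to be ext4 and can be grown online:
lvextend -L+1G /dev/vzvg/homevol      # grow the LV first
resize2fs /dev/vzvg/homevol           # then grow ext4 online to fill the LV
# or do both in one step:
lvextend -r -L+1G /dev/vzvg/homevol   # -r/--resizefs calls fsadm to resize the fs as well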
Shrink the size
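Shrinking goes in the opposite order (file system first, then LV) and ext4 must be offline; a sketch assuming homevol is ext4, /home is its hypothetical mount point, and 8G is the target size:
umount /home                          # ext4 cannot be shrunk while mounted
e2fsck -f /dev/vzvg/homevol           # resize2fs requires a clean fsck before shrinking
resize2fs /dev/vzvg/homevol 8G        # shrink the file system first
lvreduce -L 8G /dev/vzvg/homevol      # then shrink the LV to the same size
mount /dev/vzvg/homevol /home
# xfs cannot be shrunk; lvreduce -r can also drive the fs resize via fsadm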
Disk Replacement - pvmove
vgextend vzvg /dev/sda1
pvmove /dev/hda1
vgreduce vzvg /dev/hda1
Explanation:
pvmove SourcePV
pvmove SourcePV DestinationPV
# move allocated physical extents on Source to one or more other PVs
# moves all PEs used by simple LVs on /dev/hda1 to free PEs elsewhere in the VG
P.S.
If pvmove gets interrupted for any reason, run "pvmove" again without any PV arguments
to restart any moves that were in progress from the last checkpoint
pvmove opt:
--atomic # Make the entire operation atomic.
-i, --interval sec # Report progress as a percentage at regular intervals.
-b, --background # Run the daemon in the background.
More examples:
# moves all allocated space off sdd1, reports the progress (%) every 5 sec
pvmove -i5 /dev/sdd1
# sdc1 -> sdf1 in background
pvmove -b /dev/sdc1 /dev/sdf1
vgreduce vzvg /dev/hda1
# remove one or more unused PV from a VG
# vgreduce VG PVPath...
Migration
Old machine:
# make inactive VG unknown to the system
# You can then move all the PV in that VG to a different system for later vgimport
# vgexport clears the VG system ID
vgexport vzvg
New machine:
# vgimport sets the VG system ID
vgimport vzvg
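A sketch of the whole sequence around vgexport/vgimport, assuming the VG is vzvg, the LV and mount names are hypothetical, and the disks are physically moved between machines:
# Old machine
umount /mnt/data                 # unmount everything in the VG (hypothetical mount point)
vgchange -a n vzvg               # deactivate the VG
vgexport vzvg                    # mark it as exported
# ... move the disks to the new machine ...
# New machine
pvscan                           # the PVs show up as exported
vgimport vzvg                    # import the VG
vgchange -a y vzvg               # activate it
mount /dev/vzvg/data /mnt/data   # hypothetical LV and mount point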
Removing an LV and a VG
Remove an LV:
# lvremove removes one or more logical volumes.
# Confirmation will be requested before deactivating any active logical volume prior to removal.
# Removing an origin logical volume will also remove all dependent snapshots.
# remove by full path
lvremove /dev/vzvg/lamp
# remove vol1 in vg00
lvremove -f vg00/vol1
# Remove all logical volumes in volume group vg00:
lvremove vg00
Remove a VG:
vgchange -a n vzvg
vgremove vzvg
Mounting a Foreign LVM
1. pvscan
When called without the --cache option, pvscan lists PVs on the system
pvscan
/dev/sda2 Total: 1 [465.26 GiB] / in use: 1 [465.26 GiB] / in no VG: 0 [0 ]
Opts
- -t|--test # Run in test mode. Commands will not update metadata.
- -v|--verbose # Set verbose level. (-v ~ -vvvv)
# This first clears all existing PV online records,
# then scans all devices on the system,
# adding PV online records for any PVs that are found.
pvscan --cache device
If device is present, lvm adds a record that the PV on device is online.
If device is not present, lvm removes the online record for the PV.
2. vgscan [-v]
scans all SCSI, (E)IDE disks, multiple devices and a bunch of other disk devices in the system looking for LVM physical volumes and volume groups.
output:
Reading all physical volumes. This may take a while...
Found volume group "myvg" using metadata type lvm2
3. lvscan
inactive          '/dev/myvg/swap' [7.00 GiB] inherit
inactive          '/dev/myvg/root' [96.00 GiB] inherit
inactive          '/dev/myvg/data' [346.27 GiB] inherit
4. activate the volume group
vgchange -a y
This is the most important step; the previous three steps only inspect things.
5. mount
mount /dev/VolGroup00/LogVol00 /mnt
lvs
lvs - report information about logical volumes
[-a|--all]
[-o|--options [+]Field[,Field]]
[-v|--verbose]
i.e.
lvs -a -o name,copy_percent,devices vg00
Mirror (legacy approach)
LVM mirroring is done across PVs (the legs are kept in sync).
Opts:
# -m 1 ( 2 copies )
# -m 2 ( 3 copies )
# create a mirror; default mirror region size: 512KB (-R x , unit: MB)(lvm.conf: mirror_region_size=512 )
i.e. create a 10G mirror LV on vg0 (vg0 needs at least 2 legs)
lvcreate -L 10G -m 1 -n mirrorlv vg0
--nosync
# when creating a new mirror LV, the --nosync option skips the initial synchronization
Convert into a mirror
To convert an existing LV into a mirror LV, the "-b" option can be used
The conversion (data sync) then runs in the background
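A sketch of that conversion, assuming the vg0/lvol1 volume used in the surrounding examples:
lvconvert -b -m 1 vg0/lvol1   # add one mirror leg; the resync runs in the background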
Convert a mirror back into a regular LV
lvconvert -m 0 vg0/lvol1
LOG:
# LVM maintains a small log which it uses to keep track of which regions are in sync
--mirrorlog {disk|core|mirrored}
# --corelog is equivalent to --mirrorlog core
--mirrorlog core => log kept in memory <-- resynchronized at every reboot
# create a mirrored log (the log itself is mirrored)
--mirrorlog mirrored
# Default: the mirror log is placed on a disk that does not take part in the mirror; the option below allows the log to sit on one of the legs.
--alloc anywhere <-- degrades performance
# inspect the mirror's structure
lvs -a -o +devices
...
- _mimage_0
- _mimage_1
- _mlog
(Advanced) Specify which legs hold the data and which device holds the log
# lvcreate ... data_leg1 data_leg2 log_leg
lvcreate -L 500M -m1 -n mirrorlv vg0 /dev/sda1 /dev/sdb1 /dev/sdc1
Mirror Repair
# brought up in degraded mode (disk failed)
vgchange -ay --partial vg0
# replace disk
* When a failure occurs, LVM converts the mirror into a single linear volume.
# suppose vg0 contains sda1 and sdb1, and sdb1 has now failed
# add sdc1 to vg0
vgextend vg0 /dev/sdc1
# When replacing devices that are no longer visible on the system, use lvconvert --repair
lvconvert --repair vg0/lv0 /dev/sdc1
vgreduce --removemissing --test
vgreduce --removemissing vg0
Check:
lvs -a -o +devices
mirror_image_fault_policy and mirror_log_fault_policy
When an LVM mirror suffers a device failure, a two-stage recovery takes place
first stage: remove the failed devices; the mirror is reduced to a linear device
second stage: mirror_?_fault_policy
* remove
* allocate
Splitting Off a Redundant Image of a Mirrored Logical Volume
# --splitmirrors specifies the number of redundant images to split off;
# this splits off a new logical volume named copy from the mirrored logical volume vg/lv
lvconvert --splitmirrors 2 --name copy vg/lv
# specify which legs are split off: the new logical volume will contain two mirror legs
# consisting of devices /dev/sdc1 and /dev/sde1
lvconvert --splitmirrors 2 --name copy vg/lv /dev/sd[ce]1
Combine striping and mirroring in a single logical volume
--mirrors X --stripes Y
Remark
* LVM is not safe in a power failure
RAID
LVM supports RAID1/4/5/6
The new mirroring implementation is raid1 (the old one is called mirror)
- It maintains a fully redundant bitmap area for each mirror image
  (no --mirrorlog or --corelog option)
- It supports snapshots
raid1 vs. mirror
- raid1 does not require I/O to be blocked while handling a failure
- raid1 is implemented using MD, mirror using DM
iotop during a resync
Total DISK READ:   0.00 B/s | Total DISK WRITE:   0.00 B/s
Current DISK READ: 0.00 B/s | Current DISK WRITE: 3.76 K/s
  TID  PRIO  USER   DISK READ   DISK WRITE  SWAPIN     IO>    COMMAND
44772  be/4  root    0.00 B/s    0.00 B/s   0.00 %  17.19 %  [mdX_resync]
44771  be/4  root    0.00 B/s    0.00 B/s   0.00 %   5.03 %  [mdX_raid1]
Help
# Centos
man 7 lvmraid
Show RAID Level
lvs -o name,segtype
LV        Type
proxy     linear
data_disk raid1
# Structure
SubLVs
SubLVs hold LV data blocks, RAID parity blocks, and RAID metadata.
SubLVs are generally hidden, so the lvs -a option is required to display them
SubLV names begin with the visible LV name, and have an automatic suffix indicating its role
(_rimage_N, _rmeta_N)
When you create a RAID logical volume, LVM creates a metadata subvolume
that is one extent in size for every data or parity subvolume in the array.
metadata subvolumes
- lv_rmeta_0
- lv_rmeta_1
data subvolumes
- lv_rimage_0
- lv_rimage_1
# Usage
# -m|--mirrors Number
# Specifies the number of mirror images in addition to the original LV image
# --type raid1 | mirror
# There are two mirroring implementations
lvcreate --type raid1 -m 1 -L 1G -n my_lv my_vg
# Common checking commands
# lvs ... VolumeGroup
lvs -a -o name,copy_percent,devices vg3t | grep vm_admin_data
LV Cpy%Sync Devices
vm_admin_data 10.62 vm_admin_data_rimage_0(0),vm_admin_data_rimage_1(0)
[vm_admin_data_rimage_0] /dev/sda(390144)
[vm_admin_data_rimage_1] /dev/sdf1(1)
[vm_admin_data_rmeta_0] /dev/sda(665088)
[vm_admin_data_rmeta_1] /dev/sdf1(0)
# Converting
# Linear -> raid1
# --type: convert a logical volume to another segment type (cache, cache-pool, raid1, snapshot, thin, or thin-pool)
lvconvert --type raid1 -m 1 my_vg/my_lv
* If the metadata image that pairs with the original LV cannot be placed on the same PV, the lvconvert will fail.
Notes: place the added RAID image on a specific PV (sdd1)
lvconvert -m +1 my_vg/my_lv /dev/sdd1
# mirror -> raid1
lvconvert --type raid1 my_vg/my_lv
# raid1 -> Linear
lvconvert -m0 my_vg/my_lv
# specifies that you want to remove /dev/sda1
lvconvert -m0 my_vg/my_lv /dev/sda1
# More RAID copies
- -m 1 => 2-way
- -m 2 => 3-way
# lvconvert -m new_absolute_count vg/lv [removable_PVs]
# lvconvert -m +num_additional_images vg/lv [removable_PVs]
i.e.
lvconvert -m 2 my_vg/my_lv
# Resize
lvextend -L+100G /dev/vg3t/data_disk
Extending 2 mirror images.
Size of logical volume vg3t/data_disk changed from 1.00 TiB (262144 extents) to 1.10 TiB (287744 extents).
Logical volume data_disk successfully resized.
The volume group loses 200G of free space (100G per mirror image)
lvs -a -o name,copy_percent,devices | grep data_disk
data_disk             93.05 data_disk_rimage_0(0),data_disk_rimage_1(0)
[data_disk_rimage_0]        /dev/sda(0)
[data_disk_rimage_0]        /dev/sda(665091)   # the newly added extents
[data_disk_rimage_1]        /dev/sdb1(12802)
[data_disk_rimage_1]        /dev/sdb1(537091)
[data_disk_rmeta_0]         /dev/sda(665089)
[data_disk_rmeta_1]         /dev/sdb1(12801)
# Replacing a RAID device
# Remove the specified device PhysicalVolume and replace it with one that is available in the VG,
# or from a specific list of PVs specified on the command line following the LV name.
# for RAID types other than RAID1, removing a device would mean converting to a lower level RAID
lvconvert --replace disk_to_remove vg/lv [possible_replacements]
i.e.
# remove /dev/sdb2 from my_lv in my_vg
lvconvert --replace /dev/sdb2 my_vg/my_lv
# replace sdb1 with sdd1
lvconvert --replace /dev/sdb1 my_vg/my_lv /dev/sdd1
Scrubbing a RAID (man 7 lvmraid)
Scrubbing assumes that RAID metadata and bitmaps may be inaccurate,
so it verifies all RAID metadata, LV data, and parity blocks.
check mode
only report the number of inconsistent blocks, it cannot report which blocks are inconsistent. (read-only)
repair mode
make the RAID LV data consistent,
but it does not know which data is correct => may be consistent but incorrect data
When the copies disagree, it chooses the block from the device that would be used during RAID initialization.
Usage
lvchange --syncaction {check|repair} vg/raid_lv
i.e.
lvchange --syncaction check vg3t/xpenology
dmesg
... md: data-check of RAID array mdX
... md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
... md: using maximum available idle IO bandwidth (but not more than 40960 KB/sec) for data-check.
... md: using 128k window, over a total of 524288000k.
Show status
lvs -o +raid_sync_action,raid_mismatch_count vg/lv
i.e.
lvs -o +raid_sync_action,raid_mismatch_count myraidvg/kvm
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert SyncAction Mismatches
kvm myraidvg rwi-aor--- 500.00g 0.12 check 0
# SyncAction (displays the current synchronization operation)
- idle: All sync operations complete (doing nothing)
- resync: Initializing an array or recovering after a machine failure
- recover: Replacing a device in the array
- check: Looking for array inconsistencies
- repair: Looking for and repairing inconsistencies
# Cpy%Sync (progress of any of the raid_sync_action operations)
# Mismatches (number of discrepancies found during a check operation)
# lv_attr
(m)ismatches:
shown after a scrubbing operation has detected that portions of the RAID are not coherent
(r)efresh
indicates that a device in a RAID array has suffered a failure and the kernel regards it as failed
Rebuild specific PV (--rebuild PV)
# If specific PVs in a RAID LV are known to have corrupt data
# The data on those PVs can be reconstructed with:
i.e.
lvchange --rebuild PV_BAD LV
Limit Sync Speed
# control the rate at which sync operations
# Default Unit: kiB/sec/device <-- per device
--maxrecoveryrate Rate[bBsSkKmMgG]
--minrecoveryrate Rate[bBsSkKmMgG]
ie.
lvchange --maxrecoveryrate 50m myraidvg/kvm
RAID1 Tuning
lvchange --writemostly PhysicalVolume[:{t|y|n}]
Marks a device(PV) in a RAID1 LV as write-mostly.
All reads to these drives will be avoided unless necessary.
(Setting this parameter keeps the number of I/O operations to the drive to a minimum.)
lvchange --writebehind IOCount
maximum number of outstanding writes that are allowed to devices in a RAID1 LV that are marked as write-mostly.
Once this value is exceeded, writes become synchronous
(causing all writes to the constituent devices to complete before the array signals the write has completed)
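A concrete sketch of the two tunables above, assuming a RAID1 LV my_vg/my_lv with /dev/sdb1 as the slower leg (names are illustrative):
lvchange --writemostly /dev/sdb1:y my_vg/my_lv   # reads avoid sdb1 unless necessary
lvchange --writebehind 256 my_vg/my_lv           # beyond 256 outstanding writes, writes become synchronous
lvs -a -o name,lv_attr my_vg                     # verify the change took effect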
[Setting a RAID fault policy]
lvchange -ay --activationmode complete|degraded|partial LV
complete
The LV is only activated if all devices are present.
degraded
The LV is activated with missing devices
if the RAID level can tolerate the number of missing devices without LV data loss.
lvm.conf
raid_fault_policy
allocate
attempt to replace the failed device with a spare device from the volume group.
warn
produce a warning and the log (/var/log/messages) will indicate that a device has failed
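For reference, a minimal lvm.conf fragment (a sketch; check your distribution's lvm.conf for the exact section and defaults):
# /etc/lvm/lvm.conf
activation {
    # "warn"     - only log the failure
    # "allocate" - try to replace the failed device with a spare PV from the VG
    raid_fault_policy = "allocate"
}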
After the failed PV has been replaced
even though the failed device has been replaced, the display still indicates that LVM could not find the failed device.
To remove the failed device from the volume group, you can execute
vgreduce --removemissing VG
[Split]
# Temporarily split off (--trackchanges)
lvconvert --splitmirrors count --trackchanges vg/lv [removable_PVs]
# RAID1 can only split off one image, hence "--splitmirrors 1"; "vg/lv" is the LV to split
# An image of a RAID1 array for read-only
i.e.
lvconvert --splitmirrors 1 --trackchanges vg3t/vm_admin_data
vm_admin_data_rimage_1 split from vm_admin_data for read-only purposes.
Use 'lvconvert --merge vg3t/vm_admin_data_rimage_1' to merge back into vm_admin_data
lvs -a -o name,copy_percent,devices vg3t | grep vm_admin_data
LV                       Cpy%Sync Devices
vm_admin_data            100.00   vm_admin_data_rimage_0(0),vm_admin_data_rimage_1(0)
[vm_admin_data_rimage_0]          /dev/sda(390144)
vm_admin_data_rimage_1            /dev/sdf1(1)      # no longer [vm_admin_data_rimage_1]
[vm_admin_data_rmeta_0]           /dev/sda(665088)
[vm_admin_data_rmeta_1]           /dev/sdf1(0)
# Temporarily -> permanently split off
# without --trackchanges
lvconvert --splitmirrors count -n splitname vg/lv [removable_PVs]
ie.
lvconvert --splitmirrors 1 -n newVol vg3t/vm_admin_data
# Merge
# When you merge the image,
# (only the portions of the array that have changed since the image was split are resynced)
lvconvert --merge raid_image
i.e.
# vm_admin_data_rimage_1 is the side that was split off
lvconvert --merge vg3t/vm_admin_data_rimage_1
vg3t/vm_admin_data_rimage_1 successfully merged back into vg3t/vm_admin_data
Find the PV(s) that hold an LV in LVM
# Display the mapping of physical extents to logical volumes and logical extents.
lvdisplay -m /dev/mapper/myraidvg-mytestlw
....
--- Segments ---
Logical extents 0 to 2559:
Type linear
Physical volume /dev/sdb1
Physical extents 128001 to 130560
lvdisplay -m /dev/mapper/myraidvg-kvm
--- Segments ---
Logical extents 0 to 127999:
  Type                raid1
  Monitoring          monitored
  Raid Data LV 0
    Logical volume    kvm_rimage_0
    Logical extents   0 to 127999
  Raid Data LV 1
    Logical volume    kvm_rimage_1
    Logical extents   0 to 127999
  Raid Metadata LV 0  kvm_rmeta_0
  Raid Metadata LV 1  kvm_rmeta_1
RAID Repair
Situation: one of the HDDs in an LVM RAID1 keeps hanging. It is believed to be failing.
1) Remove it at the OS level
echo 1 > /sys/block/sdb/device/delete
ls /dev/sdb
ls: cannot access '/dev/sdb': No such file or directory
2) Remove it at the logical volume level
lvs -a -o name,copy_percent,devices
WARNING: Device for PV SYq1Im-wF7f-QsOX-voEv-kNQO-2IcP-eqe2gN not found or rejected by a filter.
LV             Cpy%Sync Devices
kvm            100.00   kvm_rimage_0(0),kvm_rimage_1(0)
[kvm_rimage_0]          [unknown](1)
[kvm_rimage_1]          /dev/sdc1(1)
[kvm_rmeta_0]           [unknown](0)
[kvm_rmeta_1]           /dev/sdc1(0)
vgreduce --removemissing myraidvg
WARNING: Device for PV SYq1Im-wF7f-QsOX-voEv-kNQO-2IcP-eqe2gN not found or rejected by a filter.
WARNING: Partial LV kvm needs to be repaired or removed.
WARNING: Partial LV kvm_rimage_0 needs to be repaired or removed.
WARNING: Partial LV kvm_rmeta_0 needs to be repaired or removed.
There are still partial LVs in VG myraidvg.
To remove them unconditionally use: vgreduce --removemissing --force.
WARNING: Proceeding to remove empty missing PVs.
vgreduce --removemissing --force myraidvg
WARNING: Device for PV SYq1Im-wF7f-QsOX-voEv-kNQO-2IcP-eqe2gN not found or rejected by a filter.
Wrote out consistent volume group myraidvg.
lvs -a -o name,copy_percent,devices
LV             Cpy%Sync Devices
kvm            100.00   kvm_rimage_0(0),kvm_rimage_1(0)
[kvm_rimage_0]
[kvm_rimage_1]          /dev/sdb1(1)
[kvm_rmeta_0]
[kvm_rmeta_1]           /dev/sdb1(0)
3) Convert from RAID1 to linear
lvconvert -m0 myraidvg/kvm
LV  Cpy%Sync Devices
kvm          /dev/sdb1(1)
4) Convert it back to RAID1
* the VG (myraidvg) contains another PV (sdc)
lvconvert --type raid1 -m1 myraidvg/kvm
lvs -a -o name,copy_percent,devices
LV             Cpy%Sync Devices
kvm            10.78    kvm_rimage_0(0),kvm_rimage_1(0)
[kvm_rimage_0]          /dev/sdb1(1)
[kvm_rimage_1]          /dev/sdc1(1)
[kvm_rmeta_0]           /dev/sdb1(0)
[kvm_rmeta_1]           /dev/sdc1(0)
Remark
# When replacing devices that are no longer visible on the system
lvconvert --repair LV [NewPVs]
# When replacing devices that are still visible
lvconvert --replace OldPV LV [NewPV]
Stripe Volume
# Striped LV across 2 PV with a stripe of 64 kilobytes
# -I|--stripesize StripeSize (Default: 64.00 KiB)
# -i|--stripes Stripes # specify how many devices to stripe over
lvcreate -L 10G -i 2 -I 64 -n TestStripe myraidvg
Info.
-m, --maps # Display the mapping of logical extents to physical volumes and physical extents.
lvdisplay -m /dev/mapper/myraidvg-TestStripe
--- Segments ---
Logical extents 0 to 2559:
  Type               striped
  Stripes            2
  Stripe size        64.00 KiB
  Stripe 0:
    Physical volume  /dev/sdb1
    Physical extents 130561 to 131840
  Stripe 1:
    Physical volume  /dev/sdc1
    Physical extents 128001 to 129280
# As with linear volumes, you can specify the extents of the physical volume that you are using for the stripe.
# i.e. 100 extents
lvcreate -l 100 -i 2 -n stripelv testvg /dev/sda1:0-49 /dev/sdb1:50-99
Striped LV to Linear LV
* There is no direct command to convert a striped LV to a linear LV.
* First you need to convert it to a mirrored LV and then to a Linear LV.
* Add new disk enough large
lvs -a -o +devices
pvcreate /dev/sdc1
vgextend stripevg /dev/sdc1
vgs
# using physical extents sdc1 for allocation of new extents
lvconvert -m 1 stripevg/stripelv /dev/sdc1
lvs -a -o +devices
# freeing physical extents from sda1 sdb1
lvconvert -m 0 stripevg/stripelv /dev/sda1 /dev/sdb1
lvs -a -o +devices
Linear LV to striped LV
lvconvert -m 1 vg3t/backup
# striped mirror
# The number to the --stripes option must be equal to the number of PVs listed.
# This does not apply to existing allocated space, only newly allocated space can be striped.
lvconvert -m 1 --stripes 2 /dev/vgtest/lvol0 /dev/sdb1 /dev/sdc1
Config Backup
vgcfgbackup — backup volume group descriptor area
Default: all VGs are backed up
Backup Path: /etc/lvm/backup
* Metadata backups and archives are automatically created on every volume group and logical volume configuration change
unless disabled in the lvm.conf
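A small sketch of taking a manual metadata backup and restoring it (vzvg is an assumed VG name; the LVs must be inactive before a restore):
vgcfgbackup vzvg                          # writes /etc/lvm/backup/vzvg
vgcfgrestore --list vzvg                  # list available backup/archive files
vgchange -a n vzvg                        # deactivate the LVs first
vgcfgrestore -f /etc/lvm/backup/vzvg vzvg # restore metadata from the chosen file
vgchange -a y vzvg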
Thinly-Provisioned Logical Volumes
Unlike a normal LV, an LVM thin pool allocates blocks only when they are written.
DOC
man lvmthin
opts:
-T (--thin) # to create either a thin pool or a thin volume.
i.e.
1. Create a thin pool named mythinpool in vg001
lvcreate -L 100M -T vg001/mythinpool
OR
lvcreate -L 100M --thinpool mythinpool vg001
2. Create a thin volume named thinvolume
# a virtual size for the volume that is greater than the pool that contains it
lvcreate -V1G -T vg001/mythinpool -n thinvolume
# checking
lvs
# resizes an existing thinpool
lvextend -L+100M vg001/mythinpool
P.S.
* Converting a logical volume to a thin pool volume destroys the content of the logical volume
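A related sketch, assuming the vg001/thinvolume created above: snapshots of thin volumes also live in the pool and need no fixed -L size:
lvcreate -s --name thinsnap vg001/thinvolume   # thin snapshot, shares the pool with its origin
lvchange -ay -K vg001/thinsnap                 # thin snapshots default to "activation skip"; -K overrides it
lvs vg001                                      # Data% of mythinpool reflects origin + snapshot usage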
Common LV Operations
Rename an LV:
lvrename OldPath NewPath
Example 1:
lvrename /dev/vg02/lvold /dev/vg02/lvnew
lvrename VolumeGroupName OldLogicalVolumeName NewLogicalVolumeName
Example 2:
lvrename vg02 lvold lvnew
Remove PV from VG
vgreduce myraidvg /dev/sdb1
Physical volume "/dev/sdb1" still in use
Checking: pvs
# sdb1 shows "a" (allocated) in the Attr column
PV VG Fmt Attr PSize PFree
/dev/sdb1 myraidvg lvm2 a-- 1.82t 1.33t
/dev/sdc1 myraidvg lvm2 a-- 1.82t 1.33t
Solution
Migrate the data to another physical volume using the pvmove command (see the sketch below).
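Putting the fix together, a sketch that empties sdb1 and then removes it from the VG (assuming the remaining PVs have enough free extents):
pvmove /dev/sdb1               # move all allocated extents off sdb1 to other PVs in the VG
vgreduce myraidvg /dev/sdb1    # now the PV is unused and can be removed from the VG
pvremove /dev/sdb1             # optional: wipe the LVM label from the disk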
Attr
PV attributes
pvs
PV        VG     Fmt  Attr PSize PFree
/dev/vda2 centos lvm2 a--  9.51g    0
(a)llocated # stop allocated: "pvchange -x n /dev/sdd1"
e(x)ported # vgexport uavg
Other Opts
pvs --units k
VG Attributes
vgs
VG     #PV #LV #SN Attr   VSize VFree
centos   1   2   0 wz--n- 9.51g    0
r,w (r)ead & (w)rite permissions
z resi(z)eable
x e(x)ported
p (p)artial
c,l,n,a,i allocation policy
(c)ontiguous,
c(l)ing,
(n)ormal,
(a)nywhere,
(i)nherited
c (c)luster
LV Attributes
lvs
LV   VG     Attr       LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
root centos -wi-ao---- 8.51g
swap centos -wi-ao---- 1.00g
First Field
m (m)irrored
M (M)irrored without initial sync
o (o)rigin
p (p)vmove
s (s)napshot
S invalid (S)napshot
v (v)irtual
i mirror (i)mage
I mirror (I)mage without sync
c under (c)onstruction
- Simple Volume
Second Field
w,r Permissions: (w)rite, (r)ead
Third Field (Allocation policy)
c,l,n,a,i (c)ontiguous, c(l)ing, (n)ormal, (a)nywhere, (i)nherited
Fourth Field
m Fixed (m)inor
Fifth Field
a (a)ctive
s (s)uspended
I (I)nvalid snapshot
S Invalid (S)uspended snapshot
i Mapped device present with (i)nactive table
d Mapped (d)evice present without tables
Sixth Field
o device (o)pen (Volume is in active state or may be mounted )
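A short worked example of reading an attr string character by character, using the root LV from the lvs output above and the fields described in this section:
# lvs -o name,lv_attr
#   root  -wi-ao----
#   1st '-' : simple (linear) volume
#   2nd 'w' : writable
#   3rd 'i' : inherited allocation policy
#   4th '-' : no fixed minor number
#   5th 'a' : active
#   6th 'o' : device open (mounted / in use)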
LVM Snapshot Internals
# Preparation
dd if=/dev/zero of=dummydevice bs=1M count=1024
losetup /dev/loop0 dummydevice
pvcreate /dev/loop0
vgcreate vg0 /dev/loop0
lvcreate -n lv0 -L 400M vg0
# the dm table before taking a snapshot
dmsetup table
vg0-lv0: 0 819200 linear 7:0 2048
# create the snapshot ( name: snap1, size: 200 MB )
lvcreate -s -n snap1 -L 200M /dev/vg0/lv0
# after taking the snapshot
lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
lv0 vg0 owi-a-s--- 400.00m
snap1 vg0 swi-a-s--- 200.00m lv0 0.00
dmsetup table
vg0-lv0-real:  0 819200 linear 7:0 2048
vg0-snap1-cow: 0 409600 linear 7:0 821248
vg0-lv0:       0 819200 snapshot-origin 253:4
vg0-snap1:     0 819200 snapshot 253:4 253:5 P 8
# remove the snapshot
# destroy vg0-snap1-cow, vg0-lv0 and vg0-snap1, and
# rename vg0-lv0-real to vg0-lv0.
lvremove vg0/snap1
# Doc
http://www.softpanorama.org/Internals/Unix_filesystems/snapshots.shtml
https://www.clevernetsystems.com/lvm-snapshots-explained/
Clone LV
Method 1
Old: /dev/sda (10G) --> mydata (myvgB)
New: /dev/sdc (20G) --> mydata (myvgC)
CURRENT_LE=2000 # get exact "Current LE" value from lvdisplay
# create the new PV, VG and LV
parted -a optimal /dev/sdc mklabel gpt mkpart p1 ext4 0% 100%
pvcreate /dev/sdc1
vgcreate myvgC /dev/sdc1
lvcreate -n mydata -l $CURRENT_LE myvgC
# umount
umount /dev/mapper/myvgB-mydata
# Clone
dd if=/dev/mapper/myvgB-mydata of=/dev/mapper/myvgC-mydata bs=4M
# resize
lvresize -l +100%FREE /dev/mapper/myvgC-mydata
fsck.ext4 -f -y /dev/mapper/myvgC-mydata
resize2fs /dev/mapper/myvgC-mydata
Troubleshoot
[1] Whole disk as PV
pvcreate /dev/sda
Can't open /dev/sda exclusively. Mounted filesystem?
These did not reveal anything:
- mount | grep sda
- lsof /dev/sda
- pvcreate -vvvv /dev/sda
---------
OS Multipath is "stealing" and trying to make another failover layer/device ...
- dmsetup ls # showed nothing wrong
----------
It was being held open by md
cat /proc/mdstat | grep sda
mdadm -S /dev/md127
Performance Testing
SATA: WDC WD2002FAEX-007BA0
Type
device  131M
linear  130M
raid1   133M
Stripe  262M
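The figures above look like sequential throughput numbers (MB/s); a rough sketch of how such numbers can be measured with dd, assuming /dev/vg0/testlv is a scratch LV (this overwrites it):
# sequential write, bypassing the page cache
dd if=/dev/zero of=/dev/vg0/testlv bs=1M count=4096 oflag=direct
# sequential read
dd if=/dev/vg0/testlv of=/dev/null bs=1M count=4096 iflag=direct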
P.S.
- resize2fs # ext4
- xfs_growfs # xfs
Cheat List
pvs -a
lvs -v
vgs -o +devices