Backup - ghettoVCB

Last updated: 2019-09-13

Introduction

 

ghettoVCB.sh is a shell script for backing up virtual machines on ESX(i) hosts.

Supported backup media: local storage, SAN, and NFS.

Tested on ESX 3.5/4.x/5.x and ESXi 3.5/4.x/5.x.

1. The script takes a snapshot of each live, running virtual machine,
2. backs up the master VMDK(s), and then, upon completion,
3. deletes the snapshot until the next backup.
4. It can be set up to run via cron.
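
The steps above can be sketched roughly as follows. This is only an illustration of the snapshot, clone, remove-snapshot cycle, not ghettoVCB's actual code; the VM id, the paths, the snapshot name and the helper names `run` and `backup_vm` are all hypothetical, and the argument order of `snapshot.create` (name, description, include memory, quiesce) is assumed. By default the sketch only prints the commands.

```shell
#!/bin/sh
# Sketch of the snapshot -> clone -> remove-snapshot cycle for one VM.
# NOT ghettoVCB's actual code; VM id, paths and snapshot name are hypothetical.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (safe default)

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

backup_vm() {
    vmid="$1"; src_vmdk="$2"; dst_vmdk="$3"

    # 1. Snapshot the running VM (last two args: include memory, quiesce).
    run vim-cmd vmsvc/snapshot.create "$vmid" "sketch-snapshot" "backup" 0 0 || return 1

    # 2. Clone the base disk while new writes go to the snapshot delta.
    run vmkfstools -i "$src_vmdk" "$dst_vmdk" -d thin
    rc=$?

    # 3. Remove the snapshot whether or not the clone succeeded.
    run vim-cmd vmsvc/snapshot.removeall "$vmid"
    return "$rc"
}

backup_vm 1 /vmfs/volumes/datastore1/VM1/VM1.vmdk \
            /vmfs/volumes/backup_disk/VMBackup/VM1/VM1.vmdk
```

Removing the snapshot even when the clone fails matters here: a leftover snapshot would otherwise block the next backup run.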

* Supports VM(s) with existing snapshots
* Dry-run mode
* Quick email status summary
* Simple locking mechanism to ensure only one instance of ghettoVCB runs per host
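
A one-instance-per-host lock like the one mentioned in the last item could look like the sketch below. It is an assumption about the general technique (an atomic `mkdir`), not a copy of how ghettoVCB does it; the lock path and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch of a one-instance-per-host lock. The lock path is hypothetical and
# this is not necessarily how ghettoVCB implements its own lock.

LOCK_DIR="${LOCK_DIR:-/tmp/backup-sketch.lock}"

acquire_lock() {
    # mkdir is atomic: it fails if the directory already exists.
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "$$" > "$LOCK_DIR/pid"
        return 0
    fi
    echo "another instance appears to be running (lock: $LOCK_DIR)" >&2
    return 1
}

release_lock() {
    rm -rf "$LOCK_DIR"
}

if acquire_lock; then
    # ... backup work would go here ...
    release_lock
fi
```

A directory is used instead of a plain file because creating it and testing for its existence happen in a single step, so two instances started at the same moment cannot both succeed.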

Backup VMDK(s) can be written in ZEROEDTHICK (default), 2GB SPARSE, THIN, or EAGERZEROEDTHICK format.

Download: https://github.com/lamw/ghettoVCB

DOC: https://communities.vmware.com/docs/DOC-8760

 


Installation

 

Install the VIB:

esxcli software vib install -v /vghetto-ghettoVCB.vib -f

Once installed, the ghettoVCB configuration files are located in:

/etc/ghettovcb/ghettoVCB.conf
/etc/ghettovcb/ghettoVCB-restore_vm_restore_configuration_template
/etc/ghettovcb/ghettoVCB-vm_backup_configuration_template

Both ghettoVCB and ghettoVCB-restore scripts are located in:

/opt/ghettovcb/bin/ghettoVCB.sh
/opt/ghettovcb/bin/ghettoVCB-restore.sh

 


Configuration (ghettoVCB.conf)

 

# Grant execute permission

chmod +x ghettoVCB.sh

# ghettoVCB.conf

ghettoVCB.conf = global ghettoVCB configuration file

# Defining the backup datastore and folder in which the backups are stored

VM_BACKUP_VOLUME=/vmfs/volumes/backup_disk/VMBackup

# Defining the backup disk format

# Supported values: zeroedthick, eagerzeroedthick, thin, 2gbsparse

DISK_BACKUP_FORMAT=thin

# Defining the backup rotation per VM

VM_BACKUP_ROTATION_COUNT=2
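
To illustrate what a rotation count of 2 means, here is a minimal sketch of count-based rotation. It assumes, hypothetically, that each backup of a VM is a directory whose name sorts chronologically (for example a timestamp suffix) and contains no whitespace; the function name `rotate_backups` is made up for this example and ghettoVCB's own rotation logic may differ.

```shell
#!/bin/sh
# Sketch of count-based rotation: keep the newest N backup directories of one VM.
# Assumes (hypothetically) that directory names sort chronologically,
# e.g. VM1-2019-09-13_02-33-14, and contain no whitespace.

rotate_backups() {
    vm_dir="$1"    # e.g. $VM_BACKUP_VOLUME/VM1
    keep="$2"      # e.g. $VM_BACKUP_ROTATION_COUNT

    total=$(ls -1 "$vm_dir" | wc -l)
    excess=$((total - keep))
    [ "$excess" -gt 0 ] || return 0

    # Oldest names sort first, so the first $excess entries are removed.
    ls -1 "$vm_dir" | sort | head -n "$excess" | while read -r name; do
        rm -rf "$vm_dir/$name"
    done
}

# Example call (not executed here):
# rotate_backups /vmfs/volumes/backup_disk/VMBackup/VM1 2
```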

# Defining whether the VM is powered down or not prior to backup

POWER_VM_DOWN_BEFORE_BACKUP=0

# Defining whether virtual machine memory is snapped

Memory: If the <memory> flag is 1 or true, a dump of the internal state of the virtual machine is included in the snapshot. Memory snapshots take longer to create, but allow reversion to a running virtual machine state as it was when the snapshot was taken. This option is selected by default. If this option is not selected, and quiescing is not selected, the snapshot will create files which are crash-consistent, which you can use to reboot the virtual machine.

Note: When a memory snapshot is taken, the entire virtual machine is stunned while the memory is written to disk. For more information, see VMware KB 1013163, "Taking a snapshot with virtual machine memory renders the virtual machine to an inactive state while the memory is written to disk".

VM_SNAPSHOT_MEMORY=0

# Quiescing

Quiesce: If the <quiesce> flag is 1 or true, and the virtual machine is powered on when the snapshot is taken, VMware Tools is used to quiesce the file system in the virtual machine. Quiescing a file system is a process of bringing the on-disk data of a physical or virtual computer into a state suitable for backups. This process might include such operations as flushing dirty buffers from the operating system's in-memory cache to disk, or other higher-level application-specific tasks.

Note: Quiescing indicates pausing or altering the state of running processes on a computer, particularly those that might modify information stored on disk during a backup, to guarantee a consistent and usable backup. Quiescing is not necessary for memory snapshots; it is used primarily for backups.

VM_SNAPSHOT_QUIESCE=0
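
As a rough illustration of how the two flags relate to a snapshot command, the sketch below validates them and prints a `vim-cmd vmsvc/snapshot.create` command line. The argument order (VM id, name, description, include memory, quiesce) is assumed, the VM id and snapshot name are hypothetical, and nothing is executed on the host.

```shell
#!/bin/sh
# Sketch: turn the two ghettoVCB.conf flags into a snapshot.create command line.
# The VM id and snapshot name are hypothetical; the command is printed, not run.

VM_SNAPSHOT_MEMORY=0
VM_SNAPSHOT_QUIESCE=0

snapshot_cmd() {
    vmid="$1"; name="$2"
    # Both flags must be 0 or 1.
    for flag in "$VM_SNAPSHOT_MEMORY" "$VM_SNAPSHOT_QUIESCE"; do
        case "$flag" in
            0|1) ;;
            *) echo "flag must be 0 or 1, got: $flag" >&2; return 1 ;;
        esac
    done
    # Assumed argument order: vmid, name, description, includeMemory, quiesced.
    echo "vim-cmd vmsvc/snapshot.create $vmid $name backup $VM_SNAPSHOT_MEMORY $VM_SNAPSHOT_QUIESCE"
}

snapshot_cmd 1 sketch-snapshot
```

With both flags at 0, as configured above, the resulting snapshot is crash-consistent only.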

Mail

# *** Please enable firewall rule for email traffic on port 25 ***
# Defining whether or not to email backup logs

EMAIL_LOG=1

# Defining email server & port:

EMAIL_SERVER=r
EMAIL_SERVER_PORT=25

EMAIL_FROM=s@s
EMAIL_TO=r@r

 


Usage Example

 

# Dry run Mode (no backup will take place)

./ghettoVCB.sh -f /vmfs/volumes/backup_disk/ghettoVCB/vms_to_backup.txt  \
               -g /vmfs/volumes/backup_disk/ghettoVCB/ghettoVCB.conf \
               -l /vmfs/volumes/backup_disk/ghettoVCB/backup.log \
               -d dryrun

-g     Path to global ghettoVCB configuration file

-d     Debug level [info|debug|dryrun] (default: info)

The dryrun level can be used to troubleshoot problems.

2019-07-26 08:32:32 -- dryrun: ###############################################
2019-07-26 08:32:32 -- dryrun: Virtual Machine: srv2012
2019-07-26 08:32:32 -- dryrun: VM_ID: 1
2019-07-26 08:32:32 -- dryrun: VMX_PATH: /vmfs/volumes/datastore1/srv2012/srv2012.vmx
2019-07-26 08:32:32 -- dryrun: VMX_DIR: /vmfs/volumes/datastore1/srv2012
2019-07-26 08:32:32 -- dryrun: VMX_CONF: srv2012/srv2012.vmx
2019-07-26 08:32:32 -- dryrun: VMFS_VOLUME: datastore1
2019-07-26 08:32:32 -- dryrun: VMDK(s):
2019-07-26 08:32:32 -- dryrun:  /vmfs/volumes/573c8235-ca82de29-925b-3417ebef4403/srv2012/srv2012_1.vmdk        500 GB
2019-07-26 08:32:32 -- dryrun:  srv2012-000001.vmdk     600 GB
2019-07-26 08:32:32 -- dryrun: INDEPENDENT VMDK(s):
2019-07-26 08:32:32 -- dryrun: TOTAL_VM_SIZE_TO_BACKUP: 1100 GB
2019-07-26 08:32:32 -- dryrun: Snapshots found for this VM, please commit all snapshots before continuing!
2019-07-26 08:32:32 -- dryrun: THIS VIRTUAL MACHINE WILL NOT BE BACKED UP DUE TO EXISTING SNAPSHOTS!

# Specify which VMs to back up

vms_to_backup.txt

VM1
VM2
VM3

# List VM names from the command line

vim-cmd vmsvc/getallvms

Vmid Name File    Guest OS    Version   Annotation
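
Before a backup run it can help to sanity-check the list file against the names the host reports. The sketch below is one way to do that, under the assumption that VM names contain no whitespace (so the name is the second column of the `getallvms` output); the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: sanity-check a VM list before a backup run.
# Assumes VM names contain no whitespace, so the name is column 2 of getallvms.

list_duplicates() {
    # Print every name that appears more than once in the list file.
    sort "$1" | uniq -d
}

list_unknown() {
    # $1 = list file, $2 = file holding saved `vim-cmd vmsvc/getallvms` output.
    # Print names from the list that are not registered on the host.
    while read -r vm; do
        [ -n "$vm" ] || continue
        awk -v n="$vm" 'NR > 1 && $2 == n { found = 1 } END { exit !found }' "$2" \
            || echo "$vm"
    done < "$1"
}

# Example calls (not executed here):
# vim-cmd vmsvc/getallvms > /tmp/allvms.txt
# list_duplicates vms_to_backup.txt
# list_unknown vms_to_backup.txt /tmp/allvms.txt
```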

Remark

# Backup VMs stored in a list

./ghettoVCB.sh -f /etc/ghettovcb/vms_to_backup.txt

# Backup Single VM using command-line

./ghettoVCB.sh -m MyVM

# Backup All VMs residing on specific ESX(i) host

./ghettoVCB.sh -a

 


Cronjob

 

Important Note:

Always redirect the ghettoVCB output to /dev/null and/or to a log file when automating via cron. This is important because one user identified a limited output buffer which, once filled, may cause ghettoVCB to stop in the middle of a backup.

This primarily affects users on ESXi, but it is good practice to always redirect the output.

Also make sure to specify the FULL PATH when referencing the ghettoVCB script, input files, or log files.

Backup Script

start_bak.sh

# backup script

/vmfs/volumes/backup_disk/ghettoVCB/ghettoVCB.sh \
-g /vmfs/volumes/backup_disk/ghettoVCB/ghettoVCB.conf \
-f /vmfs/volumes/backup_disk/ghettoVCB/vms_to_backup.txt \
> /tmp/ghettoVCB.log 2>&1

chmod 755 start_bak.sh

Create a cron job that calls the backup script

# Back up once a week, on Sunday

/bin/echo "0 12 * * 0 /vmfs/volumes/backup_disk/ghettoVCB/start_bak.sh" >> /var/spool/cron/crontabs/root

Notes:

 * ESXi uses UTC, so hour 12 means the backup starts at 20:00 HKT

 * Sunday is not written as "7"; use 0, as in the entry above
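
The UTC conversion in the first note is simple modular arithmetic. A small sketch, assuming the HKT offset of +8 hours (other time zones need a different `OFFSET`; the function names are made up for this example):

```shell
#!/bin/sh
# Sketch: convert between a local hour and the UTC hour used in the crontab.
# The +8 offset is HKT; other time zones would need a different offset.

OFFSET=8

utc_to_local() { echo $(( ($1 + OFFSET) % 24 )); }
local_to_utc() { echo $(( ($1 - OFFSET + 24) % 24 )); }

utc_to_local 12    # prints 20
local_to_utc 20    # prints 12
```

Note that a local time earlier than 08:00 HKT falls on the previous day in UTC, so the weekday field of the crontab entry would have to shift as well.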

Restart crond on ESXi 5.1

#1 Stop

kill $(cat /var/run/crond.pid)

#2 Check

# -c   Display verbose command line

# Expect no output once crond has stopped

ps -c | grep '[c]rond'

#3 Start

crond

Keep cron jobs after a reboot

Add the following to /etc/rc.local.d/local.sh:

/bin/kill $(cat /var/run/crond.pid)
/bin/echo "0 12 * * 0 /vmfs/volumes/backup_disk/ghettoVCB/start_bak.sh"  >> /var/spool/cron/crontabs/root
crond
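
Running the echo line above more than once by hand appends the same job again each time. A hedged sketch of an append that first checks whether the line is already present (the helper name `add_cron_line` is hypothetical):

```shell
#!/bin/sh
# Sketch: append a crontab line only if it is not already present.
# The helper name is hypothetical; path and job line are the ones used above.

add_cron_line() {
    file="$1"; line="$2"
    # -F: fixed string, -x: whole-line match, -q: quiet.
    grep -Fxq "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# On the host this would be called as (not executed here):
# add_cron_line /var/spool/cron/crontabs/root \
#     "0 12 * * 0 /vmfs/volumes/backup_disk/ghettoVCB/start_bak.sh"
```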

 


Stopping ghettoVCB Process

 

If ghettoVCB is running interactively:

Step 1 - Press Ctrl+C which will kill off the main ghettoVCB instance

Step 2 - Search for any existing ghettoVCB process by running the following:

ps -c | grep ghettoVCB | grep -v grep

ps -c | grep vmkfstools | grep -v grep

-c   Display verbose command line

Step 3 - Remove any snapshots that may remain on the VM that was being backed up

 

 


Troubleshooting

 

ghettoVCB backup fails

Log:

2019-09-13 02:33:14 -- info: Initiate backup for vm.myserver
2019-09-13 02:33:14 -- info: Creating Snapshot "ghettoVCB-snapshot-2019-09-13" for vm.myserver
Destination disk format: VMFS thin-provisioned
Cloning disk '/vmfs/volumes/ADATA-SSD-SU650-480G/myserver/vm.myserver.vmdk'...
^MClone: 10% done.^MClone: 11% done. ... ^MClone: 99% done.^MClone: 100%
2019-09-13 03:30:41 -- info: ERROR: error in backing up of "/vmfs/volumes/SSD/myserver/vm.myserver.vmdk" for vm.myserver
2019-09-13 03:30:43 -- info: Removing snapshot from vm.myserver ...
2019-09-13 03:30:43 -- info: Backup Duration: 57.48 Minutes
2019-09-13 03:30:43 -- info: ERROR: Unable to backup vm.myserver due to error in VMDK backup!
...
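
When the log is long, the failure lines can be pulled out with a small filter. The sketch below assumes that failures are always logged with the string "ERROR:" as in the excerpt above; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: summarise a ghettoVCB log. Assumes failures are logged with
# "ERROR:" as in the excerpt above; the function names are hypothetical.

backup_errors() {
    # Strip carriage returns left by the clone progress output,
    # then keep only the error lines.
    tr -d '\r' < "$1" | grep 'ERROR:'
}

backup_failed() {
    # Exit status 0 if the log contains at least one error line.
    backup_errors "$1" > /dev/null
}

# Example call (not executed here):
# backup_failed /tmp/ghettoVCB.log && echo "backup had errors"
```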

Troubleshooting flow:

[1] Check the image with vmkfstools

# -x --fix [check|repair]

vmkfstools -x check vm.myserver.vmdk

Disk is error free

Surprisingly, the check reports no problems with the file.

[2] Test by cloning the disk manually

ghettoVCB clones the disk using a snapshot plus vmkfstools,

so first try cloning the VM's snapshot image by hand.

...
Clone: 100% done.Failed to clone disk: Input/output error (327689).

[Fix]

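Since the clone fails, the files inside the image have to be copied to a new image.

(Do not use dd to clone! The image itself is damaged, so the new image produced by dd would still have the filesystem problems, and could end up even worse than the original.)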

Snapshot found

log file:

2020-08-03 12:47:12 -- info: Snapshot found for myvm_2003  backup will not take place

The GUI shows no snapshots.

CLI

ls -1

myvm_2003.vmdk
myvm_2003-flat.vmdk
myvm_2003-000001-delta.vmdk
myvm_2003-000001.vmdk

[Fix] Create a new snapshot and then use "Delete All" to clear all the snapshots.

        That seems to clear up some partially completed snapshots.
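
To spot this situation without reading a directory listing by eye, leftover delta disks can be listed with a glob. A minimal sketch, assuming the -NNNNNN-delta.vmdk naming seen in the listing above (the function name is hypothetical):

```shell
#!/bin/sh
# Sketch: list leftover snapshot delta disks in a VM directory.
# Assumes the -NNNNNN-delta.vmdk naming seen in the listing above.

find_delta_disks() {
    for f in "$1"/*-[0-9][0-9][0-9][0-9][0-9][0-9]-delta.vmdk; do
        # If nothing matches, the pattern is left unexpanded; skip it.
        [ -e "$f" ] && echo "${f##*/}"
    done
    return 0
}

# Example call (not executed here):
# find_delta_disks /vmfs/volumes/datastore1/myvm_2003
```

If delta disks are listed while the GUI shows no snapshots, the fix above applies. From the shell this would presumably be `vim-cmd vmsvc/snapshot.create <vmid> <name>` followed by `vim-cmd vmsvc/snapshot.removeall <vmid>`, though that sequence is an assumption and was not part of the original notes.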

Remark

ESXi "Delete All"

Use the Delete All option to delete all snapshots from the Snapshot Manager.

Delete All consolidates and writes the changes that occur between snapshots and the previous delta disk states to the base parent disk, and merges them with the base virtual machine disk.