Common archiving tools (tar, gzip, bzip, zip, 7z, xz, pigz)

Last updated: 2019-07-08

 



Compression ratio

 

Compression test using the extracted filesystem.squashfs from Clonezilla

Compression ratio (low to high): lz4, zstd < gzip, zip < bzip2 < xz, lzma (7zip)

# Size: 1.1G

tar -cf squashfs-root.tar squashfs-root

# Result: 341M

gzip < squashfs-root.tar > squashfs-root.tar.gz

# Result: 341M

zip squashfs-root.tar.zip squashfs-root.tar

# Result: 309M

bzip2 < squashfs-root.tar > squashfs-root.tar.bz2

# Result: 217M

xz -k squashfs-root.tar

 


tar

 

Useful opts:

  • --numeric-owner                     # always use numbers for user/group names
  • --exclude=PATTERN
  • -C, --directory DIR                   # change to directory DIR
  • -t, --list
  • --strip-components=NUMBER        # strip NUMBER leading components on extraction
  • -a, --auto-compress                # use archive suffix to determine the compression program
                                                 # accepts "tar.xz", but not "tzx"

A trailing "/" on the folder name makes no difference

tar -cvf MyFolder.tar MyFolder

tar -cvf MyFolder.tar MyFolder/

Test a tar file (-t, --list)

tar -tf test.tar

Show the uid and gid of the files inside the archive

tar --numeric-owner -vtf test.tar

drwxr-xr-x 0/0               0 2023-12-24 21:23 test/
-rw-r--r-- 0/0               0 2023-12-24 21:22 test/test.txt

Always use it when extracting a tar file that came from a different system

tar --numeric-owner -zxf ubuntu-12.04.tar.gz -C /lxc/u12/rootfs/

Leave individual subfolders / files out of the archive (--exclude)

Drop the whole uploads folder (public_html/uploads)

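tar -czvf backup.tar.gz --exclude "public_html/uploads" public_html

 * Put --exclude before the paths it applies to; some GNU tar versions treat it as position-sensitive

 * Multiple --exclude options are supported

 * Patterns use shell syntax (globbing), not regexp syntax

 * public_html/uploads2 is not excluded

 * "--exclude public_html/uploads/" (with a trailing slash) makes the exclude ineffective !!

 * "--exclude uploads" excludes uploads at every level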

PATTERN

  • '*' or '?'    # Globbing
  • '[a-e]'       # character class
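
A small sketch of both pattern forms in action. It assumes GNU tar; the directory and file names (site, app.log, cache1, cacheX) are made up for illustration.

```shell
# Build a throwaway tree with hypothetical names in a temp directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p site/cache1 site/cacheX
touch site/index.html site/app.log site/cache1/a site/cacheX/b

# '*.log' is a glob; 'cache[0-9]' uses a character class.
# Both are quoted so the shell hands them to tar unexpanded.
tar -cf site.tar --exclude='*.log' --exclude='cache[0-9]' site

listing=$(tar -tf site.tar)
printf '%s\n' "$listing"
```

The listing should still contain site/index.html and site/cacheX/b, but neither app.log nor anything under cache1.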

Keep the uploads folder, but without its contents

tar -czvf backup.tar.gz --exclude "./public_html/uploads/*" ./public_html

 * Quote the patterns to protect them from shell expansion

Exclude By Full Path

# The leading "./" in the pattern must match the one in the path argument

tar -zcf public_html.tar.gz --exclude=./public_html/tmp ./public_html

 * This method (full path) does not exclude "public_html/tmp" at other levels
   (i.e. public_html/test/public_html/tmp)

Test

mkdir -p public_html/tmp public_html/test/public_html/tmp

touch public_html/a.txt public_html/tmp/b.txt public_html/test/public_html/tmp/c.txt

tar -zcf public_html.tar.gz --exclude=./public_html/tmp ./public_html

tar -ztf public_html.tar.gz

P.S.

# For comparison: without the leading "./" the pattern also matches the nested public_html/tmp

tar -zcf public_html.tar.gz --exclude=public_html/tmp ./public_html

Exclude files matching patterns listed in FILE

-X, --exclude-from=FILE
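
A minimal sketch of -X, assuming GNU tar. The pattern file name exclude.txt and the sample tree are hypothetical; the file holds one pattern per line.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p public_html/tmp public_html/uploads
touch public_html/a.txt public_html/tmp/b.txt public_html/uploads/c.jpg

# One exclude pattern per line (exclude.txt is an illustrative name).
cat > exclude.txt <<'EOF'
public_html/tmp
*.jpg
EOF

tar -czf public_html.tar.gz -X exclude.txt public_html

listing=$(tar -tzf public_html.tar.gz)
printf '%s\n' "$listing"
```

Only a.txt and the (now empty) uploads directory should remain in the listing.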

Extract to another location

tar zxf ~/backup.tgz -C /Your/Path files

Drop the leading directories when extracting

tar --strip-components=3 -zxf test.tar.gz a/b/c/test.txt
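
The effect is easiest to see end to end. A small sketch, assuming GNU tar; the out/ target directory is a made-up name.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p a/b/c out
echo hello > a/b/c/test.txt
tar -czf test.tar.gz a

# Strip the three leading components "a/b/c/" so that
# test.txt lands directly in out/ instead of out/a/b/c/.
tar --strip-components=3 -zxf test.tar.gz -C out a/b/c/test.txt
ls out
```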

Integrity testing of tar ("-t")

Corrupt a tar file

xxd -l 16 -s 102400 etc.tar

dd if=/dev/urandom of=etc.tar bs=1 count=16 conv=notrunc seek=102400

xxd -l 16 -s 102400 etc.tar

tar -tf etc.tar

echo $?

Result: surprisingly, there is no error ... (tar checksums only its headers, not the file data)

Corrupt a tar.gz file

xxd -l 16 -s 102400 etc.tar.gz

dd if=/dev/urandom of=etc.tar.gz bs=1 count=16 conv=notrunc seek=102400

xxd -l 16 -s 102400 etc.tar.gz

# The file list goes to stdout; the errors only show up on stderr

tar -ztf etc.tar.gz > /dev/null

tar: Skipping to next header

gzip: stdin: invalid compressed data--crc error

gzip: stdin: invalid compressed data--length error
tar: Child returned status 1
tar: Error is not recoverable: exiting now
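
The two experiments above can be condensed into one script. This is a sketch with made-up file names (data.txt, demo.tar); it overwrites 16 bytes inside the file data of a plain tar, then does the same to a gzipped copy, and records the exit codes.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 20000 > data.txt                 # roughly 106 KiB of text
tar -cf demo.tar data.txt
gzip -c demo.tar > demo.tar.gz         # keep an intact copy compressed first

# Damage the plain tar inside the file data (the header is the first 512 bytes).
printf 'XXXXXXXXXXXXXXXX' | dd of=demo.tar bs=1 seek=4096 conv=notrunc 2>/dev/null
tar -tf demo.tar > /dev/null 2>&1
tar_status=$?

# Damage the compressed stream the same way.
printf 'XXXXXXXXXXXXXXXX' | dd of=demo.tar.gz bs=1 seek=1000 conv=notrunc 2>/dev/null
gzip -t demo.tar.gz 2>/dev/null
gzip_status=$?

echo "tar -t exit code: $tar_status, gzip -t exit code: $gzip_status"
```

tar -t should report 0 while gzip -t reports a non-zero status, which is a reason to prefer a compressed archive (or a separate checksum file) when integrity matters.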

tar extract single file

tar -zxvf public_html.tar.gz public_html/cms/config.php

# The directory structure is preserved

ls public_html/cms/config.php

Un-Pack bz2, xz

tar -jxf Folder.tar.bz2 Folder

tar -Jxf Folder.tar.xz Folder

 


zip and unzip

 

Note: on Debian, zip and unzip are two separate packages

unzip:

Usage:

unzip [opts] file[.zip] [file(s) ...]  [-x xfile(s) ...] [-d exdir]

Test integrity:

# -t     test archive files, extracts each specified file in memory and compares the CRC

unzip -t cms.zip

List the files:

unzip -l cms.zip

  Length      Date    Time    Name
---------  ---------- -----   ----

Show the archive comment:

unzip -z cms.zip

Extract in place:

unzip cms.zip

Extract to a folder:

-d  extract into another directory

unzip nic-traffic.sh.zip -d wts/

Other related options:

-X     restore UID/GID info

-o     overwrite files WITHOUT prompting

# By default unzip queries before overwriting; the -o option suppresses the queries.

-u     update existing files and create new ones if needed.

-f     freshen existing files only

# The TZ (timezone) environment variable must be set correctly for -f and -u to work properly.

zipinfo

# list header line.  The archive name, actual size (in bytes) and total number of files is printed.

zipinfo archive.zip | less

Archive:  archive.zip
Zip file size: 6218637709 bytes, number of entries: 103847       <-- actual file size
drwxr-x---  3.0 unx        0 bx stor 22-Mar-15 17:08 archive/
drwx------  3.0 unx        0 bx stor 16-Sep-08 10:02 archive/vincent/
-rw-------  3.0 unx       95 bx defN 16-Sep-08 10:02 archive/vincent/.dovecot.lda-dupes
...
103847 files, 10557281066 bytes uncompressed, 6187645423 bytes compressed:  41.4%

Opts

-h    #  list header line. The archive name, actual size (in bytes) and total number of files is printed.

-t     # list totals for files listed or for all files.

Note that the total compressed (data) size will never match the actual zipfile size, since the latter includes all of the internal zipfile headers in addition to the compressed data.

zip:

zip [option] [-r] file.zip [list] .....  [-xi list]

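Features:

  • -e   encrypt
  • -c   add a one-line comment for each file

Operations:

  • -r     # --recurse-paths, including files with names starting with "."
  • -f     # freshen: only changed files
  • -u    # update changed files and add new ones; never deletes entries
  • -d    # delete entries in zipfile

Linux-specific:

  • -K  keep setuid/setgid/tacky permissions (an unzip option, applied on extraction)
  • -y   store symbolic links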

Filter:

  • -x   exclude the following names
  • -i   include only the following names
  • -n   don't compress these suffixes

Compression level

-0 indicates no compression

-6 Default

-9 optimal compression

Example:

Pack a directory:

zip -r cms.zip cms

Pack without the top-level directory

cd portal

zip -r ../portal.zip .        # "." also picks up top-level dotfiles, which "*" would miss

Add an archive comment:

# Prompt for a multi-line comment for the entire zip archive.

# ended by a line containing just a period

zip -z cms.zip

zip -z foo < foowhat

Encrypt:

zip -e -r cms.zip cms

Update:

zip -u -r cms.zip cms

 


xz compression

 

lossless data compression

  • essentially a stripped-down version of 7-Zip
  • uses the LZMA/LZMA2 algorithms
  • takes single files as input (it compresses, it does not archive)

Install

apt-get install xz-utils

yum install xz

Provides

  • /usr/bin/unxz (-d)
  • /usr/bin/xz (-z)
  • /usr/bin/xzless
  • /usr/bin/xzcat
  • /usr/bin/xzgrep
  • /usr/bin/xzdiff
  • /usr/bin/lzmainfo

xz --version

xz (XZ Utils) 5.1.0alpha
liblzma 5.1.0alpha

Usage

 * After xz / unxz the original file is gone, so remember to add "-k" # -k = Don't delete the input files.

xz

-T NUM, --threads=NUM   # use at most NUM threads; default 1; 0 = as many threads as there are processor cores

                                # memory usage grows with the number of threads

-e, --extreme          # try to improve compression ratio by using more CPU time (combined with a preset, e.g. -9e);

                               # does not affect decompressor memory requirements

-v, --verbose            # If standard error is connected to a terminal, xz will display a progress indicator.

  --- %    2.5 MiB / 15.2 MiB = 0.162   1.8 MiB/s   0:08
  • Completion percentage is shown if the size of the input file is known.
  • Amount of compressed data produced
  • Amount of uncompressed data consumed
  • Compression ratio
  • Compression or decompression speed
  • Elapsed time in the format M:SS or H:MM:SS.
  • Estimated remaining  time

Compression preset levels with memory usage

Preset  DictSize   CompCPU  CompMem   DecMem
-6       8 MiB       6       94 MiB    9 MiB
-7      16 MiB       6      186 MiB   17 MiB
-8      32 MiB       6      370 MiB   33 MiB
-9      64 MiB       6      674 MiB   65 MiB

unxz

unxz is equivalent to xz --decompress

xzcat is equivalent to xz --decompress --stdout

  • -t                          # Test the integrity of compressed files
  • -l                          # Print  information about compressed files
  • -k, --keep              # Don't delete the input files.
  • -d, --decompress

e.g.

xz -l 2023-05-03-raspios-bullseye-armhf-lite.img.xz

Strms  Blocks   Compressed Uncompressed  Ratio  Check   Filename
    1      10    363.9 MiB  1,876.0 MiB  0.194  CRC64   2023-05-03-raspios-bullseye-armhf-lite.img.xz

xzcat 2023-05-03-raspios-bullseye-armhf-lite.img.xz > /dev/sdX

ENV - XZ_OPT

This is for passing options to xz when it is not possible to set the options directly on the xz command line.

(e.g. xz is run by tar)

Usage

XZ_OPT="-T0 -8 -v"  tar Jcf foo.tar.xz foo

 


gzip

 

Lempel-Ziv coding  (LZ77)

If no files are specified, or if a file name is "-" , the stdin is compressed to the stdout.

opts

  • -d --decompress
  • -c --stdout                           # Write output on standard output; keep original files unchanged.
  • -t --test
  • -l --list                                 # combined with -v it also shows method, crc, and date
  • -v --verbose

e.g.

-l

du -sh *

452G    20220210
23G     20220210.tar.gz

gzip -l 20220210.tar.gz

         compressed        uncompressed  ratio uncompressed_name
        24360201377          3415758848 -613.2% 20220210.tar

compressed size: size of the compressed file

uncompressed size: size of the uncompressed file

Why is gzip's uncompressed size wrong?

Reason:

gzip spec (RFC 1952) - ISIZE (Input SIZE)

This contains the size of the original (uncompressed) input data modulo 2^32.

Because the "gzip" format stores the input size modulo 2^32,

the uncompressed size and compression ratio are listed incorrectly for uncompressed files 4 GB and larger.
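
The ISIZE field can be inspected directly. A minimal sketch, using a made-up sample file: it reads the last four bytes of a .gz file (ISIZE is stored little-endian there), compares them with the real size, and shows how a hypothetical 5 GiB input would wrap around. It assumes a shell with 64-bit arithmetic.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 1000 > sample.txt
gzip -c sample.txt > sample.txt.gz

# Read the trailing 4 bytes as unsigned decimals and combine them,
# least significant byte first.
set -- $(tail -c 4 sample.txt.gz | od -An -tu1)
isize=$(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))

# The reliable way to get the size: decompress and count.
actual=$(gzip -dc sample.txt.gz | wc -c | tr -d ' ')
echo "ISIZE=$isize actual=$actual"

# A 5 GiB input would be recorded modulo 2^32.
five_gib=$(( 5 * 1024 * 1024 * 1024 ))
wrapped=$(( five_gib % 4294967296 ))
echo "5 GiB is stored as $wrapped bytes"
```

For files of 4 GB and larger, `gzip -dc file.gz | wc -c` is slow but gives the true size, whereas the stored field only gives the remainder.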

-dc

gzip -dc test.tar.gz > test.tar

Compress without deleting the original

gzip < access.log > access.log.gz

Decompress a .gz file without removing the gzipped file

gzip -d < file.gz > file

OR

zcat somefile.gz > somefile

gzip slow

gzip can be slow even though neither the CPU nor the hard drive is maxed out
(it is single-threaded, so it keeps only one core busy)

=> drop gzip and switch to lz4

 


bzip2

 

bzip2 = a block-sorting file compressor

It uses the Burrows-Wheeler block-sorting text compression algorithm and Huffman coding.

 * bzip2 uses 32-bit CRCs

 * With the default 900k block size, bunzip2 requires about 3700 kbytes to decompress (100k + ( 4 x block size )).
   With -s (--small) it needs about 2300 kbytes (100k + ( 2.5 x block size )).

-d --decompress
-t --test
-c --stdout
-k --keep            # Keep (don’t delete) input files during compression or decompression.

-1 (or --fast) to -9 (or --best, the default)

Sets the block size to 100k, 200k .. 900k when compressing.
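
The memory figures follow from the formulas in the bzip2 man page (compression: 400k + 8 x block size; decompression: 100k + 4 x block size, or 100k + 2.5 x block size with -s). A quick sketch of the arithmetic for three levels:

```shell
# Memory estimates in kbytes, per the bzip2 man page formulas.
for level in 1 5 9; do
    block=$(( level * 100 ))                  # -1 = 100k ... -9 = 900k
    comp=$(( 400 + 8 * block ))               # compression
    decomp=$(( 100 + 4 * block ))             # normal decompression
    decomp_small=$(( 100 + block * 5 / 2 ))   # decompression with -s
    echo "-$level: block=${block}k compress=${comp}k decompress=${decomp}k decompress-small=${decomp_small}k"
done
```

For -9 this gives 7600k, 3700k, and 2350k, which matches the 3700 kbytes quoted above.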

Other binaries

bzcat - decompresses files to stdout

 


pigz

 

pigz stands for parallel implementation of gzip

Install

apt-get install pigz

Compression

 * The input is broken up into 128 KB chunks with each compressed in parallel

 * The compressed data format generated is in the gzip, zlib, or single-entry zip format

    using the deflate compression method.

 * The compression produces partial raw deflate streams which are concatenated by a single write thread

    and wrapped with the appropriate header and trailer, where the trailer contains the combined check value.
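
The chunk-and-concatenate idea can be imitated with plain gzip. This is only a rough analogue and not how pigz works internally: pigz joins raw deflate streams inside a single gzip wrapper, whereas the sketch below concatenates complete gzip members, which gzip also accepts. File names are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 100000 > input.txt

# Cut the input into 128 KiB pieces and compress each one in a background job.
split -b 131072 input.txt chunk.
for piece in chunk.*; do
    gzip "$piece" &
done
wait

# Concatenated gzip members still form a valid gzip file.
cat chunk.*.gz > input.txt.gz
if gzip -dc input.txt.gz | cmp -s - input.txt; then
    roundtrip=ok
else
    roundtrip=failed
fi
echo "round trip: $roundtrip"
```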

Example

# -k --keep      Do not delete original file after processing.

# -p n             number of compression threads. Default: the number of online processors

# -b nK           input block size in KiB. Default: 128 (128 KiB)

# -N                compression level. The default is -6 ( --fast = -1, --best = -9 )

# 2 CPU cores, 1 MiB block size

pigz -k -p 2 -b 1024 --fast --rsyncable filename    # creates filename.gz

rsyncable

-R --rsyncable

Input-determined block locations for rsync, so that rsync can re-synchronize inside the file.

--rsyncable produces a slightly larger archive that is more ‘friendly’ for the rsync algorithm.

With regular gzip or bz2, the output files differ too much between dumps,

which makes the rsync algorithm used by rdiff produce very large diffs.

Format

Default: gz

  • -K --zip                # PKWare zip (.zip)
  • -z --zlib                # zlib (.zz)

Other Opts

  • -t --test                # Test the integrity of the compressed input.

Decompression

unpigz OR -d (--decompress)

 * Decompression can't be parallelized

    (But will create three other threads for reading, writing, and check calculation)

e.g.

pigz -k -d esxi6.7.raw.gz

 


lz4

 

Intro

 * focused on compression and decompression speed.

    (lz4 offers compression speeds of 400 MB/s per core, linearly scalable with multi-core CPUs.)

 * It belongs to the LZ77 family of byte-oriented compression schemes.

 * LZ4 was also implemented natively in the Linux kernel 3.11

modinfo lz4

filename:       /lib/modules/4.15.0-999-generic/kernel/crypto/lz4.ko

LZ77

LZ77 algorithms achieve compression by replacing repeated occurrences of data

with references to a single copy of that data existing earlier in the uncompressed data stream.

A match is encoded by a pair of numbers called a length-distance pair,

which is equivalent to the statement

"each of the next length characters is equal to the characters exactly distance characters behind it in the uncompressed stream"

To spot matches, the encoder must keep track of some amount of the most recent data, such as the last 2 kB, 4 kB, or 32 kB.

The structure in which this data is held is called a sliding window, which is why LZ77 is sometimes called sliding-window compression.
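
The length-distance idea can be sketched as a toy decoder. The token format below (L:text for a literal, M:distance:length for a match) is invented purely for illustration; it is not the LZ4 or gzip on-disk format.

```shell
# Decode a tiny LZ77-style token stream.
# Hypothetical token formats:
#   L:<text>                 literal text, copied as-is
#   M:<distance>:<length>    copy <length> chars, each taken <distance> chars behind the output end
lz77_decode() {
    out=""
    for token in "$@"; do
        case "$token" in
            L:*)
                out="$out${token#L:}"
                ;;
            M:*)
                rest=${token#M:}
                distance=${rest%%:*}
                length=${rest#*:}
                i=0
                while [ "$i" -lt "$length" ]; do
                    # 1-based position of the char exactly <distance> behind the current end.
                    pos=$(( ${#out} - distance + 1 ))
                    ch=$(printf '%s' "$out" | cut -c"$pos")
                    out="$out$ch"
                    i=$(( i + 1 ))
                done
                ;;
        esac
    done
    printf '%s\n' "$out"
}

decoded=$(lz77_decode "L:abc" "M:3:6" "L:d")
echo "$decoded"    # abcabcabcd
```

Copying one character at a time is what lets a match be longer than its distance: here a length of 6 at distance 3 repeats "abc" twice.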

Install

# Ubuntu 16.04

apt-get install liblz4-tool

Version

lz4 -V

*** LZ4 command line interface 32-bits r131, by Yann Collet (Jan 25 2017) ***

Opts

  • -z, --compress ( -#, 1(default)~9 )
  • -d, --decompress
  • -k, --keep           # Don't delete source file (Default behavior)
  • -t, --test
  • -B#                    # block size [4-7](default : 7),  B4= 64KB ; B5= 256KB ; B6= 1MB ; B7= 4MB

Benchmark test

  • -b
  • -iN                    # iteration loops [1-9](default : 3)

tar with lz4

# -I, --use-compress-program PROG. filter through PROG (must accept -d)

tar -I lz4 -cf OUTPUT.tar.lz4 paths_to_archive

OR

tar cf - paths_to_archive | lz4 > OUTPUT_FILE.tar.lz4
 


cpio

 

Copy-out Mode: Copy files named in name-list to the archive

cpio -o < name-list > archive

find . -depth -print | cpio -o > /path/archive.cpio

Copy-in Mode: Extract files from the archive

cpio -i < archive

Copy-pass Mode: Copy files named in name-list to destination-directory

cpio -p destination-directory < name-list

List: Print a table of contents of all the inputs present.

cpio -t < archive.cpio

-v, --verbose: List the files processed in a particular task.

 


 
