Common archiving tools (tar, gzip, bzip2, zip, 7z, xz, pigz, lz4)

Last updated: 2019-07-08

 



tar

Useful opts:

  • --numeric-owner              # always use numbers for user/group names
  • --exclude=PATTERN            # exclude files matching PATTERN
  • -C, --directory DIR          # change to directory DIR
  • -t, --list                   # list the contents of an archive
  • --strip-components=NUMBER    # strip NUMBER leading components on extraction

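# Always needed when unpacking a root filesystem taken from another system:
# keeps the archive's numeric UID/GID instead of mapping user/group names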

tar --numeric-owner -zxf ubuntu-12.04.tar.gz -C /lxc/u12/rootfs/

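# --exclude: e.g. ignore the backup directory and everything beneath it

# PATTERN

'*' or '?'    # globbing

'[a-e]'       # character class

  * Excluding a directory also excludes all the files beneath it.

# Drop uploads entirely
# (--exclude is placed before the path; some tar versions only honor it there)

tar -czvf backup.tar.gz --exclude "./public_html/uploads" ./public_html

# Keep the uploads folder itself, but none of its contents

tar -czvf backup.tar.gz --exclude "./public_html/uploads/*" ./public_html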
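The difference between the two exclude patterns can be reproduced with Python's standard tarfile module. This is only a sketch of the idea, not how tar implements --exclude; the helper name make_backup and the sample files are made up for illustration.

```python
import io
import os
import tarfile
import tempfile


def make_backup(root, skip_dir, keep_empty_dir):
    """Tar `root` in memory, leaving out `skip_dir` (hypothetical helper).

    keep_empty_dir=False behaves like --exclude "./public_html/uploads"
    keep_empty_dir=True  behaves like --exclude "./public_html/uploads/*"
    """
    def exclude(info):
        if info.name == skip_dir:
            # The directory entry itself: keep or drop it.
            return info if keep_empty_dir else None
        if info.name.startswith(skip_dir + "/"):
            # Anything beneath the directory is always dropped.
            return None
        return info

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.add(root, arcname=os.path.basename(root), filter=exclude)
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:gz") as tar:
        return sorted(tar.getnames())


# Sample tree (illustrative): public_html/index.html, public_html/uploads/a.jpg
workdir = tempfile.mkdtemp()
root = os.path.join(workdir, "public_html")
os.makedirs(os.path.join(root, "uploads"))
open(os.path.join(root, "index.html"), "w").close()
open(os.path.join(root, "uploads", "a.jpg"), "w").close()

without_uploads = make_backup(root, "public_html/uploads", keep_empty_dir=False)
empty_uploads = make_backup(root, "public_html/uploads", keep_empty_dir=True)
print(without_uploads)
print(empty_uploads)
```

The first listing has no uploads entry at all; the second keeps the uploads directory entry but none of the files inside it.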

# Extract to another location

tar zxf ~/backup.tgz -C /Your/Path files

# Drop the leading directory components when extracting

tar --strip-components=3 -zxf test.tar.gz a/b/c/test.txt

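How reliable is tar "-t" as an integrity test?

Corrupt a tar file

xxd -l 16 -s 102400 etc.tar

dd if=/dev/urandom of=etc.tar bs=1 count=16 conv=notrunc seek=102400

xxd -l 16 -s 102400 etc.tar

tar -tf etc.tar

echo $?

Result: surprisingly, no error. tar only checksums each member's header, not the file data, so damage that lands inside a file's contents goes unnoticed.

Corrupt a tar.gz file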

xxd -l 16 -s 102400 etc.tar.gz

dd if=/dev/urandom of=etc.tar.gz bs=1 count=16 conv=notrunc seek=102400

xxd -l 16 -s 102400 etc.tar.gz

tar -tf etc.tar.gz > /dev/null

tar: Skipping to next header

gzip: stdin: invalid compressed data--crc error

gzip: stdin: invalid compressed data--length error
tar: Child returned status 1
tar: Error is not recoverable: exiting now
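The same experiment can be sketched in Python with the standard tarfile and gzip modules. The member name and sizes are made up, and compresslevel=0 is an assumption chosen to keep the demo deterministic: with stored blocks the flipped byte changes the payload without breaking the deflate structure, so it is the gzip CRC-32 trailer that catches the damage.

```python
import gzip
import io
import tarfile
import zlib

# Build a small tar in memory with one 4096-byte member (illustrative name).
payload = b"x" * 4096
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo("etc/hosts")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
tar_bytes = buf.getvalue()
gz_bytes = gzip.compress(tar_bytes, compresslevel=0)


def flip(data, offset):
    """Return a copy of data with the byte at offset inverted."""
    damaged = bytearray(data)
    damaged[offset] ^= 0xFF
    return bytes(damaged)


# Offset 1000 is inside the member's data (its header is bytes 0-511).
bad_tar = flip(tar_bytes, 1000)
with tarfile.open(fileobj=io.BytesIO(bad_tar), mode="r:") as tar:
    tar_listing = tar.getnames()  # listing only parses the headers

# Damage one byte in the middle of the gzip stream.
bad_gz = flip(gz_bytes, len(gz_bytes) // 2)
try:
    gzip.decompress(bad_gz)
    gzip_error = None
except (OSError, EOFError, zlib.error) as exc:
    gzip_error = type(exc).__name__

print("tar listing:", tar_listing)
print("gzip error:", gzip_error)
```

Listing the damaged tar succeeds, while decompressing the damaged gzip stream raises an error. That matches the shell results above: a plain tar listing is not an integrity test for file contents, but the gzip layer is.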

Extract a single file with tar

tar -zxvf public_html.gz public_html/cms/config.php

# The directory structure is preserved

ls public_html/cms/config.php

 


7-Zip (7z)

Install

yum install p7zip

apt-get install p7zip-full

Usage:

7z <command> [<switches>...] <archive_name> [<@listfiles...>]

Common commands:

a:  Add files to archive

x:  eXtract files with full paths

l:  List contents of archive

t:  Test integrity of archive

d:  Delete files from archive

e:  Extract files from archive (without using directory names)

u:  Update files to archive

b:  Benchmark

SWITCHES

-p{Password}              # Set password

-o{Directory}             # Set output directory; no space is allowed between -o and the directory name

-v{Size}[b|k|m|g]         # Create volumes

-l                        # Don't store symlinks; store the files/directories they point to

Example:

1) Archive a whole directory:

7z a test.7z ./test

2) List what is inside an archive

7z l tiki-12.3.7z  | less

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2014-11-14 23:01:06 ....A           94     29640662  tiki-12.3/vendor/jcapture-applet/jcapture-applet/src/META-INF/.svn/all-wcprops
2014-11-14 23:01:06 ....A          278               tiki-12.3/vendor/jcapture-applet/jcapture-applet/src/META-INF/services/.svn/all-wcprops

.......................................

Attr

  • R = READONLY
  • H = HIDDEN
  • S = SYSTEM
  • A = ARCHIVE                 # files on Windows have A set by default
  • C = COMPRESSED
  • N = NOT INDEXED
  • L = Reparse Points
  • O = OFFLINE
  • P = Sparse File
  • I = Not content indexed
  • T = TEMPORARY
  • E = ENCRYPTED

3) Extract

7za x tiki-12.3.7z

.......................................

Extracting  tiki-12.3

Everything is Ok

Folders: 2101
Files: 16791
Size:       137347765
Compressed: 29849851

4) 7z extract to specific folder

7z x [archive.7z] -o[output_dir]

i.e.

7z x DATA.zip -o/volume1/tmp

Everything is Ok

Size:       11038359552
Compressed: 65107236764

5) Extract only selected files (fileFilter)

7z e [archive.zip] -o[outputdir] [fileFilter_1] [fileFilter_2] -r

-r[-|0]          # Recurse subdirectories (CAUTION: this flag does not do what you think, avoid using it)

i.e.

7z x DATA.zip -o/volume1/tmp DATA/mydb.mdf

6) Show extraction progress of 7zip inside cmd

# Set output stream for output/error/progress line

-bs{o|e|p}{0|1|2}

i.e.

7z x s2012.iso -bsp1

Scanning the drive for archives:
1 file, 4542291968 bytes (4332 MiB)

Extracting archive: s2012.iso
--
Path = s2012.iso
Type = Udf
Physical Size = 4542291968
Comment = IR3_SSS_X64FREE_EN-US_DV9
Cluster Size = 2048
Created = 2014-03-22 05:27:47

 38% 549 - sources/install.wim

P.S.

7-Zip does not store the owner/group of files. To keep them, archive with tar first and pipe the result through 7za:

tar cf - directory | 7za a -si directory.tar.7z

7za x -so directory.tar.7z | tar xf -

Advanced opts

  • -ms=on               # solid archive = on
  • -md=32m              # dictionary size = 32 megabytes
  • -mx=9                # level of compression = 9 (Ultra) [Default: 5]

Benchmarking the system

There are two tests:

  • Compressing with LZMA method

  • Decompressing with LZMA method

The benchmark shows a rating in MIPS (million instructions per second).

Score

https://www.7-cpu.com/

Usage

b [number_of_iterations] [-mmt{N}] [-md{N}] [-mm={Method}]

opt

-md{N}        dictionary size to increase memory usage

-mmt{N}      number of threads

-mm=crc      run a CRC calculation benchmark. That test shows the speed of CRC calculation in MB/s.

Example

7za b

RAM size:    1006 MB,  # CPU hardware threads:   2
RAM usage:    425 MB,  # Benchmark threads:      2

Dict        Compressing          |        Decompressing
      Speed Usage    R/U Rating  |    Speed Usage    R/U Rating
       KB/s     %   MIPS   MIPS  |     KB/s     %   MIPS   MIPS

22:    4469   156   2790   4347  |    19918   131   1371   1798
23:    3100   157   2017   3159  |    19653   131   1374   1799
24:    3044   157   2078   3273  |    20536   133   1435   1905
25:    5922   179   3767   6761  |    61724   200   2899   5805
----------------------------------------------------------------
Avr:          162   2663   4385               149   1770   2827
Tot:          156   2216   3606

Explanation

Dict column

shows dictionary size. For example, 21 means 2^21 = 2 MB.

Usage column

shows the percentage of time the processor is working.

It's normalized for a one-thread load.

For example, 180% CPU Usage for 2 threads can mean that average CPU usage is about 90% for each thread.

R/U column

The rating normalized for 100% of CPU usage.

That column shows the performance of one average CPU thread.

Rating column

The rating value is calculated from the measured CPU speed and

it is normalized with results of Intel Core 2 CPU with multi-threading option switched off.

So if you have Intel Core 2 Duo, rating values must be close to real CPU frequency.
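The column descriptions can be checked against the sample output. The sketch below assumes that, since R/U is the rating normalized to 100% CPU usage, Rating is roughly R/U x Usage / 100. That relation is inferred from the descriptions above rather than quoted from the 7-Zip documentation, and the printed values are rounded, so small differences are expected.

```python
# (usage %, R/U MIPS, rating MIPS) copied from the compression columns above
rows = {
    22: (156, 2790, 4347),
    23: (157, 2017, 3159),
    24: (157, 2078, 3273),
    25: (179, 3767, 6761),
}

estimates = {}
for dict_bits, (usage, r_u, rating) in rows.items():
    dict_mib = 2 ** dict_bits // 2 ** 20      # Dict column: 2^N bytes
    estimate = r_u * usage / 100              # assumed relation, see lead-in
    estimates[dict_bits] = estimate
    print(f"dict 2^{dict_bits} = {dict_mib} MiB  "
          f"rating {rating}  estimate {estimate:.0f}")
```

For every row the estimate lands within one percent of the printed rating, which is consistent with reading R/U as a per-thread figure and Rating as the total.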

Remark

7-Zip 18.03 uses new optimized version of LZMA decoder for x64 (x86-64) system.

The new LZMA decoder for x64 is written in assembler and

it uses Conditional Move (CMOV) instructions instead of branches in original code.

Compression / Decompression speed

Compression speed

strongly depends on memory (RAM) latency, Data Cache size/speed and TLB.

Out-of-Order execution feature of CPU is also important for that test.

Decompression speed

strongly depends on CPU integer operations.

The most important things for that test are:

branch misprediction penalty (the length of pipeline) and the latencies of 32-bit instructions ("multiply", "shift", "add" and other).

The decompression test has a very high number of unpredictable branches.

Note that some CPU architectures (for example, 32-bit ARM) support instructions that can be conditionally executed.

So such CPUs can work without branches (and without pipeline flushing) in many cases in LZMA decompression code.

Such CPUs can have some speed advantage over architectures that don't support complex conditional execution.

Out-of-Order execution capability is not so important for LZMA Decompression.

 


zip and unzip

Note: on Debian, zip and unzip ship as two separate packages

unzip:

Usage:

unzip [opts] file[.zip] [file(s) ...]  [-x xfile(s) ...] [-d exdir]

Test integrity:

# -t     test archive files, extracts each specified file in memory and compares the CRC

unzip -t cms.zip

List the files:

unzip -l cms.zip

  Length      Date    Time    Name
---------  ---------- -----   ----

Show the archive comment:

unzip -z cms.zip

Extract in place:

unzip cms.zip

Related options:

-X  restore UID/GID info
-o  overwrite files WITHOUT prompting
-d  extract into another directory

-u     update existing files and create new ones if needed.

-f     freshen existing files only
       By  default unzip queries before overwriting, but the -o option may be used to suppress the queries.
       the TZ (timezone) environment variable must be set correctly in order for -f and -u to work properly

zip:

zip [option] [-r] file.zip [list] .....  [-xi list]

Features:

  • -e   encrypt
  • -c   add a one-line comment for each entry (-z sets the archive comment)

Operations:

  • -r    # --recurse-paths, including files with names starting with "."
  • -f    # freshen: only changed files
  • -u    # update changed files and add new ones; never deletes entries
  • -d    # delete entries in zipfile

Linux-specific:

  • -K   keep setuid/setgid/sticky permissions (an unzip option, not zip)
  • -y   store symbolic links as links

Filter:

  • -x   exclude the following names
  • -i   include only the following names
  • -n   don't compress these suffixes

Compression level

-0 indicates no compression

-6 Default

-9 optimal compression

Example:

Archive a directory:

zip -r cms.zip cms

Add an archive comment:

zip -z cms.zip

zip -z foo < foowhat

Encrypt:

zip -e -r cms.zip cms

Update:

zip -u -r cms.zip cms
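For scripting, the listing, testing and comment operations have rough equivalents in Python's standard zipfile module. The sketch below builds a small archive in memory; the entry names and comment text are made up for illustration.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("cms/index.php", "<?php echo 'hi';")
    zf.writestr("cms/config.php", "<?php $debug = false;")
    zf.comment = b"cms backup"        # roughly `zip -z`

with zipfile.ZipFile(buf) as zf:
    first_bad = zf.testzip()          # roughly `unzip -t`: None if every CRC matches
    names = zf.namelist()             # roughly `unzip -l`
    comment = zf.comment              # roughly `unzip -z`

print(first_bad, names, comment)
```

testzip() reads every member and returns the name of the first one whose CRC does not match, or None when the archive is intact.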

 


xz compression

 

Lossless data compression

* essentially a stripped-down version of 7-Zip
* uses the LZMA/LZMA2 algorithms
* compresses single files (it is not a multi-file archiver)

Install

apt-get install xz-utils

yum install xz

Installed binaries

  • /usr/bin/unxz (-d)
  • /usr/bin/xz (-z)
  • /usr/bin/xzless
  • /usr/bin/xzcat
  • /usr/bin/xzgrep
  • /usr/bin/xzdiff
  • /usr/bin/lzmainfo

xz --version

xz (XZ Utils) 5.1.0alpha
liblzma 5.1.0alpha

# After unxz, the original (compressed) file is removed

-k            # Don't delete the input files.

-t             # Test  the  integrity  of  compressed files.

-l             # Print  information  about compressed files.
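Python's standard lzma module reads and writes the same .xz container, which is handy for a quick check. A minimal sketch with made-up sample data follows; because it works in memory, the -k question does not arise, and the truncation check is only a crude stand-in for xz -t.

```python
import lzma

data = b"hello xz\n" * 1000

packed = lzma.compress(data, preset=6)   # .xz container, roughly `xz -6`
restored = lzma.decompress(packed)       # roughly `unxz`

# Crude stand-in for `xz -t`: a truncated stream must fail to decode.
try:
    lzma.decompress(packed[:-4])
    truncated_ok = True
except lzma.LZMAError:
    truncated_ok = False

print(len(data), len(packed), truncated_ok)
```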

 

 


gzip

 

Lempel-Ziv coding  (LZ77)

If no files are specified, or if a file name is "-" , the standard input  is  compressed  to the standard output.

opts

-d --decompress

-c --stdout                           # Write output on standard output; keep original files unchanged.

-t --test

-l --list                                 # combined with -v, also shows method, crc and date

-v --verbose

i.e.

gzip -l test.tar.gz

         compressed        uncompressed  ratio uncompressed_name
             220280             3348480  93.4% test.tar

compressed size: size of the compressed file

uncompressed size: size of the uncompressed file

 * Note: after a folder is tarred, the size comes out smaller

gzip -dc test.tar.gz > test.tar

Compress without deleting the original

gzip < access.log > access.log.gz

Decompress a .gz file without removing the compressed file

gzip -d < file.gz > file

OR

zcat somefile.gz > somefile
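The same keep-the-original pattern can be written with Python's standard gzip module, streaming through a file object so large logs are not loaded into memory. The file names and contents below are placeholders created in a temporary directory.

```python
import gzip
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "access.log")
with open(src, "wb") as f:
    f.write(b"GET / 200\n" * 1000)

# Like: gzip < access.log > access.log.gz  (the source file is left in place)
with open(src, "rb") as fin, gzip.open(src + ".gz", "wb") as fout:
    shutil.copyfileobj(fin, fout)

# Like: gzip -d < access.log.gz > restored.log  (the .gz is left in place)
restored = os.path.join(workdir, "restored.log")
with gzip.open(src + ".gz", "rb") as fin, open(restored, "wb") as fout:
    shutil.copyfileobj(fin, fout)

print(sorted(os.listdir(workdir)))
```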

gzip is slow

gzip can be slow even when neither the CPU nor the disk is maxed out, because it compresses on a single thread.

Workaround: drop gzip and switch to lz4.

 


bzip2

 

bzip2 = a block-sorting file compressor

It uses the Burrows-Wheeler block-sorting text compression algorithm and Huffman coding.

 * bzip2 uses 32-bit CRCs

 * With the default 900k block size, bunzip2 needs about 3700 kbytes to decompress (100k + (4 x block size)); with -s (--small) it needs about 2350 kbytes (100k + (2.5 x block size)).
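The decompression memory figures follow directly from the formula in the bzip2 manual: 100k plus 4 x block size normally, or 100k plus 2.5 x block size with -s. A small sketch, where the function name is made up for illustration:

```python
def bunzip2_memory_kb(block_kb, small=False):
    """Approximate bunzip2 memory use in kbytes, per the bzip2 manual formula."""
    factor = 2.5 if small else 4
    return 100 + factor * block_kb


# Levels -1 .. -9 select block sizes of 100k .. 900k.
for level in range(1, 10):
    block_kb = level * 100
    print(f"-{level}: block {block_kb}k  "
          f"normal {bunzip2_memory_kb(block_kb):.0f}k  "
          f"small {bunzip2_memory_kb(block_kb, small=True):.0f}k")
```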

-d --decompress
-t --test
-c --stdout
-k --keep            # Keep (don’t delete) input files during compression or decompression.

-1 (or --fast) to -9 (or --best); -9 is the default

Sets the block size to 100 k, 200 k .. 900 k when compressing.

Other bin

bzcat - decompresses files to stdout

 


pigz

 

Compression

The input is broken up into 128 KB chunks with each compressed in  parallel.

The compressed data format generated is in the gzip, zlib,  or  single-entry zip format

The compression produces partial raw deflate streams which are concatenated by a single write thread

and wrapped with the appropriate header and trailer, where the trailer contains the combined check value.
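The scheme described above can be sketched with Python's zlib module. This is a simplified illustration, not pigz's actual code: each 128 KB chunk becomes a raw deflate stream ended with a sync flush (the last one is finished normally), the pieces are concatenated in order, and a gzip header and trailer are wrapped around them. Unlike pigz, the sketch does not prime each chunk with the previous chunk's data, and it computes the CRC in one pass instead of combining per-chunk values. The function names are made up.

```python
import gzip
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 128 * 1024  # pigz's default block size


def deflate_chunk(job):
    """Compress one chunk as a partial raw deflate stream."""
    chunk, is_last = job
    comp = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS)  # raw deflate
    out = comp.compress(chunk)
    # A sync flush ends on a byte boundary without marking the stream final,
    # so the next chunk's output can simply be appended.
    out += comp.flush(zlib.Z_FINISH if is_last else zlib.Z_SYNC_FLUSH)
    return out


def parallel_gzip(data, workers=4):
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)] or [b""]
    jobs = [(c, i == len(chunks) - 1) for i, c in enumerate(chunks)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(deflate_chunk, jobs))  # map keeps input order
    # gzip header: magic, deflate, no flags, mtime 0, no extra flags, OS unknown
    header = b"\x1f\x8b\x08\x00" + b"\x00" * 4 + b"\x00\xff"
    trailer = struct.pack("<II", zlib.crc32(data), len(data) & 0xFFFFFFFF)
    return header + b"".join(parts) + trailer


sample = b"pigz demo line\n" * 50000
packed = parallel_gzip(sample)
print(len(sample), len(packed), gzip.decompress(packed) == sample)
```

The output is a single ordinary gzip stream, so any gzip reader can decompress it.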

Example

# -p n   # number of compression threads
# -b nK  # input block size in KiB (default 128)
# The default level is -6; --fast is the same as -1

pigz -c --fast -b 1024 --rsyncable

rsyncable

-R --rsyncable         # Input-determined block locations for rsync.

--rsyncable produces a slightly larger archive that is friendlier to the rsync algorithm.
With plain gzip or bzip2, a small change in the input changes most of the compressed output after it, so the files differ too much between dumps and rsync/rdiff produce very large diffs.

With --rsyncable, rsync can re-synchronize inside the file.

Other

-k --keep              # Do not delete original file after processing.

-t --test                # Test the integrity of the compressed input.

Decompression

 * Decompression can't be parallelized
    (but will create three  other threads for reading, writing, and check calculation)

-d --decompress

 


lz4

 

Intro

 * focused on compression and decompression speed.

    (lz4 offers compression speeds of 400 MB/s per core, linearly scalable with multi-core CPUs.)

 * It belongs to the LZ77 family of byte-oriented compression schemes.

 * LZ4 was also implemented natively in the Linux kernel 3.11

modinfo lz4

filename:       /lib/modules/4.15.0-999-generic/kernel/crypto/lz4.ko

LZ77

LZ77 algorithms achieve compression by replacing repeated occurrences of data

with references to a single copy of that data existing earlier in the uncompressed data stream.

A match is encoded by a pair of numbers called a length-distance pair,

which is equivalent to the statement

"each of the next length characters is equal to the characters exactly distance characters behind it in the uncompressed stream"

To spot matches, the encoder must keep track of some amount of the most recent data, such as the last 2 kB, 4 kB, or 32 kB.

The structure in which this data is held is called a sliding window, which is why LZ77 is sometimes called sliding-window compression.
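A toy decoder makes the length-distance idea concrete. The token format below (bytes for literals, a (length, distance) tuple for a match) is invented for illustration and is not the on-disk format of LZ4 or deflate.

```python
def lz77_decode(tokens):
    """Decode a list of literals (bytes) and (length, distance) matches."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, bytes):
            out += token                      # literal run, copied as-is
        else:
            length, distance = token
            for _ in range(length):
                # Each new byte equals the byte `distance` positions back.
                # Copying one byte at a time lets a match overlap itself.
                out.append(out[-distance])
    return bytes(out)


decoded = lz77_decode([b"abc", (6, 3), b"d"])
run = lz77_decode([b"x", (5, 1)])
print(decoded, run)
```

In the first example the match is longer than its distance, so it keeps re-reading bytes it has just written; that is how a short pattern expands into a long repetition.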

Install

# Ubuntu 16.04

apt-get install liblz4-tool

Version

lz4 -V

*** LZ4 command line interface 32-bits r131, by Yann Collet (Jan 25 2017) ***

Opts

  • -z, --compress ( -#, 1(default)~9 )
  • -d, --decompress
  • -k, --keep           # Don't delete source file (Default behavior)
  • -t, --test
  • -B#                    # block size [4-7](default : 7),  B4= 64KB ; B5= 256KB ; B6= 1MB ; B7= 4MB

Benchmark test

  • -b
  • -iN                    # iteration loops [1-9](default : 3)

tar with lz4

# -I, --use-compress-program PROG    filter through PROG (must accept -d)

tar -I lz4 -cf OUTPUT.tar.lz4 paths_to_archive

OR

tar cf - paths_to_archive | lz4 > OUTPUT.tar.lz4