Last updated: 2019-07-08
Compression ratio
Compression tested on clonezilla's filesystem.squashfs after unpacking it (squashfs-root)
Compression ratio (worst to best): lz4, zstd < gzip, zip < bzip2 < xz, lzma (7zip)
# Size: 1.1G
tar -cf squashfs-root.tar squashfs-root
# Result: 341M
gzip < squashfs-root.tar > squashfs-root.tar.gz
# Result: 341M
zip squashfs-root.tar.zip squashfs-root.tar
# Result: 309M
bzip2 < squashfs-root.tar > squashfs-root.tar.bz2
# Result: 217M
xz -k squashfs-root.tar
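A rough way to repeat this measurement on other data. This is only a sketch: it assumes gzip is installed, treats bzip2 and xz as optional, and the sample file and its contents are made up for illustration.

```shell
# Compress one sample file with each available tool and print the sizes.
workdir=$(mktemp -d)
sample="$workdir/sample.txt"
# Hypothetical test data: 20000 numbered lines of repetitive text.
seq 1 20000 | sed 's/^/compressible line number /' > "$sample"

original_size=$(wc -c < "$sample")
echo "original: $original_size bytes"

for tool in gzip bzip2 xz; do
    # Skip tools that are not installed on this machine.
    command -v "$tool" > /dev/null || continue
    compressed_size=$("$tool" -c < "$sample" | wc -c)
    echo "$tool: $compressed_size bytes"
done

# Kept in a variable so the result can be checked afterwards.
gzip_size=$(gzip -c < "$sample" | wc -c)
```

Ratios depend heavily on the input, so the ordering on other data may differ from the squashfs-root numbers above.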
tar
Useful opts:
- --numeric-owner # always use numbers for user/group names
- --exclude=PATTERN
- -C, --directory DIR # change to directory DIR
- -t, --list
- --strip-components=NUMBER # strip NUMBER leading components on extraction
- -a, --auto-compress # use archive suffix to determine the compression program
# accepts "tar.xz", does not accept "tzx"
A trailing "/" on the folder name makes no difference
tar -cvf MyFolder.tar MyFolder
tar -cvf MyFolder.tar MyFolder/
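Test a tar file (-t, --list)
tar -tf test.tar
View the uid and gid of the files inside the archive
tar --numeric-owner -vtf test.tar
drwxr-xr-x 0/0 0 2023-12-24 21:23 test/
-rw-r--r-- 0/0 0 2023-12-24 21:22 test/test.txt
Always use it when extracting a tar file that came from a different system; otherwise tar maps the stored user/group names to whatever local ids carry those names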
tar --numeric-owner -zxf ubuntu-12.04.tar.gz -C /lxc/u12/rootfs/
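Leave out individual subfolders / files (--exclude)
Skip the whole uploads folder (public_html/uploads)
tar -czvf backup.tar.gz --exclude "public_html/uploads" public_html
* put --exclude before the paths; some GNU tar versions ignore an --exclude that comes after them
* --exclude can be given multiple times
* patterns use shell syntax (globbing), not regexp syntax
* public_html/uploads2 is not excluded
* "--exclude public_html/uploads/" (trailing slash) makes the exclude ineffective !!
* "--exclude uploads" excludes every folder named uploads, at any depth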
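The matching rules above can be checked on a throwaway tree. A minimal sketch, assuming GNU tar; every file and folder name below is made up for the test.

```shell
# Build a throwaway tree (all names are illustrative).
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p public_html/uploads public_html/uploads2 public_html/blog/uploads
touch public_html/index.html public_html/uploads/a.jpg \
      public_html/uploads2/b.jpg public_html/blog/uploads/c.jpg

# Path-style pattern: only public_html/uploads is skipped.
tar -cf by_path.tar --exclude "public_html/uploads" public_html
by_path=$(tar -tf by_path.tar)

# Bare name: every folder called "uploads" is skipped, at any depth.
tar -cf by_name.tar --exclude "uploads" public_html
by_name=$(tar -tf by_name.tar)

echo "$by_path"
echo "--"
echo "$by_name"
```

Listing the archive with -t after each change to the patterns is the quickest way to confirm what was really left out.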
PATTERN
- '*' or '?' # Globbing
- '[a-e]' # character class
Keep the uploads folder, but without its contents
tar -czvf backup.tar.gz --exclude "./public_html/uploads/*" ./public_html
* quote the patterns to protect them from shell expansion
Exclude by full path
# the leading "./" must match the path given to tar
tar -zcf public_html.tar.gz --exclude=./public_html/tmp ./public_html
* This (full path) form does not exclude a "public_html/tmp" at deeper levels
(i.e. public_html/test/public_html/tmp)
Test
mkdir -p public_html/tmp public_html/test/public_html/tmp
touch public_html/a.txt public_html/tmp/b.txt public_html/test/public_html/tmp/c.txt
tar -zcf public_html.tar.gz --exclude=./public_html/tmp ./public_html
tar -ztf public_html.tar.gz
P.S.
# For comparison: without the "./" prefix the pattern also matches at deeper levels
tar -zcf public_html.tar.gz --exclude=public_html/tmp ./public_html
Exclude files matching patterns listed in FILE
-X, --exclude-from=FILE
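A small sketch of -X, assuming GNU tar. The tree, the patterns and the file name exclude.list are all hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p site/cache site/logs
touch site/index.html site/cache/page.tmp site/logs/access.log site/notes.bak

# One pattern per line, same globbing syntax as --exclude.
printf '%s\n' 'site/cache' '*.log' '*.bak' > exclude.list

tar -cf site.tar -X exclude.list site
listing=$(tar -tf site.tar)
echo "$listing"
```

Keeping the patterns in a file makes a long exclude list easier to review and reuse between backup runs.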
Extract to another location
tar zxf ~/backup.tgz -C /Your/Path files
Drop the leading directories when extracting
tar --strip-components=3 -zxf test.tar.gz a/b/c/test.txt
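The effect of --strip-components is easy to see on a tiny archive. A sketch assuming GNU tar; the out/ directory and the file content are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p a/b/c out
echo "hello" > a/b/c/test.txt
tar -zcf test.tar.gz a

# Drop the three leading components (a/b/c) so test.txt lands directly in out/.
tar --strip-components=3 -zxf test.tar.gz -C out a/b/c/test.txt
ls out
```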
Integrity test of tar ("-t")
Corrupt a tar file
xxd -l 16 -s 102400 etc.tar
dd if=/dev/urandom of=etc.tar bs=1 count=16 conv=notrunc seek=102400
xxd -l 16 -s 102400 etc.tar
tar -tf etc.tar
echo $?
Result: surprisingly, no error ... (tar only checksums its headers, not the file data)
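Since a plain tar file carries a checksum for each header only, damage inside file data goes unnoticed by "tar -t". One possible workaround is to keep a separate checksum next to the archive. A sketch, assuming GNU coreutils (sha256sum); the file names are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir"
# A single 200 KiB member, so byte offset 102400 lies inside file data, not a header.
head -c 204800 /dev/zero > payload.bin
tar -cf demo.tar payload.bin
sha256sum demo.tar > demo.tar.sha256

# Overwrite 16 bytes of file data in place.
printf 'CORRUPTED-BYTES!' | dd of=demo.tar bs=1 seek=102400 conv=notrunc 2> /dev/null

tar -tf demo.tar > /dev/null
list_status=$?
sha256sum -c demo.tar.sha256 > /dev/null 2>&1
checksum_status=$?
echo "tar -t exit status: $list_status, sha256sum -c exit status: $checksum_status"
```

Compressed archives (tar.gz, tar.xz) get this check for free, because the compression format carries its own CRC over the data.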
Corrupt a tar.gz file
xxd -l 16 -s 102400 etc.tar.gz
dd if=/dev/urandom of=etc.tar.gz bs=1 count=16 conv=notrunc seek=102400
xxd -l 16 -s 102400 etc.tar.gz
# stdout gets the file list; the error only shows up on stderr
tar -ztf etc.tar.gz > /dev/null
tar: Skipping to next header
gzip: stdin: invalid compressed data--crc error
gzip: stdin: invalid compressed data--length error
tar: Child returned status 1
tar: Error is not recoverable: exiting now
tar extract single file
tar -zxvf public_html.tar.gz public_html/cms/config.php
# the directory structure is preserved
ls public_html/cms/config.php
Un-Pack bz2, xz
tar -jxf Folder.tar.bz2 Folder
tar -Jxf Folder.tar.xz Folder
zip and unzip
Note: on Debian, zip and unzip are in two separate packages
unzip:
Usage:
unzip [opts] file[.zip] [file(s) ...] [-x xfile(s) ...] [-d exdir]
Test integrity:
# -t test archive files, extracts each specified file in memory and compares the CRC
unzip -t cms.zip
List the files:
unzip -l cms.zip
Length Date Time Name
--------- ---------- ----- ----
View the comment:
unzip -z cms.zip
Extract in place:
unzip cms.zip
Extract to a folder:
-d extract into another directory
unzip nic-traffic.sh.zip -d wts/
Other related options:
-X restore UID/GID info
-o overwrite files WITHOUT prompting
# the TZ (timezone) environment variable must be set correctly in order for -f and -u to work properly
# By default unzip queries before overwriting, but the -o option may be used to suppress the queries.
-u update existing files and create new ones if needed.
-f freshen existing files only
zipinfo
# list header line. The archive name, actual size (in bytes) and total number of files is printed.
zipinfo archive.zip | less
Archive: archive.zip
Zip file size: 6218637709 bytes, number of entries: 103847 <-- actual file size
drwxr-x--- 3.0 unx 0 bx stor 22-Mar-15 17:08 archive/
drwx------ 3.0 unx 0 bx stor 16-Sep-08 10:02 archive/vincent/
-rw------- 3.0 unx 95 bx defN 16-Sep-08 10:02 archive/vincent/.dovecot.lda-dupes
...
103847 files, 10557281066 bytes uncompressed, 6187645423 bytes compressed: 41.4%
Opts
-h # list header line. The archive name, actual size (in bytes) and total number of files is printed.
-t # list totals for files listed or for all files.
Note that
the total compressed (data) size will never match the actual zipfile size,
since the latter includes all of the internal zipfile headers in addition to the compressed data.
zip:
zip [option] [-r] file.zip [list] ..... [-xi list]
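Features:
- -e encrypt
- -c add one-line comments for each file (-z sets the comment for the whole archive)
Operations:
- -r # --recurse-paths, including files with names starting with "."
- -f # freshen: only changed files
- -u # update changed files and add new ones, but never delete entries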
- -d # delete entries in zipfile
Linux specifics:
- -K keep setuid/setgid/sticky permissions (an option of unzip rather than zip; unzip's help text spells it "tacky")
- -y store symbolic links
Filter:
- -x exclude the following names
- -i include only the following names
- -n don't compress these suffixes
Compression level
-0 indicates no compression
-6 Default
-9 optimal compression
Example:
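Archive a directory:
zip -r cms.zip cms
Archive without the enclosing directory:
cd portal
zip -r ../portal.zip * # "*" skips top-level dotfiles; use "." instead to include them
Add a comment:
# Prompt for a multi-line comment for the entire zip archive.
# ended by a line containing just a period
zip -z cms.zip
zip -z foo < foowhat
Encrypt:
zip -e -r cms.zip cms
Update: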
zip -u -r cms.zip cms
xz compression
lossless data compression
- stripped-down version of the 7-Zip
- incorporates the LZMA/LZMA2
- single files as input
Install
apt-get install xz-utils
yum install xz
Provides
- /usr/bin/unxz (-d)
- /usr/bin/xz (-z)
- /usr/bin/xzless
- /usr/bin/xzcat
- /usr/bin/xzgrep
- /usr/bin/xzdiff
- /usr/bin/lzmainfo
xz --version
xz (XZ Utils) 5.1.0alpha
liblzma 5.1.0alpha
Usage
* After xz / unxz the original file is gone, so remember to add "-k" # -k = Don't delete the input files.
xz
-T NUM, --threads=NUM # use at most NUM threads; default 1; 0 = use all processor cores
# memory usage scales with the number of threads
-e, --extreme # try to improve compression ratio by using more CPU time (combined with a preset, e.g. -9e);
# does not affect decompressor memory requirements
-v, --verbose # If standard error is connected to a terminal, xz will display a progress indicator.
--- % 2.5 MiB / 15.2 MiB = 0.162 1.8 MiB/s 0:08
- Completion percentage is shown if the size of the input file is known.
- Amount of compressed data produced
- Amount of uncompressed data consumed
- Compression ratio
- Compression or decompression speed
- Elapsed time in the format M:SS or H:MM:SS.
- Estimated remaining time
Compression preset levels with memory usage
Preset  DictSize  CompCPU  CompMem  DecMem
-6      8 MiB     6        94 MiB   9 MiB
-7      16 MiB    6        186 MiB  17 MiB
-8      32 MiB    6        370 MiB  33 MiB
-9      64 MiB    6        674 MiB  65 MiB
unxz
unxz is equivalent to xz --decompress
xzcat is equivalent to xz --decompress --stdout
- -t # Test the integrity of compressed files
- -l # Print information about compressed files
- -k, --keep # Don't delete the input files.
- -d, --decompress
e.g.
xz -l 2023-05-03-raspios-bullseye-armhf-lite.img.xz
Strms Blocks Compressed Uncompressed Ratio Check Filename
1 10 363.9 MiB 1,876.0 MiB 0.194 CRC64 2023-05-03-raspios-bullseye-armhf-lite.img.xz
xzcat 2023-05-03-raspios-bullseye-armhf-lite.img.xz > /dev/sdX
ENV - XZ_OPT
This is for passing options to xz when it is not possible to set the options directly on the xz command line.
(e.g. xz is run by tar)
Usage
XZ_OPT="-T0 -8 -v" tar Jcf foo.tar.xz foo
gzip
Lempel-Ziv coding (LZ77)
If no files are specified, or if a file name is "-" , the stdin is compressed to the stdout.
opts
- -d --decompress
- -c --stdout # Write output on standard output; keep original files unchanged.
- -t --test
- -l --list # combined with -v it also shows method, crc and date
- -v --verbose
e.g.
-l
du -sh *
452G 20220210
23G 20220210.tar.gz
gzip -l 20220210.tar.gz
compressed uncompressed ratio uncompressed_name
24360201377 3415758848 -613.2% 20220210.tar
compressed size: size of the compressed file
uncompressed size: size of the uncompressed file
Why does gzip report the wrong uncompressed size?
Reason:
gzip spec (RFC 1952) - ISIZE (Input SIZE)
This contains the size of the original (uncompressed) input data modulo 2^32.
The "gzip" format represents the input size modulo 2^32,
so the uncompressed size and compression ratio are listed incorrectly for uncompressed files 4 GB and larger.
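The ISIZE field can be read straight from the last 4 bytes of a .gz file, and the wrap-around is plain modulo arithmetic. A sketch assuming a little-endian machine, because od decodes in host byte order; the sample file and the 5 GiB figure are made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"
head -c 100000 /dev/zero > sample.bin
gzip -c sample.bin > sample.bin.gz

# ISIZE is the last 4 bytes of the file, stored little-endian.
# od -tu4 decodes in host byte order, so this assumes a little-endian machine (x86, most ARM).
isize=$(tail -c 4 sample.bin.gz | od -An -tu4 | tr -d ' ')
echo "ISIZE field: $isize"

# What the field would hold for a 5 GiB input: the size wraps around at 2^32.
five_gib=$((5 * 1024 * 1024 * 1024))
wrapped=$((five_gib % (1 << 32)))
echo "5 GiB stored as: $wrapped"
```

For archives that may exceed 4 GB, counting the bytes of "gzip -dc file.gz" gives the real size, at the cost of decompressing everything.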
-dc
gzip -dc test.tar.gz > test.tar
Compress without deleting the original
gzip < access.log > access.log.gz
Decompress a .gz file without removing the compressed file
gzip -d < file.gz > file
OR
zcat somefile.gz > somefile
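A quick round trip showing that the redirection forms leave both files in place. A sketch with a made-up log file; newer gzip releases also offer -k/--keep for the same purpose.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'line one\nline two\n' > access.log

# Compress via redirection: access.log stays, access.log.gz is created.
gzip < access.log > access.log.gz
# Decompress via redirection: access.log.gz stays too.
gzip -d < access.log.gz > restored.log

ls
cmp access.log restored.log && echo "identical"
```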
gzip slow
gzip is slow even though neither the CPU nor the hard drive is maxed out (gzip is single-threaded, so one core is the limit)
=> drop gzip and switch to lz4
bzip2
bzip2 = a block-sorting file compressor
It uses the Burrows-Wheeler block-sorting text compression algorithm and Huffman coding.
* bzip2 uses 32-bit CRCs
* with the default 900k block size, bunzip2 requires about 3700 kbytes to decompress (100k + (4 x block size));
  with -s (--small) it needs about 100k + (2.5 x block size)
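The memory estimates from the bzip2 man page worked out for the default block size: 100k + (4 x block size) for normal decompression and 100k + (2.5 x block size) for small mode (-s). The variable names below are made up.

```shell
block_size_k=900   # default block size (-9), in kbytes

# Normal decompression: 100k + (4 x block size)
normal_k=$((100 + 4 * block_size_k))
# Small mode (bunzip2 -s): 100k + (2.5 x block size); written as * 5 / 2 to stay in integer arithmetic
small_k=$((100 + block_size_k * 5 / 2))

echo "normal: ${normal_k}k, small mode: ${small_k}k"
```

Changing block_size_k to 100..800 gives the estimates for the lower levels (-1 to -8).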
-d --decompress
-t --test
-c --stdout
-k --keep # Keep (don’t delete) input files during compression or decompression.
-1 (or --fast) to -9 (or --best) # -9 is the default
Set the block size to 100 k, 200 k .. 900 k when compressing.
Other bin
bzcat - decompresses files to stdout
pigz
stands for parallel implementation of gzip
Install
apt-get install pigz
Compression
* The input is broken up into 128 KB chunks with each compressed in parallel
* The compressed data format generated is in the gzip, zlib, or single-entry zip format
using the deflate compression method.
* The compression produces partial raw deflate streams which are concatenated by a single write thread
and wrapped with the appropriate header and trailer, where the trailer contains the combined check value.
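The split / compress in parallel / concatenate idea can be imitated with plain gzip. This is not how pigz does it internally (pigz writes raw deflate streams under a single header and trailer); the sketch below uses whole gzip members instead, which works because a sequence of gzip members is itself a valid gzip file. It assumes split and an xargs that supports -P; file names are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir"
# Hypothetical input: 1 MiB of text.
seq 1 200000 | head -c 1048576 > input.txt

# 1. Break the input into 128 KiB chunks (chunk.aa, chunk.ab, ...).
split -b 131072 input.txt chunk.
# 2. Compress the chunks in parallel, up to 4 at a time.
ls chunk.* | xargs -n 1 -P 4 gzip
# 3. Concatenate in order; the result decompresses back to the original input.
cat chunk.*.gz > parallel.gz

gzip -dc parallel.gz | cmp - input.txt && echo "round trip OK"
```

Each member repeats the gzip header and trailer, so the result is slightly larger than what pigz would produce.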
Example
# -k --keep Do not delete original file after processing.
# -p n number of compress threads. Default: all cores
# -b nK input block size in KiB. Default: 128
# -N compression level. Default: -6 (--fast is -1, --best is -9)
# 2 CPU cores, 1 MiB block size
pigz -k -p 2 -b 1024 --fast --rsyncable filename # creates filename.gz
rsyncable
-R --rsyncable
Input-determined block locations for rsync, so rsync can re-synchronize inside the file
--rsyncable produces a slightly larger archive that is more "friendly" to the rsync algorithm
Regular gzip or bz2 output differs too much from one dump to the next,
leading to very large diffs produced by the rsync algorithm of rdiff.
Format
Default: gz
- -K --zip # PKWare zip (.zip)
- -z --zlib # zlib (.zz)
Other Opts
- -t --test # Test the integrity of the compressed input.
Decompression
unpigz OR -d (--decompress)
* Decompression can't be parallelized
(But will create three other threads for reading, writing, and check calculation)
e.g.
pigz -k -d esxi6.7.raw.gz
lz4
Intro
* focused on compression and decompression speed.
(lz4 offers compression speeds of 400 MB/s per core, linearly scalable with multi-core CPUs.)
* It belongs to the LZ77 family of byte-oriented compression schemes.
* LZ4 was also implemented natively in the Linux kernel 3.11
modinfo lz4
filename: /lib/modules/4.15.0-999-generic/kernel/crypto/lz4.ko
LZ77
LZ77 algorithms achieve compression by replacing repeated occurrences of data
with references to a single copy of that data existing earlier in the uncompressed data stream.
A match is encoded by a pair of numbers called a length-distance pair,
which is equivalent to the statement
"each of the next length characters is equal to the characters exactly distance characters behind it in the uncompressed stream"
To spot matches, the encoder must keep track of some amount of the most recent data, such as the last 2 kB, 4 kB, or 32 kB.
The structure in which this data is held is called a sliding window, which is why LZ77 is sometimes called sliding-window compression.
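Decoding a single length-distance pair can be sketched in a few lines of shell. This is a toy for illustration, not lz4's or gzip's real decoder; the function name lz77_copy and its arguments are made up. Copying one character at a time is what lets a match overlap the text it is producing (distance smaller than length).

```shell
# lz77_copy BUFFER DISTANCE LENGTH
# Appends LENGTH characters to BUFFER, each one copied from DISTANCE characters back.
lz77_copy() {
    buffer=$1
    distance=$2
    length=$3
    while [ "$length" -gt 0 ]; do
        # 1-based position of the character "distance" places before the end.
        start=$(( ${#buffer} - distance + 1 ))
        buffer="$buffer$(printf '%s' "$buffer" | cut -c "$start")"
        length=$((length - 1))
    done
    printf '%s\n' "$buffer"
}

lz77_copy abc 3 6   # repeats "abc" twice more
lz77_copy a 1 4     # a run of the same character
```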
Install
# Ubuntu 16.04
apt-get install liblz4-tool
Version
lz4 -V
*** LZ4 command line interface 32-bits r131, by Yann Collet (Jan 25 2017) ***
Opts
- -z, --compress ( -#, 1(default)~9 )
- -d, --decompress
- -k, --keep # Don't delete source file (Default behavior)
- -t, --test
- -B# # block size [4-7](default : 7), B4= 64KB ; B5= 256KB ; B6= 1MB ; B7= 4MB
Benchmark test
- -b
- -iN # iteration loops [1-9](default : 3)
tar with lz4
# -I, --use-compress-program PROG. filter through PROG (must accept -d)
tar -I lz4 -cf OUTPUT.tar.lz4 paths_to_archive
OR
tar cf - paths_to_archive | lz4 > OUTPUT_FILE.tar.lz4
cpio
Copy-out Mode: Copy files named in name-list to the archive
cpio -o < name-list > archive
find . -depth -print | cpio -o > /path/archive.cpio
Copy-in Mode: Extract files from the archive
cpio -i < archive
Copy-pass Mode: Copy files named in name-list to destination-directory
cpio -p destination-directory < name-list
List: Print a table of contents of all the inputs present.
cpio -t < archive.cpio
-v, --verbose: List the files processed in a particular task.
zstd
zstd, zstdmt, unzstd, zstdcat - Compress or decompress .zst files
It is based on the LZ77 family. Fast modes (--fast) run at > 200 MB/s per core
- zstdmt is equivalent to zstd -T0 # ALL Core
- unzstd is equivalent to zstd -d
- zstdcat is equivalent to zstd -dcf
Opts
-N # Compression level: 1-19, default: 3; --fast[=N] selects faster levels, N defaults to 1
-c, --stdout # Force write to standard output
-T#, --threads=N # default: 1; 0 = ALL
-t, --test
-bN # Benchmark file(s) using compression level
-l, --list
--rsyncable # zstd will periodically synchronize the compression state to make the compressed file more rsync-friendly.