Last updated: 2019-06-05
Table of Contents
- Config files
- Background download
- Resume
- Download a whole directory
- View header
- Download a list of files
- Mirror Website
- Download certain file types in a directory
- Login
- Limit Speed
- Other Opts
- wget 401 then 200
- Drupal cron jobs
- Other Tools
Preface
Speed unit
- Unit: MB = MByte (megabyte, not megabit)
wget http://x.x.x.x:8080/systemrescuecd-amd64-6.1.8.iso
--2022-06-15 10:36:12--  http://x.x.x.x:8080/systemrescuecd-amd64-6.1.8.iso
Connecting to x.x.x.x:8080... connected.
HTTP request sent, awaiting response... 200 OK
Length: 716177408 (683M) [application/octet-stream]
Saving to: ‘systemrescuecd-amd64-6.1.8.iso.1’

systemrescuecd-amd64-6.1.8 100%[=======================================>] 683.00M  17.8MB/s   in 36s
Limitation
wget does not download in parallel.
For parallel downloads, consider aria2 instead.
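For comparison, a minimal aria2 sketch, assuming aria2c is installed; the URL reuses the placeholder host from the log above, and the connection counts are arbitrary:

```shell
# aria2c can split one download across several connections:
#   -x = max connections per server, -s = number of segments.
url="http://x.x.x.x:8080/systemrescuecd-amd64-6.1.8.iso"
cmd="aria2c -x 4 -s 4 $url"
echo "$cmd"   # print the command only; x.x.x.x is a placeholder host
```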
Config files
/etc/wgetrc
~/.wgetrc                          # e.g. for storing a password in wgetrc
wget --config=/path/to/wgetrc ...  # use an alternative config file
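Common options can also be set persistently in these files. A minimal ~/.wgetrc sketch; tries, continue, and timeout are real wgetrc commands, but the values here are arbitrary examples:

```
# ~/.wgetrc (sketch; the values are arbitrary)
tries = 3        # same as -t 3
continue = on    # same as -c
timeout = 60     # network timeout in seconds
```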
Example: Background download
# -t, --tries=       Set number of tries. Default: 20, 0 = infinite.
# -o logfile         Log all messages to logfile.
wget -t 3 -o log.txt http://link &
or
# -b, --background   Go to background after startup; without "-o" it logs to wget-log
# -c, --continue     Continue getting a partially-downloaded file
wget -b -t 3 -c http://link
Example: Resume
wget -c bigfile
# -c resume a partially-downloaded file
Example: Download a whole directory
wget -cp http://link/directory/
# -p, --page-requisites  download all files needed to display the page
#                        (inline images, CSS, ...); to fetch every file under
#                        a directory listing, use -r -np instead (see Mirror Website)
Example: View header
# show the response headers before the download starts (--server-response)
wget -S http://web-site/
Example: Download a list of files
wget -nc -i dl.file
# -i <file>          the file contains one link per line
# -nc, --no-clobber  do not re-download files that already exist locally,
#                    even incomplete ones (the opposite of -c)
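A self-contained sketch of building dl.file (the URLs are placeholders) and the matching call:

```shell
# one URL per line (placeholder links)
cat > dl.file <<'EOF'
http://example.com/a.iso
http://example.com/b.iso
EOF
wc -l < dl.file         # count the links in the list
# wget -nc -i dl.file   # would fetch both, skipping files already on disk
```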
Mirror Website
Method 1: recursive (-r)
-r, --recursive     Turn on recursive retrieving (default depth: 5 levels)
-l, --level=        How many levels deep to recurse (nested levels)
-P /PATH            Save files under /PATH (default prefix ".")
--convert-links     Make the links suitable for offline viewing
-N, --timestamping don't re-retrieve files unless newer than local.
-nX
-nd, --no-directories   Do not create directories (for URL A/B/C, without -nd the directory tree A/B/C is created)
-np, --no-parent not to recurse to the parent directory
-nH, --no-host-directories Disable generation of host-prefixed directories.
(http://fly.srk.fer.hr/ -> fly.srk.fer.hr/)
-I comma-separated list of directories included in the retrieval.
Any other directories will simply be ignored. The directories are absolute paths.
-L      Follow relative links only; the following are NOT relative links:
- <a href="/foo.gif">
- <a href="/foo/bar.gif">
- <a href="http://www.server.com/foo/bar.gif">
-D <domain-list>  Specify the domains that will be followed,
    thus limiting the recursion to hosts belonging to those domains.
e.g.
# after downloading, the directory structure "dl.dahunter.org/mysql/c7/m80" is created
wget -r -l 1 -np https://dl.dahunter.org/mysql/c7/m80/
# download the files into the current directory "."
wget -r -l 1 -np -nd https://dl.dahunter.org/mysql/c7/m80/
# convert the links inside the html files
wget --convert-links -N -l2 -P/tmp -r http://www.gnu.org/
Exclude directories
-X,--exclude-directories= list
A comma-separated list of directories. Elements of list may contain wildcards.
Method 2: "-m"
wget -m -w 5 http://www.gnu.org/
- -m, --mirror         equivalent to -N -r -l inf --no-remove-listing (-l inf is the same as -l 0)
- -k, --convert-links
- -w, --wait           wait the given number of seconds between downloads
Example: Download certain file types in a directory
wget -r -l1 -A'.gif,.swf,.css,.html,.htm,.jpg,.jpeg' <url>
Example: Limit Speed
# limit the download rate; the unit is bytes, with optional k / m suffixes
--limit-rate=100k
Example: Only download newer files
-N, --timestamping
Login
e.g.
wget -O - ftp://USER:PASS@server/README
Remark
# -O -     write the downloaded content to stdout ("-")
# -O file  write the downloaded content to file
Login methods:
- USER:PASS@URL
- --user=USER --password=PASS   # without --password, wget will not prompt for one
- --user=USER --ask-password    # wget prompts for the password
- --use-askpass=command         # if no command is specified, the WGET_ASKPASS env variable is used
~/.wgetrc
chmod 600 ~/.wgetrc   # keep the stored credentials private
user=????
password=????
# ask_password = on/off
Other Opts
-U, --user-agent="user agent"
--referer=
--accept=jpg,gif
--reject=html
--wait=5
wget 401 then 200
401: the server returns realm="..."
200: wget sends the password and logs in
wget, like most other programs, waits for a basic-authentication challenge from the server before sending the credentials.
This has been wget's default behaviour since version 1.10.2.
You can change that behaviour with the --auth-no-challenge option.
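A sketch of using that option; fetch_secure is a hypothetical helper, and the user name and URL in the usage line are placeholders:

```shell
# fetch_secure is a hypothetical wrapper, not part of wget.
fetch_secure() {
    # --auth-no-challenge sends Basic credentials on the first request,
    # instead of waiting for the server's 401 challenge.
    wget --auth-no-challenge --user="$1" --ask-password "$2"
}
# Usage (not run here): fetch_secure alice http://example.com/secure/
```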
Drupal cron jobs
-O, --output-document=file
-q, --quiet
0 * * * * wget -O - -q http://????? > /dev/null 2>&1 && touch /root/getlink
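The crontab entry above can also call a small wrapper script; a sketch, where the URL and the marker path are placeholders:

```shell
#!/bin/sh
# Cron-friendly fetch: quiet, a single try, bounded timeout; create a
# marker file only when the request succeeds. Both paths are placeholders.
url="http://example.com/cron.php"
marker="/tmp/getlink.ok"
run_cron_fetch() {
    # -T 10 bounds the network timeout so a stuck server cannot pile up cron jobs
    wget -q -O /dev/null -t 1 -T 10 "$url" && touch "$marker"
}
# crontab:  0 * * * * /path/to/cron-fetch.sh
```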
Notes
-nv
Turn off verbose output without being completely quiet (use -q for that),
which means that error messages and basic information still get printed.
Other Tools
Parallel download tools
- aria2 (see "Limitation" above)