Last updated: 2022-03-21
Introduction
awk: a pattern scanning and processing language
Table of Contents
- Contents of test.txt
- Basic Usage
- filter
- Multi-line program
- Run program(-f)
- Call by filter
- awk as script
- Print starting from column
- BEGIN & END
- Select specific lines (FNR)
- Column Select($N)
- Set the column separator (FS)
- OR and AND in filters
- if ... else ...
- Ternary operator ( ?: )
- Field Number (NF)
- Remove blank lines
- Maths: + - * / == != < > >= <= ~ !~
- Assign Variable & Call Variable
- 特別的 Variable
- String Functions
- printf
- Casting
- Associative Arrays
- Practical examples
- Multiple filters
- Capturing Groups
- awk - system()
Contents of test.txt
# Name Ages Salary
Beth 18 9000
Dan 23
Kathy 32 6400
Mark 28 7030
Mary 25
Basic Usage
awk 'Program' input-file1 input-file2 ...
Program format
- action
- 'filter {action}'
/regex/        # same as $0 ~ /regex/
$2 !~ /regex/  # $2 does not match regex
# The values of $2 and $4 are concatenated (no space in between)
awk '$2 !~ /th/ {print $2 $4}' log.txt
# With a comma (","), the values are separated by a space (the default OFS)
awk '$2 !~ /th/ {print $2, $4}' log.txt
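A small sketch of the difference between the two print forms above. The comma makes print join its arguments with the output field separator OFS (a single space by default), so changing OFS changes the joiner. The input line and the variable names are made up for illustration.

```shell
# Hypothetical one-line input; the fields are a, b, c, d.
line='a b c d'

# No comma: $2 and $4 are concatenated directly.
joined=$(printf '%s\n' "$line" | awk '{ print $2 $4 }')

# Comma: the values are joined with OFS, set here to "-".
separated=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "-" } { print $2, $4 }')

echo "$joined"      # bd
echo "$separated"   # b-d
```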
new line
Each print statement makes at least one line of output.
awk 'BEGIN { print "line one\nline two\nline three" }'
Multi-line program
awk '/This program is too long, so continue it\
on the next line/ { print $1 }'
Run program(-f)
# -f program-file # Read the AWK program source from the file program-file
prog.txt
{   # the "{}" braces are still required
    print NF, $1
    ...
}
NF: The number of fields in the current input record.
awk -f prog.txt test.txt
3 Beth
2 Dan
3 Kathy
3 Mark
2 Mary
Call by filter
awk-filter.awk
#!/usr/bin/awk -f
BEGIN { print "Name\tSalary" }
{ print $1, "\t", $3 }
END { print " - DONE -" }
chmod +x awk-filter.awk
cat test.txt | ./awk-filter.awk
Remark
# -f program-file # Read the AWK program source from the file program-file
awk as script
chmod +x myscript.awk
myscript.awk:
#!/usr/bin/awk -f
BEGIN{
mystr="abcdefgh"
print length(mystr)
}
A BEGIN block is needed when awk is given no input, either from a file or from standard input.
The block executes at the very start of awk execution, before the first file is opened.
Print starting from column
# Print all but the first column:
awk '{$1=""; print $0}' somefile
# Print all but the first two columns:
awk '{$1=$2=""; print $0}' somefile
# Print fields from i=start while i<=stop (here: all but the last column)
awk '{for(i=1;i<=NF-1;i++) printf "%s ", $i; print ""}' somefile
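Blanking fields with $1="" leaves the separators behind, so the output starts with a space. As a hedged alternative, the loop can start at the wanted column and emit the separator itself. The variable name start and the sample line are assumptions for this sketch.

```shell
# Print every field from column "start" to the end, separated by single spaces.
# "start" is an illustrative variable name passed in with -v.
tail_cols=$(printf 'a b c d\n' |
    awk -v start=3 '{
        for (i = start; i <= NF; i++)
            printf "%s%s", $i, (i < NF ? " " : "\n")
    }')

echo "$tail_cols"   # c d
```

Lines with fewer than start fields print nothing at all in this version.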
BEGIN & END
BEGIN: runs before any row is read
END: runs after all rows have been processed
Example: total size of the files dated Jan
ls -l | awk '$6 == "Jan" { sum += $5 } END { print sum }'
Example: run a program directly, with no input (stdin / file)
awk 'BEGIN { print "line one\nline two\nline three" }'
Select specific lines (FNR)
# FNR: The input record number in the current input file
awk 'FNR == 2 {print}' test.txt
* FNR counts from 1, not from 0
Column Select($N)
As each input record is read, gawk splits the record into fields, ($1, $2 ...)
using the value of the built-in FS variable (or "-F fs") as the field separator. (Default: space/Tab)
e.g. 1
awk '{ print $1,$3 }' test.txt
e.g. 2
awk '$3 > 0 {print $1, $2 * $3}' test.txt
* $0 is the whole line
* $NF is the last field
e.g. 3
awk '$1 == "findtext" {print $3}' filename
* String values must be wrapped in ""
Set the column separator (FS)
awk 'BEGIN { FS = "," } ; { print $2 }'
Default FS: space and tab act as field separators
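The same separator can also be given on the command line with -F. A minimal sketch, using a made-up passwd-style record, showing that both forms split the line the same way:

```shell
# A made-up passwd-style record; ":" separates the fields.
record='alice:x:1001:1001::/home/alice:/bin/sh'

# -F sets FS from the command line, equivalent to BEGIN { FS = ":" }.
with_flag=$(printf '%s\n' "$record" | awk -F: '{ print $1, $3 }')
with_begin=$(printf '%s\n' "$record" | awk 'BEGIN { FS = ":" } { print $1, $3 }')

echo "$with_flag"    # alice 1001
echo "$with_begin"   # alice 1001
```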
regexp
# Find the rows that match a pattern
ls -l | awk '/-rw-------/ {print $9}'
# This method uses regexp, it should work:
awk '$2 ~ /findtext/ {print $3}' filename
- \ # This is used to suppress the special meaning of a character
- ^ # This matches the beginning
- $ # end of a string.
- [...] # This is called a character list.( [A-Za-z0-9] )
- [^ ...] # It matches any characters except those in the square brackets
- | # This is the alternation operator
- . # matches any single character (including the newline character) (exactly 1)
- * # zero or more times
- + # at least once (>=1)
- ? # once or not at all (0 or 1)
- \] # inside [...], matches a literal "]"
Define digits_regexp (a regexp stored in a string variable)
BEGIN { digits_regexp = "[[:digit:]]+" }
$0 ~ digits_regexp { print }
Character classes
- '[0-9]' is equivalent to '[0123456789]'
- [:alnum:] Alphanumeric characters
- [:alpha:] Alphabetic characters
- [:blank:] Space and TAB characters
- [:digit:] Numeric characters
- [:xdigit:] Characters that are hexadecimal digits
- [:space:] Space characters (these are: space, TAB, newline, carriage return, formfeed and vertical tab)
Example
The following condition prints the line if it contains the word "whole" or columns 1 and 2 contain "part1" and "part2" respectively.
($0 ~ /whole/) || (($1 ~ /part1/) && ($2 ~ /part2/)) {print}
This can be shortened to
/whole/ || $1 ~ /part1/ && $2 ~ /part2/ {print}
The following prints lines 20 through 40 (inclusive):
(NR==20),(NR==40) {print}
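A quick way to check the range pattern is to feed it numbered lines. This sketch generates 1..50 with a BEGIN loop (so it needs no input file) and keeps records 20 through 40. The variable names are only illustrative.

```shell
# Generate the numbers 1..50, one per line, then keep records 20 through 40.
selected=$(awk 'BEGIN { for (i = 1; i <= 50; i++) print i }' |
    awk 'NR == 20, NR == 40')

# The range is inclusive on both ends, so 21 lines are selected.
first=$(printf '%s\n' "$selected" | head -n 1)
last=$(printf '%s\n' "$selected" | tail -n 1)
count=$(printf '%s\n' "$selected" | awk 'END { print NR }')

echo "$first $last $count"   # 20 40 21
```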
OR and AND in filters
AND --> &&
OR --> ||
test.txt
Y 1
N 2
X 3
Y 4
N 5
X 6
Two filters (OR)
# Similar to ls -l | grep -e Nov -e Dec
#                      filter1            filter2
cat test.txt | awk '$1~/Y/{print $2}; $1~/N/{print $2}'
This can be written as
($1~/Y/ || $1~/N/){print $2}
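The two forms are not always interchangeable. With two separate rules, a line that matches both filters is printed twice, while the || form prints it once. A sketch using the Y/N/X sample plus one made-up line ("YN 7") that matches both:

```shell
# The Y/N/X sample from this section.
data=$(printf 'Y 1\nN 2\nX 3\nY 4\nN 5\nX 6\n')

# Same output here, because no line matches both /Y/ and /N/.
two_rules=$(printf '%s\n' "$data" | awk '$1 ~ /Y/ { print $2 }; $1 ~ /N/ { print $2 }')
one_rule=$(printf '%s\n' "$data" | awk '$1 ~ /Y/ || $1 ~ /N/ { print $2 }')

# A line matching both patterns is printed twice by the two-rule form.
twice=$(echo 'YN 7' | awk '$1 ~ /Y/ { print $2 }; $1 ~ /N/ { print $2 }')
once=$(echo 'YN 7' | awk '$1 ~ /Y/ || $1 ~ /N/ { print $2 }')

echo "$two_rules"
echo "$twice"
echo "$once"
```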
awk multiple condition (AND)
awk 'BEGIN {FS=":"}; $3>500 && $3<65534 { print $1 }' /etc/passwd
"&&" and "||". There is also the unary not operator: "!".
if ... else ...
Syntax:
if (conditional-expression) { action1; action2; }
"{" and "}" are optional when the branch is a single statement
if (conditional-expression1)
    action1;
else if (conditional-expression2)
    action2;
else
    action n;
e.g.
if ($(NF+1) != "")
    print "can't happen"
else
    print "everything is normal"
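A runnable sketch of if ... else inside an action, using two rows modeled on the test.txt sample (a row with a salary and a row without). The wording of the output is made up for the example.

```shell
# Rows modeled on the test.txt sample: name, age, optional salary.
report=$(printf 'Beth 18 9000\nDan 23\n' |
    awk '{
        if (NF >= 3)
            print $1, "salary", $3
        else
            print $1, "no salary"
    }')

echo "$report"
```

When both branches are written on one line, a ";" is needed before else.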
Ternary operator ( ?: )
Syntax
(conditional-expression) ? expr1 : expr2 ;
Example: find the larger of two values
# Wrong
awk 'BEGIN { a = 10; b = 20; (a > b) ? print a : print b;}'
Reason: the operands of the ternary operator must be expressions, and print is a statement
expression vs. statement
Statements are actions that are carried out, while expressions are computations that result in a value.
# Correct
awk 'BEGIN { a = 10; b = 20; (a > b) ? max = a : max = b; print max}'
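Because the ternary operator yields a value, a common alternative (shown here only as a sketch) is to assign its result once instead of assigning inside each branch:

```shell
# Use the ternary operator as a value and assign the result once.
max=$(awk 'BEGIN { a = 10; b = 20; max = (a > b) ? a : b; print max }')

echo "$max"   # 20
```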
Field Number (NF)
awk '{ print NF, $1 }' test.txt
Output:
3 Beth
2 Dan
3 Kathy
3 Mark
2 Mary
Remove blank lines
awk 'NF > 0' input.txt > output.txt # similar to sed '/^$/d' input.txt > output.txt, but also drops whitespace-only lines
Maths: + - * / == != < > >= <= ~ !~
awk '{ $6 = ($5 + $4 + $3 + $2) ; print $6 }' inventory-shipped
Assign Variable
Local Variable
cat test.txt
1
2
3
4
awk '$1>2{MyVar=$1; print MyVar}' test.txt
3
4
Outside Variable
-v var=value # assigns value to program variable var
* Such variable values are available to the BEGIN block of an AWK program
/root/test.txt
1 a
2 b
3 c
awk -v myval=mymsg '{print myval ": " $2;}' /root/test.txt
output
mymsg: a
mymsg: b
mymsg: c
* Do not prefix the variable with "$"!
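A short sketch of passing a shell variable into awk with -v and reading it in the BEGIN block. The names greeting and msg are made up for the example.

```shell
# A shell variable (hypothetical name) handed to awk with -v.
greeting='hello'

# The value is already set when BEGIN runs, and it is read as msg, not $msg.
from_begin=$(awk -v msg="$greeting" 'BEGIN { print msg }')

echo "$from_begin"   # hello
```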
Special variables
FNR # The current record number in the current file
NF # The number of fields in the current input record
NR # The number of input records awk has processed since the beginning of the program's execution
BEGIN # runs before any row is read
END # runs after all rows have been processed
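The difference between NR and FNR only shows up with more than one input file. A sketch with two throwaway files (created in a temporary directory; the file names are illustrative):

```shell
# Two throwaway files in a temporary directory (names are illustrative).
tmpdir=$(mktemp -d)
printf 'a1\na2\n' > "$tmpdir/a.txt"
printf 'b1\nb2\n' > "$tmpdir/b.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
counts=$(awk '{ print NR, FNR, $0 }' "$tmpdir/a.txt" "$tmpdir/b.txt")
rm -r "$tmpdir"

echo "$counts"
```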
String Functions
- length(string)
- index(string,search)
- match(str, regex)
- substr(string, position [, length])
- sub(regexp, replacement [, target])
- gsub(r, s [, t])
- split(string, array, separator)
- strtonum(str) # 12, 012, 0x12
- tolower(str)
- toupper(str)
length(str)
test.awk
#!/usr/bin/awk -f
BEGIN{
mystr="abcdefgh"
print length(mystr)
}
index(string, search)
* Positions start at 1; 0 means not found
test.awk
#!/usr/bin/awk -f
BEGIN{
    sentence = "This is a short, meaningless sentence.";
    if (index(sentence, ",") > 0) {
        printf("Found a comma in position %d \n", index(sentence, ","));
    }
}
match(str, regex)
It returns the index of the leftmost longest match of regex in string str.
It returns 0 if no match found.
The regexp argument may be either a regexp constant (/…/) or a string constant ("…").
In the latter case, the string is treated as a regexp to be matched.
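Besides its return value, match() also sets the built-in variables RSTART and RLENGTH, which combine naturally with substr(). A minimal sketch on a made-up input string:

```shell
# match() sets RSTART (1-based position) and RLENGTH (length of the match).
found=$(echo 'foo123bar' |
    awk '{ if (match($0, /[0-9]+/)) print RSTART, RLENGTH, substr($0, RSTART, RLENGTH) }')

echo "$found"   # 4 3 123
```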
substr(string, position [, length])
# Extract part of a string. When length is omitted, the result runs from position to the end
#!/bin/awk -f
{ print substr($0, 40, 10) }
sub(regexp, replacement [, target])
sub = substitution
This function performs single substitution. (It replaces the first occurrence)
(for the leftmost, longest substring matched by the regular expression regexp)
The third parameter(target) is optional. If it is omitted, $0 is used.
str = "water, water"
sub(/at/, "ith", str)
print str # wither, water
e.g. Remove a string
# When $3 is "%sasl_username=U@D",
# the output becomes U@D
awk '{ sub(/%sasl_username=/, "", $3); print $3 }'
gsub(r, s [, t])
g = global
For each substring matching the regex r in the string t,
substitute the string s, and return the number of substitutions.
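For comparison with the sub() example, a sketch of gsub() on the same "water, water" string. With no target given it works on $0, and the return value is the number of replacements.

```shell
# gsub() replaces every match in $0 and returns how many it replaced.
replaced=$(echo 'water, water' | awk '{ n = gsub(/at/, "ith"); print n, $0 }')

echo "$replaced"   # 2 wither, wither
```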
split(string, array, search)
# This script breaks up the sentence into words, using a space as the character separating the words
* Array indices start at 1
#!/usr/bin/awk -f
BEGIN {
    string = "This is a string, is it not?";
    search = " ";
    n = split(string, array, search);
    for (i=1; i<=n; i++) {
        printf("Word[%d]=%s\n", i, array[i]);
    }
    exit;
}
Result:
Word[1]=This
Word[2]=is
Word[3]=a
Word[4]=string,
Word[5]=is
Word[6]=it
Word[7]=not?
printf
With printf, we can print without a trailing newline
print always appends a newline => use printf instead
{printf "%s", $n}
Usage
printf format, item1, item2, …
Format-Control Letters
- %c # Print a number as a character; thus, ‘printf "%c", 65’ outputs the letter ‘A’.
- %d, %i # Print a decimal integer.
- %f # Print a number in floating-point notation. (printf "%4.3f", 1950)
- %s # Print a string.
Modifiers
width
This is a number specifying the desired minimum width of a field. (pad with spaces)
( Inserting any number between the ‘%’ sign and the format-control character)
- (Minus)
The minus sign, used before the width modifier, left-justifies the value within the field
.prec
A period followed by an integer constant specifies the precision to use when printing.
e.g.
ps -axl | awk '{printf "%3d %s" "\n", $6, $13}'
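The modifiers are easier to see with brackets around each field. A sketch that needs no input, with values chosen only for illustration:

```shell
# width 5, width 5 left-justified, precision 2, width 8 with precision 3
formatted=$(awk 'BEGIN { printf "[%5d] [%-5d] [%.2f] [%8.3f]\n", 42, 42, 3.14159, 1950 }')

echo "$formatted"   # [   42] [42   ] [3.14] [1950.000]
```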
Casting
int()
test.txt
'1'
'12'
'123'
cat test.txt | awk '{gsub(/[^0-9]/,"",$0); print $0}'
1 # the 1, 12 and 123 printed here are strings, not numbers
12
123
Casting
cat test.txt | awk '{gsub(/[^0-9]/,"",$0); if (int($0) > 11){print $0} }'
Associative Arrays
Set
array[index-expression] = value
Get
if (a["foo"] != "") …
{ for (i = 1; i <= NF; i++) used[$i] = 1 }
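A typical use of an associative array is counting. The sketch below counts words from made-up input, walks the array with for (key in array), and tests membership with the in operator, which does not create the element the way referencing a["foo"] does. The output is piped through sort because the for-in order is unspecified.

```shell
# Count how often each word occurs, then list the counts.
word_counts=$(printf 'a b a\nc a\n' |
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END {
             for (w in count) print w, count[w]
             # "z" never appeared; the "in" test does not add it to the array.
             if (!("z" in count)) print "z", 0
         }' | sort)

echo "$word_counts"
```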
Practical examples
#1 Find SASL users' logins and their times
grep sasl_username /var/log/maillog > report.tmp
awk -f sasl_login.awk report.tmp > report.txt
sasl_login.awk
{
    # remove the tag's name
    sub(/sasl_username=/,"",$9)
    sp = index($7, "[")
    np = index($7, "]")
    len = np - sp - 1
    ip = substr($7, sp + 1, len)
    print $1" "$2" "$3" "ip"\t"$9
}
Notes
* IP addresses vary in length, so separate with "\t" instead of a space
report.tmp
Aug 14 04:03:00 <HOSTNAME> <Daemon> MID: client=unknown[R.R.R.R], sasl_method=LOGIN, sasl_username=U@D
#2 Current TCP connections by state
netstat -ant | awk 'FNR>2{print $6}' | sort | uniq -c | sort -n
2 TIME_WAIT
4 FIN_WAIT2
7 ESTABLISHED
8 LISTEN
P.S. FNR>2 skips the two header lines of the netstat output:
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN
...
Multiple filters
get_outgoing_mail_statistics.sh
#!/bin/bash
quota=1
/usr/sbin/postfwd --dumpcache |
awk -v quota="$quota" '
($7 == "$count") {
    sub(/%sasl_username=/, "", $3)
    gsub(/[^0-9]/, "", $9)
    if (int($9) >= quota) printf "%s %s", $3, $9
}
($7 == "$maxcount") {
    gsub(/[^0-9]/, "", $9)
    print "/" $9
}'
Capturing Groups
match(string, regexp [, array])
arr[0] # entire portion of string matched by regexp
arr[1], arr[2] ... # contain the portion of string matching the corresponding parenthesized subexpression
* If array already exists, it is cleared first.
* The array argument is a gawk extension.
Application
A Postfix smtp mail flow ends in one of two statuses:
- status=bounced
- status=sent
# "status=bounced" => replace the echo with /var/log/mail for real data
# $1$2" "$3" "$7 => date time to=<r@R>, bounced|sent
echo "status=bounced" | awk 'match($0, /status=(bounced|sent)/, arr) {print $1 $2 " " $3 " " $7 " " arr[1]}'
awk - system()
system(command) runs a shell command and returns its exit status
ipcs -s | awk -v user=apache '$3==user {system("ipcrm -s "$2)}'
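Since system() executes whatever string it is given, a cautious approach is to print the commands first and review them before switching print back to system(). The rows below are made up and only assume the layout used above (id in $2, owner in $3); they are not real ipcs output.

```shell
# Made-up rows that mimic the assumed column layout (id in $2, owner in $3).
preview=$(printf '0x01 101 apache 600 1\n0x02 102 root 600 1\n0x03 103 apache 600 1\n' |
    awk -v user=apache '$3 == user { print "ipcrm -s " $2 }')

# Dry run: shows the commands that system() would execute.
echo "$preview"
```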