Last updated: 2023-09-12
Table of contents
- Terminology
- Install
- tc command syntax
- Token bucket (algorithm)
- Units
- pfifo_fast (Linux default qdisc)
- fifo (First In, First Out)
- SFQ (PRIO qdisc)
- Incoming traffic
- roots, handles, siblings and parents
- Timer Interrupt Frequency Configuration
- NetEm - Network Emulator
- tc Filter (filtering commands)
- Limit Outgoing (HTB)
- Over root traffic
- tc on container node
- The Intermediate queueing device (IMQ)
- Doc
Terminology
qdisc (Queueing Discipline)
* Queueing: determines the way in which data is SENT
- pfifo_fast - FIFO with bands (default)
  It has 3 bands in total: 0 (high) ~ 2 (low)
  man tc-pfifo_fast
- sfq - Stochastic Fairness Queueing
  Only useful when the bandwidth is fully used
- htb - Hierarchy Token Bucket
  CLASSFUL
  replacement for CBQ (supports borrowing between levels)
  man tc-htb
- tbf - Token Bucket Filter
  CLASSLESS
  man tc-tbf
- cbq
- fq_codel - Fair Queuing (FQ) with Controlled Delay (CoDel)
  man tc-fq_codel
Classful qdisc
A classful qdisc contains multiple classes. Some of these classes contain a further qdisc.
Level
The level of a class determines its position in the hierarchy. Leaves have level 0; root classes have level LEVEL_COUNT-1.
Scheduling
A qdisc may, with the help of a classifier,
decide that some packets need to go out earlier than others.
This process is called Scheduling ( Example: pfifo_fast )
Classes
- A class, in turn, may have several classes added to it.
- A classful qdisc may have many classes
- A leaf class is a class with no child classes
- When you create a class, a fifo qdisc is attached to it.
  (When you add a child class, this qdisc is removed.)
Classifier
Each classful qdisc needs to determine to which class it needs to send a packet.
Shaping
The process of delaying packets before they go out to make traffic conform to a configured maximum rate.
Shaping is performed on egress.
Policing
Delaying or dropping packets in order to make traffic stay below a configured bandwidth.
In Linux, policing can only "drop" a packet and not delay it
non-Work-Conserving
Some qdiscs, such as the Token Bucket Filter, may need to hold on to a packet for a certain time in order to limit the bandwidth.
This means that they sometimes refuse to pass a packet, even though they have one available.
Ingress Qdisc
The ingress qdisc sees incoming packets at a very early stage, before they have passed through much of the kernel.
It is therefore a very good place to drop traffic very early, without consuming a lot of CPU power.
dequeueing
The packet now sits in the qdisc,
waiting for the kernel to ask for it for transmission over the network interface.
Install
yum -y install iproute # CentOS 7
yum -y install iproute-tc # CentOS 8
tc command syntax
tc [ OPTIONS ] OBJECT COMMAND dev <eth0 | ppp0> [ parent n:m | root ]
OBJECT:
- qdisc
- class
- filter
- action
- monitor
COMMAND:
- add
- del
- show
- replace
dev
- Must be a primary interface; an alias such as eth0:0 is not allowed
parent
- n:m
- root
tc qdisc
The qdisc (queueing discipline) is elementary to understanding traffic control.
tc class
Some qdiscs can contain classes, which contain further qdiscs.
Traffic may then be enqueued in any of the inner qdiscs, which are within the classes.
tc filter
A filter is used by a classful qdisc to determine in which class a packet will be enqueued.
COMMANDS
add, delete, change, replace (performs a nearly atomic remove/add on an existing node id)
Token bucket (algorithm)
- A token is added to the bucket every 1/r seconds.
- token:data => 1:1
- The bucket can hold at the most b tokens.
Average rate = r
accumulation of token => allows a short burst
* For 10mbit/s on Intel, you need at least a 10kbyte buffer (burst) to reach the configured rate
* latency => maximum amount of time a packet can sit in the TBF
* mpu => minimum packet unit: a zero-sized packet does not use zero bandwidth (on Ethernet, no packet uses less than 64 bytes)
* The achievable peakrate depends on the bucket size
* Due to the default 10ms timer resolution (CONFIG_HZ_?) of Unix, with average packets of 10,000 bits,
we are limited to a peakrate of 1mbit/s!
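The two numbers above can be checked with a little arithmetic. This is a minimal sketch, assuming the rule from man tc-tbf that the minimum buffer is the rate divided by HZ, and assuming at most one packet is released per timer tick for the peakrate ceiling. The function names are made up for illustration.

```shell
# min_burst_bytes RATE_BITS_PER_SEC HZ
# Smallest bucket (bytes) that still holds the tokens arriving in one tick.
min_burst_bytes() {
    echo $(( $1 / 8 / $2 ))
}

# max_peakrate_bits AVG_PACKET_BITS HZ
# Ceiling when only one packet can leave per timer tick.
max_peakrate_bits() {
    echo $(( $1 * $2 ))
}

min_burst_bytes 10000000 100    # 10mbit at 100 Hz -> 12500 bytes
max_peakrate_bits 10000 100     # 10,000-bit packets at 100 Hz -> 1000000 bit/s
```

12500 bytes is in line with the "at least 10kbyte" rule of thumb, and 1000000 bit/s is the 1mbit/s peakrate limit.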
Application
# If you have a networking device with a large queue, like a DSL modem or a cable modem,
# and you talk to it over a fast device, like over an ethernet interface,
# you will find that uploading absolutely destroys interactivity.
tc qdisc add dev ppp0 root tbf rate 220kbit latency 50ms burst 1540
Units
SI prefix (k-, m-, g-, t-) # 1000
IEC prefix (ki-, mi-, gi- and ti-) # 1024
e.g.
Bits per second(bit)
- kbit
- mbit
Bytes per second(bps)
- kibps
- mibps
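To make the difference between the bit and bps suffixes concrete, here is a small sketch that converts a tc-style rate string into bits per second. The function name rate_to_bits is hypothetical, it handles only integer values, and it covers only the k/m and ki/mi prefixes listed above.

```shell
# rate_to_bits RATE   (e.g. 220kbit, 100kbps, 1mibps)
rate_to_bits() {
    num=${1%%[!0-9]*}       # leading digits
    unit=${1#"$num"}        # suffix after the digits
    [ -n "$num" ] || { echo "no number in: $1" >&2; return 1; }
    case $unit in
        ''|bit) mult=1 ;;         # a bare number means bits per second
        kbit)   mult=1000 ;;
        mbit)   mult=1000000 ;;
        kibit)  mult=1024 ;;
        mibit)  mult=1048576 ;;
        bps)    mult=8 ;;         # bps = bytes per second
        kbps)   mult=8000 ;;
        mbps)   mult=8000000 ;;
        kibps)  mult=8192 ;;
        mibps)  mult=8388608 ;;
        *) echo "unsupported unit: $unit" >&2; return 1 ;;
    esac
    echo $(( num * mult ))
}

rate_to_bits 220kbit    # 220000
rate_to_bits 100kbps    # 800000
rate_to_bits 1mibps     # 8388608
```

Note that 100kbps is eight times 100kbit, which is easy to overlook when reading HTB rates.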
pfifo_fast (Linux default qdisc)
pfifo_fast: "fast" provides three different bands (individual FIFOs)
0 (highest priority)
1
2
* Within a particular class packets are sent in the order they arrived.
* pfifo_fast does not delay packets - it sends them at the speed the device can accept them
Mapping
TOS (4 bits in total, 16 combinations)
Binary  Decimal  Meaning
-----------------------------------------
1000    8        Minimize delay (md)
0100    4        Maximize throughput (mt)
0010    2        Maximize reliability (mr)
0001    1        Minimize monetary cost (mmc)
0000    0        Normal Service
Combinations of the 4 TOS bits
TOS   Bits  Means                   Linux Priority  Band
------------------------------------------------------------
0x0   0     Normal Service          0 Best Effort   1
0x2   1     Minimize Monetary Cost  1 Filler        2
0x4   2     Maximize Reliability    0 Best Effort   1
0x6   3     mmc+mr                  0 Best Effort   1
0x8   4     Maximize Throughput     2 Bulk          2
0xa   5     mmc+mt                  2 Bulk          2
0xc   6     mr+mt                   2 Bulk          2
0xe   7     mmc+mr+mt               2 Bulk          2
0x10  8     Minimize Delay          6 Interactive   0
0x12  9     mmc+md                  6 Interactive   0
0x14  10    mr+md                   6 Interactive   0
0x16  11    mmc+mr+md               6 Interactive   0
0x18  12    mt+md                   4 Int. Bulk     1
0x1a  13    mmc+mt+md               4 Int. Bulk     1
0x1c  14    mr+mt+md                4 Int. Bulk     1
0x1e  15    mmc+mr+mt+md            4 Int. Bulk     1
Linux Priority => the index into priomap; the value at that position is the band
TOS by service
TELNET       1000 (8)  (minimize delay)
FTP Control  1000 (8)  (minimize delay)
    Data     0100 (4)  (maximize throughput)
Show the priomap mapping
tc qdisc show dev eth0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
priomap classForPrio_0 classForPrio_1 ... classForPrio_15
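The priomap is just a 16-entry lookup table indexed by Linux priority. A minimal sketch of that lookup, using the priomap printed above (the function name band_for_priority is made up for illustration):

```shell
# band_for_priority PRIORITY BAND_0 BAND_1 ... BAND_15
band_for_priority() {
    want=$1
    shift
    i=0
    for band in "$@"; do
        if [ "$i" -eq "$want" ]; then
            echo "$band"
            return 0
        fi
        i=$(( i + 1 ))
    done
    return 1    # priority outside the map
}

PRIOMAP="1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1"
band_for_priority 6 $PRIOMAP    # Interactive -> band 0
band_for_priority 2 $PRIOMAP    # Bulk        -> band 2
band_for_priority 0 $PRIOMAP    # Best Effort -> band 1
```

The results agree with the Linux Priority and Band columns of the TOS table.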
Show the TOS
Only visible with "tcpdump -v -v"
fifo (First In, First Out)
bfifo (Byte limited)
tc qdisc ... add bfifo [ limit bytes ]
pfifo (Packet limited)
tc qdisc ... add pfifo [ limit packets ]
limit => Maximum queue size.
If the list is too long, no further packets are allowed on. This is called 'tail drop'.
* [p|b]fifo, pfifo_fast (CLASSLESS)
# Purpose
If you don't want to shape, but only want to see if your interface is so loaded that it has to queue, list the qdisc statistics:
# To list current rules
tc [-s] qdisc show [dev ethX]
-s[tatistics]
e.g.
tc -s qdisc show
qdisc pfifo_fast 0: dev eth0 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 30482712 bytes 528027 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
dropped 0, overlimits 0 => no packets were dropped or slowed down
limited FIFO
limit default = interface txqueuelen
tc qdisc add root dev <eth0> pfifo [ limit packets ]
tc qdisc add root dev <eth0> bfifo [ limit bytes ]
tc qdisc change dev eth0 root pfifo limit 100
Default:
For pfifo, it defaults to the interface txqueuelen
For bfifo, it defaults to txqueuelen * MTU
SFQ (PRIO qdisc)
* SFQ is only useful in case your actual outgoing interface is really full!
* If you were only to run SFQ, nothing would happen, as packets enter & leave your router without delay
'Stochastic' because it doesn't really allocate a queue for each session,
it has an algorithm which divides traffic over a limited number of queues using a hashing algorithm. (perturb)
Set
tc qdisc add dev ppp0 root sfq perturb 10
perturb # Reconfigure hashing once this many seconds.
limit # The total number of packets that will be queued by this SFQ (after that it starts dropping them)
Show
tc -s -d qdisc ls
limit 128p flows 128/1024 perturb 10sec
(1) 128 packets can wait in this queue
(2) 128 flows can be active at a time
(3) 1024 hashbuckets available for accounting
(4) every 10 seconds, the hashes are reconfigured
Application
          1:   root qdisc
         / | \
        /  |  \
     1:1  1:2  1:3    classes
      |    |    |
     10:  20:  30:    qdiscs
     sfq  tbf  sfq
band  0    1    2
CLI
tc qdisc add dev eth0 root handle 1: prio
# This *instantly* creates classes 1:1, 1:2, 1:3
tc qdisc add dev eth0 parent 1:1 handle 10: sfq
tc qdisc add dev eth0 parent 1:2 handle 20: tbf rate 20kbit buffer 1600 limit 3000
tc qdisc add dev eth0 parent 1:3 handle 30: sfq
** The bands are classes, and are called major:1 to major:3 by default
Incoming traffic
To 'shape' incoming traffic which you are not forwarding, use the Ingress Policer.
Incoming shaping is called 'policing', by the way, not 'shaping'.
roots, handles, siblings and parents
handle = a unique identifier within the traffic control structure for class and classful qdisc
("handle" "1:" just a name or identifier)
major:minor # x:y
1) A minor of 0 identifies the object as a qdisc (x: / x:0). Any other value identifies the object as a class.
2) All classes sharing a parent must have unique minor numbers.
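The minor-number rule can be expressed as a tiny check. This is only an illustrative sketch; describe_handle is a made-up helper, not a tc feature.

```shell
# describe_handle MAJOR:MINOR
# Prints whether the handle names a qdisc (minor 0 or empty) or a class.
describe_handle() {
    case $1 in
        *:*) ;;
        *) echo "not a handle: $1" >&2; return 1 ;;
    esac
    major=${1%%:*}
    minor=${1#*:}
    if [ -z "$minor" ] || [ "$minor" = 0 ]; then
        echo "qdisc $major:0"
    else
        echo "class $major:$minor"
    fi
}

describe_handle 1:      # qdisc 1:0
describe_handle 1:0     # qdisc 1:0
describe_handle 1:12    # class 1:12
```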
qdisc and class
########## classifier chain ##########
Kernel
========================================
             1:             # root qdisc (1:0)
             |
            1:1             # child class
          /  |  \
         /   |   \
      1:10  1:11  1:12      # child class
       |     |     |
       |    11:    |        # leaf class
       |           |
      10:         12:       # qdisc
     /   \       /   \
  10:1  10:2   12:1  12:2   # leaf classes
========================================
NIC
* Packets get enqueued and dequeued at the root qdisc
* classes never get dequeued faster than their parents allow.
* nested classes ONLY talk to their parent qdiscs, never to an interface.
1: -> 1:1 -> 1:12 -> 12: -> 12:2
is equivalent to
1: -> 12:2
* A classful qdisc can only have children classes of its type.
For example, an HTB qdisc can only have HTB classes as children.
Timer Interrupt Frequency Configuration
100 Hz may be more beneficial for servers and NUMA systems that do not need to have
a fast response for user interaction and that may experience bus
contention and cacheline bounces as a result of timer interrupts.
Note that the timer interrupt occurs on each processor in an SMP
environment leading to NR_CPUS * HZ number of timer interrupts per second.
grep -E '^CONFIG_HZ_[0-9]+' "/boot/config-$(uname -r)"
100 Hz is a typical choice for servers, SMP and NUMA systems with lots of processors
250 Hz is a good compromise choice allowing server performance while also showing good interactive responsiveness
1000 Hz is the preferred choice for desktop systems
NetEm - Network Emulator
* setting on outgoing packets from the chosen network interface.
limit packets
delay TIME [ JITTER ]
loss PERCENT # drop packets randomly
corrupt PERCENT
duplicate PERCENT
reorder PERCENT
rate RATE
Slow down traffic by 200 ms
# delete all rules
tc qdisc del dev eth0 root
# delay 200ms
tc qdisc add dev eth0 root netem delay 200ms
# show
tc -s qdisc ls dev eth0
qdisc netem 8001: root refcnt 2 limit 1000 delay 200.0ms
Sent 18994 bytes 305 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 1p requeues 0
# jitter: 50ms ± 30ms, i.e. 20ms ~ 80ms
tc qdisc replace dev eth0 root netem delay 50ms 30ms
Randomly drop 1% of packets
tc qdisc add dev ens4 root netem loss 1%
* The smallest possible non-zero value: 2^-32 (0.0000000232%)
tc Filter (filtering commands)
filtertype: u32
It extracts a bit field from a 32 bit word in the packet
Bases the decision on fields within the packet and if it is equal to a value supplied by you it has a match.
* Filters with a lower prio (preference) number are checked first; the first match wins
# attach to eth0, root qdisc 1:0
# add a u32 filter with prio 60
# the remote (destination) port is 22
# send matching packets to class 1:101
tc filter add dev eth0 protocol ip parent 1: \
prio 60 u32 \
match ip dport 22 0xffff \
flowid 1:101
Supported matches:
- match ip dst 3.2.1.0/24 # dst, src, all
- match ip sport 80 0xffff # dport, sport
filtertype: fwmark (iptables)
iptables -A PREROUTING -t mangle -i eth0 -j MARK --set-mark 6
tc filter add dev eth1 protocol ip parent 1: prio 1 handle 6 fw flowid 1:1
# show
iptables -L -t mangle -n -v
Delete Filter Example
First, set up a filter
tc filter add dev eth0 parent 1: protocol ip handle 80 fw flowid 1:20
Show it
tc filter show dev eth0
filter parent 1: protocol ip pref 49152 fw
filter parent 1: protocol ip pref 49152 fw handle 0x50 classid 1:20
# Delete it
tc filter del dev eth0 protocol ip pref 49152
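Note that the fw handle was given as decimal 80 but is listed as 0x50. A quick way to convert between the two forms when matching filters against iptables marks is printf; the helper names below are made up for illustration.

```shell
# mark_to_hex DECIMAL   -> hex form as printed by "tc filter show"
mark_to_hex() {
    printf '0x%x\n' "$1"
}

# hex_to_mark HEX       -> decimal form as typed on the command line
hex_to_mark() {
    printf '%d\n' "$1"
}

mark_to_hex 80      # 0x50
hex_to_mark 0x50    # 80
```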
Limit Outgoing (HTB)
HTB ensures that the amount of service provided to each class is
at least the minimum of the amount it requests and the amount assigned to it.
When a class requests less than the amount assigned,
the remaining (excess) bandwidth is distributed to other classes which request service.
* The sum of the rates of the classes at each level must not exceed the rate of their parent, otherwise the limit is not effective
* With HTB, you should attach all filters to the root !!
* Each node within the tree can have its own filters
* HTB controls the use of the outbound bandwidth on a given link
* each class has a single parent
* each leaf class contains a qdisc, which defaults to pfifo
* one root class cannot borrow from another root class
Doc
- Homepage: http://luxik.cdi.cz/~devik/qos/htb/
- Doc: http://luxik.cdi.cz/~devik/qos/htb/manual/userg.htm
- HTB theory: http://luxik.cdi.cz/~devik/qos/htb/manual/theory.htm
Example 1: Sharing Hierarchy with u32
Diagram
       qdisc (1:)         # attach htb
           |
         _1:1_            # root class. 80mbit                      Level 3
        /     \
     1:11     1:12        # leaf class: 1:12; child class: 1:11     Level 2
    /    \
 1:21    1:22             #                                         Level 1
   |       |
  21:     22:             #                                         Level 0 (leaf: 21:, 22:)
Source port to class mapping
- 1:12 - *
- 1:21 - 8021/tcp
- 1:22 - 8022/tcp
[1] Delete existing rules
tc qdisc del dev eth0 root
iptables -t mangle -F # do not run this unless necessary
P.S.
tc qdisc del dev eth0 root # deletes not only the qdisc but also the tc filters attached to it
When the interface has no qdisc, you will see
Error: Cannot delete qdisc with handle of zero. # OS: Rocky8
[2] Attaches queue discipline HTB to eth0, and set default class(1:12)
tc qdisc add dev eth0 root handle 1: htb default 12
Explanation
"handle 1:" => "handle 1:0" # x:y
just a name or identifier with which to refer to the qdisc
The handle for a qdisc must have zero for its y value.
(The default minor id of the class to which unclassified packets are sent is "0")
"default minor-id" # i.e. "default 12"
any traffic that is not otherwise classified will be assigned to class 1:12
Unclassified traffic gets sent to the class with this minor-id.
[3] "root" class, "1:1" under the qdisc "1:"
tc class add dev eth0 parent 1: classid 1:1 htb rate 80mbit
Explanation
classid major:minor
classes can be named.
* The major number must be equal to the major number of the qdisc to which it belongs.
[4] Create two classes under the root class "1:1"
tc class add dev eth0 parent 1:1 classid 1:11 htb rate 48mbit ceil 80mbit
tc class add dev eth0 parent 1:1 classid 1:12 htb rate 32mbit
Explanation
ceil:
Specifies the maximum bandwidth that a class can use when it borrows from its parent.
The default ceil is the same as the rate.
[5] Create two child classes under the "1:11"
tc class add dev eth0 parent 1:11 classid 1:21 htb rate 32mbit
tc class add dev eth0 parent 1:11 classid 1:22 htb rate 16mbit
Explanation
parent major:minor:
Place of this class within the hierarchy.
[6] Attach queuing disciplines to the leaf classes (defaults to pfifo when not set) [optional step]
tc qdisc add dev eth0 parent 1:21 handle 21: sfq perturb 10
tc qdisc add dev eth0 parent 1:22 handle 22: pfifo
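Steps [2] to [6] can be collected into one script. The sketch below is a dry run: it only prints the tc commands, so it can be reviewed without root. The TC variable and the build_htb_tree function are assumptions made for this illustration. The child classes use classids 1:21 and 1:22, because the major number of a class must equal that of the qdisc it belongs to.

```shell
DEV=eth0         # interface name, as in the steps above
TC="echo tc"     # dry run; set TC=tc to really apply (needs root)

build_htb_tree() {
    $TC qdisc add dev "$DEV" root handle 1: htb default 12
    $TC class add dev "$DEV" parent 1: classid 1:1 htb rate 80mbit
    $TC class add dev "$DEV" parent 1:1 classid 1:11 htb rate 48mbit ceil 80mbit
    $TC class add dev "$DEV" parent 1:1 classid 1:12 htb rate 32mbit
    $TC class add dev "$DEV" parent 1:11 classid 1:21 htb rate 32mbit
    $TC class add dev "$DEV" parent 1:11 classid 1:22 htb rate 16mbit
    $TC qdisc add dev "$DEV" parent 1:21 handle 21: sfq perturb 10
    $TC qdisc add dev "$DEV" parent 1:22 handle 22: pfifo
}

build_htb_tree
```

Printing first and applying second makes it easier to spot a wrong parent or classid before the qdisc tree is changed.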
[7] Which packets belong in which class
Version configured directly with tc filter
Backup Server IP (n.n.n.n)
# By IP - traffic to backup server (float 100k)
tc filter add dev eth0 protocol ip parent 1:0 prio 50 u32 \
match ip dst n.n.n.n flowid 1:101
Source Port (8080)
# By Port - tcp port 8080 (fix 300k)
tc filter add dev eth0 protocol ip parent 1:0 prio 49 u32 \
match ip sport 8080 0xffff flowid 1:102
# By IP & Port
tc filter add dev eth0 protocol ip parent 1:0 prio 48 u32 \
match ip src 192.168.123.10 match ip sport 1080 0xffff flowid 1:12
prio
* classes with higher priority are offered excess bandwidth first.
Which class should you prioritize? Generally those classes where you really need low delays.
1 => highest priority
Version that ties the filter to iptables marks
# send marked packets to a given class
tc filter add dev eth0 parent 1: protocol ip handle 8021 fw flowid 1:21
tc filter add dev eth0 parent 1: protocol ip handle 8022 fw flowid 1:22
* flowid = classid
# set a mark on packets by source port (8021 and 8022)
iptables -t mangle -A OUTPUT -p tcp --sport 8021 -j MARK --set-mark 8021
iptables -t mangle -A OUTPUT -p tcp --sport 8022 -j MARK --set-mark 8022
* A mark can only be a number. "iptables -t mangle" lists it in hex (0x????)
Checking
(1)
tc qdisc show dev eth0
qdisc htb 1: root refcnt 2 r2q 10 default 0x12 direct_packets_stat 0 direct_qlen 1000
qdisc sfq 21: parent 1:21 limit 127p quantum 1514b depth 127 divisor 1024 perturb 10sec
qdisc pfifo 22: parent 1:22 limit 1000p
(2)
tc class show dev eth0
class htb 1:22 parent 1:11 leaf 22: prio 0 rate 16Mbit ceil 16Mbit burst 1600b cburst 1600b
class htb 1:11 parent 1:1 rate 48Mbit ceil 80Mbit burst 1590b cburst 1600b
class htb 1:1 root rate 80Mbit ceil 80Mbit burst 1600b cburst 1600b
class htb 1:12 parent 1:1 prio 0 rate 32Mbit ceil 32Mbit burst 1600b cburst 1600b
class htb 1:21 parent 1:11 leaf 21: prio 0 rate 32Mbit ceil 32Mbit burst 1600b cburst 1600b
(3)
tc filter show dev eth0
filter parent 1: protocol ip pref 49151 fw chain 0
filter parent 1: protocol ip pref 49151 fw chain 0 handle 0x1f56 classid 1:22
filter parent 1: protocol ip pref 49152 fw chain 0
filter parent 1: protocol ip pref 49152 fw chain 0 handle 0x1f55 classid 1:21
(4)
iptables -v -nL -t mangle | grep MARK
0 0 MARK tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp spt:8021 MARK set 0x1f55
0 0 MARK tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp spt:8022 MARK set 0x1f56
Note
* "firewall-cmd --reload" clears the settings previously made with iptables
Testing
# Run the test from 123.10
IP=192.168.88.128
wget $IP:8012/test.bin -O /dev/null
wget $IP:8021/test.bin -O /dev/null
wget $IP:8022/test.bin -O /dev/null
Statistics
tc -s -d qdisc show dev eth0
tc -s -d class show dev eth0
-d[etails]
-s[tatistics]
overlimits
how many times the discipline delayed a packet.
level
quantum
you don't need to specify quantums manually, as HTB chooses precomputed values.
rate, pps
tell you the actual (10 sec averaged) rate going through the class.
giants (Default: 1600 bytes)
number of packets larger than mtu set in tc command.
HTB will work with these but rates will not be accurate at all.
lended
packets donated by this class
borrowed (borrows are transitive)
borrowed from parent.
Other
quantums:
In fact when more classes want to borrow bandwidth they are each given some number of bytes before serving other competing class.
This number is called quantum.
burst
burst bytes: Amount of bytes that can be burst at ceil speed
cburst bytes: Amount of bytes that can be burst at "infinite" speed
Why use bursts? It is a cheap and simple way to improve response times on a congested link.
* The burst and cburst of a class should always be at least as high as that of any of its children.
e.g.
... burst 2k
Notes
nginx config
http {
    server {
        listen 8012 default_server;
        listen 8021 default_server;
        listen 8022 default_server;
        ...
    }
    ...
}
Over root traffic
Leaf classes exceeding the root total
parent 1: classid 1:1 htb rate 100kbps
parent 1:1 classid 1:80 htb rate 60kbps
parent 1:1 classid 1:81 htb rate 70kbps
parent 1:1 classid 1:82 htb rate 80kbps
With the settings above, classes 80, 81 and 82 get 60kbps, 70kbps and 80kbps respectively, ignoring the root class's 100kbps limit !!
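A simple sanity check before applying such a setup is to add up the child rates and compare them with the parent. The sketch below assumes all rates are given as plain numbers in the same unit; check_children is a made-up helper.

```shell
# check_children PARENT_RATE CHILD_RATE...
check_children() {
    parent=$1
    shift
    sum=0
    for r in "$@"; do
        sum=$(( sum + r ))
    done
    if [ "$sum" -gt "$parent" ]; then
        echo "over-committed: children $sum > parent $parent"
        return 1
    fi
    echo "ok: children $sum <= parent $parent"
}

check_children 100 60 70 80 || true   # over-committed: children 210 > parent 100
check_children 300 60 70 80           # ok: children 210 <= parent 300
```

With a 100kbps root the children add up to 210kbps, which is why the root limit is not honoured in this example.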
Using ceil
parent 1: classid 1:1 htb rate 300kbps
parent 1:1 classid 1:80 htb rate 60kbps ceil 100kbps
parent 1:1 classid 1:81 htb rate 70kbps ceil 100kbps
parent 1:1 classid 1:82 htb rate 80kbps ceil 100kbps
With the settings above, classes 80, 81 and 82 each get 9x kbps
tc on container node
Packet routes Diagram
   venet0:0          venet0          eth0
CT ------------->------------- HN --------->-------- Remote

   venet0:0          venet0          eth0
CT -------------<------------- HN ---------<-------- Remote
Limiting outgoing bandwidth
We can limit container outgoing bandwidth by setting the tc filter on eth0.
DEV=eth0
tc qdisc del dev $DEV root
tc qdisc add dev $DEV root handle 1: cbq avpkt 1000 bandwidth 100mbit
tc class add dev $DEV parent 1: classid 1:1 cbq rate 256kbit allot 1500 prio 5 bounded isolated
tc filter add dev $DEV parent 1: protocol ip prio 16 u32 match ip src X.X.X.X flowid 1:1
tc qdisc add dev $DEV parent 1:1 sfq perturb 10
Limiting incoming bandwidth
This can be done by setting the tc filter on venet0:
DEV=venet0
tc qdisc del dev $DEV root
tc qdisc add dev $DEV root handle 1: cbq avpkt 1000 bandwidth 100mbit
tc class add dev $DEV parent 1: classid 1:1 cbq rate 256kbit allot 1500 prio 5 bounded isolated
tc filter add dev $DEV parent 1: protocol ip prio 16 u32 match ip dst X.X.X.X flowid 1:1
tc qdisc add dev $DEV parent 1:1 sfq perturb 10
Limiting CT to HN talks
DEV=venet0
tc filter add dev $DEV parent 1: protocol ip prio 20 u32 match u32 1 0x0000 police rate 2kbit buffer 10k drop flowid :1
Limiting packets per second rate from container
DEV=eth0
iptables -I FORWARD 1 -o $DEV -s X.X.X.X -m limit --limit 200/sec -j ACCEPT
iptables -I FORWARD 2 -o $DEV -s X.X.X.X -j DROP
The Intermediate queueing device (IMQ)
Doc
Linux Advanced Routing & Traffic Control
- http://lartc.org/howto/index.html
tc man page
- http://lartc.org/manpages/tc.txt
Attachment | Size |
---|---|
test.sh | 917 bytes |
limit.sh | 2.03 KB |