Last updated: 2024-07-24
Contents
- Install
- lscgroup
- cgcreate
- cgdelete
- ---------------------------------------------
- cgget & cgset
- cgset usage example
- ---------------------------------------------
- Move a process into a cgroup (cgclassify)
- Launch processes in a cgroup (cgexec)
- cgred
- ---------------------------------------------
- Save the current cgroups as a config file (cgsnapshot)
- Subsystem Settings
- OOM Test
- freezer
Install
apt-get install cgroup-tools libcgroup1    # Debian / Ubuntu
dnf install libcgroup libcgroup-tools      # Fedora / RHEL
Program
- /usr/bin/lscgroup
- /usr/bin/lssubsys
- /usr/bin/cgcreate
- /usr/bin/cgdelete
- /usr/bin/cgexec
- /usr/bin/cgget
- /usr/bin/cgset
- /usr/bin/cgsnapshot
- /usr/sbin/cgclear
- /usr/sbin/cgrulesengd
- /usr/sbin/cgconfigparser
lscgroup
Syntax
lscgroup [[-g] <controllers>:<path>]
List all cgroups
lscgroup
blkio:/
...
memory:/
...
Each controller is mounted at its own mount point.
e.g.
mount | grep -e blkio -e memory
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
blkio:/ => /sys/fs/cgroup/blkio
memory:/ => /sys/fs/cgroup/memory
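The mapping above can be sketched as a small helper. This is only an illustration: cgroup_dir is a hypothetical function name, and it assumes the cgroup v1 layout shown in the mount output, with every controller mounted under /sys/fs/cgroup/<controller>.

```shell
# Map a "controller:/path" entry (as printed by lscgroup) to its directory.
# Assumption: cgroup v1, each controller mounted at /sys/fs/cgroup/<controller>.
cgroup_dir() {
    entry=$1
    controller=${entry%%:*}   # text before the first ':'
    path=${entry#*:}          # text after the first ':'
    path=${path%/}            # drop a trailing '/', so "memory:/" maps to the mount point
    printf '%s\n' "/sys/fs/cgroup/${controller}${path}"
}

cgroup_dir blkio:/             # /sys/fs/cgroup/blkio
cgroup_dir memory:/copy_job    # /sys/fs/cgroup/memory/copy_job
```

Entries that list several controllers at once (for example cpu,cpuacct:/) map to the combined mount point in the same way.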
lssubsys
# list hierarchies containing given subsystem (much the same as checking with mount)
Opts
-m, --mount-points Display mount points.
lssubsys -m
cpuset /sys/fs/cgroup/cpuset
cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct
blkio /sys/fs/cgroup/blkio
memory /sys/fs/cgroup/memory
devices /sys/fs/cgroup/devices
freezer /sys/fs/cgroup/freezer
net_cls,net_prio /sys/fs/cgroup/net_cls,net_prio
perf_event /sys/fs/cgroup/perf_event
hugetlb /sys/fs/cgroup/hugetlb
pids /sys/fs/cgroup/pids
rdma /sys/fs/cgroup/rdma
cgcreate
create new cgroup
Defines cgroup
-g <controllers>:<path> # controllers = subsystems (i.e. blkio,memory)
cgroup permissions
- -d, --dperm=mode # the permissions of a control groups directory
- -f, --fperm=mode # the permissions of the control groups parameters
tasks file permissions
- -s, --tperm=mode # the permissions of the control group tasks file
- -t <tuid>:<tgid> # user and the group, which owns tasks file
Example
cgcreate -g cpuset,memory:/copy_job
ls /sys/fs/cgroup/memory/copy_job/
ls /sys/fs/cgroup/cpuset/copy_job/
cgdelete
cgdelete <controllers>:<path> [<controllers>:<path>] ...
- -r # Recursively remove all subgroups
* When you delete a cgroup, all its tasks move to its parent group
e.g.
cgdelete cpuset,memory:/copy_job
cgget & cgset
cgset
# cgset - set the parameters of given cgroup(s)
cgset -r parameter=value path_to_cgroup
* path_to_cgroup is the path to the cgroup relative to the root of the hierarchy
("/" = root of the hierarchy)
cgget
syntax: cgget [-r <name>] [-g <controller>] [-a] <cgroup_path>
- -r, --variable <name> # defines parameter to display
- -g <controller> # defines controllers whose values should be displayed
- -a, --all # print the variables for all controllers
Display opt
- -n # Do not print headers. e.g. name of groups
- -v, --values-only
Default output example
cgget -r cpuset.cpus copy_job
copy_job:
cpuset.cpus: 2-3
e.g.
# Get all settings
cgget /
...
# Get all CPU-related settings
cgget -g cpu /
...
# Show the settings of a particular cgroup
cgget limit_httpd
Example: Get Memory Usage
# Show the memory usage and failure counters of mybash
cgget -r memory.usage_in_bytes mybash
mybash:
memory.usage_in_bytes: 1900544
cgget -r memory.memsw.usage_in_bytes mybash
mybash:
memory.memsw.usage_in_bytes: 1896448
cgget -r memory.failcnt mybash
mybash:
memory.failcnt: 0
cgget -nv -r memory.memsw.failcnt mybash # the options can be combined as "-nvr"
0
cgset usage example
First create the cgroup
cgcreate -g cpuset,memory:/copy_job
# Memory
cgset -r memory.limit_in_bytes=300m copy_job
cgget -r memory.memsw.limit_in_bytes copy_job
9223372036854771712
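* memsw (memory + swap) is inherited from the top-level group and is most likely still at the maximum (default) value, so it must be limited as well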
cgset -r memory.memsw.limit_in_bytes=300m copy_job
Notes: Checking
cgget -nvr memory.limit_in_bytes copy_job
cgget -nvr memory.memsw.limit_in_bytes copy_job
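cgset accepts a size such as 300m, while cgget reports the limit in bytes, so the two values are not directly comparable when checking. A minimal sketch of the conversion, assuming the k/m/g suffixes mean binary multiples (1024-based), as the cgroup v1 memory controller treats them; to_bytes is a hypothetical helper name.

```shell
# Convert a size with an optional k/m/g suffix into bytes (1024-based).
to_bytes() {
    value=$1
    number=${value%[kKmMgG]}      # strip one trailing unit letter, if present
    case $value in
        *[kK]) echo $(( number * 1024 )) ;;
        *[mM]) echo $(( number * 1024 * 1024 )) ;;
        *[gG]) echo $(( number * 1024 * 1024 * 1024 )) ;;
        *)     echo "$number" ;;
    esac
}

to_bytes 300m    # 314572800

# Possible check against the live cgroup (needs the copy_job group to exist):
#   [ "$(cgget -nvr memory.limit_in_bytes copy_job)" = "$(to_bytes 300m)" ] && echo OK
```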
# CPU
# Assume the CPU has 4 cores: core0 ... core3.
# The settings below restrict the group to core2 and core3
cgset -r cpuset.cpus=2,3 copy_job
cgset -r cpuset.mems=0 copy_job
Notes: Checking
cgget -nvr cpuset.cpus copy_job # 2-3
cgget -nvr cpuset.mems copy_job # 0
Usage
export "PS1=CopyShell# "
cgclassify -g cpuset,memory:/copy_job $$
Checking
echo $$; # 901002
cat /sys/fs/cgroup/{memory,cpuset}/copy_job/tasks
901002
3041840 # cat
901002
3041840 # cat
Cleanup when finished
cgdelete cpuset,memory:/copy_job
Move a process into a cgroup (cgclassify)
Usage
cgclassify -g subsystems:path_to_cgroup pidlist
e.g.
export "PS1=CopyShell# "
cgclassify -g blkio,memory:/mybash $$
Show which cgroups a process belongs to
cat /proc/$$/cgroup
12:memory:/mybash
11:freezer:/user/root/0
10:perf_event:/
9:rdma:/
8:pids:/user.slice/user-0.slice/session-1.scope
7:devices:/user.slice
6:cpuset:/
5:hugetlb:/
4:net_cls,net_prio:/
3:cpu,cpuacct:/user.slice
2:blkio:/mybash
1:name=systemd:/user.slice/user-0.slice/session-1.scope
0::/user.slice/user-0.slice/session-1.scope
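Each line of /proc/<pid>/cgroup has the form "hierarchy-ID:controller-list:path". A small sketch for pulling out the path of one controller; cgroup_of is a hypothetical helper name, and it assumes the cgroup path itself contains no ':' character.

```shell
# Print the cgroup path of one controller from /proc/<pid>/cgroup-style input.
# Controllers mounted together (e.g. "cpu,cpuacct") share one line, so the
# second field is split on ',' before comparing.
cgroup_of() {
    awk -F: -v want="$1" '{
        n = split($2, names, ",")
        for (i = 1; i <= n; i++)
            if (names[i] == want) { print $3; exit }
    }'
}

# Against the current shell:
#   cgroup_of memory < /proc/$$/cgroup
printf '%s\n' '12:memory:/mybash' '3:cpu,cpuacct:/user.slice' '2:blkio:/mybash' |
    cgroup_of cpuacct    # /user.slice
```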
Notes
ps -O cgroup
Launch processes in a cgroup (cgexec)
Syntax:
cgexec -g subsystems:path_to_cgroup command arguments
Opts
--sticky # Keep any child processes in the same cgroup.
When "--sticky" is not given and cgred is running:
The cgred daemon does not change the task of the command
but it changes the child tasks to the right cgroup based on /etc/cgrules.conf automatically
e.g.
cgexec -g memory:/test_oom stress-ng -m 1 --vm-bytes 200m
cgred
Process allocated to cgroups based on the settings found in /etc/cgrules.conf
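As far as the cgrules.conf man page describes it, each rule has three fields: <user>[:<process>] <controllers> <destination>. The sketch below writes a sample file to a temporary path instead of /etc/cgrules.conf; the user names, the @students group and the destinations are hypothetical, and check_rules is only an illustrative field-count check, not part of libcgroup.

```shell
# Write a sample rules file to a temporary path (not /etc/cgrules.conf).
# All user, group and cgroup names here are made up for illustration.
rules=$(mktemp)
cat > "$rules" <<'EOF'
# <user>[:<process>]   <controllers>   <destination>
peter                  memory          copy_job
@students              cpu,memory      students/
*:stress-ng            memory          test_oom
EOF

# Sanity check: every non-comment, non-empty line must have exactly 3 fields.
check_rules() {
    awk '!/^[ \t]*(#|$)/ && NF != 3 { bad = 1; print "line " NR ": expected 3 fields" }
         END { exit bad + 0 }' "$1"
}

check_rules "$rules" && echo "rules look well-formed"
rm -f "$rules"
```

cgrulesengd reads the real file at startup, so after editing /etc/cgrules.conf the daemon has to be restarted or told to reload.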
cgsnapshot
Generate the configuration file for given controllers
Subsystem Settings
memory
swappiness
Values lower than 60 decrease the kernel's tendency to swap out process memory
memory.swappiness: 60
memory.oom_control
# When the OOM killer is disabled,
# tasks that attempt to use more memory than they are allowed are paused until additional memory is freed.
# To disable it, write 1 to the "memory.oom_control" file (Default: 0)
cgget -r memory.oom_control mybash
mybash:
memory.oom_control: oom_kill_disable 0
under_oom 0
oom_kill 0
cpu
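# cpu.shares is a relative weight: under contention, CPU time is divided between sibling groups in proportion to their shares.
# The shares are distributed across all CPU cores on multi-core systems.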
# Default: cpu.shares: 1024
cpu.shares: 700
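Because cpu.shares is relative, the effective CPU fraction depends on which sibling groups are busy at the same time. A rough sketch of the arithmetic, assuming all listed groups are siblings and fully CPU-bound; share_pct is a hypothetical helper name.

```shell
# Expected CPU percentage of the first group when every listed sibling group
# is busy: first_share / sum(all shares).
share_pct() {
    mine=$1
    total=0
    for s in "$@"; do
        total=$(( total + s ))
    done
    awk -v m="$mine" -v t="$total" 'BEGIN { printf "%.1f\n", 100 * m / t }'
}

share_pct 700 1024    # 40.6 : a 700-share group next to one default (1024) group
share_pct 1024 1024   # 50.0
```

When the other groups are idle, a group can still use all available CPU time regardless of its shares; the weight only matters under contention.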
cpuacct
# cpuacct.usage
reports the total CPU time (in nanoseconds) consumed by all tasks in this cgroup
# reset:
echo 0 > /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage
cpuacct.stat
- user — CPU time consumed by tasks in user mode.
- system — CPU time consumed by tasks in system (kernel) mode.
cgroup.procs
pid ...
cpuset
cpuset.cpus (mandatory)
e.g.
0-2,16
cpuset.mems (mandatory)
# memory nodes that tasks in this cgroup are permitted to access.
e.g.
0-2,16
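The list format used by cpuset.cpus and cpuset.mems mixes single numbers and ranges. A minimal sketch that expands such a list into individual IDs; expand_list is a hypothetical helper name and it assumes well-formed input.

```shell
# Expand a cpuset-style list such as "0-2,16" into individual IDs.
expand_list() {
    out=""
    for part in $(printf '%s' "$1" | tr ',' ' '); do
        case $part in
            *-*) first=${part%-*}; last=${part#*-} ;;   # a range "a-b"
            *)   first=$part; last=$part ;;             # a single ID
        esac
        i=$first
        while [ "$i" -le "$last" ]; do
            out="$out $i"
            i=$(( i + 1 ))
        done
    done
    echo "${out# }"
}

expand_list 0-2,16    # 0 1 2 16
expand_list 2,3       # 2 3
```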
blkio
weight
Proportional weight division is implemented in the Completely Fair Queuing (CFQ) I/O scheduler,
so check which scheduler the device uses: cat /sys/block/sdf/queue/scheduler
- blkio.weight # 100~1000. Default: 500
R/W throttle
- blkio.throttle.read_bps_device
- blkio.throttle.read_iops_device
- blkio.throttle.write_bps_device
- blkio.throttle.write_iops_device
e.g.
ls -l /dev/sdf
brw-rw---- 1 root disk 8, 80 Jan 20 12:28 /dev/sdf
speed=$((10*1024*1024))
cgset -r blkio.throttle.write_bps_device="8:80 $speed" mybash
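Instead of reading the major and minor numbers off ls -l by hand, they can be taken from stat. A sketch assuming GNU (or BusyBox) stat, whose %t and %T formats print the major and minor device numbers in hexadecimal; dev_majmin is a hypothetical helper name.

```shell
# Print "major:minor" in decimal for a device node.
# stat prints %t (major) and %T (minor) in hex, so convert with printf.
dev_majmin() {
    set -- $(stat -L -c '%t %T' "$1")
    printf '%d:%d\n' "0x$1" "0x$2"
}

dev_majmin /dev/null    # 1:3 on Linux

# Possible use with the throttle setting above (needs root and the mybash group):
#   cgset -r blkio.throttle.write_bps_device="$(dev_majmin /dev/sdf) $((10*1024*1024))" mybash
```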
Reports
reports the number of I/O operations performed on specific devices
Entries have four fields: major, minor, operation, and number.
- blkio.throttle.io_serviced # output: "major:minor operation number"
- blkio.throttle.io_service_bytes
e.g.
cgget -r blkio.throttle.io_serviced mybash
mybash:
blkio.throttle.io_serviced: 8:80 Read 30
8:80 Write 0
8:80 Sync 30
8:80 Async 0
8:80 Discard 0
8:80 Total 30
8:16 Read 45839
8:16 Write 0
8:16 Sync 45839
8:16 Async 0
8:16 Discard 0
8:16 Total 45839
253:2 Read 45839
253:2 Write 0
253:2 Sync 45839
253:2 Async 0
253:2 Discard 0
253:2 Total 45839
Total 91708
This is the output while "pv /dev/zero > /dev/sdX" is running
8:80 = sdX
8:16 = sdb (swap)
253:2 = dm-2
* 8:80 shows no write statistics yet; they are only counted once the process ends (Ctrl+C on pv)
Notes
dm-xx = Device Mapper
dmsetup ls # lsblk can also be used to look this up
myvg-swap (253:1)
myvg-root (253:2)
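The report entries ("major:minor operation number", one per line) are easy to filter with awk. A small sketch that extracts the Total count of one device; io_total is a hypothetical helper name, and it expects value-only input such as cgget -nv produces, so that the device is always the first field.

```shell
# Print the "Total" count of one device from blkio.throttle.io_serviced
# style input ("major:minor operation number", one entry per line).
io_total() {
    awk -v dev="$1" '$1 == dev && $2 == "Total" { print $3 }'
}

# Against a live cgroup:
#   cgget -nvr blkio.throttle.io_serviced mybash | io_total 8:16
printf '%s\n' '8:80 Read 30' '8:80 Total 30' '8:16 Read 45839' '8:16 Total 45839' |
    io_total 8:16    # 45839
```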
net_cls
The net_cls subsystem tags network packets with a class identifier (classid)
that allows the Linux traffic controller (tc) to identify packets originating from a particular cgroup.
net_cls.classid = ID;
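Calculating the net_cls.classid value
The classid has the form 0xAAAABBBB, where AAAA is the major and BBBB the minor number of the tc handle (both hexadecimal).
1:20 ==> 0x00010020 (DEC = 1 x 65536 + 2 x 16 + 0 = 65568)
Therefore
net_cls.classid = 65568;
e.g.
net_cls {
    net_cls.classid = 65568;
}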
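The classid arithmetic can be done by the shell, which avoids slips in the manual hex conversion. A minimal sketch, assuming the 0xAAAABBBB layout (major handle number in the upper 16 bits, minor in the lower 16 bits, both written in hexadecimal as tc does); classid is a hypothetical helper name.

```shell
# Compute the decimal net_cls.classid for a tc handle "major:minor".
# Both arguments are hexadecimal digits without the 0x prefix.
classid() {
    echo $(( (0x$1 << 16) | 0x$2 ))
}

classid 1 20    # 65568   (0x00010020)
classid 10 1    # 1048577 (0x00100001)
```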
# Set up the tc filter (parent 1: & handle 20: correspond to 1:20)
tc filter add dev eth0 parent 1: protocol ip handle 20: cgroup
OOM Test
oom = Out of Memory
Testing
cgcreate -g memory:/test_oom
cgset -r memory.limit_in_bytes=256m test_oom
cgset -r memory.memsw.limit_in_bytes=256m test_oom
cgget -r memory.oom_control test_oom
test_oom:
memory.oom_control: oom_kill_disable 0
under_oom 0
oom_kill 0
cgexec -g memory:/test_oom stress-ng -m 1 --vm-bytes 200m
ps aux | grep [s]tress-ng
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 22939 0.0 0.0 51848 6352 pts/2 SL+ 15:31 0:00 stress-ng -m 1 --vm-bytes 200m
root 22940 0.0 0.0 51852 472 pts/2 S+ 15:31 0:00 stress-ng-vm [run]
root 22941 100 1.2 256652 206432 pts/2 R+ 15:31 1:13 stress-ng-vm [run]
cgexec -g memory:/test_oom stress-ng -m 1 --vm-bytes 300m
dmesg
... stress-ng invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=1000 ...
cgget -r memory.oom_control test_oom
test_oom:
memory.oom_control: oom_kill_disable 0
under_oom 0
oom_kill 819
under_oom # if 1, the memory cgroup is under OOM, tasks may be paused.
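For scripted checks (for example, comparing oom_kill before and after a test run) the counters can be picked out of the cgget output. A small sketch; oom_field is a hypothetical helper name and it simply looks for an exact key followed by its value.

```shell
# Extract one counter from memory.oom_control style output.
# Scans every line for a field equal to the key and prints the next field,
# so "oom_kill" does not match "oom_kill_disable".
oom_field() {
    awk -v key="$1" '{
        for (i = 1; i < NF; i++)
            if ($i == key) { print $(i + 1); exit }
    }'
}

# Against a live cgroup:
#   cgget -r memory.oom_control test_oom | oom_field oom_kill
printf '%s\n' 'test_oom:' 'memory.oom_control: oom_kill_disable 0' 'under_oom 0' 'oom_kill 819' |
    oom_field oom_kill    # 819
```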
freezer
freezer.state
* only available in non-root cgroups
FROZEN — tasks in the cgroup are suspended.
FREEZING — the system is in the process of suspending tasks in the cgroup.
THAWED — tasks in the cgroup have resumed.
# To suspend a specific process:
Move that process to a cgroup in a hierarchy which has the freezer subsystem attached to it.
Freeze that particular cgroup to suspend the process contained in it.
Usage
# all subsystems in one go: "mount -t cgroup none /cgroups"
mount -t cgroup freezer /freezer -o freezer
# Create a child cgroup:
mkdir /freezer/0
# Put a task into this cgroup:
echo $task_pid > /freezer/0/tasks
# Freeze it:
echo FROZEN > /freezer/0/freezer.state
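After writing FROZEN, the state file may read FREEZING for a while before it settles, so a script should poll rather than assume the freeze is immediate. A sketch under that assumption; wait_for_state is a hypothetical helper name, the 0.1 s interval relies on a sleep that accepts fractions (GNU or BusyBox), and the demo uses a temporary file in place of a real freezer.state.

```shell
# Poll a freezer.state file until it reports the wanted state.
# Arguments: file, wanted state, number of 0.1 s tries (default 50).
wait_for_state() {
    file=$1 want=$2 tries=${3:-50}
    while [ "$tries" -gt 0 ]; do
        [ "$(cat "$file")" = "$want" ] && return 0
        sleep 0.1
        tries=$(( tries - 1 ))
    done
    return 1
}

# Real use (needs root and the /freezer/0 group from above):
#   echo FROZEN > /freezer/0/freezer.state
#   wait_for_state /freezer/0/freezer.state FROZEN || echo "still freezing"

# Demo with a stand-in file:
state=$(mktemp)
echo FROZEN > "$state"
wait_for_state "$state" FROZEN && echo "frozen"
rm -f "$state"
```

To resume the tasks, write THAWED to the same file.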
The freezer allows the checkpoint code to obtain a consistent
image of the tasks by attempting to force the tasks in a cgroup into a
quiescent state.
Compared with kill's SIGSTOP and SIGCONT
Any programs designed to watch for SIGSTOP and SIGCONT could be broken by
attempting to use SIGSTOP and SIGCONT to stop and resume tasks.
We can demonstrate this problem using nested bash shells:
$ echo $$
16644
$ bash
$ echo $$
16690
# From a second, unrelated bash shell:
$ kill -SIGSTOP 16690
# Running "jobs" shows it
$ kill -SIGCONT 16690
In contrast, the cgroup freezer uses the kernel freezer code to prevent the freeze/unfreeze cycle from becoming visible to the tasks
being frozen. This allows the bash example above and gdb to run as expected.
The cgroup freezer is hierarchical.
freezer.self_freezing: Read only.
freezer.parent_freezing: Read only.
Help
https://www.kernel.org/doc/Documentation/cgroups/freezer-subsystem.txt