cgroup-tools (cgget & cgset)

Last updated: 2024-07-24



Install

 

apt-get install cgroup-tools libcgroup1        # Debian / Ubuntu

dnf install libcgroup libcgroup-tools          # RHEL / Fedora

Programs

  • /usr/bin/lscgroup
  • /usr/bin/lssubsys
  • /usr/bin/cgcreate
  • /usr/bin/cgdelete
  • /usr/bin/cgexec
  • /usr/bin/cgclassify
  • /usr/bin/cgget
  • /usr/bin/cgset
  • /usr/bin/cgsnapshot
  • /usr/sbin/cgclear
  • /usr/sbin/cgrulesengd
  • /usr/sbin/cgconfigparser

 


lscgroup

 

Syntax

lscgroup [[-g] <controllers>:<path>]

List all cgroups

lscgroup

blkio:/
...
memory:/
...

Each entry is a path relative to the root of the hierarchy mounted at the corresponding mount point

e.g.

mount | grep -e blkio -e memory

cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)

blkio:/             => /sys/fs/cgroup/blkio

memory:/        => /sys/fs/cgroup/memory
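This mapping can be automated. The following is a minimal sketch, assuming cgroup v1 mounts in the `mount` output format shown above; `cg_fs_path` is a hypothetical helper, not part of cgroup-tools. It reads `mount` output on stdin and prints the directory for a given controller and cgroup path.

```sh
# cg_fs_path CONTROLLER CGROUP_PATH  (hypothetical helper)
# Reads mount(8) output on stdin, e.g.: mount | cg_fs_path memory /copy_job
cg_fs_path() {
    awk -v c="$1" -v r="$2" '
        $5 == "cgroup" {
            opts = $6
            gsub(/[()]/, "", opts)              # strip the surrounding parentheses
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++) {
                if (o[i] == c) {                # controller found in mount options
                    p = $3
                    if (r != "/") p = p r       # "/" is the hierarchy root
                    print p
                    exit
                }
            }
        }'
}
```

For example, `mount | cg_fs_path blkio /` prints the blkio mount point itself, because "/" refers to the root of that hierarchy.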

lssubsys

# List hierarchies containing the given subsystems (similar to checking with mount)

Opts

-m, --mount-points     Display mount points.

lssubsys -m

cpuset /sys/fs/cgroup/cpuset
cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct
blkio /sys/fs/cgroup/blkio
memory /sys/fs/cgroup/memory
devices /sys/fs/cgroup/devices
freezer /sys/fs/cgroup/freezer
net_cls,net_prio /sys/fs/cgroup/net_cls,net_prio
perf_event /sys/fs/cgroup/perf_event
hugetlb /sys/fs/cgroup/hugetlb
pids /sys/fs/cgroup/pids
rdma /sys/fs/cgroup/rdma

 


cgcreate

 

Create a new cgroup

Define the cgroup

-g <controllers>:<path>    # controllers = subsystems (e.g. blkio,memory)

cgroup permissions

  • -d, --dperm=mode            # permissions of the control group's directory
  • -f, --fperm=mode            # permissions of the control group's parameter files

tasks file permissions

  • -s, --tperm=mode            # permissions of the control group's tasks file
  • -t <tuid>:<tgid>            # user and group that own the tasks file

Example

cgcreate -g cpuset,memory:/copy_job

ls /sys/fs/cgroup/memory/copy_job/

ls /sys/fs/cgroup/cpuset/copy_job/

 


cgdelete

 

cgdelete <controllers>:<path> [<controllers>:<path>] ...

  • -r           # Recursively remove all subgroups

 * When you delete a cgroup, all its tasks move to its parent group

e.g.

cgdelete cpuset,memory:/copy_job

 


cgget & cgset

 

cgset

# cgset - set the parameters of given cgroup(s)

cgset -r parameter=value path_to_cgroup

 * path_to_cgroup is the path to the cgroup relative to the root of the hierarchy
    ("/" = root of the hierarchy)

cgget

syntax: cgget [-r <name>] [-g <controller>] [-a] <cgroup_path>

  • -r, --variable <name>       # parameter to display
  • -g <controller>             # controllers whose values should be displayed
  • -a, --all                   # print the variables for all controllers

Display options

  • -n                          # do not print headers (e.g. the cgroup name)
  • -v, --values-only           # print only the values, without parameter names

Default output example

cgget -r cpuset.cpus copy_job

copy_job:
cpuset.cpus: 2-3

e.g.

# Get all settings

cgget /

...

# Get all CPU-related settings

cgget -g cpu /

...

# Show the settings of a specific cgroup

cgget limit_httpd

Example: Get Memory Usage

# Show the memory usage of mybash

cgget -r memory.usage_in_bytes mybash

mybash:
memory.usage_in_bytes: 1900544

cgget -r memory.memsw.usage_in_bytes mybash

mybash:
memory.memsw.usage_in_bytes: 1896448

cgget -r memory.failcnt mybash

mybash:
memory.failcnt: 0

cgget -nv -r memory.memsw.failcnt mybash    # can also be written as "-nvr"

0

 


Using cgset

First, create the cgroup

cgcreate -g cpuset,memory:/copy_job

# Memory

cgset -r memory.limit_in_bytes=300m copy_job

cgget -r memory.memsw.limit_in_bytes copy_job

9223372036854771712

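 * memsw is inherited from the top level of the hierarchy and is most likely still the maximum (default) value, so it must be limited as well.
   memory.memsw.limit_in_bytes cannot be lower than memory.limit_in_bytes, so set memory.limit_in_bytes first.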

cgset -r memory.memsw.limit_in_bytes=300m copy_job
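To confirm that a limit took effect, the value printed by `cgget -nvr` can be compared with the requested size in bytes. A minimal sketch; `to_bytes` is a hypothetical helper that only handles the k/m/g suffixes as powers of 1024, which is how the kernel interprets a value such as `300m`. The kernel may round limits to a multiple of the page size, which does not matter for 300m.

```sh
# to_bytes SIZE  (hypothetical helper): 300m -> 314572800
to_bytes() {
    case $1 in
        *[kK]) echo $(( ${1%?} * 1024 )) ;;
        *[mM]) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *[gG]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo "$1" ;;                     # already in bytes
    esac
}

# Usage (needs cgroup-tools and the copy_job cgroup):
# [ "$(cgget -nvr memory.limit_in_bytes copy_job)" -eq "$(to_bytes 300m)" ] && echo "limit OK"
# [ "$(cgget -nvr memory.memsw.limit_in_bytes copy_job)" -eq "$(to_bytes 300m)" ] && echo "memsw OK"
```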

Notes: Checking

cgget -nvr memory.limit_in_bytes copy_job

cgget -nvr memory.memsw.limit_in_bytes copy_job

# CPU

# Assuming the CPU has 4 cores: core0 ... core3,
# restrict the cgroup to core2 and core3

cgset -r cpuset.cpus=2,3 copy_job

cgset -r cpuset.mems=0 copy_job     # cpuset.mems must also be set before tasks can be added

Notes: Checking

cgget -nvr cpuset.cpus copy_job           # 2-3

cgget -nvr cpuset.mems copy_job         # 0

Usage

export "PS1=CopyShell# "

cgclassify -g cpuset,memory:/copy_job $$

Checking

echo $$;                               # 901002

cat /sys/fs/cgroup/{memory,cpuset}/copy_job/tasks

901002
3041840   # cat
901002
3041840   # cat

Clean up when finished

cgdelete cpuset,memory:/copy_job

 


Move a process into a cgroup (cgclassify)

 

Usage

cgclassify -g subsystems:path_to_cgroup pidlist

e.g.

export "PS1=CopyShell# "

cgclassify -g blkio,memory:/mybash $$

Check which cgroups the process belongs to

cat /proc/$$/cgroup

12:memory:/mybash
11:freezer:/user/root/0
10:perf_event:/
9:rdma:/
8:pids:/user.slice/user-0.slice/session-1.scope
7:devices:/user.slice
6:cpuset:/
5:hugetlb:/
4:net_cls,net_prio:/
3:cpu,cpuacct:/user.slice
2:blkio:/mybash
1:name=systemd:/user.slice/user-0.slice/session-1.scope
0::/user.slice/user-0.slice/session-1.scope
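Each line of /proc/<pid>/cgroup has the form hierarchy-ID:controller-list:cgroup-path. A minimal sketch of a hypothetical helper, `cg_of`, that extracts the path for one controller; it assumes the cgroup v1 layout shown above and does not handle cgroup paths that contain ":".

```sh
# cg_of CONTROLLER [FILE]  (hypothetical helper)
# Prints the cgroup path of CONTROLLER from a /proc/<pid>/cgroup-style file;
# FILE defaults to the current shell's /proc/$$/cgroup.
cg_of() {
    awk -F: -v c="$1" '
        {
            n = split($2, s, ",")               # field 2: e.g. "cpu,cpuacct"
            for (i = 1; i <= n; i++)
                if (s[i] == c) { print $3; exit }
        }' "${2:-/proc/$$/cgroup}"
}

# Usage: cg_of memory     # the memory cgroup of the current shell
```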

Notes

ps -O cgroup      # also shows the cgroups of each process

 


Launch processes in a cgroup (cgexec)

 

Syntax:

cgexec -g subsystems:path_to_cgroup command arguments

Opts

--sticky           # cgred will not move the command's task or any of its child tasks, so they stay in the given cgroup

Without "--sticky", while cgred is running:

The cgred daemon does not move the command's own task,
but it automatically moves its child tasks to the right cgroup based on /etc/cgrules.conf

e.g.

cgexec -g memory:/test_oom stress-ng -m 1 --vm-bytes 200m

 


cgred

 

Moves processes into cgroups based on the rules in /etc/cgrules.conf (cgred is the service that runs the cgrulesengd daemon)

 


cgsnapshot

 

Generates a configuration file (in cgconfig.conf format) from the current settings of the given controllers

 

 


Subsystem Settings

 

memory

swappiness

Values lower than 60 decrease the kernel's tendency to swap out process memory; higher values (up to 100) increase it

memory.swappiness: 60

memory.oom_control

# When the OOM killer is disabled,

# tasks that attempt to use more memory than they are allowed are paused until additional memory is freed.

# To disable it, write 1 to the "memory.oom_control" file (Default: 0)

cgget -r memory.oom_control mybash

mybash:
memory.oom_control: oom_kill_disable 0
        under_oom 0
        oom_kill 0

cpu

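# cpu.shares is a relative weight: when cgroups compete for CPU, each gets time in proportion to its shares.
# Shares of CPU time are distributed across all CPU cores on multi-core systems.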
# Default: cpu.shares: 1024

cpu.shares: 700
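Because cpu.shares is relative, its effect depends on which sibling cgroups are competing for CPU at the same time. A rough sketch (the helper name is hypothetical) of the share a cgroup can expect, assuming every listed sibling is busy; idle cgroups do not take their share, so real numbers will vary.

```sh
# cpu_share_pct MY_SHARES OTHER_SHARES...  (hypothetical helper)
# Approximate % of contended CPU time for the first cgroup, assuming every
# listed cgroup is busy at the same time.
cpu_share_pct() {
    local total=0 s
    for s in "$@"; do
        total=$((total + s))
    done
    echo $(( $1 * 100 / total ))            # integer percentage, rounded down
}

# Usage: cpu_share_pct 700 1024   # a 700-share cgroup vs. a sibling with the default 1024
```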

cpuacct

# cpuacct.usage

reports the total CPU time (in nanoseconds) consumed by all tasks in this cgroup

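# reset (writing 0 sets the counter back to zero; path assumes the cpu,cpuacct mount shown by lssubsys -m):

     echo 0 > /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage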
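Since cpuacct.usage is a cumulative counter in nanoseconds, CPU utilisation is usually derived from two samples taken some seconds apart. A minimal sketch; the helper name and the cgroup name "mygroup" in the usage comment are hypothetical, and the path assumes cpuacct is mounted at /sys/fs/cgroup/cpu,cpuacct as in the lssubsys -m output.

```sh
# cpu_pct USAGE_BEFORE_NS USAGE_AFTER_NS INTERVAL_SEC  (hypothetical helper)
# 100 = one full core; values above 100 mean several cores were busy.
cpu_pct() {
    echo $(( ($2 - $1) * 100 / ($3 * 1000000000) ))
}

# Usage ("mygroup" is a hypothetical cgroup):
# f=/sys/fs/cgroup/cpu,cpuacct/mygroup/cpuacct.usage
# a=$(cat "$f"); sleep 5; b=$(cat "$f"); cpu_pct "$a" "$b" 5
```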

cpuacct.stat

  • user: CPU time consumed by tasks in user mode (in USER_HZ units, usually 1/100 s).
  • system: CPU time consumed by tasks in system (kernel) mode (in USER_HZ units).

cgroup.procs    # PIDs (thread-group IDs) of the processes in this cgroup, one per line

pid
...

cpuset

cpuset.cpus (mandatory)

e.g.

0-2,16

cpuset.mems (mandatory)

# memory nodes that tasks in this cgroup are permitted to access.

e.g.

0-2,16
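cpuset.cpus and cpuset.mems use a list format of comma-separated numbers and ranges. A minimal sketch of a hypothetical helper that expands such a list, for example to see exactly which CPUs a cgroup may use:

```sh
# expand_list LIST  (hypothetical helper): "0-2,16" -> "0 1 2 16"
expand_list() {
    local out="" part i end old_ifs="$IFS"
    IFS=,
    for part in $1; do                      # split on commas
        case $part in
            *-*)                            # a range such as 0-2
                i=${part%-*}
                end=${part#*-}
                while [ "$i" -le "$end" ]; do
                    out="$out $i"
                    i=$((i + 1))
                done ;;
            *) out="$out $part" ;;          # a single number
        esac
    done
    IFS=$old_ifs
    echo "${out# }"                         # drop the leading space
}

# Usage: expand_list "$(cgget -nvr cpuset.cpus copy_job)"
```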

blkio

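weight

Proportional weight division is implemented in the Completely Fair Queuing (CFQ) I/O scheduler, so blkio.weight only has an effect on devices that use CFQ

(check the active scheduler with: cat /sys/block/sdf/queue/scheduler)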

  • blkio.weight           # 100~1000. Default: 500

R/W throttle

  • blkio.throttle.read_bps_device
  • blkio.throttle.read_iops_device
  • blkio.throttle.write_bps_device
  • blkio.throttle.write_iops_device

e.g.

ls -l /dev/sdf

brw-rw---- 1 root disk 8, 80 Jan 20 12:28 /dev/sdf

speed=$((10*1024*1024))

cgset -r blkio.throttle.write_bps_device="8:80 $speed" mybash
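The major:minor numbers can also be obtained without reading them off `ls -l`. A minimal sketch using GNU stat, whose %t and %T formats print a device's major and minor numbers in hexadecimal; the helper name is hypothetical, and `lsblk -no MAJ:MIN /dev/sdf` is an alternative for block devices.

```sh
# dev_majmin DEVICE  (hypothetical helper): /dev/sdf -> 8:80
dev_majmin() {
    local hex maj min
    hex=$(stat -L -c '%t %T' "$1") || return 1   # e.g. "8 50" (hex); -L follows symlinks
    maj=${hex% *}
    min=${hex#* }
    echo "$((0x$maj)):$((0x$min))"               # convert both numbers to decimal
}

# Usage: limit writes of mybash to /dev/sdf to 10 MiB/s
# cgset -r blkio.throttle.write_bps_device="$(dev_majmin /dev/sdf) $((10*1024*1024))" mybash
```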

Reports

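blkio.throttle.io_serviced reports the number of I/O operations performed on specific devices; blkio.throttle.io_service_bytes reports the number of bytes transferred.
Entries have four fields: major, minor, operation, and number.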

  • blkio.throttle.io_serviced            # output: "major:minor operation number"
  • blkio.throttle.io_service_bytes

e.g.

cgget -r blkio.throttle.io_serviced mybash

mybash:
blkio.throttle.io_serviced: 8:80 Read 30
        8:80 Write 0
        8:80 Sync 30
        8:80 Async 0
        8:80 Discard 0
        8:80 Total 30
        8:16 Read 45839
        8:16 Write 0
        8:16 Sync 45839
        8:16 Async 0
        8:16 Discard 0
        8:16 Total 45839
        253:2 Read 45839
        253:2 Write 0
        253:2 Sync 45839
        253:2 Async 0
        253:2 Discard 0
        253:2 Total 45839
        Total 91708

This is the output while running "pv /dev/zero > /dev/sdX":

8:80 = sdX

8:16 = sda (swap)

253:2 = dm-2

* 8:80 shows no write statistics yet; they only appear once the process ends (i.e. after stopping pv with Ctrl+C)
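To pick a single counter out of blkio.throttle.io_serviced (or io_service_bytes) in a script, the raw file can be filtered by device and operation. A minimal sketch; the helper name is hypothetical, the path in the usage comment assumes blkio is mounted at /sys/fs/cgroup/blkio as shown earlier, and a missing entry is reported as 0.

```sh
# io_count MAJ:MIN OPERATION  (hypothetical helper)
# Reads a raw blkio.throttle.io_serviced / io_service_bytes file on stdin.
io_count() {
    awk -v d="$1" -v op="$2" '
        $1 == d && $2 == op { print $3; found = 1; exit }
        END { if (!found) print 0 }         # no entry: nothing recorded yet
    '
}

# Usage: io_count 8:16 Read < /sys/fs/cgroup/blkio/mybash/blkio.throttle.io_serviced
```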

Notes

dm-xx = Device Mapper

dmsetup ls     # lsblk can also be used

myvg-swap       (253:1)
myvg-root       (253:2)

net_cls

The net_cls subsystem tags network packets with a class identifier (classid)
that allows the Linux traffic controller (tc) to identify packets originating from a particular cgroup.

net_cls.classid = ID;

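Calculating the value of net_cls.classid

01:20 ==> 0x00010020 (format 0xAAAABBBB: major handle in the upper 16 bits, minor handle in the lower 16 bits; decimal = 0x1 x 65536 + 0x20 = 65536 + 32 = 65568)

Therefore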

net_cls.classid = 65568 ;
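The same calculation can be done with shell arithmetic. A minimal sketch; the helper name is hypothetical, and both arguments are taken as tc handles written in hexadecimal, as tc does.

```sh
# classid MAJOR MINOR  (hypothetical helper; both arguments are tc handles in hex)
# Packs them as 0xAAAABBBB and prints the decimal value for net_cls.classid.
classid() {
    echo $(( (0x$1 << 16) | 0x$2 ))
}

# Usage: classid 1 20                          # decimal value for class 1:20
#        printf '0x%08x\n' "$(classid 1 20)"   # same value in the 0xAAAABBBB form
```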

e.g.

net_cls {
    net_cls.classid = 65568;
}

# Add a cgroup filter on eth0 under qdisc 1: (packets are classified by their net_cls.classid, here 1:20)

tc filter add dev eth0 parent 1: protocol ip handle 20: cgroup

 


OOM Test

 

oom = Out of Memory

Testing

cgcreate -g memory:/test_oom

cgset -r memory.limit_in_bytes=256m test_oom

cgset -r memory.memsw.limit_in_bytes=256m test_oom

cgget -r memory.oom_control test_oom

test_oom:
memory.oom_control: oom_kill_disable 0
        under_oom 0
        oom_kill 0

cgexec -g memory:/test_oom stress-ng -m 1 --vm-bytes 200m

ps aux | grep [s]tress-ng

USER         PID %CPU %MEM    VSZ    RSS TTY      STAT START   TIME COMMAND
root       22939  0.0  0.0  51848   6352 pts/2    SL+  15:31   0:00 stress-ng -m 1 --vm-bytes 200m
root       22940  0.0  0.0  51852    472 pts/2    S+   15:31   0:00 stress-ng-vm [run]
root       22941  100  1.2 256652 206432 pts/2   R+   15:31   1:13 stress-ng-vm [run]

cgexec -g memory:/test_oom stress-ng -m 1 --vm-bytes 300m

dmesg

... stress-ng invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=1000
...

cgget -r memory.oom_control test_oom

test_oom:
memory.oom_control: oom_kill_disable 0
        under_oom 0
        oom_kill 819

under_oom    # if 1, the memory cgroup is under OOM and tasks may be paused
oom_kill     # number of processes killed by the OOM killer in this cgroup
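memory.oom_control consists of "key value" lines, so single fields are easy to extract, for example to check whether the oom_kill counter grew during a test run. A minimal sketch; the helper name is hypothetical, and the path in the usage comment assumes the memory controller is mounted at /sys/fs/cgroup/memory.

```sh
# oom_field KEY FILE  (hypothetical helper): print the value of one field
oom_field() {
    awk -v k="$1" '$1 == k { print $2; exit }' "$2"
}

# Usage:
# f=/sys/fs/cgroup/memory/test_oom/memory.oom_control
# before=$(oom_field oom_kill "$f")
# cgexec -g memory:/test_oom stress-ng -m 1 --vm-bytes 300m --timeout 10s
# [ "$(oom_field oom_kill "$f")" -gt "$before" ] && echo "OOM killer fired"
```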

 


freezer

 

freezer.state

* only available in non-root cgroups

    FROZEN — tasks in the cgroup are suspended.
    FREEZING — the system is in the process of suspending tasks in the cgroup.
    THAWED — tasks in the cgroup have resumed.

# To suspend a specific process:

    Move that process to a cgroup in a hierarchy which has the freezer subsystem attached to it.
    Freeze that particular cgroup to suspend the process contained in it.

Usage

# all subsystems in one go: "mount -t cgroup none /cgroups"
mkdir /freezer
mount -t cgroup freezer /freezer -o freezer

# Create a child cgroup:
mkdir /freezer/0

# Put a task into this cgroup:
echo $task_pid > /freezer/0/tasks

# Freeze it:
echo FROZEN > /freezer/0/freezer.state
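Writing FROZEN can leave the cgroup in the FREEZING state for a while, so scripts usually poll freezer.state until it reads FROZEN. A minimal sketch, assuming the /freezer mount used above; the helper names and the 5-second default timeout are hypothetical choices.

```sh
# freeze_cg DIR [TIMEOUT_SEC]  (hypothetical helper)
# Requests a freeze and waits until freezer.state reports FROZEN.
freeze_cg() {
    local n="${2:-5}"
    echo FROZEN > "$1/freezer.state" || return 1
    while [ "$(cat "$1/freezer.state")" != FROZEN ]; do
        [ "$n" -le 0 ] && return 1          # still FREEZING after the timeout
        sleep 1
        n=$((n - 1))
    done
}

# thaw_cg DIR  (hypothetical helper): resume the tasks
thaw_cg() {
    echo THAWED > "$1/freezer.state"
}

# Usage: freeze_cg /freezer/0 && echo frozen; thaw_cg /freezer/0
```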

The freezer allows the checkpoint code to obtain a consistent
image of the tasks by attempting to force the tasks in a cgroup into a
quiescent state.

Comparison with SIGSTOP and SIGCONT (kill)

Any programs designed to watch for SIGSTOP and SIGCONT could be broken by
attempting to use SIGSTOP and SIGCONT to stop and resume tasks.

We can demonstrate this problem using nested bash shells:

    $ echo $$
    16644

    $ bash

    $ echo $$
    16690

    # From a second, unrelated bash shell:

    $ kill -SIGSTOP 16690

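    # Running "jobs" in the first shell (16644) now shows it as Stopped

    $ kill -SIGCONT 16690

    # According to the kernel documentation, at this point 16690 exits and causes 16644 to exit too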

In contrast, the cgroup freezer uses the kernel freezer code to prevent the freeze/unfreeze cycle from becoming visible to the tasks
being frozen. This allows the bash example above and gdb to run as expected.

The cgroup freezer is hierarchical.

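freezer.self_freezing: Read only. 1 if the last write to this cgroup's freezer.state was FROZEN, otherwise 0.

freezer.parent_freezing: Read only. 1 if any of the cgroup's ancestors is frozen, otherwise 0.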

Help

https://www.kernel.org/doc/Documentation/cgroups/freezer-subsystem.txt

 

 

Creative Commons license icon