cgroup-tools

Last updated: 2023-01-30


Install

 

apt-get install cgroup-tools libcgroup1

Program

  • /usr/bin/lscgroup
  • /usr/bin/cgcreate
  • /usr/bin/cgdelete
  • /usr/bin/cgexec
  • /usr/bin/cgget
  • /usr/bin/cgset

 

 


lscgroup

 

# list all cgroups

lscgroup

blkio:/
...
memory:/
...

They are mounted at the corresponding mount points

i.e.

mount | grep -e blkio -e memory

cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)

blkio:/             => /sys/fs/cgroup/blkio

memory:/        => /sys/fs/cgroup/memory

lssubsys

# list hierarchies containing the given subsystem (much the same as checking with mount)

Opts

-m, --mount-points     Display mount points.

lssubsys -m

cpuset /sys/fs/cgroup/cpuset
cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct
blkio /sys/fs/cgroup/blkio
memory /sys/fs/cgroup/memory
devices /sys/fs/cgroup/devices
freezer /sys/fs/cgroup/freezer
net_cls,net_prio /sys/fs/cgroup/net_cls,net_prio
perf_event /sys/fs/cgroup/perf_event
hugetlb /sys/fs/cgroup/hugetlb
pids /sys/fs/cgroup/pids
rdma /sys/fs/cgroup/rdma

 


cgcreate

 

# create new cgroup

Defines cgroup

-g <controllers>:<path>    # controllers = subsystems (e.g. blkio,memory)

cgroup permissions

-d, --dperm=mode            # the permissions of the control group's directory

-f, --fperm=mode              # the permissions of the control group's parameter files

tasks file permissions

-s, --tperm=mode       # the permissions of the control group tasks file

-t <tuid>:<tgid>        # user and group that own the tasks file

 


cgdelete

 

cgdelete <controllers>:<path> [<controllers>:<path>] ...

-r           # Recursively remove all subgroups

 * When you delete a cgroup, all its tasks move to its parent group.

 


Example: cgcreate, lscgroup, cgdelete

 

cgcreate -g blkio,memory:/mybash

lscgroup | grep mybash

blkio:/mybash
memory:/mybash

cgdelete blkio,memory:/mybash

 


cgget & cgset

 

cgset

# cgset - set the parameters of given cgroup(s)

cgset -r parameter=value path_to_cgroup

 * path_to_cgroup is the path to the cgroup relative to the root of the hierarchy
    ("/" = root of the hierarchy)

cgget

syntax: cgget [-r <name>] [-g <controller>] [-a] <cgroup_path>

  • -r, --variable <name>   # defines parameter to display
  • -g <controller>            # defines controllers whose values should be displayed
  • -a, --all                       # print the variables for all controllers

Display opt

  • -n                              # do not print headers, i.e. name of groups.
  • -v, --values-only          # print only values, not parameter names

i.e.

cgget /

...

cgget -g cpu /

...

cgget -r memory.failcnt limit_httpd

/limit_httpd:
memory.failcnt: 0

cgget -nvr memory.memsw.failcnt limit_httpd

0

cgget limit_httpd | grep usage | grep -v max

memory.memsw.usage_in_bytes: 0
memory.usage_in_bytes: 0

cgget limit_httpd | grep failcnt

memory.memsw.failcnt: 0
memory.failcnt: 0

Example

cgset -r memory.limit_in_bytes=256m mybash

cgget -r memory.limit_in_bytes mybash

cgget -r memory.memsw.limit_in_bytes mybash

 * memsw (memory + swap) is inherited from the top level and is most likely the maximum value (the default), so it must be limited as well

cgset -r memory.memsw.limit_in_bytes=256m mybash
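The two cgset calls above can be wrapped in a small helper. This is only a sketch: the function name set_mem_limit and the CG_RUN dry-run variable are made up for illustration and are not part of cgroup-tools. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: apply the same cap to memory and to memory+swap.
# CG_RUN is an assumed dry-run switch: when unset the commands are only
# printed; with CG_RUN="" they are executed (needs root).
set_mem_limit() {
    cgroup=$1
    limit=$2
    runner=${CG_RUN-echo}
    # Set memory.limit_in_bytes first: memsw may not be lower than it.
    $runner cgset -r "memory.limit_in_bytes=$limit" "$cgroup" &&
    $runner cgset -r "memory.memsw.limit_in_bytes=$limit" "$cgroup"
}

set_mem_limit mybash 256m
```

To apply the limits for real, run it as: CG_RUN="" set_mem_limit mybash 256m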

Get Usage

# Memory Usage

cgget -r memory.usage_in_bytes mybash

mybash:
memory.usage_in_bytes: 1900544

cgget -r memory.memsw.usage_in_bytes mybash

mybash:
memory.memsw.usage_in_bytes: 1896448

 


Move a process into a cgroup (cgclassify)

 

# cgclassify - move running task(s) to given cgroups

cgclassify -g subsystems:path_to_cgroup pidlist

i.e.

export "PS1=CopyShell# "

cgclassify -g blkio,memory:/mybash $$

Check which cgroups a process belongs to

cat /proc/$$/cgroup

12:memory:/mybash
11:freezer:/user/root/0
10:perf_event:/
9:rdma:/
8:pids:/user.slice/user-0.slice/session-1.scope
7:devices:/user.slice
6:cpuset:/
5:hugetlb:/
4:net_cls,net_prio:/
3:cpu,cpuacct:/user.slice
2:blkio:/mybash
1:name=systemd:/user.slice/user-0.slice/session-1.scope
0::/user.slice/user-0.slice/session-1.scope
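Each line of /proc/PID/cgroup has the form "hierarchy-ID:controller-list:path". To pick out the path for a single controller, a small filter along these lines may help. The name cgroup_path_of is hypothetical, and the sample input is copied from the listing above.

```shell
#!/bin/sh
# Hypothetical helper: print the cgroup path of one controller.
# Reads "hierarchy-ID:controller-list:path" lines on stdin.
cgroup_path_of() {
    awk -F: -v want="$1" '
        {
            n = split($2, ctrl, ",")        # "cpu,cpuacct" -> cpu, cpuacct
            for (i = 1; i <= n; i++)
                if (ctrl[i] == want) {
                    line = $0
                    sub(/^[^:]*:[^:]*:/, "", line)   # keep everything after the 2nd colon
                    print line
                    exit
                }
        }'
}

# Sample lines taken from the output above
cgroup_path_of blkio <<'EOF'
12:memory:/mybash
3:cpu,cpuacct:/user.slice
2:blkio:/mybash
EOF
```

For a live process the input would come from the proc file instead: cgroup_path_of memory < /proc/$$/cgroup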

Notes

ps -O cgroup

 


Launch processes in a cgroup (cgexec)

 

Syntax:

cgexec -g subsystems:path_to_cgroup command arguments

Opts

--sticky

Keep any child processes in the same cgroup.

Without "--sticky", while cgred is running:

The cgred daemon does not change the cgroup of the command's own task,

but it automatically moves the child tasks to the right cgroup based on /etc/cgrules.conf.

i.e.

cgexec -g memory:/test_oom stress-ng -m 1 --vm-bytes 200m

 

 


cgsnapshot

 

generate the configuration file for given controllers

 


Subsystem Settings

 

memory

swappiness

Values below the default of 60 decrease the kernel's tendency to swap out process memory; values above 60 increase it.

memory.swappiness: 60

memory.oom_control

# When the OOM killer is disabled,

# tasks that attempt to use more memory than they are allowed are paused until additional memory is freed.

# To disable it, write 1 to the "memory.oom_control" file (Default: 0)

cgget -r memory.oom_control mybash

mybash:
memory.oom_control: oom_kill_disable 0
        under_oom 0
        oom_kill 0

cpu

# shares of CPU time are distributed across all CPU cores on multi-core systems.
# Default: cpu.shares: 1024

cpu.shares: 700

cpuacct

# cpuacct.usage

reports the total CPU time (in nanoseconds) consumed by all tasks in this cgroup

# reset:

     echo 0 > /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage

cpuacct.stat

  • user — CPU time consumed by tasks in user mode.
  • system — CPU time consumed by tasks in system (kernel) mode.

cgroup.procs

pid
...

cpuset

cpuset.cpus (mandatory)

i.e.

0-2,16

cpuset.mems (mandatory)

# memory nodes that tasks in this cgroup are permitted to access.

i.e.

0-2,16
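The list format used by cpuset.cpus and cpuset.mems mixes single IDs and inclusive ranges. A sketch of how such a list expands, using a hypothetical helper name (expand_cpulist) and plain awk:

```shell
#!/bin/sh
# Hypothetical helper: expand a cpuset list such as "0-2,16" into single IDs.
expand_cpulist() {
    echo "$1" | awk -F, '{
        out = ""
        for (i = 1; i <= NF; i++) {
            n = split($i, r, "-")               # "0-2" -> r[1]=0, r[2]=2
            last = (n == 2) ? r[2] : r[1]       # a single ID is a range of one
            for (c = r[1] + 0; c <= last + 0; c++)
                out = out (out == "" ? "" : " ") c
        }
        print out
    }'
}

expand_cpulist "0-2,16"     # prints: 0 1 2 16
```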

blkio

weight

Proportional weight division is implemented in the Completely Fair Queuing (CFQ) I/O scheduler.

(check the active scheduler: cat /sys/block/sdf/queue/scheduler)

  • blkio.weight           # 100~1000. Default: 500

R/W throttle

  • blkio.throttle.read_bps_device
  • blkio.throttle.read_iops_device
  • blkio.throttle.write_bps_device
  • blkio.throttle.write_iops_device

i.e.

ls -l /dev/sdf

brw-rw---- 1 root disk 8, 80 Jan 20 12:28 /dev/sdf

speed=$((10*1024*1024))

cgset -r blkio.throttle.write_bps_device="8:80 $speed" mybash
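Instead of reading the major and minor numbers off "ls -l", they can be derived in a script. The sketch below assumes GNU coreutils stat, whose %t and %T formats print the device numbers in hexadecimal. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: build the "major:minor" key used by the throttle files.

# Convert hexadecimal major/minor numbers to the decimal "major:minor" form.
hex_to_majmin() {
    printf '%d:%d\n' "0x$1" "0x$2"
}

# Assumes GNU stat (%t = major, %T = minor, both in hex).
# The command substitution is left unquoted on purpose so it splits into two arguments.
dev_majmin() {
    hex_to_majmin $(stat -c '%t %T' "$1")
}

hex_to_majmin 8 50      # /dev/sdf above: major 8, minor 0x50 = 80, prints 8:80
```

With that in place the throttle could be set as: cgset -r blkio.throttle.write_bps_device="$(dev_majmin /dev/sdf) $speed" mybash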

Reports

reports the number of I/O operations performed on specific devices
Entries have four fields: major, minor, operation, and number.

  • blkio.throttle.io_serviced            # output: "major:minor operation number"
  • blkio.throttle.io_service_bytes

i.e.

cgget -r blkio.throttle.io_serviced mybash

mybash:
blkio.throttle.io_serviced: 8:80 Read 30
        8:80 Write 0
        8:80 Sync 30
        8:80 Async 0
        8:80 Discard 0
        8:80 Total 30
        8:16 Read 45839
        8:16 Write 0
        8:16 Sync 45839
        8:16 Async 0
        8:16 Discard 0
        8:16 Total 45839
        253:2 Read 45839
        253:2 Write 0
        253:2 Sync 45839
        253:2 Async 0
        253:2 Discard 0
        253:2 Total 45839
        Total 91708

This output was captured while running "pv /dev/zero > /dev/sdX".

8:80 = sdX

8:16 = sda (swap)

253:2 = dm-2

* 8:80 shows no write statistics yet; they are only recorded when the process ends (Ctrl+C on pv)
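To reduce a long io_serviced report to one "Total" figure per device, the cgget output can be piped through a small awk filter. This is a sketch with a hypothetical function name (io_totals). The sample input is shortened from the report above.

```shell
#!/bin/sh
# Hypothetical helper: print "major:minor total" for every device
# found in cgget output of blkio.throttle.io_serviced (read from stdin).
io_totals() {
    awk '
        { sub(/^[^ ]+: /, "") }     # drop the leading "parameter: " on the first value line
        $1 ~ /^[0-9]+:[0-9]+$/ && $2 == "Total" { print $1, $3 }
    '
}

# Shortened sample of the report above
io_totals <<'EOF'
mybash:
blkio.throttle.io_serviced: 8:80 Read 30
        8:80 Write 0
        8:80 Total 30
        8:16 Read 45839
        8:16 Total 45839
        Total 91708
EOF
```

On a live system the input would be: cgget -r blkio.throttle.io_serviced mybash | io_totals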

Notes

dm-xx = Device Mapper

dmsetup ls     # lsblk can also be used to check this

myvg-swap       (253:1)
myvg-root       (253:2)

net_cls

The net_cls subsystem tags network packets with a class identifier (classid)
that allows the Linux traffic controller (tc) to identify packets originating from a particular cgroup.

net_cls.classid = ID;

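Calculating the net_cls.classid value

The value has the form 0xAAAABBBB (AAAA = major, BBBB = minor, both hexadecimal)

1:20 ==> 0x00010020 (DEC = 1 x 65536 + 0x20 = 65536 + 32 = 65568)

So

net_cls.classid = 65568 ;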
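The conversion from a tc handle to the decimal classid can be done with shell arithmetic. A minimal sketch, assuming the handle is written as "major:minor" with both parts in hexadecimal; the function name classid_of is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a tc class handle "major:minor" (hex)
# into the decimal value for net_cls.classid (0xAAAABBBB layout).
classid_of() {
    major=${1%%:*}      # part before the colon
    minor=${1##*:}      # part after the colon
    echo $(( (0x$major << 16) | 0x$minor ))
}

classid_of 1:20     # 0x00010020, prints 65568
```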

i.e.

net_cls {
    net_cls.classid = 65568;
}

# Set up the tc filter (parent 1: & handle 20: correspond to 1:20)

tc filter add dev eth0 parent 1: protocol ip handle 20: cgroup

 


OOM Test

 

oom = Out of Memory

Testing

cgcreate -g memory:/test_oom

cgset -r memory.limit_in_bytes=256m test_oom

cgset -r memory.memsw.limit_in_bytes=256m test_oom

cgget -r memory.oom_control test_oom

test_oom:
memory.oom_control: oom_kill_disable 0
        under_oom 0
        oom_kill 0

cgexec -g memory:/test_oom stress-ng -m 1 --vm-bytes 200m

ps aux | grep [s]tress-ng

USER         PID %CPU %MEM    VSZ    RSS TTY      STAT START   TIME COMMAND
root       22939  0.0  0.0  51848   6352 pts/2    SL+  15:31   0:00 stress-ng -m 1 --vm-bytes 200m
root       22940  0.0  0.0  51852    472 pts/2    S+   15:31   0:00 stress-ng-vm [run]
root       22941  100  1.2 256652 206432 pts/2   R+   15:31   1:13 stress-ng-vm [run]

cgexec -g memory:/test_oom stress-ng -m 1 --vm-bytes 300m

dmesg

... stress-ng invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=1000
...

cgget -r memory.oom_control test_oom

test_oom:
memory.oom_control: oom_kill_disable 0
        under_oom 0
        oom_kill 819

under_oom    # if 1, the memory cgroup is under OOM, tasks may be paused.
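When running repeated tests it can be handy to read just the oom_kill counter before and after a run. A sketch that extracts it from cgget output; the function name oom_kill_count is hypothetical, and the sample input is the report shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the oom_kill counter from
# "cgget -r memory.oom_control <cgroup>" output read on stdin.
oom_kill_count() {
    awk '{
        # look for the exact field "oom_kill" (not oom_kill_disable)
        for (i = 1; i < NF; i++)
            if ($i == "oom_kill") { print $(i + 1); exit }
    }'
}

# Sample taken from the output above
oom_kill_count <<'EOF'
test_oom:
memory.oom_control: oom_kill_disable 0
        under_oom 0
        oom_kill 819
EOF
```

On a live system the input would be: cgget -r memory.oom_control test_oom | oom_kill_count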

 


cgred

 

Processes are allocated to cgroups based on the settings found in /etc/cgrules.conf

 


freezer

 

freezer.state

* only available in non-root cgroups

    FROZEN — tasks in the cgroup are suspended.
    FREEZING — the system is in the process of suspending tasks in the cgroup.
    THAWED — tasks in the cgroup have resumed.

# To suspend a specific process:

    Move that process to a cgroup in a hierarchy which has the freezer subsystem attached to it.
    Freeze that particular cgroup to suspend the process contained in it.

Usage

# all subsystems in one go: "mount -t cgroup none /cgroups"
mount -t cgroup freezer  /freezer -o freezer

# Create a child cgroup:
mkdir /freezer/0

# Put a task into this cgroup:
echo $task_pid > /freezer/0/tasks

# Freeze it:
echo FROZEN > /freezer/0/freezer.state

The freezer allows the checkpoint code to obtain a consistent
image of the tasks by attempting to force the tasks in a cgroup into a
quiescent state.

Compared with kill's SIGSTOP and SIGCONT

Any programs designed to watch for SIGSTOP and SIGCONT could be broken by
attempting to use SIGSTOP and SIGCONT to stop and resume tasks.

We can demonstrate this problem using nested bash shells:

    $ echo $$
    16644

    $ bash

    $ echo $$
    16690

    # From a second, unrelated bash shell:

    $ kill -SIGSTOP 16690

    # Running "jobs" will show it

    $ kill -SIGCONT 16690

In contrast, the cgroup freezer uses the kernel freezer code to prevent the freeze/unfreeze cycle from becoming visible to the tasks
being frozen. This allows the bash example above and gdb to run as expected.

The cgroup freezer is hierarchical.

freezer.self_freezing: Read only.

freezer.parent_freezing: Read only.

Help

https://www.kernel.org/doc/Documentation/cgroups/freezer-subsystem.txt