lxc - resource

Last updated: 2020-04-17


CPU

 

cpuset

Set the CPU placement of tasks (CPU affinity)

e.g.

0-2,7
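
A minimal sketch of applying such a mask, assuming the cgroup v1 cpuset controller is mounted at /sys/fs/cgroup/cpuset and a hypothetical group "ct1":

cd /sys/fs/cgroup/cpuset/ct1
echo 0-2,7 > cpuset.cpus             # pin the group to CPUs 0, 1, 2 and 7
echo 0     > cpuset.mems             # cpuset.mems must also be set before tasks can be attached
echo <pid> > tasks                   # move a task into the group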

cpu.shares

  • The weight of each group living in the same hierarchy (same level).    # Default: 1024
  • CPU shares are relative. (When one CT is idle, the other CT can still use 100% of the CPU.)

cfs    # only affects non-RT tasks

  • cpu.cfs_period_us                             # Default: 100000 (100ms)

The length of each scheduler period. Larger periods improve throughput at the expense of latency.

  • cpu.cfs_quota_us                              # Default: -1

The total CPU time that tasks in the current group are allowed to run during each cfs_period_us.

This represents aggregate time over all CPUs in the system:

allow full usage of two CPUs => set this value to twice the value of cfs_period_us (see the sketch below)
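
A minimal sketch of capping a group at two full CPUs, assuming the cpu controller is mounted at /sys/fs/cgroup/cpu and a hypothetical group "ct1":

cd /sys/fs/cgroup/cpu/ct1
echo 100000 > cpu.cfs_period_us      # 100ms scheduling period (the default)
echo 200000 > cpu.cfs_quota_us       # 200ms of CPU time per period = 2 CPUs worth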

rt

  • cpu.rt_runtime_us  4000000             # Default: 0
  • cpu.rt_period_us   5000000             # Default: 1000000

With the values above, RT tasks in the group may run for at most 4s of CPU time in every 5s period (see the sketch below).
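
A minimal sketch of applying those values to the same hypothetical "ct1" group (the kernel must be built with CONFIG_RT_GROUP_SCHED for these files to exist):

cd /sys/fs/cgroup/cpu/ct1
echo 5000000 > cpu.rt_period_us      # 5s accounting period
echo 4000000 > cpu.rt_runtime_us     # RT tasks may run for 4s of each period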

Statistics

# Total CPU time used by all tasks in this group (nanoseconds)

cpuacct.usage

reset counter

echo 0 > /cgroups/cpuacct/cpuacct.usage

# CPU time consumed on each core

cpuacct.usage_percpu

# aggregate user and system time consumed by tasks in this group.

cpuacct.stat

user 589182                    <-- USER_HZ
system 41986
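
A quick sketch of turning those USER_HZ values into seconds (USER_HZ is normally 100; it can be confirmed at runtime):

getconf CLK_TCK                      # typically prints 100
# user time in seconds = 589182 / 100 = 5891.82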

CPU CGroup example:

Give firefox and movie_player different amounts of CPU time.

# mount -t tmpfs cgroup_root /sys/fs/cgroup
# mkdir /sys/fs/cgroup/cpu
# mount -t cgroup -o cpu none /sys/fs/cgroup/cpu
# cd /sys/fs/cgroup/cpu

# mkdir multimedia    # create "multimedia" group of tasks
# mkdir browser        # create "browser" group of tasks

# #Configure the multimedia group to receive twice the CPU bandwidth
# #that of browser group

# echo 2048 > multimedia/cpu.shares
# echo 1024 > browser/cpu.shares

# firefox &    # Launch firefox and move it to "browser" group
# echo <firefox_pid> > browser/tasks

# #Launch gmplayer (or your favourite movie player)
# echo <movie_player_pid> > multimedia/tasks

 


memory

 

See: Documentation/cgroups/memory.txt

  • memory.limit_in_bytes
  • memory.memsw.limit_in_bytes

The suffixes k, m and g are accepted regardless of case, and the stored value is always in bytes. Writing -1 removes a previously set limit.

RAM

memory.limit_in_bytes

Set the limit of memory (RAM)

memory.soft_limit_in_bytes

Soft upper limit for user memory, including the file cache.

When the system runs short of RAM, the kernel tries to push the cgroup back down towards its soft_limit.

When read back (cat memory.soft_limit_in_bytes), the unit is bytes.

soft_limit_in_bytes vs limit_in_bytes

Set/Show soft limit of memory usage

The soft limit only takes effect when the whole OS runs short of RAM.

limit_in_bytes > soft_limit_in_bytes    (the soft limit should be set below the hard limit)

Swap

memory.memsw.limit_in_bytes

Set the limit of memory (RAM) + Swap

 * memory.limit_in_bytes must be set before memory.memsw.limit_in_bytes can be set

memory.swappiness   # Default: 60
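
A minimal sketch that puts the settings above together, assuming the memory controller is mounted at /sys/fs/cgroup/memory and a hypothetical group "ct1":

cd /sys/fs/cgroup/memory/ct1
echo 512M > memory.limit_in_bytes            # hard RAM limit
echo 256M > memory.soft_limit_in_bytes       # soft limit, enforced under global memory pressure
echo 1G   > memory.memsw.limit_in_bytes      # RAM+Swap limit (only after limit_in_bytes is set)
echo 0    > memory.swappiness                # avoid swapping this group where possible
echo -1   > memory.memsw.limit_in_bytes      # remove the RAM+Swap limit again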

Checking

memory.usage_in_bytes 

show current usage for memory

memory.memsw.usage_in_bytes

show current usage for memory+Swap

=> if this reaches the limit, even the swap is not enough

memory.failcnt

show the number of memory usage hits limits

=> roughly how many times swap had to be used

memory.memsw.failcnt 

show the number of memory+Swap hits limits
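
A quick sketch of reading those counters for the hypothetical "ct1" group:

cd /sys/fs/cgroup/memory/ct1
cat memory.usage_in_bytes            # current RAM usage (bytes)
cat memory.memsw.usage_in_bytes      # current RAM+Swap usage (bytes)
cat memory.failcnt                   # times the RAM limit was hit
cat memory.memsw.failcnt             # times the RAM+Swap limit was hit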

memory.stat

show various statistics

e.g.

cache 0
rss 0
rss_huge 0
shmem 0
mapped_file 0
dirty 0
writeback 0
swap 0
...
total_*

OOM

cat memory.oom_control

oom_kill_disable 0
under_oom 0
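
The OOM killer can be disabled per group by writing to the same file; a minimal sketch for the cgroup v1 memory controller:

echo 1 > memory.oom_control          # set oom_kill_disable to 1; tasks that hit the limit will pause instead of being killed
cat memory.oom_control               # oom_kill_disable 1 / under_oom 0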

Location under LXC

/sys/fs/cgroup/memory/lxc/CT_NAME
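
The same limits can also be set from the container config instead of echoing into the cgroup files; a minimal sketch, assuming a cgroup v1 host and the usual /var/lib/lxc layout:

# /var/lib/lxc/CT_NAME/config
lxc.cgroup.memory.limit_in_bytes = 512M
lxc.cgroup.memory.memsw.limit_in_bytes = 1G
lxc.cgroup.cpu.shares = 512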

 


Network

 

net_prio.prioidx         // a value the kernel uses as an internal representation of this cgroup
net_prio.ifpriomap

net_cls.classid           // reads back as a decimal number; used together with tc
                                  0xAAAABBBB (0x10001 = 1:1)
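
A minimal sketch of tying net_cls.classid to a tc class, assuming the net_cls/net_prio controllers are mounted and the container's traffic leaves via eth0 (interface name and rate are illustrative):

echo 0x10001 > /sys/fs/cgroup/net_cls/lxc/CT_NAME/net_cls.classid      # tc class 1:1
tc qdisc  add dev eth0 root handle 1: htb
tc class  add dev eth0 parent 1: classid 1:1 htb rate 10mbit
tc filter add dev eth0 parent 1: protocol ip prio 10 handle 1: cgroup  # classify packets by cgroup classid

# net_prio: give this cgroup's traffic priority 5 on eth0
echo "eth0 5" > /sys/fs/cgroup/net_prio/lxc/CT_NAME/net_prio.ifpriomap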
 

 


Block Device IO (blkio)

 

It can control:

  • Proportional weight division (Implemented in CFQ)
  • I/O throttling

cat /sys/block/sda/queue/scheduler

echo cfq > /sys/block/sda/queue/scheduler

Check Kernel Support

  • grep CONFIG_BLK_CGROUP /boot/config-*                       # Block IO controller
  • grep CONFIG_BLK_DEV_THROTTLING /boot/config-*         # throttling in block layer
  • grep CONFIG_CFQ_GROUP_IOSCHED /boot/config-*         # group scheduling in CFQ
  • grep CONFIG_IOSCHED_CFQ /boot/config-*

Setting

weight

blkio.weight            #100 ~ 1000

Relative proportion (weight) of block I/O access available to the cgroup.

This value is overridden for specific devices by blkio.weight_device.

throttle

blkio.throttle.read_bps_device                # bytes/second

Format: major:minor bytes_per_second

blkio.throttle.read_iops_device

blkio.throttle.write_bps_device

blkio.throttle.write_iops_device

Report

blkio.reset_stats

resets the statistics recorded in the other pseudofiles.

blkio.time

reports the time that a cgroup had I/O access to specific devices.

major, minor, and time    # time in milliseconds (ms)

blkio.sectors

reports the number of sectors transferred to or from specific devices by a cgroup.

major, minor, and sectors

Throttle Report

blkio.throttle.io_serviced

Number of IOs completed to/from the disk by the group (read or write, sync or async)

Major:Minor Operation number_of_IO

blkio.throttle.io_service_bytes

Number of bytes transferred to/from the disk by the group.

CFQ Report

blkio.io_serviced

reports the number of I/O operations performed on specific devices by a cgroup as seen by the CFQ scheduler.

Format: major, minor, operation, and number

operation: read, write, sync, or async

On the other hand, blkio.throttle.io_serviced counts the number of IOs in terms of the number of bios, as seen by the throttling policy.

blkio.io_service_bytes

blkio.io_service_time

Test 1

Preparation

mkdir /cgroup

mount -t tmpfs cgroup_root /cgroup

mkdir /cgroup/blkio

mount -t cgroup -o blkio none /cgroup/blkio

mkdir /cgroup/blkio/test1

mkdir /cgroup/blkio/test2

# Create two files of the same size on the same disk

cd /root

dd if=/dev/zero of=zerofile1 bs=1M count=4096

cp zerofile1 zerofile2

Set the limits

echo 1000 > /cgroup/blkio/test1/blkio.weight

echo 500 > /cgroup/blkio/test2/blkio.weight

Start Test

sync; echo 3 > /proc/sys/vm/drop_caches

cgexec -g blkio:test1 time dd if=zerofile1 of=/dev/null

cgexec -g blkio:test2 time dd if=zerofile2 of=/dev/null

Or

dd if=zerofile1 of=/dev/null &
P1=$!; echo  $P1 > /cgroup/blkio/test1/tasks

dd if=zerofile2 of=/dev/null &
P2=$!; echo  $P2 > /cgroup/blkio/test2/tasks

Checking

iotop -qqq -p $P1 -p $P2

Remark: Cleanup

rmdir /cgroup/blkio/test1 /cgroup/blkio/test2

umount /cgroup/blkio  /cgroup

Test 2

Preparation

ls -l /dev/sda # brw-rw---- 1 root disk 8, 0 Apr  3 17:26 /dev/sda

echo "10 * 1024 * 1024" | bc                       # 10485760

echo "8:0 10485760" > /cgroup/blkio/test1/blkio.throttle.read_bps_device

echo "8:0 10485760" > /cgroup/blkio/test1/blkio.throttle.write_bps_device

Start Test

# "oflag=direct" <= Currently only sync IO queues are support.

# All the buffered writes are still system wide and not per group. Throttle 不支援 Buffer IO

dd if=/dev/zero of=zerofile1 bs=1M count=4096  oflag=direct &

echo  $! > /cgroup/blkio/test1/tasks

iotop -p $!

# Test

sync; echo 3 > /proc/sys/vm/drop_caches

Block Device IO

 

blkio.weight           // default: 500

cat blkio.time         // ms

8:0 29478              // 8,   0 = sda

blkio.weight_device

echo 8:0 1000 > blkio.weight_device

info

blkio.sectors         // number of sectors transferred to or from
blkio.time             // ms
blkio.io_serviced

8:0 Read 33258
8:0 Write 0
8:0 Sync 33258
8:0 Async 0
8:0 Total 33258
Total 33258

blkio.io_service_bytes
8:0 Read 857239552
8:0 Write 0
8:0 Sync 857239552
8:0 Async 0
8:0 Total 857239552
Total 857239552

blkio.io_wait_time // spent waiting for service in the scheduler queues

blkio.io_queued
8:0 Read 0
8:0 Write 0
8:0 Sync 0
8:0 Async 0
8:0 Total 0
Total 0

blkio.throttle.read_bps_device
blkio.throttle.write_bps_device
blkio.throttle.read_iops_device
blkio.throttle.write_iops_device

echo "8:0 10" > blkio.throttle.write_iops_device
echo "8:0 10485760" > blkio.throttle.write_bps_device      // 10 mb

blkio.reset_stats    // reset counter
echo 1 > blkio.reset_stats

 

 


Block Device IO CFQ cgroup settings

 

Setting

slice_idle = 0
group_idle = 1
quantum = 16
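
A minimal sketch of applying those tunables through sysfs, assuming sda is using the CFQ scheduler:

echo 0  > /sys/block/sda/queue/iosched/slice_idle
echo 1  > /sys/block/sda/queue/iosched/group_idle
echo 16 > /sys/block/sda/queue/iosched/quantum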

group_isolation

When group isolation is disabled, fairness can be expected only for a sequential workload.

By default, group isolation is enabled and fairness can be expected for random I/O workloads as well.

echo 1 > /sys/block/<disk_device>/queue/iosched/group_isolation

If group_isolation=0, then CFQ automatically moves all randomly seeking queues into the root group.

That means there will be no service differentiation for that kind of workload.

This leads to better throughput, because idling is done collectively on the root group's sync-noidle tree.

slice_idle

This specifies how long CFQ should idle waiting for the next request on certain cfq queues (for sequential workloads)

and service trees (for random workloads) before the queue is expired and CFQ selects the next queue to dispatch from.

By default slice_idle is a non-zero value. That means by default we idle on queues/service trees.

This can be very helpful on highly seeky media like single spindle SATA/SAS disks

where we can cut down on overall number of seeks and see improved throughput.

"0" => CFQ will not idle between cfq queues of a cfq group => able to driver higher queue depth => achieve better throughput.

group_idle

When set, CFQ will idle on the last process issuing I/O in a cgroup.

This should be set to 1 when using proportional weight I/O cgroups and setting slice_idle to 0

By default group_idle is the same as slice_idle and does nothing if slice_idle is enabled.

quantum

The quantum controls the number of I/Os that CFQ will send to the storage at a time,

essentially limiting the device queue depth. By default, this is set to 8.

 

 

 
