free & (buffer vs. cache) & drop_caches & swappiness & fincore

Posted by datahunter on Sun, 12/02/2012 - 14:38

Last updated: 2019-10-23

RAM usage

free -m

             total       used       free     shared    buffers     cached
Mem:          3133       2705        428          0         84       2269
-/+ buffers/cache:        352       2781 # <= real used and free after subtracting cache and buffers
Swap:         1906          1       1905

Real free and usage:

total used = real_usage + buffer + cached = 352 + 84 + 2269 = 2705

total free = total - real_usage = 3133 - 352 = 2781

Summary:

Only the second line (-/+ buffers/cache) is really useful.
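
The arithmetic above can be sketched as two small shell helpers. The function names real_used and real_free are made up for this illustration; the arguments are the columns of the old-style free -m output shown above.

```shell
# real_used USED BUFFERS CACHED
# Memory actually held by applications = used - buffers - cached.
real_used() {
    echo $(( $1 - $2 - $3 ))
}

# real_free FREE BUFFERS CACHED
# Memory that could be handed to applications = free + buffers + cached.
real_free() {
    echo $(( $1 + $2 + $3 ))
}

real_used 2705 84 2269   # prints 352
real_free 428 84 2269    # prints 2781
```

Both results match the "-/+ buffers/cache" line of the sample output.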

PAGE_SIZE

# A page is a fixed-length block of main memory
# The kernel swaps and allocates memory in units of pages

getconf PAGE_SIZE

buffer vs. cache

Buffer: temporarily holds data for active I/O operations (inode, fs metadata)

Read/write buffer of the block device

Cache: frequently accessed data (result of completed I/O operations) (related to write-through and write-back)

The filesystem's cache

Write-through / back

Write-through: write is done synchronously both to the cache and to the backing store.

Write-back: writing is done only to the cache.

The write to the backing store is postponed until the modified content is about to be replaced by another cache block.

Each cache entry in memory has a (dirty) bit that indicates the data has been modified by the CPU but not yet written back to the storage device.

Diagram

CPU <-> Cache
        Buffer <-> Device

drop_caches

# To free pagecache:

echo 1 > /proc/sys/vm/drop_caches

# To free dentry and inodes (slab cache memory):

echo 2 > /proc/sys/vm/drop_caches

* dirty objects cannot be freed, so run sync first

Remark

- An inode, in this context, is a data structure that represents a file

- A dentry is a data structure that represents a directory entry (it maps a name to an inode)

# To free pagecache, dentries and inodes:

echo 3 > /proc/sys/vm/drop_caches
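
A minimal sketch that wraps the three commands above and always runs sync first. The function name drop_caches and its optional second argument (a target path, there only so the function can be tried without root) are assumptions for illustration.

```shell
# drop_caches MODE [TARGET]
# MODE: 1 = page cache, 2 = dentries and inodes, 3 = both.
# TARGET defaults to /proc/sys/vm/drop_caches; writing there needs root.
drop_caches() {
    local mode=$1
    local target=${2:-/proc/sys/vm/drop_caches}
    case "$mode" in
        1|2|3) ;;
        *) echo "usage: drop_caches 1|2|3 [target]" >&2; return 1 ;;
    esac
    sync                       # write dirty pages out first so they can be dropped
    echo "$mode" > "$target"
}
```

As root, "drop_caches 3" then behaves like sync followed by echo 3 > /proc/sys/vm/drop_caches.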

Kernel swap preference (swappiness)

Linux uses a Split Least Recently Used (LRU) page replacement strategy.

Check the current value

cat /proc/sys/vm/swappiness

60    <--- default

value:

0: The kernel will swap only to avoid an out of memory condition.

100: The kernel will swap aggressively. (prefer to find inactive pages and swap them out)

Change the setting

sysctl -w vm.swappiness=30

/etc/sysctl.conf

vm.swappiness=60
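
To make a value persistent, the key has to end up in /etc/sysctl.conf exactly once. A rough sketch of an update-or-append helper follows; the name set_sysctl_conf is hypothetical, and the key is used as a regular expression as-is (a dot matches any character), which is good enough for a sketch.

```shell
# set_sysctl_conf KEY VALUE [FILE]
# Replace an existing "KEY = ..." line, or append one if the key is missing.
# FILE defaults to /etc/sysctl.conf (needs root).
set_sysctl_conf() {
    local key=$1
    local value=$2
    local file=${3:-/etc/sysctl.conf}
    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$file" 2>/dev/null; then
        # Rewrite through a temporary file instead of relying on sed -i.
        sed "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}
```

After editing the file, sysctl -p loads it into the running kernel.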

vm.min_free_kbytes

Settings

# minimum amount of free memory (in kB) the system maintains; when free memory drops below N, kswapd starts working

vm.min_free_kbytes = 102400

min_free_kbytes

Keeping this memory free makes it instantly available and reduces memory pressure when new processes need to start,

run and finish while there is a high memory load and a full buffer cache.

It controls the amount of memory that is kept free for use by special reserves, including "atomic" allocations.

Set too low

prevents the system from reclaiming memory. (When things go wrong, this leads to OOM-killing multiple processes.)

Set too high (5-10%)

results in the system spending too much time reclaiming memory.

(Reason: Linux is designed to use all available RAM to cache file system data.)
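
To see where a candidate value falls relative to the 5-10% range mentioned above, it can be expressed as a percentage of total RAM. A small sketch; min_free_percent is a made-up name and both arguments are in kB.

```shell
# min_free_percent MIN_FREE_KB MEM_TOTAL_KB
# Print min_free_kbytes as a percentage of total memory, two decimals.
min_free_percent() {
    awk -v m="$1" -v t="$2" 'BEGIN { printf "%.2f\n", m * 100 / t }'
}

# 102400 kB against a machine with 2048000 kB of RAM
min_free_percent 102400 2048000   # prints 5.00
```

On a live system the second argument can be taken from the MemTotal line of /proc/meminfo.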

pdflush (Writeout of dirty data)

dirty_background_ratio

cat /proc/sys/vm/dirty_background_ratio

# Unit: %
# Writeout of dirty data begins in the background

dirty_ratio

# absolute maximum amount of system memory that can be filled with dirty pages

cat /proc/sys/vm/dirty_ratio

dirty_ratio vs dirty_bytes

dirty_bytes contains the amount of dirty memory at which a process generating disk writes will itself start writeback.

dirty_bytes is the counterpart of dirty_ratio. Only one of them may be specified at a time.

When one sysctl is written it is immediately taken into account to evaluate the dirty memory limits and

the other appears as 0 when read.

page-cluster

logarithmic value (base 2): 0 => "1 page" (disables swap readahead), 1 => "2 pages", 2 => "4 pages"

number of pages up to which consecutive pages are read in from swap in a single attempt.

Default: 3
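
Because the value is a base-2 logarithm, the number of pages per swap read is 2 to the power of page-cluster. A one-line sketch (the function name is made up):

```shell
# pages_per_swap_read PAGE_CLUSTER
# page-cluster is a base-2 logarithm: pages read per attempt = 2^n.
pages_per_swap_read() {
    echo $(( 1 << $1 ))
}

pages_per_swap_read 0   # prints 1
pages_per_swap_read 3   # prints 8 (the default)
```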

vfs_cache_pressure

percentage value

controls the tendency of the kernel to reclaim the memory which is used for caching of directory and inode objects

n < 100 => prefer to retain dentry and inode caches

Default: 100

value

0     never reclaim (easily lead to out-of-memory conditions)
100   reclaim at "fair" rate
>100  prefer to reclaim (may have negative performance impact)

overcommit_memory (vm.overcommit_memory)

Default value is 0

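# 0: heuristic overcommit (OVERCOMMIT_GUESS): allocate only when there appears to be enough memory
# 1: always overcommit, regardless of the current memory state (OVERCOMMIT_ALWAYS)
# 2: OVERCOMMIT_NEVER (total commit must stay below swap + Total_RAM x /proc/sys/vm/overcommit_ratio / 100)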
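
The limit used in mode 2 can be worked out by hand. The sketch below assumes the simple form swap + RAM x ratio / 100 and ignores details such as huge pages; commit_limit is a made-up name, and 50 is the default overcommit_ratio.

```shell
# commit_limit RAM SWAP RATIO
# RAM and SWAP in the same unit, RATIO in percent (overcommit_ratio).
commit_limit() {
    echo $(( $2 + $1 * $3 / 100 ))
}

# The machine from the free -m example: 3133 MB RAM, 1906 MB swap
commit_limit 3133 1906 50   # prints 3472
```

The kernel's own figure is shown as CommitLimit in /proc/meminfo.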

/proc/meminfo

Buffers

The amount, in kibibytes, of temporary storage for raw disk blocks.

SwapCached

The amount of memory, in kibibytes, that has once been moved into swap,

then back into the main memory, but still also remains in the swapfile.

This saves I/O, because the memory does not need to be moved into swap again.

Active

Memory that has been used more recently and is usually not reclaimed unless absolutely necessary.

Active(anon)

The amount of anonymous and tmpfs/shmem memory, in kibibytes, that is in active use,

or was in active use since the last time the system moved something to swap.

Active(file)

The amount of file cache memory, in kibibytes, that is in active use,

or was in active use since the last time the system reclaimed memory.

Unevictable

The amount of memory, in kibibytes, discovered by the pageout code,
that is not evictable because it is locked into memory by user programs.

Mlocked

The total amount of memory, in kibibytes,
that is not evictable because it is locked into memory by user programs.

Dirty

The total amount of memory, in kibibytes, waiting to be written back to the disk.

Writeback

The total amount of memory, in kibibytes, actively being written back to the disk.

Shmem

The total amount of memory, in kibibytes, used by shared memory (shmem) and tmpfs.

Shmem in /proc/meminfo = shared in free

Slab

The total amount of memory, in kibibytes, used by the kernel to cache data structures for its own use.
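
Any of the fields above can be pulled out with a short awk filter. A minimal sketch; the helper name meminfo_field and its optional file argument (handy for running it against a saved copy) are assumptions.

```shell
# meminfo_field NAME [FILE]
# Print the numeric value (kB) of one field, e.g. Dirty or "Active(anon)".
# FILE defaults to /proc/meminfo.
meminfo_field() {
    local name=$1
    local file=${2:-/proc/meminfo}
    # Exact string match on the first column, so "Active" does not match "Active(anon)".
    awk -v key="$name:" '$1 == key { print $2 }' "$file"
}
```

Usage: meminfo_field Shmem, or meminfo_field "Active(file)".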

Dirty Memory

Page cache (disk cache) is used to reduce the number of disk reads.

Setting: /etc/sysctl.conf

dirty ratio

Memory:  |   nonblocking    L    background writeback    U    blocking   |

L = vm.dirty_background_ratio, U = vm.dirty_ratio

Lower limit

# the kernel starts writing out dirty data in the background

# triggers pdflush/flush/kdmflush

# a percentage of total available memory
vm.dirty_background_ratio = 10

Upper limit

the process doing the writes blocks and waits for the kernel to write dirty pages out to disk

vm.dirty_ratio = 20

# How often the pdflush/flush/kdmflush processes wake up. Unit: centiseconds (1/100 s)

vm.dirty_writeback_centisecs = 500

# Age after which a dirty page is written out at the next pdflush wakeup (purpose: safeguard against data loss)

vm.dirty_expire_centisecs = 3000
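
How vm.dirty_background_ratio and vm.dirty_ratio split memory into three zones can be sketched as below. This is an approximation: the kernel applies the percentages to its own notion of dirtyable memory, which is stood in for here by an "available" figure passed in by the caller. The function name dirty_state is made up.

```shell
# dirty_state DIRTY_KB AVAILABLE_KB [BACKGROUND_RATIO] [RATIO]
# Ratios default to 10 and 20, the values used in this section.
dirty_state() {
    local dirty=$1
    local avail=$2
    local bg_ratio=${3:-10}
    local ratio=${4:-20}
    local bg_limit=$(( avail * bg_ratio / 100 ))
    local limit=$(( avail * ratio / 100 ))
    if [ "$dirty" -ge "$limit" ]; then
        echo "blocking"               # writers wait for writeback
    elif [ "$dirty" -ge "$bg_limit" ]; then
        echo "background writeback"   # flusher threads are working
    else
        echo "below thresholds"
    fi
}

dirty_state 200 8194048   # prints "below thresholds"
```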

_bytes vs _ratio

# If you set the _bytes version the _ratio version will become 0, and vice-versa.

vm.dirty_background_bytes and vm.dirty_bytes

egrep -w "Dirty|Writeback" /proc/meminfo

Dirty:               200 kB    # drops to 0 after sync
Writeback:             0 kB

egrep "nr_dirty|nr_writeback" /proc/vmstat

nr_dirty 50                    # nr = number
nr_writeback 0
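
/proc/vmstat counts pages while /proc/meminfo reports kB, so the two outputs above line up once the page count is multiplied by the page size. A small sketch, assuming a 4096-byte page for the worked example; pages_to_kb is a made-up name.

```shell
# pages_to_kb PAGES [PAGE_SIZE_BYTES]
# PAGE_SIZE_BYTES defaults to the value reported by getconf PAGE_SIZE.
pages_to_kb() {
    local pages=$1
    local page_size=${2:-$(getconf PAGE_SIZE)}
    echo $(( pages * page_size / 1024 ))
}

pages_to_kb 50 4096   # prints 200, matching "Dirty: 200 kB"
```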

Read cache test

Test

echo 1 > /proc/sys/vm/drop_caches

free -m

              total        used        free      shared  buff/cache   available
Mem:          15857        7576        8091          27         189        8002
Swap:          7167         922        6245

cat vda.qcow2 > /dev/null

free -m

              total        used        free      shared  buff/cache   available
Mem:          15857        7576        6362          27        1917        7940
Swap:          7167         922        6245

buff/cache grew from 189 to 1917

Reading a file with cat, cp or dd puts it into the cache.
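
The test above can be scripted roughly as follows. It reads the Cached field of /proc/meminfo before and after reading a file and prints the difference in kB. The function names are hypothetical, and the result is only indicative because other activity on the machine also changes the cache.

```shell
# cached_kb [MEMINFO]
# Print the Cached value (kB); MEMINFO defaults to /proc/meminfo.
cached_kb() {
    awk '$1 == "Cached:" { print $2 }' "${1:-/proc/meminfo}"
}

# cache_growth_kb FILE [MEMINFO]
# Read FILE once and print how much the Cached figure changed.
cache_growth_kb() {
    local file=$1
    local meminfo=${2:-/proc/meminfo}
    local before after
    before=$(cached_kb "$meminfo")
    cat "$file" > /dev/null
    after=$(cached_kb "$meminfo")
    echo $(( after - before ))
}
```

Usage: drop the page cache first (echo 1 > /proc/sys/vm/drop_caches), then run cache_growth_kb vda.qcow2.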

Bypass Copy Using Cache

Link