Objectives
- Know how to use entries in /proc/sys/vm
- Decipher /proc/meminfo
- Use vmstat to display information about memory, paging, I/O, processor activity, and the memory consumption of processes
- Understand how the OOM killer decides when to take action and selects which processes should be terminated to free up memory
Memory Tuning considerations
Tuning the memory sub-system can be a complex process. First, note that memory usage and I/O throughput are intrinsically related, because in most cases most memory is used to cache the contents of files on disk.
Thus changing memory parameters can have a large effect on I/O performance, and changing I/O parameters can have an equally large converse effect on the virtual memory sub-system.
When tweaking parameters in /proc/sys/vm, the usual best practice is to adjust one thing at a time and look for effects. The primary (inter-related) tasks are:
- Controlling flushing parameters
  - How many pages are allowed to be dirty and how often they are flushed out to disk
- Controlling swap behavior
  - How many pages that reflect file contents are allowed to remain in memory, as opposed to those that need to be swapped out because they have no other backing store
- Controlling how much memory overcommission is allowed
  - Many programs never need the full amount of memory they request, particularly because of copy-on-write (COW) techniques
Memory tuning can often be subtle, and what works well for one system or load may be far from optimal in other circumstances.
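The three tuning tasks above map onto a handful of entries under /proc/sys/vm. The following is a minimal sketch for inspecting their current values before changing anything. It assumes a Linux system where these standard entries exist; the helper name `read_vm_knobs` and the grouping into tasks are illustrative only and do not come from any library.

```python
from pathlib import Path

# Standard /proc/sys/vm entries, grouped by the three tuning tasks above.
# The grouping itself is only an illustration.
KNOBS = {
    "flushing": ["dirty_background_ratio", "dirty_ratio",
                 "dirty_expire_centisecs", "dirty_writeback_centisecs"],
    "swap": ["swappiness"],
    "overcommit": ["overcommit_memory", "overcommit_ratio"],
}


def read_vm_knobs(root="/proc/sys/vm"):
    """Return {task: {entry: value}} for the entries that exist under root."""
    result = {}
    for task, names in KNOBS.items():
        values = {}
        for name in names:
            path = Path(root) / name
            if path.is_file():  # available entries vary with kernel version
                values[name] = path.read_text().strip()
        result[task] = values
    return result


if __name__ == "__main__":
    for task, values in read_vm_knobs().items():
        print(task, values)
```

Recording these values first makes it easier to follow the "one thing at a time" practice and to revert a change that turns out to hurt.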
Memory Monitoring Tools
Utility | Purpose | Package |
---|---|---|
free | Brief summary of memory usage | procps |
vmstat | Detailed virtual memory statistics and block I/O, dynamically updated | procps |
pmap | Process memory map | procps |
/proc/sys/vm
This directory contains many tunable knobs that control the virtual memory system. Exactly what appears in this directory depends somewhat on the kernel version. Almost all of the entries are writable (by root).
These values can be changed either by writing directly to the entry or by using the sysctl utility. Furthermore, values can be set at boot time by adding them to /etc/sysctl.conf.
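An entry under /proc/sys and a sysctl key are two names for the same setting: the key is the path below /proc/sys with slashes replaced by dots. The sketch below illustrates that mapping and formats a line for /etc/sysctl.conf. The function names are hypothetical, the value 10 is an arbitrary example rather than a recommendation, and the conversion is a simplification that is adequate for entries under /proc/sys/vm.

```python
def sysctl_key(proc_path):
    """Convert a /proc/sys path to the dotted key used by sysctl."""
    prefix = "/proc/sys/"
    if not proc_path.startswith(prefix):
        raise ValueError(f"not under {prefix}: {proc_path}")
    return proc_path[len(prefix):].replace("/", ".")


def sysctl_conf_line(proc_path, value):
    """Format a 'key = value' line suitable for /etc/sysctl.conf."""
    return f"{sysctl_key(proc_path)} = {value}"


print(sysctl_conf_line("/proc/sys/vm/swappiness", 10))
# vm.swappiness = 10
```

The same setting could be applied immediately, as root, with `sysctl -w vm.swappiness=10` or by writing the value to /proc/sys/vm/swappiness; the sysctl.conf line only makes it persist across reboots.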
vmstat
vmstat is a multipurpose tool that displays information about memory, paging, I/O, processor activity and processes. It has many options. The general form of the command is:

    $ vmstat [options] [delay] [count]

If delay is given (in seconds), the report is repeated at that interval count times. If count is not given, vmstat keeps reporting statistics until it is killed by a signal, such as the one sent by Ctrl-C.
Sample output:

    $ vmstat 2 4
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     0  0      0 1631696  30828 650552    0    0   241    44   92  251  2  1 97  0  0
     0  0      0 1631696  30828 650560    0    0     0     0   79  281  2  1 98  0  0
     0  0      0 1631448  30836 650560    0    0     0     8  291  696 12  1 87  0  0
     2  0      0 1527064  31156 703748    0    0 26206     6 1234 2894 69 13  5 12
Fields
Field | Subfield | Meaning |
---|---|---|
Processes | r | Number of runnable processes (running or waiting for run time) |
Processes | b | Number of processes in uninterruptible sleep |
memory | swpd | Virtual memory used (KB) |
memory | free | Free (idle) memory (KB) |
memory | buff | Buffer memory (KB) |
memory | cache | Cached memory (KB) |
swap | si | Memory swapped in from disk (KB/sec) |
swap | so | Memory swapped out to disk (KB/sec) |
I/O | bi | Blocks read from block devices (blocks/sec) |
I/O | bo | Blocks written to block devices (blocks/sec) |
system | in | Interrupts/second |
system | cs | Context switches/second |
CPU | us | CPU time running user code (percentage) |
CPU | sy | CPU time running kernel (system) code (percentage) |
CPU | id | CPU time idle (percentage) |
CPU | wa | Time waiting for I/O (percentage) |
CPU | st | Time "stolen" from virtual machine (percentage) |
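If the option -S m is given, memory statistics are reported in megabytes (units of 1000000 bytes) instead of KB; -S M uses units of 1048576 bytes. This option does not change the swap (si/so) or block (bi/bo) fields.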
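When vmstat output is captured for later analysis, the column names in the table above can be used as dictionary keys. The following is a minimal sketch of such a parser. It assumes the default layout, in which the column-name line starts with `r b`; the function name `parse_vmstat` is hypothetical. Rows whose field count does not match the header are skipped rather than guessed at.

```python
def parse_vmstat(text):
    """Parse default `vmstat` output into a list of {column: int} dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Find the column-name line: the first one that starts with "r b".
    for i, ln in enumerate(lines):
        if ln.split()[:2] == ["r", "b"]:
            header = ln.split()
            rows = lines[i + 1:]
            break
    else:
        raise ValueError("no vmstat header line found")
    samples = []
    for ln in rows:
        fields = ln.split()
        # Keep only complete, purely numeric rows.
        if len(fields) == len(header) and all(f.isdigit() for f in fields):
            samples.append(dict(zip(header, map(int, fields))))
    return samples


SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 1631696  30828 650552    0    0   241    44   92  251  2  1 97  0  0
 0  0      0 1631448  30836 650560    0    0     0     8  291  696 12  1 87  0  0
"""

for s in parse_vmstat(SAMPLE):
    print(f"free={s['free']} KB  bi={s['bi']}  bo={s['bo']}  idle={s['id']}%")
```

Because the keys come from the header line itself, the same function should also cope with variants such as `vmstat -a`, where `inact` and `active` replace `buff` and `cache`.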
With the -a option, vmstat displays information about active and inactive memory:

    $ vmstat -a 2 4
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
     5  0      0 1340280 392304 1152588    0    0   215    41  126  430  3  1 96  0  0
     0  0      0 1340156 392304 1152600    0    0     0     0   56  171  1  1 98  1  0
     0  0      0 1340156 392304 1152560    0    0     0     0   52  156  2  0 98  0  0
     0  0      0 1340156 392304 1152564    0    0     0    22   52  164  1  1 98  0  0
Active memory pages are those that have been used recently; they may be clean (disk contents are up to date) or dirty (need to be flushed to disk eventually). By contrast, inactive memory pages have not been used recently, are more likely to be clean, and are released sooner under memory pressure.
Pages can move back and forth between the active and inactive lists as they are newly referenced or go a long time between uses.
To get a table of memory statistics and certain event counters, use the -s option:

    $ vmstat -s
          3030444 K total memory
           967148 K used memory
          1200696 K active memory
           392660 K inactive memory
          1291556 K free memory
            32304 K buffer memory
           739436 K swap cache
                0 K total swap
                0 K used swap
                0 K free swap
            12365 non-nice user cpu ticks
              107 nice user cpu ticks
             2108 system cpu ticks
           341316 idle cpu ticks
              348 IO-wait cpu ticks
                0 IRQ cpu ticks
              251 softirq cpu ticks
                0 stolen cpu ticks
           695214 pages paged in
           133713 pages paged out
                0 pages swapped in
                0 pages swapped out
           447805 interrupts
          1575344 CPU context switches
       1470695353 boot time
             2906 forks
To get a table of disk statistics, use the -d option:

    $ vmstat -d
    disk- ------------reads------------ ------------writes----------- -----IO------
           total merged sectors      ms  total merged sectors      ms    cur    sec
    sda    19442   1192 1372984   30380   4219   4822  267642   27717      0     19
    sdb      313      0    7968     254      1      0       2       0      0      0
    sdc      193      0    3092     135      3      0       4       6      0      0
    sdd      134      0    1934      86      3      0       4       6      0      0
    sr0       20      0     128      18      0      0       0       0      0      0
    dm-0   20303      0 1358378   42552   8825      0  267624   47023      0     19
    dm-1      94      0    1264     134      0      0       0       0      0      0
    loop0     54      2    2098     208      6      2      16      17      0      0
    dm-2      61      0    2212      44      1      0       2       0      0      0
    md0       65      0    1330       0      1      0       2       0      0      0
    loop1     63      0    2225      42      0      0       0       0      0      0
    dm-3      43      0    2072      39      0      0       0       0      0      0
Disk Fields
Field | Subfield | Meaning |
---|---|---|
reads | total | Total reads completed successfully |
reads | merged | Grouped reads (resulting in one I/O) |
reads | sectors | Sectors read successfully |
reads | ms | Milliseconds spent reading |
writes | total | Total writes completed successfully |
writes | merged | Grouped writes (resulting in one I/O) |
writes | sectors | Sectors written successfully |
writes | ms | Milliseconds spent writing |
I/O | cur | I/O in progress |
I/O | sec | Seconds spent for I/O |
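The total and ms columns can be combined into a rough average service time per request. The sketch below does this for a single row of `vmstat -d` output. It assumes the ten-column layout described in the table above; the function name `mean_io_ms` is hypothetical. The counters are cumulative since boot, so the result is a long-run average and not a current latency.

```python
def mean_io_ms(line):
    """Return (device, ms per read, ms per write) for one `vmstat -d` row."""
    fields = line.split()
    dev = fields[0]
    # Column order: reads total/merged/sectors/ms, writes total/merged/sectors/ms,
    # then I/O cur and sec.
    (r_total, _r_merged, _r_sectors, r_ms,
     w_total, _w_merged, _w_sectors, w_ms, _cur, _sec) = map(int, fields[1:11])
    per_read = r_ms / r_total if r_total else 0.0
    per_write = w_ms / w_total if w_total else 0.0
    return dev, per_read, per_write


dev, rd, wr = mean_io_ms("sda 19442 1192 1372984 30380 4219 4822 267642 27717 0 19")
print(f"{dev}: {rd:.2f} ms/read, {wr:.2f} ms/write")
# sda: 1.56 ms/read, 6.57 ms/write
```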
For quick statistics on a specific partition, use the -p option:

    $ vmstat -p /dev/sda1 2 4
    sda1          reads   read sectors  writes    requested writes
                    119           8510       3                  18
                    119           8510       3                  18
                    119           8510       3                  18
                    119           8510       3                  18
/proc/meminfo

    $ cat /proc/meminfo
    MemTotal:        3030444 kB
    MemFree:         1326732 kB
    MemAvailable:    2037868 kB
    Buffers:           32940 kB
    Cached:           666512 kB
    SwapCached:            0 kB
    Active:          1164908 kB
    Inactive:         393316 kB
    Active(anon):     859588 kB
    Inactive(anon):     1620 kB
    Active(file):     305320 kB
    Inactive(file):   391696 kB
    Unevictable:           0 kB
    Mlocked:               0 kB
    SwapTotal:             0 kB
    SwapFree:              0 kB
    Dirty:                 0 kB
    Writeback:             0 kB
    AnonPages:        858768 kB
    Mapped:           263828 kB
    Shmem:              2440 kB
    Slab:              78924 kB
    SReclaimable:      39896 kB
    SUnreclaim:        39028 kB
    KernelStack:        7856 kB
    PageTables:        38552 kB
    NFS_Unstable:          0 kB
    Bounce:                0 kB
    WritebackTmp:          0 kB
    CommitLimit:     1515220 kB
    Committed_AS:    4240404 kB
    VmallocTotal:   34359738367 kB
    VmallocUsed:       10140 kB
    VmallocChunk:   34359720316 kB
    HardwareCorrupted:     0 kB
    AnonHugePages:         0 kB
    HugePages_Total:       0
    HugePages_Free:        0
    HugePages_Rsvd:        0
    HugePages_Surp:        0
    Hugepagesize:       2048 kB
    DirectMap4k:       98240 kB
    DirectMap2M:     2996224 kB
Fields Description
Entry | Meaning |
---|---|
MemTotal | Total usable RAM (Physical minus some kernel reserved memory) |
MemFree | Free memory in both low and high zones |
Buffers | Memory used for temporary block I/O storage |
Cached | Page cache memory, mostly for file I/O |
SwapCached | Memory that was swapped back in but is still in the swap file |
Active | Recently used memory, not to be claimed first |
Inactive | Memory not recently used, more eligible for reclamation |
Active(anon) | Active memory for anonymous pages |
Inactive(anon) | Inactive memory for anonymous pages |
Active(file) | Active memory for file-backed pages |
Inactive(file) | Inactive memory for file-backed pages |
Unevictable | Pages which cannot be swapped out of memory or released |
Mlocked | Pages which are locked in memory |
SwapTotal | Total swap space available |
SwapFree | Swap space not being used |
Dirty | Memory which needs to be written back to disk |
Writeback | Memory actively being written back to disk |
AnonPages | Non-file-backed pages mapped into user-space page tables |
Mapped | Memory mapped pages, such as libraries |
Shmem | Pages used for shared memory |
Slab | Memory used in slabs |
SReclaimable | Cache memory in slabs that can be reclaimed |
SUnreclaim | Memory in slabs that can't be reclaimed |
KernelStack | Memory used in kernel stack |
PageTables | Memory being used by page table structures |
Bounce | Memory used for block device bounce buffers |
WritebackTmp | Memory used by FUSE filesystems for writeback buffers |
CommitLimit | Total memory available to be used, including overcommission |
Committed_AS | Total memory presently allocated, whether or not it is used |
VmallocTotal | Total memory available in kernel for vmalloc allocations |
VmallocUsed | Memory actually used by vmalloc allocations |
VmallocChunk | Largest possible contiguous vmalloc area |
HugePages_Total | Total size of the huge page pool |
HugePages_Free | Huge pages that are not yet allocated |
HugePages_Rsvd | Huge pages that have been reserved, but not yet used |
HugePages_Surp | Huge pages that are surplus, used for overcommission |
Hugepagesize | Size of a huge page |
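Several of the entries described above are sums of others, which offers a quick way to check one's reading of the file. The sketch below parses /proc/meminfo text into a dictionary and verifies two such relationships against values taken from the sample output. The function name `parse_meminfo` is hypothetical, and the equalities are observations about this particular sample, not guarantees for every kernel version.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {name: int}; sizes are in kB."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            # The first token after the colon is the number; "kB" may follow.
            info[name.strip()] = int(rest.split()[0])
    return info


SAMPLE = """\
MemTotal:        3030444 kB
MemFree:         1326732 kB
Active:          1164908 kB
Active(anon):     859588 kB
Active(file):     305320 kB
Slab:              78924 kB
SReclaimable:      39896 kB
SUnreclaim:        39028 kB
HugePages_Total:       0
"""

m = parse_meminfo(SAMPLE)
# Consistency checks that hold for the sample shown earlier.
print(m["Active"] == m["Active(anon)"] + m["Active(file)"])   # True
print(m["Slab"] == m["SReclaimable"] + m["SUnreclaim"])       # True
print(f"free: {100 * m['MemFree'] / m['MemTotal']:.1f}% of RAM")
```

On a live system the same function can be fed the contents of the file, for example `parse_meminfo(open("/proc/meminfo").read())`.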
OOM Killer
The OOM killer is a mechanism that decides which processes should be terminated to free up memory. It is activated when available memory is exhausted.
To determine which process to kill, the kernel computes a value called badness for each process, which can be read from /proc/[pid]/oom_score.
Two entries in the same directory can be used to raise or lower the likelihood of termination. The value of oom_adj is the number of bits by which the points should be adjusted. Normal users can only increase badness; a decrease (a negative value in oom_adj) can only be set by the superuser. oom_adj is now deprecated, and oom_score_adj, which accepts values from -1000 to 1000, is used instead.
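To see which processes the OOM killer would currently favor, the badness values can be collected and sorted. The following is a minimal sketch, assuming a Linux /proc layout in which each process directory contains `oom_score` and `comm`; the function name `oom_ranking` and its parameters are hypothetical.

```python
from pathlib import Path


def oom_ranking(proc_root="/proc", top=5):
    """Return up to `top` (score, pid, name) tuples, highest badness first."""
    ranking = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():       # only numeric directories are PIDs
            continue
        try:
            score = int((entry / "oom_score").read_text())
            name = (entry / "comm").read_text().strip()
        except (OSError, ValueError):      # process exited or access denied
            continue
        ranking.append((score, int(entry.name), name))
    # Highest score first; ties broken by lower PID.
    ranking.sort(key=lambda t: (-t[0], t[1]))
    return ranking[:top]


if __name__ == "__main__":
    if Path("/proc/self/oom_score").exists():
        for score, pid, name in oom_ranking():
            print(f"{score:6d}  {pid:7d}  {name}")
```

A process near the top of this list can be protected by having root write a negative value to its oom_score_adj entry, as described above.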
Linux also permits memory overcommitment: the kernel can grant processes more memory than is actually available, on the expectation that not all of it will be used.
The behavior of this mechanism is controlled through /proc/sys/vm/overcommit_memory, which accepts the following values:
- 0
  - Permit overcommission, but refuse obvious overcommits, and give root users somewhat more memory allocation than normal users.
- 1
  - All memory requests are allowed to overcommit.
- 2
  - Turn off overcommission. Memory requests will fail when the total memory commit reaches the size of the swap space plus a configurable percentage (50 by default) of RAM. This percentage can be modified by changing /proc/sys/vm/overcommit_ratio.
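The limit used in mode 2 is reported as CommitLimit in /proc/meminfo and can be reproduced from MemTotal, SwapTotal and overcommit_ratio. The sketch below works in whole pages, which is how the kernel is understood to do the arithmetic. It assumes a 4 kB page size and that the alternative vm.overcommit_kbytes setting is not in use; the function name and parameters are hypothetical.

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50,
                    hugetlb_kb=0, page_kb=4):
    """Estimate CommitLimit (kB): swap plus a percentage of RAM, in whole pages."""
    # Memory reserved for huge pages is excluded from the RAM portion.
    ram_pages = (mem_total_kb - hugetlb_kb) // page_kb
    swap_pages = swap_total_kb // page_kb
    allowed_pages = ram_pages * overcommit_ratio // 100 + swap_pages
    return allowed_pages * page_kb


# Values from the /proc/meminfo sample above: MemTotal 3030444, SwapTotal 0.
print(commit_limit_kb(3030444, 0))  # 1515220, matching CommitLimit in the sample
```

In that sample Committed_AS (4240404 kB) is well above CommitLimit, which is possible because the limit is only enforced when overcommit_memory is set to 2.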