Introduction
We will cover common utilities for process monitoring, such as ps, top, and sar. Most of these commands make heavy use of the pseudo-filesystems mounted at /proc and /sys.
Objectives
- Recognize and use available system monitoring tools
- Use /proc and /sys pseudo-filesystems
- Use sar to gather system activity and performance data
- Create reports
Available monitoring tools
Utility | Purpose | Package |
---|---|---|
top | Display process activity, dynamically updated | procps |
uptime | How long the system is running and the average load | procps |
ps | Detailed information about processes | procps |
pstree | A tree of processes and their connections | procps |
mpstat | Multiple processor usage | sysstat |
iostat | CPU utilization and I/O statistics | sysstat |
sar | Display and collect information about system activity | sysstat |
numastat | Information about NUMA (Non Uniform Memory Architecture) | numactl |
strace | Information about all system calls a process makes | strace |
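Several of these utilities are thin front ends over /proc. For example, the load averages printed by uptime come straight from /proc/loadavg; here is a minimal sketch (assuming a Linux system with /proc mounted):

```shell
#!/bin/sh
# The first three fields of /proc/loadavg are the 1-, 5-, and
# 15-minute load averages that uptime and top display.
read one five fifteen rest < /proc/loadavg
echo "load average: $one, $five, $fifteen"
```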
Memory Monitoring Utilities
Utility | Purpose | Package |
---|---|---|
free | Brief summary of memory usage | procps |
vmstat | Detailed virtual memory statistics and block I/O | procps |
pmap | Process memory map | procps |
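free is essentially a formatter for /proc/meminfo, so the raw numbers can also be pulled out directly. A sketch (note that the MemAvailable field only appears on kernels 3.14 and later):

```shell
#!/bin/sh
# Print the totals that free(1) summarizes, straight from the source.
awk '/^(MemTotal|MemFree|MemAvailable):/ { print $1, $2, $3 }' /proc/meminfo
```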
I/O Monitoring Utilities
Utility | Purpose | Package |
---|---|---|
iostat | CPU utilization and I/O statistics | sysstat |
sar | Display and collect information about system activity | sysstat |
vmstat | Detailed virtual memory statistics and block I/O, dynamically updated | procps |
Network Monitoring Utilities
Utility | Purpose | Package |
---|---|---|
netstat | Detailed networking statistics | net-tools |
iptraf | Gather information on network interfaces | iptraf |
tcpdump | Detailed analysis of network packets and traffic | tcpdump |
wireshark | Detailed network traffic analysis | wireshark |
/proc Basics
This directory originally contained information about processes; over time it grew to include a great deal of information about system properties, such as interrupts, memory, networking, etc.
A survey of /proc
Let's take a look at what resides in the /proc pseudo-filesystem
```
$ ls -F /proc
1/     1465/  1739/  1966/  2213/  2401/  2748/  615/  752/  836/  89/   devices      latency_stats  swaps
10/    1476/  1751/  1970/  2218/  2406/  2777/  621/  755/  837/  895/  diskstats    loadavg        sys/
1014/  1486/  1759/  2/     2222/  2409/  29/    630/  756/  838/  899/  dma          locks          sysrq-trigger
1027/  15/    1762/  20/    2229/  2411/  3/     641/  76/   84/   9/    driver/      mdstat         sysvipc/
1043/  1517/  177/   2013/  2231/  2414/  30/    643/  77/   846/  909/  execdomains  meminfo        thread-self@
1046/  1528/  1770/  2018/  2233/  2431/  3060/  644/  78/   847/  916/  fb           misc           timer_list
1050/  1534/  1775/  2078/  2248/  2470/  3503/  647/  79/   848/  917/  filesystems  modules        timer_stats
1072/  1536/  1780/  2083/  2249/  2475/  3511/  664/  797/  85/   92/   fs/          mounts@        tty/
1081/  1555/  1784/  21/    2261/  25/    3516/  687/  8/    850/  93/   interrupts   mtrr           uptime
11/    16/    1791/  2102/  2272/  2549/  3526/  688/  80/   854/  94/   iomem        net@           version
12/    1601/  1794/  2107/  2286/  2561/  354/   695/  807/  855/  acpi/      ioports    pagetypeinfo  vmallocinfo
1269/  1609/  1796/  2111/  2293/  2563/  461/   696/  809/  86/   asound/    irq/       partitions    vmstat
1293/  1611/  18/    2123/  23/    2588/  462/   7/    81/   861/  buddyinfo  kallsyms   sched_debug   zoneinfo
13/    1621/  1855/  2145/  2301/  26/    472/   727/  82/   862/  bus/       kcore      schedstat
1300/  1696/  19/    2155/  2308/  2631/  473/   728/  828/  864/  cgroups    keys       scsi/
1319/  1697/  1910/  2172/  2389/  2640/  492/   73/   83/   868/  cmdline    key-users  self@
1326/  17/    1919/  2182/  2395/  2643/  493/   74/   830/  869/  consoles   kmsg       slabinfo
135/   1732/  1921/  2187/  2399/  2644/  5/     75/   834/  87/   cpuinfo    kpagecount softirqs
14/    1737/  1930/  22/    24/    2693/  593/   750/  835/  879/  crypto     kpageflags stat
```
Let's take a look at a single process, PID 2111 for example:
```
$ ls -lF /proc/2111
total 0
dr-xr-xr-x. 2 abernal abernal 0 Jul 23 22:48 attr/
-rw-r--r--. 1 abernal abernal 0 Jul 23 22:48 autogroup
-r--------. 1 abernal abernal 0 Jul 23 22:48 auxv
-r--r--r--. 1 abernal abernal 0 Jul 23 22:48 cgroup
--w-------. 1 abernal abernal 0 Jul 23 22:48 clear_refs
-r--r--r--. 1 abernal abernal 0 Jul 23 17:26 cmdline
-rw-r--r--. 1 abernal abernal 0 Jul 23 22:48 comm
-rw-r--r--. 1 abernal abernal 0 Jul 23 22:48 coredump_filter
-r--r--r--. 1 abernal abernal 0 Jul 23 22:48 cpuset
lrwxrwxrwx. 1 abernal abernal 0 Jul 23 22:48 cwd -> /home/abernal/
-r--------. 1 abernal abernal 0 Jul 23 22:48 environ
lrwxrwxrwx. 1 abernal abernal 0 Jul 23 17:34 exe -> /usr/libexec/at-spi2-registryd*
dr-x------. 2 abernal abernal 0 Jul 23 22:48 fd/
dr-x------. 2 abernal abernal 0 Jul 23 22:48 fdinfo/
-rw-r--r--. 1 abernal abernal 0 Jul 23 22:48 gid_map
-r--------. 1 abernal abernal 0 Jul 23 22:48 io
-r--r--r--. 1 abernal abernal 0 Jul 23 22:48 latency
-r--r--r--. 1 abernal abernal 0 Jul 23 22:48 limits
-rw-r--r--. 1 abernal abernal 0 Jul 23 22:48 loginuid
dr-x------. 2 abernal abernal 0 Jul 23 22:48 map_files/
-r--r--r--. 1 abernal abernal 0 Jul 23 22:48 maps
-rw-------. 1 abernal abernal 0 Jul 23 22:48 mem
-r--r--r--. 1 abernal abernal 0 Jul 23 22:48 mountinfo
-r--r--r--. 1 abernal abernal 0 Jul 23 22:48 mounts
-r--------. 1 abernal abernal 0 Jul 23 22:48 mountstats
dr-xr-xr-x. 6 abernal abernal 0 Jul 23 22:48 net/
dr-x--x--x. 2 abernal abernal 0 Jul 23 22:48 ns/
-r--r--r--. 1 abernal abernal 0 Jul 23 22:48 numa_maps
-rw-r--r--. 1 abernal abernal 0 Jul 23 22:48 oom_adj
-r--r--r--. 1 abernal abernal 0 Jul 23 22:48 oom_score
-rw-r--r--. 1 abernal abernal 0 Jul 23 22:48 oom_score_adj
-r--------. 1 abernal abernal 0 Jul 23 22:48 pagemap
-r--------. 1 abernal abernal 0 Jul 23 22:48 personality
-rw-r--r--. 1 abernal abernal 0 Jul 23 22:48 projid_map
lrwxrwxrwx. 1 abernal abernal 0 Jul 23 22:48 root -> //
-rw-r--r--. 1 abernal abernal 0 Jul 23 22:48 sched
-r--r--r--. 1 abernal abernal 0 Jul 23 22:48 schedstat
-r--r--r--. 1 abernal abernal 0 Jul 23 22:48 sessionid
-rw-r--r--. 1 abernal abernal 0 Jul 23 22:48 setgroups
-r--r--r--. 1 abernal abernal 0 Jul 23 22:48 smaps
-r--------. 1 abernal abernal 0 Jul 23 22:48 stack
-r--r--r--. 1 abernal abernal 0 Jul 23 17:27 stat
-r--r--r--. 1 abernal abernal 0 Jul 23 22:48 statm
-r--r--r--. 1 abernal abernal 0 Jul 23 17:27 status
-r--------. 1 abernal abernal 0 Jul 23 22:48 syscall
dr-xr-xr-x. 5 abernal abernal 0 Jul 23 22:48 task/
-r--r--r--. 1 abernal abernal 0 Jul 23 22:48 timers
-rw-r--r--. 1 abernal abernal 0 Jul 23 22:48 uid_map
-r--r--r--. 1 abernal abernal 0 Jul 23 22:48 wchan
```
Let's see the contents of the status file:
```
$ cat /proc/2111/status
Name:   at-spi2-registr
State:  S (sleeping)
Tgid:   2111
Ngid:   0
Pid:    2111
PPid:   1
TracerPid:  0
Uid:    1000  1000  1000  1000
Gid:    1000  1000  1000  1000
FDSize: 64
Groups: 10 1000
NStgid: 2111
NSpid:  2111
NSpgid: 1966
NSsid:  1966
VmPeak:   290744 kB
VmSize:   225208 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      7340 kB
VmRSS:      7340 kB
VmData:   148024 kB
VmStk:       136 kB
VmExe:        84 kB
VmLib:     11108 kB
VmPTE:       180 kB
VmPMD:        12 kB
VmSwap:        0 kB
Threads:  3
SigQ:   0/11754
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 0000000180000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
Seccomp:  0
Cpus_allowed:   1
Cpus_allowed_list:  0
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:  0
voluntary_ctxt_switches:    12428
nonvoluntary_ctxt_switches: 67
```
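Individual fields of a status file are easy to script against. A sketch, using /proc/self (which always refers to the process doing the reading, here the awk process itself):

```shell
#!/bin/sh
# Extract the resident set size (VmRSS) of the current process.
awk '/^VmRSS:/ { print $2, $3 }' /proc/self/status
```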
Let's also check the interrupts
```
$ cat /proc/interrupts
            CPU0
  0:     137   IO-APIC   2-edge      timer
  1:    4739   IO-APIC   1-edge      i8042
  8:       0   IO-APIC   8-edge      rtc0
  9:       0   IO-APIC   9-fasteoi   acpi
 12:     155   IO-APIC  12-edge      i8042
 14:       0   IO-APIC  14-edge      ata_piix
 15:   19586   IO-APIC  15-edge      ata_piix
 19:  104062   IO-APIC  19-fasteoi   enp0s3
 21:   29455   IO-APIC  21-fasteoi   0000:00:0d.0, snd_intel8x0
 22:   16037   IO-APIC  22-fasteoi   ohci_hcd:usb1
NMI:       0   Non-maskable interrupts
LOC:  985933   Local timer interrupts
SPU:       0   Spurious interrupts
PMI:       0   Performance monitoring interrupts
IWI:       0   IRQ work interrupts
RTR:       0   APIC ICR read retries
RES:       0   Rescheduling interrupts
CAL:       0   Function call interrupts
TLB:       0   TLB shootdowns
TRM:       0   Thermal event interrupts
THR:       0   Threshold APIC interrupts
DFR:       0   Deferred Error APIC interrupts
MCE:       0   Machine check exceptions
MCP:      67   Machine check polls
HYP:       0   Hypervisor callback interrupts
ERR:       0
MIS:       0
PIN:       0   Posted-interrupt notification event
PIW:       0   Posted-interrupt wakeup event
```
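The per-CPU columns of /proc/interrupts can be totaled per IRQ with a short script. A sketch (the numbered rows are device IRQs; the named rows at the bottom are architecture-specific counters, which this skips):

```shell
#!/bin/sh
# Sum the per-CPU counts for each numbered IRQ line.
awk '$1 ~ /^[0-9]+:$/ {
    total = 0
    for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
        total += $i
    printf "IRQ %s: %d interrupts\n", substr($1, 1, length($1) - 1), total
}' /proc/interrupts
```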
Most of the tunable parameters can be found under /proc/sys:
```
$ ls -F /proc/sys
abi/  crypto/  debug/  dev/  fs/  kernel/  net/  sunrpc/  vm/
```
- abi
- Contains files with application binary information, rarely used
- debug
- Debugging parameters, for now just some control of exception reporting
- dev
- Device parameters including subdirectories for cdrom, scsi, raid and parport
- fs
- Filesystem parameters, including quota, file handles used and maximums, inode and directory information, etc.
- kernel
- Kernel parameters
- net
- Network parameters, with subdirectories for ipv4, netfilter, etc
- vm
- Virtual memory parameters
Viewing and changing these parameters can be done with simple commands. For example, the maximum number of threads allowed on the system can be seen by looking at:
```
$ ls -l /proc/sys/kernel/threads-max
$ cat /proc/sys/kernel/threads-max
129498
```
We can modify the value and verify the change
```
$ sudo bash -c 'echo 100000 > /proc/sys/kernel/threads-max'
$ cat /proc/sys/kernel/threads-max
100000
```
The same can be accomplished with sysctl by
$ sudo sysctl kernel.threads-max=100000
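Changes made this way do not survive a reboot. To make them persistent, the setting can be placed in a sysctl configuration file read at boot; a sketch (the filename here is hypothetical):

```
# /etc/sysctl.d/90-threads.conf (hypothetical filename)
kernel.threads-max = 100000
```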
/sys Basics
The /sys pseudo-filesystem is an integral part of the Unified Device Model. Conceptually it is based on a device tree; one can walk through it and see the buses, devices, etc. It also now contains information which may or may not be strictly related to devices, such as kernel modules.
A survey of /sys
Support for the sysfs virtual filesystem is built into all modern kernels.
```
$ ls -F /sys
block/  bus/  class/  dev/  devices/  firmware/  fs/  hypervisor/  kernel/  module/  power/
```
Network devices can be examined with
$ ls -lF /sys/class/net
Looking at the first Ethernet card
$ ls -l /sys/class/net/eth0
In order to check the mtu parameter we can type
```
$ cat /sys/class/net/eth0/mtu
1500
```
The intention with sysfs is to have one text value per file, although this is not rigorously enforced.
Checking the device of the Ethernet card by
$ ls -l /sys/class/net/eth0/device
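Since every attribute under /sys is a small text file, iterating over devices amounts to a directory walk. A sketch listing the MTU and operational state of each network interface:

```shell
#!/bin/sh
# Each subdirectory of /sys/class/net is one network interface;
# mtu and operstate are one-value-per-file attributes.
for dev in /sys/class/net/*; do
    [ -e "$dev/mtu" ] || continue
    printf '%s mtu=%s state=%s\n' \
        "${dev##*/}" "$(cat "$dev/mtu")" "$(cat "$dev/operstate")"
done
```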
sar
sar, the System Activity Reporter, is an all-purpose tool for gathering system activity and performance data and for creating reports.
The backend tool for sar is sadc (system activity data collector), which accumulates the statistics. It stores the information in the /var/log/sa directory, with a daily frequency by default, which can be adjusted. Data collection can be started from the command line, and regular periodic collection is usually started as a cron job stored in /etc/cron.d/sysstat.
The sar tool then reads in this data and produces a report.
Syntax
$ sar [options] [interval] [count]
Sample
```
$ sar 3 3
Linux 4.2.3-300.fc23.x86_64 (localhost.localdomain)  07/23/2016  _x86_64_  (1 CPU)

11:47:04 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
11:47:07 PM     all      2.05      0.00      0.00      0.00      0.00     97.95
11:47:10 PM     all      2.04      0.00      0.34      0.00      0.00     97.62
11:47:13 PM     all      1.71      0.00      0.00      0.00      0.00     98.29
Average:        all      1.93      0.00      0.11      0.00      0.00     97.95
```
With no options, this command gives a report of CPU usage.
Sar Options
Option | Meaning |
---|---|
-A | Almost all information |
-b | I/O and transfer rate statistics (similar to iostat) |
-B | Paging statistics including page faults |
-d | Block device activity |
-n | Network statistics |
-P | Per CPU statistics (as in sar -P ALL 3) |
-q | Queue lengths (run queue, processes and threads) |
-r | Swap and memory utilization statistics |
-R | Memory statistics |
-u | CPU utilization |
-v | Statistics about inodes, files, and file handles |
-w | Context switching statistics |
-W | Swapping statistics, pages in and out per second |
-f | Extract information from specified file, created by the -o option |
-o | Save readings in the file specified, to be read in later with -f option |
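The -o and -f options pair naturally: record a sampling run to a binary file, then generate reports from it later, possibly with different option sets. A sketch (assuming the sysstat package is installed and /tmp is writable; the step is skipped if sar is absent):

```shell
#!/bin/sh
# Record three one-second samples to a binary file, then replay
# the recorded data as a report.
if command -v sar >/dev/null 2>&1; then
    sar -o /tmp/sar.data 1 3 >/dev/null   # collect silently
    sar -f /tmp/sar.data                  # report from the file
fi
```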
To get paging statistics:
```
$ sar -B 3 3
Linux 4.2.3-300.fc23.x86_64 (localhost.localdomain)  07/23/2016  _x86_64_  (1 CPU)

11:58:45 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
11:58:48 PM      0.00      4.14     26.21      0.00   1861.72      0.00      0.00      0.00      0.00
11:58:51 PM      0.00      0.00      7.51      0.00   1412.63      0.00      0.00      0.00      0.00
11:58:54 PM      0.00     13.56     40.00      0.00   1587.80      0.00      0.00      0.00      0.00
Average:         0.00      5.92     24.60      0.00   1619.82      0.00      0.00      0.00      0.00
```
And to get I/O transfer statistics:
```
$ sar -b 3 3
Linux 4.2.3-300.fc23.x86_64 (localhost.localdomain)  07/24/2016  _x86_64_  (1 CPU)

12:00:52 AM       tps      rtps      wtps   bread/s   bwrtn/s
12:00:55 AM      0.00      0.00      0.00      0.00      0.00
12:00:58 AM      0.00      0.00      0.00      0.00      0.00
12:01:01 AM      0.00      0.00      0.00      0.00      0.00
Average:         0.00      0.00      0.00      0.00      0.00
```
Lab 22.1: Using stress
stress is a C language program written by Amos Waterland at the University of Oklahoma, licensed under the GPL v2. It is designed to place a configurable amount of stress by generating various kinds of workloads on the system.
If you are lucky, you can install stress directly from your distribution's packaging system. Otherwise, you can obtain the source from http://people.seas.harvard.edu/~apw/stress, and then compile and install it by doing:
```
$ tar zxvf stress-1.0.4.tar.gz
$ cd stress-1.0.4
$ ./configure
$ make
$ sudo make install
```
There may exist pre-packaged downloadable binaries in the .deb and .rpm formats; see the home page for details and locations.
Once installed, you can do:
$ stress --help
for a quick list of options, or
$ info stress
for more detailed documentation.
As an example, the command:
$ stress -c 8 -i 4 -m 6 -t 20s
will:
- Fork off 8 CPU-intensive processes, each spinning on a sqrt() calculation.
- Fork off 4 I/O-intensive processes, each spinning on sync().
- Fork off 6 memory-intensive processes, each spinning on malloc(), allocating 256 MB by default. The size can be changed as in --vm-bytes 128M.
- Run the stress test for 20 seconds.
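To see the load arrive, it helps to sample system statistics while stress runs. A sketch (skipped if either tool is missing from the system):

```shell
#!/bin/sh
# Run a short CPU-only stress load in the background and take
# vmstat samples while it executes, then reap the background job.
if command -v stress >/dev/null 2>&1 && command -v vmstat >/dev/null 2>&1; then
    stress -c 2 -t 10s &
    vmstat 2 5
    wait
fi
```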
After installing stress, you may want to start up your system's graphical system monitor, which you can find on your application menu or run from the command line; it is probably gnome-system-monitor or ksysguard.
Now begin to put stress on the system. The exact numbers you use will depend on your system's resources, such as the number of CPUs and the amount of RAM.
For example, doing
$ stress -m 4 -t 20s
puts only a memory stressor on the system. Play with combinations of the switches and see how they impact each other. You may find the stress program useful to simulate various high load conditions.