Chapter 23. Process Monitoring

Introduction

Keeping track of processes is an important system administration task, and the ps and top utilities are the main tools for it.

Objectives

  • Use ps to check process characteristics and statistics
  • Identify the ps output fields and customize the output
  • Use pstree to get a visual description of process ancestry and multithreaded applications
  • Use top to view system loads interactively

Monitoring Tools

Utility   Purpose
top       Process activity, dynamically updated
uptime    How long the system has been running and the average load
ps        Detailed information about processes
pstree    A tree of processes and their connections
mpstat    Multiple processor usage
iostat    CPU utilization and I/O statistics
sar       Display and collect information about system activity
numastat  Information about NUMA (Non-Uniform Memory Access)
strace    Information about all system calls a process makes

Viewing Process States with ps

This command is a workhorse for displaying characteristics and statistics associated with processes, all of which are garnered from the /proc directory associated with the process.

The ps options come in three different flavors:

  • UNIX options, which must be preceded by -, and which may be grouped
  • BSD options, which must not be preceded by -, and which may be grouped
  • GNU long options, each of which must be preceded by --
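As a rough illustration of the three flavors, the sketch below runs one command in each style. It assumes the procps-ng ps found on most Linux distributions; the variable names are only there to hold the output.

```shell
# UNIX options: dash-prefixed and grouped (-e selects all processes, -f is full format).
unix_out=$(ps -ef)

# BSD options: no dash, grouped (a and x together select all processes).
bsd_out=$(ps ax)

# GNU long option: --pid selects by process ID, here the current shell ($$).
gnu_out=$(ps --pid "$$")

printf '%s\n' "$gnu_out"
```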

BSD Option Format for ps

A typical usage with this format is

$ ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  2.2  0.2  46624  8472 ?        Ss   12:14   0:01 /usr/lib/systemd/systemd --switched-root --system --deseriali
root         2  0.0  0.0      0     0 ?        S    12:14   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    12:14   0:00 [ksoftirqd/0]
root         4  0.0  0.0      0     0 ?        S    12:14   0:00 [kworker/0:0]
root         5  0.0  0.0      0     0 ?        S<   12:14   0:00 [kworker/0:0H]
root         6  0.0  0.0      0     0 ?        S    12:14   0:00 [kworker/u2:0]
root         7  0.4  0.0      0     0 ?        S    12:14   0:00 [rcu_sched]
root         8  0.0  0.0      0     0 ?        S    12:14   0:00 [rcu_bh]
root         9  0.2  0.0      0     0 ?        S    12:14   0:00 [rcuos/0]
root        10  0.0  0.0      0     0 ?        S    12:14   0:00 [rcuob/0]
root        11  0.0  0.0      0     0 ?        S    12:14   0:00 [migration/0]
root        12  0.0  0.0      0     0 ?        S    12:14   0:00 [watchdog/0]
root        13  0.0  0.0      0     0 ?        S<   12:14   0:00 [khelper]
root        14  0.0  0.0      0     0 ?        S    12:14   0:00 [kdevtmpfs]
root        15  0.0  0.0      0     0 ?        S<   12:14   0:00 [netns]
root        16  0.0  0.0      0     0 ?        S<   12:14   0:00 [perf]
root        17  0.0  0.0      0     0 ?        S<   12:14   0:00 [writeback]
root        18  0.0  0.0      0     0 ?        SN   12:14   0:00 [ksmd]
root        19  0.0  0.0      0     0 ?        SN   12:14   0:00 [khugepaged]

This command displays all processes on the system in a user-oriented format

ps output fields

  • VSZ
    • Is the process virtual memory size in KB
  • RSS
    • Is the resident set size, the non-swapped physical memory a task is using in KB
  • STAT
    • Describes the state of the process
      • S
        • Sleeping
      • R
        • Running
      • <
        • High priority
      • N
        • Low priority
      • L
        • Has pages locked in memory
      • s
        • Session leader
      • l
        • Multithreaded
      • +
        • In the foreground process group
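A STAT value such as S<sl combines several of these characters. The following sketch decodes a STAT string one character at a time, using only the meanings listed above. The function name describe_stat is hypothetical and not part of ps.

```shell
# describe_stat is a hypothetical helper: it prints one line per STAT character.
describe_stat() {
    stat=$1
    while [ -n "$stat" ]; do
        rest=${stat#?}          # everything after the first character
        char=${stat%"$rest"}    # the first character itself
        case $char in
            S)   echo "sleeping" ;;
            R)   echo "running" ;;
            '<') echo "high priority" ;;
            N)   echo "low priority" ;;
            L)   echo "pages locked in memory" ;;
            s)   echo "session leader" ;;
            l)   echo "multithreaded" ;;
            '+') echo "foreground process group" ;;
            *)   echo "other ($char)" ;;   # states not covered in the list above
        esac
        stat=$rest
    done
}

describe_stat 'S<sl'
```

For the auditd entry shown in the ps auxf listing (STAT S<sl), this prints sleeping, high priority, session leader and multithreaded, one per line.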

Adding the f option will show how processes connect by ancestry

$ ps auxf
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         2  0.0  0.0      0     0 ?        S    12:14   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    12:14   0:00  \_ [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [kworker/0:0H]
root         7  0.1  0.0      0     0 ?        S    12:14   0:01  \_ [rcu_sched]
root         8  0.0  0.0      0     0 ?        S    12:14   0:00  \_ [rcu_bh]
root         9  0.0  0.0      0     0 ?        S    12:14   0:00  \_ [rcuos/0]
root        10  0.0  0.0      0     0 ?        S    12:14   0:00  \_ [rcuob/0]
root        11  0.0  0.0      0     0 ?        S    12:14   0:00  \_ [migration/0]
root        12  0.0  0.0      0     0 ?        S    12:14   0:00  \_ [watchdog/0]
root        13  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [khelper]
root        14  0.0  0.0      0     0 ?        S    12:14   0:00  \_ [kdevtmpfs]
root        15  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [netns]
root        16  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [perf]
root        17  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [writeback]
root        18  0.0  0.0      0     0 ?        SN   12:14   0:00  \_ [ksmd]
root        19  0.0  0.0      0     0 ?        SN   12:14   0:00  \_ [khugepaged]
root        20  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [crypto]
root        21  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [kintegrityd]
root        22  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [bioset]
root        23  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [kblockd]
root        24  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [ata_sff]
root        25  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [md]
root        26  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [devfreq_wq]
root        29  0.0  0.0      0     0 ?        S    12:14   0:00  \_ [kswapd0]
root        30  0.0  0.0      0     0 ?        S    12:14   0:00  \_ [fsnotify_mark]
root        73  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [kthrotld]
root        74  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [acpi_thermal_pm]
root        75  0.0  0.0      0     0 ?        S    12:14   0:00  \_ [scsi_eh_0]
root        76  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [scsi_tmf_0]
root        77  0.0  0.0      0     0 ?        S    12:14   0:00  \_ [scsi_eh_1]
root        78  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [scsi_tmf_1]
root        79  0.0  0.0      0     0 ?        S    12:14   0:00  \_ [scsi_eh_2]
root        80  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [scsi_tmf_2]
root        81  0.0  0.0      0     0 ?        S    12:14   0:00  \_ [scsi_eh_3]
root        82  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [scsi_tmf_3]
root        83  0.0  0.0      0     0 ?        S    12:14   0:00  \_ [scsi_eh_4]
root        85  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [scsi_tmf_4]
root        86  0.0  0.0      0     0 ?        S    12:14   0:00  \_ [kworker/u2:3]
root        87  0.0  0.0      0     0 ?        S    12:14   0:00  \_ [scsi_eh_5]
root        89  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [scsi_tmf_5]
root        90  0.0  0.0      0     0 ?        S    12:14   0:00  \_ [kworker/u2:5]
root        92  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [kpsmoused]
root        93  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [dm_bufio_cache]
root        94  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [ipv6_addrconf]
root       135  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [deferwq]
root       178  0.0  0.0      0     0 ?        S    12:14   0:00  \_ [kauditd]
root       354  0.0  0.0      0     0 ?        S<   12:14   0:00  \_ [kworker/0:1H]
root       461  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [kdmflush]
root       462  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [bioset]
root       472  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [kdmflush]
root       473  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [bioset]
root       492  0.0  0.0      0     0 ?        S    12:15   0:00  \_ [jbd2/dm-0-8]
root       493  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [ext4-rsv-conver]
root       612  0.0  0.0      0     0 ?        S    12:15   0:00  \_ [kworker/0:5]
root       619  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [rpciod]
root       642  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [kloopd0]
root       643  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [kworker/u3:1]
root       644  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [kworker/u3:2]
root       645  0.0  0.0      0     0 ?        S    12:15   0:00  \_ [jbd2/loop0-8]
root       646  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [ext4-rsv-conver]
root       686  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [kdmflush]
root       687  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [bioset]
root       719  0.0  0.0      0     0 ?        S    12:15   0:00  \_ [jbd2/sda1-8]
root       720  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [ext4-rsv-conver]
root       725  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [bioset]
root       726  0.0  0.0      0     0 ?        S    12:15   0:00  \_ [md0_raid1]
root       750  0.0  0.0      0     0 ?        S    12:15   0:00  \_ [jbd2/md0-8]
root       753  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [ext4-rsv-conver]
root       757  0.0  0.0      0     0 ?        S    12:15   0:00  \_ [jbd2/dm-2-8]
root       758  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [ext4-rsv-conver]
root       829  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [kloopd1]
root       831  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [kdmflush]
root       836  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [bioset]
root       838  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [kcryptd_io]
root       839  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [kcryptd]
root       840  0.0  0.0      0     0 ?        S    12:15   0:00  \_ [dmcrypt_write]
root       842  0.0  0.0      0     0 ?        S<   12:15   0:00  \_ [bioset]
root      2754  0.0  0.0      0     0 ?        S    12:21   0:00  \_ [kworker/0:0]
root      2779  0.0  0.0      0     0 ?        S    12:26   0:00  \_ [kworker/0:1]
root      2784  0.0  0.0      0     0 ?        S    12:30   0:00  \_ [kworker/0:2]
root         1  0.1  0.2 128552  8500 ?        Ss   12:14   0:01 /usr/lib/systemd/systemd --switched-root --system --deseriali
root       593  0.1  0.7  54016 23472 ?        Ss   12:15   0:01 /usr/lib/systemd/systemd-journald
root       629  0.0  0.2  46900  7080 ?        Ss   12:15   0:00 /usr/lib/systemd/systemd-udevd
root       631  0.0  0.1 271076  5948 ?        Ss   12:15   0:00 /usr/sbin/lvmetad -f
root       800  0.0  0.1  49056  3088 ?        S<sl 12:15   0:00 /sbin/auditd -n
root       811  0.0  0.0  80232  1648 ?        S<sl 12:15   0:00  \_ /sbin/audispd
root       813  0.0  0.1  52200  3308 ?        S<   12:15   0:00      \_ /usr/sbin/sedispatch

UNIX Option Format for ps

A typical usage with the UNIX format is

$ ps -elf
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
4 S root         1     0  0  80   0 - 32138 ep_pol 12:14 ?        00:00:01 /usr/lib/systemd/systemd --switched-root --system -
1 S root         2     0  0  80   0 -     0 kthrea 12:14 ?        00:00:00 [kthreadd]
1 S root         3     2  0  80   0 -     0 smpboo 12:14 ?        00:00:00 [ksoftirqd/0]
1 S root         5     2  0  60 -20 -     0 worker 12:14 ?        00:00:00 [kworker/0:0H]
1 S root         7     2  0  80   0 -     0 rcu_gp 12:14 ?        00:00:01 [rcu_sched]
1 S root         8     2  0  80   0 -     0 rcu_gp 12:14 ?        00:00:00 [rcu_bh]
1 S root         9     2  0  80   0 -     0 rcu_no 12:14 ?        00:00:00 [rcuos/0]
1 S root        10     2  0  80   0 -     0 rcu_no 12:14 ?        00:00:00 [rcuob/0]
1 S root        11     2  0 -40   - -     0 smpboo 12:14 ?        00:00:00 [migration/0]
5 S root        12     2  0 -40   - -     0 smpboo 12:14 ?        00:00:00 [watchdog/0]
1 S root        13     2  0  60 -20 -     0 rescue 12:14 ?        00:00:00 [khelper]

Note that this output includes the parent process ID (PPID) and the niceness (NI). Other common selection options in the UNIX format are

  • -A or -e
    • Select all processes
  • -N
    • Negate selection
  • -C
    • Select by command name
  • -G
    • Select by real group ID
  • -U
    • Select by real user ID

Customizing the ps Output

The -o option prints a customized list of ps fields, such as

  • pid : process id number
  • uid : user id number of process owner
  • cmd : command with all arguments
  • cputime : cumulative CPU time
  • pmem : ratio of the process's resident set size to the physical memory on the machine, expressed as a percentage.

Sample

$ ps -eo pid,ppid,command
  PID  PPID COMMAND
    1     0 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
    2     0 [kthreadd]
    3     2 [ksoftirqd/0]
    5     2 [kworker/0:0H]
    7     2 [rcu_sched]
    8     2 [rcu_bh]
    9     2 [rcuos/0]
   10     2 [rcuob/0]
   11     2 [migration/0]
   12     2 [watchdog/0]
   13     2 [khelper]
   14     2 [kdevtmpfs]
   15     2 [netns]
   16     2 [perf]
   17     2 [writeback]
   18     2 [ksmd]
   19     2 [khugepaged]
   20     2 [crypto]
   21     2 [kintegrityd]
   22     2 [bioset]
   23     2 [kblockd]
   24     2 [ata_sff]
   25     2 [md]
   26     2 [devfreq_wq]
   29     2 [kswapd0]
   30     2 [fsnotify_mark]
   73     2 [kthrotld]
   74     2 [acpi_thermal_pm]
   75     2 [scsi_eh_0]
   76     2 [scsi_tmf_0]
   77     2 [scsi_eh_1]
   78     2 [scsi_tmf_1]
   79     2 [scsi_eh_2]
   80     2 [scsi_tmf_2]
   81     2 [scsi_eh_3]
   82     2 [scsi_tmf_3]
   83     2 [scsi_eh_4]
   85     2 [scsi_tmf_4]
   86     2 [kworker/u2:3]
   87     2 [scsi_eh_5]
   89     2 [scsi_tmf_5]
   90     2 [kworker/u2:5]
   92     2 [kpsmoused]
   93     2 [dm_bufio_cache]
   94     2 [ipv6_addrconf]
  135     2 [deferwq]
  178     2 [kauditd]
  354     2 [kworker/0:1H]
  461     2 [kdmflush]
  462     2 [bioset]
  472     2 [kdmflush]
  473     2 [bioset]
  492     2 [jbd2/dm-0-8]
  493     2 [ext4-rsv-conver]
  593     1 /usr/lib/systemd/systemd-journald
  619     2 [rpciod]
  629     1 /usr/lib/systemd/systemd-udevd
  631     1 /usr/sbin/lvmetad -f
  642     2 [kloopd0]
  643     2 [kworker/u3:1]
  644     2 [kworker/u3:2]
  645     2 [jbd2/loop0-8]
  646     2 [ext4-rsv-conver]
  686     2 [kdmflush]
  687     2 [bioset]
  719     2 [jbd2/sda1-8]
  720     2 [ext4-rsv-conver]
  725     2 [bioset]
  726     2 [md0_raid1]
  750     2 [jbd2/md0-8]
  753     2 [ext4-rsv-conver]
  757     2 [jbd2/dm-2-8]
  758     2 [ext4-rsv-conver]
  800     1 /sbin/auditd -n
  811   800 /sbin/audispd
  813   811 /usr/sbin/sedispatch
  829     2 [kloopd1]
  831     2 [kdmflush]
  836     2 [bioset]
  838     2 [kcryptd_io]
  839     2 [kcryptd]
  840     2 [dmcrypt_write]
  842     2 [bioset]
  851     1 /sbin/rngd -f
  852     1 /usr/libexec/accounts-daemon
  853     1 /usr/sbin/ModemManager
  854     1 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
  857     1 /usr/bin/python3 -Es /usr/sbin/firewalld --nofork --nopid
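The fields listed earlier can be combined in the same way. As a sketch, the command below prints the five processes using the most memory. The --sort long option is not covered in this chapter; it is assumed here to be available, as it is in the procps-ng ps.

```shell
# Show pid, uid, memory percentage, CPU time and command line for every process,
# sorted by descending memory use (--sort=-pmem), keeping the header plus five rows.
top_mem=$(ps -eo pid,uid,pmem,cputime,cmd --sort=-pmem | head -n 6)

printf '%s\n' "$top_mem"
```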

Using pstree

This command gives a visual display of the ancestral and multithreading relationships between processes

$ pstree -aAp 1267
gdm-session-wor,1267
  |-gdm-x-session,1320 /usr/bin/gnome-session --autostart /usr/share/gdm/greeter/autostart
  |   |-Xorg,1328 vt1 -displayfd 3 -auth /run/user/42/gdm/Xauthority -nolisten tcp -background none -noreset -keeptty-verbose
  |   |-dbus-daemon,1477 --print-address 4 --session
  |   |   `-{dbus-daemon},1480
  |   |-gnome-session-b,1485 --autostart /usr/share/gdm/greeter/autostart
  |   |   |-gnome-settings-,1567
  |   |   |   |-{dconf worker},1595
  |   |   |   |-{gdbus},1590
  |   |   |   |-{gmain},1588
  |   |   |   `-{pool},1589
  |   |   |-gnome-shell,1607 --mode=gdm
  |   |   |   |-ibus-daemon,1728 --xim --panel disable
  |   |   |   |   |-ibus-dconf,1733
  |   |   |   |   |   |-{dconf worker},1739
  |   |   |   |   |   |-{gdbus},1737
  |   |   |   |   |   `-{gmain},1736
  |   |   |   |   |-ibus-engine-sim,1854
  |   |   |   |   |   |-{gdbus},1859
  |   |   |   |   |   `-{gmain},1858
  |   |   |   |   |-{gdbus},1731
  |   |   |   |   `-{gmain},1730
  |   |   |   |-{JS GC Helper},1685
  |   |   |   |-{JS Sour~ Thread},1686
  |   |   |   |-{dconf worker},1649
  |   |   |   |-{gdbus},1633
  |   |   |   |-{gmain},1625
  |   |   |   `-{threaded-ml},1684
  |   |   |-{dconf worker},1560
  |   |   |-{gdbus},1510
  |   |   `-{gmain},1509
  |   |-{gdbus},1483
  |   `-{gmain},1327
  |-{gdbus},1273
  `-{gmain},1272

Process 1567 (gnome-settings-) has four threads in addition to its main thread. Each thread, including the main one, has an entry under the process's task directory:

$ ls -l /proc/1567/task/
total 0
dr-xr-xr-x. 7 gdm gdm 0 Jul 24 13:08 1567
dr-xr-xr-x. 7 gdm gdm 0 Jul 24 13:08 1588
dr-xr-xr-x. 7 gdm gdm 0 Jul 24 13:08 1589
dr-xr-xr-x. 7 gdm gdm 0 Jul 24 13:08 1590
dr-xr-xr-x. 7 gdm gdm 0 Jul 24 13:08 1595
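The same check can be scripted for any process. The sketch below counts the entries under /proc/PID/task and compares the result with the thread count ps reports through its nlwp field. It inspects the current shell, which is an arbitrary choice, and assumes procps-ng ps.

```shell
# Inspect the current shell; substitute any PID of interest.
pid=$$

# One directory per thread, including the main thread.
task_count=$(ls "/proc/$pid/task" | wc -l | tr -d ' ')

# nlwp is the number of threads (lightweight processes) according to ps.
nlwp=$(ps -o nlwp= -p "$pid" | tr -d ' ')

echo "PID $pid: $task_count task entries, ps reports $nlwp threads"
```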

Top

This tool gives us information about the system, updated every 3 seconds by default. It is very useful for getting an idea of which processes are consuming most of the resources.

$ top

Options

  • 1
    • Pressing 1 toggles the display of information about each CPU
  • i
    • Pressing i toggles the display of idle processes, so that only active ones are shown
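top can also be run non-interactively, which is handy in scripts and logs. This is a minimal sketch assuming the procps-ng top, whose -b option selects batch (plain text) mode and whose -n option limits the number of refreshes.

```shell
# -b: batch mode, -n 1: print a single refresh and exit.
# head keeps the summary area and the first few processes.
snapshot=$(top -b -n 1 | head -n 12)

printf '%s\n' "$snapshot"
```

The first line of the snapshot carries the same uptime and load average information that the uptime utility prints.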

Lab 23.1: Processes

1. Run ps with the options -ef. Then run it again with the options aux. Note the differences in the output.
2. Run ps so that only the process ID, priority, nice value, and the process command line are displayed.
3. Start a new bash session by typing bash at the command line. Start another bash session using the nice command but this time giving it a nice value of 10.
4. Run ps as in step 2 to note the differences in priority and nice values. Note the process ID of the two bash sessions.
5. Change the nice value of one of the bash sessions to 15 using renice. Once again, observe the change in priority and nice values.
6. Run top and watch the output as it changes. Hit q to stop the program.

Solution

1. Run ps with the options -ef. Then run it again with the options aux. Note the differences in the output.

$ ps -ef
$ ps aux

2. Run ps so that only the process ID, priority, nice value, and the process command line are displayed.

$ ps -o pid,pri,ni,cmd
PID PRI NI CMD
2389 19 0 bash
22079 19 0 ps -o pid,pri,ni,cmd
(Note: There should be no spaces between parameters.)

3. Start a new bash session by typing bash at the command line. Start another bash session using the nice command but this time giving it a nice value of 10.

$ bash
$ nice -n 10 bash

4. Run ps as in step 2 to note the differences in priority and nice values. Note the process ID of the two bash sessions.

$ ps -o pid,pri,ni,cmd
2389 19 0 bash
22115 19 0 bash
22171 9 10 bash
22227 9 10 ps -o pid,pri,ni,cmd

5. Change the nice value of one of the bash sessions to 15 using renice. Once again, observe the change in priority and nice values.

$ renice 15 -p 22171
$ ps -o pid,pri,ni,cmd
PID PRI NI CMD
2389 19 0 bash
22115 19 0 bash
22171 4 15 bash
22246 4 15 ps -o pid,pri,ni,cmd

6. Run top and watch the output as it changes. Hit q to stop the program.

$ top
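The nice and renice steps of this lab can also be checked from a script. The sketch below uses a background sleep instead of an interactive bash session, which is a substitution made only so the example can run unattended. It assumes the starting shell has a nice value of 0.

```shell
# Start a background process with its nice value raised by 10.
nice -n 10 sleep 30 &
job_pid=$!
sleep 1    # give nice time to adjust the value and start sleep

ni_before=$(ps -o ni= -p "$job_pid" | tr -d ' ')

# Raise the nice value to 15, using the same renice form as in the lab.
renice 15 -p "$job_pid" >/dev/null
ni_after=$(ps -o ni= -p "$job_pid" | tr -d ' ')

echo "nice value before renice: $ni_before, after: $ni_after"

# Clean up the background process.
kill "$job_pid"
wait "$job_pid" 2>/dev/null || true
```

An unprivileged user can only raise a nice value, so lowering it back would require root.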

Lab 23.2: Monitoring Process States

1. Use dd to start a background process which reads from /dev/urandom and writes to /dev/null.
2. Check the process state. What should it be?
3. Bring the process to the foreground using the fg command. Then hit Ctrl-Z. What does this do? Look at the process state again, what is it?
4. Run the jobs program. What does it tell you?
5. Bring the job back to the foreground, then terminate it using kill from another window.

Solution

1. Use dd to start a background process which reads from /dev/urandom and writes to /dev/null

$ dd if=/dev/urandom of=/dev/null &

2. Check the process state. What should it be?

$ ps -C dd -o pid,cmd,stat
25899 dd if=/dev/urandom of=/dev/ R
Should be S or R.

3. Bring the process to the foreground using the fg command. Then hit Ctrl-Z. What does this do? Look at the process state again, what is it?

$ fg
$ ^Z
$ ps -C dd -o pid,cmd,stat
PID CMD STAT
25899 dd if=/dev/urandom of=/dev/ T
State should be T.

4. Type the jobs command. What does it tell you?

$ jobs
[1]+ Stopped dd if=/dev/urandom of=/dev/null

5. Bring the job back to the foreground, then kill it using the kill command from another window.

$ fg
(In another window)
$ kill 25899
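The state changes in this lab can be reproduced without job control. In this sketch, kill -STOP stands in for Ctrl-Z: Ctrl-Z sends SIGTSTP, while SIGSTOP is the variant that cannot be caught, and both leave the process in state T. The variable names are illustrative.

```shell
# Start the same dd job as in the lab, discarding its status messages.
dd if=/dev/urandom of=/dev/null 2>/dev/null &
dd_pid=$!
sleep 1

running_state=$(ps -o stat= -p "$dd_pid" | tr -d ' ')

# Stop the process, as Ctrl-Z would do for a foreground job.
kill -STOP "$dd_pid"
sleep 1
stopped_state=$(ps -o stat= -p "$dd_pid" | tr -d ' ')

echo "state while running: $running_state, after stop: $stopped_state"

# Resume the process so it can receive the default SIGTERM, then terminate it.
kill -CONT "$dd_pid"
kill "$dd_pid"
wait "$dd_pid" 2>/dev/null || true
```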