Introduction
The relationship between I/O and system performance is complex. To tackle the challenge of monitoring I/O resource usage, we can use iostat, iotop, and ionice.
Objectives
- Use iostat to monitor system I/O device activity
- Use iotop to display a constantly updated table of current I/O usage
- Use ionice to set both the I/O scheduling class and the priority for a given process
iostat
iostat is the workhorse utility for monitoring I/O device activity on the system.
$ iostat
Linux 4.2.3-300.fc23.x86_64 (localhost.localdomain)   07/24/2016   _x86_64_   (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.52    0.01    0.59    0.07    0.00   95.81

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               4.22       104.74        18.39     621056     109057
sdb               0.05         0.67         0.00       3984          1
sdc               0.03         0.26         0.00       1546          2
sdd               0.02         0.16         0.00        967          2
scd0              0.00         0.01         0.00         64          0
dm-0              5.17       103.49        18.39     613625     109048
dm-1              0.02         0.13         0.00        760          0
loop0             0.01         0.18         0.00       1049          6
dm-2              0.01         0.19         0.00       1106          1
md0               0.01         0.11         0.00        665          1
loop1             0.01         0.19         0.00       1112          0
dm-3              0.01         0.17         0.00       1036          0
These data mean:

- tps: transactions per second
- blocks read and written per unit time, where blocks are generally sectors of 512 bytes
- total blocks read and written
Information is broken out by disk partition (and, if LVM is being used, also by dm (device mapper) logical partitions).
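By default, iostat prints a single report covering the interval since boot. You can also give it an interval (in seconds) and a count to watch activity as it happens; for example, to take five samples at two-second intervals (note that the first report still covers the period since boot):

$ iostat 2 5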
iostat Options
A different display is generated by the -k option, which shows results in KB instead of blocks:
$ iostat -k
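If KB is too fine-grained, most versions of iostat also accept the -m option to report in MB instead:

$ iostat -m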
To display information by device name, use the -N option:
$ iostat -N
Linux 4.2.3-300.fc23.x86_64 (localhost.localdomain)   07/24/2016   _x86_64_   (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.08    0.01    0.20    0.03    0.00   98.68

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               1.34        28.25         9.94     624720     219757
sdb               0.01         0.18         0.00       3984          1
sdc               0.01         0.07         0.00       1546          2
sdd               0.01         0.04         0.00        967          2
scd0              0.00         0.00         0.00         64          0
fedora-root       1.70        27.91         9.94     617289     219748
fedora-swap       0.00         0.03         0.00        760          0
loop0             0.00         0.05         0.00       1049          6
myvg-mylvm        0.00         0.05         0.00       1106          1
md0               0.00         0.03         0.00        665          1
loop1             0.00         0.05         0.00       1112          0
secret-disk1      0.00         0.05         0.00       1036          0
A much more detailed report can be obtained by using the -x option (for extended):
$ iostat -xk
Linux 4.2.3-300.fc23.x86_64 (localhost.localdomain)   07/24/2016   _x86_64_   (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.11    0.01    0.20    0.03    0.00   98.66

Device:  rrqm/s wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda        0.07   0.33   0.85   0.48   27.65    9.88    56.61     0.00   1.85    1.64    2.21   0.79   0.10
sdb        0.00   0.00   0.01   0.00    0.18    0.00    25.38     0.00   0.86    0.86    1.00   0.57   0.00
sdc        0.00   0.00   0.01   0.00    0.07    0.00    15.80     0.00   0.76    0.75    1.33   0.65   0.00
sdd        0.00   0.00   0.01   0.00    0.04    0.00    14.15     0.00   0.61    0.59    1.33   0.53   0.00
scd0       0.00   0.00   0.00   0.00    0.00    0.00     6.40     0.00   0.60    0.60    0.00   0.60   0.00
dm-0       0.00   0.00   0.91   0.78   27.32    9.88    44.13     0.00   2.01    2.18    1.81   0.61   0.10
dm-1       0.00   0.00   0.00   0.00    0.03    0.00    16.00     0.00   1.38    1.38    0.00   1.38   0.00
loop0      0.00   0.00   0.00   0.00    0.05    0.00    35.17     0.00   2.82    1.35   16.00   2.75   0.00
dm-2       0.00   0.00   0.00   0.00    0.05    0.00    35.71     0.00   0.87    0.87    1.00   0.50   0.00
md0        0.00   0.00   0.00   0.00    0.03    0.00    20.18     0.00   0.00    0.00    0.00   0.00   0.00
loop1      0.00   0.00   0.00   0.00    0.05    0.00    35.32     0.00   1.41    1.41    0.00   0.81   0.00
dm-3       0.00   0.00   0.00   0.00    0.05    0.00    48.19     0.00   1.88    1.88    0.00   1.07   0.00
Extended iostat fields
Field | Meaning |
---|---|
Device | Device or partition name |
rrqm/s | Number of read requests merged per second, queued to device |
wrqm/s | Number of write requests merged per second, queued to device |
r/s | Number of read requests per second, issued to the device |
w/s | Number of write requests per second, issued to the device |
rkB/s | KB read from the device per second |
wkB/s | KB written to the device per second |
avgrq-sz | Average request size, in 512-byte sectors |
avgqu-sz | Average queue length of requests issued to the device |
await | Average time (in msecs) between when an I/O request is issued and when it is completed: queue time plus service time |
svctm | Average service time (in msecs) for I/O requests |
%util | Percentage of CPU time during which the device serviced requests |
Note that if the utilization percentage approaches 100, the device is saturated.
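To watch for saturation as it develops, you can restrict the extended report to the device statistics alone and sample repeatedly; for example, every two seconds until interrupted:

$ iostat -xk -d sda 2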
iotop
Another useful utility is iotop, which must be run as root. It displays a table of current I/O usage and updates periodically like top.
$ sudo iotop
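A few iotop options are worth knowing: -o shows only processes or threads actually doing I/O, -b runs in non-interactive batch mode (useful for logging), and -n and -d control the number of iterations and the delay between them. For example, to take three snapshots five seconds apart, showing only active tasks:

$ sudo iotop -o -b -n 3 -d 5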
Within the output, the PRIO column will display values like:

- be: Best Effort
- rt: Real Time
Using ionice to set I/O priorities
The ionice utility lets you set both the I/O scheduling class and priority for a given process. It takes the form:
$ ionice [-c class] [-n priority] [-p pid] [COMMAND [ARGS]]
If a pid is given with the -p argument, the settings apply to the requested process; otherwise, they apply to the process that will be started by COMMAND with its possible arguments. If no arguments are given, ionice returns the scheduling class and priority of the current shell process, as in:
$ ionice
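You can likewise query any other process by giving its PID; for example, to see the scheduling class and priority of PID 1:

$ ionice -p 1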
The -c parameter specifies the I/O scheduling class, which can take the following three values:
I/O Scheduling Class | -c value | Meaning |
---|---|---|
Real time | 1 | Gets first access to the disk; can starve other processes. The priority defines how big a time slice each process gets. |
Best effort | 2 | The default. All programs are serviced in round-robin fashion, according to their priority settings. |
Idle | 3 | No access to disk I/O unless no other program has asked for it for a defined period. |
Both the best effort and real time classes take a priority, set with -n, which can range from 0 to 7, with 0 being the highest priority. An example:
$ ionice -c 2 -n 3 -p 30078
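You can also launch a command directly under a chosen class. As a sketch (the tar command and paths here are just hypothetical examples), running a bulk job in the idle class keeps it from competing with interactive work:

$ sudo ionice -c 3 tar czf /tmp/backup.tgz /home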
Note: ionice works only when using the CFQ I/O scheduler, discussed in the next section.
Lab 24.1: bonnie++
bonnie++ is a widely available benchmarking program that tests and measures the performance of drives and filesystems. It is descended from bonnie, an earlier implementation.
Results can be read from the terminal window or directed to a file, and can also be produced in CSV (comma-separated value) format. Companion programs, bon_csv2html and bon_csv2txt, can be used to convert to HTML and plain text output formats. We recommend you read the man page for bonnie++ before using it, as it has quite a few options regarding which tests to perform and how exhaustive and stressful they should be. A quick synopsis is obtained with:
$ bonnie++ -help
bonnie++: invalid option -- 'h'
usage:
bonnie++ [-d scratch-dir] [-c concurrency] [-s size(MiB)[:chunk-size(b)]]
         [-n number-to-stat[:max-size[:min-size][:num-directories[:chunk-size]]]]
         [-m machine-name] [-r ram-size-in-MiB]
         [-x number-of-tests] [-u uid-to-use:gid-to-use] [-g gid-to-use]
         [-q] [-f] [-b] [-p processes | -y] [-z seed | -Z random-file] [-D]

Version: 1.96
A quick test can be obtained with a command like:
$ time sudo bonnie++ -n 0 -u 0 -r 100 -f -b -d /mnt
where:
- -n 0 means don’t perform the file creation tests.
- -u 0 means run as root.
- -r 100 means pretend you have 100 MB of RAM.
- -f means skip per character I/O tests.
- -b means do an fsync after every write, which forces flushing to disk rather than just writing to cache.
- -d /mnt just specifies the directory to place the temporary file created; make sure it has enough space, in this case 300 MB, available.
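If you plan to compare runs from different machines, the -m option (shown in the usage synopsis above) lets you label the output with a machine name; the name below is just an example:

$ time sudo bonnie++ -n 0 -u 0 -r 100 -f -b -d /mnt -m testbox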
If you don’t supply a figure for your memory size, the program will figure out how much the system has and will create a testing file 2-3 times as large. We are not doing that here because it takes much longer; a smaller test is enough to get a feel for things.
On an RHEL 7 system:
$ time sudo bonnie++ -n 0 -u 0 -r 100 -f -b -d /mnt
Using uid:0, gid:0.
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
q7             300M           99769  14 106000  12           +++++ +++ 257.3   1
Latency                        226us     237us                418us     624ms
1.96,1.96,q7,1,1415992158,300M,,,,99769,14,106000,12,,,+++++,+++,257.3,1,,,,,,,,,,,,,,,,,,,226us,237us,,418us,624ms,,,,,,
On an Ubuntu 14.04 system, running as a virtual machine under a hypervisor on the same physical machine:
$ time sudo bonnie++ -n 0 -u 0 -r 100 -f -b -d /mnt
Using uid:0, gid:0.
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ubuntu         300M           70000  61 43274  31           470061  96  2554  91
Latency                        306ms     201ms               9276us     770ms
1.97,1.97,ubuntu,1,1415983257,300M,,,,70000,61,43274,31,,,470061,96,2554,91,,,,,,,,,,,,,,,,,,,306ms,201ms,,9276us,770ms,,,
You can clearly see the drop in performance.
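Each run ends with a machine-readable CSV record, which in the outputs above is the last line printed. Assuming your version behaves the same way, one simple way to capture just that record for later conversion is:

$ sudo bonnie++ -n 0 -u 0 -r 100 -f -b -d /mnt | tail -1 > bonnie++.out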
Assuming you have saved the previous outputs as a file called bonnie++.out, you can convert the output to html:
$ bon_csv2html < bonnie++.out > bonnie++.html
or to plain text with:
$ bon_csv2txt < bonnie++.out > bonnie++.txt
After reading the documentation, try longer and larger, more ambitious tests. Try some of the tests we turned off. If your system is behaving well, save the results for future benchmarking comparisons when the system is sick.
Lab 24.2: fs_mark
The fs_mark benchmark gives a low-level bashing to filesystems, using heavily asynchronous I/O across multiple directories and drives. It is a rather old program written by Ric Wheeler that has stood the test of time. It can be downloaded from http://sourceforge.net/projects/fsmark/. Once you have obtained the tarball, you can unpack it and compile it with:
$ tar zxvf fs_mark-3.3.tgz
$ cd fs_mark
$ make
Read the README file as we are only going to touch the surface.
If the compile fails with an error like:
$ make
....
/usr/bin/ld: cannot find -lc
it is because you haven’t installed the static version of glibc. You can install it on Red Hat-based systems with:
$ sudo yum install glibc-static
and on SUSE-related systems with:
$ sudo zypper install glibc-devel-static
On Debian-based systems, the relevant static library is installed along with the shared one, so no additional package needs to be sought.

For a test, we are going to create 1000 files, each 10 KB in size, and after each write we’ll perform an fsync to flush out to disk. This can be done in the /tmp directory with the command:
$ fs_mark -d /tmp -n 1000 -s 10240
While this is running, gather extended iostat statistics with:
$ iostat -x -d /dev/sda 2 20
in another terminal window.
The numbers you should surely note are the number of files per second reported by fs_mark, and the device utilization (%util) reported by iostat. If the latter is approaching 100 percent, you are I/O-bound. Depending on what kind of filesystem you are using, you may be able to get improved results by changing the mount options. For example, for ext3 or ext4 you can try:
$ mount -o remount,barrier=1 /tmp
or for ext4 you can try:
$ mount -o remount,journal_async_commit /tmp
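These remounts only take effect if /tmp is actually a separate mounted filesystem; you can confirm which options are currently active with findmnt:

$ findmnt -o TARGET,OPTIONS /tmp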
See how your results change. Note that these options may cause problems if you have a power failure, or other ungraceful system shutdown; i.e., there is likely to be a trade-off between stability and speed. Documentation about some of the mount options can be found with the kernel source under Documentation/filesystems and the man page for mount.
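Once you have a feel for the basic run, you can also scale the load up. As a sketch (the thread count here is arbitrary), fs_mark can spread the same workload across multiple writer threads with the -t option:

$ fs_mark -d /tmp -n 1000 -s 10240 -t 4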