Objectives
- Explain the importance of I/O scheduling
- Understand how the Completely Fair Queueing (CFQ) and Deadline algorithms work
I/O Scheduling
The I/O scheduling layer is the interface between the generic block layer and the low-level physical device drivers. Both the Virtual Memory and Virtual File System layers submit I/O requests to block devices; it is the job of the I/O scheduling layer to prioritize and order these requests before they are handed to the block devices.
Any I/O scheduling system has to satisfy certain conflicting requirements:
- Hardware access time should be minimized
- Requests must be organized according to physical location on the disk
- Requests should be merged to the extent possible to get as big a contiguous region as possible, which also minimizes disk access time
- Requests should be satisfied with as low a latency as is feasible
- Write operations can usually wait to migrate from caches to disk without stalling processes. Read operations, however, almost always require a process to wait for completion before proceeding further. Favoring reads over writes leads to better parallelism and system responsiveness
- Processes should share the I/O bandwidth in a fair, or at least consciously prioritized, fashion; even if it means some overall performance slowdown of the I/O layer, process throughput should not suffer inordinately
I/O Scheduler Choices
Since these demands can be conflicting, different I/O schedulers may be appropriate for different workloads, such as a large database server vs. a desktop system. Furthermore, different hardware may mandate different strategies. In order to provide flexibility, the Linux kernel uses an object-oriented scheme, in which pointers to the various needed functions are supplied in a data structure, the particular one of which can be selected at boot on the kernel command line, as in:
linux... elevator=[cfq|deadline|noop]
At least one of the I/O scheduling algorithms must be compiled into the kernel. The current choices are:
- Completely Fair Queueing (CFQ)
- Deadline Scheduling
- noop (a simple scheme)
Modern distributions choose either CFQ or Deadline.
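If a scheduler was chosen on the kernel command line, the selection can be verified after boot by examining /proc/cmdline; the output below is only illustrative and will differ from system to system:
$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz root=/dev/sda1 ro elevator=deadline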
I/O Scheduling and SSD Devices
The gradual introduction of SSD (Solid State Drive) devices which use flash memory to emulate hard disks has important implications for I/O scheduling.
Such devices do not require an elevator scheme, and they benefit from wear leveling to spread I/O over the device, since flash cells have a limited number of write/erase cycles. One can examine
/sys/block/<device>/queue/rotational
to see whether or not a device is an SSD, as in:
$ cat /sys/block/sda/queue/rotational
1
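A value of 1 indicates a rotating disk, while 0 indicates a non-rotational (SSD) device. As a quick sketch, every block device on a system can be checked in one pass; the device names and values shown below are only examples:
$ for f in /sys/block/*/queue/rotational ; do echo "$f: $(cat $f)" ; done
/sys/block/sda/queue/rotational: 1
/sys/block/sdb/queue/rotational: 0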
Tunables and Switching the I/O Scheduler at Runtime
Each of the I/O schedulers exposes parameters which can be used to tune behavior at run time. The parameters are accessed through the pseudo-filesystem mounted at
/sys
In addition, it is possible to use different I/O schedulers for different devices. The choice can be made easily through the command line. For example:
$ cat /sys/block/sda/queue/scheduler
noop deadline [cfq]
$ echo noop > /sys/block/sda/queue/scheduler
$ cat /sys/block/sda/queue/scheduler
[noop] deadline cfq
The actual tunables vary according to the particular I/O scheduler, and can be found under:
/sys/block/<device>/queue/iosched
For a disk using CFQ:
$ ls -l /sys/block/sda/queue/iosched
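The files in this directory are ordinary sysfs entries and can be read and written (the latter as root). As a sketch, assuming sda is using CFQ, the quantum parameter could be inspected and raised; the exact set of tunables and their defaults depend on the kernel version:
$ cat /sys/block/sda/queue/iosched/quantum
4
$ echo 8 > /sys/block/sda/queue/iosched/quantum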
CFQ (Completely Fair Queueing) Scheduler
The CFQ method has the goal of equal spreading of I/O bandwidth among all processes submitting requests.
Theoretically, each process has its own I/O queue, which works together with a dispatch queue that receives the actual requests on the way to the device. In actual practice, the number of queues is fixed (at 64) and a hash based on the process ID is used to select a queue when a request is submitted.
Dequeuing of requests is done round-robin style over all the queues, each of which works in FIFO (First In, First Out) order. Thus the work is spread out. To avoid excessive seeking, an entire round is selected and then sorted into the dispatch queue before the actual I/O requests are issued to the device.
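As a toy illustration only, and not the kernel's actual hash function, the idea of mapping a process ID onto one of the 64 queues can be pictured as:
$ echo $(( $$ % 64 ))
23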
CFQ Tunables
In the list below, the parameter HZ is a kernel-configured quantity that corresponds to the number of jiffies per second, which the kernel uses as a coarse measure of time.
- quantum
- Maximum number of requests dispatched in one round of service (Default = 4)
- queued
- Minimum request allocation per queue (Default = 8)
- fifo_expire_sync
- FIFO timeout for sync requests (Default = HZ/2)
- fifo_expire_async
- FIFO timeout for async requests (Default = 5 * HZ)
- fifo_batch_expire
- Rate at which the FIFOs expire (Default = HZ/8)
- back_seek_max
- Maximum backwards seek, in KB (Default = 16K)
- back_seek_penalty
- Penalty for a backwards seek (Default = 2)
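As a sketch of adjusting one of these parameters, assuming sda is using CFQ and the commands are run as root, backward seeks could be disabled entirely by setting the backward-seek window to zero; the default value shown is only illustrative:
$ cat /sys/block/sda/queue/iosched/back_seek_max
16384
$ echo 0 > /sys/block/sda/queue/iosched/back_seek_max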
Deadline Scheduler
The Deadline I/O scheduler aggressively reorders requests with the simultaneous goals of improving overall performance and preventing large latencies for individual requests, thereby limiting starvation.
With each request, the kernel associates a deadline. Read requests get higher priority than write requests.
Five separate I/O queues are maintained:
- Two sorted lists are maintained, one for reading and one for writing, and arranged by starting block
- Two FIFO lists are maintained, again one for reading and one for writing. These lists are sorted by submission time
- A fifth queue contains the requests that are to be shoveled to the device driver itself. This is called the dispatch queue.
Exactly how the requests are peeled off the first four queues and placed on the fifth (the dispatch queue) is where the art of the algorithm lies.
Deadline Tunables
- read_expire
- How long (in milliseconds) a read request is guaranteed to occur within (Default = HZ/2 = 500)
- write_expire
- How long (in milliseconds) a write request is guaranteed to occur within (Default = 5*HZ = 5000)
- writes_starved
- How many times reads are given preference over writes before a write is dispatched (Default = 2)
- fifo_batch
- How many requests should be moved from the sorted scheduler list to the dispatch queue, when the deadlines have expired (Default = 16)
- front_merges
- Back merges are more common than front merges as a contiguous request usually continues to the next block. Setting this parameter to 0 disables front merges and can give a boost if you know they are unlikely to be needed (Default = 1)
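As a sketch of tuning these, assuming sda is using the deadline scheduler and the commands are run as root, the read deadline could be tightened and writes allowed to wait a little longer; the defaults shown are those listed above and may differ by kernel version:
$ cat /sys/block/sda/queue/iosched/read_expire
500
$ echo 250 > /sys/block/sda/queue/iosched/read_expire
$ echo 4 > /sys/block/sda/queue/iosched/writes_starved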