Linux I/O Schedulers

A Schedule to Keep


The anticipatory I/O scheduler [3] was the default scheduler a long time ago (in kernel years). As the name implies, it anticipates subsequent block requests and implements request merging, a simple one-way elevator, and read and write request batching. After the scheduler services an I/O request, it anticipates that the next request will be for the subsequent block. If that request comes, the disk head is already in the correct location, and the request is serviced very quickly. This approach adds a little latency to the system because the scheduler pauses briefly to see whether the next request is for the subsequent block. However, this latency can be outweighed by the increased performance for neighboring requests.
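The anticipation step can be sketched in a few lines of Python. This is only a toy model, not the kernel's implementation: the function name next_request, the 6ms window, and the representation of requests as bare sector numbers are all assumptions made for illustration.

```python
def next_request(last_sector, pending, incoming, window_ms=6):
    """Pick the next sector to service after finishing last_sector.

    pending   -- sectors that are already queued
    incoming  -- (arrival_delay_ms, sector) pairs for requests not yet arrived
    window_ms -- hypothetical anticipation window
    Returns (sector, waited_ms); sector is None if nothing can be serviced.
    """
    expected = last_sector + 1
    if expected in pending:
        return expected, 0  # the subsequent block is already queued
    # Pause briefly: is the subsequent block requested within the window?
    for delay, sector in sorted(incoming):
        if delay > window_ms:
            break
        if sector == expected:
            return expected, delay  # anticipation paid off
    # Anticipation failed: the full window was spent waiting.
    if not pending:
        return None, window_ms
    # Fall back to a one-way elevator: next sector ahead, else wrap around.
    ahead = [s for s in pending if s >= last_sector]
    return (min(ahead) if ahead else min(pending)), window_ms


print(next_request(100, [500, 90], [(2, 101)]))   # request arrives in time
print(next_request(100, [500, 90], [(10, 101)]))  # request arrives too late
```

In the first call, the wait costs 2ms but saves a long seek; in the second, the scheduler pays the whole window and still has to move the head, which is the latency penalty described above.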

Putting on your storage expert hat, you can see that the anticipatory scheduler works really well for certain workloads. For example, one study [4] observed that the Apache web server could achieve up to 71 percent more throughput using the anticipatory I/O scheduler. On the other hand, the anticipatory scheduler has been observed to result in a slowdown on a database run.


The deadline I/O scheduler [5] was written by well-known kernel developer Jens Axboe. The fundamental principle is to guarantee a start time for servicing an I/O request by combining request merging, a one-way elevator, and a deadline on all requests (hence the name). It maintains two deadline queues in addition to the sorted queues for reads and writes. The deadline queues are ordered by deadline time (time to expiration), with shorter times moving to the head of the queue. The sorted queues are ordered by sector number (the elevator approach).

The scheduler moves the I/O requests that have been in the queues the longest (that is, those with the shortest time to expiration) ahead of the others, which guarantees that no I/O request will "starve" and end up taking a very long time to execute.

A deadline scheduler really helps with distant reads (i.e., reads fairly far out on the disk or with a large sector number). Read I/O requests sometimes block applications because they have to be executed while the application waits. Writes, on the other hand, are cached, so execution can quickly return to the application, unless you have turned off the cache to make sure the data reaches the disk in the event of a power loss, in which case writes behave like read requests. With a pure elevator, distant reads can be serviced very slowly because they are constantly moved to the back of the queue as requests for closer parts of the disk are serviced first. A deadline I/O scheduler makes sure that all I/O requests are serviced, even the distant read requests.

Diving into the scheduler a bit more, the concepts are surprisingly straightforward. The scheduler decides on the next request by first deciding which queue to use. It gives a higher priority to reads because, as mentioned, applications usually block on read requests. Next, it checks the first request in the deadline queue to see if it has expired. If so, that request is executed immediately; otherwise, the scheduler serves a batch of requests from the sorted queue.
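The decision just described can be written as a short Python sketch. It is a simplified model under stated assumptions, not the kernel code: the Request tuple, the dispatch function, and the batch size of 2 are hypothetical, and always preferring reads ignores any protection against starving writes.

```python
from collections import namedtuple

# Hypothetical request record: target sector and absolute expiry time.
Request = namedtuple("Request", "sector deadline")


def dispatch(reads, writes, now, batch=2):
    """Return the requests to service next and remove them from their queue."""
    # Step 1: choose a queue; reads win because applications block on them.
    queue = reads if reads else writes
    if not queue:
        return []
    # Step 2: look at the head of the deadline queue (earliest expiry).
    oldest = min(queue, key=lambda r: r.deadline)
    if oldest.deadline <= now:
        chosen = [oldest]  # expired: service it immediately
    else:
        # Step 3: nothing has expired, so serve a batch in sector order.
        chosen = sorted(queue, key=lambda r: r.sector)[:batch]
    for request in chosen:
        queue.remove(request)
    return chosen


reads = [Request(900, 5), Request(10, 50), Request(20, 60)]
writes = [Request(1, 1)]
print(dispatch(reads, writes, now=3))  # batch of nearby reads, in sector order
print(dispatch(reads, writes, now=6))  # the distant read has now expired
```

The distant read at sector 900 loses out to the nearby requests in the first call but is serviced as soon as its deadline passes, which is exactly the guarantee a plain elevator cannot give.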

The deadline scheduler is very useful for some applications. In particular, real-time systems use the deadline scheduler because, in most cases, it keeps latency low (all requests are serviced within a short time frame). It has also been suggested that it works well for database systems that have TCQ-aware [6] disks.


The Completely Fair Queuing (CFQ) I/O scheduler [7] is the current default scheduler in the Linux kernel. It uses both request merging and elevators and is a bit more complex than the NOOP or deadline schedulers. CFQ puts synchronous requests from processes into a number of per-process queues and then allocates time slices for each of the queues to access the disk. The length of the time slice and the number of requests a queue is allowed to submit both depend on the I/O priority of the given process. Asynchronous requests for all processes are batched together into fewer queues, with one queue per priority.
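A toy Python model shows how per-process queues and priority-dependent quotas interact. The function service_order, the use of priorities 0 (highest) to 7 (lowest), and the quota formula are assumptions chosen for illustration; the real scheduler works with time slices and is considerably more involved.

```python
from collections import deque


def service_order(queues, priorities):
    """Drain per-process queues round-robin and return the service order.

    queues     -- dict mapping a process name to a deque of sectors
    priorities -- dict mapping a process name to an I/O priority, 0..7
    """
    order = []
    while any(queues.values()):
        for pid, queue in queues.items():
            # Hypothetical quota: a higher priority (lower number) may
            # submit more requests per turn.
            quota = 8 - priorities[pid]
            for _ in range(min(quota, len(queue))):
                order.append((pid, queue.popleft()))
    return order


queues = {"A": deque([1, 2, 3]), "B": deque([10, 11, 12])}
priorities = {"A": 6, "B": 7}
print(service_order(queues, priorities))
```

Process A gets two requests per turn and B gets one, yet B is never locked out: every queue with pending work is visited on every pass.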

Jens Axboe is the original author of the CFQ scheduler, incorporating elevator_linus, which adds features to prevent starvation in worst-case situations, as could happen with distant reads. An article by Axboe [8] has a good discussion on the design of the CFQ I/O scheduler (and others) and the intricacies of scheduler design.

CFQ gives all users (processes) of a particular device (storage) about the same number of I/O requests over a particular time interval, which can help multiuser systems see that all users get about the same level of responsiveness. Moreover, CFQ achieves some of the good throughput characteristics of the anticipatory scheduler because it allows a process queue to have some idle time at the end of a synchronous I/O request, creating some anticipatory time for I/O that might be close to the request just serviced.
