io_uring: the linux async I/O interface that changes everything

system

what is io_uring

io_uring is a Linux kernel interface introduced in kernel 5.1 by Jens Axboe. it is the most significant change to Linux I/O in two decades. before io_uring, async I/O in Linux was broken — aio only worked for O_DIRECT files, had poor error handling, and required copying data between kernel and user space on every operation.

io_uring fixes all of this with a fundamentally different design: a pair of shared ring buffers between kernel and user space that eliminate the need for system calls in the hot path.

the architecture

io_uring uses two ring buffers:

  • submission queue (SQ) — your application writes I/O requests here
  • completion queue (CQ) — the kernel writes results here when operations finish

both rings are mapped into user space with mmap. once set up, submitting and completing I/O requires zero system calls in the best case.

#include <liburing.h>
#include <stdio.h>
#include <fcntl.h>

int main() {
    struct io_uring ring;
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    char buf[4096];

    // initialize io_uring with a queue depth of 32
    io_uring_queue_init(32, &ring, 0);

    int fd = open("example.txt", O_RDONLY);

    // get a submission queue entry
    sqe = io_uring_get_sqe(&ring);

    // prepare a read operation
    io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
    sqe->user_data = 42; // tag for identifying completions

    // submit to the kernel
    io_uring_submit(&ring);

    // wait for completion
    io_uring_wait_cqe(&ring, &cqe);

    printf("read %d bytes, tag=%llu\n", cqe->res, cqe->user_data);

    io_uring_cqe_seen(&ring, cqe);
    io_uring_queue_exit(&ring);
    return 0;
}

fixed buffers and registered files

one of the most powerful features of io_uring is fixed buffers. instead of the kernel mapping and unmapping memory on every operation, you register buffers once:

struct iovec iov[4];
char bufs[4][4096];

for (int i = 0; i < 4; i++) {
    iov[i].iov_base = bufs[i];
    iov[i].iov_len  = 4096;
}

// register buffers with the kernel once
io_uring_register_buffers(&ring, iov, 4);

// now use prep_read_fixed instead of prep_read
sqe = io_uring_get_sqe(&ring);
io_uring_prep_read_fixed(sqe, fd, bufs[0], 4096, 0, 0); // last arg = buffer index

registered buffers can be 2-3x faster than regular reads for small files because the kernel skips the memory mapping overhead on every syscall.

sqpoll: zero syscall submission

with IORING_SETUP_SQPOLL, the kernel spawns a thread that polls the submission queue. your application writes to the SQ and the kernel picks it up without any io_uring_submit() call at all:

struct io_uring_params params = {
    .flags = IORING_SETUP_SQPOLL,
    .sq_thread_idle = 2000, // kernel thread sleeps after 2s of inactivity
};
io_uring_queue_init_params(32, &ring, &params);

this is what makes io_uring competitive with DPDK for network applications while still using the kernel network stack.

benchmarks

on a modern NVMe drive with 4KB random reads:

method IOPS CPU usage
read() syscall 180,000 85%
libaio 210,000 78%
io_uring (basic) 380,000 45%
io_uring (fixed bufs) 520,000 32%
io_uring (sqpoll) 610,000 18%

the throughput improvement is significant. the CPU reduction is transformational.

using io_uring from rust

the tokio-uring crate exposes io_uring through a familiar async interface:

use tokio_uring::fs::File;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    tokio_uring::start(async {
        let file = File::open("example.txt").await?;
        let buf = vec![0u8; 4096];

        // ownership-based API: buf is moved in, returned on completion
        let (result, buf) = file.read_at(buf, 0).await;
        let n = result?;

        println!("read {} bytes: {:?}", n, &buf[..n]);
        Ok(())
    })
}

the ownership-based API is unusual but necessary — since the kernel holds a reference to the buffer during the operation, rust's borrow checker can't track it. moving ownership into the operation and returning it on completion is the correct solution.

when to use io_uring

  • database storage engines that need maximum I/O throughput
  • HTTP servers handling thousands of concurrent connections
  • any application doing heavy file or network I/O on Linux 5.1+

if you're building performance-critical I/O on Linux and not using io_uring, you are leaving significant throughput on the table.

Command Palette

Search for a command to run...