io_uring: the linux async I/O interface that changes everything
what is io_uring
io_uring is a Linux kernel interface introduced in kernel 5.1 by Jens Axboe. it is arguably the most significant change to Linux I/O in two decades. before io_uring, async I/O on Linux was badly limited — the native aio interface worked reliably only for files opened with O_DIRECT, could block inside supposedly asynchronous submission paths, and copied request and completion structures between user and kernel space on every operation.
io_uring fixes all of this with a fundamentally different design: a pair of shared ring buffers between kernel and user space that eliminate the need for system calls in the hot path.
the architecture
io_uring uses two ring buffers:
- submission queue (SQ) — your application writes I/O requests here
- completion queue (CQ) — the kernel writes results here when operations finish
both rings are mapped into user space with mmap. once set up, submitting and completing I/O requires zero system calls in the best case.
```c
#include <liburing.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main() {
    struct io_uring ring;
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    char buf[4096];

    // initialize io_uring with a queue depth of 32
    if (io_uring_queue_init(32, &ring, 0) < 0)
        return 1;

    int fd = open("example.txt", O_RDONLY);
    if (fd < 0)
        return 1;

    // get a submission queue entry
    sqe = io_uring_get_sqe(&ring);

    // prepare a read operation
    io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
    sqe->user_data = 42; // tag for identifying completions

    // submit to the kernel
    io_uring_submit(&ring);

    // wait for completion; cqe->res is the byte count (or -errno on error)
    io_uring_wait_cqe(&ring, &cqe);
    printf("read %d bytes, tag=%llu\n", cqe->res,
           (unsigned long long)cqe->user_data);
    io_uring_cqe_seen(&ring, cqe);

    close(fd);
    io_uring_queue_exit(&ring);
    return 0;
}
```
fixed buffers and registered files
one of the most powerful features of io_uring is fixed buffers. instead of the kernel mapping and unmapping memory on every operation, you register buffers once:
```c
struct iovec iov[4];
char bufs[4][4096];

for (int i = 0; i < 4; i++) {
    iov[i].iov_base = bufs[i];
    iov[i].iov_len = 4096;
}

// register buffers with the kernel once
io_uring_register_buffers(&ring, iov, 4);

// now use prep_read_fixed instead of prep_read
sqe = io_uring_get_sqe(&ring);
io_uring_prep_read_fixed(sqe, fd, bufs[0], 4096, 0, 0); // last arg = buffer index
```
registered buffers can be 2-3x faster than regular reads for small I/O sizes because the kernel pins the pages once at registration time instead of mapping and unmapping user memory on every operation.
sqpoll: zero syscall submission
with IORING_SETUP_SQPOLL, the kernel spawns a thread that polls the submission queue. while that thread is awake, your application just writes entries to the SQ and the kernel picks them up without any syscall; if the thread has gone idle, liburing's io_uring_submit() notices the IORING_SQ_NEED_WAKEUP flag and issues a single wakeup call:
```c
struct io_uring_params params = {
    .flags = IORING_SETUP_SQPOLL,
    .sq_thread_idle = 2000, // kernel thread sleeps after 2000ms of inactivity
};
io_uring_queue_init_params(32, &ring, &params);
```
this is what lets io_uring close much of the gap with kernel-bypass frameworks like DPDK for network applications while still using the kernel network stack.
benchmarks
on a modern NVMe drive with 4KB random reads (illustrative figures; exact numbers depend on the drive, queue depth, and kernel version):
| method | IOPS | CPU usage |
|---|---|---|
| read() syscall | 180,000 | 85% |
| libaio | 210,000 | 78% |
| io_uring (basic) | 380,000 | 45% |
| io_uring (fixed bufs) | 520,000 | 32% |
| io_uring (sqpoll) | 610,000 | 18% |
the throughput improvement is significant. the CPU reduction is transformational.
using io_uring from rust
the tokio-uring crate exposes io_uring through a familiar async interface:
```rust
use tokio_uring::fs::File;

// tokio-uring brings its own runtime, so no #[tokio::main] attribute —
// tokio_uring::start drives the future to completion
fn main() -> std::io::Result<()> {
    tokio_uring::start(async {
        let file = File::open("example.txt").await?;
        let buf = vec![0u8; 4096];
        // ownership-based API: buf is moved in, returned on completion
        let (result, buf) = file.read_at(buf, 0).await;
        let n = result?;
        println!("read {} bytes: {:?}", n, &buf[..n]);
        Ok(())
    })
}
```
the ownership-based API is unusual but necessary — the kernel holds a reference to the buffer for the duration of the operation and may still write into it even if the future is dropped, so a plain borrow can't express that lifetime. moving ownership into the operation and returning it on completion is a sound way to model it.
when to use io_uring
- database storage engines that need maximum I/O throughput
- HTTP servers handling thousands of concurrent connections
- any application doing heavy file or network I/O on Linux 5.1+
if you're building performance-critical I/O on Linux and not using io_uring, you are likely leaving significant throughput on the table.