lock-free data structures: theory and practice in rust
why avoid locks
mutexes have real costs. an uncontended acquire is cheap (roughly one atomic operation), but under contention a thread blocks, which means a syscall and a trip through the OS scheduler, and the lock itself becomes a global serialization point. in high-throughput systems (network servers, game engines, databases), lock contention is a common bottleneck.
lock-free data structures coordinate through atomic CPU instructions such as compare-and-swap instead of OS-level blocking. they guarantee that at least one thread makes progress at any time, even if other threads are preempted mid-operation.
the memory model
before writing lock-free code, you need to understand memory ordering. modern CPUs reorder memory operations for performance, and the compiler does too. atomic operations with explicit ordering constraints let you forbid exactly the reorderings that matter:
use std::sync::atomic::{AtomicUsize, Ordering};
let counter = AtomicUsize::new(0);
// Relaxed: no ordering guarantees, just atomicity. fastest.
counter.fetch_add(1, Ordering::Relaxed);
// Release/Acquire: a Release store pairs with an Acquire load of the
// same variable to establish a happens-before relationship.
counter.store(1, Ordering::Release);
let val = counter.load(Ordering::Acquire);
// SeqCst: a single total order over all SeqCst operations, agreed on
// by every thread. strongest and slowest.
counter.fetch_add(1, Ordering::SeqCst);
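to see what the Release/Acquire pairing buys you, here is a minimal message-passing sketch (`message_pass` is a name invented for illustration): the consumer can only observe the flag as true after the producer's Release store, and its Acquire load then guarantees the earlier payload write is visible too:

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// producer writes the payload, then Release-stores the flag;
// the consumer's Acquire load of the flag guarantees it sees the payload
fn message_pass() -> usize {
    let data = Arc::new(AtomicUsize::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let (d, r) = (Arc::clone(&data), Arc::clone(&ready));
    let producer = thread::spawn(move || {
        d.store(42, Ordering::Relaxed);   // payload write happens-before...
        r.store(true, Ordering::Release); // ...the Release store of the flag
    });

    let (d, r) = (Arc::clone(&data), Arc::clone(&ready));
    let consumer = thread::spawn(move || {
        while !r.load(Ordering::Acquire) {} // spin until the flag is published
        d.load(Ordering::Relaxed)           // guaranteed to observe 42
    });

    producer.join().unwrap();
    consumer.join().unwrap()
}

fn main() {
    assert_eq!(message_pass(), 42);
}
```

with both stores Relaxed, the consumer could legally observe `ready == true` while still seeing the old value of `data`; the Release/Acquire pair is what rules that out.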
a lock-free stack
the simplest useful lock-free structure is a Treiber stack:
use std::sync::atomic::{AtomicPtr, Ordering};
use std::ptr;

struct Node<T> {
    value: T,
    next: *mut Node<T>,
}

pub struct Stack<T> {
    head: AtomicPtr<Node<T>>,
}

impl<T> Stack<T> {
    pub fn new() -> Self {
        Stack { head: AtomicPtr::new(ptr::null_mut()) }
    }

    pub fn push(&self, value: T) {
        let node = Box::into_raw(Box::new(Node {
            value,
            next: ptr::null_mut(),
        }));
        loop {
            let head = self.head.load(Ordering::Relaxed);
            unsafe { (*node).next = head; }
            // compare_exchange: only update head if it still equals our snapshot
            match self.head.compare_exchange_weak(
                head, node,
                Ordering::Release,
                Ordering::Relaxed,
            ) {
                Ok(_) => break,
                Err(_) => continue, // another thread changed head, retry
            }
        }
    }

    pub fn pop(&self) -> Option<T> {
        loop {
            let head = self.head.load(Ordering::Acquire);
            if head.is_null() { return None; }
            let next = unsafe { (*head).next };
            match self.head.compare_exchange_weak(
                head, next,
                Ordering::Release,
                Ordering::Relaxed,
            ) {
                Ok(_) => {
                    let value = unsafe { Box::from_raw(head).value };
                    return Some(value);
                }
                Err(_) => continue,
            }
        }
    }
}
the ABA problem
the Treiber stack above has a subtle bug: the ABA problem. thread A reads head as pointer X. thread B pops X, pushes Y, then pushes X back (or a new node that happens to be reallocated at address X). thread A's compare_exchange sees the expected pointer and succeeds, even though the list underneath has changed. (a related hazard in the sketch above: pop dereferences (*head).next on a node that another thread may already have popped and freed. production implementations, such as those built on crossbeam, handle both problems with hazard pointers or epoch-based reclamation.)
one fix for ABA is a tagged pointer: pack a version counter into the unused bits of the pointer, so a recycled address no longer compares equal:
// on current 64-bit systems, user-space virtual addresses typically fit
// in the low 48 bits, leaving the top 16 bits of a pointer unused;
// pack the version counter there
struct TaggedPtr(u64);

impl TaggedPtr {
    fn new(ptr: *mut u8, tag: u16) -> Self {
        TaggedPtr((ptr as u64) | ((tag as u64) << 48))
    }
    fn ptr(&self) -> *mut u8 { (self.0 & 0x0000_FFFF_FFFF_FFFF) as *mut u8 }
    fn tag(&self) -> u16 { (self.0 >> 48) as u16 }
}
when to use lock-free structures
lock-free is not always better. a mutex with low contention is simpler, safer, and often faster. use lock-free structures when:
- contention is genuinely high (many threads, frequent access)
- you need progress guarantees in a real-time context
- profiling shows mutex contention as the actual bottleneck
start with a Mutex. reach for lock-free only when measurements justify the complexity.
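for reference, the baseline is short. a Mutex around a Vec gives the same LIFO interface with none of the unsafe (a sketch; `LockedStack` is a hypothetical name):

```rust
use std::sync::Mutex;

// the boring baseline: correct by construction, and under low
// contention often competitive with the lock-free version
pub struct LockedStack<T> {
    inner: Mutex<Vec<T>>,
}

impl<T> LockedStack<T> {
    pub fn new() -> Self {
        LockedStack { inner: Mutex::new(Vec::new()) }
    }

    pub fn push(&self, value: T) {
        self.inner.lock().unwrap().push(value);
    }

    pub fn pop(&self) -> Option<T> {
        self.inner.lock().unwrap().pop()
    }
}
```

this is the version to benchmark against before shipping anything with raw pointers in it.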