lock-free data structures: theory and practice in rust


why avoid locks

mutexes are expensive under contention. an uncontended lock is usually just one atomic instruction, but once threads collide, acquiring it can mean a syscall (a futex wait on linux), a context switch, and the critical section becomes a serialization point. in high-throughput systems — network servers, game engines, databases — lock contention is a common bottleneck.

lock-free data structures use atomic CPU instructions instead of OS-level locking. they guarantee that at least one thread makes progress at any time, even if others are preempted.

the memory model

before attempting lock-free programming, you need to understand memory ordering. modern CPUs reorder memory operations for performance, and the compiler does too. atomic operations with explicit ordering constraints limit that reordering:

use std::sync::atomic::{AtomicUsize, Ordering};

let counter = AtomicUsize::new(0);

// Relaxed: no ordering guarantees, just atomicity. fastest.
counter.fetch_add(1, Ordering::Relaxed);

// Release/Acquire: establishes a happens-before relationship.
// store with Release pairs with load with Acquire.
counter.store(1, Ordering::Release);
let val = counter.load(Ordering::Acquire);

// SeqCst: total sequential consistency across all threads. slowest.
counter.fetch_add(1, Ordering::SeqCst);
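the Release/Acquire pairing is easiest to see as message passing. a minimal sketch (the names `publish_and_read`, `DATA`, and `READY` are ours, not from any library): a consumer that Acquire-loads `true` is guaranteed to see every write the producer made before its Release store.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

static DATA: AtomicUsize = AtomicUsize::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn publish_and_read() -> usize {
    let producer = thread::spawn(|| {
        DATA.store(42, Ordering::Relaxed);    // payload write, no ordering of its own
        READY.store(true, Ordering::Release); // publish: all prior writes become visible...
    });
    let consumer = thread::spawn(|| {
        while !READY.load(Ordering::Acquire) { // ...to whoever Acquire-loads true
            std::hint::spin_loop();
        }
        DATA.load(Ordering::Relaxed) // guaranteed to observe 42
    });
    producer.join().unwrap();
    consumer.join().unwrap()
}
```

with both orderings downgraded to Relaxed, the consumer could legally observe READY as true while still reading 0 from DATA — the happens-before edge is exactly what Release/Acquire buys.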

a lock-free stack

the simplest useful lock-free structure is a Treiber stack:

use std::sync::atomic::{AtomicPtr, Ordering};
use std::ptr;

struct Node<T> {
    value: T,
    next:  *mut Node<T>,
}

pub struct Stack<T> {
    head: AtomicPtr<Node<T>>,
}

impl<T> Stack<T> {
    pub fn new() -> Self {
        Stack { head: AtomicPtr::new(ptr::null_mut()) }
    }

    pub fn push(&self, value: T) {
        let node = Box::into_raw(Box::new(Node {
            value,
            next: ptr::null_mut(),
        }));

        loop {
            let head = self.head.load(Ordering::Relaxed);
            unsafe { (*node).next = head; }

            // compare_exchange: only update head if it still equals our snapshot
            match self.head.compare_exchange_weak(
                head, node,
                Ordering::Release,
                Ordering::Relaxed,
            ) {
                Ok(_) => break,
                Err(_) => continue, // another thread changed head, retry
            }
        }
    }

    pub fn pop(&self) -> Option<T> {
        loop {
            let head = self.head.load(Ordering::Acquire);
            if head.is_null() { return None; }

            let next = unsafe { (*head).next };

            match self.head.compare_exchange_weak(
                head, next,
                Ordering::Acquire, // not Release: pop publishes nothing, it only reads
                Ordering::Relaxed,
            ) {
                Ok(_) => {
                    let value = unsafe { Box::from_raw(head).value };
                    return Some(value);
                }
                Err(_) => continue,
            }
        }
    }
}
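the CAS-retry loop in push and pop is the core lock-free idiom. here it is in isolation, reimplementing fetch_add by hand with the same shape (a sketch for illustration — `cas_increment` is our name; in real code you would just call fetch_add):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// same snapshot / CAS / retry shape as push and pop above
fn cas_increment(counter: &AtomicUsize) -> usize {
    loop {
        let cur = counter.load(Ordering::Relaxed); // snapshot
        match counter.compare_exchange_weak(
            cur,
            cur + 1,
            Ordering::Release,
            Ordering::Relaxed,
        ) {
            Ok(prev) => return prev, // snapshot still current: our update won
            Err(_) => continue,      // lost the race (or spurious failure): retry
        }
    }
}

fn main() {
    let c = AtomicUsize::new(0);
    assert_eq!(cas_increment(&c), 0);
    assert_eq!(c.load(Ordering::Relaxed), 1);
}
```

compare_exchange_weak may fail spuriously even when the value matches (cheaper on LL/SC architectures like ARM), which is why it only makes sense inside a retry loop; outside a loop, use compare_exchange.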

the ABA problem

the Treiber stack has a subtle bug: the ABA problem. thread A reads head as pointer X. thread B pops X, pushes Y, then pushes X back (or a new node allocated at the same address). thread A's compare_exchange succeeds even though the list has changed, and A installs a stale next pointer, corrupting the stack. a related hazard: once another thread has popped and freed X, even reading (*head).next in pop is a use-after-free — production implementations pair the stack with a safe reclamation scheme such as hazard pointers or epochs (crossbeam-epoch in rust).

one fix for ABA is a tagged pointer — pack a version counter into the unused bits of the pointer and bump it on every update, so a recycled address no longer compares equal:

// on x86-64 with 48-bit virtual addresses, the top 16 bits of a
// user-space pointer are zero. this is not guaranteed everywhere —
// 5-level paging (LA57) leaves fewer spare bits — so check before
// relying on it. pack the version counter there.
struct TaggedPtr(u64);

impl TaggedPtr {
    fn new(ptr: *mut u8, tag: u16) -> Self {
        TaggedPtr((ptr as u64) | ((tag as u64) << 48))
    }
    fn ptr(&self) -> *mut u8 { (self.0 & 0x0000_FFFF_FFFF_FFFF) as *mut u8 }
    fn tag(&self) -> u16    { (self.0 >> 48) as u16 }
}
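for the tag to help, the packed word is what you CAS — store it in an AtomicU64 and bump the tag on every successful update. a sketch assuming the 48-bit layout above (the free functions `pack` and `unpack_tag` are ours):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const PTR_MASK: u64 = 0x0000_FFFF_FFFF_FFFF;

fn pack(ptr: *mut u8, tag: u16) -> u64 {
    (ptr as u64 & PTR_MASK) | ((tag as u64) << 48)
}
fn unpack_tag(word: u64) -> u16 {
    (word >> 48) as u16
}

fn main() {
    let head = AtomicU64::new(pack(std::ptr::null_mut(), 0));

    // a pop-style update: reinstall the same pointer but bump the tag
    let old = head.load(Ordering::Acquire);
    let new = pack((old & PTR_MASK) as *mut u8, unpack_tag(old).wrapping_add(1));
    assert!(head
        .compare_exchange(old, new, Ordering::AcqRel, Ordering::Acquire)
        .is_ok());

    // a stale snapshot now fails the CAS even though the pointer bits match
    assert!(head
        .compare_exchange(old, new, Ordering::AcqRel, Ordering::Acquire)
        .is_err());
    assert_eq!(unpack_tag(head.load(Ordering::Acquire)), 1);
}
```

the alternative on platforms without spare pointer bits is a double-width CAS (cmpxchg16b on x86-64) holding the pointer and a full-width counter side by side.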

when to use lock-free structures

lock-free is not always better. a mutex with low contention is simpler, safer, and often faster. use lock-free structures when:

  • contention is genuinely high (many threads, frequent access)
  • you need progress guarantees in a real-time context
  • profiling shows mutex contention as the actual bottleneck

start with a Mutex. reach for lock-free only when measurements justify the complexity.
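as a starting point for that measurement, both designs can sit behind the same shape. a throwaway harness (function names are ours), not a benchmark — for real numbers, use a benchmarking tool such as criterion under realistic contention:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// same workload, mutex-protected counter
fn count_with_mutex(threads: usize, per_thread: usize) -> usize {
    let n = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let n = Arc::clone(&n);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *n.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *n.lock().unwrap();
    total
}

// same workload, lock-free counter
fn count_with_atomic(threads: usize, per_thread: usize) -> usize {
    let n = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let n = Arc::clone(&n);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    n.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    n.load(Ordering::Relaxed)
}

fn main() {
    assert_eq!(count_with_mutex(4, 1000), 4000);
    assert_eq!(count_with_atomic(4, 1000), 4000);
}
```

both produce the same answer; only profiling under your real access pattern tells you whether the atomic version is worth its complexity.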
