building a minimal webassembly runtime in rust

what is a wasm runtime

a WebAssembly runtime is a program that reads a .wasm binary, validates it, and executes the instructions inside it. wasmtime and wasmer are standalone wasm runtimes, and JavaScript engines such as V8 (used in Chrome and Node.js) include one as well.

building a minimal one teaches you how stack machines work, how binary formats are parsed, and how sandboxing is implemented.

wasm binary format

a .wasm file starts with an 8-byte header (a magic number and a version), followed by a sequence of sections. each section has a one-byte id, a byte length, and its payload:

magic:   00 61 73 6D  (\0asm)
version: 01 00 00 00  (version 1)

section type (1 byte)
section size (LEB128 encoded)
section payload...
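
as a minimal sketch of the first step a parser takes, the header can be checked before any section is read. the name check_header and the String error type are assumptions made for illustration, not part of the original code:

```rust
/// Validates the 8-byte header and returns the offset of the first section.
/// `check_header` is a hypothetical helper name used for illustration.
fn check_header(bytes: &[u8]) -> Result<usize, String> {
    const MAGIC: [u8; 4] = [0x00, 0x61, 0x73, 0x6D]; // "\0asm"
    const VERSION: [u8; 4] = [0x01, 0x00, 0x00, 0x00]; // version 1, little-endian

    if bytes.len() < 8 {
        return Err("file too short for a wasm header".to_string());
    }
    if !bytes.starts_with(&MAGIC) {
        return Err("bad magic number".to_string());
    }
    if bytes[4..8] != VERSION[..] {
        return Err("unsupported version".to_string());
    }
    Ok(8) // sections start right after the header
}
```

returning the offset lets the caller continue reading sections from byte 8 without hard-coding the header size a second time.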

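LEB128 is a variable-length integer encoding. each byte carries 7 bits of the value, lowest bits first, and its high bit says whether another byte follows. small numbers take one byte, larger numbers take more: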

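// decodes an unsigned LEB128 u32 starting at bytes[*pos] and advances *pos.
// panics on truncated input or on an encoding longer than 5 bytes.
fn read_leb128(bytes: &[u8], pos: &mut usize) -> u32 {
    let mut result = 0u32;
    let mut shift = 0u32;
    loop {
        let byte = bytes[*pos];
        *pos += 1;
        // the low 7 bits are payload, placed at the current bit offset
        result |= ((byte & 0x7F) as u32) << shift;
        // a clear high bit marks the last byte
        if byte & 0x80 == 0 { break; }
        shift += 7;
        // without this guard a malformed input would shift by 35 or more
        assert!(shift < 32, "LEB128 value does not fit in u32");
    }
    result
}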
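
putting the header layout and LEB128 together, the sections of a file can be walked without decoding their payloads. this is a sketch under assumptions: SectionInfo, read_u32_leb, and list_sections are hypothetical names, the header is assumed to be validated already, and errors are reported as None rather than by panicking:

```rust
/// One entry per section: its id byte and where its payload lives.
/// `SectionInfo` is a hypothetical type for this sketch.
#[derive(Debug, PartialEq)]
struct SectionInfo {
    id: u8,
    start: usize, // offset of the first payload byte
    len: usize,   // payload length in bytes
}

/// Unsigned LEB128 reader that returns None instead of panicking.
fn read_u32_leb(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result = 0u32;
    // A u32 needs at most 5 bytes: shifts 0, 7, 14, 21, 28.
    for shift in (0u32..35).step_by(7) {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= ((byte & 0x7F) as u32) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
    }
    None // continuation bit still set after 5 bytes
}

/// Walks the sections that follow the 8-byte header.
fn list_sections(bytes: &[u8]) -> Option<Vec<SectionInfo>> {
    let mut pos = 8; // skip magic + version (assumed already checked)
    let mut sections = Vec::new();
    while pos < bytes.len() {
        let id = bytes[pos];
        pos += 1;
        let len = read_u32_leb(bytes, &mut pos)? as usize;
        let end = pos.checked_add(len)?;
        if end > bytes.len() {
            return None; // section claims more bytes than the file has
        }
        sections.push(SectionInfo { id, start: pos, len });
        pos = end; // jump over the payload to the next section id
    }
    Some(sections)
}
```

because every section carries its own length, a runtime can skip sections it does not understand and jump straight to the ones it needs.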

the type section

the type section lists all function signatures. each signature is a list of parameter types and return types:

#[derive(Debug, Clone)]
struct FuncType {
    params:  Vec<ValType>,
    results: Vec<ValType>,
}

#[derive(Debug, Clone, PartialEq)]
enum ValType {
    I32,
    I64,
    F32,
    F64,
}

fn parse_type_section(bytes: &[u8], pos: &mut usize) -> Vec<FuncType> {
    let count = read_leb128(bytes, pos) as usize;
    let mut types = Vec::with_capacity(count);

    for _ in 0..count {
        assert_eq!(bytes[*pos], 0x60); // function type indicator
        *pos += 1;

        let param_count  = read_leb128(bytes, pos) as usize;
        let params: Vec<ValType> = (0..param_count)
            .map(|_| parse_valtype(bytes, pos))
            .collect();

        let result_count = read_leb128(bytes, pos) as usize;
        let results: Vec<ValType> = (0..result_count)
            .map(|_| parse_valtype(bytes, pos))
            .collect();

        types.push(FuncType { params, results });
    }
    types
}
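
the parser above calls parse_valtype, which the text does not show. a minimal sketch follows, using the one-byte encodings the binary format assigns to the four numeric types. it panics on anything else, matching the assert-based error handling used above. ValType is repeated only so the snippet compiles on its own:

```rust
// Repeated from the article so this snippet is self-contained.
#[derive(Debug, Clone, PartialEq)]
enum ValType {
    I32,
    I64,
    F32,
    F64,
}

/// Decodes one value-type byte and advances `pos`.
/// Bytes outside the four numeric types (for example reference types)
/// are rejected, because this sketch's ValType has no variant for them.
fn parse_valtype(bytes: &[u8], pos: &mut usize) -> ValType {
    let byte = bytes[*pos];
    *pos += 1;
    match byte {
        0x7F => ValType::I32,
        0x7E => ValType::I64,
        0x7D => ValType::F32,
        0x7C => ValType::F64,
        other => panic!("unsupported value type byte 0x{:02X}", other),
    }
}
```

with this helper in place, the type section payload 01 60 02 7F 7F 01 7F describes a single signature that takes two i32 parameters and returns one i32.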

the stack machine executor

wasm is a stack machine. most instructions pop their operands from an operand stack and push their result back; a few only push (constants, local.get) or only pop (local.set, drop):

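// a minimal instruction subset, just enough for the arms below
// (the payload types are assumptions of this sketch)
#[derive(Debug, Clone)]
enum Instruction {
    I32Const(i32),
    I32Add,
    LocalGet(u32),
    LocalSet(u32),
    Return,
}

// runtime values; f32 and f64 are omitted to keep the example short
#[derive(Debug, Clone)]
enum Value {
    I32(i32),
    I64(i64),
}

struct Executor {
    stack:  Vec<Value>, // operand stack
    locals: Vec<Value>, // parameters and locals of the current function
    memory: Vec<u8>,    // linear memory
}

impl Executor {
    fn execute(&mut self, instructions: &[Instruction]) {
        for instr in instructions {
            match instr {
                Instruction::I32Const(v) => {
                    self.stack.push(Value::I32(*v));
                }
                Instruction::I32Add => {
                    // operands come off in reverse order: b was pushed last
                    let b = self.pop_i32();
                    let a = self.pop_i32();
                    // wasm integer addition wraps on overflow
                    self.stack.push(Value::I32(a.wrapping_add(b)));
                }
                Instruction::LocalGet(idx) => {
                    self.stack.push(self.locals[*idx as usize].clone());
                }
                Instruction::LocalSet(idx) => {
                    let val = self.stack.pop().expect("stack underflow");
                    self.locals[*idx as usize] = val;
                }
                Instruction::Return => break,
                // ... other instructions
            }
        }
    }

    fn pop_i32(&mut self) -> i32 {
        match self.stack.pop().expect("stack underflow") {
            Value::I32(v) => v,
            _ => panic!("type mismatch"),
        }
    }
}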
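
the executor consumes Instruction values, so something has to produce them from the bytes of a function body. a sketch of such a decoder for the five instructions handled above follows. decode_body and the two LEB128 helpers are hypothetical names, the Instruction enum mirrors the variants the executor matches on, and any opcode outside this subset is reported as None:

```rust
#[derive(Debug, PartialEq)]
enum Instruction {
    I32Const(i32),
    I32Add,
    LocalGet(u32),
    LocalSet(u32),
    Return,
}

/// Unsigned LEB128, used for local indices.
fn read_uleb_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result = 0u32;
    for shift in (0u32..35).step_by(7) {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= ((byte & 0x7F) as u32) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
    }
    None
}

/// Signed LEB128, used for the immediate of i32.const.
fn read_sleb_i32(bytes: &[u8], pos: &mut usize) -> Option<i32> {
    let mut result = 0i64;
    let mut shift = 0u32;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= ((byte & 0x7F) as i64) << shift;
        shift += 7;
        if byte & 0x80 == 0 {
            // Bit 6 of the final byte is the sign bit: extend it upward.
            if byte & 0x40 != 0 {
                result |= -1i64 << shift;
            }
            if result < i32::MIN as i64 || result > i32::MAX as i64 {
                return None;
            }
            return Some(result as i32);
        }
        if shift >= 35 {
            return None; // more than 5 bytes
        }
    }
}

/// Decodes a function body made only of the instructions in this sketch.
fn decode_body(bytes: &[u8]) -> Option<Vec<Instruction>> {
    let mut pos = 0;
    let mut out = Vec::new();
    while pos < bytes.len() {
        let opcode = bytes[pos];
        pos += 1;
        match opcode {
            // With no blocks in this subset, the first `end` closes the body.
            0x0B => break,
            0x0F => out.push(Instruction::Return),
            0x20 => out.push(Instruction::LocalGet(read_uleb_u32(bytes, &mut pos)?)),
            0x21 => out.push(Instruction::LocalSet(read_uleb_u32(bytes, &mut pos)?)),
            0x41 => out.push(Instruction::I32Const(read_sleb_i32(bytes, &mut pos)?)),
            0x6A => out.push(Instruction::I32Add),
            _ => return None, // opcode outside this sketch's subset
        }
    }
    Some(out)
}
```

for example, the bytes 20 00 20 01 6A 0B decode to local.get 0, local.get 1, i32.add, which is the body of a function that adds its two parameters. note that i32.const uses the signed variant of LEB128, so the single byte 7F decodes to -1 rather than 127.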

sandboxing

the security model of wasm is capability-based. a module can only access the linear memory it has been given, and it can only call outside functions that it declares as imports and that the host chooses to supply at instantiation.
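
one way to picture the import side of this model is a table that the host fills in before running a module. the sketch below is an assumption-heavy illustration: Imports, HostFn, define, and call are hypothetical names, and host functions are simplified to take i32 arguments and return a single i32:

```rust
use std::collections::HashMap;

/// Simplified host function signature chosen for this sketch.
type HostFn = Box<dyn Fn(&[i32]) -> i32>;

/// Hypothetical import table: the only host functions a module can reach.
struct Imports {
    funcs: HashMap<(String, String), HostFn>,
}

impl Imports {
    fn new() -> Self {
        Imports { funcs: HashMap::new() }
    }

    /// The host grants a capability by registering it under module/name.
    fn define(&mut self, module: &str, name: &str, f: HostFn) {
        self.funcs.insert((module.to_string(), name.to_string()), f);
    }

    /// Calls an import. Anything the host did not register is unreachable.
    fn call(&self, module: &str, name: &str, args: &[i32]) -> Result<i32, String> {
        match self.funcs.get(&(module.to_string(), name.to_string())) {
            Some(f) => Ok(f(args)),
            None => Err(format!("unknown import {}.{}", module, name)),
        }
    }
}
```

a host that registers only env.add_one gives the module exactly that capability; a call to any other name, such as a file-opening function, fails because no entry exists for it.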

memory isolation is enforced by the runtime itself: the module's linear memory is a Vec<u8> that the host allocates, and every load and store is bounds-checked against it.

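use std::convert::TryInto; // already in the prelude since edition 2021

impl Executor {
    fn mem_load_i32(&self, addr: u32) -> i32 {
        let start = addr as usize;
        // checked_add guards against wraparound on 32-bit hosts
        let end = start.checked_add(4).expect("out of bounds memory access");
        assert!(end <= self.memory.len(), "out of bounds memory access");
        // wasm memory is little-endian
        i32::from_le_bytes(self.memory[start..end].try_into().unwrap())
    }
}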

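in this minimal runtime an out-of-bounds access panics instead of touching host memory outside the Vec, so it cannot cause undefined behavior in the host process. the wasm specification calls this condition a trap; a fuller runtime would report it to the host as an error rather than panicking. this check is the foundation of wasm's memory safety guarantee.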
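
as a sketch of what reporting a trap as a value might look like, the load can return a Result so the host decides how to react. the Trap enum and the free-function form of mem_load_i32 are assumptions made for illustration:

```rust
/// Reasons execution can stop abnormally. `Trap` is a hypothetical type.
#[derive(Debug, PartialEq)]
enum Trap {
    OutOfBoundsMemoryAccess,
}

/// Loads a little-endian i32 from linear memory.
/// Returns a trap instead of panicking when the 4 bytes are not all in range.
fn mem_load_i32(memory: &[u8], addr: u32) -> Result<i32, Trap> {
    let start = addr as usize;
    // checked_add avoids wraparound on hosts where usize is 32 bits.
    let end = start
        .checked_add(4)
        .ok_or(Trap::OutOfBoundsMemoryAccess)?;
    // `get` returns None when the range extends past the end of memory.
    let b = memory
        .get(start..end)
        .ok_or(Trap::OutOfBoundsMemoryAccess)?;
    Ok(i32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}
```

with this shape the interpreter loop can propagate the error with the ? operator, and the host process keeps running after a misbehaving module is stopped.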
