building a minimal webassembly runtime in rust
what is a wasm runtime
a WebAssembly runtime is a program that reads a .wasm binary, validates it, and executes the instructions inside it. wasmtime and wasmer are standalone wasm runtimes, and browser JavaScript engines such as V8 (used by Chrome) contain one as well.
building a minimal one teaches you how stack machines work, how binary formats are parsed, and how sandboxing is implemented.
wasm binary format
a .wasm file starts with an 8-byte header (a magic number and a version), followed by a sequence of sections. each section has a one-byte id, the payload size in bytes, and the payload itself:
magic: 00 61 73 6D (\0asm)
version: 01 00 00 00 (version 1)
section id (1 byte)
section size in bytes (unsigned LEB128)
section payload...
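to make the layout above concrete, here is a minimal sketch that checks the 8-byte header and then splits the rest of a module into (id, payload range) entries without interpreting any payload. RawSection, read_u32_leb, and split_sections are hypothetical names; the block carries its own bounds-checked LEB128 reader so it stands alone, and it returns errors instead of panicking on truncated or malformed files.

```rust
/// a section id plus the byte range of its payload inside the module
#[derive(Debug, PartialEq)]
struct RawSection {
    id: u8,
    start: usize,
    end: usize,
}

/// unsigned LEB128 with bounds and length checks (a u32 takes at most 5 bytes)
fn read_u32_leb(bytes: &[u8], pos: &mut usize) -> Result<u32, String> {
    let mut result = 0u32;
    let mut shift = 0u32;
    loop {
        let byte = *bytes.get(*pos).ok_or("unexpected end of input")?;
        *pos += 1;
        if shift >= 32 {
            return Err("LEB128 value too long for u32".to_string());
        }
        result |= ((byte & 0x7F) as u32) << shift;
        if byte & 0x80 == 0 {
            return Ok(result);
        }
        shift += 7;
    }
}

/// checks the header, then walks the section list, recording where each payload lives
fn split_sections(bytes: &[u8]) -> Result<Vec<RawSection>, String> {
    // magic number "\0asm"
    if bytes.len() < 8 || &bytes[0..4] != b"\0asm" {
        return Err("bad magic number".to_string());
    }
    // version 1, stored little-endian
    if &bytes[4..8] != &[1u8, 0, 0, 0] {
        return Err("unsupported binary version".to_string());
    }
    let mut pos = 8;
    let mut sections = Vec::new();
    while pos < bytes.len() {
        let id = bytes[pos];
        pos += 1;
        let size = read_u32_leb(bytes, &mut pos)? as usize;
        // reject sizes that would run past the end of the file
        let end = pos
            .checked_add(size)
            .filter(|&e| e <= bytes.len())
            .ok_or("section extends past end of file")?;
        sections.push(RawSection { id, start: pos, end });
        pos = end;
    }
    Ok(sections)
}
```

for example, the 14 bytes 00 61 73 6D 01 00 00 00 01 04 01 60 00 00 yield a single entry with id 1 (the type section) whose payload occupies bytes 10..14.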
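LEB128 is a variable-length integer encoding: each byte holds 7 bits of the value, least significant group first, and its high bit is set when another byte follows. small numbers take one byte, large numbers take more. section sizes, counts, and indices use the unsigned form: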
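fn read_leb128(bytes: &[u8], pos: &mut usize) -> u32 {
    let mut result = 0u32;
    let mut shift = 0u32;
    loop {
        // panics on truncated input, which is acceptable for a minimal runtime
        let byte = bytes[*pos];
        *pos += 1;
        // a u32 needs at most 5 bytes (shifts 0, 7, 14, 21, 28)
        assert!(shift < 32, "LEB128 value too long for u32");
        // the low 7 bits carry the payload, least significant group first
        result |= ((byte & 0x7F) as u32) << shift;
        // a clear high bit marks the last byte
        if byte & 0x80 == 0 {
            break;
        }
        shift += 7;
    }
    result
}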
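instruction immediates such as the operand of i32.const use the signed variant instead, where bit 6 of the final byte is the sign bit and the value is sign-extended from there. the sketch below decodes one into an i32; read_sleb128_i32 is a hypothetical name, and unlike the function above it returns a Result rather than panicking. for example, the bytes C0 BB 78 decode to -123456, and the single byte 7F decodes to -1.

```rust
/// decodes a signed LEB128 value into an i32, advancing `pos`.
/// sketch only: it rejects encodings longer than 5 bytes, but does not check
/// that the unused high bits of a 5th byte are a correct sign extension.
fn read_sleb128_i32(bytes: &[u8], pos: &mut usize) -> Result<i32, String> {
    let mut result: i32 = 0;
    let mut shift: u32 = 0;
    loop {
        let byte = *bytes.get(*pos).ok_or("unexpected end of input")?;
        *pos += 1;
        if shift >= 32 {
            return Err("LEB128 value too long for i32".to_string());
        }
        // low 7 bits are payload, least significant group first; on a 5th byte,
        // bits shifted past bit 31 are dropped
        result |= ((byte & 0x7F) as i32) << shift;
        shift += 7;
        if byte & 0x80 == 0 {
            // bit 6 of the final byte is the sign bit: fill the remaining high bits
            if shift < 32 && (byte & 0x40) != 0 {
                result |= !0i32 << shift;
            }
            return Ok(result);
        }
    }
}
```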
the type section
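the type section (id 1) lists every function signature in the module. each entry is the byte 0x60 followed by a vector of parameter types and a vector of result types, and each value type is encoded as a single byte: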
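#[derive(Debug, Clone)]
struct FuncType {
    params: Vec<ValType>,
    results: Vec<ValType>,
}
#[derive(Debug, Clone, PartialEq)]
enum ValType {
    I32,
    I64,
    F32,
    F64,
}
// each value type is one byte: 0x7F i32, 0x7E i64, 0x7D f32, 0x7C f64
fn parse_valtype(bytes: &[u8], pos: &mut usize) -> ValType {
    let byte = bytes[*pos];
    *pos += 1;
    match byte {
        0x7F => ValType::I32,
        0x7E => ValType::I64,
        0x7D => ValType::F32,
        0x7C => ValType::F64,
        other => panic!("unknown value type 0x{:02X}", other),
    }
}
fn parse_type_section(bytes: &[u8], pos: &mut usize) -> Vec<FuncType> {
    let count = read_leb128(bytes, pos) as usize;
    let mut types = Vec::with_capacity(count);
    for _ in 0..count {
        assert_eq!(bytes[*pos], 0x60, "expected function type marker 0x60");
        *pos += 1;
        // vector of parameter types: a count, then one byte per type
        let param_count = read_leb128(bytes, pos) as usize;
        let params: Vec<ValType> = (0..param_count)
            .map(|_| parse_valtype(bytes, pos))
            .collect();
        // vector of result types, same encoding
        let result_count = read_leb128(bytes, pos) as usize;
        let results: Vec<ValType> = (0..result_count)
            .map(|_| parse_valtype(bytes, pos))
            .collect();
        types.push(FuncType { params, results });
    }
    types
}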
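the parser above panics on malformed input (out-of-range indexing, assert_eq!, an unknown type byte). since a runtime has to cope with untrusted binaries, a sketch of the same parse that returns errors instead could look like this. next_byte, read_u32, valtype, and read_vec are hypothetical helper names, and parse_type_section here takes only the section payload rather than the whole file. the example bytes 01 60 02 7F 7F 01 7F encode one signature with two i32 parameters and one i32 result.

```rust
#[derive(Debug, Clone, PartialEq)]
enum ValType {
    I32,
    I64,
    F32,
    F64,
}

#[derive(Debug, Clone, PartialEq)]
struct FuncType {
    params: Vec<ValType>,
    results: Vec<ValType>,
}

/// reads one byte, failing instead of panicking at the end of the input
fn next_byte(bytes: &[u8], pos: &mut usize) -> Result<u8, String> {
    let b = *bytes.get(*pos).ok_or("unexpected end of section")?;
    *pos += 1;
    Ok(b)
}

/// unsigned LEB128, at most 5 bytes for a u32
fn read_u32(bytes: &[u8], pos: &mut usize) -> Result<u32, String> {
    let (mut result, mut shift) = (0u32, 0u32);
    loop {
        let byte = next_byte(bytes, pos)?;
        if shift >= 32 {
            return Err("LEB128 value too long".to_string());
        }
        result |= ((byte & 0x7F) as u32) << shift;
        if byte & 0x80 == 0 {
            return Ok(result);
        }
        shift += 7;
    }
}

fn valtype(byte: u8) -> Result<ValType, String> {
    match byte {
        0x7F => Ok(ValType::I32),
        0x7E => Ok(ValType::I64),
        0x7D => Ok(ValType::F32),
        0x7C => Ok(ValType::F64),
        other => Err(format!("unknown value type 0x{:02X}", other)),
    }
}

/// the binary format's vector encoding: a u32 count followed by that many elements
fn read_vec<T>(
    bytes: &[u8],
    pos: &mut usize,
    mut elem: impl FnMut(&[u8], &mut usize) -> Result<T, String>,
) -> Result<Vec<T>, String> {
    let count = read_u32(bytes, pos)?;
    let mut out = Vec::new(); // no with_capacity: count is untrusted input
    for _ in 0..count {
        out.push(elem(bytes, pos)?);
    }
    Ok(out)
}

/// parses a complete type-section payload (the bytes after the section id and size)
fn parse_type_section(payload: &[u8]) -> Result<Vec<FuncType>, String> {
    let mut pos = 0;
    let types = read_vec(payload, &mut pos, |b, p| {
        if next_byte(b, p)? != 0x60 {
            return Err("expected function type marker 0x60".to_string());
        }
        let params = read_vec(b, p, |b, p| valtype(next_byte(b, p)?))?;
        let results = read_vec(b, p, |b, p| valtype(next_byte(b, p)?))?;
        Ok(FuncType { params, results })
    })?;
    // the declared section size must match what was actually parsed
    if pos != payload.len() {
        return Err("trailing bytes after type section".to_string());
    }
    Ok(types)
}
```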
the stack machine executor
wasm is a stack machine: most instructions pop their operands off an operand stack and push their result back onto it. some only push (i32.const, local.get), and some only pop (local.set, drop):
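to see the operand flow in isolation, this sketch runs the equivalent of i32.const 10, i32.const 3, i32.sub, i32.const 4, i32.mul on a plain Vec<i32> and records the stack after each step. the Op enum and trace function are a hypothetical toy, not the wasm encoding. the recorded stacks are [10], [10, 3], [7], [7, 4], [28]; popping the right-hand operand first is what makes subtraction come out as 10 - 3 rather than 3 - 10.

```rust
/// a toy instruction set, just enough to show operands being popped and results pushed
#[derive(Debug, Clone, Copy)]
enum Op {
    Const(i32),
    Sub,
    Mul,
}

/// runs `ops` on an operand stack and records the stack after each instruction
fn trace(ops: &[Op]) -> Vec<Vec<i32>> {
    let mut stack: Vec<i32> = Vec::new();
    let mut snapshots = Vec::new();
    for &op in ops {
        match op {
            // constants only push
            Op::Const(v) => stack.push(v),
            // binary ops pop the right operand first (it was pushed last), then the left
            Op::Sub => {
                let rhs = stack.pop().expect("stack underflow");
                let lhs = stack.pop().expect("stack underflow");
                stack.push(lhs.wrapping_sub(rhs));
            }
            Op::Mul => {
                let rhs = stack.pop().expect("stack underflow");
                let lhs = stack.pop().expect("stack underflow");
                stack.push(lhs.wrapping_mul(rhs));
            }
        }
        snapshots.push(stack.clone());
    }
    snapshots
}
```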
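#[derive(Debug, Clone)]
enum Value {
    I32(i32),
    I64(i64),
}
// decoded instructions; a fuller runtime has one variant per supported opcode
#[derive(Debug, Clone)]
enum Instruction {
    I32Const(i32),
    I32Add,
    LocalGet(u32),
    LocalSet(u32),
    Return,
}
struct Executor {
    stack: Vec<Value>,  // operand stack
    locals: Vec<Value>, // function parameters followed by declared locals
    memory: Vec<u8>,    // linear memory
}
impl Executor {
    fn execute(&mut self, instructions: &[Instruction]) {
        for instr in instructions {
            match instr {
                Instruction::I32Const(v) => {
                    self.stack.push(Value::I32(*v));
                }
                Instruction::I32Add => {
                    // operands come off in reverse order: b was pushed last
                    let b = self.pop_i32();
                    let a = self.pop_i32();
                    // i32.add wraps on overflow (arithmetic modulo 2^32)
                    self.stack.push(Value::I32(a.wrapping_add(b)));
                }
                Instruction::LocalGet(idx) => {
                    self.stack.push(self.locals[*idx as usize].clone());
                }
                Instruction::LocalSet(idx) => {
                    let val = self.stack.pop().expect("stack underflow");
                    self.locals[*idx as usize] = val;
                }
                Instruction::Return => break,
                // further opcodes are added as new Instruction variants and match arms
            }
        }
    }
    fn pop_i32(&mut self) -> i32 {
        // validation guarantees operand types, so a mismatch here is a runtime bug
        match self.stack.pop().expect("stack underflow") {
            Value::I32(v) => v,
            _ => panic!("type mismatch"),
        }
    }
}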
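the executor consumes already-decoded Instruction values. a sketch of producing them from the raw bytes of an expression, using the opcode values from the wasm spec (0x41 i32.const, 0x6A i32.add, 0x20 local.get, 0x21 local.set, 0x0F return, 0x0B end), might look like the following. decode_expr and read_leb are hypothetical names; the sketch assumes the local declarations at the start of a function body have already been skipped, and it stops at the first end opcode, which is only correct when there are no nested blocks. for example, 20 00 20 01 6A 0B decodes to local.get 0, local.get 1, i32.add.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Instruction {
    I32Const(i32),
    I32Add,
    LocalGet(u32),
    LocalSet(u32),
    Return,
}

/// LEB128 reader for 32-bit immediates (at most 5 bytes), accumulated in an i64.
/// `signed` selects sign extension from bit 6 of the final byte. it does not
/// validate the unused high bits of a 5th byte.
fn read_leb(bytes: &[u8], pos: &mut usize, signed: bool) -> Result<i64, String> {
    let (mut result, mut shift) = (0i64, 0u32);
    loop {
        let byte = *bytes.get(*pos).ok_or("unexpected end of code")?;
        *pos += 1;
        if shift >= 35 {
            return Err("LEB128 immediate too long".to_string());
        }
        result |= ((byte & 0x7F) as i64) << shift;
        shift += 7;
        if byte & 0x80 == 0 {
            if signed && (byte & 0x40) != 0 {
                result |= !0i64 << shift;
            }
            return Ok(result);
        }
    }
}

/// decodes a flat instruction sequence up to its terminating `end` (0x0B).
/// nested block/loop/if would need a depth counter, since they also end in 0x0B.
fn decode_expr(code: &[u8]) -> Result<Vec<Instruction>, String> {
    let mut pos = 0;
    let mut out = Vec::new();
    loop {
        let opcode = *code.get(pos).ok_or("missing end opcode")?;
        pos += 1;
        let instr = match opcode {
            0x0B => return Ok(out), // end
            0x0F => Instruction::Return,
            0x20 => Instruction::LocalGet(read_leb(code, &mut pos, false)? as u32),
            0x21 => Instruction::LocalSet(read_leb(code, &mut pos, false)? as u32),
            0x41 => Instruction::I32Const(read_leb(code, &mut pos, true)? as i32),
            0x6A => Instruction::I32Add,
            other => return Err(format!("unsupported opcode 0x{:02X}", other)),
        };
        out.push(instr);
    }
}
```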
sandboxing
the security model of wasm is capability-based. a module can only access its own linear memory (plus any memory the host passes in as an import), and it cannot call system functions unless the host supplies them as imports.
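one way to picture the capability model is as a link step: every function a module imports is named by a (module, field) pair, and instantiation fails unless the host supplies each one. the sketch below is a hypothetical simplification (HostFn, Import, and link are illustrative names, and host functions only take and return i32 values); the point is that the resolved table is the module's only route to the outside world.

```rust
use std::collections::HashMap;

/// a host function, simplified to take and return i32 values
type HostFn = fn(&[i32]) -> i32;

/// one entry of a module's import list: (module name, field name)
struct Import {
    module: String,
    name: String,
}

/// resolves each declared import against what the host chose to provide.
/// the resulting table is the module's entire view of the outside world:
/// a call to import i can only reach table[i].
fn link(imports: &[Import], provided: &HashMap<(String, String), HostFn>) -> Result<Vec<HostFn>, String> {
    imports
        .iter()
        .map(|imp| {
            provided
                .get(&(imp.module.clone(), imp.name.clone()))
                .copied()
                .ok_or_else(|| format!("unresolved import {}.{}", imp.module, imp.name))
        })
        .collect()
}
```

a host that wants to deny a capability, such as file access, simply leaves that entry out of `provided`, and instantiation fails with an unresolved-import error instead of the module gaining access.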
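on the memory side, this runtime represents the module's linear memory as a Vec<u8> that the host allocates, and every load and store is bounds-checked against that vec. (production runtimes such as wasmtime often skip most explicit checks on 64-bit hosts by reserving large virtual-memory regions with guard pages, so a stray access faults instead.)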
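impl Executor {
    fn mem_load_i32(&self, addr: u32) -> i32 {
        let start = addr as usize;
        // checked_add avoids wrap-around on hosts where usize is 32 bits
        let end = start.checked_add(4).expect("out of bounds memory access");
        assert!(end <= self.memory.len(), "out of bounds memory access");
        // wasm linear memory is little-endian
        let b = &self.memory[start..end];
        i32::from_le_bytes([b[0], b[1], b[2], b[3]])
    }
}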
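in this minimal runtime an out-of-bounds access panics; production runtimes instead report it to the host as a trap. either way, the module can never read or write host memory outside its own buffer, and that is the foundation of wasm's safety guarantee.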
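real load and store instructions also carry a static offset immediate (part of the memarg, alongside an alignment hint) that is added to the dynamic address popped from the stack; the spec requires that sum not to wrap, and any access that runs past the end of memory traps. the sketch below models that with a Result instead of a panic. Trap, Memory, and range are hypothetical names; the sum is computed in u64 so that even u32::MAX + u32::MAX + 4 cannot overflow. in the usage below, 65536 bytes is exactly one wasm page (64 KiB).

```rust
/// a trap: the wasm-level error that aborts execution and is reported to the host
#[derive(Debug, PartialEq)]
enum Trap {
    OutOfBounds,
}

struct Memory {
    bytes: Vec<u8>,
}

impl Memory {
    /// effective address = dynamic address + static offset, computed in u64 so it cannot wrap
    fn range(&self, addr: u32, offset: u32, len: usize) -> Result<std::ops::Range<usize>, Trap> {
        let start = addr as u64 + offset as u64;
        let end = start + len as u64;
        if end > self.bytes.len() as u64 {
            return Err(Trap::OutOfBounds);
        }
        Ok(start as usize..end as usize)
    }

    fn load_i32(&self, addr: u32, offset: u32) -> Result<i32, Trap> {
        let r = self.range(addr, offset, 4)?;
        let b = &self.bytes[r];
        Ok(i32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }

    /// the bounds check happens before any byte is written, so a trapping store changes nothing
    fn store_i32(&mut self, addr: u32, offset: u32, value: i32) -> Result<(), Trap> {
        let r = self.range(addr, offset, 4)?;
        self.bytes[r].copy_from_slice(&value.to_le_bytes());
        Ok(())
    }
}
```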