Building an Interpreter
in rust of course
What is WebAssembly, Really?
Before we start writing code, we need to understand what WebAssembly actually is. Most explanations begin with phrases like "assembly-like language" or "virtual machine," but these descriptions miss the fundamental point. WebAssembly is better understood as a new abstraction boundary—a way of dividing computing systems into composable pieces.
Think about the boundaries you already know. The Linux ABI doesn't care whether your program was written in C, Rust, or even Brainfuck—it just handles system calls and schedules process time. HTTP doesn't care if your server is written in Go or your client is Firefox or curl. The C ABI allows libraries written in different languages to work together seamlessly.
WebAssembly occupies a unique point in this space. Unlike the Linux ABI, there's no fixed set of system calls. Unlike HTTP, WebAssembly modules run in the same process as their host, enabling synchronous function calls in both directions. Unlike the C ABI, WebAssembly modules have only the shared state they're explicitly given.
This abstraction boundary has profound implications. It means you can run untrusted code safely, compose modules from different programming languages, and target everything from web browsers to IoT devices to serverless platforms with the same binary format.
The Stack Machine Model
WebAssembly is a stack-based virtual machine. This means that instead of having named registers like x86 or ARM, operations work by pushing values onto an implicit stack and popping them off as needed.
Consider this simple addition in different models:
# Register-based (x86-like)
mov %eax, 5
mov %ebx, 3
add %eax, %ebx ; result in %eax
# Stack-based (WebAssembly)
i32.const 5 ; push 5
i32.const 3 ; push 3
i32.add ; pop two values, push sum
The stack model has several advantages for a portable bytecode:
No need to specify register allocation
Compact encoding (most instructions are single bytes)
Easy to validate and analyze statically
Maps well to expression-based languages
Module Structure
WebAssembly organizes code into modules. Each module is a self-contained unit with explicit imports and exports. Think of a module as similar to a shared library or a package—it encapsulates functionality and exposes a clean interface.
A module contains several sections:
Type section: Function signatures used in the module
Function section: Declarations of functions defined in the module
Memory section: Linear memory regions
Global section: Global variables
Export section: Names and types of exports
Code section: The actual function bodies
Data section: Initial data for memory regions
This might seem like a lot of complexity, but each section serves a specific purpose in making WebAssembly fast to load, validate, and execute.
Why These Design Decisions Matter
Every design decision in WebAssembly was made for specific reasons. The stack model enables compact encoding and eliminates register allocation complexity. The module system with explicit imports and exports enables secure sandboxing and efficient linking. The type system enables ahead-of-time validation and optimization.
As we build our interpreter, we'll see how these decisions affect the implementation at every level. The binary format reflects the conceptual model, which makes parsing straightforward. The instruction set maps cleanly to the stack model, which makes the interpreter loop elegant. The type system catches errors early, which makes the whole system more robust.
Now let's start building.
The Binary Format
WebAssembly modules are distributed in a compact binary format. While there's also a textual representation (WAT), the binary format is what actually gets executed. To build an interpreter, we need to understand this format inside and out.
Let's start with the simplest possible WebAssembly module and examine its binary representation:
0x42 => {
let value = decoder::read_i64(reader)?;
Ok(Instruction::I64Const(value))
},
0x43 => {
let mut bytes = [0u8; 4];
reader.read_exact(&mut bytes)?;
let value = f32::from_le_bytes(bytes);
Ok(Instruction::F32Const(value))
},
0x44 => {
let mut bytes = [0u8; 8];
reader.read_exact(&mut bytes)?;
let value = f64::from_le_bytes(bytes);
Ok(Instruction::F64Const(value))
},
// Memory operations
0x28 => {
let align = decoder::read_u32(reader)?;
let offset = decoder::read_u32(reader)?;
Ok(Instruction::I32Load(MemArg { align, offset }))
},
0x36 => {
let align = decoder::read_u32(reader)?;
let offset = decoder::read_u32(reader)?;
Ok(Instruction::I32Store(MemArg { align, offset }))
},
// Numeric operations - i32
0x6A => Ok(Instruction::I32Add),
0x6B => Ok(Instruction::I32Sub),
0x6C => Ok(Instruction::I32Mul),
0x6D => Ok(Instruction::I32DivS),
0x6E => Ok(Instruction::I32DivU),
0x6F => Ok(Instruction::I32RemS),
0x70 => Ok(Instruction::I32RemU),
0x71 => Ok(Instruction::I32And),
0x72 => Ok(Instruction::I32Or),
0x73 => Ok(Instruction::I32Xor),
0x74 => Ok(Instruction::I32Shl),
0x75 => Ok(Instruction::I32ShrS),
0x76 => Ok(Instruction::I32ShrU),
0x77 => Ok(Instruction::I32Rotl),
0x78 => Ok(Instruction::I32Rotr),
// Comparison operations
0x46 => Ok(Instruction::I32Eq),
0x47 => Ok(Instruction::I32Ne),
0x48 => Ok(Instruction::I32LtS),
0x49 => Ok(Instruction::I32LtU),
0x4A => Ok(Instruction::I32GtS),
0x4B => Ok(Instruction::I32GtU),
0x4C => Ok(Instruction::I32LeS),
0x4D => Ok(Instruction::I32LeU),
0x4E => Ok(Instruction::I32GeS),
0x4F => Ok(Instruction::I32GeU),
_ => Err(io::Error::new(
io::ErrorKind::InvalidData,
format!("Unknown opcode: 0x{:02X}", opcode)
)),
}
}
/// Parse a sequence of instructions until 'end'
pub fn parse_expression<R: Read>(reader: &mut R) -> io::Result<Vec<Self>> {
let mut instructions = Vec::new();
loop {
let instr = Self::parse(reader)?;
let is_end = matches!(instr, Instruction::End);
instructions.push(instr);
if is_end {
break;
}
}
Ok(instructions)
}
}
Understanding Stack-Based Execution
Before we implement the interpreter, let's understand how stack-based execution works. Consider this simple WebAssembly function:
(func (param i32 i32) (result i32)
local.get 0 ;; push param 0 onto stack
local.get 1 ;; push param 1 onto stack
i32.add ;; pop two values, push sum
)
The execution proceeds as follows:
local.get 0
: Stack becomes[param0]
local.get 1
: Stack becomes[param0, param1]
i32.add
: Stack becomes[param0 + param1]
The final value on the stack is the function's return value.
This model has several advantages:
No register allocation needed
Compact instruction encoding
Easy to validate statically
Natural fit for expression-based languages
Let's implement our stack machine:
// src/interpreter/mod.rs
use std::collections::HashMap;
use crate::types::{Value, ValueType};
use crate::instructions::Instruction;
use crate::binary::Module;
pub mod stack;
pub mod frame;
use stack::Stack;
use frame::{Frame, Label};
#[derive(Debug)]
pub struct WasmInterpreter {
stack: Stack<Value>,
call_stack: Vec<Frame>,
module: Option<Module>,
}
#[derive(Debug)]
pub enum ExecutionError {
StackUnderflow,
TypeMismatch,
InvalidLocal(u32),
InvalidFunction(u32),
DivisionByZero,
IntegerOverflow,
Unreachable,
}
impl WasmInterpreter {
pub fn new() -> Self {
Self {
stack: Stack::new(),
call_stack: Vec::new(),
module: None,
}
}
pub fn load_module(&mut self, module: Module) {
self.module = Some(module);
}
pub fn call_function(&mut self, func_name: &str, args: Vec<Value>)
-> Result<Option<Value>, ExecutionError> {
let module = self.module.as_ref()
.ok_or(ExecutionError::InvalidFunction(0))?;
// Find the function by export name
let func_idx = module.exports
.iter()
.find(|export| export.name == func_name)
.and_then(|export| match export.desc {
crate::binary::module::ExportDesc::Func(idx) => Some(idx),
_ => None,
})
.ok_or(ExecutionError::InvalidFunction(0))?;
self.call_function_by_index(func_idx, args)
}
pub fn call_function_by_index(&mut self, func_idx: u32, args: Vec<Value>)
-> Result<Option<Value>, ExecutionError> {
let module = self.module.as_ref()
.ok_or(ExecutionError::InvalidFunction(func_idx))?;
// Get function type
let type_idx = *module.functions.get(func_idx as usize)
.ok_or(ExecutionError::InvalidFunction(func_idx))?;
let func_type = module.types.get(type_idx as usize)
.ok_or(ExecutionError::InvalidFunction(func_idx))?;
// Validate arguments
if args.len() != func_type.params.len() {
return Err(ExecutionError::TypeMismatch);
}
for (arg, expected_type) in args.iter().zip(&func_type.params) {
if arg.value_type() != *expected_type {
return Err(ExecutionError::TypeMismatch);
}
}
// Get function code
let func_code = module.code.get(func_idx as usize)
.ok_or(ExecutionError::InvalidFunction(func_idx))?;
// Parse instructions from the function body
let mut cursor = std::io::Cursor::new(&func_code.body);
let instructions = Instruction::parse_expression(&mut cursor)
.map_err(|_| ExecutionError::InvalidFunction(func_idx))?;
// Set up function frame
let mut locals = args.clone();
// Add space for local variables
for (count, value_type) in &func_code.locals {
let default_value = match value_type {
ValueType::I32 => Value::I32(0),
ValueType::I64 => Value::I64(0),
ValueType::F32 => Value::F32(0.0),
ValueType::F64 => Value::F64(0.0),
};
for _ in 0..*count {
locals.push(default_value);
}
}
let frame = Frame::new(locals, &instructions);
self.call_stack.push(frame);
// Execute the function
let result = self.execute_function()?;
// Clean up
self.call_stack.pop();
Ok(result)
}
fn execute_function(&mut self) -> Result<Option<Value>, ExecutionError> {
while let Some(frame) = self.call_stack.last_mut() {
if frame.pc >= frame.instructions.len() {
break;
}
let instruction = frame.instructions[frame.pc].clone();
frame.pc += 1;
match instruction {
Instruction::Nop => {},
Instruction::LocalGet(idx) => {
let value = frame.locals.get(idx as usize)
.ok_or(ExecutionError::InvalidLocal(idx))?;
self.stack.push(*value);
},
Instruction::LocalSet(idx) => {
let value = self.stack.pop()
.ok_or(ExecutionError::StackUnderflow)?;
if idx as usize >= frame.locals.len() {
return Err(ExecutionError::InvalidLocal(idx));
}
frame.locals[idx as usize] = value;
},
Instruction::LocalTee(idx) => {
let value = self.stack.peek()
.ok_or(ExecutionError::StackUnderflow)?;
if idx as usize >= frame.locals.len() {
return Err(ExecutionError::InvalidLocal(idx));
}
frame.locals[idx as usize] = *value;
},
Instruction::I32Const(value) => {
self.stack.push(Value::I32(value));
},
Instruction::I64Const(value) => {
self.stack.push(Value::I64(value));
},
Instruction::F32Const(value) => {
self.stack.push(Value::F32(value));
},
Instruction::F64Const(value) => {
self.stack.push(Value::F64(value));
},
Instruction::I32Add => {
let b = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
let a = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
match (a, b) {
(Value::I32(a), Value::I32(b)) => {
// Use wrapping arithmetic to match WebAssembly semantics
self.stack.push(Value::I32(a.wrapping_add(b)));
},
_ => return Err(ExecutionError::TypeMismatch),
}
},
Instruction::I32Sub => {
let b = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
let a = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
match (a, b) {
(Value::I32(a), Value::I32(b)) => {
self.stack.push(Value::I32(a.wrapping_sub(b)));
},
_ => return Err(ExecutionError::TypeMismatch),
}
},
Instruction::I32Mul => {
let b = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
let a = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
match (a, b) {
(Value::I32(a), Value::I32(b)) => {
self.stack.push(Value::I32(a.wrapping_mul(b)));
},
_ => return Err(ExecutionError::TypeMismatch),
}
},
Instruction::I32DivS => {
let b = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
let a = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
match (a, b) {
(Value::I32(a), Value::I32(b)) => {
if b == 0 {
return Err(ExecutionError::DivisionByZero);
}
// Check for overflow: i32::MIN / -1
if a == i32::MIN && b == -1 {
return Err(ExecutionError::IntegerOverflow);
}
self.stack.push(Value::I32(a / b));
},
_ => return Err(ExecutionError::TypeMismatch),
}
},
Instruction::I32Eq => {
let b = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
let a = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
match (a, b) {
(Value::I32(a), Value::I32(b)) => {
self.stack.push(Value::I32(if a == b { 1 } else { 0 }));
},
_ => return Err(ExecutionError::TypeMismatch),
}
},
Instruction::Drop => {
self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
},
Instruction::Return => {
// Return the top value (if any) and exit function
return Ok(self.stack.pop());
},
Instruction::End => {
// End of block/function - for now, treat like return
return Ok(self.stack.pop());
},
Instruction::Unreachable => {
return Err(ExecutionError::Unreachable);
},
_ => {
// TODO: Implement remaining instructions
unimplemented!("Instruction not yet implemented: {:?}", instruction);
}
}
}
// Function completed normally, return top of stack if present
Ok(self.stack.pop())
}
}
Let's implement the supporting structures:
// src/interpreter/stack.rs
use std::collections::VecDeque;
#[derive(Debug)]
pub struct Stack<T> {
items: VecDeque<T>,
}
impl<T> Stack<T> {
pub fn new() -> Self {
Self {
items: VecDeque::new(),
}
}
pub fn push(&mut self, item: T) {
self.items.push_back(item);
}
pub fn pop(&mut self) -> Option<T> {
self.items.pop_back()
}
pub fn peek(&self) -> Option<&T> {
self.items.back()
}
pub fn len(&self) -> usize {
self.items.len()
}
pub fn is_empty(&self) -> bool {
self.items.is_empty()
}
}
// src/interpreter/frame.rs
use crate::types::Value;
use crate::instructions::Instruction;
#[derive(Debug)]
pub struct Frame {
pub locals: Vec<Value>,
pub instructions: Vec<Instruction>,
pub pc: usize, // program counter
pub labels: Vec<Label>,
}
#[derive(Debug)]
pub struct Label {
pub continuation: usize, // where to jump for br
pub arity: u32, // how many values to keep on stack
}
impl Frame {
pub fn new(locals: Vec<Value>, instructions: &[Instruction]) -> Self {
Self {
locals,
instructions: instructions.to_vec(),
pc: 0,
labels: Vec::new(),
}
}
}
Now let's test our interpreter with the simple add function:
// src/lib.rs
#[cfg(test)]
mod tests {
use super::*;
use crate::types::Value;
#[test]
fn test_simple_add_function() {
// Create our add module programmatically
let module_bytes = vec![
// Magic number and version
0x00, 0x61, 0x73, 0x6D, 0x01, 0x00, 0x00, 0x00,
// Type section
0x01, 0x07, // section id, size
0x01, // 1 type
0x60, 0x02, 0x7F, 0x7F, 0x01, 0x7F, // (i32, i32) -> i32
// Function section
0x03, 0x02, // section id, size
0x01, 0x00, // 1 function, type 0
// Export section
0x07, 0x07, // section id, size
0x01, // 1 export
0x03, b'a', b'd', b'd', // name "add"
0x00, 0x00, // function 0
// Code section
0x0A, 0x09, // section id, size
0x01, // 1 function body
0x07, // body size
0x00, // 0 locals
0x20, 0x00, // local.get 0
0x20, 0x01, // local.get 1
0x6A, // i32.add
0x0B, // end
];
let module = Module::parse(std::io::Cursor::new(module_bytes)).unwrap();
let mut interpreter = WasmInterpreter::new();
interpreter.load_module(module);
let result = interpreter.call_function(
"add",
vec![Value::I32(5), Value::I32(3)]
).unwrap();
assert_eq!(result, Some(Value::I32(8)));
}
}
This test demonstrates the complete flow:
Parse a binary WebAssembly module
Load it into our interpreter
Call an exported function with arguments
Get back the computed result
Control Flow and Structured Programming
One of WebAssembly's most distinctive features is its structured control flow. Unlike traditional assembly languages with goto-style jumps, WebAssembly uses structured constructs like blocks, loops, and if-statements that can be statically validated.
Understanding WebAssembly's Control Flow Model
WebAssembly's control flow is based on the concept of structured control flow. Every control construct has a clear entry and exit point, and branches can only target specific labels in a stack-disciplined manner.
Consider this WebAssembly function that computes a factorial:
(func $factorial (param i32) (result i32)
(local i32) ;; result accumulator
i32.const 1 ;; initialize result to 1
local.set 1 ;; store in local 1
(block $exit ;; block for early exit
(loop $loop ;; main loop
local.get 0 ;; get n
i32.const 1 ;; compare with 1
i32.le_s ;; n <= 1?
br_if $exit ;; if so, exit
local.get 1 ;; get current result
local.get 0 ;; get n
i32.mul ;; multiply
local.set 1 ;; store back to result
local.get 0 ;; get n
i32.const 1 ;; subtract 1
i32.sub
local.set 0 ;; store back to n
br $loop ;; continue loop
)
)
local.get 1 ;; return result
)
This example shows several important aspects of WebAssembly's control flow:
Labeled blocks:
$exit
and$loop
are labels that can be targeted by branchesStructured nesting: Control constructs are properly nested
Stack discipline: Branches maintain stack consistency
Let's extend our interpreter to handle control flow:
// src/interpreter/mod.rs (additions)
impl WasmInterpreter {
fn execute_function(&mut self) -> Result<Option<Value>, ExecutionError> {
while let Some(frame) = self.call_stack.last_mut() {
if frame.pc >= frame.instructions.len() {
break;
}
let instruction = frame.instructions[frame.pc].clone();
frame.pc += 1;
match instruction {
// ... existing cases ...
Instruction::Block(block_type) => {
// Find the matching End instruction
let end_pc = self.find_matching_end(frame.pc - 1)?;
// Create a label for this block
let label = Label {
continuation: end_pc + 1,
arity: if matches!(block_type, ValueType::I32 | ValueType::I64 |
ValueType::F32 | ValueType::F64) { 1 } else { 0 },
};
frame.labels.push(label);
},
Instruction::Loop(block_type) => {
// For loops, the continuation points back to the loop start
let label = Label {
continuation: frame.pc - 1, // Point back to the loop
arity: 0, // Loops don't produce values on break
};
frame.labels.push(label);
},
Instruction::If(block_type) => {
let condition = self.stack.pop()
.ok_or(ExecutionError::StackUnderflow)?;
let condition_true = match condition {
Value::I32(val) => val != 0,
Value::I64(val) => val != 0,
_ => return Err(ExecutionError::TypeMismatch),
};
if condition_true {
// Execute the if branch - find else or end
let (else_pc, end_pc) = self.find_if_structure(frame.pc - 1)?;
let label = Label {
continuation: end_pc + 1,
arity: if matches!(block_type, ValueType::I32 | ValueType::I64 |
ValueType::F32 | ValueType::F64) { 1 } else { 0 },
};
frame.labels.push(label);
// Continue executing if branch
} else {
// Skip to else branch or end
let (else_pc, end_pc) = self.find_if_structure(frame.pc - 1)?;
if let Some(else_pc) = else_pc {
frame.pc = else_pc + 1; // Skip the else instruction
let label = Label {
continuation: end_pc + 1,
arity: if matches!(block_type, ValueType::I32 | ValueType::I64 |
ValueType::F32 | ValueType::F64) { 1 } else { 0 },
};
frame.labels.push(label);
} else {
frame.pc = end_pc + 1; // Skip to after end
}
}
},
Instruction::Else => {
// When executing else, we've finished the if branch
// Jump to the end of the if statement
let end_pc = self.find_matching_end_for_else(frame.pc - 1)?;
frame.pc = end_pc + 1;
},
Instruction::End => {
// Pop the current label
frame.labels.pop();
},
Instruction::Br(label_idx) => {
let target_label = frame.labels.len().saturating_sub(label_idx as usize + 1);
if target_label >= frame.labels.len() {
return Err(ExecutionError::InvalidFunction(0)); // Invalid label
}
let label = &frame.labels[target_label];
// Handle stack unwinding for the branch
self.unwind_stack_for_branch(label.arity);
// Jump to the label's continuation
frame.pc = label.continuation;
// Remove labels that we're breaking out of
frame.labels.truncate(target_label);
},
Instruction::BrIf(label_idx) => {
let condition = self.stack.pop()
.ok_or(ExecutionError::StackUnderflow)?;
let should_branch = match condition {
Value::I32(val) => val != 0,
Value::I64(val) => val != 0,
_ => return Err(ExecutionError::TypeMismatch),
};
if should_branch {
let target_label = frame.labels.len().saturating_sub(label_idx as usize + 1);
if target_label >= frame.labels.len() {
return Err(ExecutionError::InvalidFunction(0));
}
let label = &frame.labels[target_label];
self.unwind_stack_for_branch(label.arity);
frame.pc = label.continuation;
frame.labels.truncate(target_label);
}
},
// ... rest of existing cases ...
}
}
Ok(self.stack.pop())
}
fn find_matching_end(&self, start_pc: usize) -> Result<usize, ExecutionError> {
let frame = self.call_stack.last()
.ok_or(ExecutionError::InvalidFunction(0))?;
let mut depth = 0;
let mut pc = start_pc + 1;
while pc < frame.instructions.len() {
match &frame.instructions[pc] {
Instruction::Block(_) | Instruction::Loop(_) | Instruction::If(_) => {
depth += 1;
},
Instruction::End => {
if depth == 0 {
return Ok(pc);
}
depth -= 1;
},
_ => {},
}
pc += 1;
}
Err(ExecutionError::InvalidFunction(0)) // No matching end found
}
fn find_if_structure(&self, if_pc: usize) -> Result<(Option<usize>, usize), ExecutionError> {
let frame = self.call_stack.last()
.ok_or(ExecutionError::InvalidFunction(0))?;
let mut depth = 0;
let mut pc = if_pc + 1;
let mut else_pc = None;
while pc < frame.instructions.len() {
match &frame.instructions[pc] {
Instruction::Block(_) | Instruction::Loop(_) | Instruction::If(_) => {
depth += 1;
},
Instruction::Else => {
if depth == 0 && else_pc.is_none() {
else_pc = Some(pc);
}
},
Instruction::End => {
if depth == 0 {
return Ok((else_pc, pc));
}
depth -= 1;
},
_ => {},
}
pc += 1;
}
Err(ExecutionError::InvalidFunction(0))
}
fn find_matching_end_for_else(&self, else_pc: usize) -> Result<usize, ExecutionError> {
let frame = self.call_stack.last()
.ok_or(ExecutionError::InvalidFunction(0))?;
let mut depth = 0;
let mut pc = else_pc + 1;
while pc < frame.instructions.len() {
match &frame.instructions[pc] {
Instruction::Block(_) | Instruction::Loop(_) | Instruction::If(_) => {
depth += 1;
},
Instruction::End => {
if depth == 0 {
return Ok(pc);
}
depth -= 1;
},
_ => {},
}
pc += 1;
}
Err(ExecutionError::InvalidFunction(0))
}
fn unwind_stack_for_branch(&mut self, arity: u32) {
// Keep the top `arity` values, discard the rest since entering this block
let keep_count = arity as usize;
let current_size = self.stack.len();
if keep_count > current_size {
return; // Nothing to unwind
}
// This is a simplified version - in a full implementation,
// we'd need to track stack heights when entering blocks
let mut values_to_keep = Vec::new();
for _ in 0..keep_count {
if let Some(value) = self.stack.pop() {
values_to_keep.push(value);
}
}
// Push the kept values back in correct order
for value in values_to_keep.into_iter().rev() {
self.stack.push(value);
}
}
}
Testing Control Flow
Let's test our control flow implementation with a simple conditional:
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_conditional_execution() {
// Test module with an if statement
// (func (param i32) (result i32)
// local.get 0
// if (result i32)
// i32.const 100
// else
// i32.const 200
// end
// )
let module_bytes = create_conditional_module();
let module = Module::parse(std::io::Cursor::new(module_bytes)).unwrap();
let mut interpreter = WasmInterpreter::new();
interpreter.load_module(module);
// Test true condition
let result = interpreter.call_function(
"test_if",
vec![Value::I32(1)]
).unwrap();
assert_eq!(result, Some(Value::I32(100)));
// Test false condition
let result = interpreter.call_function(
"test_if",
vec![Value::I32(0)]
).unwrap();
assert_eq!(result, Some(Value::I32(200)));
}
fn create_conditional_module() -> Vec<u8> {
// This would be a more complex binary with if/else instructions
// For brevity, we'll assume this exists
vec![] // Placeholder
}
}
The control flow implementation shows several
7 => module.parse_export_section(&mut section_reader)?,
10 => module.parse_code_section(&mut section_reader)?,
11 => module.parse_data_section(&mut section_reader)?, // Data section
_ => {
eprintln!("Skipping unknown section: {}", section_id);
}
}
}
Ok(module)
}
}
Testing Function Calls
Let's create a comprehensive test that demonstrates function calls and imports:
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_function_calls() {
// Create a module with multiple functions that call each other
// func 0 (imported): print_i32(i32) -> void
// func 1: double(i32) -> i32 { local.get 0; i32.const 2; i32.mul }
// func 2: test() -> void { i32.const 21; call 1; call 0 }
let module_bytes = create_function_call_module();
let module = Module::parse(std::io::Cursor::new(module_bytes)).unwrap();
let mut interpreter = WasmInterpreter::new();
interpreter.load_module(module);
// This should print "42" by calling test() -> double(21) -> print_i32(42)
let result = interpreter.call_function("test", vec![]).unwrap();
assert_eq!(result, None); // void return
}
#[test]
fn test_recursive_function() {
// Test a recursive factorial function
let module_bytes = create_factorial_module();
let module = Module::parse(std::io::Cursor::new(module_bytes)).unwrap();
let mut interpreter = WasmInterpreter::new();
interpreter.load_module(module);
let result = interpreter.call_function(
"factorial",
vec![Value::I32(5)]
).unwrap();
assert_eq!(result, Some(Value::I32(120)));
}
fn create_factorial_module() -> Vec<u8> {
// This would contain a recursive factorial implementation
// (func $factorial (param i32) (result i32)
// local.get 0
// i32.const 1
// i32.le_s
// if (result i32)
// i32.const 1
// else
// local.get 0
// local.get 0
// i32.const 1
// i32.sub
// call $factorial
// i32.mul
// end
// )
vec![] // Placeholder
}
}
Validation and Type Safety
One of WebAssembly's key feature is its validation algorithm. Before execution, every WebAssembly module is validated to ensure it's well-formed and type-safe. This validation happens at load time and provides strong guarantees about the module's behavior.
The Validation Algorithm
WebAssembly validation is essentially a type checker that operates on the bytecode. It ensures:
All instructions have valid operands of the correct types
Stack operations are balanced (no underflow/overflow)
Control flow is well-structured
Memory and function accesses are within bounds
All paths through a function produce the correct result type
Let's implement a basic validator:
// src/validation/mod.rs
use crate::types::{ValueType, Value, FuncType};
use crate::instructions::Instruction;
use crate::binary::Module;
use std::collections::HashMap;
pub mod context;
pub mod stack;
use context::ValidationContext;
use stack::TypeStack;
#[derive(Debug)]
pub enum ValidationError {
TypeMismatch,
StackUnderflow,
StackOverflow,
InvalidLocal(u32),
InvalidFunction(u32),
InvalidGlobal(u32),
InvalidMemory,
InvalidBranch(u32),
UnreachableCode,
InvalidBlockType,
MissingEnd,
}
pub struct Validator {
context: ValidationContext,
}
impl Validator {
pub fn new() -> Self {
Self {
context: ValidationContext::new(),
}
}
pub fn validate_module(&mut self, module: &Module) -> Result<(), ValidationError> {
// Set up validation context with module information
self.context.set_types(&module.types);
self.context.set_functions(&module.functions);
self.context.set_imports(&module.imports);
// Validate each function
for (func_idx, func_code) in module.code.iter().enumerate() {
let type_idx = module.functions[func_idx];
let func_type = &module.types[type_idx as usize];
self.validate_function(func_code, func_type, func_idx as u32)?;
}
// Validate exports point to valid indices
for export in &module.exports {
match &export.desc {
crate::binary::module::ExportDesc::Func(idx) => {
if *idx >= (module.imports.len() + module.functions.len()) as u32 {
return Err(ValidationError::InvalidFunction(*idx));
}
},
_ => {}, // We'll add other export types later
}
}
// Validate data segments
for data_segment in &module.data {
self.validate_constant_expression(&data_segment.offset)?;
}
Ok(())
}
fn validate_function(&mut self, func_code: &crate::binary::module::FunctionCode,
func_type: &FuncType, func_idx: u32) -> Result<(), ValidationError> {
// Parse instructions
let mut cursor = std::io::Cursor::new(&func_code.body);
let instructions = Instruction::parse_expression(&mut cursor)
.map_err(|_| ValidationError::InvalidFunction(func_idx))?;
// Set up local types (parameters + locals)
let mut local_types = func_type.params.clone();
for (count, value_type) in &func_code.locals {
for _ in 0..*count {
local_types.push(*value_type);
}
}
self.context.set_locals(&local_types);
// Validate instruction sequence
let mut type_stack = TypeStack::new();
let mut control_stack = Vec::new();
// Function entry creates an implicit block
control_stack.push(ControlFrame {
opcode: ControlOpcode::Function,
start_types: Vec::new(),
end_types: func_type.results.clone(),
height: 0,
unreachable: false,
});
for instruction in &instructions {
self.validate_instruction(
instruction,
&mut type_stack,
&mut control_stack
)?;
}
// Check that we end with the right types on the stack
let expected_results = &func_type.results;
if type_stack.len() != expected_results.len() {
return Err(ValidationError::TypeMismatch);
}
for (expected, actual) in expected_results.iter().zip(type_stack.types()) {
if expected != actual {
return Err(ValidationError::TypeMismatch);
}
}
Ok(())
}
fn validate_instruction(&self, instruction: &Instruction,
type_stack: &mut TypeStack,
control_stack: &mut Vec<ControlFrame>) -> Result<(), ValidationError> {
match instruction {
Instruction::Nop => {},
Instruction::Unreachable => {
if let Some(frame) = control_stack.last_mut() {
frame.unreachable = true;
}
},
Instruction::I32Const(_) => {
type_stack.push(ValueType::I32)?;
},
Instruction::I64Const(_) => {
type_stack.push(ValueType::I64)?;
},
Instruction::F32Const(_) => {
type_stack.push(ValueType::F32)?;
},
Instruction::F64Const(_) => {
type_stack.push(ValueType::F64)?;
},
Instruction::LocalGet(local_idx) => {
let local_type = self.context.get_local(*local_idx)
.ok_or(ValidationError::InvalidLocal(*local_idx))?;
type_stack.push(local_type)?;
},
Instruction::LocalSet(local_idx) => {
let local_type = self.context.get_local(*local_idx)
.ok_or(ValidationError::InvalidLocal(*local_idx))?;
let stack_type = type_stack.pop()
.ok_or(ValidationError::StackUnderflow)?;
if stack_type != local_type {
return Err(ValidationError::TypeMismatch);
}
},
Instruction::LocalTee(local_idx) => {
let local_type = self.context.get_local(*local_idx)
.ok_or(ValidationError::InvalidLocal(*local_idx))?;
let stack_type = type_stack.peek()
.ok_or(ValidationError::StackUnderflow)?;
if stack_type != local_type {
return Err(ValidationError::TypeMismatch);
}
// Note: tee doesn't pop the value, it stays on stack
},
// Arithmetic operations
Instruction::I32Add | Instruction::I32Sub | Instruction::I32Mul |
Instruction::I32DivS | Instruction::I32DivU => {
let b = type_stack.pop().ok_or(ValidationError::StackUnderflow)?;
let a = type_stack.pop().ok_or(ValidationError::StackUnderflow)?;
if a != ValueType::I32 || b != ValueType::I32 {
return Err(ValidationError::TypeMismatch);
}
type_stack.push(ValueType::I32)?;
},
// Comparison operations
Instruction::I32Eq | Instruction::I32Ne | Instruction::I32LtS |
Instruction::I32LtU | Instruction::I32GtS | Instruction::I32GtU |
Instruction::I32LeS | Instruction::I32LeU | Instruction::I32GeS |
Instruction::I32GeU => {
let b = type_stack.pop().ok_or(ValidationError::StackUnderflow)?;
let a = type_stack.pop().ok_or(ValidationError::StackUnderflow)?;
if a != ValueType::I32 || b != ValueType::I32 {
return Err(ValidationError::TypeMismatch);
}
type_stack.push(ValueType::I32)?; // Comparison result is i32 (0 or 1)
},
Instruction::Drop => {
type_stack.pop().ok_or(ValidationError::StackUnderflow)?;
},
Instruction::Select => {
let condition = type_stack.pop().ok_or(ValidationError::StackUnderflow)?;
let val2 = type_stack.pop().ok_or(ValidationError::StackUnderflow)?;
let val1 = type_stack.pop().ok_or(ValidationError::StackUnderflow)?;
if condition != ValueType::I32 {
return Err(ValidationError::TypeMismatch);
}
if val1 != val2 {
return Err(ValidationError::TypeMismatch);
}
type_stack.push(val1)?;
},
Instruction::Call(func_idx) => {
let func_type = self.context.get_function_type(*func_idx)
.ok_or(ValidationError::InvalidFunction(*func_idx))?;
// Pop parameters in reverse order
for param_type in func_type.params.iter().rev() {
let stack_type = type_stack.pop()
.ok_or(ValidationError::StackUnderflow)?;
if stack_type != *param_type {
return Err(ValidationError::TypeMismatch);
}
}
// Push results
for result_type in &func_type.results {
type_stack.push(*result_type)?;
}
},
Instruction::Block(block_type) => {
let end_types = if let Some(ty) = Self::block_type_to_result_type(*block_type) {
vec![ty]
} else {
vec![]
};
control_stack.push(ControlFrame {
opcode: ControlOpcode::Block,
start_types: Vec::new(),
end_types,
height: type_stack.len(),
unreachable: false,
});
},
Instruction::Loop(block_type) => {
let end_types = if let Some(ty) = Self::block_type_to_result_type(*block_type) {
vec![ty]
} else {
vec![]
};
control_stack.push(ControlFrame {
opcode: ControlOpcode::Loop,
start_types: Vec::new(),
end_types,
height: type_stack.len(),
unreachable: false,
});
},
Instruction::If(block_type) => {
// Pop condition
let condition = type_stack.pop()
.ok_or(ValidationError::StackUnderflow)?;
if condition != ValueType::I32 {
return Err(ValidationError::TypeMismatch);
}
let end_types = if let Some(ty) = Self::block_type_to_result_type(*block_type) {
vec![ty]
} else {
vec![]
};
control_stack.push(ControlFrame {
opcode: ControlOpcode::If,
start_types: Vec::new(),
end_types,
height: type_stack.len(),
unreachable: false,
});
},
Instruction::Else => {
let frame = control_stack.last_mut()
.ok_or(ValidationError::InvalidBlockType)?;
if frame.opcode != ControlOpcode::If {
return Err(ValidationError::InvalidBlockType);
}
// Reset stack to frame height and set up for else branch
type_stack.truncate_to(frame.height);
frame.opcode = ControlOpcode::Else;
frame.unreachable = false;
},
Instruction::End => {
let frame = control_stack.pop()
.ok_or(ValidationError::MissingEnd)?;
// Check that we have the right types for this block's results
for expected_type in frame.end_types.iter().rev() {
let stack_type = type_stack.pop()
.ok_or(ValidationError::StackUnderflow)?;
if stack_type != *expected_type {
return Err(ValidationError::TypeMismatch);
}
}
// Push the block's results back onto the stack
for result_type in &frame.end_types {
type_stack.push(*result_type)?;
}
},
Instruction::Br(label_idx) => {
let target_frame_idx = control_stack.len().saturating_sub(*label_idx as usize + 1);
if target_frame_idx >= control_stack.len() {
return Err(ValidationError::InvalidBranch(*label_idx));
}
let target_frame = &control_stack[target_frame_idx];
let branch_types = if target_frame.opcode == ControlOpcode::Loop {
&target_frame.start_types
} else {
&target_frame.end_types
};
// Check that we have the right types for this branch
for expected_type in branch_types.iter().rev() {
let stack_type = type_stack.pop()
.ok_or(ValidationError::StackUnderflow)?;
if stack_type != *expected_type {
return Err(ValidationError::TypeMismatch);
}
}
// Mark current frame as unreachable
if let Some(frame) = control_stack.last_mut() {
frame.unreachable = true;
}
},
Instruction::BrIf(label_idx) => {
// Pop condition
let condition = type_stack.pop()
.ok_or(ValidationError::StackUnderflow)?;
if condition != ValueType::I32 {
return Err(ValidationError::TypeMismatch);
}
let target_frame_idx = control_stack.len().saturating_sub(*label_idx as usize + 1);
if target_frame_idx >= control_stack.len() {
return Err(ValidationError::InvalidBranch(*label_idx));
}
let target_frame = &control_stack[target_frame_idx];
let branch_types = if target_frame.opcode == ControlOpcode::Loop {
&target_frame.start_types
} else {
&target_frame.end_types
};
// For conditional branch, we need to ensure the stack has the branch types,
// but we don't consume them (they might be needed for the fall-through case)
for (i, expected_type) in branch_types.iter().rev().enumerate() {
let stack_type = type_stack.peek_at(i)
.ok_or(ValidationError::StackUnderflow)?;
if stack_type != *expected_type {
return Err(ValidationError::TypeMismatch);
}
}
},
Instruction::Return => {
let func_frame = control_stack.first()
.ok_or(ValidationError::InvalidFunction(0))?;
// Check that we have the right types for function return
for expected_type in func_frame.end_types.iter().rev() {
let stack_type = type_stack.pop()
.ok_or(ValidationError::StackUnderflow)?;
if stack_type != *expected_type {
return Err(ValidationError::TypeMismatch);
}
}
// Mark current frame as unreachable
if let Some(frame) = control_stack.last_mut() {
frame.unreachable = true;
}
},
// Memory operations
Instruction::I32Load(_) => {
if !self.context.has_memory() {
return Err(ValidationError::InvalidMemory);
}
let addr = type_stack.pop()
.ok_or(ValidationError::StackUnderflow)?;
if addr != ValueType::I32 {
return Err(ValidationError::TypeMismatch);
}
type_stack.push(ValueType::I32)?;
},
Instruction::I32Store(_) => {
if !self.context.has_memory() {
return Err(ValidationError::InvalidMemory);
}
let value = type_stack.pop()
.ok_or(ValidationError::StackUnderflow)?;
let addr = type_stack.pop()
.ok_or(ValidationError::StackUnderflow)?;
if addr != ValueType::I32 || value != ValueType::I32 {
return Err(ValidationError::TypeMismatch);
}
},
_ => {
// For instructions not yet implemented, we'll be conservative
return Err(ValidationError::TypeMismatch);
}
}
Ok(())
}
fn validate_constant_expression(&self, instructions: &[Instruction]) -> Result<(), ValidationError> {
let mut type_stack = TypeStack::new();
for instruction in instructions {
match instruction {
Instruction::I32Const(_) => type_stack.push(ValueType::I32)?,
Instruction::I64Const(_) => type_stack.push(ValueType::I64)?,
Instruction::F32Const(_) => type_stack.push(ValueType::F32)?,
Instruction::F64Const(_) => type_stack.push(ValueType::F64)?,
Instruction::GlobalGet(idx) => {
let global_type = self.context.get_global_type(*idx)
.ok_or(ValidationError::InvalidGlobal(*idx))?;
type_stack.push(global_type)?;
},
Instruction::End => break,
_ => return Err(ValidationError::TypeMismatch), // Only certain instructions allowed in constant expressions
}
}
if type_stack.len() != 1 {
return Err(ValidationError::TypeMismatch);
}
Ok(())
}
fn block_type_to_result_type(block_type: ValueType) -> Option<ValueType> {
// In WebAssembly 1.0, block types are either empty or single value types
// This is a simplified version
Some(block_type)
}
}
#[derive(Debug, Clone, Copy, PartialEq)]
enum ControlOpcode {
Function,
Block,
Loop,
If,
Else,
}
#[derive(Debug, Clone)]
struct ControlFrame {
opcode: ControlOpcode,
start_types: Vec<ValueType>, // Types at start of block
end_types: Vec<ValueType>, // Types at end of block
height: usize, // Stack height when entering this block
unreachable: bool, // Is this block unreachable?
}
Let's implement the supporting structures:
// src/validation/context.rs
use crate::types::{ValueType, FuncType};
use crate::binary::module::{Import, ImportDesc};
use std::collections::HashMap;
pub struct ValidationContext {
types: Vec<FuncType>,
function_types: Vec<u32>, // indices into types
locals: Vec<ValueType>,
has_memory: bool,
globals: HashMap<u32, ValueType>,
}
impl ValidationContext {
pub fn new() -> Self {
Self {
types: Vec::new(),
function_types: Vec::new(),
locals: Vec::new(),
has_memory: false,
globals: HashMap::new(),
}
}
pub fn set_types(&mut self, types: &[FuncType]) {
self.types = types.to_vec();
}
pub fn set_functions(&mut self, functions: &[u32]) {
self.function_types = functions.to_vec();
}
pub fn set_imports(&mut self, imports: &[Import]) {
let mut import_func_count = 0;
for import in imports {
match &import.desc {
ImportDesc::Func(type_idx) => {
self.function_types.insert(import_func_count, *type_idx);
import_func_count += 1;
},
ImportDesc::Memory(_) => {
self.has_memory = true;
},
_ => {}, // Handle other import types as needed
}
}
}
pub fn set_locals(&mut self, locals: &[ValueType]) {
self.locals = locals.to_vec();
}
pub fn get_local(&self, idx: u32) -> Option<ValueType> {
self.locals.get(idx as usize).copied()
}
pub fn get_function_type(&self, func_idx: u32) -> Option<&FuncType> {
let type_idx = self.function_types.get(func_idx as usize)?;
self.types.get(*type_idx as usize)
}
pub fn has_memory(&self) -> bool {
self.has_memory
}
pub fn get_global_type(&self, global_idx: u32) -> Option<ValueType> {
self.globals.get(&global_idx).copied()
}
}
// src/validation/stack.rs
use crate::types::ValueType;
use super::ValidationError;
pub struct TypeStack {
types: Vec<ValueType>,
}
impl TypeStack {
pub fn new() -> Self {
Self {
types: Vec::new(),
}
}
pub fn push(&mut self, ty: ValueType) -> Result<(), ValidationError> {
// WebAssembly has a stack size limit to prevent infinite loops during validation
if self.types.len() >= 1024 {
return Err(ValidationError::StackOverflow);
}
self.types.push(ty);
Ok(())
}
pub fn pop(&mut self) -> Option<ValueType> {
self.types.pop()
}
pub fn peek(&self) -> Option<ValueType> {
self.types.last().copied()
}
pub fn peek_at(&self, depth: usize) -> Option<ValueType> {
if depth < self.types.len() {
Some(self.types[self.types.len() - 1 - depth])
} else {
None
}
}
pub fn len(&self) -> usize {
self.types.len()
}
pub fn truncate_to(&mut self, height: usize) {
self.types.truncate(height);
}
pub fn types(&self) -> &[ValueType] {
&self.types
}
}
Testing Validation
Let's create tests that verify our validation works correctly:
#[cfg(test)]
mod tests {
use super::*;
use crate::types::{ValueType, FuncType};
use crate::binary::module::FunctionCode;
#[test]
fn test_valid_function() {
let mut validator = Validator::new();
// Create a simple valid function: (param i32 i32) (result i32) local.get 0 local.get 1 i32.add
let func_type = FuncType {
params: vec![ValueType::I32, ValueType::I32],
results: vec![ValueType::I32],
};
let instructions = vec![
Instruction::LocalGet(0),
Instruction::LocalGet(1),
Instruction::I32Add,
Instruction::End,
];
let func_code = create_function_code_for_instructions(&instructions);
validator.context.set_types(&[func_type.clone()]);
let result = validator.validate_function(&func_code, &func_type, 0);
assert!(result.is_ok());
}
#[test]
fn test_type_mismatch() {
let mut validator = Validator::new();
// Try to add i32 and f32 - should fail
let func_type = FuncType {
params: vec![ValueType::I32, ValueType::F32],
results: vec![ValueType::I32],
};
let instructions = vec![
Instruction::LocalGet(0),
Instruction::LocalGet(1),
Instruction::I32Add, // This should fail - can't add i32 and f32
Instruction::End,
];
let func_code = create_function_code_for_instructions(&instructions);
validator.context.set_types(&[func_type.clone()]);
let result = validator.validate_function(&func_code, &func_type, 0);
assert!(matches!(result, Err(ValidationError::TypeMismatch)));
}
#[test]
fn test_stack_underflow() {
let mut validator = Validator::new();
// Try to pop from empty stack
let func_type = FuncType {
params: vec![],
results: vec![ValueType::I32],
};
let instructions = vec![
Instruction::I32Add, // No operands on stack - should fail
Instruction::End,
];
let func_code = create_function_code_for_instructions(&instructions);
validator.context.set_types(&[func_type.clone()]);
let result = validator.validate_function(&func_code, &func_type, 0);
assert!(matches!(result, Err(ValidationError::StackUnderflow)));
}
fn create_function_code_for_instructions(instructions: &[Instruction]) -> FunctionCode {
// This is a helper to create function code for testing
// In practice, you'd encode the instructions to bytes
FunctionCode {
locals: vec![],
body: vec![], // This would contain the encoded instructions
}
}
}
Advanced Features and Extensions
WebAssembly continues to evolve beyond its initial MVP. Several important extensions have been proposed and implemented, including:
Multiple return values - Functions can return multiple values
Bulk memory operations - Efficient memory copying and initialization
Sign-extension operators - Additional integer operations
Reference types - Support for opaque host references
Garbage Collection - Managed objects and automatic memory management
Let's implement support for some of these features:
// src/instructions.rs (extensions)
impl Instruction {
pub fn parse<R: Read>(reader: &mut R) -> io::Result<Self> {
let mut opcode_byte = [0u8; 1];
reader.read_exact(&mut opcode_byte)?;
let opcode = opcode_byte[0];
match opcode {
// ... existing opcodes ...
// Bulk memory operations (prefix 0xFC)
0xFC => {
let sub_opcode = decoder::read_u32(reader)?;
match sub_opcode {
8 => {
// memory.init
let segment_idx = decoder::read_u32(reader)?;
let memory_idx = decoder::read_u32(reader)?;
Ok(Instruction::MemoryInit(segment_idx, memory_idx))
},
9 => {
// data.drop
let segment_idx = decoder::read_u32(reader)?;
Ok(Instruction::DataDrop(segment_idx))
},
10 => {
// memory.copy
let dst_mem = decoder::read_u32(reader)?;
let src_mem = decoder::read_u32(reader)?;
Ok(Instruction::MemoryCopy(dst_mem, src_mem))
},
11 => {
// memory.fill
let memory_idx = decoder::read_u32(reader)?;
Ok(Instruction::MemoryFill(memory_idx))
},
_ => Err(io::Error::new(
io::ErrorKind::InvalidData,
format!("Unknown bulk memory opcode: {}", sub_opcode)
)),
}
},
// Sign-extension operations
0xC
// Sign-extension operations
0xC0 => Ok(Instruction::I32Extend8S),
0xC1 => Ok(Instruction::I32Extend16S),
0xC2 => Ok(Instruction::I64Extend8S),
0xC3 => Ok(Instruction::I64Extend16S),
0xC4 => Ok(Instruction::I64Extend32S),
// Reference types
0xD0 => Ok(Instruction::RefNull),
0xD1 => Ok(Instruction::RefIsNull),
0xD2 => {
let func_idx = decoder::read_u32(reader)?;
Ok(Instruction::RefFunc(func_idx))
},
_ => Err(io::Error::new(
io::ErrorKind::InvalidData,
format!("Unknown opcode: 0x{:02X}", opcode)
)),
}
}
}
// Add new instruction variants
#[derive(Debug, Clone, PartialEq)]
pub enum Instruction {
// ... existing variants ...
// Bulk memory operations
MemoryInit(u32, u32), // segment_idx, memory_idx
DataDrop(u32), // segment_idx
MemoryCopy(u32, u32), // dst_memory, src_memory
MemoryFill(u32), // memory_idx
// Sign extension
I32Extend8S,
I32Extend16S,
I64Extend8S,
I64Extend16S,
I64Extend32S,
// Reference types
RefNull,
RefIsNull,
RefFunc(u32),
}
Multi-Value Support
One of the most significant extensions is multi-value support, allowing functions to return multiple values. This requires updates to our type system:
// src/types.rs (extensions)
#[derive(Debug, Clone, PartialEq)]
pub enum BlockType {
Empty,
Type(ValueType),
TypeIndex(u32), // For multi-value blocks
}
impl BlockType {
pub fn from_byte(byte: i8) -> Self {
match byte {
-64 => BlockType::Empty, // 0x40 as signed
-1 => BlockType::Type(ValueType::I32), // 0x7F
-2 => BlockType::Type(ValueType::I64), // 0x7E
-3 => BlockType::Type(ValueType::F32), // 0x7D
-4 => BlockType::Type(ValueType::F64), // 0x7C
_ if byte >= 0 => BlockType::TypeIndex(byte as u32),
_ => BlockType::Empty, // Invalid, default to empty
}
}
pub fn result_types(&self, types: &[FuncType]) -> Vec<ValueType> {
match self {
BlockType::Empty => vec![],
BlockType::Type(ty) => vec![*ty],
BlockType::TypeIndex(idx) => {
types.get(*idx as usize)
.map(|ft| ft.results.clone())
.unwrap_or_default()
}
}
}
}
Implementing Bulk Memory Operations
Bulk memory operations provide efficient ways to initialize and copy memory. Let's implement these in our interpreter:
// src/interpreter/memory.rs (additions)
impl Memory {
pub fn init_from_data(&mut self, dest: u32, src_offset: u32, len: u32, data: &[u8])
-> Result<(), MemoryError> {
let dest_addr = dest as usize;
let src_start = src_offset as usize;
let copy_len = len as usize;
if dest_addr + copy_len > self.data.len() {
return Err(MemoryError::OutOfBounds);
}
if src_start + copy_len > data.len() {
return Err(MemoryError::OutOfBounds);
}
self.data[dest_addr..dest_addr + copy_len]
.copy_from_slice(&data[src_start..src_start + copy_len]);
Ok(())
}
pub fn copy_within(&mut self, dest: u32, src: u32, len: u32) -> Result<(), MemoryError> {
let dest_addr = dest as usize;
let src_addr = src as usize;
let copy_len = len as usize;
if dest_addr + copy_len > self.data.len() || src_addr + copy_len > self.data.len() {
return Err(MemoryError::OutOfBounds);
}
// Use memmove semantics (handles overlapping regions correctly)
if dest_addr != src_addr {
if dest_addr < src_addr {
// Copy forward
for i in 0..copy_len {
self.data[dest_addr + i] = self.data[src_addr + i];
}
} else {
// Copy backward to handle overlap
for i in (0..copy_len).rev() {
self.data[dest_addr + i] = self.data[src_addr + i];
}
}
}
Ok(())
}
pub fn fill(&mut self, dest: u32, value: u8, len: u32) -> Result<(), MemoryError> {
let dest_addr = dest as usize;
let fill_len = len as usize;
if dest_addr + fill_len > self.data.len() {
return Err(MemoryError::OutOfBounds);
}
self.data[dest_addr..dest_addr + fill_len].fill(value);
Ok(())
}
}
Now let's implement the bulk memory instructions in our interpreter:
// src/interpreter/mod.rs (bulk memory additions)
impl WasmInterpreter {
fn execute_function(&mut self) -> Result<Option<Value>, ExecutionError> {
// ... existing code ...
match instruction {
// ... existing cases ...
Instruction::MemoryInit(segment_idx, _memory_idx) => {
let len = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
let src_offset = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
let dest = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
if let (Value::I32(dest), Value::I32(src_offset), Value::I32(len)) = (dest, src_offset, len) {
let module = self.module.as_ref().ok_or(ExecutionError::MemoryNotDefined)?;
let data_segment = module.data.get(segment_idx as usize)
.ok_or(ExecutionError::InvalidDataSegment)?;
let memory = self.memory.as_mut().ok_or(ExecutionError::MemoryNotDefined)?;
memory.init_from_data(
dest as u32,
src_offset as u32,
len as u32,
&data_segment.data
).map_err(|_| ExecutionError::MemoryOutOfBounds)?;
} else {
return Err(ExecutionError::TypeMismatch);
}
},
Instruction::MemoryCopy(_dst_mem, _src_mem) => {
let len = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
let src = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
let dest = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
if let (Value::I32(dest), Value::I32(src), Value::I32(len)) = (dest, src, len) {
let memory = self.memory.as_mut().ok_or(ExecutionError::MemoryNotDefined)?;
memory.copy_within(dest as u32, src as u32, len as u32)
.map_err(|_| ExecutionError::MemoryOutOfBounds)?;
} else {
return Err(ExecutionError::TypeMismatch);
}
},
Instruction::MemoryFill(_memory_idx) => {
let len = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
let value = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
let dest = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
if let (Value::I32(dest), Value::I32(value), Value::I32(len)) = (dest, value, len) {
let memory = self.memory.as_mut().ok_or(ExecutionError::MemoryNotDefined)?;
memory.fill(dest as u32, value as u8, len as u32)
.map_err(|_| ExecutionError::MemoryOutOfBounds)?;
} else {
return Err(ExecutionError::TypeMismatch);
}
},
// Sign extension operations
Instruction::I32Extend8S => {
let value = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
if let Value::I32(val) = value {
// Sign-extend from 8 bits
let result = (val as i8) as i32;
self.stack.push(Value::I32(result));
} else {
return Err(ExecutionError::TypeMismatch);
}
},
Instruction::I32Extend16S => {
let value = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
if let Value::I32(val) = value {
// Sign-extend from 16 bits
let result = (val as i16) as i32;
self.stack.push(Value::I32(result));
} else {
return Err(ExecutionError::TypeMismatch);
}
},
// ... other sign extension operations ...
_ => {
return Err(ExecutionError::UnknownInstruction);
}
}
}
}
// Add new error variants
#[derive(Debug)]
pub enum ExecutionError {
// ... existing variants ...
InvalidDataSegment,
UnknownInstruction,
}
Last updated