Building an Interpreter

in rust of course

What is WebAssembly, Really?

Before we start writing code, we need to understand what WebAssembly actually is. Most explanations begin with phrases like "assembly-like language" or "virtual machine," but these descriptions miss the fundamental point. WebAssembly is better understood as a new abstraction boundary—a way of dividing computing systems into composable pieces.

Think about the boundaries you already know. The Linux ABI doesn't care whether your program was written in C, Rust, or even Brainfuck—it just handles system calls and schedules process time. HTTP doesn't care if your server is written in Go or your client is Firefox or curl. The C ABI allows libraries written in different languages to work together seamlessly.

WebAssembly occupies a unique point in this space. Unlike the Linux ABI, there's no fixed set of system calls. Unlike HTTP, WebAssembly modules run in the same process as their host, enabling synchronous function calls in both directions. Unlike the C ABI, WebAssembly modules have only the shared state they're explicitly given.

This abstraction boundary has profound implications. It means you can run untrusted code safely, compose modules from different programming languages, and target everything from web browsers to IoT devices to serverless platforms with the same binary format.

The Stack Machine Model

WebAssembly is a stack-based virtual machine. This means that instead of having named registers like x86 or ARM, operations work by pushing values onto an implicit stack and popping them off as needed.

Consider this simple addition in different models:

# Register-based (x86-like)
mov %eax, 5
mov %ebx, 3  
add %eax, %ebx    ; result in %eax

# Stack-based (WebAssembly)
i32.const 5       ; push 5
i32.const 3       ; push 3
i32.add           ; pop two values, push sum

The stack model has several advantages for a portable bytecode:

  • No need to specify register allocation

  • Compact encoding (most instructions are single bytes)

  • Easy to validate and analyze statically

  • Maps well to expression-based languages

Module Structure

WebAssembly organizes code into modules. Each module is a self-contained unit with explicit imports and exports. Think of a module as similar to a shared library or a package—it encapsulates functionality and exposes a clean interface.

A module contains several sections:

  • Type section: Function signatures used in the module

  • Function section: Declarations of functions defined in the module

  • Memory section: Linear memory regions

  • Global section: Global variables

  • Export section: Names and types of exports

  • Code section: The actual function bodies

  • Data section: Initial data for memory regions

This might seem like a lot of complexity, but each section serves a specific purpose in making WebAssembly fast to load, validate, and execute.

Why These Design Decisions Matter

Every design decision in WebAssembly was made for specific reasons. The stack model enables compact encoding and eliminates register allocation complexity. The module system with explicit imports and exports enables secure sandboxing and efficient linking. The type system enables ahead-of-time validation and optimization.

As we build our interpreter, we'll see how these decisions affect the implementation at every level. The binary format reflects the conceptual model, which makes parsing straightforward. The instruction set maps cleanly to the stack model, which makes the interpreter loop elegant. The type system catches errors early, which makes the whole system more robust.

Now let's start building.

The Binary Format

WebAssembly modules are distributed in a compact binary format. While there's also a textual representation (WAT), the binary format is what actually gets executed. To build an interpreter, we need to understand this format inside and out.

Let's start with the simplest possible WebAssembly module and examine its binary representation:

            0x42 => {
                let value = decoder::read_i64(reader)?;
                Ok(Instruction::I64Const(value))
            },
            0x43 => {
                let mut bytes = [0u8; 4];
                reader.read_exact(&mut bytes)?;
                let value = f32::from_le_bytes(bytes);
                Ok(Instruction::F32Const(value))
            },
            0x44 => {
                let mut bytes = [0u8; 8];
                reader.read_exact(&mut bytes)?;
                let value = f64::from_le_bytes(bytes);
                Ok(Instruction::F64Const(value))
            },

            // Memory operations
            0x28 => {
                let align = decoder::read_u32(reader)?;
                let offset = decoder::read_u32(reader)?;
                Ok(Instruction::I32Load(MemArg { align, offset }))
            },
            0x36 => {
                let align = decoder::read_u32(reader)?;
                let offset = decoder::read_u32(reader)?;
                Ok(Instruction::I32Store(MemArg { align, offset }))
            },

            // Numeric operations - i32
            0x6A => Ok(Instruction::I32Add),
            0x6B => Ok(Instruction::I32Sub),
            0x6C => Ok(Instruction::I32Mul),
            0x6D => Ok(Instruction::I32DivS),
            0x6E => Ok(Instruction::I32DivU),
            0x6F => Ok(Instruction::I32RemS),
            0x70 => Ok(Instruction::I32RemU),
            0x71 => Ok(Instruction::I32And),
            0x72 => Ok(Instruction::I32Or),
            0x73 => Ok(Instruction::I32Xor),
            0x74 => Ok(Instruction::I32Shl),
            0x75 => Ok(Instruction::I32ShrS),
            0x76 => Ok(Instruction::I32ShrU),
            0x77 => Ok(Instruction::I32Rotl),
            0x78 => Ok(Instruction::I32Rotr),

            // Comparison operations
            0x46 => Ok(Instruction::I32Eq),
            0x47 => Ok(Instruction::I32Ne),
            0x48 => Ok(Instruction::I32LtS),
            0x49 => Ok(Instruction::I32LtU),
            0x4A => Ok(Instruction::I32GtS),
            0x4B => Ok(Instruction::I32GtU),
            0x4C => Ok(Instruction::I32LeS),
            0x4D => Ok(Instruction::I32LeU),
            0x4E => Ok(Instruction::I32GeS),
            0x4F => Ok(Instruction::I32GeU),

            _ => Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("Unknown opcode: 0x{:02X}", opcode)
            )),
        }
    }

    /// Parse a sequence of instructions until 'end'
    pub fn parse_expression<R: Read>(reader: &mut R) -> io::Result<Vec<Self>> {
        let mut instructions = Vec::new();
        
        loop {
            let instr = Self::parse(reader)?;
            let is_end = matches!(instr, Instruction::End);
            instructions.push(instr);
            
            if is_end {
                break;
            }
        }
        
        Ok(instructions)
    }
}

Understanding Stack-Based Execution

Before we implement the interpreter, let's understand how stack-based execution works. Consider this simple WebAssembly function:

(func (param i32 i32) (result i32)
  local.get 0    ;; push param 0 onto stack
  local.get 1    ;; push param 1 onto stack  
  i32.add        ;; pop two values, push sum
)

The execution proceeds as follows:

  1. local.get 0: Stack becomes [param0]

  2. local.get 1: Stack becomes [param0, param1]

  3. i32.add: Stack becomes [param0 + param1]

The final value on the stack is the function's return value.

This model has several advantages:

  • No register allocation needed

  • Compact instruction encoding

  • Easy to validate statically

  • Natural fit for expression-based languages

Let's implement our stack machine:

// src/interpreter/mod.rs
use std::collections::HashMap;
use crate::types::{Value, ValueType};
use crate::instructions::Instruction;
use crate::binary::Module;

pub mod stack;
pub mod frame;

use stack::Stack;
use frame::{Frame, Label};

#[derive(Debug)]
pub struct WasmInterpreter {
    stack: Stack<Value>,
    call_stack: Vec<Frame>,
    module: Option<Module>,
}

#[derive(Debug)]
pub enum ExecutionError {
    StackUnderflow,
    TypeMismatch,
    InvalidLocal(u32),
    InvalidFunction(u32),
    DivisionByZero,
    IntegerOverflow,
    Unreachable,
}

impl WasmInterpreter {
    pub fn new() -> Self {
        Self {
            stack: Stack::new(),
            call_stack: Vec::new(),
            module: None,
        }
    }

    pub fn load_module(&mut self, module: Module) {
        self.module = Some(module);
    }

    pub fn call_function(&mut self, func_name: &str, args: Vec<Value>) 
        -> Result<Option<Value>, ExecutionError> {
        
        let module = self.module.as_ref()
            .ok_or(ExecutionError::InvalidFunction(0))?;

        // Find the function by export name
        let func_idx = module.exports
            .iter()
            .find(|export| export.name == func_name)
            .and_then(|export| match export.desc {
                crate::binary::module::ExportDesc::Func(idx) => Some(idx),
                _ => None,
            })
            .ok_or(ExecutionError::InvalidFunction(0))?;

        self.call_function_by_index(func_idx, args)
    }

    pub fn call_function_by_index(&mut self, func_idx: u32, args: Vec<Value>) 
        -> Result<Option<Value>, ExecutionError> {
        
        let module = self.module.as_ref()
            .ok_or(ExecutionError::InvalidFunction(func_idx))?;

        // Get function type
        let type_idx = *module.functions.get(func_idx as usize)
            .ok_or(ExecutionError::InvalidFunction(func_idx))?;
        
        let func_type = module.types.get(type_idx as usize)
            .ok_or(ExecutionError::InvalidFunction(func_idx))?;

        // Validate arguments
        if args.len() != func_type.params.len() {
            return Err(ExecutionError::TypeMismatch);
        }

        for (arg, expected_type) in args.iter().zip(&func_type.params) {
            if arg.value_type() != *expected_type {
                return Err(ExecutionError::TypeMismatch);
            }
        }

        // Get function code
        let func_code = module.code.get(func_idx as usize)
            .ok_or(ExecutionError::InvalidFunction(func_idx))?;

        // Parse instructions from the function body
        let mut cursor = std::io::Cursor::new(&func_code.body);
        let instructions = Instruction::parse_expression(&mut cursor)
            .map_err(|_| ExecutionError::InvalidFunction(func_idx))?;

        // Set up function frame
        let mut locals = args.clone();
        
        // Add space for local variables
        for (count, value_type) in &func_code.locals {
            let default_value = match value_type {
                ValueType::I32 => Value::I32(0),
                ValueType::I64 => Value::I64(0),
                ValueType::F32 => Value::F32(0.0),
                ValueType::F64 => Value::F64(0.0),
            };
            
            for _ in 0..*count {
                locals.push(default_value);
            }
        }

        let frame = Frame::new(locals, &instructions);
        self.call_stack.push(frame);

        // Execute the function
        let result = self.execute_function()?;

        // Clean up
        self.call_stack.pop();

        Ok(result)
    }

    fn execute_function(&mut self) -> Result<Option<Value>, ExecutionError> {
        while let Some(frame) = self.call_stack.last_mut() {
            if frame.pc >= frame.instructions.len() {
                break;
            }

            let instruction = frame.instructions[frame.pc].clone();
            frame.pc += 1;

            match instruction {
                Instruction::Nop => {},
                
                Instruction::LocalGet(idx) => {
                    let value = frame.locals.get(idx as usize)
                        .ok_or(ExecutionError::InvalidLocal(idx))?;
                    self.stack.push(*value);
                },

                Instruction::LocalSet(idx) => {
                    let value = self.stack.pop()
                        .ok_or(ExecutionError::StackUnderflow)?;
                    
                    if idx as usize >= frame.locals.len() {
                        return Err(ExecutionError::InvalidLocal(idx));
                    }
                    
                    frame.locals[idx as usize] = value;
                },

                Instruction::LocalTee(idx) => {
                    let value = self.stack.peek()
                        .ok_or(ExecutionError::StackUnderflow)?;
                    
                    if idx as usize >= frame.locals.len() {
                        return Err(ExecutionError::InvalidLocal(idx));
                    }
                    
                    frame.locals[idx as usize] = *value;
                },

                Instruction::I32Const(value) => {
                    self.stack.push(Value::I32(value));
                },

                Instruction::I64Const(value) => {
                    self.stack.push(Value::I64(value));
                },

                Instruction::F32Const(value) => {
                    self.stack.push(Value::F32(value));
                },

                Instruction::F64Const(value) => {
                    self.stack.push(Value::F64(value));
                },

                Instruction::I32Add => {
                    let b = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
                    let a = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
                    
                    match (a, b) {
                        (Value::I32(a), Value::I32(b)) => {
                            // Use wrapping arithmetic to match WebAssembly semantics
                            self.stack.push(Value::I32(a.wrapping_add(b)));
                        },
                        _ => return Err(ExecutionError::TypeMismatch),
                    }
                },

                Instruction::I32Sub => {
                    let b = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
                    let a = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
                    
                    match (a, b) {
                        (Value::I32(a), Value::I32(b)) => {
                            self.stack.push(Value::I32(a.wrapping_sub(b)));
                        },
                        _ => return Err(ExecutionError::TypeMismatch),
                    }
                },

                Instruction::I32Mul => {
                    let b = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
                    let a = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
                    
                    match (a, b) {
                        (Value::I32(a), Value::I32(b)) => {
                            self.stack.push(Value::I32(a.wrapping_mul(b)));
                        },
                        _ => return Err(ExecutionError::TypeMismatch),
                    }
                },

                Instruction::I32DivS => {
                    let b = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
                    let a = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
                    
                    match (a, b) {
                        (Value::I32(a), Value::I32(b)) => {
                            if b == 0 {
                                return Err(ExecutionError::DivisionByZero);
                            }
                            // Check for overflow: i32::MIN / -1
                            if a == i32::MIN && b == -1 {
                                return Err(ExecutionError::IntegerOverflow);
                            }
                            self.stack.push(Value::I32(a / b));
                        },
                        _ => return Err(ExecutionError::TypeMismatch),
                    }
                },

                Instruction::I32Eq => {
                    let b = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
                    let a = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
                    
                    match (a, b) {
                        (Value::I32(a), Value::I32(b)) => {
                            self.stack.push(Value::I32(if a == b { 1 } else { 0 }));
                        },
                        _ => return Err(ExecutionError::TypeMismatch),
                    }
                },

                Instruction::Drop => {
                    self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
                },

                Instruction::Return => {
                    // Return the top value (if any) and exit function
                    return Ok(self.stack.pop());
                },

                Instruction::End => {
                    // End of block/function - for now, treat like return
                    return Ok(self.stack.pop());
                },

                Instruction::Unreachable => {
                    return Err(ExecutionError::Unreachable);
                },

                _ => {
                    // TODO: Implement remaining instructions
                    unimplemented!("Instruction not yet implemented: {:?}", instruction);
                }
            }
        }

        // Function completed normally, return top of stack if present
        Ok(self.stack.pop())
    }
}

Let's implement the supporting structures:

// src/interpreter/stack.rs
use std::collections::VecDeque;

#[derive(Debug)]
pub struct Stack<T> {
    items: VecDeque<T>,
}

impl<T> Stack<T> {
    pub fn new() -> Self {
        Self {
            items: VecDeque::new(),
        }
    }

    pub fn push(&mut self, item: T) {
        self.items.push_back(item);
    }

    pub fn pop(&mut self) -> Option<T> {
        self.items.pop_back()
    }

    pub fn peek(&self) -> Option<&T> {
        self.items.back()
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }
}

// src/interpreter/frame.rs
use crate::types::Value;
use crate::instructions::Instruction;

#[derive(Debug)]
pub struct Frame {
    pub locals: Vec<Value>,
    pub instructions: Vec<Instruction>,
    pub pc: usize,  // program counter
    pub labels: Vec<Label>,
}

#[derive(Debug)]
pub struct Label {
    pub continuation: usize,  // where to jump for br
    pub arity: u32,           // how many values to keep on stack
}

impl Frame {
    pub fn new(locals: Vec<Value>, instructions: &[Instruction]) -> Self {
        Self {
            locals,
            instructions: instructions.to_vec(),
            pc: 0,
            labels: Vec::new(),
        }
    }
}

Now let's test our interpreter with the simple add function:

// src/lib.rs
#[cfg(test)]
mod tests {
    use super::*;
    use crate::types::Value;

    #[test]
    fn test_simple_add_function() {
        // Create our add module programmatically
        let module_bytes = vec![
            // Magic number and version
            0x00, 0x61, 0x73, 0x6D, 0x01, 0x00, 0x00, 0x00,
            // Type section
            0x01, 0x07,  // section id, size
            0x01,        // 1 type
            0x60, 0x02, 0x7F, 0x7F, 0x01, 0x7F, // (i32, i32) -> i32
            // Function section  
            0x03, 0x02,  // section id, size
            0x01, 0x00,  // 1 function, type 0
            // Export section
            0x07, 0x07,  // section id, size  
            0x01,        // 1 export
            0x03, b'a', b'd', b'd', // name "add"
            0x00, 0x00,  // function 0
            // Code section
            0x0A, 0x09,  // section id, size
            0x01,        // 1 function body
            0x07,        // body size
            0x00,        // 0 locals
            0x20, 0x00,  // local.get 0
            0x20, 0x01,  // local.get 1  
            0x6A,        // i32.add
            0x0B,        // end
        ];
        
        let module = Module::parse(std::io::Cursor::new(module_bytes)).unwrap();
        
        let mut interpreter = WasmInterpreter::new();
        interpreter.load_module(module);
        
        let result = interpreter.call_function(
            "add", 
            vec![Value::I32(5), Value::I32(3)]
        ).unwrap();
        
        assert_eq!(result, Some(Value::I32(8)));
    }
}

This test demonstrates the complete flow:

  1. Parse a binary WebAssembly module

  2. Load it into our interpreter

  3. Call an exported function with arguments

  4. Get back the computed result

Control Flow and Structured Programming

One of WebAssembly's most distinctive features is its structured control flow. Unlike traditional assembly languages with goto-style jumps, WebAssembly uses structured constructs like blocks, loops, and if-statements that can be statically validated.

Understanding WebAssembly's Control Flow Model

WebAssembly's control flow is based on the concept of structured control flow. Every control construct has a clear entry and exit point, and branches can only target specific labels in a stack-disciplined manner.

Consider this WebAssembly function that computes a factorial:

(func $factorial (param i32) (result i32)
  (local i32)                     ;; result accumulator
  
  i32.const 1                     ;; initialize result to 1
  local.set 1                     ;; store in local 1
  
  (block $exit                    ;; block for early exit
    (loop $loop                   ;; main loop
      local.get 0                 ;; get n
      i32.const 1                 ;; compare with 1
      i32.le_s                    ;; n <= 1?
      br_if $exit                 ;; if so, exit
      
      local.get 1                 ;; get current result
      local.get 0                 ;; get n
      i32.mul                     ;; multiply
      local.set 1                 ;; store back to result
      
      local.get 0                 ;; get n
      i32.const 1                 ;; subtract 1
      i32.sub
      local.set 0                 ;; store back to n
      
      br $loop                    ;; continue loop
    )
  )
  
  local.get 1                     ;; return result
)

This example shows several important aspects of WebAssembly's control flow:

  1. Labeled blocks: $exit and $loop are labels that can be targeted by branches

  2. Structured nesting: Control constructs are properly nested

  3. Stack discipline: Branches maintain stack consistency

Let's extend our interpreter to handle control flow:

// src/interpreter/mod.rs (additions)
impl WasmInterpreter {
    fn execute_function(&mut self) -> Result<Option<Value>, ExecutionError> {
        while let Some(frame) = self.call_stack.last_mut() {
            if frame.pc >= frame.instructions.len() {
                break;
            }

            let instruction = frame.instructions[frame.pc].clone();
            frame.pc += 1;

            match instruction {
                // ... existing cases ...

                Instruction::Block(block_type) => {
                    // Find the matching End instruction
                    let end_pc = self.find_matching_end(frame.pc - 1)?;
                    
                    // Create a label for this block
                    let label = Label {
                        continuation: end_pc + 1,
                        arity: if matches!(block_type, ValueType::I32 | ValueType::I64 | 
                                         ValueType::F32 | ValueType::F64) { 1 } else { 0 },
                    };
                    
                    frame.labels.push(label);
                },

                Instruction::Loop(block_type) => {
                    // For loops, the continuation points back to the loop start
                    let label = Label {
                        continuation: frame.pc - 1, // Point back to the loop
                        arity: 0, // Loops don't produce values on break
                    };
                    
                    frame.labels.push(label);
                },

                Instruction::If(block_type) => {
                    let condition = self.stack.pop()
                        .ok_or(ExecutionError::StackUnderflow)?;
                    
                    let condition_true = match condition {
                        Value::I32(val) => val != 0,
                        Value::I64(val) => val != 0,
                        _ => return Err(ExecutionError::TypeMismatch),
                    };

                    if condition_true {
                        // Execute the if branch - find else or end
                        let (else_pc, end_pc) = self.find_if_structure(frame.pc - 1)?;
                        
                        let label = Label {
                            continuation: end_pc + 1,
                            arity: if matches!(block_type, ValueType::I32 | ValueType::I64 | 
                                             ValueType::F32 | ValueType::F64) { 1 } else { 0 },
                        };
                        frame.labels.push(label);
                        
                        // Continue executing if branch
                    } else {
                        // Skip to else branch or end
                        let (else_pc, end_pc) = self.find_if_structure(frame.pc - 1)?;
                        
                        if let Some(else_pc) = else_pc {
                            frame.pc = else_pc + 1; // Skip the else instruction
                            let label = Label {
                                continuation: end_pc + 1,
                                arity: if matches!(block_type, ValueType::I32 | ValueType::I64 | 
                                                 ValueType::F32 | ValueType::F64) { 1 } else { 0 },
                            };
                            frame.labels.push(label);
                        } else {
                            frame.pc = end_pc + 1; // Skip to after end
                        }
                    }
                },

                Instruction::Else => {
                    // When executing else, we've finished the if branch
                    // Jump to the end of the if statement
                    let end_pc = self.find_matching_end_for_else(frame.pc - 1)?;
                    frame.pc = end_pc + 1;
                },

                Instruction::End => {
                    // Pop the current label
                    frame.labels.pop();
                },

                Instruction::Br(label_idx) => {
                    let target_label = frame.labels.len().saturating_sub(label_idx as usize + 1);
                    
                    if target_label >= frame.labels.len() {
                        return Err(ExecutionError::InvalidFunction(0)); // Invalid label
                    }
                    
                    let label = &frame.labels[target_label];
                    
                    // Handle stack unwinding for the branch
                    self.unwind_stack_for_branch(label.arity);
                    
                    // Jump to the label's continuation
                    frame.pc = label.continuation;
                    
                    // Remove labels that we're breaking out of
                    frame.labels.truncate(target_label);
                },

                Instruction::BrIf(label_idx) => {
                    let condition = self.stack.pop()
                        .ok_or(ExecutionError::StackUnderflow)?;
                    
                    let should_branch = match condition {
                        Value::I32(val) => val != 0,
                        Value::I64(val) => val != 0,
                        _ => return Err(ExecutionError::TypeMismatch),
                    };

                    if should_branch {
                        let target_label = frame.labels.len().saturating_sub(label_idx as usize + 1);
                        
                        if target_label >= frame.labels.len() {
                            return Err(ExecutionError::InvalidFunction(0));
                        }
                        
                        let label = &frame.labels[target_label];
                        
                        self.unwind_stack_for_branch(label.arity);
                        frame.pc = label.continuation;
                        frame.labels.truncate(target_label);
                    }
                },

                // ... rest of existing cases ...
            }
        }

        Ok(self.stack.pop())
    }

    fn find_matching_end(&self, start_pc: usize) -> Result<usize, ExecutionError> {
        let frame = self.call_stack.last()
            .ok_or(ExecutionError::InvalidFunction(0))?;
        
        let mut depth = 0;
        let mut pc = start_pc + 1;
        
        while pc < frame.instructions.len() {
            match &frame.instructions[pc] {
                Instruction::Block(_) | Instruction::Loop(_) | Instruction::If(_) => {
                    depth += 1;
                },
                Instruction::End => {
                    if depth == 0 {
                        return Ok(pc);
                    }
                    depth -= 1;
                },
                _ => {},
            }
            pc += 1;
        }
        
        Err(ExecutionError::InvalidFunction(0)) // No matching end found
    }

    fn find_if_structure(&self, if_pc: usize) -> Result<(Option<usize>, usize), ExecutionError> {
        let frame = self.call_stack.last()
            .ok_or(ExecutionError::InvalidFunction(0))?;
        
        let mut depth = 0;
        let mut pc = if_pc + 1;
        let mut else_pc = None;
        
        while pc < frame.instructions.len() {
            match &frame.instructions[pc] {
                Instruction::Block(_) | Instruction::Loop(_) | Instruction::If(_) => {
                    depth += 1;
                },
                Instruction::Else => {
                    if depth == 0 && else_pc.is_none() {
                        else_pc = Some(pc);
                    }
                },
                Instruction::End => {
                    if depth == 0 {
                        return Ok((else_pc, pc));
                    }
                    depth -= 1;
                },
                _ => {},
            }
            pc += 1;
        }
        
        Err(ExecutionError::InvalidFunction(0))
    }

    fn find_matching_end_for_else(&self, else_pc: usize) -> Result<usize, ExecutionError> {
        let frame = self.call_stack.last()
            .ok_or(ExecutionError::InvalidFunction(0))?;
        
        let mut depth = 0;
        let mut pc = else_pc + 1;
        
        while pc < frame.instructions.len() {
            match &frame.instructions[pc] {
                Instruction::Block(_) | Instruction::Loop(_) | Instruction::If(_) => {
                    depth += 1;
                },
                Instruction::End => {
                    if depth == 0 {
                        return Ok(pc);
                    }
                    depth -= 1;
                },
                _ => {},
            }
            pc += 1;
        }
        
        Err(ExecutionError::InvalidFunction(0))
    }

    fn unwind_stack_for_branch(&mut self, arity: u32) {
        // Keep the top `arity` values, discard the rest since entering this block
        let keep_count = arity as usize;
        let current_size = self.stack.len();
        
        if keep_count > current_size {
            return; // Nothing to unwind
        }
        
        // This is a simplified version - in a full implementation,
        // we'd need to track stack heights when entering blocks
        let mut values_to_keep = Vec::new();
        for _ in 0..keep_count {
            if let Some(value) = self.stack.pop() {
                values_to_keep.push(value);
            }
        }
        
        // Push the kept values back in correct order
        for value in values_to_keep.into_iter().rev() {
            self.stack.push(value);
        }
    }
}

Testing Control Flow

Let's test our control flow implementation with a simple conditional:

#[cfg(test)]
mod tests {
    use super::*;
    
    #[test]
    fn test_conditional_execution() {
        // Test module with an if statement
        // (func (param i32) (result i32)
        //   local.get 0
        //   if (result i32)
        //     i32.const 100
        //   else
        //     i32.const 200
        //   end
        // )
        
        let module_bytes = create_conditional_module();
        let module = Module::parse(std::io::Cursor::new(module_bytes)).unwrap();
        
        let mut interpreter = WasmInterpreter::new();
        interpreter.load_module(module);
        
        // Test true condition
        let result = interpreter.call_function(
            "test_if", 
            vec![Value::I32(1)]
        ).unwrap();
        assert_eq!(result, Some(Value::I32(100)));
        
        // Test false condition  
        let result = interpreter.call_function(
            "test_if",
            vec![Value::I32(0)]
        ).unwrap();
        assert_eq!(result, Some(Value::I32(200)));
    }
    
    fn create_conditional_module() -> Vec<u8> {
        // This would be a more complex binary with if/else instructions
        // For brevity, we'll assume this exists
        vec![] // Placeholder
    }
}

The control flow implementation shows several

                7 => module.parse_export_section(&mut section_reader)?,
                10 => module.parse_code_section(&mut section_reader)?,
                11 => module.parse_data_section(&mut section_reader)?,    // Data section
                _ => {
                    eprintln!("Skipping unknown section: {}", section_id);
                }
            }
        }

        Ok(module)
    }
}

Testing Function Calls

Let's create a comprehensive test that demonstrates function calls and imports:

#[cfg(test)]
mod tests {
    use super::*;
    
    #[test]
    fn test_function_calls() {
        // Create a module with multiple functions that call each other
        // func 0 (imported): print_i32(i32) -> void
        // func 1: double(i32) -> i32  { local.get 0; i32.const 2; i32.mul }
        // func 2: test() -> void { i32.const 21; call 1; call 0 }
        
        let module_bytes = create_function_call_module();
        let module = Module::parse(std::io::Cursor::new(module_bytes)).unwrap();
        
        let mut interpreter = WasmInterpreter::new();
        interpreter.load_module(module);
        
        // This should print "42" by calling test() -> double(21) -> print_i32(42)
        let result = interpreter.call_function("test", vec![]).unwrap();
        assert_eq!(result, None); // void return
    }
    
    #[test]
    fn test_recursive_function() {
        // Test a recursive factorial function
        let module_bytes = create_factorial_module();
        let module = Module::parse(std::io::Cursor::new(module_bytes)).unwrap();
        
        let mut interpreter = WasmInterpreter::new();
        interpreter.load_module(module);
        
        let result = interpreter.call_function(
            "factorial", 
            vec![Value::I32(5)]
        ).unwrap();
        
        assert_eq!(result, Some(Value::I32(120)));
    }
    
    fn create_factorial_module() -> Vec<u8> {
        // This would contain a recursive factorial implementation
        // (func $factorial (param i32) (result i32)
        //   local.get 0
        //   i32.const 1
        //   i32.le_s
        //   if (result i32)
        //     i32.const 1
        //   else
        //     local.get 0
        //     local.get 0
        //     i32.const 1
        //     i32.sub
        //     call $factorial
        //     i32.mul
        //   end
        // )
        vec![] // Placeholder
    }
}

Validation and Type Safety

One of WebAssembly's key feature is its validation algorithm. Before execution, every WebAssembly module is validated to ensure it's well-formed and type-safe. This validation happens at load time and provides strong guarantees about the module's behavior.

The Validation Algorithm

WebAssembly validation is essentially a type checker that operates on the bytecode. It ensures:

  1. All instructions have valid operands of the correct types

  2. Stack operations are balanced (no underflow/overflow)

  3. Control flow is well-structured

  4. Memory and function accesses are within bounds

  5. All paths through a function produce the correct result type

Let's implement a basic validator:

// src/validation/mod.rs
use crate::types::{ValueType, Value, FuncType};
use crate::instructions::Instruction;
use crate::binary::Module;
use std::collections::HashMap;

pub mod context;
pub mod stack;

use context::ValidationContext;
use stack::TypeStack;

#[derive(Debug)]
pub enum ValidationError {
    TypeMismatch,
    StackUnderflow,
    StackOverflow,
    InvalidLocal(u32),
    InvalidFunction(u32),
    InvalidGlobal(u32),
    InvalidMemory,
    InvalidBranch(u32),
    UnreachableCode,
    InvalidBlockType,
    MissingEnd,
}

pub struct Validator {
    context: ValidationContext,
}

impl Validator {
    pub fn new() -> Self {
        Self {
            context: ValidationContext::new(),
        }
    }

    pub fn validate_module(&mut self, module: &Module) -> Result<(), ValidationError> {
        // Set up validation context with module information
        self.context.set_types(&module.types);
        self.context.set_functions(&module.functions);
        self.context.set_imports(&module.imports);
        
        // Validate each function
        for (func_idx, func_code) in module.code.iter().enumerate() {
            let type_idx = module.functions[func_idx];
            let func_type = &module.types[type_idx as usize];
            
            self.validate_function(func_code, func_type, func_idx as u32)?;
        }

        // Validate exports point to valid indices
        for export in &module.exports {
            match &export.desc {
                crate::binary::module::ExportDesc::Func(idx) => {
                    if *idx >= (module.imports.len() + module.functions.len()) as u32 {
                        return Err(ValidationError::InvalidFunction(*idx));
                    }
                },
                _ => {}, // We'll add other export types later
            }
        }

        // Validate data segments
        for data_segment in &module.data {
            self.validate_constant_expression(&data_segment.offset)?;
        }

        Ok(())
    }

    fn validate_function(&mut self, func_code: &crate::binary::module::FunctionCode, 
                        func_type: &FuncType, func_idx: u32) -> Result<(), ValidationError> {
        
        // Parse instructions
        let mut cursor = std::io::Cursor::new(&func_code.body);
        let instructions = Instruction::parse_expression(&mut cursor)
            .map_err(|_| ValidationError::InvalidFunction(func_idx))?;

        // Set up local types (parameters + locals)
        let mut local_types = func_type.params.clone();
        for (count, value_type) in &func_code.locals {
            for _ in 0..*count {
                local_types.push(*value_type);
            }
        }

        self.context.set_locals(&local_types);
        
        // Validate instruction sequence
        let mut type_stack = TypeStack::new();
        let mut control_stack = Vec::new();
        
        // Function entry creates an implicit block
        control_stack.push(ControlFrame {
            opcode: ControlOpcode::Function,
            start_types: Vec::new(),
            end_types: func_type.results.clone(),
            height: 0,
            unreachable: false,
        });

        for instruction in &instructions {
            self.validate_instruction(
                instruction, 
                &mut type_stack, 
                &mut control_stack
            )?;
        }

        // Check that we end with the right types on the stack
        let expected_results = &func_type.results;
        if type_stack.len() != expected_results.len() {
            return Err(ValidationError::TypeMismatch);
        }

        for (expected, actual) in expected_results.iter().zip(type_stack.types()) {
            if expected != actual {
                return Err(ValidationError::TypeMismatch);
            }
        }

        Ok(())
    }

    fn validate_instruction(&self, instruction: &Instruction, 
                          type_stack: &mut TypeStack,
                          control_stack: &mut Vec<ControlFrame>) -> Result<(), ValidationError> {
        
        match instruction {
            Instruction::Nop => {},
            
            Instruction::Unreachable => {
                if let Some(frame) = control_stack.last_mut() {
                    frame.unreachable = true;
                }
            },

            Instruction::I32Const(_) => {
                type_stack.push(ValueType::I32)?;
            },

            Instruction::I64Const(_) => {
                type_stack.push(ValueType::I64)?;
            },

            Instruction::F32Const(_) => {
                type_stack.push(ValueType::F32)?;
            },

            Instruction::F64Const(_) => {
                type_stack.push(ValueType::F64)?;
            },

            Instruction::LocalGet(local_idx) => {
                let local_type = self.context.get_local(*local_idx)
                    .ok_or(ValidationError::InvalidLocal(*local_idx))?;
                type_stack.push(local_type)?;
            },

            Instruction::LocalSet(local_idx) => {
                let local_type = self.context.get_local(*local_idx)
                    .ok_or(ValidationError::InvalidLocal(*local_idx))?;
                
                let stack_type = type_stack.pop()
                    .ok_or(ValidationError::StackUnderflow)?;
                
                if stack_type != local_type {
                    return Err(ValidationError::TypeMismatch);
                }
            },

            Instruction::LocalTee(local_idx) => {
                let local_type = self.context.get_local(*local_idx)
                    .ok_or(ValidationError::InvalidLocal(*local_idx))?;
                
                let stack_type = type_stack.peek()
                    .ok_or(ValidationError::StackUnderflow)?;
                
                if stack_type != local_type {
                    return Err(ValidationError::TypeMismatch);
                }
                // Note: tee doesn't pop the value, it stays on stack
            },

            // Arithmetic operations
            Instruction::I32Add | Instruction::I32Sub | Instruction::I32Mul |
            Instruction::I32DivS | Instruction::I32DivU => {
                let b = type_stack.pop().ok_or(ValidationError::StackUnderflow)?;
                let a = type_stack.pop().ok_or(ValidationError::StackUnderflow)?;
                
                if a != ValueType::I32 || b != ValueType::I32 {
                    return Err(ValidationError::TypeMismatch);
                }
                
                type_stack.push(ValueType::I32)?;
            },

            // Comparison operations
            Instruction::I32Eq | Instruction::I32Ne | Instruction::I32LtS | 
            Instruction::I32LtU | Instruction::I32GtS | Instruction::I32GtU |
            Instruction::I32LeS | Instruction::I32LeU | Instruction::I32GeS | 
            Instruction::I32GeU => {
                let b = type_stack.pop().ok_or(ValidationError::StackUnderflow)?;
                let a = type_stack.pop().ok_or(ValidationError::StackUnderflow)?;
                
                if a != ValueType::I32 || b != ValueType::I32 {
                    return Err(ValidationError::TypeMismatch);
                }
                
                type_stack.push(ValueType::I32)?; // Comparison result is i32 (0 or 1)
            },

            Instruction::Drop => {
                type_stack.pop().ok_or(ValidationError::StackUnderflow)?;
            },

            Instruction::Select => {
                let condition = type_stack.pop().ok_or(ValidationError::StackUnderflow)?;
                let val2 = type_stack.pop().ok_or(ValidationError::StackUnderflow)?;
                let val1 = type_stack.pop().ok_or(ValidationError::StackUnderflow)?;
                
                if condition != ValueType::I32 {
                    return Err(ValidationError::TypeMismatch);
                }
                
                if val1 != val2 {
                    return Err(ValidationError::TypeMismatch);
                }
                
                type_stack.push(val1)?;
            },

            Instruction::Call(func_idx) => {
                let func_type = self.context.get_function_type(*func_idx)
                    .ok_or(ValidationError::InvalidFunction(*func_idx))?;
                
                // Pop parameters in reverse order
                for param_type in func_type.params.iter().rev() {
                    let stack_type = type_stack.pop()
                        .ok_or(ValidationError::StackUnderflow)?;
                    
                    if stack_type != *param_type {
                        return Err(ValidationError::TypeMismatch);
                    }
                }
                
                // Push results
                for result_type in &func_type.results {
                    type_stack.push(*result_type)?;
                }
            },

            Instruction::Block(block_type) => {
                let end_types = if let Some(ty) = Self::block_type_to_result_type(*block_type) {
                    vec![ty]
                } else {
                    vec![]
                };

                control_stack.push(ControlFrame {
                    opcode: ControlOpcode::Block,
                    start_types: Vec::new(),
                    end_types,
                    height: type_stack.len(),
                    unreachable: false,
                });
            },

            Instruction::Loop(block_type) => {
                let end_types = if let Some(ty) = Self::block_type_to_result_type(*block_type) {
                    vec![ty]
                } else {
                    vec![]
                };

                control_stack.push(ControlFrame {
                    opcode: ControlOpcode::Loop,
                    start_types: Vec::new(),
                    end_types,
                    height: type_stack.len(),
                    unreachable: false,
                });
            },

            Instruction::If(block_type) => {
                // Pop condition
                let condition = type_stack.pop()
                    .ok_or(ValidationError::StackUnderflow)?;
                
                if condition != ValueType::I32 {
                    return Err(ValidationError::TypeMismatch);
                }

                let end_types = if let Some(ty) = Self::block_type_to_result_type(*block_type) {
                    vec![ty]
                } else {
                    vec![]
                };

                control_stack.push(ControlFrame {
                    opcode: ControlOpcode::If,
                    start_types: Vec::new(),
                    end_types,
                    height: type_stack.len(),
                    unreachable: false,
                });
            },

            Instruction::Else => {
                let frame = control_stack.last_mut()
                    .ok_or(ValidationError::InvalidBlockType)?;
                
                if frame.opcode != ControlOpcode::If {
                    return Err(ValidationError::InvalidBlockType);
                }
                
                // Reset stack to frame height and set up for else branch
                type_stack.truncate_to(frame.height);
                frame.opcode = ControlOpcode::Else;
                frame.unreachable = false;
            },

            Instruction::End => {
                let frame = control_stack.pop()
                    .ok_or(ValidationError::MissingEnd)?;
                
                // Check that we have the right types for this block's results
                for expected_type in frame.end_types.iter().rev() {
                    let stack_type = type_stack.pop()
                        .ok_or(ValidationError::StackUnderflow)?;
                    
                    if stack_type != *expected_type {
                        return Err(ValidationError::TypeMismatch);
                    }
                }
                
                // Push the block's results back onto the stack
                for result_type in &frame.end_types {
                    type_stack.push(*result_type)?;
                }
            },

            Instruction::Br(label_idx) => {
                let target_frame_idx = control_stack.len().saturating_sub(*label_idx as usize + 1);
                
                if target_frame_idx >= control_stack.len() {
                    return Err(ValidationError::InvalidBranch(*label_idx));
                }
                
                let target_frame = &control_stack[target_frame_idx];
                let branch_types = if target_frame.opcode == ControlOpcode::Loop {
                    &target_frame.start_types
                } else {
                    &target_frame.end_types
                };
                
                // Check that we have the right types for this branch
                for expected_type in branch_types.iter().rev() {
                    let stack_type = type_stack.pop()
                        .ok_or(ValidationError::StackUnderflow)?;
                    
                    if stack_type != *expected_type {
                        return Err(ValidationError::TypeMismatch);
                    }
                }
                
                // Mark current frame as unreachable
                if let Some(frame) = control_stack.last_mut() {
                    frame.unreachable = true;
                }
            },

            Instruction::BrIf(label_idx) => {
                // Pop condition
                let condition = type_stack.pop()
                    .ok_or(ValidationError::StackUnderflow)?;
                
                if condition != ValueType::I32 {
                    return Err(ValidationError::TypeMismatch);
                }
                
                let target_frame_idx = control_stack.len().saturating_sub(*label_idx as usize + 1);
                
                if target_frame_idx >= control_stack.len() {
                    return Err(ValidationError::InvalidBranch(*label_idx));
                }
                
                let target_frame = &control_stack[target_frame_idx];
                let branch_types = if target_frame.opcode == ControlOpcode::Loop {
                    &target_frame.start_types
                } else {
                    &target_frame.end_types
                };
                
                // For conditional branch, we need to ensure the stack has the branch types,
                // but we don't consume them (they might be needed for the fall-through case)
                for (i, expected_type) in branch_types.iter().rev().enumerate() {
                    let stack_type = type_stack.peek_at(i)
                        .ok_or(ValidationError::StackUnderflow)?;
                    
                    if stack_type != *expected_type {
                        return Err(ValidationError::TypeMismatch);
                    }
                }
            },

            Instruction::Return => {
                let func_frame = control_stack.first()
                    .ok_or(ValidationError::InvalidFunction(0))?;
                
                // Check that we have the right types for function return
                for expected_type in func_frame.end_types.iter().rev() {
                    let stack_type = type_stack.pop()
                        .ok_or(ValidationError::StackUnderflow)?;
                    
                    if stack_type != *expected_type {
                        return Err(ValidationError::TypeMismatch);
                    }
                }
                
                // Mark current frame as unreachable
                if let Some(frame) = control_stack.last_mut() {
                    frame.unreachable = true;
                }
            },

            // Memory operations
            Instruction::I32Load(_) => {
                if !self.context.has_memory() {
                    return Err(ValidationError::InvalidMemory);
                }
                
                let addr = type_stack.pop()
                    .ok_or(ValidationError::StackUnderflow)?;
                
                if addr != ValueType::I32 {
                    return Err(ValidationError::TypeMismatch);
                }
                
                type_stack.push(ValueType::I32)?;
            },

            Instruction::I32Store(_) => {
                if !self.context.has_memory() {
                    return Err(ValidationError::InvalidMemory);
                }
                
                let value = type_stack.pop()
                    .ok_or(ValidationError::StackUnderflow)?;
                let addr = type_stack.pop()
                    .ok_or(ValidationError::StackUnderflow)?;
                
                if addr != ValueType::I32 || value != ValueType::I32 {
                    return Err(ValidationError::TypeMismatch);
                }
            },

            _ => {
                // For instructions not yet implemented, we'll be conservative
                return Err(ValidationError::TypeMismatch);
            }
        }

        Ok(())
    }

    fn validate_constant_expression(&self, instructions: &[Instruction]) -> Result<(), ValidationError> {
        let mut type_stack = TypeStack::new();
        
        for instruction in instructions {
            match instruction {
                Instruction::I32Const(_) => type_stack.push(ValueType::I32)?,
                Instruction::I64Const(_) => type_stack.push(ValueType::I64)?,
                Instruction::F32Const(_) => type_stack.push(ValueType::F32)?,
                Instruction::F64Const(_) => type_stack.push(ValueType::F64)?,
                Instruction::GlobalGet(idx) => {
                    let global_type = self.context.get_global_type(*idx)
                        .ok_or(ValidationError::InvalidGlobal(*idx))?;
                    type_stack.push(global_type)?;
                },
                Instruction::End => break,
                _ => return Err(ValidationError::TypeMismatch), // Only certain instructions allowed in constant expressions
            }
        }
        
        if type_stack.len() != 1 {
            return Err(ValidationError::TypeMismatch);
        }
        
        Ok(())
    }

    fn block_type_to_result_type(block_type: ValueType) -> Option<ValueType> {
        // In WebAssembly 1.0, block types are either empty or single value types
        // This is a simplified version
        Some(block_type)
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum ControlOpcode {
    Function,
    Block,
    Loop,
    If,
    Else,
}

#[derive(Debug, Clone)]
struct ControlFrame {
    opcode: ControlOpcode,
    start_types: Vec<ValueType>,  // Types at start of block
    end_types: Vec<ValueType>,    // Types at end of block  
    height: usize,                // Stack height when entering this block
    unreachable: bool,            // Is this block unreachable?
}

Let's implement the supporting structures:

// src/validation/context.rs
use crate::types::{ValueType, FuncType};
use crate::binary::module::{Import, ImportDesc};
use std::collections::HashMap;

pub struct ValidationContext {
    types: Vec<FuncType>,
    function_types: Vec<u32>, // indices into types
    locals: Vec<ValueType>,
    has_memory: bool,
    globals: HashMap<u32, ValueType>,
}

impl ValidationContext {
    pub fn new() -> Self {
        Self {
            types: Vec::new(),
            function_types: Vec::new(),
            locals: Vec::new(),
            has_memory: false,
            globals: HashMap::new(),
        }
    }

    pub fn set_types(&mut self, types: &[FuncType]) {
        self.types = types.to_vec();
    }

    pub fn set_functions(&mut self, functions: &[u32]) {
        self.function_types = functions.to_vec();
    }

    pub fn set_imports(&mut self, imports: &[Import]) {
        let mut import_func_count = 0;
        
        for import in imports {
            match &import.desc {
                ImportDesc::Func(type_idx) => {
                    self.function_types.insert(import_func_count, *type_idx);
                    import_func_count += 1;
                },
                ImportDesc::Memory(_) => {
                    self.has_memory = true;
                },
                _ => {}, // Handle other import types as needed
            }
        }
    }

    pub fn set_locals(&mut self, locals: &[ValueType]) {
        self.locals = locals.to_vec();
    }

    pub fn get_local(&self, idx: u32) -> Option<ValueType> {
        self.locals.get(idx as usize).copied()
    }

    pub fn get_function_type(&self, func_idx: u32) -> Option<&FuncType> {
        let type_idx = self.function_types.get(func_idx as usize)?;
        self.types.get(*type_idx as usize)
    }

    pub fn has_memory(&self) -> bool {
        self.has_memory
    }

    pub fn get_global_type(&self, global_idx: u32) -> Option<ValueType> {
        self.globals.get(&global_idx).copied()
    }
}

// src/validation/stack.rs
use crate::types::ValueType;
use super::ValidationError;

pub struct TypeStack {
    types: Vec<ValueType>,
}

impl TypeStack {
    pub fn new() -> Self {
        Self {
            types: Vec::new(),
        }
    }

    pub fn push(&mut self, ty: ValueType) -> Result<(), ValidationError> {
        // WebAssembly has a stack size limit to prevent infinite loops during validation
        if self.types.len() >= 1024 {
            return Err(ValidationError::StackOverflow);
        }
        
        self.types.push(ty);
        Ok(())
    }

    pub fn pop(&mut self) -> Option<ValueType> {
        self.types.pop()
    }

    pub fn peek(&self) -> Option<ValueType> {
        self.types.last().copied()
    }

    pub fn peek_at(&self, depth: usize) -> Option<ValueType> {
        if depth < self.types.len() {
            Some(self.types[self.types.len() - 1 - depth])
        } else {
            None
        }
    }

    pub fn len(&self) -> usize {
        self.types.len()
    }

    pub fn truncate_to(&mut self, height: usize) {
        self.types.truncate(height);
    }

    pub fn types(&self) -> &[ValueType] {
        &self.types
    }
}

Testing Validation

Let's create tests that verify our validation works correctly:

#[cfg(test)]
mod tests {
    use super::*;
    use crate::types::{ValueType, FuncType};
    use crate::binary::module::FunctionCode;
    
    #[test]
    fn test_valid_function() {
        let mut validator = Validator::new();
        
        // Create a simple valid function: (param i32 i32) (result i32) local.get 0 local.get 1 i32.add
        let func_type = FuncType {
            params: vec![ValueType::I32, ValueType::I32],
            results: vec![ValueType::I32],
        };
        
        let instructions = vec![
            Instruction::LocalGet(0),
            Instruction::LocalGet(1),
            Instruction::I32Add,
            Instruction::End,
        ];
        
        let func_code = create_function_code_for_instructions(&instructions);
        
        validator.context.set_types(&[func_type.clone()]);
        let result = validator.validate_function(&func_code, &func_type, 0);
        
        assert!(result.is_ok());
    }
    
    #[test]
    fn test_type_mismatch() {
        let mut validator = Validator::new();
        
        // Try to add i32 and f32 - should fail
        let func_type = FuncType {
            params: vec![ValueType::I32, ValueType::F32],
            results: vec![ValueType::I32],
        };
        
        let instructions = vec![
            Instruction::LocalGet(0),
            Instruction::LocalGet(1),
            Instruction::I32Add, // This should fail - can't add i32 and f32
            Instruction::End,
        ];
        
        let func_code = create_function_code_for_instructions(&instructions);
        
        validator.context.set_types(&[func_type.clone()]);
        let result = validator.validate_function(&func_code, &func_type, 0);
        
        assert!(matches!(result, Err(ValidationError::TypeMismatch)));
    }
    
    #[test]
    fn test_stack_underflow() {
        let mut validator = Validator::new();
        
        // Try to pop from empty stack
        let func_type = FuncType {
            params: vec![],
            results: vec![ValueType::I32],
        };
        
        let instructions = vec![
            Instruction::I32Add, // No operands on stack - should fail
            Instruction::End,
        ];
        
        let func_code = create_function_code_for_instructions(&instructions);
        
        validator.context.set_types(&[func_type.clone()]);
        let result = validator.validate_function(&func_code, &func_type, 0);
        
        assert!(matches!(result, Err(ValidationError::StackUnderflow)));
    }
    
    fn create_function_code_for_instructions(instructions: &[Instruction]) -> FunctionCode {
        // This is a helper to create function code for testing
        // In practice, you'd encode the instructions to bytes
        FunctionCode {
            locals: vec![],
            body: vec![], // This would contain the encoded instructions
        }
    }
}

Advanced Features and Extensions

WebAssembly continues to evolve beyond its initial MVP. Several important extensions have been proposed and implemented, including:

  1. Multiple return values - Functions can return multiple values

  2. Bulk memory operations - Efficient memory copying and initialization

  3. Sign-extension operators - Additional integer operations

  4. Reference types - Support for opaque host references

  5. Garbage Collection - Managed objects and automatic memory management

Let's implement support for some of these features:

// src/instructions.rs (extensions)
impl Instruction {
    pub fn parse<R: Read>(reader: &mut R) -> io::Result<Self> {
        let mut opcode_byte = [0u8; 1];
        reader.read_exact(&mut opcode_byte)?;
        let opcode = opcode_byte[0];

        match opcode {
            // ... existing opcodes ...

            // Bulk memory operations (prefix 0xFC)
            0xFC => {
                let sub_opcode = decoder::read_u32(reader)?;
                match sub_opcode {
                    8 => {
                        // memory.init
                        let segment_idx = decoder::read_u32(reader)?;
                        let memory_idx = decoder::read_u32(reader)?;
                        Ok(Instruction::MemoryInit(segment_idx, memory_idx))
                    },
                    9 => {
                        // data.drop
                        let segment_idx = decoder::read_u32(reader)?;
                        Ok(Instruction::DataDrop(segment_idx))
                    },
                    10 => {
                        // memory.copy
                        let dst_mem = decoder::read_u32(reader)?;
                        let src_mem = decoder::read_u32(reader)?;
                        Ok(Instruction::MemoryCopy(dst_mem, src_mem))
                    },
                    11 => {
                        // memory.fill
                        let memory_idx = decoder::read_u32(reader)?;
                        Ok(Instruction::MemoryFill(memory_idx))
                    },
                    _ => Err(io::Error::new(
                        io::ErrorKind::InvalidData,
                        format!("Unknown bulk memory opcode: {}", sub_opcode)
                    )),
                }
            },

            // Sign-extension operations
            0xC
            // Sign-extension operations
            0xC0 => Ok(Instruction::I32Extend8S),
            0xC1 => Ok(Instruction::I32Extend16S),
            0xC2 => Ok(Instruction::I64Extend8S),
            0xC3 => Ok(Instruction::I64Extend16S),
            0xC4 => Ok(Instruction::I64Extend32S),

            // Reference types
            0xD0 => Ok(Instruction::RefNull),
            0xD1 => Ok(Instruction::RefIsNull),
            0xD2 => {
                let func_idx = decoder::read_u32(reader)?;
                Ok(Instruction::RefFunc(func_idx))
            },

            _ => Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("Unknown opcode: 0x{:02X}", opcode)
            )),
        }
    }
}

// Add new instruction variants
#[derive(Debug, Clone, PartialEq)]
pub enum Instruction {
    // ... existing variants ...

    // Bulk memory operations
    MemoryInit(u32, u32),    // segment_idx, memory_idx
    DataDrop(u32),           // segment_idx
    MemoryCopy(u32, u32),    // dst_memory, src_memory
    MemoryFill(u32),         // memory_idx
    
    // Sign extension
    I32Extend8S,
    I32Extend16S,
    I64Extend8S,
    I64Extend16S,
    I64Extend32S,

    // Reference types
    RefNull,
    RefIsNull,
    RefFunc(u32),
}

Multi-Value Support

One of the most significant extensions is multi-value support, allowing functions to return multiple values. This requires updates to our type system:

// src/types.rs (extensions)
#[derive(Debug, Clone, PartialEq)]
pub enum BlockType {
    Empty,
    Type(ValueType),
    TypeIndex(u32), // For multi-value blocks
}

impl BlockType {
    pub fn from_byte(byte: i8) -> Self {
        match byte {
            -64 => BlockType::Empty, // 0x40 as signed
            -1 => BlockType::Type(ValueType::I32),   // 0x7F
            -2 => BlockType::Type(ValueType::I64),   // 0x7E  
            -3 => BlockType::Type(ValueType::F32),   // 0x7D
            -4 => BlockType::Type(ValueType::F64),   // 0x7C
            _ if byte >= 0 => BlockType::TypeIndex(byte as u32),
            _ => BlockType::Empty, // Invalid, default to empty
        }
    }

    pub fn result_types(&self, types: &[FuncType]) -> Vec<ValueType> {
        match self {
            BlockType::Empty => vec![],
            BlockType::Type(ty) => vec![*ty],
            BlockType::TypeIndex(idx) => {
                types.get(*idx as usize)
                    .map(|ft| ft.results.clone())
                    .unwrap_or_default()
            }
        }
    }
}

Implementing Bulk Memory Operations

Bulk memory operations provide efficient ways to initialize and copy memory. Let's implement these in our interpreter:

// src/interpreter/memory.rs (additions)
impl Memory {
    pub fn init_from_data(&mut self, dest: u32, src_offset: u32, len: u32, data: &[u8]) 
        -> Result<(), MemoryError> {
        let dest_addr = dest as usize;
        let src_start = src_offset as usize;
        let copy_len = len as usize;

        if dest_addr + copy_len > self.data.len() {
            return Err(MemoryError::OutOfBounds);
        }

        if src_start + copy_len > data.len() {
            return Err(MemoryError::OutOfBounds);
        }

        self.data[dest_addr..dest_addr + copy_len]
            .copy_from_slice(&data[src_start..src_start + copy_len]);

        Ok(())
    }

    pub fn copy_within(&mut self, dest: u32, src: u32, len: u32) -> Result<(), MemoryError> {
        let dest_addr = dest as usize;
        let src_addr = src as usize;
        let copy_len = len as usize;

        if dest_addr + copy_len > self.data.len() || src_addr + copy_len > self.data.len() {
            return Err(MemoryError::OutOfBounds);
        }

        // Use memmove semantics (handles overlapping regions correctly)
        if dest_addr != src_addr {
            if dest_addr < src_addr {
                // Copy forward
                for i in 0..copy_len {
                    self.data[dest_addr + i] = self.data[src_addr + i];
                }
            } else {
                // Copy backward to handle overlap
                for i in (0..copy_len).rev() {
                    self.data[dest_addr + i] = self.data[src_addr + i];
                }
            }
        }

        Ok(())
    }

    pub fn fill(&mut self, dest: u32, value: u8, len: u32) -> Result<(), MemoryError> {
        let dest_addr = dest as usize;
        let fill_len = len as usize;

        if dest_addr + fill_len > self.data.len() {
            return Err(MemoryError::OutOfBounds);
        }

        self.data[dest_addr..dest_addr + fill_len].fill(value);
        Ok(())
    }
}

Now let's implement the bulk memory instructions in our interpreter:

// src/interpreter/mod.rs (bulk memory additions)
impl WasmInterpreter {
    fn execute_function(&mut self) -> Result<Option<Value>, ExecutionError> {
        // ... existing code ...

        match instruction {
            // ... existing cases ...

            Instruction::MemoryInit(segment_idx, _memory_idx) => {
                let len = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
                let src_offset = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
                let dest = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;

                if let (Value::I32(dest), Value::I32(src_offset), Value::I32(len)) = (dest, src_offset, len) {
                    let module = self.module.as_ref().ok_or(ExecutionError::MemoryNotDefined)?;
                    let data_segment = module.data.get(segment_idx as usize)
                        .ok_or(ExecutionError::InvalidDataSegment)?;

                    let memory = self.memory.as_mut().ok_or(ExecutionError::MemoryNotDefined)?;
                    
                    memory.init_from_data(
                        dest as u32, 
                        src_offset as u32, 
                        len as u32, 
                        &data_segment.data
                    ).map_err(|_| ExecutionError::MemoryOutOfBounds)?;
                } else {
                    return Err(ExecutionError::TypeMismatch);
                }
            },

            Instruction::MemoryCopy(_dst_mem, _src_mem) => {
                let len = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
                let src = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
                let dest = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;

                if let (Value::I32(dest), Value::I32(src), Value::I32(len)) = (dest, src, len) {
                    let memory = self.memory.as_mut().ok_or(ExecutionError::MemoryNotDefined)?;
                    
                    memory.copy_within(dest as u32, src as u32, len as u32)
                        .map_err(|_| ExecutionError::MemoryOutOfBounds)?;
                } else {
                    return Err(ExecutionError::TypeMismatch);
                }
            },

            Instruction::MemoryFill(_memory_idx) => {
                let len = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
                let value = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
                let dest = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;

                if let (Value::I32(dest), Value::I32(value), Value::I32(len)) = (dest, value, len) {
                    let memory = self.memory.as_mut().ok_or(ExecutionError::MemoryNotDefined)?;
                    
                    memory.fill(dest as u32, value as u8, len as u32)
                        .map_err(|_| ExecutionError::MemoryOutOfBounds)?;
                } else {
                    return Err(ExecutionError::TypeMismatch);
                }
            },

            // Sign extension operations
            Instruction::I32Extend8S => {
                let value = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
                
                if let Value::I32(val) = value {
                    // Sign-extend from 8 bits
                    let result = (val as i8) as i32;
                    self.stack.push(Value::I32(result));
                } else {
                    return Err(ExecutionError::TypeMismatch);
                }
            },

            Instruction::I32Extend16S => {
                let value = self.stack.pop().ok_or(ExecutionError::StackUnderflow)?;
                
                if let Value::I32(val) = value {
                    // Sign-extend from 16 bits
                    let result = (val as i16) as i32;
                    self.stack.push(Value::I32(result));
                } else {
                    return Err(ExecutionError::TypeMismatch);
                }
            },

            // ... other sign extension operations ...

            _ => {
                return Err(ExecutionError::UnknownInstruction);
            }
        }
    }
}

// Add new error variants
#[derive(Debug)]
pub enum ExecutionError {
    // ... existing variants ...
    InvalidDataSegment,
    UnknownInstruction,
}


Last updated