Building an Interpreter

in rust of course

What is WebAssembly, Really?

Before we start writing code, we need to understand what WebAssembly actually is. Most explanations begin with phrases like "assembly-like language" or "virtual machine," but these descriptions miss the fundamental point. WebAssembly is better understood as a new abstraction boundary—a way of dividing computing systems into composable pieces.

Think about the boundaries you already know. The Linux ABI doesn't care whether your program was written in C, Rust, or even Brainfuck—it just handles system calls and schedules process time. HTTP doesn't care if your server is written in Go or your client is Firefox or curl. The C ABI allows libraries written in different languages to work together seamlessly.

WebAssembly occupies a unique point in this space. Unlike the Linux ABI, there's no fixed set of system calls. Unlike HTTP, WebAssembly modules run in the same process as their host, enabling synchronous function calls in both directions. Unlike the C ABI, WebAssembly modules have only the shared state they're explicitly given.

This abstraction boundary has profound implications. It means you can run untrusted code safely, compose modules from different programming languages, and target everything from web browsers to IoT devices to serverless platforms with the same binary format.

The Stack Machine Model

WebAssembly is a stack-based virtual machine. This means that instead of having named registers like x86 or ARM, operations work by pushing values onto an implicit stack and popping them off as needed.

Consider this simple addition in different models:

# Register-based (x86-like)
mov %eax, 5
mov %ebx, 3  
add %eax, %ebx    ; result in %eax

# Stack-based (WebAssembly)
i32.const 5       ; push 5
i32.const 3       ; push 3
i32.add           ; pop two values, push sum

The stack model has several advantages for a portable bytecode:

  • No need to specify register allocation

  • Compact encoding (most instructions are single bytes)

  • Easy to validate and analyze statically

  • Maps well to expression-based languages

Module Structure

WebAssembly organizes code into modules. Each module is a self-contained unit with explicit imports and exports. Think of a module as similar to a shared library or a package—it encapsulates functionality and exposes a clean interface.

A module contains several sections:

  • Type section: Function signatures used in the module

  • Function section: Declarations of functions defined in the module

  • Memory section: Linear memory regions

  • Global section: Global variables

  • Export section: Names and types of exports

  • Code section: The actual function bodies

  • Data section: Initial data for memory regions

This might seem like a lot of complexity, but each section serves a specific purpose in making WebAssembly fast to load, validate, and execute.

Why These Design Decisions Matter

Every design decision in WebAssembly was made for specific reasons. The stack model enables compact encoding and eliminates register allocation complexity. The module system with explicit imports and exports enables secure sandboxing and efficient linking. The type system enables ahead-of-time validation and optimization.

As we build our interpreter, we'll see how these decisions affect the implementation at every level. The binary format reflects the conceptual model, which makes parsing straightforward. The instruction set maps cleanly to the stack model, which makes the interpreter loop elegant. The type system catches errors early, which makes the whole system more robust.

Now let's start building.

The Binary Format

WebAssembly modules are distributed in a compact binary format. While there's also a textual representation (WAT), the binary format is what actually gets executed. To build an interpreter, we need to understand this format inside and out.

Let's start with the simplest possible WebAssembly module and examine its binary representation:

Understanding Stack-Based Execution

Before we implement the interpreter, let's understand how stack-based execution works. Consider this simple WebAssembly function:

The execution proceeds as follows:

  1. local.get 0: Stack becomes [param0]

  2. local.get 1: Stack becomes [param0, param1]

  3. i32.add: Stack becomes [param0 + param1]

The final value on the stack is the function's return value.

This model has several advantages:

  • No register allocation needed

  • Compact instruction encoding

  • Easy to validate statically

  • Natural fit for expression-based languages

Let's implement our stack machine:

Let's implement the supporting structures:

Now let's test our interpreter with the simple add function:

This test demonstrates the complete flow:

  1. Parse a binary WebAssembly module

  2. Load it into our interpreter

  3. Call an exported function with arguments

  4. Get back the computed result

Control Flow and Structured Programming

One of WebAssembly's most distinctive features is its structured control flow. Unlike traditional assembly languages with goto-style jumps, WebAssembly uses structured constructs like blocks, loops, and if-statements that can be statically validated.

Understanding WebAssembly's Control Flow Model

WebAssembly's control flow is based on the concept of structured control flow. Every control construct has a clear entry and exit point, and branches can only target specific labels in a stack-disciplined manner.

Consider this WebAssembly function that computes a factorial:

This example shows several important aspects of WebAssembly's control flow:

  1. Labeled blocks: $exit and $loop are labels that can be targeted by branches

  2. Structured nesting: Control constructs are properly nested

  3. Stack discipline: Branches maintain stack consistency

Let's extend our interpreter to handle control flow:

Testing Control Flow

Let's test our control flow implementation with a simple conditional:

The control flow implementation shows several

Testing Function Calls

Let's create a comprehensive test that demonstrates function calls and imports:

Validation and Type Safety

One of WebAssembly's key feature is its validation algorithm. Before execution, every WebAssembly module is validated to ensure it's well-formed and type-safe. This validation happens at load time and provides strong guarantees about the module's behavior.

The Validation Algorithm

WebAssembly validation is essentially a type checker that operates on the bytecode. It ensures:

  1. All instructions have valid operands of the correct types

  2. Stack operations are balanced (no underflow/overflow)

  3. Control flow is well-structured

  4. Memory and function accesses are within bounds

  5. All paths through a function produce the correct result type

Let's implement a basic validator:

Let's implement the supporting structures:

Testing Validation

Let's create tests that verify our validation works correctly:

Advanced Features and Extensions

WebAssembly continues to evolve beyond its initial MVP. Several important extensions have been proposed and implemented, including:

  1. Multiple return values - Functions can return multiple values

  2. Bulk memory operations - Efficient memory copying and initialization

  3. Sign-extension operators - Additional integer operations

  4. Reference types - Support for opaque host references

  5. Garbage Collection - Managed objects and automatic memory management

Let's implement support for some of these features:

Multi-Value Support

One of the most significant extensions is multi-value support, allowing functions to return multiple values. This requires updates to our type system:

Implementing Bulk Memory Operations

Bulk memory operations provide efficient ways to initialize and copy memory. Let's implement these in our interpreter:

Now let's implement the bulk memory instructions in our interpreter:


Last updated