High-Performance Hardware Design with HardCaml

Presented by Rachit Nigam

Rachit's goal is to build systems that democratize the design and use of specialized hardware. He is excited to work with folks who are dissatisfied with the current state of tools and techniques for hardware design. He is interested in radically new approaches that combine ideas from programming languages, computer architecture, VLSI, and computer-aided design to address the design, verification, and usability challenges of specialized hardware.

His PhD research has produced three systems: Calyx (a compiler infrastructure adopted by LLVM CIRCT), Filament (a hardware description language with novel type systems that influenced Google's XLS and Jane Street's HardCaml), and Dahlia (a high-level language for predictable accelerator generation).

Building an Embedded Hardware Description Language (eHDL)

Let's build an eHDL. We start with a fundamental question: What is an adder?

We start with basic building blocks, then proceed to combinational ones.

An adder takes two signals and returns a new signal:

def add(l: signal, r: signal) -> signal:
    # don't know what the implementation is
    ...

Even with its implementation left abstract, you can already use add to build larger circuits. For example, here is a new function, const_mul, built on top of the adder:

def const_mul(k: int, s: signal) -> signal:
    # Assumes k >= 1: builds s + s + ... + s (k copies) from k - 1 adders
    current_signal = s
    while k > 1:
        current_signal = add(current_signal, s)
        k -= 1
    return current_signal

const_mul multiplies a signal by a compile-time constant. The signal is a circuit-time value: it only shows up when the circuit is executing.

The value k, on the other hand, is a compile-time value that actually exists in Python. When you run const_mul, Python uses k to build the circuit for you by repeatedly calling add.

The types check out too: s is guaranteed to be a signal, and current_signal is either s or the result of add, which also returns a signal.
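To make this concrete, here is a minimal runnable sketch. The Signal class and the graph-recording add are assumptions for illustration, since the text deliberately leaves add abstract. In this sketch add only records a node and computes nothing. This version of const_mul loops k - 1 times, so the result is exactly k copies of s. Calling it shows how the Python loop unrolls into a chain of adders.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    # Hypothetical graph node: an operator (or input) name plus its input signals
    op: str
    args: tuple = ()


def add(l: Signal, r: Signal) -> Signal:
    # Builds a new adder node; nothing is computed here
    return Signal("add", (l, r))


def const_mul(k: int, s: Signal) -> Signal:
    # k is an ordinary Python int, consumed while the circuit is being built
    assert k >= 1, "sketch assumes a positive constant"
    current_signal = s
    for _ in range(k - 1):
        current_signal = add(current_signal, s)
    return current_signal


def count_adders(s: Signal) -> int:
    # Walks the graph as a tree and counts add nodes
    return (s.op == "add") + sum(count_adders(a) for a in s.args)


a = Signal("input")
print(count_adders(const_mul(4, a)))  # 3
```

No add node ever sees k. By the time the circuit exists, the loop has already been fully unrolled.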

This is the magic of embedded DSLs: a primitive inside the language (add) builds circuits for you, and you get to use the host language, the one you're already working in, to assemble them. You can define higher-order functions or whole libraries.
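As a sketch of what host-language features buy you, here are two ways to sum eight inputs. Both use a hypothetical graph-recording Signal and add, which are not from any real library. The first uses Python's built-in higher-order function functools.reduce and produces a linear chain of adders. The second uses plain recursion and produces a balanced tree. Both circuits use seven adders, but their depths differ.

```python
import functools
from dataclasses import dataclass


@dataclass
class Signal:
    # Hypothetical graph node: operator (or input) name plus input signals
    op: str
    args: tuple = ()


def add(l: Signal, r: Signal) -> Signal:
    return Signal("add", (l, r))


def adder_tree(signals: list) -> Signal:
    # Host-language recursion builds a balanced tree (assumes a non-empty list)
    if len(signals) == 1:
        return signals[0]
    mid = len(signals) // 2
    return add(adder_tree(signals[:mid]), adder_tree(signals[mid:]))


def depth(s: Signal) -> int:
    # Number of adders on the longest input-to-output path
    if not s.args:
        return 0
    return 1 + max(depth(a) for a in s.args)


inputs = [Signal(f"x{i}") for i in range(8)]
chain = functools.reduce(add, inputs)  # higher-order function: a linear chain
tree = adder_tree(inputs)
print(depth(chain), depth(tree))  # 7 3
```

Choosing between the two layouts is an ordinary Python decision. The eHDL itself never needs a dedicated "adder tree" primitive.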

You can do this in Python with PyMTL, in Scala with Chisel, and obviously in OCaml (God's greatest language) with HardCaml.

The eHDL Recipe

Takeaway: building an eHDL follows a simple recipe:

Define small set of primitives
Use host language to build useful libraries

You define a small set of primitives like add, subtract, or even registers if you are feeling a little adventurous, and use the features of the host language to build the libraries.

If you look at the APIs of Chisel, HardCaml, or PyMTL, they are massive, and you might wonder how anyone implements something that size.

Takeaway: You don't have to. As long as you define the small set of primitives, you can build everything else in the software language itself.

Primitives like add and mul are the fundamental blocks, and higher-level functions are built on top of them.

What's in a Signal?

def add(l: signal, r: signal) -> signal

l can be:

constant
wire or registers
another primitive circuit
...and that's it!

How do you implement the signal type? It's not too hard: you just need to be able to represent those three things.

struct Signal {
    name: String,
    kind: SignalKind  // constant, wire, or primitive circuit
}

Some example signals:

zero = signal("zero", constant(0))
clock = signal("clock", wire)

Now we can define a function like add as:

def add(l: signal, r: signal) -> signal:
    return signal("add", primitive_add(l, r))
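Here is one way the struct and constructors above could look in runnable Python. This is a sketch under assumptions. SignalKind is modeled as a union of three small frozen dataclasses, one per kind of signal listed earlier. Wires and registers are collapsed into a single Wire kind, and the class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Constant:
    value: int


@dataclass(frozen=True)
class Wire:
    pass


@dataclass(frozen=True)
class PrimitiveAdd:
    l: "Signal"
    r: "Signal"


# A signal is exactly one of three kinds (hypothetical encoding)
SignalKind = Union[Constant, Wire, PrimitiveAdd]


@dataclass(frozen=True)
class Signal:
    name: str
    kind: SignalKind


def add(l: Signal, r: Signal) -> Signal:
    # Builds a node that records its operands; no arithmetic happens here
    return Signal("add", PrimitiveAdd(l, r))


zero = Signal("zero", Constant(0))
clock = Signal("clock", Wire())
s = add(zero, clock)
print(type(s.kind).__name__)  # PrimitiveAdd
```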

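Why do we need name? In an embedded HDL, the circuit is whatever the expression on the right side of the = builds. clock on the left is just a variable name in the host language, and the circuit never sees it. The name field carries a readable identifier into the circuit itself.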

In HardCaml, a preprocessor pass can copy the variable name into the signal for you. By default, though, you get generated names like clock_0, clock_1, clock_2, even for signals bound to the same variable name.
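The default numbered names can be sketched with one counter per base name. The fresh_name helper below is hypothetical and is not how HardCaml actually assigns names. It only shows why unannotated signals end up as clock_0, clock_1, and so on.

```python
import itertools
from collections import defaultdict

# One independent counter per base name (hypothetical naming scheme)
_counters = defaultdict(itertools.count)


def fresh_name(base: str) -> str:
    # Appends the next number for this base name
    return f"{base}_{next(_counters[base])}"


print([fresh_name("clock") for _ in range(3)])  # ['clock_0', 'clock_1', 'clock_2']
print(fresh_name("add"))  # add_0
```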

There's a separation going on between the host language like Python and the object/circuit language that we're constructing on the right side.

What we are doing is constructing a computation graph, much like PyTorch does.

We are building the structure of our circuit and capturing it by connecting things together with signal representation.

This is how PyTorch and TensorFlow work: they capture the computation graph you write down to represent your machine learning kernel, compile it, and execute it. (They do a little more, but functionally it's the same idea.)
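Once the graph is captured, it can be walked like any other data structure. This sketch uses a hypothetical Signal node, so it is not tied to any real framework. It lists each distinct node once, in dependency order, which is the order a compiler or simulator would visit them. Nodes are tracked by identity, so a shared subexpression appears only once.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare nodes by identity, not by structure
class Signal:
    name: str
    args: tuple = ()


def add(l: Signal, r: Signal, name: str = "add") -> Signal:
    return Signal(name, (l, r))


def topo_order(out: Signal) -> list:
    # Post-order DFS: every node is listed after all of its inputs
    seen, order = set(), []

    def visit(s: Signal) -> None:
        if id(s) in seen:
            return
        seen.add(id(s))
        for a in s.args:
            visit(a)
        order.append(s)

    visit(out)
    return order


a = Signal("a")
b = Signal("b")
t = add(a, b, "t")
out = add(t, t, "out")  # t is shared, not duplicated
print([s.name for s in topo_order(out)])  # ['a', 'b', 't', 'out']
```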

Compiling Programs to RTL

The final step is compiling programs to RTL:

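# "Syntax-directed" compiler: one case per primitive
def compile(s: signal) -> RTL:
    match s:
        case const(c):
            # Verilog binary literal, e.g. 5 -> 'b101
            return f"'b{c:b}"
        case add(l, r):
            # Compile both operands recursively, then combine them
            l_c = compile(l)
            r_c = compile(r)
            return f"({l_c} + {r_c})"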

This is a baby version of a compiler. It takes a signal and produces RTL (here, RTL is just strings). If the signal is a constant, it returns the constant as a Verilog binary literal ('b followed by the constant).

The interesting case is the adder. You recursively compile the left operand and the right operand, then combine the two resulting strings into an expression for the sum.

Because we have captured this computation graph, we can recursively walk down it, compile each component, and combine the results into the output.

Is this readable? Nope. When you dump the Verilog as one big nested expression, no one can read it, which is why it's important to carry names (identifiers) along with the signals.
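One way to keep the output readable is to emit one named assignment per node instead of a single nested expression. The sketch below assumes each node carries a name field, uses a hypothetical node layout, and is not HardCaml's actual code generator. Constants are inlined, inputs are referenced by name, and wire declarations are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Signal:
    name: str
    op: str          # "input", "const", or "add"
    args: tuple = ()
    value: int = 0   # only meaningful when op == "const"


def emit_verilog(out: Signal) -> list:
    # One `assign` per add node, in dependency order, referenced by name
    lines, done = [], set()

    def ref(s: Signal) -> str:
        # Constants are inlined; everything else is referenced by its name
        return str(s.value) if s.op == "const" else s.name

    def visit(s: Signal) -> None:
        if id(s) in done or s.op == "input":
            return
        done.add(id(s))
        for a in s.args:
            visit(a)
        if s.op == "add":
            l, r = (ref(a) for a in s.args)
            lines.append(f"assign {s.name} = {l} + {r};")

    visit(out)
    return lines


x = Signal("x", "input")
two = Signal("two", "const", value=2)
sum1 = Signal("sum1", "add", (x, two))
total = Signal("total", "add", (sum1, x))
print("\n".join(emit_verilog(total)))
```

With names available, the output reads like hand-written RTL: `assign sum1 = x + 2;` followed by `assign total = sum1 + x;`.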

Lifetime of an eHDL Program

const_mul(10, a) --------                    High-level
                        |                    abstractions
                        v
            add(add(add(..., a))) -------     Primitives
                                         |
                                         v
                                  a + a + ...  RTL

An eHDL program starts with high-level operations such as const_mul and other APIs. These are defined in terms of basic building blocks like add, which are then compiled to RTL.
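The three stages in the diagram can be traced end to end in a few lines. This is a sketch under simplifying assumptions: tuples stand in for signal nodes, and a bare string stands in for an input wire. The names add, const_mul, and to_rtl are illustrative only.

```python
def add(l, r):
    # Primitive: records an adder node as a tuple
    return ("add", l, r)


def const_mul(k, s):
    # High-level abstraction, expanded into primitives at build time (k >= 1)
    out = s
    for _ in range(k - 1):
        out = add(out, s)
    return out


def to_rtl(s):
    # Primitives -> RTL text; a bare string is an input wire name
    if isinstance(s, str):
        return s
    _, l, r = s
    return f"{to_rtl(l)} + {to_rtl(r)}"


graph = const_mul(3, "a")
print(graph)          # ('add', ('add', 'a', 'a'), 'a')
print(to_rtl(graph))  # a + a + a
```

Parentheses are dropped here only because the chain is left-nested and Verilog's + is left-associative. A general compiler would need them.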

Now you know how to build an eHDL. The problem is that an HDL alone isn't enough. We need things like:

Simulator
Verification
Formal verification

One of the big reasons people like Chisel is that it gives you nice new abstractions that are not available in Verilog. You want a simulator that works with your high-level language and gives you error messages and simulation reports. You also want verification tools and formal tools.

Everything you've learned so far makes these tools surprisingly easy to build. For example, we can build a simulator on top of the same representation.

Building Multiple Interpretations

What's in a signal? (Redux)

Circuits are "computation graphs"
But we get to define what the base primitives mean

There is nothing inherent in the way we defined add that makes it a circuit. It can be interpreted as anything else, for example as a little function that returns the result of the computation when we call it. The basic insight is that once you define the base primitives, you can interpret them in different ways.

The simplest interpretation builds circuits. Another interpretation could map each primitive to a function that computes its simulated value, which gives you a simulator for combinational logic.
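One way to write "the same primitives, different meanings" in Python is to pass the interpretation in as an object. This is sometimes called the tagless-final style, and it is a swapped-in technique for illustration, not necessarily how HardCaml is structured. The RTL and Sim classes are hypothetical. The generator code (const_mul, circuit) is written once and run under both interpretations.

```python
class RTL:
    """Interpretation 1: primitives build Verilog-like expression text."""

    def input(self, name: str) -> str:
        return name

    def const(self, c: int) -> str:
        return str(c)

    def add(self, l: str, r: str) -> str:
        return f"({l} + {r})"


class Sim:
    """Interpretation 2: primitives compute integer values directly."""

    def __init__(self, env: dict):
        self.env = env  # values of the circuit inputs, by name

    def input(self, name: str) -> int:
        return self.env[name]

    def const(self, c: int) -> int:
        return c

    def add(self, l: int, r: int) -> int:
        return l + r


def const_mul(p, k: int, s):
    # Written once against the primitive interface p (assumes k >= 1)
    out = s
    for _ in range(k - 1):
        out = p.add(out, s)
    return out


def circuit(p):
    # Hypothetical example design: 3 * x + 1
    return p.add(const_mul(p, 3, p.input("x")), p.const(1))


print(circuit(RTL()))          # (((x + x) + x) + 1)
print(circuit(Sim({"x": 5})))  # 16
```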

Let's Build an eHDL Simulator

struct Signal {
    compute: Function*,  // signal tracks an update function that represents its simulation behavior
    typ: Type
}

You need a bit more trickery for registers, but this works for combinational components. We redefine the signal type: instead of the name and kind fields, it has a field compute. This field holds a function pointer that captures the computation the node performs.

def add(l: Signal, r: Signal) -> Signal:
    return Signal {
        compute: fn() {
            l.compute() + r.compute()
        }
    }

Take the implementation of add as an example: add computes what its left and right inputs compute and adds the results. In the same syntax-directed way, you can build the whole simulator. It won't be fast or great, but you get it just by giving the signal type a different interpretation. Registers need more machinery, but hopefully this gives you a flavor of what it takes to build a simulator behind the same API.
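Here is a runnable version of the closure-based signal above, under stated assumptions. The typ field is simplified to a bit width, and input wires get a hypothetical setter so a test can drive them. The adder wraps at its width, as fixed-width hardware would. Taking the max of the two widths is a choice made for this sketch only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Signal:
    # Each signal carries a function that computes its current value;
    # `width` stands in for the `typ` field (an assumption of this sketch)
    compute: Callable[[], int]
    width: int


def wire(width: int):
    # Hypothetical input wire: returns the signal plus a setter to drive it
    state = {"value": 0}

    def set_value(v: int) -> None:
        state["value"] = v & ((1 << width) - 1)

    return Signal(lambda: state["value"], width), set_value


def add(l: Signal, r: Signal) -> Signal:
    w = max(l.width, r.width)
    # Recompute both inputs on demand; wrap at the result width
    return Signal(lambda: (l.compute() + r.compute()) & ((1 << w) - 1), w)


a, set_a = wire(8)
b, set_b = wire(8)
total = add(a, b)
set_a(3)
set_b(4)
print(total.compute())  # 7
set_a(255)
print(total.compute())  # 3, since 255 + 4 wraps modulo 256
```

Every compute call re-evaluates the whole cone of logic beneath it. That is one reason this style of simulator is simple but slow.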

Takeaway: If you do this correctly, you can define what the base primitives mean as RTL and get a circuit generator. Define what they mean computationally and you get a simulator; define them formally and you get a formal verification tool. You can build high-level libraries without worrying about which of these they will be used with, because the base primitives are the layer that sits between the libraries and the tools.

Trade-off: A larger library of base primitives gives you more control over how the RTL is generated or simulated, but it makes new tools harder to implement, since every tool has to handle every primitive.

HardCaml: The Realization

HardCaml is the realization of this idea. Here's the base type of HardCaml:

Module type Comb_intf.Primitives
(* Type required to generate the full combinational API *)

val mux : t -> t list -> t
    (* multiplexer *)

val (+:) : t -> t -> t
    (* addition *)

val (-:) : t -> t -> t
    (* subtraction *)

val (*:) : t -> t -> t
    (* unsigned multiplication *)

val (*+) : t -> t -> t
    (* signed multiplication *)

val (==:) : t -> t -> t
    (* equality *)

val (<:) : t -> t -> t
    (* less than *)

This is all you need to build the combinational component of the HDL. You can get an API of massive size from just these primitives.
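To see how a small primitive set grows into a large API, here is a Python sketch. It is not OCaml and not HardCaml's actual library code. It models two of the primitives above on plain integers: lt for <: and mux for mux (in range selects only). It then derives further operations purely from those two. In an eHDL, the derived functions would be written against signals instead, but the layering is the same.

```python
# Software models of two primitives on plain integers (hypothetical)
def lt(a: int, b: int) -> int:
    # models <: (1 if a < b else 0)
    return int(a < b)


def mux(sel: int, cases: list) -> int:
    # models mux: picks cases[sel] (this sketch assumes sel is in range)
    return cases[sel]


# Everything below is derived only from the two primitives above
def gt(a: int, b: int) -> int:
    return lt(b, a)


def max_(a: int, b: int) -> int:
    return mux(lt(a, b), [a, b])


def min_(a: int, b: int) -> int:
    return mux(lt(a, b), [b, a])


def clamp(x: int, lo: int, hi: int) -> int:
    return max_(lo, min_(x, hi))


print(max_(3, 5), min_(3, 5), clamp(12, 0, 10))  # 5 3 10
```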

HardCaml also implements simulators using the idea described above. There are several:

For quick simulations and unit testing, you use CycleSim, which runs in-line but is written in OCaml and is slow for big designs
We also have CycleSim_verilator, which compiles your design (built from those 7 primitives) with Verilator so you can simulate the verilated model

HardCaml has many things implemented:

Multiple simulators
Waveform viewers
Testing frameworks
Formal verification tools
RTL generation

More details are available at: https://ocaml.janestreet.com/ocaml-core/v0.12/doc/hardcaml/Hardcaml__/Comb_intf/module-type-S/index.html

Hopefully this gives you a complete picture of this style of building DSLs and a useful way to think about it.

Why People Build Their Own Embedded Hardware Description Languages

The ability to build reusable generators, popularized by Berkeley's Chisel language, made it evident that you can build really big designs by composing small blocks.

The other thing that drives people to build their own HDL is that it can fit neatly into existing infrastructure. Jane Street has a mature OCaml ecosystem with its own testing, verification, and validation stack. So it makes sense to write hardware in OCaml: engineers from different teams can contribute the functions they care about, and we write the shell that accelerates their computation.

Beyond Combinational Logic

Circuits have state, so they're not technically just computation graphs. HardCaml has another type, Signal.t, which covers combinational components, registers, and memories. These three things make up a full circuit/HDL. Registers are additionally defined with a clock, clear, reset_to, and so on.
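State is what breaks the "pure computation graph" picture. Here is a minimal sketch of how a simulator might handle it; it does not show HardCaml's register API. Each hypothetical Reg holds its current value plus a function for its next value. step models one clock edge by computing every next value before committing any of them. clock and clear are omitted, and reset_to is simply the initial value.

```python
class Reg:
    # Hypothetical register: a value that changes only on a clock edge
    def __init__(self, reset_to: int = 0):
        self.value = reset_to
        self.next = None  # function computing the register's input, set later


def step(regs: list) -> None:
    # One clock cycle: read every next value first, then update all registers,
    # so no register sees another register's new value within the same cycle
    nexts = [r.next() for r in regs]
    for r, v in zip(regs, nexts):
        r.value = v


counter = Reg(reset_to=0)
counter.next = lambda: (counter.value + 1) & 0xF  # 4-bit wrap-around counter
delayed = Reg(reset_to=0)
delayed.next = lambda: counter.value              # counter, one cycle late

for _ in range(3):
    step([counter, delayed])
print(counter.value, delayed.value)  # 3 2
```

The two-phase update in step is what makes delayed lag counter by exactly one cycle, regardless of the order in which the registers are listed.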

You can annotate the signal type to track more information. For example, you can build clock domain tracking into the signal type. Notionally, the only change needed is to redefine the base primitives so they track clock domains and report an error when the domains mismatch. All of the existing code then benefits from clock domain checking.
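Here is a sketch of that idea under stated assumptions. The domain field, ClockDomainError, and _same_domain helper are hypothetical, not HardCaml's API. Only the add primitive changes. const_mul below is an ordinary library function with no knowledge of clocks, written as add repeated k - 1 times. It still gets the check for free because it is built on add.

```python
from dataclasses import dataclass


class ClockDomainError(Exception):
    pass


@dataclass
class Signal:
    name: str
    domain: str  # hypothetical annotation: which clock drives this signal
    args: tuple = ()


def _same_domain(op: str, *args: Signal) -> str:
    # All operands of a primitive must come from a single clock domain
    domains = {a.domain for a in args}
    if len(domains) != 1:
        raise ClockDomainError(f"{op}: mixes clock domains {sorted(domains)}")
    return domains.pop()


def add(l: Signal, r: Signal) -> Signal:
    # Only the primitive changes; libraries built on add inherit the check
    return Signal("add", _same_domain("add", l, r), (l, r))


def const_mul(k: int, s: Signal) -> Signal:
    # Library code with no knowledge of clock domains (assumes k >= 1)
    out = s
    for _ in range(k - 1):
        out = add(out, s)
    return out


fast = Signal("fast_in", "clk_a")
slow = Signal("slow_in", "clk_b")
print(const_mul(3, fast).domain)  # clk_a
try:
    add(const_mul(2, fast), slow)
except ClockDomainError as e:
    print(e)  # add: mixes clock domains ['clk_a', 'clk_b']
```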
