FPGA Engineering

This crash course will guide you from first principles to shipping FPGA-based systems with Hardcaml, OCaml, and industry-standard flows.


Overview

My approach to hardware design for this project is to treat it like software: modular, testable, and version-controlled. Like me, you may be new to FPGAs, so this guide keeps jargon to a minimum and assumes no prior hardware experience, while still preparing you for industry-grade projects. By the end, you'll design RTL, debug timing violations, and deploy systems with PCIe, Ethernet, and custom accelerators. Here are the goals I'm working toward:

  1. Get practical experience in digital design, Verilog/SystemVerilog, and FPGA toolchains (Vivado/Quartus).

  2. Work on real-world projects, e.g. UART, FIFOs, RISC-V CPUs, and Ethernet MACs.

  3. Get comfortable with PCIe, SERDES integration, and high-performance Ethernet/InfiniBand.

Read through at your own pace and feel free to skip a few goals, but aim to spend 1–2 weeks on this. Adjust based on your background, since this crash course covers far more than a beginner strictly needs.


[TODOs] Digital Logic Basics


FPGA Primer

We'll be using the Arty A7 hobbyist FPGA board. The design is expressed in Hardcaml, an OCaml library for creating hardware designs, and driven by an embedded software stack written in OCaml, using libraries that may or may not exist yet depending on the use case.

Resources:

  • FPGA-101: Introduction to FPGAs, Learn the Basics

  • Learn-fpga: Learning FPGA, yosys, nextpnr, and RISC-V - GitHub

  • Learning Verilog and FPGA - LINK

  • Basics of FPGA Design - fpgatutorial.com

  • FPGA Design Elements - fpgacpu.ca/fpga

  • Implementing FizzBuzz on an FPGA - LINK

  • University courses on FPGAs - LINK

  • University of Pennsylvania ESE5320 System-on-a-Chip Architecture - Fall 2024

  • Cornell's ECE 5760: Advanced Microcontrollers & FPGA - LINK

  • Understand what it takes to build a RISC-V Assembler in FPGAs - LINK


[TODOs] Study Plan

This study plan is tailored to build a rock-solid foundation in FPGAs and equip you to tackle production-grade challenges.

Phase 1: Foundation

Goals: Establish a robust FPGA development environment and master basic RTL design. Tasks:

  1. Hardware & Tools

    • Acquire a Cora Z7 (Zynq-7000) or similar.

    • Install the Xilinx Vivado Design Suite (the free edition, formerly called WebPACK).

    • Verify board connectivity and JTAG operation.

  2. First Project: Blink LED

    • Create a Vivado project targeting the Zynq part.

    • Develop a parameterized Verilog module to blink LEDs at varying frequencies.

    • Synthesize, implement, generate bitstream, and program the board.

  3. RTL Code Review

    • Analyze demo code: identify FFs, LUTs, and interconnect.

    • Produce a one-page walkthrough highlighting RTL-to-gate translation.

Deliverables: Working blink-LED bitstream.
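The parameterized blink module from task 2 comes down to a counter that toggles the LED every N clock cycles. Here is a minimal Python sketch of that arithmetic and of the counter's cycle-by-cycle behavior. The function names are hypothetical, and the 100 MHz clock in the usage note is an assumption, so check your board's oscillator.

```python
def blink_divider(clk_hz: int, blink_hz: float) -> tuple[int, int]:
    """Return (half_period_cycles, counter_width_bits) for an LED that
    completes blink_hz full on/off cycles per second."""
    if clk_hz <= 0 or blink_hz <= 0:
        raise ValueError("frequencies must be positive")
    half_period = round(clk_hz / (2 * blink_hz))
    if half_period < 1:
        raise ValueError("blink_hz is too high for this clock")
    # The counter runs 0 .. half_period-1, so it must be able to hold half_period-1.
    width = max(1, (half_period - 1).bit_length())
    return half_period, width


def simulate_blink(half_period: int, cycles: int) -> list[int]:
    """Cycle-by-cycle model of the counter register and the LED register."""
    counter, led, trace = 0, 0, []
    for _ in range(cycles):
        trace.append(led)          # LED value visible during this cycle
        if counter == half_period - 1:
            counter = 0
            led ^= 1               # toggle at the end of each half period
        else:
            counter += 1
    return trace
```

For an assumed 100 MHz clock and a 1 Hz blink, `blink_divider(100_000_000, 1)` gives a half period of 50,000,000 cycles and a 26-bit counter. In the Verilog version these two numbers would typically become module parameters.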

Phase 2: Synthesis & Implementation

Goals: Gain deep insights into how synthesis and implementation transform RTL into silicon-ready designs. Tasks:

  1. Synthesis Deep Dive

    • Run synthesis on a 16-bit pipelined adder design.

    • Extract and interpret synthesis reports: area utilization, timing estimates.

  2. Implementation Flow

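    • Run the implementation steps on the adder: optimize, place, and route (opt_design, place_design, and route_design in Vivado; Translate, Map, and Place & Route are the step names from the older ISE flow).

    • Compare post-synthesis vs. post-route resource usage.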

    • Experiment with basic floorplanning: constrain logic regions and re-run P&R.

  3. Version-Controlled Experiments

    • Maintain branches for different synthesis and implementation constraint strategies.

    • Document changes and their impact on resource and timing metrics.
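Before running synthesis, it helps to know what the 16-bit pipelined adder should do on each clock edge. The sketch below is one possible two-stage split (low byte in stage 1, high byte plus carry in stage 2). The split point and the class name are assumptions made for illustration, not a prescribed architecture.

```python
class PipelinedAdder16:
    """Two-stage pipelined 16-bit adder model.

    Stage 1 adds the low bytes and registers the carry and the high bytes.
    Stage 2 adds the high bytes plus carry and registers the 17-bit result.
    """

    def __init__(self) -> None:
        # Stage-1 registers: (low-byte sum, carry out, a high byte, b high byte)
        self.s1 = (0, 0, 0, 0)
        # Stage-2 register: 16-bit sum plus carry out in bit 16
        self.s2 = 0

    def clock(self, a: int, b: int) -> int:
        """Apply one rising clock edge and return the output register
        as seen after that edge."""
        a &= 0xFFFF
        b &= 0xFFFF
        # Next-state values are computed from the current register contents,
        # because all flip-flops update at the same instant.
        lo_sum, carry, a_hi, b_hi = self.s1
        next_s2 = ((a_hi + b_hi + carry) << 8) | lo_sum
        lo = (a & 0xFF) + (b & 0xFF)
        next_s1 = (lo & 0xFF, lo >> 8, a >> 8, b >> 8)
        self.s1, self.s2 = next_s1, next_s2
        return self.s2
```

The value returned by call n is the sum of the operands passed to call n-1, which is the pipeline latency you should also see in an RTL simulation. A model like this doubles as a golden reference when comparing synthesis strategies across branches.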

Phase 3: Timing Analysis & Constraints

Goals: Master Static Timing Analysis (STA) and constraint creation for multi-clock designs. Tasks:

  1. STA Fundamentals

    • Analyze timing report: slack, setup and hold violations.

    • Identify critical paths in a dual-clock FIFO design.

  2. Constraint Creation

    • Write XDC constraints: create_clock, create_generated_clock, set_false_path, and set_clock_groups.

    • Model PCB trace delays by adding set_input_delay and set_output_delay constraints.

  3. Optimization Loop

    • Resolve setup violations via pipelining and retiming.

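    • Address hold-time issues by fixing the constraints or clock skew that cause them; adding pipeline registers does not help hold, and the router normally repairs residual hold violations by adding delay.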

Deliverables: Annotated XDC files covering all constraint types and benchmark timing closure report showing before/after optimizations.
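The slack figures in a timing report follow from simple arithmetic on a register-to-register path. The sketch below is a simplified single-corner model that ignores clock uncertainty and on-chip variation. The field names and the example delays are assumptions for illustration, not values from any real device.

```python
from dataclasses import dataclass


@dataclass
class Path:
    clk_to_q_ns: float          # launch flip-flop clock-to-output delay
    data_delay_max_ns: float    # slowest logic + routing delay
    data_delay_min_ns: float    # fastest logic + routing delay
    setup_ns: float             # capture flip-flop setup requirement
    hold_ns: float              # capture flip-flop hold requirement
    skew_ns: float = 0.0        # capture clock arrival minus launch clock arrival


def setup_slack(p: Path, period_ns: float) -> float:
    """Positive means data arrives early enough before the next capture edge."""
    required = period_ns + p.skew_ns - p.setup_ns
    arrival = p.clk_to_q_ns + p.data_delay_max_ns
    return required - arrival


def hold_slack(p: Path) -> float:
    """Positive means data stays stable long enough after the same capture edge."""
    arrival = p.clk_to_q_ns + p.data_delay_min_ns
    required = p.hold_ns + p.skew_ns
    return arrival - required


def max_frequency_mhz(p: Path) -> float:
    """Highest clock frequency at which setup slack is still non-negative."""
    min_period = p.clk_to_q_ns + p.data_delay_max_ns + p.setup_ns - p.skew_ns
    return 1000.0 / min_period
```

Note that the clock period appears only in the setup equation. This is why pipelining or a slower clock can fix setup violations but does nothing for hold.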

Phase 4: High-Level Synthesis

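Goals: Use high-level hardware description tools to accelerate design productivity. Note that Chisel, Lava, and Bluespec are high-level hardware description languages rather than HLS in the strict C-to-RTL sense; they are grouped here because they serve the same productivity goal. Tasks: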

  1. Chisel Lab

    • Implement a parameterized FIFO in Chisel; generate Verilog.

    • Synthesize & verify functional equivalence with Vivado.

  2. Lava & Bluespec

    • Build a simple counter in Lava; integrate into Vivado project.

    • Create a small state machine in Bluespec; compare resource utilization.

  3. Analysis & Comparison

    • Summarize productivity gains, code clarity, and resource overhead for each HLS tool.

Deliverables: HLS code repositories for Chisel, Lava, and Bluespec.
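When comparing FIFOs generated by different tools, a small software reference model gives you something to check each simulation against. The sketch below models a parameterized synchronous FIFO. The class name and the flag policy (flags are sampled before the clock edge, so a write to a full FIFO is dropped even if a read happens in the same cycle) are assumptions, and your RTL may choose differently.

```python
from typing import Optional


class SyncFifo:
    """Behavioral model of a single-clock FIFO with full/empty flags."""

    def __init__(self, depth: int, width: int) -> None:
        if depth < 1 or width < 1:
            raise ValueError("depth and width must be at least 1")
        self.depth = depth
        self.mask = (1 << width) - 1
        self.mem = [0] * depth
        self.rd_ptr = 0
        self.wr_ptr = 0
        self.count = 0

    @property
    def empty(self) -> bool:
        return self.count == 0

    @property
    def full(self) -> bool:
        return self.count == self.depth

    def clock(self, wr_en: bool = False, wr_data: int = 0,
              rd_en: bool = False) -> Optional[int]:
        """Apply one clock edge. Returns the word read, or None if no read occurred."""
        # Both enables are qualified with the flags as they were before the edge.
        do_read = rd_en and not self.empty
        do_write = wr_en and not self.full
        rd_data = None
        if do_read:
            rd_data = self.mem[self.rd_ptr]
            self.rd_ptr = (self.rd_ptr + 1) % self.depth
        if do_write:
            self.mem[self.wr_ptr] = wr_data & self.mask  # truncate to the data width
            self.wr_ptr = (self.wr_ptr + 1) % self.depth
        self.count += int(do_write) - int(do_read)
        return rd_data
```

Driving the model and the generated Verilog with the same stimulus and comparing read data and flags cycle by cycle is a simple way to check that two implementations agree.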

Phase 5: Embedded Integration

Goals: Combine programmable logic (PL) with processing system (PS) to build embedded FPGA solutions. Tasks:

  1. Ethernet MAC Implementation

    • Add Xilinx Ethernet MAC IP core; connect to onboard PHY.

    • Validate link-up and basic frame send/receive with loopback test.

  2. PetaLinux Platform Setup

    • Configure and build a PetaLinux project for the Zynq PS.

    • Boot Linux on the PS and verify console access via UART.

  3. Data Streaming Demo

    • Develop a PL-PS interface (AXI-Stream) to send sensor or test data over Ethernet.

    • Write a simple Linux user-space application to receive and display the data.

Deliverables: Ethernet + PetaLinux integration demo video or live demo.
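The AXI-Stream interface in the data streaming demo rests on one rule: a word is transferred on a clock cycle only when TVALID and TREADY are both high. The sketch below simulates a source that always has data ready and a sink that applies backpressure. The function name and the repeating ready pattern are assumptions made for illustration.

```python
from itertools import cycle


def axis_transfer(words: list[int], ready_pattern: list[bool]) -> tuple[list[int], int]:
    """Stream `words` to a sink whose TREADY follows `ready_pattern`
    (one entry per cycle, repeating). Returns (received words, cycles used)."""
    if not any(ready_pattern):
        raise ValueError("sink never asserts TREADY")
    received: list[int] = []
    cycles = 0
    idx = 0
    ready = cycle(ready_pattern)
    while idx < len(words):
        tvalid = True          # source has data, so it holds TVALID high
        tdata = words[idx]     # and keeps TDATA stable until the word is accepted
        tready = next(ready)   # sink may deassert TREADY to apply backpressure
        if tvalid and tready:  # handshake: the word moves on this cycle
            received.append(tdata)
            idx += 1
        cycles += 1
    return received, cycles
```

With a sink that is ready every other cycle, three words take six cycles. The source never waits for TREADY before asserting TVALID, which is the behavior the AXI-Stream protocol requires to avoid deadlock.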

Phase 6: Capstone

Goals: Showcase your expertise with a polished capstone project and portfolio. Tasks:

  1. Select an Advanced Interface

    • Options: PCIe endpoint, DDR memory controller, or custom high-speed protocol.

  2. Design & Implement

    • Full RTL development, synthesis, implementation, and verification.

    • Integrate with PS if applicable (e.g., DMA transfer to/from DDR).

  3. Documentation & Presentation

    • Create detailed GitHub repository: code, issues, CI scripts, and README.

    • Write a blog-style project report highlighting challenges and solutions.

  4. Self-Assessment

    • Prepare explanations of your design choices, trade-offs, and performance metrics.

    • Practice whiteboard sessions on timing closure and RTL architecture.

Deliverables:

  • Contribute to open-source FPGA projects (e.g., LiteX, or SymbiFlow, now continued as F4PGA).

  • Read research papers on next-generation FPGA architectures.


Projects:

  • FSM implementation.

  • Load Balancer on FPGA - a Hardcaml Project.

  • Running Hardcaml on an Actual FPGA - Blog by Ceramic Hacker.

  • A low-speed communication protocol implementation (UART/SPI/I2C).

  • Memory-mapped bus protocol implementations (AXI/Avalon).

  • DSP pipeline implementation.

  • IP integration.

  • Constraint/Pinout application

  • 10Gb Ethernet switch IP with virtual packet FIFOs

  • HDMI and SATA controller implementations

  • GTX transceiver controllers for high-speed interfaces

  • Universal I2C controllers for OLED displays and temperature sensors

  • Verification through testcase simulation and verification modules that implement protocols.

  • A verification module that automatically generates I2C master data to verify an I2C slave that you implemented in RTL. By calling a task (like i2c_master.send_write_request(data)), automatically generate different write/read requests to simulate functionality.

  • Set up communication emulators on a PC and view the incoming and outgoing data to determine whether your design is functional after synthesizing and implementing it on the FPGA.

  • Algorithm implementation: cryptographic algorithms, image processing algorithms, or any DSP algorithm.

  • Heterogeneous FPGA cluster for machine learning, graph processing, etc.

  • Transfer data from some source to the FPGA over a low-speed communication protocol, process it using pipelines and FSMs, then send it back out to a destination, all for a specific and demonstrable purpose.

  • Designing mini IP cores in the areas of networking and video coding.

  • Implement a TPU: Try to build up a TPU yourself based on the original paper and use that project as a guide if you get stuck. You can also look at writing your HDL to be scalable (i.e. making the TPU WxH size configurable) and verify that you see the same performance characteristics as the actual TPU (scaled to your hardware - clocks and bandwidth).

  • Know how to build a FIFO, both synchronous and asynchronous. Read the Sunburst Design paper by Cliff Cummings on asynchronous FIFO design for this. Write one, write a testbench, and be able to write it again mostly from memory.
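For the asynchronous FIFO in the last project, the central trick is passing read and write pointers across clock domains as Gray codes, so that only one bit changes per increment. The sketch below shows the conversions and the usual full and empty comparisons for pointers that carry one extra wrap bit. The function names are hypothetical, and the synchronizer flip-flops themselves are not modeled.

```python
def bin_to_gray(n: int) -> int:
    """Convert a binary count to Gray code."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Convert Gray code back to binary by XOR-ing all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def fifo_empty(rptr_gray: int, wptr_gray_synced: int) -> bool:
    """Empty when the read pointer has caught up with the (synchronized) write pointer."""
    return rptr_gray == wptr_gray_synced


def fifo_full(wptr_gray: int, rptr_gray_synced: int, addr_bits: int) -> bool:
    """Full when the write pointer is exactly one FIFO depth ahead.

    Pointers are addr_bits + 1 wide. In Gray code, being one depth ahead
    shows up as the top two bits inverted and the remaining bits equal.
    """
    top_two = 0b11 << (addr_bits - 1)
    return wptr_gray == (rptr_gray_synced ^ top_two)
```

Because the pointers seen across the clock boundary lag by a few cycles, the flags are conservative: full and empty may stay asserted slightly longer than necessary, which is safe.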


Misc

Explore the cutting edge of close-to-the-metal hardware design to create custom silicon solutions. Projects range from writing purpose-built software stacks in C for custom ASICs, aiming for error-free, roofline-level performance, to implementing complex networking solutions directly in hardware. Check out Jim Keller's podcast appearances while you're at it.

Learning from First Principles

One of the best approaches to learning low-level electrical engineering comes from those who tackle it from first principles. Try building a 10 μm chip fabrication facility in your dorm room, creating an open-source chip fab in the process.

The journey of learning often includes designing ASICs for various applications. Current projects in this space include burning a GPT-2 layer directly onto silicon, developing TPU-like systolic arrays, and creating neuromorphic chips designed for neural implants. Together they show the range of custom silicon design, from machine learning acceleration to biomedical applications. For a case study in unconventional substrate materials, see the article "28Gbps Microstrip With Pepper Jack Cheese as Substrate".

Also try implementing an entire TCP stack on an FPGA, complete with an InfiniBand backbone for streaming particle events from multiple sensors for the Compact Muon Solenoid (CMS) experiment. This level of integration shows how experimental particle physics has become the wild west of electronics and high-performance computing, and some suggest these technologies could be repurposed for applications like market making rather than relying solely on research funding.

Networking FPGA Engine

Explore different approaches to networking, developing everything from compute sleds and cabled backplanes to custom switches and complete software stacks from the lowest firmware levels to end-user experiences. Write VHDL code that underpins network data planes running at terabit speeds, designing and implementing both fixed network functions and programmable soft cores.

Try working with operating systems engineers to develop drivers for FPGA-based network data plane devices, collaborating with compiler engineers to target P4/Rust-like languages to FPGA soft cores, and working with hardware engineers on board and signal path development around FPGA chips. Get experience with high-speed network functions on FPGAs, understand Ethernet at the SerDes/PCS/MAC level, and work with PCIe from both the HDL and operating system perspectives.
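One concrete piece of MAC-level Ethernet is the frame check sequence (FCS), a CRC-32 appended to every frame. The bit-serial loop below mirrors how a simple hardware shift register would compute it, and it can be checked against Python's `zlib.crc32`, which uses the same polynomial. The helper names are hypothetical, and preamble, padding, and minimum frame size are not modeled.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-serial CRC-32 (reflected form, polynomial 0xEDB88320)."""
    crc = 0xFFFFFFFF                      # initial value: all ones
    for byte in data:
        crc ^= byte
        for _ in range(8):                # one shift per bit, LSB first
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF               # final inversion


def append_fcs(frame: bytes) -> bytes:
    """Append the 4-byte FCS, least significant byte first."""
    return frame + crc32_bitwise(frame).to_bytes(4, "little")


def check_fcs(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the trailing FCS."""
    if len(frame_with_fcs) < 4:
        return False
    payload, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return crc32_bitwise(payload).to_bytes(4, "little") == fcs


# Cross-check against the library implementation on a well-known test vector.
assert crc32_bitwise(b"123456789") == zlib.crc32(b"123456789") == 0xCBF43926
```

A real MAC usually processes 8 or more bits per clock with an unrolled version of the same logic, but the bit-serial form is the easiest one to verify an RTL implementation against.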

Companies like Oxide Computer Company work in this area; see the Oxide and Friends discussions on rack-scale networking. Additional insights can be found in their episodes Building a Rack Scale Computer with P4 at the Core, DTrace + P4, and The Power of Proto Boards (on rapid board design iteration).

Electrical Engineering

Having some hands-on experience with complicated designs throughout the entire lifecycle from concept through sustaining engineering is a massive plus. Collaborate closely with software engineers to co-design systems that solve problems through hardware-software cooperation. Work with mechanical engineers to design servers, switches, and racks that function as integrated systems, addressing challenges from thermal management to cabling solutions.

Try designing and developing electronic circuits for high-speed boards, simulating and testing board designs, specifying and supporting functional tests in manufacturing, and ensuring compliance with all industry standards and regulatory requirements. Approach schematic entry as a craft, understanding that readable schematics facilitate easier review and comprehension.

Educational Resources and Learning Materials

Fundamental Texts

The foundation of hardware engineering knowledge often begins with classic texts like "Practical Electronics for Inventors" and "The Art of Electronics." For those diving into HDL specifically, resources like Appendix A on "Hardware Description Languages" from Weste and Harris's "CMOS VLSI Design: A Circuits and Systems Perspective" provide essential background knowledge. IEEE Standard 1800 is the definitive reference for SystemVerilog; the 2012 edition has since been superseded by the 2017 and 2023 revisions.

University Resources and Seminars

Academic institutions offer valuable resources, such as the University of Toronto FPGA Seminar Series, which provides ongoing education in FPGA technologies. Historical presentations remain relevant, including:

Online Learning Platforms

Modern learning resources include:

Research Papers and Articles

Notable papers include:

Open Source Tools and Projects

gEDA Suite

The gEDA project has produced a comprehensive GPL'd suite of Electronic Design Automation tools. The gEDA wiki provides extensive documentation for these tools used in electrical circuit design, schematic capture, simulation, prototyping, and production.

Project IceStorm and SymbiFlow

Project IceStorm focuses on reverse engineering and documenting the bitstream format of Lattice iCE40 FPGAs. The IceStorm flow, incorporating Yosys, Arachne-pnr (since superseded by nextpnr), and IceStorm itself, is a fully open source Verilog-to-bitstream flow. SymbiFlow (now continued as F4PGA) extends this concept to support Xilinx 7-Series FPGAs, with Project X-Ray documenting the Xilinx 7-series bitstream format.

CPU and System Implementations

The open source community has produced several notable CPU implementations:

Community Organizations

ZipCPU Ecosystem

The ZipCPU represents a fully functional, pipelined 32-bit CPU designed specifically for resource-constrained FPGA environments. This project includes comprehensive toolchain support through GCC, Binutils, and Newlib. For more information, see "A Quick Introduction to the ZipCPU's Instruction Set", "Instructions for building the GCC-based toolchain", and "Introducing the ZipCPU v3.0".

ZipCPU System Implementations

Several complete systems demonstrate the ZipCPU's versatility:

  • S6SoC - Demonstration on smallest FPGAs

  • ArrowZip - Implementation on MAX-1000 board

  • OpenArty - Showcases ZipCPU and AutoFPGA on Digilent Arty A7

  • ZBasic - Bare-bones minimal system for beginners

  • VideoZIP - HDMI receive/transmit system

  • ICOZip - iCE40 implementation using open source toolchain

Peripheral Support

The ZipCPU ecosystem includes extensive peripheral support:

DSP and Signal Processing

High-Level Synthesis and Advanced Topics

HLS Resources

High-Level Synthesis resources include:

SoC Performance Architecture

Modern SoC performance architecture insights can be found in Indraneil Gokhale's SoC Performance Architecture 101. The field continues evolving with vendor-agnostic tools like those offered at http://caas.symbioticeda.com.

Image Processing and Specialized Applications

FPGAs excel in image processing applications. Resources include specialized tutorials on building image processing chains, sensor selection, and pipeline creation using MicroBlaze V RISC-V soft processors for control and configuration.

Formal Methods Training

Introduction to Formal Methods courses teach Verilog and VHDL developers how to use SymbiYosys in a "formal first" strategy. These two-day courses cover formal verification from basics through bounded model checking and induction steps, addressing specific topics like dissimilar clocking, abstraction, invariants, and arbitrary values.
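The difference between bounded model checking and an induction step is easier to see on a toy design. The sketch below uses brute-force state enumeration on a hypothetical 4-bit counter that wraps at 9. It is not how SymbiYosys works internally (which relies on SAT/SMT solvers), only an illustration of the two proof obligations.

```python
from typing import Callable, Iterable, Optional


def step(c: int) -> int:
    """Toy design: a 4-bit register that counts 0..9 and then wraps to 0."""
    return 0 if c == 9 else (c + 1) & 0xF


INIT = [0]                 # reset state
ALL_STATES = range(16)     # every value the 4-bit register could hold


def bmc(init: Iterable[int], nxt: Callable[[int], int],
        prop: Callable[[int], bool], depth: int) -> Optional[int]:
    """Bounded model check: return a state reachable within `depth` steps
    that violates `prop`, or None if no such state exists."""
    frontier = set(init)
    for _ in range(depth + 1):
        for s in sorted(frontier):
            if not prop(s):
                return s
        frontier = {nxt(s) for s in frontier}
    return None


def induction_step(states: Iterable[int], nxt: Callable[[int], int],
                   prop: Callable[[int], bool]) -> Optional[int]:
    """Induction check over arbitrary (not only reachable) states: return a
    state that satisfies `prop` but whose successor does not, or None."""
    for s in states:
        if prop(s) and not prop(nxt(s)):
            return s
    return None
```

The property "counter never equals 12" passes BMC at any depth, because 12 is unreachable from reset, yet it fails the induction step starting from the unreachable state 11. Strengthening the property to the invariant "counter <= 9" makes both checks pass, which is the kind of invariant hunting a formal-first workflow involves.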

Community Forums and Platforms

Active community platforms include:

Advanced Reading and References

Computer Architecture and Low-Level Programming

RISC-V Resources

Additional Resources

Academic Papers and Articles

Key academic contributions to the field include:

  • "Global is the New Local: FPGA Architecture at 5nm and Beyond" (ACM/SIGDA FPGA 2021) by Stefan Nikolić et al. DOI: 10.1145/3431920.3439300, GitHub

  • "A 16-nm Multiprocessing System-on-Chip Field-Programmable Gate Array Platform" (2016) DOI: 10.1109/MM.2016.18

  • "Fundamental Underpinnings of Reconfigurable Computing Architectures" (2015) IEEE Xplore

  • "It's an FPGA!" (2011) by P. Alfke et al., IEEE Solid-State Circuits Magazine PDF / IEEE

  • "Measuring the Gap Between FPGAs and ASICs" (2007) IEEE

  • "Reconfigurable Computing Architectures" (2015) IEEE

  • "Three Ages of FPGAs: A Retrospective on the First Thirty Years of FPGA Technology" (2015) IEEE

  • "Trends in Reconfigurable Computing: Applications and Architectures" (2015) by Lesley Shannon et al. PDF

  • "Xilinx and the Birth of the Fabless Semiconductor Industry" (2013) by Steve Leibson PDF

References

  1. ACM SIGDA Technical Committee on FPGAs (TCFPGA) Hall of Fame - Reading List. Available at: http://hof.tcfpga.org/reading-list/

  2. "FPGAs and Open-Source Hardware - An Intro" (Meeting C++ 2016). Available at: https://speakerdeck.com/mattpd/fpgas-and-open-source-hardware-an-intro-meeting-c-plus-plus-2016

