FPGA Engineering
This crash course will guide you from first principles to shipping FPGA-based systems with HardCaml, OCaml, and industry-standard flows.
Overview
My approach to hardware design for this project is to treat it like software: modular, testable, and version-controlled. I'm new to FPGAs myself, so this guide keeps jargon to a minimum and assumes no prior hardware experience, while still preparing you for industry-grade projects. By the end, you'll design RTL, debug timing violations, and deploy systems with PCIe, Ethernet, and custom accelerators. Here are some quick goals I'm trying to achieve:
Get practical experience in digital design, Verilog/SystemVerilog, and FPGA toolchains (Vivado/Quartus).
Work on real-world projects, e.g. UART, FIFOs, RISC-V CPUs, Ethernet MACs.
Get comfortable with PCIe, SERDES integration, and high-performance Ethernet/InfiniBand.
Read through at your own pace and feel free to skip a few goals, but aim to spend 1–2 weeks on this. Adjust based on your background, since this is overkill for a crash course.
[TODOs] Digital Logic Basics
Interactive Learning: HDLBits (solve puzzles in Verilog).
Verilog Essentials: chipverify.com/verilog/verilog-tutorial
Free Courses:
Digital Electronics (All About Circuits).
Nandland (FPGA tutorials for beginners).
Try using a logic simulator (e.g., EDA Playground) to build a 4-bit adder. Simulate inputs and verify outputs.
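Before building the 4-bit adder in a simulator, it can help to have a software reference to check against. The sketch below is one possible Python model of a ripple-carry adder built from one-bit full adders; the function names are my own and not taken from any of the resources above.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add4(a, b, cin=0):
    """Add two 4-bit values one bit at a time, the way a ripple-carry adder does.

    Returns (4-bit sum, carry_out).
    """
    total = 0
    carry = cin
    for i in range(4):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


def check_all_inputs():
    """Exhaustively compare the gate-level model against integer addition."""
    for a in range(16):
        for b in range(16):
            for cin in (0, 1):
                total, carry = ripple_add4(a, b, cin)
                expected = a + b + cin
                assert total == expected & 0xF
                assert carry == expected >> 4
    return True
```

With only 512 input combinations, an exhaustive check like check_all_inputs() is cheap, and the same idea carries over to a Verilog testbench.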
Open Source Chip Design - Course
Build a minimal GPU design in Verilog - adam-maj/tiny-gpu
FPGA Primer
We'll be using the Arty A7 hobbyist FPGA board, with the design expressed using HardCaml, an OCaml library for creating hardware designs, and driven by an embedded software stack written in OCaml and libraries that may or may not exist depending on the use-case.
Resources:
FPGA-101: Introduction to FPGAs, Learn the Basics
Learn-fpga: Learning FPGA, yosys, nextpnr, and RISC-V - GitHub
Learning Verilog and FPGA - LINK
Basics of FPGA Design - fpgatutorial.com
FPGA Design Elements - fpgacpu.ca/fpga
Implementing FizzBuzz on an FPGA - LINK
University courses on FPGAs - LINK
University of Pennsylvania ESE5320 System-on-a-Chip Architecture - Fall 2024
Cornell's ECE 5760: Advanced Microcontrollers & FPGA - LINK
Understand what it takes to build a RISC-V Assembler in FPGAs - LINK
[TODOs] Study Plan
This study plan is tailored to build a rock-solid foundation in FPGAs and equip you to tackle production-grade challenges.
Phase 1: Foundation
Goals: Establish a robust FPGA development environment and master basic RTL design. Tasks:
Hardware & Tools
Acquire a Cora Z7 (Zynq-7000) or similar.
Install Xilinx Vivado Design Suite (WebPACK).
Verify board connectivity and JTAG operation.
First Project: Blink LED
Create Vivado project targeting Zynq part.
Develop a parameterized Verilog module to blink LEDs at varying frequencies.
Synthesize, implement, generate bitstream, and program the board.
RTL Code Review
Analyze demo code: identify FFs, LUTs, and interconnect.
Produce a one-page walkthrough highlighting RTL-to-gate translation.
Deliverables: Working blink-LED bitstream.
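The parameterized blink module boils down to a counter that toggles the LED every half period. Here is a rough Python sketch of the arithmetic and of the counter behavior, assuming a 100 MHz board clock purely for illustration (check your board's actual oscillator frequency); the helper names are hypothetical.

```python
def blink_params(clk_hz, blink_hz):
    """Return (half_period_cycles, counter_width_bits) for a toggle-style blinker."""
    half_period = clk_hz // (2 * blink_hz)
    # The counter runs from 0 to half_period - 1, so size it for that maximum.
    width = max(1, (half_period - 1).bit_length())
    return half_period, width


def simulate_blink(half_period, cycles):
    """Cycle-by-cycle model: the LED value after each rising clock edge."""
    counter, led, trace = 0, 0, []
    for _ in range(cycles):
        if counter == half_period - 1:
            counter = 0
            led ^= 1
        else:
            counter += 1
        trace.append(led)
    return trace


# Assumed 100 MHz clock and a 1 Hz blink rate.
half_period, width = blink_params(100_000_000, 1)
```

For the assumed 100 MHz clock and a 1 Hz blink, this gives a half period of 50,000,000 cycles and a 26-bit counter. Running simulate_blink with a tiny half period is a quick way to sanity-check the toggle logic before waiting on a bitstream build.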
Phase 2: Synthesis & Implementation
Goals: Gain deep insights into how synthesis and implementation transform RTL into silicon-ready designs. Tasks:
Synthesis Deep Dive
Run synthesis on a 16-bit pipelined adder design.
Extract and interpret synthesis reports: area utilization, timing estimates.
Implementation Flow
Run the implementation steps on the adder: optimization, placement, and routing in Vivado (Translate, Map, and Place & Route in the older ISE flow).
Compare post-placement vs. post-route resource usage.
Experiment with basic floorplanning: constrain logic regions and re-run P&R.
Version-Controlled Experiments
Maintain branches for different synthesis and implementation constraint strategies.
Document changes and their impact on resource and timing metrics.
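To make the 16-bit pipelined adder concrete before synthesizing it, here is a small cycle-level Python model. It assumes one particular split (low byte in stage 1, high byte plus carry in stage 2); the class name and the split are my own choices for illustration, and a real design may cut the pipeline differently.

```python
class PipelinedAdder16:
    """Two-stage 16-bit adder model: stage 1 adds the low bytes, stage 2 the high bytes."""

    def __init__(self):
        # Stage-1 pipeline register: (low_sum, carry, a_high, b_high)
        self.stage1 = (0, 0, 0, 0)
        # Output register (17 bits: 16-bit sum plus carry out)
        self.out = 0

    def clock(self, a, b):
        """Apply one rising clock edge with inputs a and b; return the output register."""
        # Stage 2 consumes what stage 1 registered on the previous edge.
        low_sum, carry, a_hi, b_hi = self.stage1
        high = a_hi + b_hi + carry
        next_out = (high << 8) | low_sum

        # Stage 1 adds the low bytes and passes the high bytes along.
        low = (a & 0xFF) + (b & 0xFF)
        self.stage1 = (low & 0xFF, low >> 8, (a >> 8) & 0xFF, (b >> 8) & 0xFF)

        self.out = next_out
        return self.out


adder = PipelinedAdder16()
first = adder.clock(0xFFFF, 0x0001)   # pipeline still filling
second = adder.clock(0x1234, 0x1111)  # result of the first input pair
third = adder.clock(0, 0)             # result of the second input pair
```

The point of the split is that each stage only has an 8-bit carry chain, which is the kind of change you should see reflected in the timing estimates of the synthesis report.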
Phase 3: Timing Analysis & Constraints
Goals: Master Static Timing Analysis (STA) and constraint creation for multi-clock designs. Tasks:
STA Fundamentals
Analyze timing report: slack, setup and hold violations.
Identify critical paths in a dual-clock FIFO design.
Constraint Creation
Write XDC constraints: create_clock, create_generated_clock, set_false_path, and set_clock_groups.
Model PCB trace delays by adding input/output delay constraints (set_input_delay, set_output_delay).
Optimization Loop
Resolve setup violations via pipelining and retiming.
Address hold-time issues with appropriate constraints or register insertion.
Deliverables: Annotated XDC files covering all constraint types and benchmark timing closure report showing before/after optimizations.
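When reading a timing report, it helps to know how setup slack is derived: required time minus arrival time. The sketch below is a simplified Python model of that calculation with made-up path delays; the numbers and names are illustrative assumptions, and real tools also account for clock latency, pessimism removal, and more.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    clk_to_q_ns: float    # launch register clock-to-output delay
    logic_ns: float       # combinational logic delay
    route_ns: float       # routing delay
    setup_ns: float       # capture register setup time
    skew_ns: float = 0.0  # capture clock arrival minus launch clock arrival


def setup_slack(path, period_ns, uncertainty_ns=0.0):
    """Setup slack = data required time - data arrival time (negative means violation)."""
    arrival = path.clk_to_q_ns + path.logic_ns + path.route_ns
    required = period_ns + path.skew_ns - path.setup_ns - uncertainty_ns
    return required - arrival


def worst_slack(paths, period_ns, uncertainty_ns=0.0):
    """The smallest slack over all paths identifies the critical path."""
    return min(setup_slack(p, period_ns, uncertainty_ns) for p in paths)


def fmax_mhz(paths, period_ns, uncertainty_ns=0.0):
    """Estimated maximum clock frequency implied by the worst slack."""
    return 1000.0 / (period_ns - worst_slack(paths, period_ns, uncertainty_ns))


# Hypothetical path before and after inserting a pipeline register, 10 ns clock.
before = Path("carry_chain", 0.5, 6.0, 4.0, 0.1)
after = Path("carry_chain_stage1", 0.5, 3.0, 2.0, 0.1)
slack_before = setup_slack(before, 10.0)  # about -0.6 ns, a setup violation
slack_after = setup_slack(after, 10.0)    # about +4.4 ns
```

In this made-up example, pipelining halves the logic and routing delay on the path and turns a 0.6 ns violation into positive slack, which is the before/after comparison the timing closure report should capture.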
Phase 4: High-Level Synthesis
Goals: Leverage high-level synthesis (HLS) tools to accelerate design productivity. Tasks:
Chisel Lab
Implement a parameterized FIFO in Chisel; generate Verilog.
Synthesize & verify functional equivalence with Vivado.
Lava & Bluespec
Build a simple counter in Lava; integrate into Vivado project.
Create a small state machine in Bluespec; compare resource utilization.
Analysis & Comparison
Summarize productivity gains, code clarity, and resource overhead for each HLS tool.
Deliverables: HLS code repositories for Chisel, Lava, and Bluespec.
Phase 5: Embedded Integration
Goals: Combine programmable logic (PL) with processing system (PS) to build embedded FPGA solutions. Tasks:
Ethernet MAC Implementation
Add Xilinx Ethernet MAC IP core; connect to onboard PHY.
Validate link-up and basic frame send/receive with loopback test.
PetaLinux Platform Setup
Configure and build a PetaLinux project for the Zynq PS.
Boot Linux on the PS and verify console access via UART.
Data Streaming Demo
Develop a PL-PS interface (AXI-Stream) to send sensor or test data over Ethernet.
Write a simple Linux user-space application to receive and display the data.
Deliverables: Ethernet + PetaLinux integration demo video or live demo.
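For the loopback test it is useful to be able to build and check raw frames on the PC side. Here is a minimal Python sketch of an Ethernet II frame with its frame check sequence, relying on the fact that the Ethernet FCS uses the same CRC-32 that zlib implements, appended least significant byte first. The MAC addresses and the EtherType 0x88B5 (an IEEE local experimental value) are placeholders I picked for illustration.

```python
import struct
import zlib

MIN_PAYLOAD = 46  # pads the frame to the 64-byte Ethernet minimum (including FCS)


def build_frame(dst, src, ethertype, payload):
    """Return header + padded payload + FCS (preamble and SFD are not included)."""
    payload = payload.ljust(MIN_PAYLOAD, b"\x00")
    body = dst + src + struct.pack("!H", ethertype) + payload
    fcs = struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)
    return body + fcs


def check_frame(frame):
    """Recompute the CRC over everything but the last four bytes and compare."""
    body, fcs = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == fcs


# Placeholder addresses: broadcast destination, locally administered source.
DST = bytes.fromhex("ffffffffffff")
SRC = bytes.fromhex("020000000001")
frame = build_frame(DST, SRC, 0x88B5, b"hello fpga")

# Flip one bit to emulate a corrupted frame coming back from the loopback.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
```

Note that many NICs strip or generate the FCS in hardware, so whether your capture tool shows these four bytes depends on the host setup.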
Phase 6: Capstone
Goals: Showcase your expertise with a polished capstone project and portfolio. Tasks:
Select an Advanced Interface
Options: PCIe endpoint, DDR memory controller, or custom high-speed protocol.
Design & Implement
Full RTL development, synthesis, implementation, and verification.
Integrate with PS if applicable (e.g., DMA transfer to/from DDR).
Documentation & Presentation
Create detailed GitHub repository: code, issues, CI scripts, and README.
Write a blog-style project report highlighting challenges and solutions.
Self-Assessment
Prepare explanations of your design choices, trade-offs, and performance metrics.
Practice whiteboard sessions on timing closure and RTL architecture.
Deliverables:
Contribute to open-source FPGA projects (e.g., LiteX, SymbiFlow).
Read research papers on next-generation FPGA architectures.
Projects:
FSM implementation.
Load Balancer on FPGA - a Hardcaml Project.
Running Hardcaml on an Actual FPGA - Blog by Ceramic Hacker.
A low-speed communication protocol implementation (UART/SPI/I2C).
Memory management protocol implementations (AXI/AVALON).
DSP pipeline implementation.
IP integration.
Constraint/Pinout application
10Gb Ethernet switch IP with virtual packet FIFOs
HDMI and SATA controller implementations
GTX transceiver controllers for high-speed interfaces
Universal I2C controllers for OLED displays and temperature sensors
Verification through testcase simulation and Verification modules that implement protocols.
A verification module that automatically generates I2C master data to verify an I2C slave that you implemented in RTL. By calling a task (like i2c_master.send_write_request(data)), automatically generate different write/read requests to simulate functionality.
Set up communication emulators on a PC and view the incoming and outgoing data to determine if it is functional after synthesizing and implementing your code onto the FPGAs.
Algorithm implementation: cryptographic algorithms, image processing algorithms, or any DSP algorithm.
Heterogeneous FPGA cluster for machine learning, graph processing, etc.
Transfer data from some source to the FPGA over a low-speed communication protocol, then take that data, process it using pipelines and FSMs, then send the data back out to a destination, all for a specific and demonstrable purpose.
Designing mini IP cores in the areas of networking and video coding.
Implement a TPU: Try to build up a TPU yourself based on the original paper and use that project as a guide if you get stuck. You can also look at writing your HDL to be scalable (i.e. making the TPU WxH size configurable) and verify that you see the same performance characteristics as the actual TPU (scaled to your hardware - clocks and bandwidth).
Know how to build a FIFO, both synchronous and asynchronous. Read the Sunburst Design PDF for this. Write one, write a testbench, and be able to write it again mostly from memory.
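A common technique for the asynchronous FIFO is to pass Gray-coded read and write pointers across the clock domains, so that only one bit changes per increment. The Python sketch below models the pointer math for an 8-entry FIFO with one extra pointer bit to tell full from empty. It leaves out the synchronizer flops entirely, and the names and depth are assumptions for illustration.

```python
ADDR_BITS = 3              # 8-entry FIFO (assumed depth for this sketch)
PTR_BITS = ADDR_BITS + 1   # one extra bit distinguishes full from empty
PTR_MASK = (1 << PTR_BITS) - 1


def bin_to_gray(n):
    """Binary to Gray code: adjacent values differ in exactly one bit."""
    return n ^ (n >> 1)


def gray_to_bin(g):
    """Gray code back to binary by folding in successive right shifts."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def is_empty(rd_gray, wr_gray_synced):
    """Empty when the read pointer has caught up with the (synchronized) write pointer."""
    return rd_gray == wr_gray_synced


def is_full(wr_gray, rd_gray_synced):
    """Full when the pointers match except for the two most significant Gray bits."""
    return wr_gray == rd_gray_synced ^ (0b11 << (ADDR_BITS - 1))


def check_flags():
    """Compare the Gray-code flags against the binary fill level for every pointer pair."""
    for wr in range(1 << PTR_BITS):
        for rd in range(1 << PTR_BITS):
            fill = (wr - rd) & PTR_MASK
            assert is_empty(bin_to_gray(rd), bin_to_gray(wr)) == (fill == 0)
            assert is_full(bin_to_gray(wr), bin_to_gray(rd)) == (fill == 1 << ADDR_BITS)
    return True
```

In hardware the synchronized pointer lags the real one by a couple of cycles, which makes the flags pessimistic (full or empty may be asserted slightly longer than necessary) but never unsafe.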
Misc
Explore the cutting edge of hardware design that's close to the metal to create custom silicon solutions. Projects range from writing purposeful stacks in C for custom ASICs to achieve error-free roofline performance, to implementing complex networking solutions directly in hardware. Check out Jim Keller's podcasts while you're at it.
Learning from First Principles
One of the best approaches to learning low-level electrical engineering comes from those who tackle it from first principles. Try building a 10μm chip fabrication facility in your dorm room, creating an open-source chip fab in the process.
The journey of learning often includes designing ASICs for various applications. Current projects in this space include burning a GPT-2 layer directly onto silicon, developing TPU-like systolic arrays, and creating neuromorphic chips specifically designed for neural implants. These projects demonstrate the diverse applications of custom silicon design, from machine learning acceleration to biomedical applications. An interesting case study in unconventional substrate materials can be found in the article "28Gbps Microstrip With Pepper Jack Cheese as Substrate". Also try implementing an entire TCP stack on an FPGA, complete with an InfiniBand backbone for streaming particle events from multiple sensors for the Compact Muon Solenoid experiments. This level of integration demonstrates how experimental particle physics has become the wild west of electronics and high-performance computing, with some suggesting these technologies could be repurposed for applications like market making rather than relying solely on research funding.
Networking FPGA Engine
Explore different approaches to networking, developing everything from compute sleds and cabled backplanes to custom switches and complete software stacks from the lowest firmware levels to end-user experiences. Write VHDL code that underpins network data planes running at terabit speeds, designing and implementing both fixed network functions and programmable soft cores.
Try working with operating systems engineers to develop drivers for FPGA-based network data plane devices, collaborating with compiler engineers to target P4/Rust-like languages to FPGA soft cores, and working with hardware engineers on board and signal path development around FPGA chips. Get experience with high-speed network functions using FPGAs, understand Ethernet at the SerDes/PCS/MAC level, and work with PCIe from both HDL and operating system perspectives.
Companies like Oxide Computer Company work on this stuff; see the Oxide and Friends discussions on rack-scale networking. Additional insights can be found in their episodes on Building a Rack Scale Computer with P4 at the Core, DTrace + P4, and The Power of Proto Boards for rapid board design iteration.
Electrical Engineering
Having some hands-on experience with complicated designs throughout the entire lifecycle from concept through sustaining engineering is a massive plus. Collaborate closely with software engineers to co-design systems that solve problems through hardware-software cooperation. Work with mechanical engineers to design servers, switches, and racks that function as integrated systems, addressing challenges from thermal management to cabling solutions.
Try designing and developing electronic circuits for high-speed boards, simulating and testing board designs, specifying and supporting functional tests in manufacturing, and ensuring compliance with all industry standards and regulatory requirements. Approach schematic entry as a craft, understanding that readable schematics facilitate easier review and comprehension.
Educational Resources and Learning Materials
Fundamental Texts
The foundation of hardware engineering knowledge often begins with classic texts like "Practical Electronics for Inventors" and "The Art of Electronics." For those diving into HDL specifically, resources like Appendix A on "Hardware Description Languages" from Weste and Harris's "CMOS VLSI Design: A Circuits and Systems Perspective" provide essential background knowledge. The IEEE Standard 1800-2012 serves as the definitive reference for SystemVerilog.
University Resources and Seminars
Academic institutions offer valuable resources, such as the University of Toronto FPGA Seminar Series, which provides ongoing education in FPGA technologies. Historical presentations remain relevant, including:
Carnegie Mellon University offers 18-643 Reconfigurable Logic: Technology, Architecture and Applications.
Online Learning Platforms
Modern learning resources include:
EDA Playground for Verilog tutorials with accompanying YouTube playlist
Research Papers and Articles
Notable papers include:
Papers by Cliff Cummings and Don Mills
Open Source Tools and Projects
gEDA Suite
The gEDA project has produced a comprehensive GPL'd suite of Electronic Design Automation tools. The gEDA wiki provides extensive documentation for these tools used in electrical circuit design, schematic capture, simulation, prototyping, and production.
Project IceStorm and SymbiFlow
Project IceStorm focuses on reverse engineering and documenting the bitstream format of Lattice iCE40 FPGAs. The IceStorm flow, incorporating Yosys, Arachne-pnr, and IceStorm itself, represents a fully open source Verilog-to-Bitstream flow. SymbiFlow extends this concept to support Xilinx 7-Series FPGAs, with Project X-Ray documenting the Xilinx 7-series bitstream format.
CPU and System Implementations
The open source community has produced several notable CPU implementations:
PicoRV32 - A size-optimized RISC-V CPU
J2 core - A cleanroom reimplementation of the SH-2 ISA with extensions
Nyuzi Processor - GPGPU processor with SystemVerilog FPGA implementation
NetFPGA - Network hardware development platform
OH! Open Hardware for Chip Designers - Silicon proven Verilog library
Community Organizations
FOSSi Foundation - The Free and Open Source Silicon Foundation
FuseSoC - Package manager and build tools for HDL code
LibreCores - Free and Open Source Digital Hardware
OpenCores - Open Source Hardware Community
SystemVerilog Tester - Test suite for SystemVerilog compliance
ZipCPU Ecosystem
The ZipCPU represents a fully functional, pipelined 32-bit CPU designed specifically for resource-constrained FPGA environments. This project includes comprehensive toolchain support through GCC, Binutils, and Newlib. For more information, see "A Quick Introduction to the ZipCPU's Instruction Set", "Instructions for building the GCC-based toolchain", and "Introducing the ZipCPU v3.0".
ZipCPU System Implementations
Several complete systems demonstrate the ZipCPU's versatility:
S6SoC - Demonstration on smallest FPGAs
ArrowZip - Implementation on MAX-1000 board
OpenArty - Showcases ZipCPU and AutoFPGA on Digilent Arty A7
ZBasic - Bare-bones minimal system for beginners
VideoZIP - HDMI receive/transmit system
ICOZip - iCE40 implementation using open source toolchain
Peripheral Support
The ZipCPU ecosystem includes extensive peripheral support:
UART controller - Basic and fully featured serial controllers
Wishbone-based scope - Internal logic analyzer
QSPI Flash controller - Universal flash controller with XIP
I2C Controller - Master, slave, and universal implementations
GPS schooled clock - Sub-microsecond precision timing
RTC Clock - Hours/minutes/seconds with timer functionality
WBPMIC - MEMS microphone PMod controller
WB Oled - PMod OLEDrgb display controller
ICAPE Interface - Xilinx configuration port access
VGA Simulator - GTK-based VGA/HDMI simulation
DSP and Signal Processing
CORDIC - Core generator for various sinewave generators
DSP Filters - Digital filter implementations
Interpolation - Piecewise polynomial interpolation
Double Clock FFT - Pipelined FFT/IFFT generator
FFT Demo - Spectral raster display demonstration
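To get a feel for what a CORDIC core computes, here is a small fixed-point Python model of rotation-mode CORDIC producing sine and cosine with only shifts and adds. This is a generic textbook formulation, not the ZipCPU core generator; the iteration count and fixed-point width are arbitrary choices for the sketch.

```python
import math

ITERATIONS = 16
FRAC_BITS = 24
SCALE = 1 << FRAC_BITS

# Precomputed rotation angles atan(2^-i), in fixed point (a ROM in hardware).
ATAN_TABLE = [round(math.atan(2.0 ** -i) * SCALE) for i in range(ITERATIONS)]

# Each micro-rotation stretches the vector; start from 1/gain to compensate.
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))
X_INIT = round(SCALE / GAIN)


def cordic_sin_cos(angle_rad):
    """Return (sin, cos) for an angle in [-pi/2, pi/2] using shift-and-add rotations."""
    x, y = X_INIT, 0
    z = round(angle_rad * SCALE)  # remaining angle to rotate through
    for i in range(ITERATIONS):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y / SCALE, x / SCALE


sin_30, cos_30 = cordic_sin_cos(math.pi / 6)  # close to 0.5 and 0.866
```

Each loop iteration maps to one pipeline stage or one clock cycle in hardware, and the right shifts become wiring, which is why CORDIC is attractive on FPGAs without spare multipliers.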
High-Level Synthesis and Advanced Topics
HLS Resources
High-Level Synthesis resources include:
hlslib - Extensions for Vivado HLS and Intel FPGA OpenCL
Productive Parallel Programming for FPGA with High Level Synthesis Tutorial by Torsten Hoefler and Johannes de Fine Licht
SoC Performance Architecture
Modern SoC performance architecture insights can be found in Indraneil Gokhale's SoC Performance Architecture 101. The field continues evolving with vendor-agnostic tools like those offered at http://caas.symbioticeda.com.
Image Processing and Specialized Applications
FPGAs excel in image processing applications. Resources include specialized tutorials on building image processing chains, sensor selection, and pipeline creation using MicroBlaze V RISC-V microcontrollers for control and configuration.
Formal Methods Training
Introduction to Formal Methods courses teach Verilog and VHDL developers how to use SymbiYosys in a "formal first" strategy. These two-day courses cover formal verification from basics through bounded model checking and induction steps, addressing specific topics like dissimilar clocking, abstraction, invariants, and arbitrary values.
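The two ideas named above, bounded model checking and the induction step, can be illustrated on a toy design. The Python sketch below enumerates states explicitly for a 4-bit counter that should wrap at 9; real tools such as SymbiYosys hand the problem to SAT/SMT solvers instead, so treat this purely as a mental model. The design, the property, and the deliberately buggy variant are all made up for the example.

```python
def step(state, enable):
    """4-bit decade counter: counts 0..9 and wraps back to 0."""
    if not enable:
        return state
    return 0 if state == 9 else (state + 1) & 0xF


def buggy_step(state, enable):
    """Same counter with an off-by-one wrap, so it can reach 10."""
    if not enable:
        return state
    return 0 if state == 10 else (state + 1) & 0xF


def prop(state):
    """Safety property: the counter never exceeds 9."""
    return state <= 9


def bmc(step_fn, init, depth):
    """Bounded check from reset: return (cycle, bad_state) or None if no violation within depth."""
    states = {init}
    for k in range(depth + 1):
        bad = [s for s in states if not prop(s)]
        if bad:
            return k, bad[0]
        states = {step_fn(s, en) for s in states for en in (0, 1)}
    return None


def induction_step(step_fn):
    """From ANY state satisfying the property (reachable or not), the next state must too."""
    return all(prop(step_fn(s, en)) for s in range(16) if prop(s) for en in (0, 1))
```

With these definitions, bmc(buggy_step, 0, 5) returns None even though the bug exists, because the violation only shows up at cycle 10. That is exactly why a bounded proof is paired with an induction step, which fails for buggy_step and passes for step.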
Community Forums and Platforms
Active community platforms include:
Advanced Reading and References
Computer Architecture and Low-Level Programming
RISC-V Resources
Additional Resources
Academic Papers and Articles
Key academic contributions to the field include:
"Global is the New Local: FPGA Architecture at 5nm and Beyond" (ACM/SIGDA FPGA 2021) by Stefan Nikolić et al. DOI: 10.1145/3431920.3439300, GitHub
"A 16-nm Multiprocessing System-on-Chip Field-Programmable Gate Array Platform" (2016) DOI: 10.1109/MM.2016.18
"Fundamental Underpinnings of Reconfigurable Computing Architectures" (2015) IEEE Xplore
"Measuring the Gap Between FPGAs and ASICs" (2007) IEEE
"Reconfigurable Computing Architectures" (2015) IEEE
"Three Ages of FPGAs: A Retrospective on the First Thirty Years of FPGA Technology" (2015) IEEE
"Trends in Reconfigurable Computing: Applications and Architectures" (2015) by Lesley Shannon et al. PDF
"Xilinx and the Birth of the Fabless Semiconductor Industry" (2013) by Steve Leibson PDF
References
ACM SIGDA Technical Committee on FPGAs (TCFPGA) Hall of Fame - Reading List. Available at: http://hof.tcfpga.org/reading-list/
"FPGAs and Open-Source Hardware - An Intro" (Meeting C++ 2016). Available at: https://speakerdeck.com/mattpd/fpgas-and-open-source-hardware-an-intro-meeting-c-plus-plus-2016