Project Glossary
This document provides definitions for terms, acronyms, and concepts used throughout the OCP MX Streaming MAC Unit project.
OCP MX Specification & Formats
OCP MX: Open Compute Project Microscaling Formats, a specification for block-based scaling in deep learning.
MXFP8 / MXFP6 / MXFP4: 8-bit, 6-bit, and 4-bit floating-point formats defined under the OCP MX specification.
MXINT8: 8-bit integer formats (standard and symmetric) supported by OCP MX.
E4M3: An 8-bit float format with 1 sign bit, 4 exponent bits (Bias 7), and 3 mantissa bits.
E5M2: An 8-bit float format with 1 sign bit, 5 exponent bits (Bias 15), and 2 mantissa bits.
E2M1 (FP4): A 4-bit float format with 1 sign bit, 2 exponent bits (Bias 1), and 1 mantissa bit.
UE8M0: An 8-bit unsigned biased exponent format (Bias 127) used for shared scaling factors.
Block Size (k): The number of elements that share a single scaling factor (fixed at 32 in this design).
Shared Scale: A single scaling factor applied to a block of elements to reduce memory bandwidth.
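The bit layouts and biases listed above can be made concrete with a small decoder. The following is a minimal Python sketch, not the project's reference model: the function names `decode_element` and `decode_scale_ue8m0` are hypothetical, and the format-specific special encodings (the NaN code of E4M3, Inf/NaN of E5M2) are deliberately left out.

```python
def decode_element(bits: int, exp_bits: int, man_bits: int, bias: int) -> float:
    """Decode a sign/exponent/mantissa code into a Python float.

    Illustrative only: special encodings (NaN, Inf) are not handled.
    """
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    if exp == 0:
        # Subnormal: no implicit leading one, exponent fixed at 1 - bias.
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    # Normal: implicit leading one.
    return sign * (1.0 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def decode_scale_ue8m0(bits: int) -> float:
    """Decode a UE8M0 shared scale: 2**(bits - 127); 0xFF encodes NaN."""
    if bits == 0xFF:
        return float("nan")
    return 2.0 ** (bits - 127)


# Largest E2M1 and E4M3 magnitudes, using the biases from the glossary.
fp4_max = decode_element(0b0111, exp_bits=2, man_bits=1, bias=1)   # 6.0
fp8_max = decode_element(0x7E, exp_bits=4, man_bits=3, bias=7)     # 448.0

# An element's real value is its decoded value times the block's shared scale.
scaled = fp4_max * decode_scale_ue8m0(128)                         # 12.0
```

Every element in a block of k = 32 would be multiplied by the same decoded scale, which is why only one scale byte is stored per block.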
Numerical Logic & Arithmetic
BM (Block Max): The element with the largest magnitude in a block, whose exponent determines the shared scale.
NBM (Non-Block Max): Any element in a block that is not the Block Max.
MX+: An extension to OCP MX that repurposes the exponent bits of the Block Max element for additional mantissa precision.
MX++: A further extension allowing Non-Block Max elements to use a finer quantization grid via a shared exponent offset (Decoupled Shared Scaling).
SEZ (Shared Exponent Zero): A flushing rule where the entire block is treated as zero if the shared scale is zero.
RNE (Round-to-Nearest-Ties-to-Even): A rounding mode that rounds to the nearest representable value; a value exactly halfway between two candidates goes to the one whose least significant bit is 0.
TRN (Truncate): A rounding mode that rounds towards zero.
CEL (Ceil): A rounding mode that rounds towards positive infinity.
FLR (Floor): A rounding mode that rounds towards negative infinity.
SAT (Saturation): An overflow handling method that clamps out-of-range values to the maximum or minimum representable values.
WRAP (Wrapping): An overflow handling method that uses modulo arithmetic (ignoring overflow bits).
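The four rounding modes and the two overflow policies above differ only in how they treat discarded low bits and out-of-range results. The sketch below illustrates them on signed integers. It assumes a two's-complement range for SAT and WRAP, which may differ from the range the hardware actually uses, and the helper names are hypothetical.

```python
def round_shift(value: int, shift: int, mode: str) -> int:
    """Divide a signed integer by 2**shift, rounding per the named mode."""
    if shift <= 0:
        return value << -shift
    floor_q = value >> shift            # Python's >> floors for negative ints
    rem = value - (floor_q << shift)    # remainder in [0, 2**shift)
    if rem == 0 or mode == "FLR":
        return floor_q                  # exact, or round toward -infinity
    if mode == "CEL":
        return floor_q + 1              # round toward +infinity
    if mode == "TRN":
        # Toward zero: same as floor for positives, ceil for negatives.
        return floor_q + 1 if value < 0 else floor_q
    if mode == "RNE":
        half = 1 << (shift - 1)
        # Round up above the midpoint, or on a tie when floor_q is odd.
        if rem > half or (rem == half and (floor_q & 1)):
            return floor_q + 1
        return floor_q
    raise ValueError(f"unknown rounding mode: {mode}")


def saturate(value: int, bits: int) -> int:
    """SAT: clamp to the signed two's-complement range of `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def wrap(value: int, bits: int) -> int:
    """WRAP: keep the low `bits` bits and reinterpret them as signed."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


# -5 / 2 = -2.5 under each mode
results = {m: round_shift(-5, 1, m) for m in ("RNE", "TRN", "CEL", "FLR")}
# {'RNE': -2, 'TRN': -2, 'CEL': -2, 'FLR': -3}
```

For an 8-bit result, `saturate(200, 8)` gives 127 while `wrap(200, 8)` gives -56, which shows why saturation is usually the safer default for accumulations.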
LNS (Logarithmic Number System)
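LNS Multiplier: A multiplier that converts its operands to the logarithmic domain, adds the logarithms, and converts the sum back to a linear value.
Mitchell’s Approximation: The linear approximation \(\log_2(1+M) \approx M\) for \(0 \le M < 1\), which lets the mantissa bits stand in for the fractional part of the logarithm and enables low-area multiplier designs.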
Precise LNS: An LNS implementation that uses a lookup table (LUT) to correct the Mitchell approximation error.
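To show how the entries above fit together, here is a minimal Python sketch of an unsigned integer multiplier built on Mitchell's approximation. It is an illustration of the general technique rather than the project's RTL: the function names and the choice of 8 fractional bits are assumptions, and the LUT correction used by Precise LNS is not modelled.

```python
def mitchell_log2(x: int, frac_bits: int = 8) -> int:
    """Fixed-point log2(x) for x >= 1, using log2(1 + M) ~= M."""
    k = x.bit_length() - 1          # integer part: position of the leading one
    m = x - (1 << k)                # bits below the leading one
    # Express M = m / 2**k with frac_bits fractional bits.
    frac = m >> (k - frac_bits) if k >= frac_bits else m << (frac_bits - k)
    return (k << frac_bits) | frac


def mitchell_antilog2(l: int, frac_bits: int = 8) -> int:
    """Inverse step: 2**(k + f) ~= 2**k * (1 + f)."""
    k = l >> frac_bits
    frac = l & ((1 << frac_bits) - 1)
    return (((1 << frac_bits) + frac) << k) >> frac_bits


def mitchell_multiply(a: int, b: int, frac_bits: int = 8) -> int:
    """Approximate a * b by adding approximate logarithms."""
    if a == 0 or b == 0:
        return 0                    # log2(0) is undefined, handle separately
    total = mitchell_log2(a, frac_bits) + mitchell_log2(b, frac_bits)
    return mitchell_antilog2(total, frac_bits)


exact_case = mitchell_multiply(4, 8)    # 32: powers of two are exact
worst_case = mitchell_multiply(3, 3)    # 8 instead of 9
```

The approximation never overestimates the product, and the 3 × 3 case sits at its worst-case relative error of about 11%, which is the error a lookup-table correction is meant to reduce.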
Bit-Serial & Tiny-Serial
Tiny-Serial: A bit-serial configuration of the OCP MX MAC unit designed for extreme area efficiency.
SERIAL_K_FACTOR: The scaling factor for internal clock cycles in the bit-serial implementation (e.g., K=8).
Stretched Protocol: A protocol where each 8-bit data load/unload phase is extended by \(K\) internal cycles to allow bit-serial processing.
Strobe: A control signal that marks the start of a multi-cycle bit-serial operation (e.g., the first bit of a byte).
Protocol & Hardware Architecture
Streaming MAC: A Multiply-Accumulate unit that processes elements sequentially over a timed protocol.
Fast Start (Scale Compression): A protocol optimization that allows the reuse of previously loaded scales and formats to increase throughput.
Vector Packing: An optimization for 4-bit formats (like MXFP4) where two elements are packed into a single 8-bit input cycle.
Input Buffering: A 16-entry FIFO that allows burst-loading packed 4-bit data in 16 cycles while processing internally over 32 cycles.
Aligner: Hardware logic that shifts the product of a multiplication to align its binary point with the accumulator.
Accumulator: A high-precision register (24-bit or 32-bit) that stores the running sum of products.
FSM (Finite State Machine): The control logic that manages the operational sequence (41 logical cycles).
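The Vector Packing and Input Buffering entries above imply that a 32-element block of 4-bit codes fits in 16 input bytes. A minimal sketch of that packing follows. The nibble order (first element in the low nibble) and the helper names are assumptions made for illustration and may not match the actual interface.

```python
def pack_fp4(codes: list[int]) -> bytes:
    """Pack 4-bit codes two per byte; the first of each pair goes in the low nibble (assumed)."""
    if len(codes) % 2:
        raise ValueError("need an even number of 4-bit codes")
    return bytes(
        (codes[i] & 0xF) | ((codes[i + 1] & 0xF) << 4)
        for i in range(0, len(codes), 2)
    )


def unpack_fp4(data: bytes) -> list[int]:
    """Inverse of pack_fp4: emit the low nibble, then the high nibble, of each byte."""
    out = []
    for byte in data:
        out.append(byte & 0xF)
        out.append(byte >> 4)
    return out


block = [i % 16 for i in range(32)]     # one block of k = 32 codes
packed = pack_fp4(block)                # 16 bytes, one per input cycle
restored = unpack_fp4(packed)
```

With this layout, 16 load cycles deliver all 32 elements, matching the 16-entry buffer depth described above.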
Tooling & Technology
Tiny Tapeout: An educational project that facilitates low-cost ASIC manufacturing.
IHP SG13G2: A 130nm BiCMOS Open Source Process Design Kit (PDK) from IHP Microelectronics.
PDK (Process Design Kit): A set of files used to describe a semiconductor process for EDA tools.
GDS (Graphic Design System): The family name of GDSII, the standard file format for exchanging integrated circuit layout data.
Cocotb: A coroutine-based cosimulation framework for writing VHDL and Verilog testbenches in Python.
Icarus Verilog: An open-source Verilog simulation and synthesis tool.
Yosys: An open-source framework for Verilog RTL synthesis.
RISC-V & SERV Integration
SERV: An award-winning bit-serial RISC-V core.
OCP-MX-V: A proposed custom RISC-V ISA extension (opcode 0x0b) for OCP MX operations.
Variant A (Extension Interface): A coprocessor integration method using SERV’s standard extension interface.
Variant B (Internal Snooping): A tightly coupled integration method that snoops the 1-bit register file streams directly.
ZvfofpXmin: A proposed RISC-V extension name for minimal vector floating-point support.
vmxfmt: A custom CSR (Control and Status Register) proposed to configure OCP MX formats.
vxsat: The standard RISC-V Vector Fixed-Point Saturation Flag.
frm: The standard RISC-V floating-point rounding mode field, held in the fcsr register.
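As a rough illustration of where opcode 0x0b sits in an instruction word, the sketch below packs a standard RISC-V R-type encoding. Whether OCP-MX-V actually uses the R-type layout is an assumption, and the funct3/funct7 values shown are placeholders, not encodings defined by the proposal.

```python
OPCODE_CUSTOM_0 = 0x0B  # the opcode named in the OCP-MX-V entry


def encode_rtype(opcode: int, rd: int, funct3: int, rs1: int, rs2: int, funct7: int) -> int:
    """Pack the standard RISC-V R-type fields into a 32-bit instruction word."""
    return (
        ((funct7 & 0x7F) << 25)
        | ((rs2 & 0x1F) << 20)
        | ((rs1 & 0x1F) << 15)
        | ((funct3 & 0x7) << 12)
        | ((rd & 0x1F) << 7)
        | (opcode & 0x7F)
    )


# Placeholder funct3/funct7 values of 0; x1 = op(x2, x3)
word = encode_rtype(OPCODE_CUSTOM_0, rd=1, funct3=0, rs1=2, rs2=3, funct7=0)
```

A decoder would recognise such an instruction by checking `word & 0x7F == 0x0B` before handing the operands to the MAC unit.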