Debug & Probing
Debug Capabilities for Tiny Tapeout Tapeout
This document proposes “best practices” debug capabilities for the OCP MXFP8 Streaming MAC Unit, specifically tailored for its first silicon tapeout on Tiny Tapeout.
1. Real-Time Observability (The “Logic Analyzer” Mode)
Since the OCP protocol leaves uo_out unused (driven to 0x00) during the IDLE, LOAD_SCALE, and STREAM phases (Cycles 0–36), we can repurpose these pins for real-time internal signal probing.
Configuration
A “Debug Instruction” is sampled in Cycle 0 (STATE_IDLE):
- ui_in[6]: Enable Debug Mode (1 = Active, 0 = Normal)
- uio_in[3:0]: Probe Selector (determines which signal is muxed to uo_out)
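A test-side helper can build the Cycle 0 debug instruction from the two fields above. This is a minimal Python sketch; the function and constant names are hypothetical. Every other Cycle 0 input bit is left at 0 because its meaning is outside this section, and ui_in[5] in particular is the loopback trigger, so it stays 0 for a normal debug run.

```python
# Hypothetical testbench helper; bit positions follow the Configuration list:
# ui_in[6] = debug enable, uio_in[3:0] = probe selector.
DEBUG_EN_BIT = 6
PROBE_SEL_MASK = 0x0F


def encode_debug_instruction(enable: bool, probe_sel: int) -> tuple[int, int]:
    """Return the (ui_in, uio_in) values to drive during Cycle 0 (STATE_IDLE).

    Only the debug bits are set; every other bit, including the
    loopback trigger ui_in[5], is left at 0.
    """
    if not 0 <= probe_sel <= PROBE_SEL_MASK:
        raise ValueError("probe selector must fit in uio_in[3:0]")
    ui_in = (1 << DEBUG_EN_BIT) if enable else 0
    uio_in = probe_sel
    return ui_in, uio_in
```

For example, encode_debug_instruction(True, 3) returns (0x40, 0x03).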
Probe Mappings (uo_out during Cycles 0-36)
| Selector | Signal Description | Bit Mapping |
|---|---|---|
| | Default | |
| | FSM State & Timing | |
| | Exception Monitor | |
| | Accumulator [31:24] | Live MSB of the accumulator |
| | Accumulator [23:16] | Live Byte 2 |
| | Accumulator [15:8] | Live Byte 1 |
| | Accumulator [7:0] | Live LSB (Fixed-point fraction) |
| | Multiplier Lane 0 MSB | |
| | Multiplier Lane 0 LSB | |
| | Control Signals | |
| | Multiplier Lane 0 Meta | |
| | Multiplier Lane 1 MSB | |
| | Multiplier Lane 1 LSB | |
| | Multiplier Lane 1 Meta | |
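Because the probe selector is latched only once per block in Cycle 0, a single run exposes just one byte of the 32-bit accumulator. The Python sketch below shows the expected byte for each Accumulator probe and how four captures could be recombined. It assumes the same block can be replayed with identical inputs and that each byte is captured at the same cycle index in every run; the probe labels and function names are hypothetical.

```python
ACC_PROBES = ("acc[31:24]", "acc[23:16]", "acc[15:8]", "acc[7:0]")  # MSB first


def accumulator_probe_bytes(acc: int) -> dict[str, int]:
    """Byte that each Accumulator probe should show for a given 32-bit value."""
    acc &= 0xFFFFFFFF
    return {name: (acc >> shift) & 0xFF for name, shift in zip(ACC_PROBES, (24, 16, 8, 0))}


def reassemble_accumulator(readings: dict[str, int]) -> int:
    """Rebuild the 32-bit accumulator from four single-byte captures.

    Each reading is assumed to come from a separate run with identical
    inputs, captured at the same cycle index, with a different selector.
    """
    value = 0
    for name in ACC_PROBES:  # MSB first, shifting earlier bytes up
        byte = readings[name]
        if not 0 <= byte <= 0xFF:
            raise ValueError(f"{name} reading {byte} is not a byte")
        value = (value << 8) | byte
    return value
```

If the four runs disagree on inputs or capture cycle, the reassembled value mixes bytes from different accumulator states, so the replay assumption matters more than the arithmetic.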
2. Connectivity Loopback
To verify the PCB/Socket connectivity and the TT infrastructure before running complex arithmetic, a transparent loopback mode is provided.
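- Trigger: ui_in[5] is set to 1 in Cycle 0.
- Behavior: The unit enters a persistent “Loopback Mode” that lasts until reset; once enabled, it is sticky across block boundaries.
  - uo_out = ui_in ^ uio_in
  - uio_oe = 8'h00 (all pins remain inputs to avoid combinational loops)

This allows all 16 input pins (ui_in[7:0] and uio_in[7:0]) to be verified through the uo_out port, and it bypasses all FSM logic. Because the 16 inputs are XOR-folded onto 8 outputs, output bit n reflects both ui_in[n] and uio_in[n]; holding one bank constant while toggling the other lets each pin be identified individually.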
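A minimal test-plan sketch in Python, assuming the testbench can drive ui_in and uio_in and read uo_out once Loopback Mode is active; the function names are hypothetical. Walking a single 1 through each bank while the other bank is held at zero makes every expected output bit depend on exactly one input pin, and the leading all-zero vector shows any output bit that reads 1 when nothing is driven high.

```python
def loopback_expected(ui_in: int, uio_in: int) -> int:
    """Expected uo_out while Loopback Mode is active (uo_out = ui_in ^ uio_in)."""
    return (ui_in ^ uio_in) & 0xFF


def walking_one_vectors() -> list[tuple[int, int, int]]:
    """(ui_in, uio_in, expected uo_out) triples covering all 16 input pins."""
    vectors = [(0x00, 0x00, 0x00)]  # all-zero: any output bit reading 1 is stuck high
    for bit in range(8):  # exercise ui_in[bit] with uio_in held at 0
        vectors.append((1 << bit, 0x00, loopback_expected(1 << bit, 0x00)))
    for bit in range(8):  # exercise uio_in[bit] with ui_in held at 0
        vectors.append((0x00, 1 << bit, loopback_expected(0x00, 1 << bit)))
    return vectors


def check_loopback(observed: list[int]) -> list[str]:
    """Compare captured uo_out values (in walking_one_vectors() order) and
    return one message per mismatching vector."""
    vectors = walking_one_vectors()
    if len(observed) != len(vectors):
        raise ValueError(f"expected {len(vectors)} readings, got {len(observed)}")
    errors = []
    for (ui, uio, exp), got in zip(vectors, observed):
        if got != exp:
            errors.append(
                f"ui_in=0x{ui:02x} uio_in=0x{uio:02x}: expected 0x{exp:02x}, got 0x{got:02x}"
            )
    return errors
```

Because Loopback Mode is sticky, the vectors can drive ui_in[5] back to 0 after entry without leaving the mode.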
3. Metadata Echo
In Cycle 35 (Pipeline Flush), if debug_mode is active, uo_out will echo the latched configuration instead of 0x00.
- uo_out[2:0]: format_a
- uo_out[4:3]: round_mode
- uo_out[5]: overflow_wrap
- uo_out[6]: packed_mode
- uo_out[7]: mx_plus_en
This allows verifying that the setup cycles (1 & 2) were correctly sampled by the hardware.
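The echo byte can be checked against the configuration the testbench drove in the setup cycles. The Python sketch below packs and unpacks the five fields using only the bit positions listed above; the names are hypothetical, and the meaning of individual field values (for example, which format_a code selects which element format) is not defined in this section.

```python
# Hypothetical field table: name -> (lsb, width) within the Cycle 35 echo byte.
ECHO_FIELDS = {
    "format_a": (0, 3),       # uo_out[2:0]
    "round_mode": (3, 2),     # uo_out[4:3]
    "overflow_wrap": (5, 1),  # uo_out[5]
    "packed_mode": (6, 1),    # uo_out[6]
    "mx_plus_en": (7, 1),     # uo_out[7]
}


def decode_metadata_echo(uo_out: int) -> dict[str, int]:
    """Split the echoed uo_out byte into its configuration fields."""
    return {
        name: (uo_out >> lsb) & ((1 << width) - 1)
        for name, (lsb, width) in ECHO_FIELDS.items()
    }


def encode_metadata_echo(**fields: int) -> int:
    """Build the byte the hardware should echo; omitted fields default to 0."""
    unknown = set(fields) - set(ECHO_FIELDS)
    if unknown:
        raise ValueError(f"unknown field(s): {sorted(unknown)}")
    byte = 0
    for name, (lsb, width) in ECHO_FIELDS.items():
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bit(s)")
        byte |= value << lsb
    return byte
```

For example, encode_metadata_echo(format_a=5, round_mode=2, overflow_wrap=1, mx_plus_en=1) gives 0xB5, and decode_metadata_echo(0xB5) recovers the same field values.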
4. Implementation Notes
- Area Impact: Approximately 100–150 gates for the debug multiplexers.
- Timing: No impact on the critical (arithmetic) path, since the debug logic only muxes the final uo_out stage, which is already registered or gated.
- Persistence: Once debug_mode is enabled in Cycle 0, it remains active for the entire block operation (until logical_cycle returns to 0).
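The persistence rules in this section (debug settings re-sampled only when logical_cycle == 0, loopback sticky until reset) can be captured in a small Python reference model, for instance as part of a testbench scoreboard. This is a sketch under those stated rules only; the class and method names are hypothetical, and it models the latched control state, not the probe data itself.

```python
from dataclasses import dataclass


@dataclass
class DebugLatches:
    """Hypothetical reference model of the debug control state.

    - debug_mode / probe_sel: captured only when logical_cycle == 0,
      then held for the rest of the block.
    - loopback: set by ui_in[5] in Cycle 0, sticky until reset().
    """
    debug_mode: bool = False
    probe_sel: int = 0
    loopback: bool = False

    def reset(self) -> None:
        self.debug_mode = False
        self.probe_sel = 0
        self.loopback = False

    def clock(self, logical_cycle: int, ui_in: int, uio_in: int) -> None:
        if logical_cycle != 0:
            return  # configuration is held for the remainder of the block
        self.debug_mode = bool((ui_in >> 6) & 1)  # ui_in[6]
        self.probe_sel = uio_in & 0x0F            # uio_in[3:0]
        if (ui_in >> 5) & 1:                      # ui_in[5]
            self.loopback = True                  # only reset() clears this
```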