diff --git a/rtl/video/# fb_ctrl - Module Brief (v0.1).md b/rtl/video/# fb_ctrl - Module Brief (v0.1).md new file mode 100644 index 0000000..e7572bc --- /dev/null +++ b/rtl/video/# fb_ctrl - Module Brief (v0.1).md @@ -0,0 +1,102 @@ +# fb_ctrl - Module Brief (v0.1) + +**Owner:** Jay Merveilleux +**RTL:** `rtl/video/fb_ctrl.sv` + +The `fb_ctrl` module implements the display scanout engine for the video subsystem by autonomously reading pixel data from a framebuffer stored in system memory and delivering it to the video output pipeline in precise synchronization with VGA timing. A framebuffer is a memory-resident image buffer in which each location represents the color of a screen pixel; `fb_ctrl` continuously scans this memory in raster order (left to right, top to bottom) using DMA-style AXI reads driven by the active display timing. The module supports configurable base address and stride, enabling flexible memory layout and efficient, cache-aligned access to frame data. To prevent visual artifacts such as tearing, `fb_ctrl` employs double buffering, displaying one framebuffer while another is updated and performing buffer swaps only at vertical sync boundaries. In addition, the controller expands indexed pixel formats via palette or colormap lookup and raises a VSYNC interrupt to coordinate safe rendering and buffer management with software. + +--- + +## Parameters + +- **H_RES** (default: 640) + Defines the horizontal resolution of the active display area and the number of pixels scanned per line. + +- **V_RES** (default: 480) + Defines the vertical resolution of the active display area and the total number of visible lines per frame. + +- **Pixel format / data width** + Number of bits per pixel fetched from framebuffer memory. The default configuration uses 8-bit indexed color expanded to RGB via palette/COLORMAP logic. 
+ +- **Address width** + Width of framebuffer base address and internal address counters, defining the maximum addressable framebuffer size in system memory. + +- **Stride support** + Programmable line stride allowing each framebuffer row to begin at a configurable byte offset, supporting padded or cache-line-aligned layouts. + +- **AXI burst length / beat size** + Controls AXI read burst sizing during scanout to maximize sustained memory bandwidth and minimize bus overhead. + +- **Double-buffer enable** + Enables front/back framebuffer operation with swaps applied only at VSYNC boundaries. + +- **Scaling mode** + Optional pixel replication used for resolution upscaling (e.g., 320*200 source scaled to 640*480 output). + +--- + +## Interfaces (Ports) + +| Signal | Dir | Width | Description | +|------------------|-----|--------|---------------------------------------------------------------------------| +| `clk_i` | in | 1 | System clock driving framebuffer control logic and DMA request generation | +| `rst_ni` | in | 1 | Active-low reset; clears internal state and disables scanout | +| `enable_i` | in | 1 | Enables framebuffer scanout | +| `fb_base_i` | in | ADDR_W | Base address of active framebuffer in DDR | +| `fb_stride_i` | in | STR_W | Byte stride between successive framebuffer rows | +| `fb_swap_i` | in | 1 | Requests front/back framebuffer swap (applied at VSYNC) | +| `pixel_x_i` | in | X_W | Horizontal pixel coordinate from VGA timing | +| `pixel_y_i` | in | Y_W | Vertical pixel coordinate from VGA timing | +| `active_video_i` | in | 1 | High during visible display region | +| `pixel_index_o` | out | PIX_W | Indexed pixel fetched from framebuffer | +| `pixel_rgb_o` | out | RGB_W | Expanded RGB pixel output | +| `axi_ar_*` | out | - | AXI read address channel | +| `axi_r_*` | in | - | AXI read data channel | +| `vsync_irq_o` | out | 1 | VSYNC interrupt signaling frame boundary | + +--- + +## Reset / Initialization + +The `fb_ctrl` module uses an active-low 
reset (`rst_ni`) to return all internal state to a known idle condition. When reset is asserted, scanout is disabled, internal registers are cleared, and no AXI memory transactions are issued. Pixel output and VSYNC interrupt generation are suppressed during reset. After reset is deasserted, software programs framebuffer base address, stride, and buffer configuration before enabling scanout. Normal operation begins once enabled, with any buffer swap requests applied on the next VSYNC boundary. + +--- + +## Behavior & Timing + +The `fb_ctrl` module operates synchronously on the system clock (`clk_i`) as a continuously running scanout engine. When enabled, it autonomously issues AXI read transactions to fetch pixel data from the active framebuffer in raster order, synchronized to VGA timing. Framebuffer addresses advance horizontally across each line and jump by the programmed stride at line boundaries. Memory fetches are gated during blanking intervals using `active_video_i`. Double-buffer swaps are latched during operation and applied atomically at VSYNC, where an optional interrupt signals frame completion. + +--- + +## Programming Model + +The `fb_ctrl` module is configured through memory-mapped control registers defined in `specs/registers/video.yaml`. Software programs framebuffer base address, stride, pixel format, palette configuration, and optional front/back buffers before enabling scanout. Once enabled, hardware autonomously performs pixel fetch, format expansion, and display synchronization. A VSYNC interrupt allows software to safely update frame data or request buffer swaps, which are applied only at VSYNC. + +--- + +## Errors / IRQs + +The `fb_ctrl` module does not implement internal error detection for invalid configuration parameters or memory access failures. It assumes valid framebuffer addresses and reliable AXI read responses. No explicit error flags are generated for out-of-bounds access or underruns. 
An optional VSYNC interrupt is generated at the end of each frame and cleared via control/status registers as defined in `specs/registers/video.yaml`. + +--- + +## Performance Targets + +- Sustains continuous scanout at one pixel per pixel clock +- Operates at standard video pixel clocks (e.g., 25-40+ MHz for VGA modes) +- AXI bandwidth provisioned for worst-case resolution and pixel format +- Bounded, deterministic pixel latency through fixed pipeline depth +- Tear-free full-frame updates via double buffering +- No jitter introduced into active display timing + +--- + +## Dependencies + +Depends on system clock (`clk_i`), reset (`rst_ni`), AXI memory fabric, and backing DDR. Requires VGA timing signals from `vga_timing.sv`. Software must program valid framebuffer parameters via the registers defined in `specs/registers/video.yaml`. The AXI interconnect arbitrates memory access against other masters and must provide sufficient bandwidth for worst-case scanout. + +--- + +## Verification Links + +Verified using directed simulation testbenches validating raster-order scanout, stride handling, blanking behavior, and VSYNC-synchronized buffer swaps. System-level video simulations and test applications validate sustained operation under memory contention. AXI memory models observe read access patterns and burst alignment. Known limitations include reliance on software for bounds checking and the absence of formal verification under extreme contention. diff --git a/rtl/video/VGA_timing_op.md b/rtl/video/VGA_timing_op.md new file mode 100644 index 0000000..b169438 --- /dev/null +++ b/rtl/video/VGA_timing_op.md @@ -0,0 +1,60 @@ +The purpose of the VGA timing module is to generate the horizontal and vertical synchronization pulses and pixel coordinate signals required for standard VGA resolutions (e.g., 640×480 @ 60 Hz). It defines when each pixel should be drawn, when blanking intervals occur, and when sync pulses are active, essentially acting as the heartbeat of the display pipeline. 
Other video blocks, such as the framebuffer controller or DAC driver, use these timing signals to know when to fetch and output video data. + +Parameters + +| Name | Default | Description | +| ----------| ------- | --------------------------------------------| +| H_RES | 640 | Active horizontal pixels per line | +| V_RES | 480 | Active vertical lines per frame | +| H_FP | 16 | Horizontal front porch (pixels) | +| H_SYNC | 96 | Horizontal sync pulse width (pixels) | +| H_BP | 48 | Horizontal back porch (pixels) | +| V_FP | 10 | Vertical front porch (lines) | +| V_SYNC | 2 | Vertical sync pulse width (lines) | +| V_BP | 2 | Vertical back porch (lines) | +| PIXEL_CLK | 25*10^6 | Pixel clock frequency in Hz for 640×480 @ 60 Hz | + +Interfaces (Ports) + +| Signal | Dir | Width | Description | +| ---- | ---- | ---- |-------------------------------------------------------------------------------------------------------------| +| clk | Input | 1 | Main pixel clock. Ensures timing logic and display pipeline stay synchronized | +| reset | Input | 1 | Active-low synchronous reset | +| hsync | Output | 1 | Horizontal sync pulse. Signals the end of a line (start of a new scanline) | +| vsync | Output | 1 | Vertical sync pulse. 
Signals the end of a frame (start of new refresh) | +| x | Output | 10 | Horizontal pixel counter (0-639 during active display) | +| y | Output | 10 | Vertical line counter (0-479 during active display) | +| active_video | Output | 1 | High during visible display time; low during blanking intervals (used to gate pixel output or blank screen) | + +Reset/Initialization + +- On reset (reset = 0), all internal counters (x,y) reset to zero and both sync outputs (hsync, vsync) are deasserted +- The module begins normal operation as soon as reset is released and a valid clk is present +- No external configuration sequence is required; timing parameters are static or parameterized at synthesis + +Behavior and Timing + +- The module implements two nested counters: + - The horizontal counter (x) increments every clock cycle + - When x reaches the total pixels per line, it resets to zero and increments the vertical counter (y) +- hsync is asserted low for H_SYNC cycles after the active + front porch interval +- vsync is asserted low for V_SYNC lines after the active + front porch period +- The signal active_video is high only when both x and y are within the active display area +- The structure targets a nominal 60 Hz refresh at 640×480 with a 25 MHz pixel clock + +Errors / IRQs + +- This module does not generate interrupts or error signals +- It operates continuously as long as a valid clock is provided +- Any display synchronization or VSYNC interrupt is usually handled by the framebuffer controller + +Dependencies + +- Clock: Requires a stable pixel clock (typically 25 MHz). +- Reset: Synchronous, active-low (reset). +- Upstream IP: Clock generation block (PLL or divider). +- Downstream IP: Framebuffer controller or video DAC/encoder that consumes timing signals. + +Summary + +- The VGA timing generator defines the temporal structure of a video frame by driving sync pulses, counters, and valid video windows. 
It forms the foundation for raster-scan display logic and provides synchronization for all downstream video pipeline modules diff --git a/rtl/video/audio_dma_one_pager.md b/rtl/video/audio_dma_one_pager.md new file mode 100644 index 0000000..f3763e2 --- /dev/null +++ b/rtl/video/audio_dma_one_pager.md @@ -0,0 +1,94 @@ +# audio_dma — Module Brief (v0.1) + +**Owner:** Jay Merveilleux +**RTL:** `rtl/audio/audio_dma.sv` + +The `audio_dma` module implements the streaming direct memory access (DMA) engine for the GamingCPU audio subsystem. It autonomously transfers audio sample data from system memory into a local buffering interface that feeds downstream audio output blocks such as `i2s_tx` or `pwm_audio`. Designed around a ring-buffer model, `audio_dma` allows software to continuously produce audio samples into memory while hardware consumes them at a deterministic rate, decoupling real-time audio playback from CPU scheduling and instruction execution. By issuing burst-based AXI read transactions, the module sustains audio throughput with minimal bus overhead and predictable latency. + +--- + +## Parameters + +- **Address width** + Width of the AXI address bus and internal address counters, defining the maximum addressable audio buffer size in system memory. + +- **Data width** + Width of AXI read data beats, typically aligned to the system memory and interconnect configuration. + +- **Ring buffer size** + Defines the total size of the circular audio buffer in bytes or samples. + +- **Burst length** + Controls the maximum AXI read burst size used when fetching audio data, balancing latency and bus efficiency. + +- **FIFO depth** + Depth of internal buffering between AXI read responses and the downstream audio sink. 
+ +--- + +## Interfaces (Ports) + +| Signal | Dir | Width | Description | +|-------------------|-----|-------|----------------------------------------------------------------------| +| `clk_i` | in | 1 | System clock driving DMA control logic | +| `rst_ni` | in | 1 | Active-low reset; clears internal state and halts DMA operation | +| `enable_i` | in | 1 | Enables audio DMA operation | +| `buf_base_i` | in | ADDR | Base address of audio ring buffer in system memory | +| `buf_size_i` | in | SIZE | Total size of ring buffer | +| `rd_ptr_i` | in | ADDR | Read pointer supplied by software or internal state | +| `axi_ar_*` | out | — | AXI read address channel | +| `axi_r_*` | in | — | AXI read data channel | +| `sample_*` | out | — | Audio sample stream output to downstream audio blocks | +| `sample_valid_o` | out | 1 | Indicates valid audio sample data | +| `sample_ready_i` | in | 1 | Backpressure from downstream consumer | +| `underrun_o` | out | 1 | Indicates ring buffer underrun condition | +| `irq_o` | out | 1 | Optional interrupt signaling underrun or threshold events | + +--- + +## Reset / Initialization + +The `audio_dma` module uses an active-low reset (`rst_ni`) to return all internal state to a known idle condition. When reset is asserted, all AXI transactions are halted, internal FIFOs are flushed, address counters are cleared, and no audio samples are emitted. Underrun status is cleared, and interrupt outputs are deasserted. After reset deassertion, software programs the ring buffer base address, buffer size, and initial read/write pointers before enabling DMA operation. + +--- + +## Behavior & Timing + +The `audio_dma` module operates synchronously on the system clock (`clk_i`) as a continuously running streaming DMA engine. When enabled, it issues AXI read transactions to fetch audio data from the configured ring buffer in memory. 
Address generation advances sequentially through the buffer and wraps automatically at the end of the configured buffer region, implementing circular addressing semantics. + +Read data returned over the AXI interface is queued into an internal FIFO, decoupling memory access timing from the fixed consumption rate of the downstream audio output block. Data is presented to the consumer using a valid/ready handshake. AXI read bursts are sized to maximize sustained bandwidth while minimizing arbitration overhead on the shared interconnect. + +--- + +## Programming Model + +The `audio_dma` module is configured through memory-mapped control registers defined in the audio/DMA register specification. Software initializes the audio ring buffer in memory and programs buffer base address, buffer size, and control flags before enabling DMA operation. During playback, software advances the producer write pointer independently, while `audio_dma` advances the consumer read pointer autonomously. Status registers allow software to monitor buffer occupancy and detect underrun conditions. + +--- + +## Errors / IRQs + +The primary error condition detected by `audio_dma` is a buffer underrun, which occurs when the DMA engine attempts to fetch audio data beyond the available produced samples. In this condition, the module asserts an underrun status flag and may generate an interrupt to notify software. Depending on configuration, the DMA engine may stall, continue issuing reads that return invalid data, or output zero-valued samples downstream until the buffer is refilled. Recovery is software-driven and involves replenishing the ring buffer and clearing the underrun status. 
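The producer/consumer pointer behavior and the underrun condition described above can be sketched as a small software reference model. This is only an illustration under stated assumptions, not the RTL: the class `RingBufferModel`, its field names, the choice to stall on underrun (one of the three behaviors the brief allows), and the convention of keeping one byte free so that equal pointers always mean "empty" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RingBufferModel:
    """Hypothetical behavioral model of the audio ring buffer (byte offsets only)."""

    size: int               # total buffer size in bytes (assumed register value)
    rd_ptr: int = 0         # consumer offset, advanced by the DMA engine
    wr_ptr: int = 0         # producer offset, advanced by software
    underrun: bool = False  # sticky status flag, cleared by software

    def available(self) -> int:
        """Bytes produced by software but not yet consumed by the DMA."""
        return (self.wr_ptr - self.rd_ptr) % self.size

    def produce(self, nbytes: int) -> None:
        """Software writes nbytes and advances the write pointer with wraparound."""
        # Assumption: one byte stays free so wr_ptr == rd_ptr always means empty.
        if nbytes > self.size - 1 - self.available():
            raise ValueError("producer would overwrite unread samples")
        self.wr_ptr = (self.wr_ptr + nbytes) % self.size

    def consume(self, nbytes: int) -> int:
        """DMA requests nbytes; returns the bytes actually fetched."""
        granted = min(nbytes, self.available())
        if granted < nbytes:
            # Assumed policy: stall at the produced boundary and flag underrun.
            self.underrun = True
        self.rd_ptr = (self.rd_ptr + granted) % self.size
        return granted

    def clear_underrun(self) -> None:
        """Software recovery step after refilling the buffer."""
        self.underrun = False
```

A model like this could serve as a scoreboard in a testbench: drive the same produce and consume sequence into the model and the DUT, then compare pointer values and the underrun flag after each step.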
+ +--- + +## Performance Targets + +- Sustains continuous audio streaming at the configured sample rate +- Supports burst-based AXI reads aligned to cache-line boundaries +- Maintains deterministic sample delivery to downstream audio blocks +- Tolerates short-term memory latency via internal buffering +- No audible glitches during steady-state operation +- Graceful handling of underrun conditions + +--- + +## Dependencies + +Depends on the system clock (`clk_i`), reset (`rst_ni`), AXI memory fabric, and backing system memory. Requires a downstream audio consumer such as `i2s_tx` or `pwm_audio`. Software must configure valid ring buffer parameters via the audio/DMA register specification. The AXI interconnect must provide sufficient bandwidth to sustain real-time audio fetches under worst-case contention. + +--- + +## Verification Links + +Verified using directed simulation testbenches validating ring buffer wraparound behavior, AXI burst alignment, and backpressure handling. System-level audio simulations confirm sustained playback under memory contention and proper underrun detection. AXI memory models observe read access patterns and FIFO behavior. Known limitations include reliance on software for correct buffer sizing and the absence of formal verification for extreme arbitration scenarios. 
diff --git a/rtl/video/fb_ctrl.sv b/rtl/video/fb_ctrl.sv index e69de29..9a616e5 100644 --- a/rtl/video/fb_ctrl.sv +++ b/rtl/video/fb_ctrl.sv @@ -0,0 +1,204 @@ +module fb_ctrl_basic #( + parameter int unsigned H_RES = 640, + parameter int unsigned V_RES = 480, + + // Source framebuffer format (GamingCPU uses 320x200 8bpp; start here even if VGA is 640x480) + parameter int unsigned FB_W = 320, + parameter int unsigned FB_H = 200, + parameter int unsigned INDEX_W = 8, // 8bpp index + parameter int unsigned RGB_W = 24, // 8:8:8 output for now + + // Addressing + parameter int unsigned ADDR_W = 32, + parameter int unsigned STRIDE_W = 16 // stride in BYTES +) ( + input logic clk_i, + input logic rst_ni, + input logic enable_i, + + // "Registers" (programming model) + input logic [ADDR_W-1:0] fb_base_i, // base address (byte address, even for BRAM emulation) + input logic [STRIDE_W-1:0] fb_stride_i, // bytes per row (>= FB_W) + + // From timing generator + input logic [$clog2(H_RES)-1:0] pixel_x_i, + input logic [$clog2(V_RES)-1:0] pixel_y_i, + input logic active_video_i, + input logic vsync_i, + + // For double buffering + input logic swap_req_i, //signal from CPU to request buffer swap (can be a simple pulse) + output logic swap_done_o, //signal to CPU to indicate swap is done (can be a pulse on the next vsync after swap) + + // Frame sync interrupt (to CPU / interrupt controller) + output logic vsync_irq_o, + + // Output pixel stream + output logic [INDEX_W-1:0] pixel_index_o, + output logic [RGB_W-1:0] pixel_rgb_o, + output logic pixel_valid_o + +); + +//enable +logic enable_q; +logic enable_rise; + +always_ff @(posedge clk_i or negedge rst_ni) begin + if (!rst_ni) begin + enable_q <= 1'b0; + end else begin + enable_q <= enable_i; + end +end + +assign enable_rise = enable_i && !enable_q; //detect rising edge of enable to reset state if needed + // -------------------------------------------------------------------------- + //Coordinate mapping (OUTPUT space -> SOURCE 
framebuffer space) + // -------------------------------------------------------------------------- + + logic [$clog2(FB_W)-1:0] src_x; + logic [$clog2(FB_H)-1:0] src_y; + logic src_in_range; + + always_comb begin + + src_x = pixel_x_i[$clog2(H_RES)-1:1]; //scale the x down by 2 640 -> 320 + src_y = (pixel_y_i * FB_H) / V_RES; //scale the y down 480 -> 200 + src_in_range = (src_x < FB_W) && (src_y < FB_H); // a simple bounds check + + end + + + + +// -------------------------------------------------------------------------- +// Memory backend (BRAM emulation for now) +// We'll model framebuffer as a simple array indexed by "byte offset from base" +// -------------------------------------------------------------------------- +localparam int unsigned FB_BYTES = FB_W * FB_H; // 1 byte per pixel +localparam int unsigned FB_TOTAL_BYTES = 2 * FB_BYTES; //total size for double buffering + +logic [$clog2(FB_TOTAL_BYTES)-1:0] bram_addr; +logic [INDEX_W-1:0] bram_rdata_q; + +//replace this line with actual BRAM instantiation in the future +logic [INDEX_W-1:0] framebuffer [0:FB_TOTAL_BYTES-1]; + +//page offsets +localparam int unsigned FB_PAGE_BYTES = FB_BYTES; +logic [$clog2(FB_TOTAL_BYTES)-1:0] front_page_off_q, back_page_off_q; + + +//read address uses front page +always_comb begin + bram_addr = front_page_off_q + (src_y * FB_W) + src_x; +end + +//1 cycle registered read +always_ff @(posedge clk_i or negedge rst_ni) begin + if (!rst_ni) begin + bram_rdata_q <= '0; + end else begin + bram_rdata_q <= framebuffer[bram_addr]; + end +end + +// -------------------------------------------------------------------------- +//Valid alignment (because memory read has latency) +// -------------------------------------------------------------------------- + +logic valid_q; + +always_ff @(posedge clk_i or negedge rst_ni) begin + if (!rst_ni) begin + valid_q <= 1'b0; + end else begin + valid_q <= enable_i && src_in_range && active_video_i; + end +end + +always_comb begin + pixel_index_o = 
bram_rdata_q; + pixel_valid_o = valid_q; +end + +// -------------------------------------------------------------------------- +//Palette lookup +// -------------------------------------------------------------------------- + +logic [RGB_W-1:0] palette [0:(1<= H_ACTIVE_START) && (pos_in_row < H_ACTIVE_END) && + (pos_in_col >= V_ACTIVE_START) && (pos_in_col < V_ACTIVE_END)) begin + x <= pos_in_row - H_ACTIVE_START; + y <= pos_in_col - V_ACTIVE_START; + end + end + else begin //not within valid data window + x <= 0; + y <= 0; + end + end + /* VGA STUFF */ +endmodule