Skip to content

Bug Report: Memory Safety Issues in accl.cpp #214

@C-H-A-R-L-O-T-T-E-AI-Consulting-Corp

Description

Bug Report: Memory Safety Issues in accl.cpp

Summary

Found several memory safety and exception handling issues in driver/xrt/src/accl.cpp that could cause memory leaks, undefined behavior, and unhelpful error messages.


Issue 1: Exception Thrown as Pointer (Memory Leak)

Location

driver/xrt/src/accl.cpp, Line 1118 (in configure_arithmetic())

Problem

Exception is thrown using new, creating a pointer to an exception object instead of throwing by value:

throw new std::runtime_error("Communicators unconfigured, please call "
                             "configure_communicator() first");

Impact

  • Memory Leak: The allocated exception object is never freed
  • Incorrect Catch Behavior: Standard catch (const std::runtime_error& e) blocks won't catch this because it's a pointer, not an object
  • C++ Best Practice Violation: Exceptions should always be thrown by value and caught by reference

Fix

Remove the new keyword:

throw std::runtime_error("Communicators unconfigured, please call "
                         "configure_communicator() first");

Issue 2: Uninitialized Pointer and Missing Bounds Validation

Location

driver/xrt/src/accl.cpp, Lines 1174-1196 (in setup_rendezvous_spare_buffers())

Problem 1: Uninitialized Pointer

The buffer pointer is declared but not initialized:

Buffer<int8_t> *buf;
if (sim_mode) {
    buf = new SimBuffer(...);
} else if (cclo->get_device_type() == CCLO::xrt_device) {
    buf = new XRTBuffer<int8_t>(...);
} else if (cclo->get_device_type() == CCLO::coyote_device) {
    buf = new CoyoteBuffer<int8_t>(...);
}
buf->sync_to_device();  // Potential null/garbage pointer dereference!

If none of the conditions match (e.g., a new device type is added but not handled here), buf contains garbage/uninitialized memory, leading to undefined behavior when sync_to_device() is called.

Problem 2: Missing Bounds Validation

The code accesses utility_spares.at(0), .at(1), and .at(2) without validating the vector size:

cclo->write(CCLO_ADDR::SPARE1_OFFSET, utility_spares.at(0)->address() & 0xffffffff);
cclo->write(CCLO_ADDR::SPARE2_OFFSET, utility_spares.at(1)->address() & 0xffffffff);
cclo->write(CCLO_ADDR::SPARE3_OFFSET, utility_spares.at(2)->address() & 0xffffffff);

While the loop above should create 3 buffers, if any allocation fails or the logic changes, this will throw an unhelpful std::out_of_range exception.

Problem 3: Magic Number

The loop uses hardcoded 3 without explanation:

for(int i=0; i<3; i++){

Impact

  • Undefined Behavior: Dereferencing uninitialized pointer if device type doesn't match
  • Poor Error Messages: std::out_of_range gives no context about what failed
  • Maintenance Risk: Magic number makes code harder to understand and modify

Fix

void ACCL::setup_rendezvous_spare_buffers(addr_t rndzv_spare_buf_size, const std::vector<int> &devicemem){
  constexpr int REQUIRED_SPARE_BUFFERS = 3;
  max_rndzv_msg_size = rndzv_spare_buf_size;

  for(int i=0; i<REQUIRED_SPARE_BUFFERS; i++){
    Buffer<int8_t> *buf = nullptr;  // Initialize to nullptr
    if (sim_mode) {
      buf = new SimBuffer(new int8_t[max_rndzv_msg_size](), max_rndzv_msg_size, dataType::int8,
                        static_cast<SimDevice *>(cclo)->get_context());
    } else if(cclo->get_device_type() == CCLO::xrt_device ){
      buf = new XRTBuffer<int8_t>(max_rndzv_msg_size, dataType::int8,
                        *(static_cast<XRTDevice *>(cclo)->get_device()), devicemem[i % devicemem.size()]);
    } else if(cclo->get_device_type() == CCLO::coyote_device){
      buf = new CoyoteBuffer<int8_t>(max_rndzv_msg_size, dataType::int8, static_cast<CoyoteDevice *>(cclo));
    }

    // Validate buffer was created
    if (buf == nullptr) {
      throw std::runtime_error("Failed to create spare buffer: unsupported device type");
    }

    buf->sync_to_device();
    utility_spares.emplace_back(buf);
  }

  // Validate we have enough buffers before accessing
  if (utility_spares.size() < REQUIRED_SPARE_BUFFERS) {
    throw std::runtime_error("Insufficient utility spare buffers allocated: expected " +
                             std::to_string(REQUIRED_SPARE_BUFFERS) + ", got " +
                             std::to_string(utility_spares.size()));
  }

  // Safe to use [] now - bounds validated above
  cclo->write(CCLO_ADDR::SPARE1_OFFSET, utility_spares[0]->address() & 0xffffffff);
  cclo->write(CCLO_ADDR::SPARE1_OFFSET+4, (utility_spares[0]->address() >> 32) & 0xffffffff);
  cclo->write(CCLO_ADDR::SPARE2_OFFSET, utility_spares[1]->address() & 0xffffffff);
  cclo->write(CCLO_ADDR::SPARE2_OFFSET+4, (utility_spares[1]->address() >> 32) & 0xffffffff);
  cclo->write(CCLO_ADDR::SPARE3_OFFSET, utility_spares[2]->address() & 0xffffffff);
  cclo->write(CCLO_ADDR::SPARE3_OFFSET+4, (utility_spares[2]->address() >> 32) & 0xffffffff);
}

Environment

  • ACCL Version: Latest main branch (cloned 2026-01-27)
  • File: driver/xrt/src/accl.cpp
  • Compiler: Any C++17 compliant compiler

Severity

  • Issue 1: Low (memory leak, incorrect exception handling)
  • Issue 2: Medium (potential undefined behavior, poor error messages)

Checklist

  • Identified root cause
  • Provided minimal reproduction context
  • Suggested fix with code

Found during static code analysis of the ACCL codebase.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions