Skip to content

Conversation

@g0nz4I0
Copy link
Member

@g0nz4I0 g0nz4I0 commented Sep 26, 2025

  • Added SEGGER_RTT files to project
  • Added SEGGER_RTT files to CMakeLists.txt
  • Fixes to DataWatchpointTrace module
  • IFDEFs guards for Benchmarking mode

Benchmarking Features

Different types of Events:

  • For measuring time executions of code blocks
  • Measuring time between different executions

This is achieved by a set of macros, which record the current CycleCounter at a given point.

Under the hood, the macros use a string which is used by the frontend to display pretty messages.
To communicate with the host PC it uses SEGGER_RTT protocol, so it needs to have a PC hooked at all times.
It introduces breaking changes by changing the software used to flash, moving away from STLINK to OPENOCD.

It also adds some polling checks to detect overflow and update the global counter.

jorgesg82
jorgesg82 previously approved these changes Oct 25, 2025
@jorgesg82
Copy link
Contributor

It would be also really really nice to have a doc folder hooked in the root one, and a benchmarking.md explaining how this work/how to use it, and link this file from the README

@jorgesg82
Copy link
Contributor

jorgesg82 commented Dec 14, 2025

Well, indeed, there is a wiki...
https://github.com/HyperloopUPV-H8/ST-LIB/wiki

@g0nz4I0
Copy link
Member Author

g0nz4I0 commented Dec 16, 2025

I missed the comment, sorry, maybe I can add it to this wiki?

@g0nz4I0 g0nz4I0 marked this pull request as draft December 16, 2025 15:41
@victor-Lopez25
Copy link

Would it be possible to add in the DWT CPI counter to get info on additional cycles spent on IF stalls or multicycle instructions?
It would be better to get info on the amount of cache misses in a block of code too but it doesn't look like there's any way in STM32H723.
The CPI counter is only 8 bits but you can enable an overflow event.

@g0nz4I0
Copy link
Member Author

g0nz4I0 commented Dec 26, 2025

The DWT peripheral can do a bunch of stuff yes, but I havent been able to make it work with VSCode yet, the event is not a SW event, but rather an internal event that triggers a transfer via SWD, this works fine in STM32CubeIDE, but I havent been able to make it work in VScode or outside the IDE yet, it's a work in progress

I can say however that the cache thing is impossible, some things that are acctually possible are checking cycles spent in interrupts or sleeeping for example

Copilot AI review requested due to automatic review settings December 27, 2025 12:44
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds benchmarking infrastructure to the project by integrating SEGGER RTT (Real Time Transfer) for communication with a host PC and implementing performance measurement capabilities through the ARM Data Watchpoint and Trace (DWT) unit.

Key changes:

  • Integration of SEGGER RTT third-party library for real-time data transfer
  • New benchmarking toolkit with DWT-based cycle counting and RTT-based logging
  • Refactored DataWatchpointTrace class with overflow detection support
  • Build system updates to include RTT sources and headers

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
Src/HALAL/HALAL.cpp Added initialization calls for DataWatchpointTrace and benchmarking, minor formatting fixes
Inc/HALAL/HALAL.hpp Added includes for new benchmarking modules
Inc/HALAL/Benchmarking_toolkit/DataWatchpointTrace/DataWatchpointTrace.hpp Refactored DWT interface with enable/disable methods and overflow detection
Inc/HALAL/Benchmarking_toolkit/Benchmarking.hpp New file implementing benchmarking macros and runtime measurement infrastructure
CMakeLists.txt Added RTT source files and include directories to build system
Middlewares/Third_Party/RTT/* Third-party SEGGER RTT library files (multiple new files)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +86 to +111
template<uint32_t Id>
__attribute__((always_inline)) inline static void __benchmark_start__(){
if constexpr(benchmark_type == BenchmarkType::GDB){
__asm__ volatile("BKPT \n");
} else if constexpr(benchmark_type == BenchmarkType::RUNTIME){
uint32_t cycles = DWT->CYCCNT;
if constexpr( logging_type & _RTT){
timestamp.id = Id;
timestamp.cycles = cycles;
SEGGER_RTT_Write(0,&timestamp,sizeof(TimeStamp));
}
}
}
template<uint32_t Id>
__attribute__((always_inline)) inline static void __benchmark_end__(){
if constexpr(benchmark_type == BenchmarkType::GDB){
__asm__ volatile("BKPT \n");
} else if constexpr(benchmark_type == BenchmarkType::RUNTIME){
uint32_t cycles = DWT->CYCCNT;
if constexpr( logging_type & _RTT){
timestamp.id = Id;
timestamp.cycles = cycles;
SEGGER_RTT_Write(0,&timestamp,sizeof(TimeStamp));
}
}
}
Copy link

Copilot AI Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benchmark_start and benchmark_end functions read DWT->CYCCNT without any mechanism to handle or account for counter overflows. The num_overflows variable is declared in DataWatchpointTrace but never incremented. When the 32-bit counter wraps around, timing measurements will be incorrect. Consider implementing overflow detection and handling.

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overflow logic is implemented on the host side, to make the target measuremetns as lightweight as possible

Comment on lines +34 to 38
static void enable() {
DWT->CTRL |= DWT_CTRL_CYCCNTENA | DWT_CTRL_EXCTRCENA_Msk | DWT_CTRL_EXCEVTENA_Msk; // enables the counter
DWT->FUNCTION0 |= (1 << 7) | 0b1000; // enable comparison
DWT->COMP0 = 0xFFFFFFFF; // detect overflow
}
Copy link

Copilot AI Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enable() function modifies DWT hardware registers but does not verify that CoreDebug->DEMCR has the TRCENA bit set, which is required before accessing DWT registers. This could cause the DWT operations to fail silently. Consider adding CoreDebug->DEMCR |= DEMCR_TRCENA before accessing DWT registers.

Copilot uses AI. Check for mistakes.
DWT->CTRL |= DWT_CTRL_CYCCNTENA; // enables the counter
DWT->CYCCNT = 0;
return DWT->CYCCNT;
static uint32_t num_overflows;
Copy link

Copilot AI Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The static member variable num_overflows is declared but never defined or initialized. This will cause a linker error. Add a definition in the corresponding .cpp file (e.g., uint32_t DataWatchpointTrace::num_overflows = 0;) or make it inline if using C++17 or later.

Suggested change
static uint32_t num_overflows;
inline static uint32_t num_overflows = 0;

Copilot uses AI. Check for mistakes.
Comment on lines 49 to 52
static void reset_cnt() {
CoreDebug->DEMCR |= DEMCR_TRCENA;
DWT->CYCCNT = 0; // reset the counter
DWT->CTRL = 0;
}

static void unlock_dwt() { // unlock the dwt
uint32_t lsr = DWT->LSR;
if ((lsr & DWT_LSR_Present_Msk) != 0) {
if ((lsr & DWT_LSR_Access_Msk) != 0) {
DWT->LAR = DWT_LAR_KEY;
}
}
}
Copy link

Copilot AI Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reset_cnt() private method is declared but has an empty function body, and it no longer performs its intended reset operation. This appears to be dead code that should either be removed or properly implemented to reset the counter as its name suggests.

Copilot uses AI. Check for mistakes.
Comment on lines +79 to +85
Benchmarker(){
DWT->CTRL |= 1;
DWT->CYCCNT = 0;
if constexpr(benchmark_type == BenchmarkType::RUNTIME){
SEGGER_RTT_Init();
}
}
Copy link

Copilot AI Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Benchmarker constructor initializes DWT registers (DWT->CTRL and DWT->CYCCNT) without checking if the trace infrastructure is unlocked or enabled via CoreDebug->DEMCR. This could fail silently on some targets. The initialization should call CoreDebug->DEMCR |= DEMCR_TRCENA first, and potentially unlock DWT if needed.

Copilot uses AI. Check for mistakes.
g0nz4I0 and others added 2 commits December 27, 2025 13:47
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Comment on lines 51 to 61
constexpr BenchmarkType benchmark_type =
#ifdef RUNTIME_BENCHMARK
BenchmarkType::RUNTIME;
static const char BENCHMARK_TYPE_RUNTIME[] __attribute__((used)) = "BENCHMARK_RUNTIME";
#elif defined(GDB_BENCHMARK)
BenchmarkType::GDB;
static const char BENCHMARK_TYPE_GDB[] __attribute__((used)) = "BENCHMARK_GDB";
#else
BenchmarkType::NONE;
static const char BENCHMARK_TYPE_NONE[] __attribute__((used)) = "BENCHMARK_NONE";
#endif

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please define benchmark_type in each scope of the #if clause:

#ifdef RUNTIME_BENCHMARK
    constexpr BenchmarkType benchmark_type = BenchmarkType::RUNTIME;
    static const char BENCHMARK_TYPE_RUNTIME[] __attribute__((used)) = "BENCHMARK_RUNTIME";
#elif defined(GDB_BENCHMARK)
    constexpr BenchmarkType benchmark_type = BenchmarkType::GDB;
    static const char BENCHMARK_TYPE_GDB[] __attribute__((used)) = "BENCHMARK_GDB";
#else 
    constexpr BenchmarkType benchmark_type = BenchmarkType::NONE;
    static const char BENCHMARK_TYPE_NONE[] __attribute__((used)) = "BENCHMARK_NONE";
#endif 

Comment on lines 65 to 75
inline constexpr int logging_type =
#if defined(SD)
_SD |
#else
_NONE |
#endif
#if defined(RUNTIME_BENCHMARK)
_RTT;
#else
_NONE;
#endif

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it looks cool but it doesn't look very clear, it could be better to write it like this:

#if defined(RUNTIME_BENCHMARK)
#define RUNTIME_BENCHMARK_TAG  _RTT
#else
#define RUNTIME_BENCHMARK_TAG  _NONE
#endif

#if defined(SD)
inline constexpr int logging_type = _SD | RUNTIME_BENCHMARK_TAG:
#else
inline constexpr int logging_type = _NONE | RUNTIME_BENCHMARK_TAG;
#endif

or alternatively:

#if defined(RUNTIME_BENCHMARK)
#define RUNTIME_BENCHMARK_TAG  _RTT
#else
#define RUNTIME_BENCHMARK_TAG  _NONE
#endif

#if defined(SD)
#define BENCHMARK_LOGGING_TYPE  _SD | RUNTIME_BENCHMARK_TAG
#else
#define BENCHMARK_LOGGING_TYPE  _NONE | RUNTIME_BENCHMARK_TAG
#endif
inline constexpr int logging_type = BENCHMARK_LOGGING_TYPE:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or...:

#if defined(RUNTIME_BENCHMARK)
#define RUNTIME_BENCHMARK_TAG  _RTT
#else
#define RUNTIME_BENCHMARK_TAG  _NONE
#endif

#if defined(SD)
#define SD_BENCHMARK_TAG  _SD
#else
#define SD_BENCHMARK_TAG  _NONE
#endif
inline constexpr int logging_type =  RUNTIME_BENCHMARK_TAG | SD_BENCHMARK_TAG:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants