-
Notifications
You must be signed in to change notification settings - Fork 0
Benchmarking #526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: development
Are you sure you want to change the base?
Benchmarking #526
Conversation
…orking beforehand, take a look at the old reset_cnt disabling DWT_CTRL_CYCCNTENA by setting DWT->CTRL to 0)
… overflow check in STLIB::update()
…Benchmarking as well
|
It would be also really really nice to have a doc folder hooked in the root one, and a benchmarking.md explaining how this work/how to use it, and link this file from the README |
|
Well, indeed, there is a wiki... |
|
I missed the comment, sorry, maybe I can add it to this wiki? |
|
Would it be possible to add in the DWT CPI counter to get info on additional cycles spent on IF stalls or multicycle instructions? |
|
The DWT peripheral can do a bunch of stuff yes, but I havent been able to make it work with VSCode yet, the event is not a SW event, but rather an internal event that triggers a transfer via SWD, this works fine in STM32CubeIDE, but I havent been able to make it work in VScode or outside the IDE yet, it's a work in progress I can say however that the cache thing is impossible, some things that are acctually possible are checking cycles spent in interrupts or sleeeping for example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds benchmarking infrastructure to the project by integrating SEGGER RTT (Real Time Transfer) for communication with a host PC and implementing performance measurement capabilities through the ARM Data Watchpoint and Trace (DWT) unit.
Key changes:
- Integration of SEGGER RTT third-party library for real-time data transfer
- New benchmarking toolkit with DWT-based cycle counting and RTT-based logging
- Refactored DataWatchpointTrace class with overflow detection support
- Build system updates to include RTT sources and headers
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| Src/HALAL/HALAL.cpp | Added initialization calls for DataWatchpointTrace and benchmarking, minor formatting fixes |
| Inc/HALAL/HALAL.hpp | Added includes for new benchmarking modules |
| Inc/HALAL/Benchmarking_toolkit/DataWatchpointTrace/DataWatchpointTrace.hpp | Refactored DWT interface with enable/disable methods and overflow detection |
| Inc/HALAL/Benchmarking_toolkit/Benchmarking.hpp | New file implementing benchmarking macros and runtime measurement infrastructure |
| CMakeLists.txt | Added RTT source files and include directories to build system |
| Middlewares/Third_Party/RTT/* | Third-party SEGGER RTT library files (multiple new files) |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| template<uint32_t Id> | ||
| __attribute__((always_inline)) inline static void __benchmark_start__(){ | ||
| if constexpr(benchmark_type == BenchmarkType::GDB){ | ||
| __asm__ volatile("BKPT \n"); | ||
| } else if constexpr(benchmark_type == BenchmarkType::RUNTIME){ | ||
| uint32_t cycles = DWT->CYCCNT; | ||
| if constexpr( logging_type & _RTT){ | ||
| timestamp.id = Id; | ||
| timestamp.cycles = cycles; | ||
| SEGGER_RTT_Write(0,×tamp,sizeof(TimeStamp)); | ||
| } | ||
| } | ||
| } | ||
| template<uint32_t Id> | ||
| __attribute__((always_inline)) inline static void __benchmark_end__(){ | ||
| if constexpr(benchmark_type == BenchmarkType::GDB){ | ||
| __asm__ volatile("BKPT \n"); | ||
| } else if constexpr(benchmark_type == BenchmarkType::RUNTIME){ | ||
| uint32_t cycles = DWT->CYCCNT; | ||
| if constexpr( logging_type & _RTT){ | ||
| timestamp.id = Id; | ||
| timestamp.cycles = cycles; | ||
| SEGGER_RTT_Write(0,×tamp,sizeof(TimeStamp)); | ||
| } | ||
| } | ||
| } |
Copilot
AI
Dec 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The benchmark_start and benchmark_end functions read DWT->CYCCNT without any mechanism to handle or account for counter overflows. The num_overflows variable is declared in DataWatchpointTrace but never incremented. When the 32-bit counter wraps around, timing measurements will be incorrect. Consider implementing overflow detection and handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overflow logic is implemented on the host side, to make the target measuremetns as lightweight as possible
| static void enable() { | ||
| DWT->CTRL |= DWT_CTRL_CYCCNTENA | DWT_CTRL_EXCTRCENA_Msk | DWT_CTRL_EXCEVTENA_Msk; // enables the counter | ||
| DWT->FUNCTION0 |= (1 << 7) | 0b1000; // enable comparison | ||
| DWT->COMP0 = 0xFFFFFFFF; // detect overflow | ||
| } |
Copilot
AI
Dec 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The enable() function modifies DWT hardware registers but does not verify that CoreDebug->DEMCR has the TRCENA bit set, which is required before accessing DWT registers. This could cause the DWT operations to fail silently. Consider adding CoreDebug->DEMCR |= DEMCR_TRCENA before accessing DWT registers.
| DWT->CTRL |= DWT_CTRL_CYCCNTENA; // enables the counter | ||
| DWT->CYCCNT = 0; | ||
| return DWT->CYCCNT; | ||
| static uint32_t num_overflows; |
Copilot
AI
Dec 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The static member variable num_overflows is declared but never defined or initialized. This will cause a linker error. Add a definition in the corresponding .cpp file (e.g., uint32_t DataWatchpointTrace::num_overflows = 0;) or make it inline if using C++17 or later.
| static uint32_t num_overflows; | |
| inline static uint32_t num_overflows = 0; |
| static void reset_cnt() { | ||
| CoreDebug->DEMCR |= DEMCR_TRCENA; | ||
| DWT->CYCCNT = 0; // reset the counter | ||
| DWT->CTRL = 0; | ||
| } | ||
|
|
||
| static void unlock_dwt() { // unlock the dwt | ||
| uint32_t lsr = DWT->LSR; | ||
| if ((lsr & DWT_LSR_Present_Msk) != 0) { | ||
| if ((lsr & DWT_LSR_Access_Msk) != 0) { | ||
| DWT->LAR = DWT_LAR_KEY; | ||
| } | ||
| } | ||
| } |
Copilot
AI
Dec 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reset_cnt() private method is declared but has an empty function body, and it no longer performs its intended reset operation. This appears to be dead code that should either be removed or properly implemented to reset the counter as its name suggests.
| Benchmarker(){ | ||
| DWT->CTRL |= 1; | ||
| DWT->CYCCNT = 0; | ||
| if constexpr(benchmark_type == BenchmarkType::RUNTIME){ | ||
| SEGGER_RTT_Init(); | ||
| } | ||
| } |
Copilot
AI
Dec 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Benchmarker constructor initializes DWT registers (DWT->CTRL and DWT->CYCCNT) without checking if the trace infrastructure is unlocked or enabled via CoreDebug->DEMCR. This could fail silently on some targets. The initialization should call CoreDebug->DEMCR |= DEMCR_TRCENA first, and potentially unlock DWT if needed.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| constexpr BenchmarkType benchmark_type = | ||
| #ifdef RUNTIME_BENCHMARK | ||
| BenchmarkType::RUNTIME; | ||
| static const char BENCHMARK_TYPE_RUNTIME[] __attribute__((used)) = "BENCHMARK_RUNTIME"; | ||
| #elif defined(GDB_BENCHMARK) | ||
| BenchmarkType::GDB; | ||
| static const char BENCHMARK_TYPE_GDB[] __attribute__((used)) = "BENCHMARK_GDB"; | ||
| #else | ||
| BenchmarkType::NONE; | ||
| static const char BENCHMARK_TYPE_NONE[] __attribute__((used)) = "BENCHMARK_NONE"; | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please define benchmark_type in each scope of the #if clause:
#ifdef RUNTIME_BENCHMARK
constexpr BenchmarkType benchmark_type = BenchmarkType::RUNTIME;
static const char BENCHMARK_TYPE_RUNTIME[] __attribute__((used)) = "BENCHMARK_RUNTIME";
#elif defined(GDB_BENCHMARK)
constexpr BenchmarkType benchmark_type = BenchmarkType::GDB;
static const char BENCHMARK_TYPE_GDB[] __attribute__((used)) = "BENCHMARK_GDB";
#else
constexpr BenchmarkType benchmark_type = BenchmarkType::NONE;
static const char BENCHMARK_TYPE_NONE[] __attribute__((used)) = "BENCHMARK_NONE";
#endif | inline constexpr int logging_type = | ||
| #if defined(SD) | ||
| _SD | | ||
| #else | ||
| _NONE | | ||
| #endif | ||
| #if defined(RUNTIME_BENCHMARK) | ||
| _RTT; | ||
| #else | ||
| _NONE; | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it looks cool but it doesn't look very clear, it could be better to write it like this:
#if defined(RUNTIME_BENCHMARK)
#define RUNTIME_BENCHMARK_TAG _RTT
#else
#define RUNTIME_BENCHMARK_TAG _NONE
#endif
#if defined(SD)
inline constexpr int logging_type = _SD | RUNTIME_BENCHMARK_TAG:
#else
inline constexpr int logging_type = _NONE | RUNTIME_BENCHMARK_TAG;
#endifor alternatively:
#if defined(RUNTIME_BENCHMARK)
#define RUNTIME_BENCHMARK_TAG _RTT
#else
#define RUNTIME_BENCHMARK_TAG _NONE
#endif
#if defined(SD)
#define BENCHMARK_LOGGING_TYPE _SD | RUNTIME_BENCHMARK_TAG
#else
#define BENCHMARK_LOGGING_TYPE _NONE | RUNTIME_BENCHMARK_TAG
#endif
inline constexpr int logging_type = BENCHMARK_LOGGING_TYPE:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or...:
#if defined(RUNTIME_BENCHMARK)
#define RUNTIME_BENCHMARK_TAG _RTT
#else
#define RUNTIME_BENCHMARK_TAG _NONE
#endif
#if defined(SD)
#define SD_BENCHMARK_TAG _SD
#else
#define SD_BENCHMARK_TAG _NONE
#endif
inline constexpr int logging_type = RUNTIME_BENCHMARK_TAG | SD_BENCHMARK_TAG:
Benchmarking Features
Different types of Events:
This is achieved by a set of macros, which record the current CycleCounter at a given point.
Under the hood, the macros use a string which is used by the frontend to display pretty messages.
To communicate with the host PC it uses SEGGER_RTT protocol, so it needs to have a PC hooked at all times.
It introduces breaking changes by changing the software used to flash, moving away from STLINK to OPENOCD.
It also adds some polling checks to detect overflow and update the global counter.