diff --git a/CHANGELOG.md b/CHANGELOG.md
index f1b67dae..857a1b92 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,7 +3,210 @@
 Full documentation for ROCprofiler is available at [docs.amd.com](https://docs.amd.com/bundle/ROCm-Profiling-Tools-User-Guide-v5.3)
 
+As of ROCm 5.5, the ROCm Profiler no longer uses terms like `rocmtools` or
+`rocsight` to describe `rocprofiler`, as was done in ROCm 5.4. To distinguish
+the two versions of `rocprofiler`, the terms `rocprofilerV1` and
+`rocprofilerV2` are used. The `rocprofilerV2` API is currently considered a
+beta release and is subject to change in future releases.
+
+## ROCprofiler for rocm 5.4.4
+
+In ROCm 5.4 the ROCm Profiler related files are named as follows:
+
+ | ROCm 5.4        | rocprofilerv1                       | rocmtools                       |
+ |-----------------|-------------------------------------|---------------------------------|
+ | **Tool script** | `bin/rocprof`                       | `bin/rocsight`                  |
+ | **API include** | `include/rocprofiler/rocprofiler.h` | `include/rocmtools/rocmtools.h` |
+ | **API library** | `lib/librocprofiler64.so.1`         | `lib/librocmtools.so.1`         |
+
+The ROCm Profiler Tool that uses `rocprofilerV1` can be invoked using the
+following command:
+
+```sh
+$ rocprof …
+```
+
+To write a custom tool based on the `rocprofilerV1` API do the following:
+
+```C
+// main.c
+#include <rocprofiler/rocprofiler.h>  // Use the rocprofilerV1 API
+int main() {
+  // Use the rocprofilerV1 API
+  return 0;
+}
+```
+
+This can be built in the following manner:
+
+```sh
+$ gcc main.c -I/opt/rocm-5.4.4/include -L/opt/rocm-5.4.4/lib -lrocprofiler64
+```
+
+The resulting `a.out` will depend on
+`/opt/rocm-5.4.4/lib/librocprofiler64.so.1`.
+
+The ROCm Profiler that uses the `rocprofilerV2` API (named `rocmtools` in ROCm
+5.4) can be invoked using the following command:
+
+```sh
+$ rocsight …
+```
+
+To write a custom tool based on the `rocmtools` API do the following:
+
+```C
+// main.c
+#include <rocmtools/rocmtools.h>  // Use the rocmtools API
+int main() {
+  // Use the rocmtools API
+  return 0;
+}
+```
+
+This can be built in the following manner:
+
+```sh
+$ gcc main.c -I/opt/rocm-5.4.4/include -L/opt/rocm-5.4.4/lib -lrocmtools
+```
+
+The resulting `a.out` will depend on `/opt/rocm-5.4.4/lib/librocmtools.so.1`.
+
+## ROCprofiler for rocm 5.5.0
+
+In ROCm 5.5 the `rocprofilerv1` and `rocprofilerv2` include and library files
+are merged into single files. The `rocmtools` files available in ROCm 5.4 are
+also available in ROCm 5.5 but are deprecated and will be removed in a future
+release.
+
+ | ROCm 5.5        | rocprofilerv1                       | rocprofilerv2                       | rocmtools *(deprecated)*        |
+ |-----------------|-------------------------------------|-------------------------------------|---------------------------------|
+ | **Tool script** | `bin/rocprof`                       | `bin/rocprofv2`                     | `bin/rocsight`                  |
+ | **API include** | `include/rocprofiler/rocprofiler.h` | `include/rocprofiler/rocprofiler.h` | `include/rocmtools/rocmtools.h` |
+ | **API library** | `lib/librocprofiler64.so.1`         | `lib/librocprofiler64.so.1`         | `lib/librocmtools.so.1`         |
+
+The ROCm Profiler Tool that uses `rocprofilerV1` can be invoked using the
+following command:
+
+```sh
+$ rocprof …
+```
+
+To write a custom tool based on the `rocprofilerV1` API, define the macro
+`ROCPROFILER_V1` before including the header:
+
+```C
+// main.c
+#define ROCPROFILER_V1  // must precede the include to select the rocprofilerV1 API
+#include <rocprofiler/rocprofiler.h>
+int main() {
+  // Use the rocprofilerV1 API
+  return 0;
+}
+```
+
+This can be built in the following manner:
+
+```sh
+$ gcc main.c -I/opt/rocm-5.5.0/include -L/opt/rocm-5.5.0/lib -lrocprofiler64
+```
+
+The resulting `a.out` will depend on
+`/opt/rocm-5.5.0/lib/librocprofiler64.so.1`.
+
+The ROCm Profiler that uses the `rocprofilerV2` API can be invoked using the
+following command:
+
+```sh
+$ rocprofv2 …
+```
+
+To write a custom tool based on the `rocprofilerV2` API do the following:
+
+```C
+// main.c
+#include <rocprofiler/rocprofiler.h>  // ROCPROFILER_V1 is not defined: rocprofilerV2 API
+int main() {
+  // Use the rocprofilerV2 API
+  return 0;
+}
+```
+
+This can be built in the following manner:
+
+```sh
+$ gcc main.c -I/opt/rocm-5.5.0/include -L/opt/rocm-5.5.0/lib -lrocprofiler64
+```
+
+The resulting `a.out` will depend on
+`/opt/rocm-5.5.0/lib/librocprofiler64.so.1`.
+
 ## ROCprofiler for rocm 5.6.0
+
+In ROCm 5.6 the `rocprofilerv1` and `rocprofilerv2` include and library files of
+ROCm 5.5 are split into separate files. The `rocmtools` files that were
+deprecated in ROCm 5.5 have been removed.
+
+ | ROCm 5.6        | rocprofilerv1                       | rocprofilerv2                          |
+ |-----------------|-------------------------------------|----------------------------------------|
+ | **Tool script** | `bin/rocprof`                       | `bin/rocprofv2`                        |
+ | **API include** | `include/rocprofiler/rocprofiler.h` | `include/rocprofiler/v2/rocprofiler.h` |
+ | **API library** | `lib/librocprofiler.so.1`           | `lib/librocprofiler.so.2`              |
+
+The ROCm Profiler Tool that uses `rocprofilerV1` can be invoked using the
+following command:
+
+```sh
+$ rocprof …
+```
+
+To write a custom tool based on the `rocprofilerV1` API do the following:
+
+```C
+// main.c
+#include <rocprofiler/rocprofiler.h>  // Use the rocprofilerV1 API
+int main() {
+  // Use the rocprofilerV1 API
+  return 0;
+}
+```
+
+This can be built in the following manner:
+
+```sh
+$ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64
+```
+
+The resulting `a.out` will depend on
+`/opt/rocm-5.6.0/lib/librocprofiler64.so.1`.
+
+The ROCm Profiler that uses the `rocprofilerV2` API can be invoked using the
+following command:
+
+```sh
+$ rocprofv2 …
+```
+
+To write a custom tool based on the `rocprofilerV2` API do the following:
+
+```C
+// main.c
+#include <rocprofiler/v2/rocprofiler.h>  // Use the rocprofilerV2 API
+int main() {
+  // Use the rocprofilerV2 API
+  return 0;
+}
+```
+
+This can be built in the following manner:
+
+```sh
+$ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64-v2
+```
+
+The resulting `a.out` will depend on
+`/opt/rocm-5.6.0/lib/librocprofiler64.so.2`.
+
 ### Optimized
 
 - Improved Test Suite
 
 ### Added
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 85f4e363..8f94b588 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -55,7 +55,7 @@ include(utils)
 include(env)
 
 # Setup the package version.
-get_version("2.0.0")
+get_version("1.0.0")
 message("-- LIB-VERSION: ${VERSION_MAJOR}.${VERSION_MINOR}.${VERSION_PATCH}")
 
 set(BUILD_VERSION_MAJOR ${VERSION_MAJOR})
@@ -277,7 +277,7 @@ if(FILE_REORG_BACKWARD_COMPATIBILITY)
     set(ROCM_HEADER_WRAPPER_WERROR "$ENV{ROCM_HEADER_WRAPPER_WERROR}"
       CACHE STRING "Header wrapper warnings as errors.")
   else()
-    set(ROCM_HEADER_WRAPPER_WERROR "ON" CACHE STRING "Header wrapper warnings as errors.")
+    set(ROCM_HEADER_WRAPPER_WERROR "OFF" CACHE STRING "Header wrapper warnings as errors.")
   endif()
 endif()
 
@@ -516,8 +516,8 @@ if(DOXYGEN_FOUND)
     COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/doc/latex pdf
     MAIN_DEPENDENCY ${DOXYGEN_OUT} ${DOXYGEN_IN}
-    DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/inc/rocprofiler.h
-            ${CMAKE_CURRENT_SOURCE_DIR}/inc/rocprofiler_plugin.h
+    DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/v2/rocprofiler_plugin.h
+            ${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/v2/rocprofiler.h
     COMMENT "Generating documentation")
 
   add_custom_target(
diff --git a/README.md b/README.md
index c66c0cb7..6ebfd8db 100644
--- a/README.md
+++ b/README.md
@@ -13,7 +13,9 @@ Library supports GFX8/GFX9.
 
 The library source tree:
  - doc - Documentation
-  - inc/rocprofiler.h - Library public API
+  - include/rocprofiler/rocprofiler.h - Library public API
+  - include/rocprofiler/v2/rocprofiler.h - V2 Beta Library public API
+  - include/rocprofiler/v2/rocprofiler_plugin.h - V2 Beta tool plugins library public API
  - src - Library sources
    - core - Library API sources
    - util - Library utils sources
@@ -64,7 +66,7 @@ To enable verbose tracing:
 $ export ROCPROFILER_TRACE=1
 ```
 
-## ROC Profiler library version 2.0
+## ROC Profiler library version 9.0
 
 ## Introduction
 
@@ -249,8 +251,8 @@ The user has two options for building:
     installtion:
 
     ```bash
-    rocprofiler-plugins_2.0.0-local_amd64.deb
-    rocprofiler-plugins-2.0.0-local.x86_64.rpm
+    rocprofiler-plugins_9.0.0-local_amd64.deb
+    rocprofiler-plugins-9.0.0-local.x86_64.rpm
     ```
 
     usage:
@@ -299,8 +301,8 @@ ...
     installation:
     ```bash
-    rocprofiler-tests_2.0.0-local_amd64.deb
-    rocprofiler-tests-2.0.0-local.x86_64.rpm
+    rocprofiler-tests_9.0.0-local_amd64.deb
+    rocprofiler-tests-9.0.0-local.x86_64.rpm
     ```
 
     usage:
     From build directory:
@@ -319,8 +321,8 @@ We make use of doxygen to autmatically generate API documentation. Generated doc
     installtion:
 
     ```bash
-    rocprofiler-docs_2.0.0-local_amd64.deb
-    rocprofiler-docs-2.0.0-local.x86_64.rpm
+    rocprofiler-docs_9.0.0-local_amd64.deb
+    rocprofiler-docs-9.0.0-local.x86_64.rpm
     ```
 
 ## Samples
@@ -329,8 +331,8 @@ We make use of doxygen to autmatically generate API documentation. Generated doc
 insalltion:
 
 ```bash
-rocprofiler-samples_2.0.0-local_amd64.deb
-rocprofiler-samples-2.0.0-local.x86_64.rpm
+rocprofiler-samples_9.0.0-local_amd64.deb
+rocprofiler-samples-9.0.0-local.x86_64.rpm
 ```
 
 usage:
@@ -371,3 +373,8 @@ samples can be run as independent executables once installed
 Please report in the Github Issues
 
 ## Limitations
+- In 5.6, Navi2x requires a GRBM counter as the first counter for input PMC lines. Results are undefined otherwise.
+- Navi requires a stable power state for counter collection. Currently, this state must be set by the user.
+  To do so, make "power_dpm_force_performance_level" writable for non-root users with chmod, then run:
+  echo profile_standard >> /sys/class/drm/card0/device/power_dpm_force_performance_level
+  Recommended: "auto" or "high" for ATT and "profile_standard" for PMC. Use rocm-smi to verify the current power state.
diff --git a/bin/rocprofv2 b/bin/rocprofv2
index e8a356ec..3e95b24e 100755
--- a/bin/rocprofv2
+++ b/bin/rocprofv2
@@ -25,8 +25,8 @@ usage() {
     echo -e "--clean-install For installing ROCProfilerV2 with new clean build in the default installation folder (review build.sh to know more about the default paths)"
   fi
   echo -e "--hip-api For Collecting HIP API Traces"
-  echo -e "--hip-activity For Collecting HSA API Activities Traces"
-  echo -e "--hsa-api For Collecting HIP API Traces"
+  echo -e "--hip-activity For Collecting HIP API Activities Traces"
+  echo -e "--hsa-api For Collecting HSA API Traces"
   echo -e "--hsa-activity For Collecting HSA API Activities Traces"
   echo -e "--roctx-trace For Collecting ROCTx Traces"
   echo -e "--kernel-trace For Collecting Kernel dispatch Traces"
diff --git a/bin/rpl_run.sh b/bin/rpl_run.sh
index f0290e0a..ae9bfcb9 100755
--- a/bin/rpl_run.sh
+++ b/bin/rpl_run.sh
@@ -61,7 +61,7 @@ unset ROCPROFILER_SESS
 
 # Profiler environment
 # Loading of profiler library by HSA runtime
-MY_HSA_TOOLS_LIB="$RPL_PATH/librocprofiler64.so"
+MY_HSA_TOOLS_LIB="$RPL_PATH/librocprofiler64.so.1"
 # Loading of the test tool by ROC Profiler
 export ROCP_TOOL_LIB=$TLIB_PATH/librocprof-tool.so
 # Enabling HSA dispatches intercepting by ROC PRofiler
@@ -272,16 +272,16 @@ run() {
   if [ "$HSA_TRACE" = 1 ] ; then
     export ROCTRACER_DOMAIN=$API_TRACE":hsa"
-    MY_HSA_TOOLS_LIB="$MY_HSA_TOOLS_LIB $ROCM_LIB_PATH/libroctracer64.so $TTLIB_PATH/libroctracer_tool.so"
+    MY_HSA_TOOLS_LIB="$MY_HSA_TOOLS_LIB $ROCM_LIB_PATH/libroctracer64.so.4 $TTLIB_PATH/libroctracer_tool.so"
   elif [ -n "$API_TRACE" ] ; then
     export ROCTRACER_DOMAIN=$API_TRACE
     OUTPUT_LIST="$ROCP_OUTPUT_DIR/"
-    MY_HSA_TOOLS_LIB="$ROCM_LIB_PATH/libroctracer64.so $TTLIB_PATH/libroctracer_tool.so"
+    MY_HSA_TOOLS_LIB="$ROCM_LIB_PATH/libroctracer64.so.4 $TTLIB_PATH/libroctracer_tool.so"
   fi
 
   if [ "$ROCP_STATS_OPT" = 1 ] ; then
     if [ "$ROCTRACER_DOMAIN" = ":hip" ] ; then
-      MY_HSA_TOOLS_LIB="$ROCM_LIB_PATH/libroctracer64.so $TTLIB_PATH/libhip_stats.so"
+      MY_HSA_TOOLS_LIB="$ROCM_LIB_PATH/libroctracer64.so.4 $TTLIB_PATH/libhip_stats.so"
     else
       error_message="ROCP_STATS_OPT is only available with --hip-trace option"
       echo $error_message
diff --git a/doc/Doxyfile.in b/doc/Doxyfile.in
index a34c460c..684b1315 100644
--- a/doc/Doxyfile.in
+++ b/doc/Doxyfile.in
@@ -791,7 +791,7 @@ WARN_LOGFILE =
 # spaces. See also FILE_PATTERNS and EXTENSION_MAPPING
 # Note: If this tag is empty the current directory is searched.
 
-INPUT = @CMAKE_CURRENT_SOURCE_DIR@/inc/rocprofiler.h @CMAKE_CURRENT_SOURCE_DIR@/inc/rocprofiler_plugin.h
+INPUT = @CMAKE_CURRENT_SOURCE_DIR@/include/rocprofiler/v2/rocprofiler.h @CMAKE_CURRENT_SOURCE_DIR@/include/rocprofiler/v2/rocprofiler_plugin.h
 
 # This tag can be used to specify the character encoding of the source files
 # that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
diff --git a/doc/ROCProfiler_V1_API_spec.md b/doc/ROCProfiler_V1_API_spec.md
new file mode 100644
index 00000000..975d58ca
--- /dev/null
+++ b/doc/ROCProfiler_V1_API_spec.md
@@ -0,0 +1,837 @@
+# ROC Profiler Library Specification
+ROC Profiler API version 7
+
+## 1. High level overview
+```
+The goal of the implementation is to provide a HW-specific, low-level performance analysis
+interface for profiling GPU compute applications. The profiling includes HW performance
+counters with complex performance metrics and HW traces. The implementation distinguishes
+two profiling features, metrics and traces. HW performance counters are treated as the basic
+metrics, and formulas can be defined for derived complex metrics.
+The library can be loaded by the HSA runtime as a tool plugin, and it can be loaded by a
+higher level, HW-independent performance analysis API like PAPI.
+The library has a C API and is based on the AMD-specific AQLprofile HSA extension.
+
+ 1. The library provides methods to query the list of supported HW features.
+ 2. The library provides profiling APIs to start, stop, and read metric results and tracing
+    data.
+ 3. The library provides an intercepting API for collecting per-kernel profiling data for
+    the kernels dispatched to HSA AQL queues.
+ 4. The library provides a mechanism to load a profiling tool library plugin, selected by
+    the ROCP_TOOL_LIB environment variable.
+ 5. The library is responsible for allocating the profiling buffers and for notifying
+    about output data buffer overflow for traces.
+ 6. The library is implemented based on the AMD-specific AQLprofile HSA extension.
+ 7. The library implementation is abstracted from the specific GFXIP.
+ 8. The library implementation is extensible:
+    - Easy addition of counters and metrics
+    - Counters enumeration
+    - Counters and metrics can be dynamically configured using XML configuration files with
+      counters and metrics tables:
+      o Counters table entry, basic metric: counter name, block name, event id
+      o Complex metrics table entry: metric name, an expression for calculating the metric
+        from the counters
+
+Metrics XML file example:
+
+
+
+  . . .
+
+
+
+  . . .
+
+
+
+
+
+```
+## 2. Environment
+```
+* HSA_TOOLS_LIB - must be set to the name of the rocprofiler library to be loaded by the
+  HSA runtime
+* ROCP_METRICS - path to the metrics XML file
+* ROCP_TOOL_LIB - path to the profiling tool library loaded by ROC Profiler
+* ROCP_HSA_INTERCEPT - if set, interception of HSA dispatches is enabled
+```
+## 3. General API
+### 3.1. Description
+```
+The library provides a method for getting the error string of the last failed library API
+call; the error code itself is the HSA status returned by each API method.
+To check that the library API header in use conforms to the library binary, the version
+macros and API methods can be used.
+
+Returning the error and error string methods:
+- rocprofiler_error_string - method for returning the error string
+
+Library version:
+- ROCPROFILER_VERSION_MAJOR - API major version macro
+- ROCPROFILER_VERSION_MINOR - API minor version macro
+- rocprofiler_version_major - library major version
+- rocprofiler_version_minor - library minor version
+```
+### 3.2. Returning the error and error string methods
+```
+const char* rocprofiler_error_string();
+```
+### 3.3. Library version
+```
+The library is backward compatible if the library major version is less than or equal to
+the API major version macro.
+
+API version macros defined in the library API header 'rocprofiler.h':
+
+ROCPROFILER_VERSION_MAJOR
+ROCPROFILER_VERSION_MINOR
+
+Methods to check the library major and minor version:
+
+uint32_t rocprofiler_version_major();
+uint32_t rocprofiler_version_minor();
+```
+## 4. Backend API
+### 4.1. Description
+```
+The library provides methods to open/close a profiling context; to start, stop, and read
+HW performance counters and traces; and to intercept kernel dispatches to collect per-kernel
+profiling data. The library also provides methods to calculate complex performance metrics
+and to query the list of available metrics. The library distinguishes two profiling features,
+metrics and traces, where HW performance counters are treated as the basic metrics. To report
+errors, the library methods return a standard HSA status code.
+For a given context, profiling can be started/stopped and counters sampled in standalone
+mode, or profiling can be initiated by intercepting kernel dispatches through a registered
+dispatch callback.
+For counter sampling, which is the usage model of higher-level APIs like PAPI,
+the start/stop/read APIs should be used.
+For collecting per-kernel data for the kernels submitted to HSA queues, the dispatch callback
+API should be used.
+The library is backward compatible if the library major version is less than or equal to the
+API major version macro.
+
+Returned API status:
+- hsa_status_t - HSA status codes from the hsa.h header are used
+
+Loading and Configuring, loadable plugin on-load/unload methods:
+- rocprofiler_settings_t - global properties
+- OnLoadTool
+- OnLoadToolProp
+- OnUnloadTool
+
+Info API:
+- rocprofiler_info_kind_t - profiling info kind
+- rocprofiler_info_query_t - profiling info query
+- rocprofiler_info_data_t - profiling info data
+- rocprofiler_get_info - return the info for a given info kind
+- rocprofiler_iterate_info - iterate over the info for a given info kind
+- rocprofiler_query_info - iterate over the info for a given info query
+
+Context API:
+- rocprofiler_t - profiling context handle
+- rocprofiler_feature_kind_t - profiling feature kind
+- rocprofiler_feature_parameter_t - profiling feature parameter
+- rocprofiler_data_kind_t - profiling data kind
+- rocprofiler_data_t - profiling data
+- rocprofiler_feature_t - profiling feature
+- rocprofiler_mode_t - profiling modes
+- rocprofiler_properties_t - profiler properties
+- rocprofiler_open - open a new profiling context
+- rocprofiler_close - close the profiling context and release all allocated resources
+- rocprofiler_group_count - return the profiling groups count
+- rocprofiler_get_group - return the profiling group for a given index
+- rocprofiler_get_metrics - method for calculating the metrics data
+- rocprofiler_iterate_trace_data - method for iterating output trace data instances
+- rocprofiler_time_id_t - supported time value ID enumeration
+- rocprofiler_get_time - return the time for a given time ID and profiling timestamp value
+
+Sampling API:
+- rocprofiler_start - start profiling
+- rocprofiler_stop - stop profiling
+- rocprofiler_read - read profiling data into the profiling feature objects
+- rocprofiler_get_data - wait for profiling data
+  Group versions of start/stop/read/get_data methods:
+  o rocprofiler_group_start
+  o rocprofiler_group_stop
+  o rocprofiler_group_read
+  o rocprofiler_group_get_data
+
+Intercepting API:
+- rocprofiler_callback_t - profiling callback type
+- rocprofiler_callback_data_t - profiling callback data type
+- rocprofiler_dispatch_record_t - dispatch record
+- rocprofiler_queue_callbacks_t - queue callbacks, dispatch/destroy
+- rocprofiler_set_queue_callbacks - set queue kernel dispatch and queue destroy callbacks
+- rocprofiler_remove_queue_callbacks - remove queue callbacks
+
+Context pool API:
+- rocprofiler_pool_t - context pool handle
+- rocprofiler_pool_entry_t - context pool entry
+- rocprofiler_pool_properties_t - context pool properties
+- rocprofiler_pool_handler_t - context pool completion handler
+- rocprofiler_pool_open - context pool open
+- rocprofiler_pool_close - context pool close
+- rocprofiler_pool_fetch - fetch an empty context entry from the pool
+- rocprofiler_pool_release - release a context entry
+- rocprofiler_pool_iterate - iterate over fetched context entries
+- rocprofiler_pool_flush - flush completed context entries
+```
+### 4.2. Loading and Configuring
+```
+The profiling properties can be set by the profiler plugin when it is loaded by the ROC runtime.
+The profiler library plugin is selected with the ROCP_TOOL_LIB environment variable.
+
+Global properties:
+
+typedef struct {
+  uint32_t intercept_mode;
+  uint64_t timeout;
+  uint32_t timestamp_on;
+} rocprofiler_settings_t;
+
+On-load/unload methods defined in the profiling tool library loaded via ROCP_TOOL_LIB:
+extern "C" void OnLoadTool();
+extern "C" void OnLoadToolProp(rocprofiler_settings_t* settings);
+extern "C" void OnUnloadTool();
+
+```
+### 4.3. Info API
+```
+The profiling metrics are defined by name, and the traces are defined by name and parameters.
+All supported features can be iterated using the 'iterate_info/query_info' methods. The counter
+names are defined in the counters table configuration file; each counter has a unique name and
+is defined by a block name and an event id. The trace and trace parameter names are the same as
+in the hardware documentation, and the parameter codes are rocprofiler_feature_parameter_t
+values; see the "Context API" section below.
+
+Profiling info kind:
+
+typedef enum {
+  ROCPROFILER_INFO_KIND_METRIC = 0,        // metric info
+  ROCPROFILER_INFO_KIND_METRIC_COUNT = 1,  // metrics count
+  ROCPROFILER_INFO_KIND_TRACE = 2,         // trace info
+  ROCPROFILER_INFO_KIND_TRACE_COUNT = 3,   // traces count
+} rocprofiler_info_kind_t;
+
+Profiling info data:
+
+typedef struct {
+  rocprofiler_info_kind_t kind;  // info data kind
+  union {
+    struct {
+      const char* name;         // metric name
+      uint32_t instances;       // instances number
+      const char* expr;         // metric expression, NULL for basic counters
+      const char* description;  // metric description
+      const char* block_name;   // block name
+      uint32_t block_counters;  // number of block counters
+    } metric;
+    struct {
+      const char* name;          // trace name
+      const char* description;   // trace description
+      uint32_t parameter_count;  // number of parameters supported by the trace
+    } trace;
+  };
+} rocprofiler_info_data_t;
+
+Return info for a given info kind:
+
+hsa_status_t rocprofiler_get_info(
+    const hsa_agent_t* agent,      // [in] GPU handle, NULL for all GPU agents
+    rocprofiler_info_kind_t kind,  // kind of info to return
+    void* data);                   // [out] returned info data
+
+Iterate over the info for a given info kind, and invoke an application-defined callback on
+every iteration:
+
+hsa_status_t rocprofiler_iterate_info(
+    const hsa_agent_t* agent,      // [in] GPU handle, NULL for all GPU agents
+    rocprofiler_info_kind_t kind,  // kind of iterated info
+    hsa_status_t (*callback)(const rocprofiler_info_data_t info, void* data),  // callback
+    void* data);                   // data passed to callback
+
+Iterate over the info for a given info query, and invoke an application-defined callback on
+every iteration. Query fields set to NULL act as wildcards:
+
+hsa_status_t rocprofiler_query_info(
+    const hsa_agent_t* agent,       // [in] GPU handle, NULL for all GPU agents
+    rocprofiler_info_kind_t kind,   // kind of iterated info
+    rocprofiler_info_data_t query,  // info query
+    hsa_status_t (*callback)(const rocprofiler_info_data_t info, void* data),  // callback
+    void* data);                    // data passed to callback
+```
+### 4.4. Context API
+```
+The profiling context accumulates all profiling information, including the profiling features
+that carry profiling data and the buffers required for profiling command packets and output
+data. The context is created and deleted by the library open/close methods. Deleting the
+context releases all resources the library allocated for it. If more than one run is required
+to collect all requested counter data, the data for all profiling groups should be collected
+first; the metrics can then be calculated by loading the saved groups' data into the profiling
+context. Saving and loading the groups' data is the responsibility of the tool. The groups are
+identified automatically when the profiling context is opened, and an API is provided to access
+them; see the "Profiling groups" section below.
+
+Profiling context handle:
+
+typedef void rocprofiler_t;  // opaque handle
+
+Profiling feature kind:
+
+typedef enum {
+  ROCPROFILER_FEATURE_KIND_METRIC = 0,  // metric
+  ROCPROFILER_FEATURE_KIND_TRACE = 1    // trace
+} rocprofiler_feature_kind_t;
+
+Profiling feature parameter:
+
+typedef hsa_ven_amd_aqlprofile_parameter_t rocprofiler_feature_parameter_t;
+
+Profiling data kind:
+
+typedef enum {
+  ROCPROFILER_DATA_KIND_UNINIT = 0,  // data uninitialized
+  ROCPROFILER_DATA_KIND_INT32 = 1,   // 32-bit integer
+  ROCPROFILER_DATA_KIND_INT64 = 2,   // 64-bit integer
+  ROCPROFILER_DATA_KIND_FLOAT = 3,   // single-precision float result
+  ROCPROFILER_DATA_KIND_DOUBLE = 4,  // double-precision float result
+  ROCPROFILER_DATA_KIND_BYTES = 5    // trace output as a byte array
+} rocprofiler_data_kind_t;
+
+Profiling data:
+
+typedef struct {
+  rocprofiler_data_kind_t kind;  // result kind
+  union {
+    uint32_t result_int32;  // 32-bit integer result
+    uint64_t result_int64;  // 64-bit integer result
+    float result_float;     // single-precision float result
+    double result_double;   // double-precision float result
+    struct {
+      void* ptr;            // pointer
+      uint32_t size;        // byte size
+      uint32_t instances;   // number of trace instances
+    } result_bytes;         // data by ptr and byte size
+  };
+} rocprofiler_data_t;
+
+Profiling feature:
+
+typedef struct {
+  rocprofiler_feature_kind_t type;                    // feature type
+  const char* name;                                   // feature name
+  const rocprofiler_feature_parameter_t* parameters;  // feature parameters
+  uint32_t parameter_count;                           // feature parameter count
+  rocprofiler_data_t* data;                           // profiling data
+} rocprofiler_feature_t;
+
+Profiling mode masks:
+Several modes can be specified for the profiling context.
+STANDALONE mode can be used for counter sampling outside of the application context
+to support statistical system-wide profiling. In this mode the profiling context supports
+its own queue, which can be created when the context is opened if the CREATEQUEUE mode is
+also specified.
+See also the "Profiler properties" section below for the standalone mode queue properties.
+The profiler supports several profiling groups for collecting profiling data in several
+runs; the 'SINGLEGROUP' mode allows only one group, and the context open fails if more
+groups are needed.
+
+typedef enum {
+  ROCPROFILER_MODE_STANDALONE = 1,   // standalone mode, in which the ROC profiler
+                                     // supports its own AQL queue
+  ROCPROFILER_MODE_CREATEQUEUE = 2,  // profiler creates the queue in STANDALONE mode
+  ROCPROFILER_MODE_SINGLEGROUP = 4   // profiler allows one group only and fails
+                                     // if more groups are needed
+} rocprofiler_mode_t;
+
+Context data readiness callback:
+
+typedef void (*rocprofiler_context_callback_t)(
+    rocprofiler_group_t* group,  // profiling group
+    void* arg);                  // callback arg
+
+Profiler properties:
+Several properties can be specified for the context. A callback can be registered that is
+called when the context data is ready. In standalone profiling mode
+'ROCPROFILER_MODE_STANDALONE' the context supports its own queue: the queue can be set by
+the property 'queue', or a queue is created with the specified depth 'queue_depth' if the
+mode 'ROCPROFILER_MODE_CREATEQUEUE' is also specified.
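A minimal sketch of the mode-mask and queue rules described above, written as a self-contained C check. The bit values mirror `rocprofiler_mode_t` from this section, but the names `MODE_*` and the helper `mode_is_consistent` are hypothetical, and requiring a non-zero `queue_depth` is an assumption; the library itself may validate these combinations differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of the rocprofiler_mode_t bit values listed above.
 * Hypothetical names for illustration; real tools should include the
 * rocprofiler header instead of redefining these constants. */
enum {
  MODE_STANDALONE = 1,   /* ROCPROFILER_MODE_STANDALONE */
  MODE_CREATEQUEUE = 2,  /* ROCPROFILER_MODE_CREATEQUEUE */
  MODE_SINGLEGROUP = 4   /* ROCPROFILER_MODE_SINGLEGROUP */
};

/* Hypothetical pre-flight check of a mode mask plus the queue-related
 * properties, following the rules stated in the text:
 *  - CREATEQUEUE only has meaning together with STANDALONE;
 *  - in STANDALONE mode the context needs its own queue, either supplied
 *    through the 'queue' property or created with 'queue_depth'
 *    (assumed here to require a non-zero depth);
 *  - SINGLEGROUP does not affect the queue requirements. */
bool mode_is_consistent(uint32_t mode, bool has_queue, uint32_t queue_depth) {
  const bool standalone = (mode & MODE_STANDALONE) != 0;
  const bool create_queue = (mode & MODE_CREATEQUEUE) != 0;

  if (create_queue && !standalone) return false; /* CREATEQUEUE without STANDALONE */
  if (!standalone) return true;                  /* context has no queue of its own */
  if (create_queue) return queue_depth > 0;      /* profiler creates the queue */
  return has_queue;                              /* caller must supply the queue */
}
```

Under these assumptions, `STANDALONE | CREATEQUEUE` with a non-zero depth passes the check, while `CREATEQUEUE` on its own, or `STANDALONE` without a supplied queue, does not.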
+
+typedef struct {
+  rocprofiler_context_callback_t callback;  // callback on the context data readiness
+  void* callback_arg;                       // callback arg
+  hsa_queue_t* queue;                       // HSA queue for standalone mode
+  uint32_t queue_depth;                     // created queue depth, for create-queue mode
+} rocprofiler_properties_t;
+
+Open/close profiling context:
+
+hsa_status_t rocprofiler_open(
+    hsa_agent_t agent,                      // GPU handle
+    rocprofiler_feature_t* features,        // [in/out] profiling feature array
+    uint32_t feature_count,                 // profiling feature count
+    rocprofiler_t** context,                // [out] profiling context handle
+    uint32_t mode,                          // profiling mode mask
+    rocprofiler_properties_t* properties);  // profiler properties
+
+hsa_status_t rocprofiler_close(
+    rocprofiler_t* context);  // [in] profiling context
+
+Profiling groups:
+When the context is opened, the profiler automatically identifies the number of application
+runs required to collect all data needed for the specified metrics and creates one metric
+group per run. Data for all profiling groups should be collected; the metrics can then be
+calculated by loading the saved groups' data into the profiling context. Saving and loading
+the groups' data is the responsibility of the tool.
+
+typedef struct {
+  uint32_t index;                    // profiling group index
+  rocprofiler_feature_t** features;  // profiling features array
+  uint32_t feature_count;            // profiling feature count
+  rocprofiler_t* context;            // profiling context handle
+} rocprofiler_group_t;
+
+Return profiling groups count:
+
+hsa_status_t rocprofiler_group_count(
+    rocprofiler_t* context,  // [in/out] profiling context
+    uint32_t* count);        // [out] profiling groups count
+
+Return the profiling group for a given index:
+
+hsa_status_t rocprofiler_get_group(
+    rocprofiler_t* context,       // [in/out] profiling context,
+                                  // will be returned as
+                                  // a part of the group structure
+    uint32_t index,               // [in] group index
+    rocprofiler_group_t* group);  // [out] profiling group
+
+Calculate metrics data. The results are stored in the data fields of the registered profiling features:
+After all profiling context data is ready, the registered metrics can be calculated. Context
+data readiness can be checked with the 'get_data' API or through the context callback.
+
+hsa_status_t rocprofiler_get_metrics(
+    rocprofiler_t* context);  // [in/out] profiling context
+
+Method for iterating trace data instances:
+Trace data can have several instances, for example, one instance per Shader Engine.
+
+hsa_status_t rocprofiler_iterate_trace_data(
+    const rocprofiler_t* context,                     // [in] context object
+    hsa_ven_amd_aqlprofile_data_callback_t callback,  // [in] callback to iterate
+                                                      // the output data
+    void* callback_data);                             // [in/out] data passed to the callback
+
+Converting a profiling timestamp to a time value for a supported time ID.
+Supported time value ID enumeration:
+typedef enum {
+  ROCPROFILER_TIME_ID_CLOCK_REALTIME = 0,   // Linux realtime clock time
+  ROCPROFILER_TIME_ID_CLOCK_MONOTONIC = 1,  // Linux monotonic clock time
+} rocprofiler_time_id_t;
+
+Method for converting a profiling timestamp to a time value for a given time ID:
+hsa_status_t rocprofiler_get_time(
+    rocprofiler_time_id_t time_id,  // identifier of the particular
+                                    // time to convert the timestamp
+    uint64_t timestamp,             // profiling timestamp
+    uint64_t* value_ns);            // [out] returned time value in ns
+```
+### 4.5. Sampling API
+```
+The API supports the counter sampling usage model with start/read/stop methods, and it also
+allows waiting for the profiling data in the intercepting usage model with the get_data method.
+
+Start/stop/read methods:
+
+hsa_status_t rocprofiler_start(
+    rocprofiler_t* context,     // [in/out] profiling context
+    uint32_t group_index = 0);  // group index
+
+hsa_status_t rocprofiler_stop(
+    rocprofiler_t* context,     // [in/out] profiling context
+    uint32_t group_index = 0);  // group index
+
+hsa_status_t rocprofiler_read(
+    rocprofiler_t* context,     // [in/out] profiling context
+    uint32_t group_index = 0);  // group index
+
+Wait for profiling data:
+
+hsa_status_t rocprofiler_get_data(
+    rocprofiler_t* context,     // [in/out] profiling context
+    uint32_t group_index = 0);  // group index
+
+Group versions of the above start/stop/read/get_data methods:
+
+hsa_status_t rocprofiler_group_start(
+    rocprofiler_group_t* group);  // [in/out] profiling group
+
+hsa_status_t rocprofiler_group_stop(
+    rocprofiler_group_t* group);  // [in/out] profiling group
+
+hsa_status_t rocprofiler_group_read(
+    rocprofiler_group_t* group);  // [in/out] profiling group
+
+hsa_status_t rocprofiler_group_get_data(
+    rocprofiler_group_t* group);  // [in/out] profiling group
+```
+### 4.6. Intercepting API
+```
+The library provides a callback API for enabling profiling of the kernels dispatched to
+HSA AQL queues. The API enables per-kernel profiling data collection.
+The current implementation serializes kernel execution.
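A minimal sketch of deriving per-kernel intervals from the four timestamps carried by `rocprofiler_dispatch_record_t`. It assumes all four timestamps come from the same clock and satisfy dispatch <= begin <= end <= complete; the struct is a local mirror, and the helpers `kernel_duration`, `dispatch_latency`, and `print_record` are hypothetical. To express the values as nanoseconds of a Linux clock, each timestamp could first be converted with `rocprofiler_get_time`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Local mirror of rocprofiler_dispatch_record_t; the field order and
 * meanings follow the declaration in this section. */
typedef struct {
  uint64_t dispatch;  /* dispatch timestamp */
  uint64_t begin;     /* begin timestamp */
  uint64_t end;       /* end timestamp */
  uint64_t complete;  /* completion signal timestamp */
} dispatch_record_t;

/* Hypothetical helper: time the kernel spent executing (end - begin). */
uint64_t kernel_duration(const dispatch_record_t* rec) {
  assert(rec->end >= rec->begin);
  return rec->end - rec->begin;
}

/* Hypothetical helper: delay between dispatch and start of execution. */
uint64_t dispatch_latency(const dispatch_record_t* rec) {
  assert(rec->begin >= rec->dispatch);
  return rec->begin - rec->dispatch;
}

/* Prints both intervals for one kernel, in timestamp units. */
void print_record(const char* kernel_name, const dispatch_record_t* rec) {
  printf("%s: latency=%llu duration=%llu\n", kernel_name,
         (unsigned long long)dispatch_latency(rec),
         (unsigned long long)kernel_duration(rec));
}
```

Keeping the arithmetic on raw timestamps and converting only for display avoids repeated `rocprofiler_get_time` calls per record, under the same-clock assumption stated above.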
+ +ROC profiler callback type: + +typedef hsa_status_t (*rocprofiler_callback_t)( + const rocprofiler_callback_data_t* callback_data, // callback data passed by HSA runtime + void* user_data, // [in/out] user data passed + // to the callback + rocprofiler_group_t* group); // [out] returned profiling group + +Profiling callback data: + +typedef struct { + uint64_t dispatch; // dispatch timestamp + uint64_t begin; // begin timestamp + uint64_t end; // end timestamp + uint64_t complete; // completion signal timestamp +} rocprofiler_dispatch_record_t; + +typedef struct { + hsa_agent_t agent; // GPU agent handle + uint32_t agent_index; // GPU index + const hsa_queue_t* queue; // HSA queue + uint64_t queue_index; // Index in the queue + const hsa_kernel_dispatch_packet_t* packet; // HSA dispatch packet + const char* kernel_name; // Kernel name + const rocprofiler_dispatch_record_t* record; // Dispatch record +} rocprofiler_callback_data_t; + +Queue callbacks: + +typedef struct { + rocprofiler_callback_t dispatch; // kernel dispatch callback + hsa_status_t (*destroy)(hsa_queue_t* queue, void* data); // queue destroy callback +} rocprofiler_queue_callbacks_t; + +Adding/removing kernel dispatch and queue destroy callbacks: + +hsa_status_t rocprofiler_set_intercepting( + rocprofiler_queue_callbacks_t callbacks, // kernel dispatch and queue destroy callbacks + void* data); // [in/out] data passed to the callbacks + +hsa_status_t rocprofiler_remove_intercepting(); +``` +### 4.7. Profiling Context Pools +``` +The API provides the capability to create a context pool for a given agent and a set of features, to fetch/release a context entry, and to register a callback for the completion of the pool's contexts.
+Profiling pool handle: +typename rocprofiler_pool_t; +Profiling pool entry: +typedef struct { + rocprofiler_t* context; // context object + void* payload; // payload data object +} rocprofiler_pool_entry_t; + +Profiling handler, called on profiling completion: +typedef bool (*rocprofiler_pool_handler_t)(const rocprofiler_pool_entry_t* entry, void* arg); + +Profiling pool properties: +typedef struct { + uint32_t num_entries; // number of pool entries + uint32_t payload_bytes; // payload size in bytes + rocprofiler_pool_handler_t handler; // handler called on context completion + void* handler_arg; // argument passed to the handler +} rocprofiler_pool_properties_t; + +Open profiling pool: +hsa_status_t rocprofiler_pool_open( + hsa_agent_t agent, // GPU handle + rocprofiler_feature_t* features, // [in] profiling features array + uint32_t feature_count, // profiling feature count + rocprofiler_pool_t** pool, // [out] profiling pool object + uint32_t mode, // profiling mode mask + rocprofiler_pool_properties_t* properties); // pool properties + +Close profiling pool: +hsa_status_t rocprofiler_pool_close( + rocprofiler_pool_t* pool); // profiling pool handle + +Fetch profiling pool entry: +hsa_status_t rocprofiler_pool_fetch( + rocprofiler_pool_t* pool, // profiling pool handle + rocprofiler_pool_entry_t* entry); // [out] empty profiling pool entry + +Release profiling pool entry: +hsa_status_t rocprofiler_pool_release( + rocprofiler_pool_entry_t* entry); // released profiling pool entry + +Iterate fetched profiling pool entries: +hsa_status_t rocprofiler_pool_iterate( + rocprofiler_pool_t* pool, // profiling pool handle + hsa_status_t (*callback)(rocprofiler_pool_entry_t* entry, void* data), + // callback + void *data); // [in/out] data passed to callback + +Flush completed entries in the profiling pool: +hsa_status_t rocprofiler_pool_flush( + rocprofiler_pool_t* pool); // profiling pool handle +``` +## 5. Application code examples +### 5.1.
Querying available metrics +``` +Info data callback: + + hsa_status_t info_data_callback(const rocprofiler_info_data_t info, void *data) { + switch (info.kind) { + case ROCPROFILER_INFO_KIND_METRIC: { + if (info.metric.expr != NULL) { + fprintf(stdout, "Derived counter: gpu-agent%u : %s : %s\n", + info.agent_index, info.metric.name, info.metric.description); + fprintf(stdout, " %s = %s\n", info.metric.name, info.metric.expr); + } else { + fprintf(stdout, "Basic counter: gpu-agent%u : %s", + info.agent_index, info.metric.name); + if (info.metric.instances > 1) { + fprintf(stdout, "[0-%u]", info.metric.instances - 1); + } + fprintf(stdout, " : %s\n", info.metric.description); + fprintf(stdout, " block %s has %u counters\n", + info.metric.block_name, info.metric.block_counters); + } + fflush(stdout); + break; + } + default: + printf("wrong info kind %u\n", info.kind); + return HSA_STATUS_ERROR; + } + return HSA_STATUS_SUCCESS; + } + +Printing all available metrics: + + hsa_status_t status = rocprofiler_iterate_info( + agent, + ROCPROFILER_INFO_KIND_METRIC, + info_data_callback, + NULL); + +``` +### 5.2. Profiling code example +``` +Profiling the L1 miss ratio and the average memory bandwidth. +In the example below, the rocprofiler_group_get_data group API is used for illustration, but in +SINGLEGROUP mode, when only one group is allowed, the context handle itself can be saved and +the direct context method rocprofiler_get_data with the default group index 0 +can be used instead.
+ +hsa_status_t dispatch_callback( + const rocprofiler_callback_data_t* callback_data, + void* user_data, + rocprofiler_group_t* group) +{ + hsa_status_t status = HSA_STATUS_SUCCESS; + // Profiling context + rocprofiler_t* context; + // Profiling info objects (value-initialized) + rocprofiler_feature_t* features = new rocprofiler_feature_t[2](); + // Tracing parameters + rocprofiler_feature_parameter_t* parameters = new rocprofiler_feature_parameter_t[2](); + + // Setting profiling features + features[0].kind = ROCPROFILER_FEATURE_KIND_METRIC; + features[0].name = "L1_MISS_RATIO"; + features[1].kind = ROCPROFILER_FEATURE_KIND_METRIC; + features[1].name = "DRAM_BANDWIDTH"; + + // Creating profiling context + status = rocprofiler_open(callback_data->agent, features, 2, &context, + ROCPROFILER_MODE_SINGLEGROUP, NULL); + + + // Get the profiling group + // For the general case with many groups, there is the rocprofiler_group_count() API + const uint32_t group_index = 0; + status = rocprofiler_get_group(context, group_index, group); + + + // In SINGLEGROUP mode the context handle itself can be saved, because there is just one group + + + return status; +} + +The profiling tool constructor adds the dispatch callback: + +void profiling_library_constructor() { + // Defining callback data, no data in this simple example + void* callback_data = NULL; + + // Adding observers + hsa_status_t status = rocprofiler_add_dispatch_callback(dispatch_callback, callback_data); + + + // Dispatching profiled kernel + +} + +void profiling_library_destructor() { + > { + // In SINGLEGROUP mode, the rocprofiler_get_data() method with the default zero group + // index can be used if the context handle was saved + status = rocprofiler_group_get_data(entry->group); + + status = rocprofiler_get_metrics(entry->group->context); + + status = rocprofiler_close(entry->group->context); + + + dispatch_data, entry->features, entry->features_count)>; + } +} +``` +### 5.3.
Option to Use Completion Callback +``` +Creating a profiling context with a completion callback: + . . . + rocprofiler_properties_t properties = {}; + properties.callback = completion_callback; + properties.callback_arg = NULL; // no args defined + status = rocprofiler_open(agent, features, 3, &context, + ROCPROFILER_MODE_SINGLEGROUP, &properties); + + . . . + +Definition of the completion callback: + +void completion_callback(rocprofiler_group_t group, void* arg) { + + hsa_status_t status = rocprofiler_close(group.context); + +} +``` +### 5.4. Option to Use Context Pool +``` +Code example of context pool usage. +Creating a profiling context pool: + . . . + rocprofiler_pool_t* pool = NULL; + rocprofiler_pool_properties_t properties{}; + properties.num_entries = 100; + properties.payload_bytes = sizeof(context_entry_t); + properties.handler = context_handler; + properties.handler_arg = handler_arg; + status = rocprofiler_pool_open(agent, features, 3, &pool, + ROCPROFILER_MODE_SINGLEGROUP, &properties); + + . . . + +Fetching a context entry: + rocprofiler_pool_entry_t pool_entry{}; + status = rocprofiler_pool_fetch(pool, &pool_entry); + + // Profiling context entry + rocprofiler_t* context = pool_entry.context; + context_entry_t* entry = reinterpret_cast<context_entry_t*> + (pool_entry.payload); +``` +### 5.5. Standalone Sampling Usage Code Example +``` +The profiling metrics are read from a separate standalone queue rather than from the queues that the application kernels are submitted to. +To enable sampling mode, profiling must be enabled in all user queues. This can be done by loading the ROC-profiler +library into the HSA runtime using the environment variable HSA_TOOLS_LIB for all shell sessions.
+ // Sampling rate: interval between samples in seconds, as passed to sleep() + uint32_t sampling_rate = ; + // Sampling count + uint32_t sampling_count = ; + // HSA status + hsa_status_t status = HSA_STATUS_ERROR; + // HSA agent + hsa_agent_t agent; + // Profiling context + rocprofiler_t* context = NULL; + // Profiling properties + rocprofiler_properties_t properties; + + // Getting HSA agent + + + // Profiling feature objects (zero-initialized) + const unsigned feature_count = 2; + rocprofiler_feature_t feature[feature_count] = {}; + + // Counters and metrics + feature[0].kind = ROCPROFILER_FEATURE_KIND_METRIC; + feature[0].name = "GPUBusy"; + feature[1].kind = ROCPROFILER_FEATURE_KIND_METRIC; + feature[1].name = "SQ_WAVES"; + + // Creating profiling context with standalone queue + properties = {}; + properties.queue_depth = 128; + status = rocprofiler_open(agent, feature, feature_count, &context, + ROCPROFILER_MODE_STANDALONE | ROCPROFILER_MODE_CREATEQUEUE | + ROCPROFILER_MODE_SINGLEGROUP, &properties); + + + // Start counters and sample them in the loop with the sampling rate + status = rocprofiler_start(context, 0); + + + for (unsigned ind = 0; ind < sampling_count; ++ind) { + sleep(sampling_rate); + status = rocprofiler_read(context, 0); + + status = rocprofiler_get_data(context, 0); + + status = rocprofiler_get_metrics(context); + + print_results(feature, feature_count); + } + + // Stop counters (group index 0, as in rocprofiler_start above) + status = rocprofiler_stop(context, 0); + + + // Finishing cleanup + // Closing the profiling context releases all allocated resources + status = rocprofiler_close(context); + +``` +### 5.6.
Printing Out Profiling Results +``` +Below is a code example for printing out the profiling results from the profiling features array: +void print_results(rocprofiler_feature_t* feature, uint32_t feature_count) { + for (rocprofiler_feature_t* p = feature; p < feature + feature_count; ++p) + { + std::cout << (p - feature) << ": " << p->name; + switch (p->data.kind) { + case ROCPROFILER_DATA_KIND_INT64: + std::cout << " result_int64 (" << p->data.result_int64 << ")" + << std::endl; + break; + + case ROCPROFILER_DATA_KIND_BYTES: { + std::cout << " result_bytes ptr(" << p->data.result_bytes.ptr << + ") " << " size(" << p->data.result_bytes.size << ")" + << " instance_count(" << p->data.result_bytes.instance_count + << ")" << std::endl; + break; + } + default: + std::cout << "bad result kind (" << p->data.kind << ")" + << std::endl; + break; + } + } +} +``` diff --git a/doc/ROC_profiler_API.pptx b/doc/ROC_profiler_V1_API.pptx similarity index 100% rename from doc/ROC_profiler_API.pptx rename to doc/ROC_profiler_V1_API.pptx diff --git a/doc/Rocprofiler_V1_Usage_Documentation.pdf b/doc/Rocprofiler_V1_Usage_Documentation.pdf new file mode 100644 index 00000000..04eda3a6 Binary files /dev/null and b/doc/Rocprofiler_V1_Usage_Documentation.pdf differ diff --git a/doc/data/HIP Trace.svg b/doc/data/HIP Trace.svg new file mode 100644 index 00000000..80c676cd --- /dev/null +++ b/doc/data/HIP Trace.svg @@ -0,0 +1,14 @@ + + + + + + + + + + + + + + diff --git a/doc/data/HIP_API_Execution.svg b/doc/data/HIP_API_Execution.svg new file mode 100644 index 00000000..bbe1f239 --- /dev/null +++ b/doc/data/HIP_API_Execution.svg @@ -0,0 +1,14 @@ + + + + + + + + + + + + + + diff --git a/doc/data/HIP_Copy_Tasks.svg b/doc/data/HIP_Copy_Tasks.svg new file mode 100644 index 00000000..86f81d1a --- /dev/null +++ b/doc/data/HIP_Copy_Tasks.svg @@ -0,0 +1,22 @@ + + + + + + + + + + + + + + + + + + + + + + diff --git a/doc/data/HIP_GPU_Tasks.svg b/doc/data/HIP_GPU_Tasks.svg new file mode 100644 index 00000000..812cbac6 --- 
/dev/null +++ b/doc/data/HIP_GPU_Tasks.svg @@ -0,0 +1,19 @@ + + + + + + + + + + + + + + + + + + + diff --git a/doc/data/HIP_trace_time_range.svg b/doc/data/HIP_trace_time_range.svg new file mode 100644 index 00000000..13ce16e4 --- /dev/null +++ b/doc/data/HIP_trace_time_range.svg @@ -0,0 +1,15 @@ + + + + + + + + + + + + + + + diff --git a/doc/data/HSA_Trace.svg b/doc/data/HSA_Trace.svg new file mode 100644 index 00000000..0dab673c --- /dev/null +++ b/doc/data/HSA_Trace.svg @@ -0,0 +1,22 @@ + + + + + + + + + + + + + + + + + + + + + + diff --git a/doc/data/Roctx_Trace.svg b/doc/data/Roctx_Trace.svg new file mode 100644 index 00000000..a00132f8 --- /dev/null +++ b/doc/data/Roctx_Trace.svg @@ -0,0 +1,17 @@ + + + + + + + + + + + + + + + + + diff --git a/doc/data/Sys_Trace.svg b/doc/data/Sys_Trace.svg new file mode 100644 index 00000000..f2678436 --- /dev/null +++ b/doc/data/Sys_Trace.svg @@ -0,0 +1,19 @@ + + + + + + + + + + + + + + + + + + + diff --git a/doc/rocprof.md b/doc/rocprof.md new file mode 100644 index 00000000..3b4c9f99 --- /dev/null +++ b/doc/rocprof.md @@ -0,0 +1,393 @@ +# rocprof +## 1. Overview +rocProf is a command-line tool implemented on top of the rocProfiler and rocTracer APIs. The source code for rocProf can be found here: +GitHub: https://github.com/ROCm-Developer-Tools/rocprofiler/blob/amd-master/bin/rocprof +This command-line tool is implemented as a script that sets up the environment for attaching the profiler and then runs the provided application command line. The tool uses two profiling plugins, loaded by the ROC runtime and based on rocProfiler and rocTracer, to collect metrics/counters, HW traces, and runtime API/activity traces. The tool consumes an input XML or text file with a counter list or trace parameters and produces profiling data and statistics in various formats, such as text, CSV, and JSON traces. Google Chrome tracing can be used to visualize the JSON traces with runtime API/activity timelines and per-kernel counter data.
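The JSON traces described above use the Chrome Trace Event Format. This document does not show the exact fields rocprof writes, so the following is only an illustration under stated assumptions. It is a hypothetical C helper, `format_chrome_event` (not part of rocprof), that renders one kernel interval as a "complete" event (`"ph":"X"`). It converts nanosecond timestamps into the microseconds that Chrome tracing expects for `ts` and `dur`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not rocprof's trace writer: formats one activity
 * interval as a Chrome Trace Event Format "complete" event (ph = "X").
 * Chrome tracing expects "ts" and "dur" in microseconds, so the nanosecond
 * inputs are divided by 1000. To keep the sketch short, names containing
 * '"' or '\\' are rejected (returns -1) instead of being JSON-escaped.
 * Otherwise returns the snprintf result. */
int format_chrome_event(char *buf, size_t len, const char *name, int pid,
                        int tid, uint64_t begin_ns, uint64_t end_ns) {
  assert(end_ns >= begin_ns);
  if (strpbrk(name, "\"\\") != NULL) return -1;
  double ts_us = (double)begin_ns / 1000.0;
  double dur_us = (double)(end_ns - begin_ns) / 1000.0;
  return snprintf(buf, len,
                  "{\"name\":\"%s\",\"ph\":\"X\",\"pid\":%d,\"tid\":%d,"
                  "\"ts\":%.3f,\"dur\":%.3f}",
                  name, pid, tid, ts_us, dur_us);
}
```

Events produced this way can be collected into a `{"traceEvents":[ ... ]}` object and opened in `chrome://tracing`. A real trace writer would escape names rather than reject them.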
+## 2. Profiling Modes +‘rocprof’ can be used for GPU profiling with HW counters and for application tracing. +### 2.1. GPU profiling +GPU profiling is controlled with an input file that defines a list of metrics/counters and a profiling scope. The input file is provided using the option ‘-i ’. An output CSV file with one line per submitted kernel is generated. Each line has the kernel name, kernel parameters, and counter values. With the option ‘--stats’, kernel execution stats can be generated in CSV format. Currently, profiling serializes the submitted kernels. +An example of an input file: +``` + # Perf counters group 1 + pmc : Wavefronts VALUInsts SALUInsts SFetchInsts + # Perf counters group 2 + pmc : TCC_HIT[0], TCC_MISS[0] + # Filter by dispatches range, GPU index and kernel names + # supported range formats: "3:9", "3:", "3" + range: 1 : 4 + gpu: 0 1 2 3 + kernel: simple Pass1 simpleConvolutionPass2 +``` +An example of a profiling command line for the ‘MatrixTranspose’ application: +``` +$ rocprof -i input.txt MatrixTranspose +RPL: on '191018_011134' from '/…./rocprofiler_pkg' in '/…./MatrixTranspose' +RPL: profiling '"./MatrixTranspose"' +RPL: input file 'input.txt' +RPL: output dir '/tmp/rpl_data_191018_011134_9695' +RPL: result dir '/tmp/rpl_data_191018_011134_9695/input0_results_191018_011134' +ROCProfiler: rc-file '/…./rpl_rc.xml' +ROCProfiler: input from "/tmp/rpl_data_191018_011134_9695/input0.xml" + gpu_index = + kernel = + range = + 4 metrics + L2CacheHit, VFetchInsts, VWriteInsts, MemUnitStalled + 0 traces +Device name Ellesmere [Radeon RX 470/480/570/570X/580/580X] +PASSED! + +ROCPRofiler: 1 contexts collected, output directory /tmp/rpl_data_191018_011134_9695/input0_results_191018_011134 +RPL: '/…./MatrixTranspose/input.csv' is generated +``` +#### 2.1.1. Counters and metrics +There are two profiling features: metrics and traces. Hardware performance counters are treated as basic metrics, and formulas can be defined for derived metrics.
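Derived-metric formulas such as `TCC_HIT_sum = sum(TCC_HIT,16)` (see the `--list-derived` output in 2.1.1.1) aggregate one basic counter over its block instances. The C sketch below shows the sum, average, max and min aggregations behind the `_sum`, `_avr`, `_max` and `_min` metrics. It also shows a hit-percentage formula of the kind used for cache metrics. It assumes the per-instance counter values have already been read. All function names are hypothetical and not part of the rocprofiler API, and the real expressions live in `metrics.xml`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: aggregate one basic counter over its block
 * instances, mirroring derived metrics such as TCC_HIT_sum (sum) and
 * TA_BUSY_avr / TA_BUSY_max / TA_BUSY_min. v[i] is the value read from
 * block instance i; n is the number of instances. */
uint64_t metric_sum(const uint64_t *v, size_t n) {
  uint64_t s = 0;
  for (size_t i = 0; i < n; ++i) s += v[i];
  return s;
}

double metric_avr(const uint64_t *v, size_t n) {
  assert(n > 0);
  return (double)metric_sum(v, n) / (double)n;
}

uint64_t metric_max(const uint64_t *v, size_t n) {
  assert(n > 0);
  uint64_t m = v[0];
  for (size_t i = 1; i < n; ++i)
    if (v[i] > m) m = v[i];
  return m;
}

uint64_t metric_min(const uint64_t *v, size_t n) {
  assert(n > 0);
  uint64_t m = v[0];
  for (size_t i = 1; i < n; ++i)
    if (v[i] < m) m = v[i];
  return m;
}

/* Hypothetical derived metric combining two summed counters, for example
 * 100 * TCC_HIT_sum / (TCC_HIT_sum + TCC_MISS_sum). Returns 0 when there
 * were no accesses, to avoid dividing by zero. */
double hit_percent(uint64_t hits, uint64_t misses) {
  uint64_t total = hits + misses;
  return total == 0 ? 0.0 : 100.0 * (double)hits / (double)total;
}
```

For instance, per-instance hits of 10, 20, 30 and 40 combined with misses summing to 25 give `hit_percent(100, 25) == 80.0`.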
+Counters and metrics can be dynamically configured using XML configuration files with counters and metrics tables: + - Counters table entry, basic metric: counter name, block name, event id + - Derived metrics table entry: metric name, an expression for calculating the metric from the counters + +Metrics XML File Example: +``` + + + + . . . + + + + . . . + + + + + +``` +##### 2.1.1.1. Metrics query +Available counters and metrics can be queried with the options ‘--list-basic’ for counters and ‘--list-derived’ for derived metrics. The output for counters indicates the number of block instances and the number of block counter registers. The output for derived metrics prints the metric expressions. +Examples: +``` +$ rocprof --list-basic +RPL: on '191018_014450' from '/opt/rocm/rocprofiler' in '/…./MatrixTranspose' +ROCProfiler: rc-file '/…./rpl_rc.xml' +Basic HW counters: + gpu-agent0 : GRBM_COUNT : Tie High - Count Number of Clocks + block GRBM has 2 counters + gpu-agent0 : GRBM_GUI_ACTIVE : The GUI is Active + block GRBM has 2 counters + . . . + gpu-agent0 : TCC_HIT[0-15] : Number of cache hits. + block TCC has 4 counters + gpu-agent0 : TCC_MISS[0-15] : Number of cache misses. UC reads count as misses. + block TCC has 4 counters + . . . + +$ rocprof --list-derived +RPL: on '191018_015911' from '/opt/rocm/rocprofiler' in '/home/evgeny/work/BUILD/0_MatrixTranspose' +ROCProfiler: rc-file '/home/evgeny/rpl_rc.xml' +Derived metrics: + gpu-agent0 : TCC_HIT_sum : Number of cache hits. Sum over TCC instances. + TCC_HIT_sum = sum(TCC_HIT,16) + gpu-agent0 : TCC_MISS_sum : Number of cache misses. Sum over TCC instances. + TCC_MISS_sum = sum(TCC_MISS,16) + gpu-agent0 : TCC_MC_RDREQ_sum : Number of 32-byte reads. Sum over TCC instaces. + TCC_MC_RDREQ_sum = sum(TCC_MC_RDREQ,16) + . . . +``` +##### 2.1.1.2. Metrics collecting +Counters and metrics accumulated per kernel can be collected using an input file with a list of metrics; see the example in 2.1.
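Each `pmc` line in a `.txt` input file selects one set of counters. According to the `rocprof -h` help text in section 5, the application is rerun once for every `pmc` line. The hypothetical helpers below are not rocprof's parser. They sketch that bookkeeping by counting the `pmc` lines (and therefore the expected number of application runs) and the counter names on one line. They treat `#` lines as comments and accept both space- and comma-separated names, as in the 2.1 example.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical: returns 1 if the line (of length len) starts with "pmc",
 * optionally surrounded by blanks, followed by ':'. Comment lines start
 * with '#', so they never match. */
static int is_pmc_line(const char *line, size_t len) {
  size_t i = 0;
  while (i < len && isspace((unsigned char)line[i])) ++i;
  if (len - i < 3 || strncmp(line + i, "pmc", 3) != 0) return 0;
  i += 3;
  while (i < len && isspace((unsigned char)line[i])) ++i;
  return i < len && line[i] == ':';
}

/* Counts pmc lines in the whole input text, i.e. the number of
 * application runs, assuming one run per pmc line. */
int count_pmc_runs(const char *text) {
  assert(text != NULL);
  int runs = 0;
  while (*text != '\0') {
    const char *nl = strchr(text, '\n');
    size_t len = nl ? (size_t)(nl - text) : strlen(text);
    if (is_pmc_line(text, len)) ++runs;
    if (nl == NULL) break;
    text = nl + 1;
  }
  return runs;
}

/* Counts counter names after the ':' of a single pmc line; names may be
 * separated by blanks and/or commas. */
int count_pmc_counters(const char *line) {
  const char *p = strchr(line, ':');
  int count = 0, in_name = 0;
  if (p == NULL) return 0;
  for (++p; *p != '\0' && *p != '\n'; ++p) {
    int sep = isspace((unsigned char)*p) || *p == ',';
    if (!sep && !in_name) ++count; /* start of a new name */
    in_name = !sep;
  }
  return count;
}
```

Applied to the input file shown in 2.1, `count_pmc_runs` reports two runs, with four counters on the first `pmc` line and two on the second.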
+Currently, profiling serializes the submitted kernels. +###### 2.1.1.2.1. Blocks instancing +GPU blocks are implemented as several identical instances. To dump the counters of a specific instance, square brackets can be used; see the example in 2.1. +The number of block instances can be queried; see 2.1.1.1. +###### 2.1.1.2.2. HW limitations +The number of counters that can be dumped in one run is limited by the GPU HW, namely by the number of counter registers per block. The number of counters can differ between blocks and can be queried; see 2.1.1.1. + - Metrics groups + +To dump a list of metrics that exceeds the HW limits, the metrics list can be split into groups. +The tool supports automatic splitting into optimal metric groups: +``` +$ rocprof -i input.txt ./MatrixTranspose +RPL: on '191018_032645' from '/opt/rocm/rocprofiler' in '/…./MatrixTranspose' +RPL: profiling './MatrixTranspose' +RPL: input file 'input.txt' +RPL: output dir '/tmp/rpl_data_191018_032645_12106' +RPL: result dir '/tmp/rpl_data_191018_032645_12106/input0_results_191018_032645' +ROCProfiler: rc-file '/…./rpl_rc.xml' +ROCProfiler: input from "/tmp/rpl_data_191018_032645_12106/input0.xml" + gpu_index = + kernel = + range = + 20 metrics + Wavefronts, VALUInsts, SALUInsts, SFetchInsts, FlatVMemInsts, LDSInsts, FlatLDSInsts, GDSInsts, VALUUtilization, FetchSize, WriteSize, L2CacheHit, VWriteInsts, GPUBusy, VALUBusy, SALUBusy, MemUnitStalled, WriteUnitStalled, LDSBankConflict, MemUnitBusy + 0 traces +Device name Ellesmere [Radeon RX 470/480/570/570X/580/580X] + +Input metrics out of HW limit.
Proposed metrics group set: + group1: L2CacheHit VWriteInsts MemUnitStalled WriteUnitStalled MemUnitBusy FetchSize FlatVMemInsts LDSInsts VALUInsts SALUInsts SFetchInsts FlatLDSInsts GPUBusy Wavefronts + group2: WriteSize GDSInsts VALUUtilization VALUBusy SALUBusy LDSBankConflict + +ERROR: rocprofiler_open(), Construct(), Metrics list exceeds HW limits + +Aborted (core dumped) +Error found, profiling aborted. +``` + - Collecting with multiple runs + +To collect several metric groups, a full application replay is used by defining several ‘pmc:’ lines in the input file; see 2.1. + +### 2.2. Application tracing +Supported application tracing includes runtime API and GPU activity tracing. +Supported runtimes are ROCr (HSA API) and HIP. +Supported GPU activity: kernel execution, async memory copy, barrier packets. +The trace is generated in JSON format compatible with Chrome tracing. +The trace consists of several sections with timelines for the per-thread API trace and for GPU activity. The timeline events show the event name and parameters. +Supported options: ‘--hsa-trace’, ‘--hip-trace’, ‘--sys-trace’, where ‘sys trace’ is the combined HIP and HSA trace. +#### 2.2.1. HIP runtime trace +The trace is generated by the option ‘--hip-trace’ and includes HIP API timelines and GPU activity at the runtime level. +#### 2.2.2. ROCr runtime trace +The trace is generated by the option ‘--hsa-trace’ and includes ROCr API timelines and GPU activity at the AQL queue level. It can also provide counters per kernel. +#### 2.2.3. KFD driver trace +The trace is generated by the option ‘--kfd-trace’ and includes the KFD Thunk API timeline. +Memory allocation/migration tracing is planned. +#### 2.2.4. Code annotation +Application code annotation is supported. +A start/stop API is supported for programmatically controlling the profiling. +The ‘roctx’ library provides the annotation API. Annotations are visualized in the JSON trace as a separate "Markers and Ranges" timeline section. +##### 2.2.4.1.
Start/stop API +``` +// Tracing start API +void roctracer_start(); + +// Tracing stop API +void roctracer_stop(); +``` +##### 2.2.4.2. rocTX basic markers API +``` +// Creates a marker with the given ASCII message +void roctxMark(const char* message); + +// Starts a nested range associated with the given message and returns the 0-based level of that range. +// A negative value is returned on error. +int roctxRangePush(const char* message); + +// Marks the end of a nested range. +// Returns the 0-based level of the range. +// A negative value is returned on error. +int roctxRangePop(); +``` +### 2.3. Multiple GPU profiling +The profiler supports profiling multiple GPUs and provides the GPU id for counter and kernel data in the CSV output file. The GPU id is also shown for the respective GPU activity timeline in the JSON trace. +## 3. Profiling control +Profiling can be controlled by specifying a profiling scope, by filtering trace events, and by specifying time intervals of interest. +### 3.1. Profiling scope +The counter profiling scope can be specified by a GPU id list, a list of kernel name substrings, and a dispatch range. +Examples of supported range formats: "3:9", "3:", "3". An example input file is shown in 2.1. +### 3.2. Tracing control +Tracing can be filtered by event names using the profiler input file, and time intervals of interest can be enabled with a command line option. +#### 3.2.1. Filtering traced APIs +A list of traced API names can be specified in the profiler input file. +An example of an input file line for the ROCr runtime trace (HSA API): +``` +hsa: hsa_queue_create hsa_amd_memory_pool_allocate +``` +#### 3.2.2. Tracing time period +The trace can be dumped periodically with an initial delay, a dumping period length, and a rate: +``` +--trace-period +``` +### 3.3. Concurrent kernels +Concurrent kernel profiling is currently not supported; it is a planned feature. Kernels are serialized. +### 3.4. Multi-process profiling +Multi-process profiling is not currently supported. +### 3.5.
Error logging +Profiler errors are logged to the following global logs: +``` +/tmp/aql_profile_log.txt +/tmp/rocprofiler_log.txt +/tmp/roctracer_log.txt +``` +## 4. Third-party visualization tools +‘rocprof’ produces a JSON trace compatible with Chrome Tracing, the trace visualization tool built into Google Chrome. +### 4.1. Chrome tracing +A good overview can be found at: https://aras-p.info/blog/2017/01/23/Chrome-Tracing-as-Profiler-Frontend/ +## 5. Command line options +The command line options can be printed with the option ‘-h’: +``` +$ rocprof -h +RPL: on '191018_023018' from '/opt/rocm/rocprofiler' in '/…./MatrixTranspose' +ROCm Profiling Library (RPL) run script, a part of ROCprofiler library package. +Full path: /opt/rocm/rocprofiler/bin/rocprof +Metrics definition: /opt/rocm/rocprofiler/lib/metrics.xml + +Usage: + rocprof [-h] [--list-basic] [--list-derived] [-i ] [-o ] + +Options: + -h - this help + --verbose - verbose mode, dumping all base counters used in the input metrics + --list-basic - to print the list of basic HW counters + --list-derived - to print the list of derived metrics with formulas + --cmd-qts - quoting profiled cmd-line [on] + + -i <.txt|.xml file> - input file + Input file .txt format, automatically rerun application for every pmc line: + + # Perf counters group 1 + pmc : Wavefronts VALUInsts SALUInsts SFetchInsts FlatVMemInsts LDSInsts FlatLDSInsts GDSInsts VALUUtilization FetchSize + # Perf counters group 2 + pmc : WriteSize L2CacheHit + # Filter by dispatches range, GPU index and kernel names + # supported range formats: "3:9", "3:", "3" + range: 1 : 4 + gpu: 0 1 2 3 + kernel: simple Pass1 simpleConvolutionPass2 + + Input file .xml format, for single profiling run: + + # Metrics list definition, also the form ":" can be used + # All defined metrics can be found in the 'metrics.xml' + # There are basic metrics for raw HW counters and high-level metrics for derived counters + + + # Filter by dispatches range, GPU index and kernel names
+ + + -o - output CSV file [.csv] + -d - directory where profiler store profiling data including traces [/tmp] + The data directory is renoving autonatically if the directory is matching the temporary one, which is the default. + -t - to change the temporary directory [/tmp] + By changing the temporary directory you can prevent removing the profiling data from /tmp or enable removing from not '/tmp' directory. + + --basenames - to turn on/off truncating of the kernel full function names till the base ones [off] + --timestamp - to turn on/off the kernel disoatches timestamps, dispatch/begin/end/complete [off] + --ctx-wait - to wait for outstanding contexts on profiler exit [on] + --ctx-limit - maximum number of outstanding contexts [0 - unlimited] + --heartbeat - to print progress heartbeats [0 - disabled] + --obj-tracking - to turn on/off kernels code objects tracking [off] + + --stats - generating kernel execution stats, file .stats.csv + + --roctx-trace - to enable rocTX application code annotation trace, "Markers and Ranges" JSON trace section. + --sys-trace - to trace HIP/HSA APIs and GPU activity, generates stats and JSON trace chrome-tracing compatible + --hip-trace - to trace HIP, generates API execution stats and JSON file chrome-tracing compatible + --hsa-trace - to trace HSA, generates API execution stats and JSON file chrome-tracing compatible + --kfd-trace - to trace KFD, generates API execution stats and JSON file chrome-tracing compatible + Generated files: ._stats.txt .json + Traced API list can be set by input .txt or .xml files. + Input .txt: + hsa: hsa_queue_create hsa_amd_memory_pool_allocate + Input .xml: + + + + + + --trace-start - to enable tracing on start [on] + --trace-period - to enable trace with initial delay, with periodic sample length and rate + Supported time formats: + +Configuration file: + You can set your parameters defaults preferences in the configuration file 'rpl_rc.xml'. 
The search path sequence: .:/home/evgeny: + First the configuration file is looking in the current directory, then in your home, and then in the package directory. + Configurable options: 'basenames', 'timestamp', 'ctx-limit', 'heartbeat', 'obj-tracking'. + An example of 'rpl_rc.xml': + +``` +## 6. Publicly available counters and metrics +The following counters are publicly available for commercially available VEGA10/20 GPUs. + +Counters: +``` +• GRBM_COUNT : Tie High - Count Number of Clocks +• GRBM_GUI_ACTIVE : The GUI is Active +• SQ_WAVES : Count number of waves sent to SQs. (per-simd, emulated, global) +• SQ_INSTS_VALU : Number of VALU instructions issued. (per-simd, emulated) +• SQ_INSTS_VMEM_WR : Number of VMEM write instructions issued (including FLAT). (per-simd, emulated) +• SQ_INSTS_VMEM_RD : Number of VMEM read instructions issued (including FLAT). (per-simd, emulated) +• SQ_INSTS_SALU : Number of SALU instructions issued. (per-simd, emulated) +• SQ_INSTS_SMEM : Number of SMEM instructions issued. (per-simd, emulated) +• SQ_INSTS_FLAT : Number of FLAT instructions issued. (per-simd, emulated) +• SQ_INSTS_FLAT_LDS_ONLY : Number of FLAT instructions issued that read/wrote only from/to LDS (only works if EARLY_TA_DONE is enabled). (per-simd, emulated) +• SQ_INSTS_LDS : Number of LDS instructions issued (including FLAT). (per-simd, emulated) +• SQ_INSTS_GDS : Number of GDS instructions issued. (per-simd, emulated) +• SQ_WAIT_INST_LDS : Number of wave-cycles spent waiting for LDS instruction issue. In units of 4 cycles. (per-simd, nondeterministic) +• SQ_ACTIVE_INST_VALU : regspec 71? Number of cycles the SQ instruction arbiter is working on a VALU instruction. (per-simd, nondeterministic) +• SQ_INST_CYCLES_SALU : Number of cycles needed to execute non-memory read scalar operations. (per-simd, emulated) +• SQ_THREAD_CYCLES_VALU : Number of thread-cycles used to execute VALU operations (similar to INST_CYCLES_VALU but multiplied by # of active threads). 
(per-simd) +• SQ_LDS_BANK_CONFLICT : Number of cycles LDS is stalled by bank conflicts. (emulated) +• TA_TA_BUSY[0-15] : TA block is busy. Perf_Windowing not supported for this counter. +• TA_FLAT_READ_WAVEFRONTS[0-15] : Number of flat opcode reads processed by the TA. +• TA_FLAT_WRITE_WAVEFRONTS[0-15] : Number of flat opcode writes processed by the TA. +• TCC_HIT[0-15] : Number of cache hits. +• TCC_MISS[0-15] : Number of cache misses. UC reads count as misses. +• TCC_EA_WRREQ[0-15] : Number of transactions (either 32-byte or 64-byte) going over the TC_EA_wrreq interface. Atomics may travel over the same interface and are generally classified as write requests. This does not include probe commands. +• TCC_EA_WRREQ_64B[0-15] : Number of 64-byte transactions going (64-byte write or CMPSWAP) over the TC_EA_wrreq interface. +• TCC_EA_WRREQ_STALL[0-15] : Number of cycles a write request was stalled. +• TCC_EA_RDREQ[0-15] : Number of TCC/EA read requests (either 32-byte or 64-byte) +• TCC_EA_RDREQ_32B[0-15] : Number of 32-byte TCC/EA read requests +• TCP_TCP_TA_DATA_STALL_CYCLES[0-15] : TCP stalls TA data interface. Now Windowed. +``` + +The following derived metrics have been defined and the profiler metrics XML specification can be found at: https://github.com/ROCm-Developer-Tools/rocprofiler/blob/amd-master/test/tool/metrics.xml. + +Metrics: +``` +• TA_BUSY_avr : TA block is busy. Average over TA instances. +• TA_BUSY_max : TA block is busy. Max over TA instances. +• TA_BUSY_min : TA block is busy. Min over TA instances. +• TA_FLAT_READ_WAVEFRONTS_sum : Number of flat opcode reads processed by the TA. Sum over TA instances. +• TA_FLAT_WRITE_WAVEFRONTS_sum : Number of flat opcode writes processed by the TA. Sum over TA instances. +• TCC_HIT_sum : Number of cache hits. Sum over TCC instances. +• TCC_MISS_sum : Number of cache misses. Sum over TCC instances. +• TCC_EA_RDREQ_32B_sum : Number of 32-byte TCC/EA read requests. Sum over TCC instances. 
+• TCC_EA_RDREQ_sum : Number of TCC/EA read requests (either 32-byte or 64-byte). Sum over TCC instances. +• TCC_EA_WRREQ_sum : Number of transactions (either 32-byte or 64-byte) going over the TC_EA_wrreq interface. Sum over TCC instances. +• TCC_EA_WRREQ_64B_sum : Number of 64-byte transactions going (64-byte write or CMPSWAP) over the TC_EA_wrreq interface. Sum over TCC instances. +• TCC_WRREQ_STALL_max : Number of cycles a write request was stalled. Max over TCC instances. +• TCC_MC_WRREQ_sum : Number of 32-byte effective writes. Sum over TCC instances. +• FETCH_SIZE : The total kilobytes fetched from the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. +• WRITE_SIZE : The total kilobytes written to the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. +• GPUBusy : The percentage of time the GPU was busy. +• Wavefronts : Total wavefronts. +• VALUInsts : The average number of vector ALU instructions executed per work-item (affected by flow control). +• SALUInsts : The average number of scalar ALU instructions executed per work-item (affected by flow control). +• VFetchInsts : The average number of vector fetch instructions from the video memory executed per work-item (affected by flow control). Excludes FLAT instructions that fetch from video memory. +• SFetchInsts : The average number of scalar fetch instructions from the video memory executed per work-item (affected by flow control). +• VWriteInsts : The average number of vector write instructions to the video memory executed per work-item (affected by flow control). Excludes FLAT instructions that write to video memory. +• FlatVMemInsts : The average number of FLAT instructions that read from or write to the video memory executed per work item (affected by flow control). Includes FLAT instructions that read from or write to scratch.
+• LDSInsts : The average number of LDS read or LDS write instructions executed per work item (affected by flow control). Excludes FLAT instructions that read from or write to LDS. +• FlatLDSInsts : The average number of FLAT instructions that read or write to LDS executed per work item (affected by flow control). +• GDSInsts : The average number of GDS read or GDS write instructions executed per work item (affected by flow control). +• VALUUtilization : The percentage of active vector ALU threads in a wave. A lower number can mean either more thread divergence in a wave or that the work-group size is not a multiple of 64. Value range: 0% (bad), 100% (ideal - no thread divergence). +• VALUBusy : The percentage of GPUTime vector ALU instructions are processed. Value range: 0% (bad) to 100% (optimal). +• SALUBusy : The percentage of GPUTime scalar ALU instructions are processed. Value range: 0% (bad) to 100% (optimal). +• Mem32Bwrites : +• FetchSize : The total kilobytes fetched from the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. +• WriteSize : The total kilobytes written to the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. +• L2CacheHit : The percentage of fetch, write, atomic, and other instructions that hit the data in L2 cache. Value range: 0% (no hit) to 100% (optimal). +• MemUnitBusy : The percentage of GPUTime the memory unit is active. The result includes the stall time (MemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound). +• MemUnitStalled : The percentage of GPUTime the memory unit is stalled. Try reducing the number or size of fetches and writes if possible. Value range: 0% (optimal) to 100% (bad). +• WriteUnitStalled : The percentage of GPUTime the Write unit is stalled. Value range: 0% to 100% (bad). 
+• ALUStalledByLDS : The percentage of GPUTime ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible. Value range: 0% (optimal) to 100% (bad). +• LDSBankConflict : The percentage of GPUTime LDS is stalled by bank conflicts. Value range: 0% (optimal) to 100% (bad). +``` diff --git a/doc/rocprof_tool.md b/doc/rocprof_tool.md new file mode 100644 index 00000000..7746d1ff --- /dev/null +++ b/doc/rocprof_tool.md @@ -0,0 +1,393 @@ +# rocprof +## 1. Overview +rocProf is a command-line tool implemented on top of the rocProfiler and rocTracer APIs. The source code for rocProf can be found here: +GitHub: https://github.com/ROCm-Developer-Tools/rocprofiler/blob/amd-master/bin/rocprof +This command-line tool is implemented as a script that sets up the environment for attaching the profiler and then runs the provided application command line. The tool uses two profiling plugins, loaded by the ROC runtime and based on rocProfiler and rocTracer, for collecting metrics/counters, HW traces, and runtime API/activity traces. The tool consumes an input XML or text file with a counters list or trace parameters and provides output profiling data and statistics in various formats, such as text, CSV, and JSON traces. Google Chrome tracing can be used to visualize the JSON traces with runtime API/activity timelines and per-kernel counter data. +## 2. Profiling Modes +‘rocprof’ can be used for GPU profiling using HW counters and for application tracing. +### 2.1. GPU profiling +GPU profiling is controlled with an input file that defines a list of metrics/counters and a profiling scope. The input file is provided using the option ‘-i ’. An output CSV file with one line per submitted kernel is generated. Each line has the kernel name, kernel parameters, and counter values. With the option ‘--stats’, kernel execution stats can be generated in CSV format.
Currently, profiling has the limitation of serializing submitted kernels. +An example of an input file: +``` + # Perf counters group 1 + pmc : Wavefronts VALUInsts SALUInsts SFetchInsts + # Perf counters group 2 + pmc : TCC_HIT[0], TCC_MISS[0] + # Filter by dispatches range, GPU index and kernel names + # supported range formats: "3:9", "3:", "3" + range: 1 : 4 + gpu: 0 1 2 3 + kernel: simple Pass1 simpleConvolutionPass2 +``` +An example profiling command line for the ‘MatrixTranspose’ application: +``` +$ rocprof -i input.txt MatrixTranspose +RPL: on '191018_011134' from '/…./rocprofiler_pkg' in '/…./MatrixTranspose' +RPL: profiling '"./MatrixTranspose"' +RPL: input file 'input.txt' +RPL: output dir '/tmp/rpl_data_191018_011134_9695' +RPL: result dir '/tmp/rpl_data_191018_011134_9695/input0_results_191018_011134' +ROCProfiler: rc-file '/…./rpl_rc.xml' +ROCProfiler: input from "/tmp/rpl_data_191018_011134_9695/input0.xml" + gpu_index = + kernel = + range = + 4 metrics + L2CacheHit, VFetchInsts, VWriteInsts, MemUnitStalled + 0 traces +Device name Ellesmere [Radeon RX 470/480/570/570X/580/580X] +PASSED! + +ROCPRofiler: 1 contexts collected, output directory /tmp/rpl_data_191018_011134_9695/input0_results_191018_011134 +RPL: '/…./MatrixTranspose/input.csv' is generated +``` +#### 2.1.1. Counters and metrics +There are two profiling features: metrics and traces. Hardware performance counters are treated as the basic metrics, and formulas can be defined for derived metrics. +Counters and metrics can be dynamically configured using XML configuration files with counters and metrics tables: + - Counters table entry, basic metric: counter name, block name, event id + - Derived metrics table entry: metric name, an expression for calculating the metric from the counters + +Metrics XML file example: +``` + + + + . . . + + + + . . . + + + + + +``` +##### 2.1.1.1.
Metrics query +Available counters and metrics can be queried with the options ‘--list-basic’ for counters and ‘--list-derived’ for derived metrics. The output for counters indicates the number of block instances and the number of block counter registers. The output for derived metrics prints the metric expressions. +Examples: +``` +$ rocprof --list-basic +RPL: on '191018_014450' from '/opt/rocm/rocprofiler' in '/…./MatrixTranspose' +ROCProfiler: rc-file '/…./rpl_rc.xml' +Basic HW counters: + gpu-agent0 : GRBM_COUNT : Tie High - Count Number of Clocks + block GRBM has 2 counters + gpu-agent0 : GRBM_GUI_ACTIVE : The GUI is Active + block GRBM has 2 counters + . . . + gpu-agent0 : TCC_HIT[0-15] : Number of cache hits. + block TCC has 4 counters + gpu-agent0 : TCC_MISS[0-15] : Number of cache misses. UC reads count as misses. + block TCC has 4 counters + . . . + +$ rocprof --list-derived +RPL: on '191018_015911' from '/opt/rocm/rocprofiler' in '/home/evgeny/work/BUILD/0_MatrixTranspose' +ROCProfiler: rc-file '/home/evgeny/rpl_rc.xml' +Derived metrics: + gpu-agent0 : TCC_HIT_sum : Number of cache hits. Sum over TCC instances. + TCC_HIT_sum = sum(TCC_HIT,16) + gpu-agent0 : TCC_MISS_sum : Number of cache misses. Sum over TCC instances. + TCC_MISS_sum = sum(TCC_MISS,16) + gpu-agent0 : TCC_MC_RDREQ_sum : Number of 32-byte reads. Sum over TCC instaces. + TCC_MC_RDREQ_sum = sum(TCC_MC_RDREQ,16) + . . . +``` +##### 2.1.1.2. Metrics collecting +Counters and metrics accumulated per kernel can be collected using an input file with a list of metrics; see the example in 2.1. +Currently, profiling has the limitation of serializing submitted kernels. +The number of counters that can be dumped in one run is limited by the GPU HW to the number of counter registers per block. The number of counters can differ between blocks and can be queried; see 2.1.1.1. +###### 2.1.1.2.1. Blocks instancing +GPU blocks are implemented as several identical instances.
To dump counters of a specific instance, square brackets can be used; see the example in 2.1. +The number of block instances can be queried; see 2.1.1.1. +###### 2.1.1.2.2. HW limitations +The number of counters that can be dumped in one run is limited by the GPU HW to the number of counter registers per block. The number of counters can differ between blocks and can be queried; see 2.1.1.1. + - Metrics groups + +To dump a list of metrics exceeding HW limitations, the metrics list can be split into groups. +The tool automatically proposes an optimal split into metric groups: +``` +$ rocprof -i input.txt ./MatrixTranspose +RPL: on '191018_032645' from '/opt/rocm/rocprofiler' in '/…./MatrixTranspose' +RPL: profiling './MatrixTranspose' +RPL: input file 'input.txt' +RPL: output dir '/tmp/rpl_data_191018_032645_12106' +RPL: result dir '/tmp/rpl_data_191018_032645_12106/input0_results_191018_032645' +ROCProfiler: rc-file '/…./rpl_rc.xml' +ROCProfiler: input from "/tmp/rpl_data_191018_032645_12106/input0.xml" + gpu_index = + kernel = + range = + 20 metrics + Wavefronts, VALUInsts, SALUInsts, SFetchInsts, FlatVMemInsts, LDSInsts, FlatLDSInsts, GDSInsts, VALUUtilization, FetchSize, WriteSize, L2CacheHit, VWriteInsts, GPUBusy, VALUBusy, SALUBusy, MemUnitStalled, WriteUnitStalled, LDSBankConflict, MemUnitBusy + 0 traces +Device name Ellesmere [Radeon RX 470/480/570/570X/580/580X] + +Input metrics out of HW limit. Proposed metrics group set: + group1: L2CacheHit VWriteInsts MemUnitStalled WriteUnitStalled MemUnitBusy FetchSize FlatVMemInsts LDSInsts VALUInsts SALUInsts SFetchInsts FlatLDSInsts GPUBusy Wavefronts + group2: WriteSize GDSInsts VALUUtilization VALUBusy SALUBusy LDSBankConflict + +ERROR: rocprofiler_open(), Construct(), Metrics list exceeds HW limits + +Aborted (core dumped) +Error found, profiling aborted.
+``` + - Collecting with multiple runs + +To collect several metric groups, a full application replay is used by defining several ‘pmc:’ lines in the input file; see 2.1. + +### 2.2. Application tracing +Supported application tracing includes runtime API and GPU activity tracing. +Supported runtimes are ROCr (HSA API) and HIP. +Supported GPU activity: kernel execution, async memory copy, barrier packets. +The trace is generated in JSON format compatible with Chrome tracing. +The trace consists of several sections with timelines for the API trace per thread and for GPU activity. The timeline events show the event name and parameters. +Supported options: ‘--hsa-trace’, ‘--hip-trace’, ‘--sys-trace’, where ‘--sys-trace’ is for the combined HIP and HSA trace. +#### 2.2.1. HIP runtime trace +The trace is generated by the option ‘--hip-trace’ and includes HIP API timelines and GPU activity at the runtime level. +#### 2.2.2. ROCr runtime trace +The trace is generated by the option ‘--hsa-trace’ and includes ROCr API timelines and GPU activity at the AQL queue level. It can also provide counters per kernel. +#### 2.2.3. KFD driver trace +The trace is generated by the option ‘--kfd-trace’ and includes the KFD Thunk API timeline. +Memory allocation/migration tracing is planned. +#### 2.2.4. Code annotation +Application code annotation is supported. +A start/stop API is supported to programmatically control the profiling. +The ‘roctx’ library provides an annotation API. Annotations are visualized in the JSON trace as a separate "Markers and Ranges" timeline section. +##### 2.2.4.1. Start/stop API +``` +// Tracing start API +void roctracer_start(); + +// Tracing stop API +void roctracer_stop(); +``` +##### 2.2.4.2. rocTX basic markers API +``` +// A marker created by the given ASCII message +void roctxMark(const char* message); + +// Starts a nested range associated with the given message and returns its 0-based nesting level. +// A negative value is returned on error.
+int roctxRangePush(const char* message); + +// Marks the end of a nested range. +// Returns the 0-based level of the range. +// A negative value is returned on error. +int roctxRangePop(); +``` +### 2.3. Multiple GPUs profiling +The profiler supports profiling multiple GPUs and provides the GPU id for counter and kernel data in the CSV output file. The GPU id is also indicated for the respective GPU activity timeline in the JSON trace. +## 3. Profiling control +Profiling can be controlled by specifying a profiling scope, by filtering trace events, and by specifying time intervals of interest. +### 3.1. Profiling scope +The counter profiling scope can be specified by a GPU id list, a list of kernel name substrings, and a dispatch range. +Examples of supported range formats: "3:9", "3:", "3". An example input file is shown in 2.1. +### 3.2. Tracing control +Tracing can be filtered by event names using the profiler input file, and time intervals of interest can be enabled with a command line option. +#### 3.2.1. Filtering traced APIs +A list of traced API names can be specified in the profiler input file. +An example of an input file line for the ROCr runtime trace (HSA API): +``` +hsa: hsa_queue_create hsa_amd_memory_pool_allocate +``` +#### 3.2.2. Tracing time period +The trace can be dumped periodically with an initial delay, a dumping period length, and a rate: +``` +--trace-period +``` +### 3.3. Concurrent kernels +Concurrent kernel profiling is currently not supported; it is a planned feature. Kernels are serialized. +### 3.4. Multi-processes profiling +Multi-process profiling is not currently supported. +### 3.5. Errors logging +Profiler errors are logged to global logs: +``` +/tmp/aql_profile_log.txt +/tmp/rocprofiler_log.txt +/tmp/roctracer_log.txt +``` +## 4. 3rd party visualization tools +‘rocprof’ produces a JSON trace compatible with Chrome Tracing, which is an internal trace visualization tool in Google Chrome. +### 4.1.
Chrome tracing +A good review can be found at this link: https://aras-p.info/blog/2017/01/23/Chrome-Tracing-as-Profiler-Frontend/ +## 5. Command line options +The command line options can be printed with the option ‘-h’: +``` +$ rocprof -h +RPL: on '191018_023018' from '/opt/rocm/rocprofiler' in '/…./MatrixTranspose' +ROCm Profiling Library (RPL) run script, a part of ROCprofiler library package. +Full path: /opt/rocm/rocprofiler/bin/rocprof +Metrics definition: /opt/rocm/rocprofiler/lib/metrics.xml + +Usage: + rocprof [-h] [--list-basic] [--list-derived] [-i ] [-o ] + +Options: + -h - this help + --verbose - verbose mode, dumping all base counters used in the input metrics + --list-basic - to print the list of basic HW counters + --list-derived - to print the list of derived metrics with formulas + --cmd-qts - quoting profiled cmd-line [on] + + -i <.txt|.xml file> - input file + Input file .txt format, automatically rerun application for every pmc line: + + # Perf counters group 1 + pmc : Wavefronts VALUInsts SALUInsts SFetchInsts FlatVMemInsts LDSInsts FlatLDSInsts GDSInsts VALUUtilization FetchSize + # Perf counters group 2 + pmc : WriteSize L2CacheHit + # Filter by dispatches range, GPU index and kernel names + # supported range formats: "3:9", "3:", "3" + range: 1 : 4 + gpu: 0 1 2 3 + kernel: simple Pass1 simpleConvolutionPass2 + + Input file .xml format, for single profiling run: + + # Metrics list definition, also the form ":" can be used + # All defined metrics can be found in the 'metrics.xml' + # There are basic metrics for raw HW counters and high-level metrics for derived counters + + + # Filter by dispatches range, GPU index and kernel names + + + -o - output CSV file [.csv] + -d - directory where profiler store profiling data including traces [/tmp] + The data directory is renoving autonatically if the directory is matching the temporary one, which is the default.
+ -t - to change the temporary directory [/tmp] + By changing the temporary directory you can prevent removing the profiling data from /tmp or enable removing from not '/tmp' directory. + + --basenames - to turn on/off truncating of the kernel full function names till the base ones [off] + --timestamp - to turn on/off the kernel disoatches timestamps, dispatch/begin/end/complete [off] + --ctx-wait - to wait for outstanding contexts on profiler exit [on] + --ctx-limit - maximum number of outstanding contexts [0 - unlimited] + --heartbeat - to print progress heartbeats [0 - disabled] + --obj-tracking - to turn on/off kernels code objects tracking [off] + + --stats - generating kernel execution stats, file .stats.csv + + --roctx-trace - to enable rocTX application code annotation trace, "Markers and Ranges" JSON trace section. + --sys-trace - to trace HIP/HSA APIs and GPU activity, generates stats and JSON trace chrome-tracing compatible + --hip-trace - to trace HIP, generates API execution stats and JSON file chrome-tracing compatible + --hsa-trace - to trace HSA, generates API execution stats and JSON file chrome-tracing compatible + --kfd-trace - to trace KFD, generates API execution stats and JSON file chrome-tracing compatible + Generated files: ._stats.txt .json + Traced API list can be set by input .txt or .xml files. + Input .txt: + hsa: hsa_queue_create hsa_amd_memory_pool_allocate + Input .xml: + + + + + + --trace-start - to enable tracing on start [on] + --trace-period - to enable trace with initial delay, with periodic sample length and rate + Supported time formats: + +Configuration file: + You can set your parameters defaults preferences in the configuration file 'rpl_rc.xml'. The search path sequence: .:/home/evgeny: + First the configuration file is looking in the current directory, then in your home, and then in the package directory. + Configurable options: 'basenames', 'timestamp', 'ctx-limit', 'heartbeat', 'obj-tracking'. 
+ An example of 'rpl_rc.xml': + +``` +## 6. Publicly available counters and metrics +The following counters are publicly available for commercially available VEGA10/20 GPUs. + +Counters: +``` +• GRBM_COUNT : Tie High - Count Number of Clocks +• GRBM_GUI_ACTIVE : The GUI is Active +• SQ_WAVES : Count number of waves sent to SQs. (per-simd, emulated, global) +• SQ_INSTS_VALU : Number of VALU instructions issued. (per-simd, emulated) +• SQ_INSTS_VMEM_WR : Number of VMEM write instructions issued (including FLAT). (per-simd, emulated) +• SQ_INSTS_VMEM_RD : Number of VMEM read instructions issued (including FLAT). (per-simd, emulated) +• SQ_INSTS_SALU : Number of SALU instructions issued. (per-simd, emulated) +• SQ_INSTS_SMEM : Number of SMEM instructions issued. (per-simd, emulated) +• SQ_INSTS_FLAT : Number of FLAT instructions issued. (per-simd, emulated) +• SQ_INSTS_FLAT_LDS_ONLY : Number of FLAT instructions issued that read/wrote only from/to LDS (only works if EARLY_TA_DONE is enabled). (per-simd, emulated) +• SQ_INSTS_LDS : Number of LDS instructions issued (including FLAT). (per-simd, emulated) +• SQ_INSTS_GDS : Number of GDS instructions issued. (per-simd, emulated) +• SQ_WAIT_INST_LDS : Number of wave-cycles spent waiting for LDS instruction issue. In units of 4 cycles. (per-simd, nondeterministic) +• SQ_ACTIVE_INST_VALU : regspec 71? Number of cycles the SQ instruction arbiter is working on a VALU instruction. (per-simd, nondeterministic) +• SQ_INST_CYCLES_SALU : Number of cycles needed to execute non-memory read scalar operations. (per-simd, emulated) +• SQ_THREAD_CYCLES_VALU : Number of thread-cycles used to execute VALU operations (similar to INST_CYCLES_VALU but multiplied by # of active threads). (per-simd) +• SQ_LDS_BANK_CONFLICT : Number of cycles LDS is stalled by bank conflicts. (emulated) +• TA_TA_BUSY[0-15] : TA block is busy. Perf_Windowing not supported for this counter. 
+• TA_FLAT_READ_WAVEFRONTS[0-15] : Number of flat opcode reads processed by the TA. +• TA_FLAT_WRITE_WAVEFRONTS[0-15] : Number of flat opcode writes processed by the TA. +• TCC_HIT[0-15] : Number of cache hits. +• TCC_MISS[0-15] : Number of cache misses. UC reads count as misses. +• TCC_EA_WRREQ[0-15] : Number of transactions (either 32-byte or 64-byte) going over the TC_EA_wrreq interface. Atomics may travel over the same interface and are generally classified as write requests. This does not include probe commands. +• TCC_EA_WRREQ_64B[0-15] : Number of 64-byte transactions going (64-byte write or CMPSWAP) over the TC_EA_wrreq interface. +• TCC_EA_WRREQ_STALL[0-15] : Number of cycles a write request was stalled. +• TCC_EA_RDREQ[0-15] : Number of TCC/EA read requests (either 32-byte or 64-byte) +• TCC_EA_RDREQ_32B[0-15] : Number of 32-byte TCC/EA read requests +• TCP_TCP_TA_DATA_STALL_CYCLES[0-15] : TCP stalls TA data interface. Now Windowed. +``` + +The following derived metrics have been defined and the profiler metrics XML specification can be found at: https://github.com/ROCm-Developer-Tools/rocprofiler/blob/amd-master/test/tool/metrics.xml. + +Metrics: +``` +• TA_BUSY_avr : TA block is busy. Average over TA instances. +• TA_BUSY_max : TA block is busy. Max over TA instances. +• TA_BUSY_min : TA block is busy. Min over TA instances. +• TA_FLAT_READ_WAVEFRONTS_sum : Number of flat opcode reads processed by the TA. Sum over TA instances. +• TA_FLAT_WRITE_WAVEFRONTS_sum : Number of flat opcode writes processed by the TA. Sum over TA instances. +• TCC_HIT_sum : Number of cache hits. Sum over TCC instances. +• TCC_MISS_sum : Number of cache misses. Sum over TCC instances. +• TCC_EA_RDREQ_32B_sum : Number of 32-byte TCC/EA read requests. Sum over TCC instances. +• TCC_EA_RDREQ_sum : Number of TCC/EA read requests (either 32-byte or 64-byte). Sum over TCC instances. 
+• TCC_EA_WRREQ_sum : Number of transactions (either 32-byte or 64-byte) going over the TC_EA_wrreq interface. Sum over TCC instances. +• TCC_EA_WRREQ_64B_sum : Number of 64-byte transactions (64-byte write or CMPSWAP) going over the TC_EA_wrreq interface. Sum over TCC instances. +• TCC_WRREQ_STALL_max : Number of cycles a write request was stalled. Max over TCC instances. +• TCC_MC_WRREQ_sum : Number of 32-byte effective writes. Sum over TCC instances. +• FETCH_SIZE : The total kilobytes fetched from the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. +• WRITE_SIZE : The total kilobytes written to the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. +• GPUBusy : The percentage of time the GPU was busy. +• Wavefronts : Total wavefronts. +• VALUInsts : The average number of vector ALU instructions executed per work-item (affected by flow control). +• SALUInsts : The average number of scalar ALU instructions executed per work-item (affected by flow control). +• VFetchInsts : The average number of vector fetch instructions from the video memory executed per work-item (affected by flow control). Excludes FLAT instructions that fetch from video memory. +• SFetchInsts : The average number of scalar fetch instructions from the video memory executed per work-item (affected by flow control). +• VWriteInsts : The average number of vector write instructions to the video memory executed per work-item (affected by flow control). Excludes FLAT instructions that write to video memory. +• FlatVMemInsts : The average number of FLAT instructions that read from or write to the video memory executed per work item (affected by flow control). Includes FLAT instructions that read from or write to scratch. +• LDSInsts : The average number of LDS read or LDS write instructions executed per work item (affected by flow control).
Excludes FLAT instructions that read from or write to LDS. +• FlatLDSInsts : The average number of FLAT instructions that read or write to LDS executed per work item (affected by flow control). +• GDSInsts : The average number of GDS read or GDS write instructions executed per work item (affected by flow control). +• VALUUtilization : The percentage of active vector ALU threads in a wave. A lower number can mean either more thread divergence in a wave or that the work-group size is not a multiple of 64. Value range: 0% (bad), 100% (ideal - no thread divergence). +• VALUBusy : The percentage of GPUTime vector ALU instructions are processed. Value range: 0% (bad) to 100% (optimal). +• SALUBusy : The percentage of GPUTime scalar ALU instructions are processed. Value range: 0% (bad) to 100% (optimal). +• Mem32Bwrites : +• FetchSize : The total kilobytes fetched from the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. +• WriteSize : The total kilobytes written to the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. +• L2CacheHit : The percentage of fetch, write, atomic, and other instructions that hit the data in L2 cache. Value range: 0% (no hit) to 100% (optimal). +• MemUnitBusy : The percentage of GPUTime the memory unit is active. The result includes the stall time (MemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound). +• MemUnitStalled : The percentage of GPUTime the memory unit is stalled. Try reducing the number or size of fetches and writes if possible. Value range: 0% (optimal) to 100% (bad). +• WriteUnitStalled : The percentage of GPUTime the Write unit is stalled. Value range: 0% to 100% (bad). +• ALUStalledByLDS : The percentage of GPUTime ALU units are stalled by the LDS input queue being full or the output queue being not ready. 
If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible. Value range: 0% (optimal) to 100% (bad). +• LDSBankConflict : The percentage of GPUTime LDS is stalled by bank conflicts. Value range: 0% (optimal) to 100% (bad). +``` diff --git a/doc/rocprofiler_spec.md b/doc/rocprofiler_spec.md new file mode 100644 index 00000000..975d58ca --- /dev/null +++ b/doc/rocprofiler_spec.md @@ -0,0 +1,837 @@ +# ROC Profiler Library Specification +ROC Profiler API version 7 + +## 1. High level overview +``` +The goal of the implementation is to provide a HW-specific, low-level performance analysis +interface for profiling GPU compute applications. The profiling includes HW performance +counters with complex performance metrics and HW traces. The implementation distinguishes +two profiling features: metrics and traces. HW performance counters are treated as the basic +metrics, and formulas can be defined for derived complex metrics. +The library can be loaded by the HSA runtime as a tool plugin, and it can also be loaded by +higher-level, HW-independent performance analysis APIs like PAPI. +The library has a C API and is based on the AMD-specific AQLprofile HSA extension. + + 1. The library provides methods to query the list of supported HW features. + 2. The library provides profiling APIs to start, stop, and read metrics results and tracing + data. + 3. The library provides an intercepting API for collecting per-kernel profiling data for + the kernels + dispatched to HSA AQL queues. + 4. The library provides a mechanism to load a profiling tool library plugin via the env variable + ROCP_TOOL_LIB. + 5. The library is responsible for allocating the buffers for profiling and for notifying + about output data buffer overflow for traces. + 6. The library is implemented based on the AMD-specific AQLprofile HSA extension. + 7. The library implementation is abstracted from the specific GFXIP. + 8.
The library implementation is extensible: + - Easy adding of counters and metrics + - Counters enumeration + - Counters and metrics can be dynamically configured using XML configuration files with + counters and metrics tables: + o Counters table entry, basic metric: counter name, block name, event id + o Complex metrics table entry: metric name, an expression for calculating the metric + from the counters + +Metrics XML file example: + + + + . . . + + + + . . . + + + + + +``` +## 2. Environment +``` +* HSA_TOOLS_LIB - required to be set to the name of the rocprofiler library to be loaded by +the HSA runtime +* ROCP_METRICS - path to the metrics XML file +* ROCP_TOOL_LIB - path to the profiling tool library loaded by ROC Profiler +* ROCP_HSA_INTERCEPT - if set, interception of HSA dispatches is enabled +``` +## 3. General API +### 3.1. Description +``` +The library supports methods for getting the error number and error string of the last +failed library API call. +To check the conformance of the used library API header and the library binary, the version +macros and API methods can be used. + +Returning the error and error string methods: +- rocprofiler_error_string - method for returning the error string + +Library version: +- ROCPROFILER_VERSION_MAJOR - API major version macro +- ROCPROFILER_VERSION_MINOR - API minor version macro +- rocprofiler_version_major - library major version +- rocprofiler_version_minor - library minor version +``` +### 3.2. Returning the error and error string methods +``` +const char* rocprofiler_error_string(); +``` +### 3.3. Library version +``` +The library provides backward compatibility if the library major version is less than or +equal to the API major version macro. + +API version macros defined in the library API header 'rocprofiler.h': + +ROCPROFILER_VERSION_MAJOR +ROCPROFILER_VERSION_MINOR + +Methods to check the library major and minor version: + +uint32_t rocprofiler_version_major(); +uint32_t rocprofiler_version_minor(); +``` +## 4.
Backend API +### 4.1. Description +``` +The library provides methods to open/close a profiling context, to start, stop and read +HW performance counters and traces, and to intercept kernel dispatches to collect per-kernel +profiling data. The library also provides methods to calculate complex performance metrics +and to query the list of available metrics. The library distinguishes two profiling features, +metrics and traces, where HW performance counters are treated as the basic metrics. To report +errors, the library methods return standard HSA status codes. +For a given context, profiling can be started/stopped and counters sampled in standalone +mode, or profiling can be initiated by intercepting kernel dispatches by registering +a dispatch callback. +For counter sampling, which is the usage model of higher-level APIs like PAPI, +the start/stop/read APIs should be used. +For collecting per-kernel data for kernels submitted to HSA queues, the dispatch callback +API should be used. +The library provides backward compatibility if the library major version is less than or equal to the API major version macro.
+ +Returned API status: +- hsa_status_t - HSA status codes from the hsa.h header are used + +Loading and Configuring, loadable plugin on-load/unload methods: +- rocprofiler_settings_t - global properties +- OnLoadTool +- OnLoadToolProp +- OnUnloadTool + +Info API: +- rocprofiler_info_kind_t - profiling info kind +- rocprofiler_info_query_t - profiling info query +- rocprofiler_info_data_t - profiling info data +- rocprofiler_get_info - return the info for a given info kind +- rocprofiler_iterate_info - iterate over the info for a given info kind +- rocprofiler_query_info - iterate over the info for a given info query + +Context API: +- rocprofiler_t - profiling context handle +- rocprofiler_feature_kind_t - profiling feature kind +- rocprofiler_feature_parameter_t - profiling feature parameter +- rocprofiler_data_kind_t - profiling data kind +- rocprofiler_data_t - profiling data +- rocprofiler_feature_t - profiling feature +- rocprofiler_mode_t - profiling modes +- rocprofiler_properties_t - profiler properties +- rocprofiler_open - open a new profiling context +- rocprofiler_close - close a profiling context and release all allocated resources +- rocprofiler_group_count - return the profiling groups count +- rocprofiler_get_group - return the profiling group for a given index +- rocprofiler_get_metrics - method for calculating the metrics data +- rocprofiler_iterate_trace_data - method for iterating output trace data instances +- rocprofiler_time_id_t - supported time value ID enumeration +- rocprofiler_get_time - return the time for a given time ID and profiling timestamp value + +Sampling API: +- rocprofiler_start - start profiling +- rocprofiler_stop - stop profiling +- rocprofiler_read - read profiling data into the profiling feature objects +- rocprofiler_get_data - wait for profiling data + Group versions of start/stop/read/get_data methods: + o rocprofiler_group_start + o rocprofiler_group_stop + o rocprofiler_group_read + o rocprofiler_group_get_data + +Intercepting API: +-
rocprofiler_callback_t - profiling callback type +- rocprofiler_callback_data_t - profiling callback data type +- rocprofiler_dispatch_record_t - dispatch record +- rocprofiler_queue_callbacks_t - queue callbacks, dispatch/destroy +- rocprofiler_set_queue_callbacks - set queue kernel dispatch and queue destroy callbacks +- rocprofiler_remove_queue_callbacks - remove queue callbacks + +Context pool API: +- rocprofiler_pool_t - context pool handle +- rocprofiler_pool_entry_t - context pool entry +- rocprofiler_pool_properties_t - context pool properties +- rocprofiler_pool_handler_t - context pool completion handler +- rocprofiler_pool_open - open a context pool +- rocprofiler_pool_close - close a context pool +- rocprofiler_pool_fetch - fetch an empty context entry from the pool +- rocprofiler_pool_release - release a context entry +- rocprofiler_pool_iterate - iterate over fetched context entries +- rocprofiler_pool_flush - flush completed context entries +``` +### 4.2. Loading and Configuring +``` +Loading and Configuring +The profiling properties can be set by the profiler plugin when it is loaded by the ROC runtime. +The profiler plugin library is specified by the ROCP_TOOL_LIB environment variable. + +Global properties: + +typedef struct { + uint32_t intercept_mode; + uint64_t timeout; + uint32_t timestamp_on; +} rocprofiler_settings_t; + +On-load/unload methods defined in the profiling tool library loaded via ROCP_TOOL_LIB: +extern "C" void OnLoadTool(); +extern "C" void OnLoadToolProp(rocprofiler_settings_t* settings); +extern "C" void OnUnloadTool(); + +``` +### 4.3. Info API +``` +The profiling metrics are defined by name, and the traces are defined by name and parameters. +All supported features can be iterated using the 'iterate_info/query_info' methods. The counter +names are defined in the counters table configuration file; each counter has a unique name and +is defined by a block name and an event id.
The names of the traces and trace parameters are the same as in +the hardware documentation, and the parameter codes are rocprofiler_feature_parameter_t values; +see the "Context API" section below. + +Profiling info kind: + +typedef enum { + ROCPROFILER_INFO_KIND_METRIC = 0, // metric info + ROCPROFILER_INFO_KIND_METRIC_COUNT = 1, // metrics count + ROCPROFILER_INFO_KIND_TRACE = 2, // trace info + ROCPROFILER_INFO_KIND_TRACE_COUNT = 3, // traces count +} rocprofiler_info_kind_t; + +Profiling info data: + +typedef struct { + uint32_t agent_index; // GPU HSA agent index + rocprofiler_info_kind_t kind; // info data kind + union { + struct { + const char* name; // metric name + uint32_t instances; // number of instances + const char* expr; // metric expression, NULL for basic counters + const char* description; // metric description + const char* block_name; // block name + uint32_t block_counters; // number of block counters + } metric; + struct { + const char* name; // trace name + const char* description; // trace description + uint32_t parameter_count; // number of parameters + // supported by the trace + } trace; + }; +} rocprofiler_info_data_t; + +Return info for a given info kind: + +hsa_status_t rocprofiler_get_info( + const hsa_agent_t* agent, // [in] GPU handle, NULL for all + // GPU agents + rocprofiler_info_kind_t kind, // kind of requested info + void *data); // [out] returned info data + +Iterate over the info for a given info kind, and invoke an application-defined callback on +every iteration: + +hsa_status_t rocprofiler_iterate_info( + const hsa_agent_t* agent, // [in] GPU handle, NULL for all + // GPU agents + rocprofiler_info_kind_t kind, // kind of iterated info + hsa_status_t (*callback)(const rocprofiler_info_data_t info, void *data), // callback + void *data); // data passed to callback + +Iterate over the info for a given info query, and invoke an application-defined callback on +every iteration.
The query +fields set to NULL act as query wildcards: + +hsa_status_t rocprofiler_query_info( + const hsa_agent_t* agent, // [in] GPU handle, NULL for all + // GPU agents + rocprofiler_info_kind_t kind, // kind of iterated info + rocprofiler_info_data_t query, // info query + hsa_status_t (*callback)(const rocprofiler_info_data_t info, void *data), // callback + void *data); // data passed to callback +``` +### 4.4. Context API +``` +The profiling context accumulates all profiling information, including the profiling features +that carry profiling data and the buffers required for profiling command packets and output data. +The context is created and deleted by the library open/close methods. Deleting the context +releases all library resources associated with it. If more than one run is required to collect +all requested counter data, data for all profiling groups should be collected first; the metrics +can then be calculated by loading the saved groups' data into the profiling context. Saving and +loading the groups' data is the responsibility of the tool. The groups are identified +automatically when the profiling context is opened, and there is an API to access them; see the +"Profiling groups" section below.
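The following hypothetical model shows why opening one context can produce several groups. It assumes that each hardware block can collect at most `block_counters` events per run (the count reported by the Info API), and that a block asked for more events must be split across runs. It illustrates the idea only; it is not the grouping algorithm the library uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical model; not the library's grouping algorithm.
 * requested[b]      : basic counters the metrics need from block b
 * block_counters[b] : counters block b can collect in one run
 *                     (compare rocprofiler_info_data_t metric.block_counters)
 * A block that needs more counters than it has must be split over several
 * runs, so the run count is the maximum of ceil(requested / available). */
static uint32_t runs_needed(const uint32_t* requested,
                            const uint32_t* block_counters, size_t nblocks) {
  uint32_t runs = 1; /* one run (one group) even for a trivial request */
  for (size_t b = 0; b < nblocks; ++b) {
    assert(block_counters[b] > 0);
    /* integer ceiling division */
    uint32_t r = (requested[b] + block_counters[b] - 1) / block_counters[b];
    if (r > runs) runs = r;
  }
  return runs;
}
```

Take requested = {3, 4} and block_counters = {4, 2}. The second block needs two runs, so under this model the context ends up with two groups. In SINGLEGROUP mode, opening the context would fail as described above.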
+ +Profiling context handle: + +typedef void rocprofiler_t; // opaque handle + +Profiling feature kind: + +typedef enum { + ROCPROFILER_FEATURE_KIND_METRIC = 0, // metric + ROCPROFILER_FEATURE_KIND_TRACE = 1 // trace +} rocprofiler_feature_kind_t; + +Profiling feature parameter: + +typedef hsa_ven_amd_aqlprofile_parameter_t rocprofiler_feature_parameter_t; + +Profiling data kind: + +typedef enum { + ROCPROFILER_DATA_KIND_UNINIT = 0, // data uninitialized + ROCPROFILER_DATA_KIND_INT32 = 1, // 32bit integer + ROCPROFILER_DATA_KIND_INT64 = 2, // 64bit integer + ROCPROFILER_DATA_KIND_FLOAT = 3, // float single-precision result + ROCPROFILER_DATA_KIND_DOUBLE = 4, // float double-precision result + ROCPROFILER_DATA_KIND_BYTES = 5 // trace output as a bytes array +} rocprofiler_data_kind_t; + +Profiling data: + +typedef struct { + rocprofiler_data_kind_t kind; // result kind + union { + uint32_t result_int32; // 32bit integer result + uint64_t result_int64; // 64bit integer result + float result_float; // float single-precision result + double result_double; // float double-precision result + struct { + void* ptr; // pointer + uint32_t size; // byte size + uint32_t instance_count; // number of trace instances + } result_bytes; // data by ptr and byte size + }; +} rocprofiler_data_t; + +Profiling feature: + +typedef struct { + rocprofiler_feature_kind_t kind; // feature kind + const char* name; // feature name + const rocprofiler_feature_parameter_t* parameters; // feature parameters + uint32_t parameter_count; // feature parameter count + rocprofiler_data_t data; // profiling data +} rocprofiler_feature_t; + +Profiling mode masks: +Several modes can be specified for the profiling context. +STANDALONE mode can be used for counter sampling in a context other than the application's, +to support statistical system-wide profiling. In this mode the profiling context supports +its own queue, which can be created when the context is opened if the CREATEQUEUE mode is also specified.
+See also the "Profiler properties" section below for the standalone mode queue properties. +The profiler supports several profiling groups for collecting profiling data over several +runs; 'SINGLEGROUP' mode allows only one group, and opening the context fails if more +groups are needed. + +typedef enum { + ROCPROFILER_MODE_STANDALONE = 1, // standalone mode when ROC profiler + // supports its own AQL queue + ROCPROFILER_MODE_CREATEQUEUE = 2, // profiler creates queue in STANDALONE mode + ROCPROFILER_MODE_SINGLEGROUP = 4 // profiler allows one group only and fails + // if more groups are needed +} rocprofiler_mode_t; + +Context data readiness callback: + +typedef void (*rocprofiler_context_callback_t)( + rocprofiler_group_t* group, // profiling group + void* arg); // callback arg + +Profiler properties: +Several properties can be specified for the context. A callback can be +registered, which is called when the context data is ready. In standalone profiling mode +'ROCPROFILER_MODE_STANDALONE' the context supports its own queue: the queue can be set with +the 'queue' property or, if mode 'ROCPROFILER_MODE_CREATEQUEUE' is also specified, a queue is +created with the depth given by 'queue_depth'.
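The mode mask and the properties interact. CREATEQUEUE only applies in STANDALONE mode, and a standalone context needs either a caller-supplied queue or a `queue_depth` for the queue the profiler creates. The sketch below encodes those rules as a hypothetical pre-flight check a tool might run before `rocprofiler_open()`. The local `MODE_*` constants mirror the `rocprofiler_mode_t` values so the code compiles without ROCm headers. The rule that intercepting (non-standalone) mode needs no queue of its own is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the rocprofiler_mode_t values listed above. */
enum {
  MODE_STANDALONE = 1,  /* ROCPROFILER_MODE_STANDALONE */
  MODE_CREATEQUEUE = 2, /* ROCPROFILER_MODE_CREATEQUEUE */
  MODE_SINGLEGROUP = 4  /* ROCPROFILER_MODE_SINGLEGROUP */
};

/* Hypothetical pre-flight check, not a library function.
 * have_queue stands for properties.queue != NULL. */
static bool mode_settings_consistent(uint32_t mode, bool have_queue,
                                     uint32_t queue_depth) {
  const bool standalone = (mode & MODE_STANDALONE) != 0;
  const bool create_queue = (mode & MODE_CREATEQUEUE) != 0;
  if (create_queue && !standalone) return false; /* CREATEQUEUE applies to STANDALONE only */
  if (create_queue) return queue_depth > 0;      /* profiler creates a queue of this depth */
  if (standalone) return have_queue;             /* caller must supply the queue */
  return true;                                   /* assumed: intercepting mode needs no own queue */
}
```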
+ +typedef struct { + rocprofiler_context_callback_t callback; // callback on the context data readiness + void* callback_arg; // callback arg + hsa_queue_t* queue; // HSA queue for standalone mode + uint32_t queue_depth; // created queue depth, for create-queue mode +} rocprofiler_properties_t; + +Open/close profiling context: + +hsa_status_t rocprofiler_open( + hsa_agent_t agent, // GPU handle + rocprofiler_feature_t* features, // [in/out] profiling feature array + uint32_t feature_count, // profiling feature count + rocprofiler_t** context, // [out] profiling context handle + uint32_t mode, // profiling mode mask + rocprofiler_properties_t* properties); // profiler properties + +hsa_status_t rocprofiler_close( + rocprofiler_t* context); // [in] profiling context + +Profiling groups: +When the context is opened, the profiler automatically identifies the number of application +runs required to collect all data needed for the specified metrics and creates one metric group +per run. Data for all profiling groups should be collected, and then the metrics can be calculated +by loading the saved groups' data into the profiling context. Saving and loading the groups' +data is the responsibility of the tool. + +typedef struct { + uint32_t index; // profiling group index + rocprofiler_feature_t** features; // profiling features array + uint32_t feature_count; // profiling feature count + rocprofiler_t* context; // profiling context handle +} rocprofiler_group_t; + +Return profiling groups count: + +hsa_status_t rocprofiler_group_count( + rocprofiler_t* context, // [in/out] profiling context + uint32_t* count); // [out] profiling groups count + +Return the profiling group for a given index: + +hsa_status_t rocprofiler_get_group( + rocprofiler_t* context, // [in/out] profiling context, + // will be returned as + // a part of the group structure + uint32_t index, // [in] group index + rocprofiler_group_t* group); // [out] profiling group + +Calculate metrics data.
The data will be stored in the data fields of the registered profiling features. +After all profiling context data is ready, the registered metrics can be calculated. Context +data readiness can be checked with the 'get_data' API or with the context callback. + +hsa_status_t rocprofiler_get_metrics( + rocprofiler_t* context); // [in/out] profiling context + +Method for iterating trace data instances: +Trace data can have several instances, for example, one instance per Shader Engine. + +hsa_status_t rocprofiler_iterate_trace_data( + const rocprofiler_t* context, // [in] context object + hsa_ven_amd_aqlprofile_data_callback_t callback, // [in] callback to iterate + // the output data + void* callback_data); // [in/out] data passed to the callback + +Converting a profiling timestamp to a time value for a supported time ID. +Supported time value ID enumeration: +typedef enum { + ROCPROFILER_TIME_ID_CLOCK_REALTIME = 0, // Linux realtime clock time + ROCPROFILER_TIME_ID_CLOCK_REALTIME_COARSE = 1, // Linux realtime-coarse clock time + ROCPROFILER_TIME_ID_CLOCK_MONOTONIC = 2, // Linux monotonic clock time + ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_COARSE = 3, // Linux monotonic-coarse clock time + ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_RAW = 4, // Linux monotonic-raw clock time +} rocprofiler_time_id_t; + +Method for converting a profiling timestamp to a time value for a given time ID: +hsa_status_t rocprofiler_get_time( + rocprofiler_time_id_t time_id, // identifier of the particular + // time to convert the timestamp + uint64_t timestamp, // profiling timestamp + uint64_t* value_ns, // [out] returned time 'ns' value, ignored if NULL + uint64_t* error_ns); // [out] returned time error 'ns' value, ignored if NULL +``` +### 4.5. Sampling API +``` +The API supports the counter sampling usage model with the start/read/stop methods, and also lets +the tool wait for the profiling data in the intercepting usage model with the get_data method.
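The sampling model implies a call order: start once, then repeated read and get_data pairs, then stop. The standalone sampling example in section 5.5 follows this order. The hypothetical tracker below only checks that a tool's own sequence of calls follows that order; it does not call the library. Requiring get_data after each read is an assumption drawn from that example.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SAMPLER_IDLE, SAMPLER_RUNNING, SAMPLER_READ_PENDING } sampler_state_t;
typedef enum { OP_START, OP_READ, OP_GET_DATA, OP_STOP } sampler_op_t;

/* Applies `op` if it is allowed in the current state; returns false and
 * leaves the state unchanged otherwise. */
static bool sampler_step(sampler_state_t* state, sampler_op_t op) {
  switch (op) {
    case OP_START:    /* rocprofiler_start() */
      if (*state != SAMPLER_IDLE) return false;
      *state = SAMPLER_RUNNING;
      return true;
    case OP_READ:     /* rocprofiler_read() */
      if (*state != SAMPLER_RUNNING) return false;
      *state = SAMPLER_READ_PENDING;
      return true;
    case OP_GET_DATA: /* rocprofiler_get_data() after a read */
      if (*state != SAMPLER_READ_PENDING) return false;
      *state = SAMPLER_RUNNING;
      return true;
    case OP_STOP:     /* rocprofiler_stop() */
      if (*state == SAMPLER_IDLE) return false;
      *state = SAMPLER_IDLE;
      return true;
  }
  return false;
}
```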
+ +Start/stop/read methods: + +hsa_status_t rocprofiler_start( + rocprofiler_t* context, // [in/out] profiling context + uint32_t group_index = 0); // group index + +hsa_status_t rocprofiler_stop( + rocprofiler_t* context, // [in/out] profiling context + uint32_t group_index = 0); // group index + +hsa_status_t rocprofiler_read( + rocprofiler_t* context, // [in/out] profiling context + uint32_t group_index = 0); // group index + +Wait for profiling data: + +hsa_status_t rocprofiler_get_data( + rocprofiler_t* context, // [in/out] profiling context + uint32_t group_index = 0); // group index + +Group versions of the above start/stop/read/get_data methods: + +hsa_status_t rocprofiler_group_start( + rocprofiler_group_t* group); // [in/out] profiling group + +hsa_status_t rocprofiler_group_stop( + rocprofiler_group_t* group); // [in/out] profiling group + +hsa_status_t rocprofiler_group_read( + rocprofiler_group_t* group); // [in/out] profiling group + +hsa_status_t rocprofiler_group_get_data( + rocprofiler_group_t* group); // [in/out] profiling group +``` +### 4.6. Intercepting API +``` +The library provides a callback API for enabling profiling of the kernels dispatched to +HSA AQL queues. The API enables per-kernel profiling data collection. +Currently, the only implemented option serializes the execution of the kernels.
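A dispatch callback can decide, kernel by kernel, whether to profile. The sketch below assumes, without confirmation from this document, that a callback which leaves the returned group untouched lets the dispatch run unprofiled. It uses stand-in types (`demo_*`) in place of `hsa_status_t`, `rocprofiler_callback_data_t` and `rocprofiler_group_t` so it compiles without ROCm. The substring filter is purely illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for hsa_status_t, rocprofiler_callback_data_t and
 * rocprofiler_group_t; only what the sketch needs is modeled. */
typedef int demo_status_t;
#define DEMO_STATUS_SUCCESS 0
typedef struct { const char* kernel_name; } demo_callback_data_t;
typedef struct { unsigned index; void* context; } demo_group_t;

/* Hypothetical user data passed through the callback registration. */
typedef struct {
  const char* filter; /* substring a kernel name must contain */
  unsigned selected;  /* number of dispatches chosen for profiling */
} kernel_filter_t;

/* Same shape as rocprofiler_callback_t. Non-matching kernels leave *group
 * untouched (assumed here to mean "do not profile this dispatch"). */
static demo_status_t filtered_dispatch(const demo_callback_data_t* data,
                                       void* user_data, demo_group_t* group) {
  kernel_filter_t* filter = (kernel_filter_t*)user_data;
  if (data->kernel_name == NULL ||
      strstr(data->kernel_name, filter->filter) == NULL) {
    return DEMO_STATUS_SUCCESS;
  }
  filter->selected++;
  /* A real tool would open a context here and fill *group, e.g. with
   * rocprofiler_get_group(context, 0, group). */
  (void)group;
  return DEMO_STATUS_SUCCESS;
}
```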
+ +ROC profiler callback type: + +typedef hsa_status_t (*rocprofiler_callback_t)( + const rocprofiler_callback_data_t* callback_data, // callback data passed by HSA runtime + void* user_data, // [in/out] user data passed + // to the callback + rocprofiler_group_t* group); // [out] returned profiling group + +Dispatch record: + +typedef struct { + uint64_t dispatch; // dispatch timestamp, ns + uint64_t begin; // kernel begin timestamp, ns + uint64_t end; // kernel end timestamp, ns + uint64_t complete; // completion signal timestamp, ns +} rocprofiler_dispatch_record_t; + +Profiling callback data: + +typedef struct { + hsa_agent_t agent; // GPU agent handle + uint32_t agent_index; // GPU index + const hsa_queue_t* queue; // HSA queue + uint64_t queue_index; // Index in the queue + const hsa_kernel_dispatch_packet_t* packet; // HSA dispatch packet + const char* kernel_name; // Kernel name + const rocprofiler_dispatch_record_t* record; // Dispatch record +} rocprofiler_callback_data_t; + +Queue callbacks: + +typedef struct { + rocprofiler_callback_t dispatch; // kernel dispatch callback + hsa_status_t (*destroy)(hsa_queue_t* queue, void* data); // queue destroy callback +} rocprofiler_queue_callbacks_t; + +Adding/removing kernel dispatch and queue destroy callbacks: + +hsa_status_t rocprofiler_set_queue_callbacks( + rocprofiler_queue_callbacks_t callbacks, // queue callbacks + void* data); // [in/out] passed callbacks data + +hsa_status_t rocprofiler_remove_queue_callbacks(); +``` +### 4.7. Profiling Context Pools +``` +The API provides the capability to create a context pool for a given agent and a set of features, to fetch/release a context entry, and to register a callback for the completion of the pool's contexts.
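The pool idea is to allocate contexts and payload buffers once, then hand them out and take them back. It can be sketched with a stack of free indices. Everything below is hypothetical and only illustrates fetch/release semantics. It is not the rocprofiler pool implementation, and it does not model completion handlers, iteration, or flushing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pool entry: `context` would hold a rocprofiler_t* in a real
 * tool, `payload` points at payload_bytes of tool-owned storage. */
typedef struct {
  void* context;
  void* payload;
} pool_entry_t;

typedef struct {
  pool_entry_t* entries;
  uint32_t* free_stack;           /* indices of entries not currently fetched */
  uint32_t free_count;
  uint32_t num_entries;
  unsigned char* payload_storage; /* one payload_bytes slot per entry */
} pool_t;

/* Allocates everything up front; num_entries and payload_bytes must be
 * non-zero. Returns 0 on success, -1 on allocation failure. */
static int pool_open(pool_t* pool, uint32_t num_entries, uint32_t payload_bytes) {
  pool->entries = calloc(num_entries, sizeof(pool_entry_t));
  pool->free_stack = malloc(num_entries * sizeof(uint32_t));
  pool->payload_storage = calloc(num_entries, payload_bytes);
  if (pool->entries == NULL || pool->free_stack == NULL ||
      pool->payload_storage == NULL) {
    free(pool->entries);
    free(pool->free_stack);
    free(pool->payload_storage);
    return -1;
  }
  for (uint32_t i = 0; i < num_entries; ++i) {
    pool->entries[i].payload = pool->payload_storage + (size_t)i * payload_bytes;
    pool->free_stack[i] = num_entries - 1 - i; /* entry 0 is fetched first */
  }
  pool->free_count = num_entries;
  pool->num_entries = num_entries;
  return 0;
}

/* Returns a free entry, or NULL when every entry is in use. */
static pool_entry_t* pool_fetch(pool_t* pool) {
  if (pool->free_count == 0) return NULL;
  return &pool->entries[pool->free_stack[--pool->free_count]];
}

/* Gives a fetched entry back to the pool. */
static void pool_release(pool_t* pool, pool_entry_t* entry) {
  size_t index = (size_t)(entry - pool->entries);
  assert(index < pool->num_entries && pool->free_count < pool->num_entries);
  pool->free_stack[pool->free_count++] = (uint32_t)index;
}

static void pool_close(pool_t* pool) {
  free(pool->payload_storage);
  free(pool->free_stack);
  free(pool->entries);
}
```

Because all storage is allocated when the pool is opened, fetching and releasing never allocate. That is the main reason to keep a pool of contexts rather than opening one per dispatch.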
+ +Profiling pool handle: +typedef void rocprofiler_pool_t; // opaque handle +Profiling pool entry: +typedef struct { + rocprofiler_t* context; // context object + void* payload; // payload data object +} rocprofiler_pool_entry_t; + +Pool handler, called on profiling completion: +typedef bool (*rocprofiler_pool_handler_t)(const rocprofiler_pool_entry_t* entry, void* arg); + +Pool properties: +typedef struct { + uint32_t num_entries; // number of pool entries + uint32_t payload_bytes; // payload size in bytes + rocprofiler_pool_handler_t handler; // handler on context completion + void* handler_arg; // the handler arg +} rocprofiler_pool_properties_t; + +Open profiling pool: +hsa_status_t rocprofiler_pool_open( + hsa_agent_t agent, // GPU handle + rocprofiler_feature_t* features, // [in] profiling features array + uint32_t feature_count, // profiling features count + rocprofiler_pool_t** pool, // [out] pool object + uint32_t mode, // profiling mode mask + rocprofiler_pool_properties_t* properties); // pool properties + +Close profiling pool: +hsa_status_t rocprofiler_pool_close( + rocprofiler_pool_t* pool); // profiling pool handle + +Fetch profiling pool entry: +hsa_status_t rocprofiler_pool_fetch( + rocprofiler_pool_t* pool, // profiling pool handle + rocprofiler_pool_entry_t* entry); // [out] empty profiling pool entry + +Release profiling pool entry: +hsa_status_t rocprofiler_pool_release( + rocprofiler_pool_entry_t* entry); // released profiling pool entry + +Iterate fetched profiling pool entries: +hsa_status_t rocprofiler_pool_iterate( + rocprofiler_pool_t* pool, // profiling pool handle + hsa_status_t (*callback)(rocprofiler_pool_entry_t* entry, void* data), + // callback + void *data); // [in/out] data passed to callback + +Flush completed entries in profiling pool: +hsa_status_t rocprofiler_pool_flush( + rocprofiler_pool_t* pool); // profiling pool handle +``` +## 5. Application code examples +### 5.1.
Querying available metrics +``` +Info data callback: + + hsa_status_t info_data_callback(const rocprofiler_info_data_t info, void *data) { + switch (info.kind) { + case ROCPROFILER_INFO_KIND_METRIC: { + if (info.metric.expr != NULL) { + fprintf(stdout, "Derived counter: gpu-agent%u : %s : %s\n", + info.agent_index, info.metric.name, info.metric.description); + fprintf(stdout, " %s = %s\n", info.metric.name, info.metric.expr); + } else { + fprintf(stdout, "Basic counter: gpu-agent%u : %s", + info.agent_index, info.metric.name); + if (info.metric.instances > 1) { + fprintf(stdout, "[0-%u]", info.metric.instances - 1); + } + fprintf(stdout, " : %s\n", info.metric.description); + fprintf(stdout, " block %s has %u counters\n", + info.metric.block_name, info.metric.block_counters); + } + fflush(stdout); + break; + } + default: + printf("wrong info kind %u\n", info.kind); + return HSA_STATUS_ERROR; + } + return HSA_STATUS_SUCCESS; + } + +Printing all available metrics: + + hsa_status_t status = rocprofiler_iterate_info( + &agent, // or NULL for all GPU agents + ROCPROFILER_INFO_KIND_METRIC, + info_data_callback, + NULL); + +``` +### 5.2. Profiling code example +``` +Profiling of the L1 miss ratio and the average memory bandwidth. +In the example below, the rocprofiler_group_get_data group API is used for illustration. In +SINGLEGROUP mode, where only one group is allowed, the context handle itself can be saved +instead, and the direct context method rocprofiler_get_data with the default group index 0 +can be used.
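The destructor in the example below walks over data saved by the dispatch callback, but the storage itself is left out. A minimal sketch, assuming the tool keeps one record per profiled dispatch in a singly linked list. All names are hypothetical, and the `visit` callback stands in for the `rocprofiler_get_data()`, `rocprofiler_get_metrics()` and `rocprofiler_close()` calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record saved by the dispatch callback. In a real tool,
 * `context` would be the rocprofiler_t* opened for that dispatch; in
 * SINGLEGROUP mode it is enough, because group index 0 is implied. */
typedef struct saved_dispatch {
  void* context;
  const char* kernel_name;
  struct saved_dispatch* next;
} saved_dispatch_t;

typedef struct {
  saved_dispatch_t* head;
  saved_dispatch_t* tail;
} dispatch_list_t;

/* Appends a record; returns 0 on success, -1 if allocation fails.
 * Not thread-safe: a real tool would guard it with a mutex, since
 * dispatch callbacks may arrive from several threads. */
static int dispatch_list_push(dispatch_list_t* list, void* context,
                              const char* kernel_name) {
  saved_dispatch_t* rec = malloc(sizeof *rec);
  if (rec == NULL) return -1;
  rec->context = context;
  rec->kernel_name = kernel_name;
  rec->next = NULL;
  if (list->tail != NULL) list->tail->next = rec; else list->head = rec;
  list->tail = rec;
  return 0;
}

/* Visits records in dispatch order, frees them, and returns how many there
 * were; `visit` may be NULL. */
static size_t dispatch_list_drain(dispatch_list_t* list,
                                  void (*visit)(saved_dispatch_t*, void*),
                                  void* arg) {
  size_t count = 0;
  saved_dispatch_t* rec = list->head;
  while (rec != NULL) {
    saved_dispatch_t* next = rec->next;
    if (visit != NULL) visit(rec, arg);
    free(rec);
    rec = next;
    ++count;
  }
  list->head = list->tail = NULL;
  return count;
}
```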
+ +hsa_status_t dispatch_callback( + const rocprofiler_callback_data_t* callback_data, + void* user_data, + rocprofiler_group_t* group) +{ + hsa_status_t status = HSA_STATUS_SUCCESS; + // Profiling context + rocprofiler_t* context = NULL; + // Profiling feature objects, value-initialized so that parameters, + // parameter_count and data start zeroed + rocprofiler_feature_t* features = new rocprofiler_feature_t[2](); + + // Setting profiling features + features[0].kind = ROCPROFILER_FEATURE_KIND_METRIC; + features[0].name = "L1_MISS_RATIO"; + features[1].kind = ROCPROFILER_FEATURE_KIND_METRIC; + features[1].name = "DRAM_BANDWIDTH"; + + // Creating profiling context + status = rocprofiler_open(callback_data->agent, features, 2, &context, + ROCPROFILER_MODE_SINGLEGROUP, NULL); + + + // Get the profiling group + // For the general case with many groups there is the rocprofiler_group_count() API + const uint32_t group_index = 0; + status = rocprofiler_get_group(context, group_index, group); + + + // In SINGLEGROUP mode the context handle itself can be saved, because there is just one group + + + return status; +} + +The profiling tool constructor registers the dispatch callback: + +void profiling_library_constructor() { + // Defining callback data, no data in this simple example + void* callback_data = NULL; + + // Registering the queue callbacks + rocprofiler_queue_callbacks_t callbacks = {}; + callbacks.dispatch = dispatch_callback; + hsa_status_t status = rocprofiler_set_queue_callbacks(callbacks, callback_data); + + + // Dispatching profiled kernel + +} + +void profiling_library_destructor() { + > { + // In SINGLEGROUP mode the rocprofiler_get_data() method with the default zero group + // index can be used if the context handle was saved + status = rocprofiler_group_get_data(entry->group); + + status = rocprofiler_get_metrics(entry->group->context); + + status = rocprofiler_close(entry->group->context); + + + dispatch_data, entry->features, entry->features_count)>; + } +} +``` +### 5.3.
Option to use completion callback +``` +Creating profiling context with completion callback: + . . . + rocprofiler_properties_t properties = {}; + properties.callback = completion_callback; + properties.callback_arg = NULL; // no args defined + status = rocprofiler_open(agent, features, 3, &context, + ROCPROFILER_MODE_SINGLEGROUP, &properties); + + . . . + +Definition of completion callback (matches rocprofiler_context_callback_t): + +void completion_callback(rocprofiler_group_t* group, void* arg) { + + hsa_status_t status = rocprofiler_close(group->context); + +} +``` +### 5.4. Option to Use Context Pool +``` +Code example of context pool usage. +Creating a profiling context pool: + . . . + rocprofiler_pool_t* pool = NULL; + rocprofiler_pool_properties_t properties{}; + properties.num_entries = 100; + properties.payload_bytes = sizeof(context_entry_t); + properties.handler = context_handler; + properties.handler_arg = handler_arg; + status = rocprofiler_pool_open(agent, features, 3, &pool, + ROCPROFILER_MODE_SINGLEGROUP, &properties); + + . . . + +Fetching a context entry: + rocprofiler_pool_entry_t pool_entry{}; + status = rocprofiler_pool_fetch(pool, &pool_entry); + + // Profiling context entry + rocprofiler_t* context = pool_entry.context; + context_entry_t* entry = reinterpret_cast<context_entry_t*>( + pool_entry.payload); +``` +### 5.5. Standalone Sampling Usage Code Example +``` +The profiling metrics are read from a separate standalone queue, not from the queue the application kernels are submitted to. +To enable the sampling mode, the profiling mode must be enabled in all user queues. This can be done by loading the ROC profiler +library into the HSA runtime with the HSA_TOOLS_LIB environment variable, set for all shell sessions.
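Sampling only works when the profiler library has been loaded into the HSA runtime. A standalone sampling tool may therefore want to fail early when `HSA_TOOLS_LIB` is missing. Below is a small sketch of such a guard; the function name is hypothetical. It only checks that the variable is set and non-empty, not that it names a loadable library.

```c
#define _POSIX_C_SOURCE 200112L /* exposes POSIX setenv()/unsetenv(), handy when testing the guard */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical guard: returns false (and prints a hint) when HSA_TOOLS_LIB
 * is unset or empty, so sampling is not attempted without the profiler
 * library loaded into the HSA runtime. */
static bool hsa_tools_lib_configured(void) {
  const char* value = getenv("HSA_TOOLS_LIB");
  if (value == NULL || value[0] == '\0') {
    fprintf(stderr, "HSA_TOOLS_LIB is not set: load the ROC profiler library "
                    "into the HSA runtime before sampling\n");
    return false;
  }
  return true;
}
```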
+ // Sampling interval in seconds (passed to sleep()) + uint32_t sampling_rate = ; + // Sampling count + uint32_t sampling_count = ; + // HSA status + hsa_status_t status = HSA_STATUS_ERROR; + // HSA agent + hsa_agent_t agent; + // Profiling context + rocprofiler_t* context = NULL; + // Profiling properties + rocprofiler_properties_t properties; + + // Getting HSA agent + + + // Profiling feature objects, zero-initialized + const unsigned feature_count = 2; + rocprofiler_feature_t feature[feature_count] = {}; + + // Counters and metrics + feature[0].kind = ROCPROFILER_FEATURE_KIND_METRIC; + feature[0].name = "GPUBusy"; + feature[1].kind = ROCPROFILER_FEATURE_KIND_METRIC; + feature[1].name = "SQ_WAVES"; + + // Creating profiling context with standalone queue + properties = {}; + properties.queue_depth = 128; + status = rocprofiler_open(agent, feature, feature_count, &context, + ROCPROFILER_MODE_STANDALONE | ROCPROFILER_MODE_CREATEQUEUE | + ROCPROFILER_MODE_SINGLEGROUP, &properties); + + + // Start counters and sample them in the loop at the sampling interval + status = rocprofiler_start(context, 0); + + + for (unsigned ind = 0; ind < sampling_count; ++ind) { + sleep(sampling_rate); + status = rocprofiler_read(context, 0); + + status = rocprofiler_get_data(context, 0); + + status = rocprofiler_get_metrics(context); + + print_results(feature, feature_count); + } + + // Stop counters (group index 0, the only group in SINGLEGROUP mode) + status = rocprofiler_stop(context, 0); + + + // Finishing cleanup + // Closing the profiling context releases all allocated resources + status = rocprofiler_close(context); + +``` +### 5.6.
Printing Out Profiling Results +``` +Below is a code example that prints the profiling results from a profiling features array: +#include <iostream> + +void print_results(rocprofiler_feature_t* feature, uint32_t feature_count) { + for (rocprofiler_feature_t* p = feature; p < feature + feature_count; ++p) + { + std::cout << (p - feature) << ": " << p->name; + switch (p->data.kind) { + case ROCPROFILER_DATA_KIND_INT64: + std::cout << " result_int64 (" << p->data.result_int64 << ")" + << std::endl; + break; + + case ROCPROFILER_DATA_KIND_DOUBLE: + std::cout << " result_double (" << p->data.result_double << ")" + << std::endl; + break; + + case ROCPROFILER_DATA_KIND_BYTES: { + std::cout << " result_bytes ptr(" << p->data.result_bytes.ptr << + ") " << " size(" << p->data.result_bytes.size << ")" + << " instance_count(" << p->data.result_bytes.instance_count + << ")" << std::endl; + break; + } + default: + std::cout << " bad result kind (" << p->data.kind << ")" + << std::endl; + break; + } + } +} +``` diff --git a/include/rocprofiler/rocprofiler.h b/include/rocprofiler/rocprofiler.h new file mode 100644 index 00000000..7d9a02ef --- /dev/null +++ b/include/rocprofiler/rocprofiler.h @@ -0,0 +1,600 @@ +/****************************************************************************** +Copyright (c) 2018 Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. 
+ +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. +*******************************************************************************/ + +//////////////////////////////////////////////////////////////////////////////// +// +// ROC Profiler API +// +// The goal of the implementation is to provide a HW specific low-level +// performance analysis interface for profiling of GPU compute applications. +// The profiling includes HW performance counters (PMC) with complex +// performance metrics and traces. +// +// The library can be used by a tool library loaded by the HSA runtime or by +// higher-level HW-independent performance analysis APIs like PAPI. +// +// The library is written in C and will be based on the AQLprofile AMD-specific +// HSA extension. The library implementation requires HSA API intercepting and +// a profiling queue supporting a submit callback interface.
+// +// + +#ifndef INC_ROCPROFILER_H_ +#define INC_ROCPROFILER_H_ + +/* Placeholder for calling convention and import/export macros */ +#if !defined(ROCPROFILER_CALL) +#define ROCPROFILER_CALL +#endif /* !defined (ROCPROFILER_CALL) */ + +#if !defined(ROCPROFILER_EXPORT_DECORATOR) +#if defined(__GNUC__) +#define ROCPROFILER_EXPORT_DECORATOR __attribute__((visibility("default"))) +#elif defined(_MSC_VER) +#define ROCPROFILER_EXPORT_DECORATOR __declspec(dllexport) +#endif /* defined (_MSC_VER) */ +#endif /* !defined (ROCPROFILER_EXPORT_DECORATOR) */ + +#if !defined(ROCPROFILER_IMPORT_DECORATOR) +#if defined(__GNUC__) +#define ROCPROFILER_IMPORT_DECORATOR +#elif defined(_MSC_VER) +#define ROCPROFILER_IMPORT_DECORATOR __declspec(dllimport) +#endif /* defined (_MSC_VER) */ +#endif /* !defined (ROCPROFILER_IMPORT_DECORATOR) */ + +#define ROCPROFILER_EXPORT ROCPROFILER_EXPORT_DECORATOR ROCPROFILER_CALL +#define ROCPROFILER_IMPORT ROCPROFILER_IMPORT_DECORATOR ROCPROFILER_CALL + +#if !defined(ROCPROFILER) +#if defined(ROCPROFILER_EXPORTS) +#define ROCPROFILER_API ROCPROFILER_EXPORT +#else /* !defined (ROCPROFILER_EXPORTS) */ +#define ROCPROFILER_API ROCPROFILER_IMPORT +#endif /* !defined (ROCPROFILER_EXPORTS) */ +#endif /* !defined (ROCPROFILER) */ + +#include +#include + +#ifdef __cplusplus +extern "C" { +#endif /* __cplusplus */ + +#include +#include +#include +#include +#include + + +#define ROCPROFILER_VERSION_MAJOR 8 +#define ROCPROFILER_VERSION_MINOR 0 + +//////////////////////////////////////////////////////////////////////////////// +// Returning library version +uint32_t rocprofiler_version_major(); +uint32_t rocprofiler_version_minor(); + +//////////////////////////////////////////////////////////////////////////////// +// Global properties structure + +typedef struct { + uint32_t intercept_mode; + uint32_t code_obj_tracking; + uint32_t memcopy_tracking; + uint32_t trace_size; + uint32_t trace_local; + uint64_t timeout; + uint32_t timestamp_on; + uint32_t 
hsa_intercepting; + uint32_t k_concurrent; + uint32_t opt_mode; + uint32_t obj_dumping; +} rocprofiler_settings_t; + +//////////////////////////////////////////////////////////////////////////////// +// Returning the error string method + +hsa_status_t rocprofiler_error_string( + const char** str); // [out] returned API error string pointer + +//////////////////////////////////////////////////////////////////////////////// +// Profiling features and data +// +// Profiling feature objects have profiling feature info, type, parameters and data +// Also profiling data samples can be iterated using a callback + +// Profiling feature kind +typedef enum { + ROCPROFILER_FEATURE_KIND_METRIC = 0, + ROCPROFILER_FEATURE_KIND_TRACE = 1, + ROCPROFILER_FEATURE_KIND_SPM_MOD = 2, + ROCPROFILER_FEATURE_KIND_PCSMP_MOD = 4 +} rocprofiler_feature_kind_t; + +// Profiling feature parameter +typedef hsa_ven_amd_aqlprofile_parameter_t rocprofiler_parameter_t; + +// Profiling data kind +typedef enum { + ROCPROFILER_DATA_KIND_UNINIT = 0, + ROCPROFILER_DATA_KIND_INT32 = 1, + ROCPROFILER_DATA_KIND_INT64 = 2, + ROCPROFILER_DATA_KIND_FLOAT = 3, + ROCPROFILER_DATA_KIND_DOUBLE = 4, + ROCPROFILER_DATA_KIND_BYTES = 5 +} rocprofiler_data_kind_t; + +// Profiling data type +typedef struct { + rocprofiler_data_kind_t kind; // result kind + union { + uint32_t result_int32; // 32bit integer result + uint64_t result_int64; // 64bit integer result + float result_float; // float single-precision result + double result_double; // float double-precision result + struct { + void* ptr; + uint32_t size; + uint32_t instance_count; + bool copy; + } result_bytes; // data by ptr and byte size + }; +} rocprofiler_data_t; + +// Profiling feature type +typedef struct { + rocprofiler_feature_kind_t kind; // feature kind + union { + const char* name; // feature name + struct { + const char* block; // counter block name + uint32_t event; // counter event id + } counter; + }; + const rocprofiler_parameter_t*
parameters; // feature parameters array + uint32_t parameter_count; // feature parameters count + rocprofiler_data_t data; // profiling data +} rocprofiler_feature_t; + +// Profiling features set type +typedef void rocprofiler_feature_set_t; + +//////////////////////////////////////////////////////////////////////////////// +// Profiling context +// +// The profiling context object accumulates all profiling information + +// Profiling context object +typedef void rocprofiler_t; + +// Profiling group object +typedef struct { + unsigned index; // group index + rocprofiler_feature_t** features; // profiling info array + uint32_t feature_count; // profiling info count + rocprofiler_t* context; // context object +} rocprofiler_group_t; + +// Profiling mode mask +typedef enum { + ROCPROFILER_MODE_STANDALONE = 1, // standalone mode when ROC profiler supports a queue + ROCPROFILER_MODE_CREATEQUEUE = 2, // ROC profiler creates queue in standalone mode + ROCPROFILER_MODE_SINGLEGROUP = 4 // only one group is allowed, fails otherwise +} rocprofiler_mode_t; + +// Profiling handler, called on profiling completion +typedef bool (*rocprofiler_handler_t)(rocprofiler_group_t group, void* arg); + +// Profiling properties +typedef struct { + hsa_queue_t* queue; // queue for STANDALONE mode + // the queue is created and returned in CREATEQUEUE mode + uint32_t queue_depth; // created queue depth + rocprofiler_handler_t handler; // handler on completion + void* handler_arg; // the handler arg +} rocprofiler_properties_t; + +// Create new profiling context +hsa_status_t rocprofiler_open(hsa_agent_t agent, // GPU handle + rocprofiler_feature_t* features, // [in] profiling features array + uint32_t feature_count, // profiling info count + rocprofiler_t** context, // [out] context object + uint32_t mode, // profiling mode mask + rocprofiler_properties_t* properties); // profiling properties + +// Add feature to a features set +hsa_status_t rocprofiler_add_feature(const rocprofiler_feature_t*
feature, // [in] + rocprofiler_feature_set_t* features_set); // [in/out] profiling features set + +// Create new profiling context from a features set +hsa_status_t rocprofiler_features_set_open(hsa_agent_t agent, // GPU handle + rocprofiler_feature_set_t* features_set, // [in] profiling features set + rocprofiler_t** context, // [out] context object + uint32_t mode, // profiling mode mask + rocprofiler_properties_t* properties); // profiling properties + +// Delete profiling context +hsa_status_t rocprofiler_close(rocprofiler_t* context); // [in] profiling context + +// Context reset before reusing +hsa_status_t rocprofiler_reset(rocprofiler_t* context, // [in] profiling context + uint32_t group_index); // group index + +// Return context agent +hsa_status_t rocprofiler_get_agent(rocprofiler_t* context, // [in] profiling context + hsa_agent_t* agent); // [out] GPU handle + +// Supported time value ID +typedef enum { + ROCPROFILER_TIME_ID_CLOCK_REALTIME = 0, // Linux realtime clock time + ROCPROFILER_TIME_ID_CLOCK_REALTIME_COARSE = 1, // Linux realtime-coarse clock time + ROCPROFILER_TIME_ID_CLOCK_MONOTONIC = 2, // Linux monotonic clock time + ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_COARSE = 3, // Linux monotonic-coarse clock time + ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_RAW = 4, // Linux monotonic-raw clock time +} rocprofiler_time_id_t; + +// Return time value for a given time ID and profiling timestamp +hsa_status_t rocprofiler_get_time( + rocprofiler_time_id_t time_id, // identifier of the clock to convert the timestamp to + uint64_t timestamp, // profiling timestamp + uint64_t* value_ns, // [out] returned time 'ns' value, ignored if NULL + uint64_t* error_ns); // [out] returned time error 'ns' value, ignored if NULL + +//////////////////////////////////////////////////////////////////////////////// +// Queue callbacks +// +// Queue callbacks for initiating profiling per kernel dispatch and for waiting +// for the profiling data when the queue is destroyed.
+ +// Dispatch record +typedef struct { + uint64_t dispatch; // dispatch timestamp, ns + uint64_t begin; // kernel begin timestamp, ns + uint64_t end; // kernel end timestamp, ns + uint64_t complete; // completion signal timestamp, ns +} rocprofiler_dispatch_record_t; + +// Profiling callback data +typedef struct { + hsa_agent_t agent; // GPU agent handle + uint32_t agent_index; // GPU index (GPU Driver Node ID as reported in the sysfs topology) + const hsa_queue_t* queue; // HSA queue + uint64_t queue_index; // Index in the queue + uint32_t queue_id; // Queue id + hsa_signal_t completion_signal; // Completion signal + const hsa_kernel_dispatch_packet_t* packet; // HSA dispatch packet + const char* kernel_name; // Kernel name + uint64_t kernel_object; // Kernel object address + const amd_kernel_code_t* kernel_code; // Kernel code pointer + uint32_t thread_id; // Thread id + const rocprofiler_dispatch_record_t* record; // Dispatch record +} rocprofiler_callback_data_t; + +// Profiling callback type +typedef hsa_status_t (*rocprofiler_callback_t)( + const rocprofiler_callback_data_t* callback_data, // [in] callback data + void* user_data, // [in/out] user data passed to the callback + rocprofiler_group_t* group); // [out] returned profiling group + +// Queue callbacks +typedef struct { + rocprofiler_callback_t dispatch; // dispatch callback + hsa_status_t (*create)(hsa_queue_t* queue, void* data); // create callback + hsa_status_t (*destroy)(hsa_queue_t* queue, void* data); // destroy callback +} rocprofiler_queue_callbacks_t; + +// Set queue callbacks +hsa_status_t rocprofiler_set_queue_callbacks( + rocprofiler_queue_callbacks_t callbacks, // callbacks + void* data); // [in/out] passed callbacks data + +// Remove queue callbacks +hsa_status_t rocprofiler_remove_queue_callbacks(); + +// Start/stop queue callbacks +hsa_status_t rocprofiler_start_queue_callbacks(); +hsa_status_t rocprofiler_stop_queue_callbacks(); + 
+//////////////////////////////////////////////////////////////////////////////// +// Start/stop profiling +// +// Start/stop the context profiling invocation; it must be repeated as many times as +// 'context.invocations' to collect all profiling data + +// Start profiling +hsa_status_t rocprofiler_start(rocprofiler_t* context, // [in/out] profiling context + uint32_t group_index); // group index + +// Stop profiling +hsa_status_t rocprofiler_stop(rocprofiler_t* context, // [in/out] profiling context + uint32_t group_index); // group index + +// Read profiling +hsa_status_t rocprofiler_read(rocprofiler_t* context, // [in/out] profiling context + uint32_t group_index); // group index + +// Read profiling data +hsa_status_t rocprofiler_get_data(rocprofiler_t* context, // [in/out] profiling context + uint32_t group_index); // group index + +// Get profiling groups count +hsa_status_t rocprofiler_group_count(const rocprofiler_t* context, // [in] profiling context + uint32_t* group_count); // [out] profiling groups count + +// Get profiling group for a given index +hsa_status_t rocprofiler_get_group(rocprofiler_t* context, // [in] profiling context + uint32_t group_index, // profiling group index + rocprofiler_group_t* group); // [out] profiling group + +// Start profiling +hsa_status_t rocprofiler_group_start(rocprofiler_group_t* group); // [in/out] profiling group + +// Stop profiling +hsa_status_t rocprofiler_group_stop(rocprofiler_group_t* group); // [in/out] profiling group + +// Read profiling +hsa_status_t rocprofiler_group_read(rocprofiler_group_t* group); // [in/out] profiling group + +// Get profiling data +hsa_status_t rocprofiler_group_get_data(rocprofiler_group_t* group); // [in/out] profiling group + +// Get metrics data +hsa_status_t rocprofiler_get_metrics(const rocprofiler_t* context); // [in] profiling context + +// Definition of output data iterator callback +typedef hsa_ven_amd_aqlprofile_data_callback_t rocprofiler_trace_data_callback_t; + +// Method for
iterating the events output data +hsa_status_t rocprofiler_iterate_trace_data( + rocprofiler_t* context, // [in] profiling context + rocprofiler_trace_data_callback_t callback, // callback to iterate the output data + void* data); // [in/out] callback data + +//////////////////////////////////////////////////////////////////////////////// +// Profiling features and data +// +// Profiling feature objects have profiling feature info, type, parameters and data +// Also profiling data samples can be iterated using a callback + +// Profiling info kind +typedef enum { + ROCPROFILER_INFO_KIND_METRIC = 0, // metric info + ROCPROFILER_INFO_KIND_METRIC_COUNT = 1, // metric features count, int32 + ROCPROFILER_INFO_KIND_TRACE = 2, // trace info + ROCPROFILER_INFO_KIND_TRACE_COUNT = 3, // trace features count, int32 + ROCPROFILER_INFO_KIND_TRACE_PARAMETER = 4, // trace parameter info + ROCPROFILER_INFO_KIND_TRACE_PARAMETER_COUNT = 5 // trace parameter count, int32 +} rocprofiler_info_kind_t; + +// Profiling info query +typedef union { + rocprofiler_info_kind_t info_kind; // queried profiling info kind + struct { + const char* trace_name; // queried info trace name + } trace_parameter; +} rocprofiler_info_query_t; + +// Profiling info data +typedef struct { + uint32_t agent_index; // GPU HSA agent index (GPU Driver Node ID as reported in the sysfs topology) + rocprofiler_info_kind_t kind; // info data kind + union { + struct { + const char* name; // metric name + uint32_t instances; // number of instances + const char* expr; // metric expression, NULL for basic counters + const char* description; // metric description + const char* block_name; // block name + uint32_t block_counters; // number of block counters + } metric; + struct { + const char* name; // trace name + const char* description; // trace description + uint32_t parameter_count; // number of parameters supported by the trace + } trace; + struct { + uint32_t code; // parameter code + const char* trace_name; // trace name
+ const char* parameter_name; // parameter name + const char* description; // trace parameter description + } trace_parameter; + }; +} rocprofiler_info_data_t; + +// Return the info for a given info kind +hsa_status_t rocprofiler_get_info( + const hsa_agent_t* agent, // [in] GFXIP handle + rocprofiler_info_kind_t kind, // kind of queried info + void *data); // [in/out] returned data + +// Iterate over the info for a given info kind, and invoke an application-defined callback on every iteration +hsa_status_t rocprofiler_iterate_info( + const hsa_agent_t* agent, // [in] GFXIP handle + rocprofiler_info_kind_t kind, // kind of iterated info + hsa_status_t (*callback)(const rocprofiler_info_data_t info, void *data), // callback + void *data); // [in/out] data passed to callback + +// Iterate over the info for a given info query, and invoke an application-defined callback on every iteration +hsa_status_t rocprofiler_query_info( + const hsa_agent_t *agent, // [in] GFXIP handle + rocprofiler_info_query_t query, // iterated info query + hsa_status_t (*callback)(const rocprofiler_info_data_t info, void *data), // callback + void *data); // [in/out] data passed to callback + +// Create a profiled queue. All dispatches on this queue will be profiled +hsa_status_t rocprofiler_queue_create_profiled( + hsa_agent_t agent_handle, uint32_t size, hsa_queue_type32_t type, + void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data), + void* data, uint32_t private_segment_size, uint32_t group_segment_size, + hsa_queue_t** queue); + +//////////////////////////////////////////////////////////////////////////////// +// Profiling pool +// +// Support for a pool of profiling contexts +// The API provides the capability to create a contexts pool for a given agent and a set of features, +// to fetch/release a context entry, and to register a callback for context completion.
+ +// Profiling pool handle +typedef void rocprofiler_pool_t; + +// Profiling pool entry +typedef struct { + rocprofiler_t* context; // context object + void* payload; // payload data object +} rocprofiler_pool_entry_t; + +// Profiling handler, called on profiling completion +typedef bool (*rocprofiler_pool_handler_t)(const rocprofiler_pool_entry_t* entry, void* arg); + +// Profiling pool properties +typedef struct { + uint32_t num_entries; // pool size in entries + uint32_t payload_bytes; // payload size in bytes + rocprofiler_pool_handler_t handler; // handler on context completion + void* handler_arg; // the handler arg +} rocprofiler_pool_properties_t; + +// Open profiling pool +hsa_status_t rocprofiler_pool_open( + hsa_agent_t agent, // GPU handle + rocprofiler_feature_t* features, // [in] profiling features array + uint32_t feature_count, // profiling info count + rocprofiler_pool_t** pool, // [out] profiling pool handle + uint32_t mode, // profiling mode mask + rocprofiler_pool_properties_t*); // pool properties + +// Close profiling pool +hsa_status_t rocprofiler_pool_close( + rocprofiler_pool_t* pool); // profiling pool handle + +// Fetch profiling pool entry +hsa_status_t rocprofiler_pool_fetch( + rocprofiler_pool_t* pool, // profiling pool handle + rocprofiler_pool_entry_t* entry); // [out] empty profiling pool entry + +// Release profiling pool entry +hsa_status_t rocprofiler_pool_release( + rocprofiler_pool_entry_t* entry); // released profiling pool entry + +// Iterate fetched profiling pool entries +hsa_status_t rocprofiler_pool_iterate( + rocprofiler_pool_t* pool, // profiling pool handle + hsa_status_t (*callback)(rocprofiler_pool_entry_t* entry, void* data), // callback + void *data); // [in/out] data passed to callback + +// Flush completed entries in profiling pool +hsa_status_t rocprofiler_pool_flush( + rocprofiler_pool_t* pool); // profiling pool handle + +//////////////////////////////////////////////////////////////////////////////// +// HSA intercepting API
+ +// HSA callbacks ID enumeration +typedef enum { + ROCPROFILER_HSA_CB_ID_ALLOCATE = 0, // Memory allocate callback + ROCPROFILER_HSA_CB_ID_DEVICE = 1, // Device assign callback + ROCPROFILER_HSA_CB_ID_MEMCOPY = 2, // Memcopy callback + ROCPROFILER_HSA_CB_ID_SUBMIT = 3, // Packet submit callback + ROCPROFILER_HSA_CB_ID_KSYMBOL = 4, // Loading/unloading of kernel symbol + ROCPROFILER_HSA_CB_ID_CODEOBJ = 5 // Loading/unloading of code object +} rocprofiler_hsa_cb_id_t; + +// HSA callback data type +typedef struct { + union { + struct { + const void* ptr; // allocated area ptr + size_t size; // allocated area size, zero size means 'free' callback + hsa_amd_segment_t segment; // allocated area's memory segment type + hsa_amd_memory_pool_global_flag_t global_flag; // allocated area's memory global flag + int is_code; // equal to 1 if code is allocated + } allocate; + struct { + hsa_device_type_t type; // type of assigned device + uint32_t id; // id of assigned device + hsa_agent_t agent; // device HSA agent handle + const void* ptr; // ptr the device is assigned to + } device; + struct { + const void* dst; // memcopy dst ptr + const void* src; // memcopy src ptr + size_t size; // memcopy size in bytes + } memcopy; + struct { + const void* packet; // packet submitted to GPU + const char* kernel_name; // kernel name, not NULL if dispatch + hsa_queue_t* queue; // HSA queue the kernel was submitted to + uint32_t device_type; // type of device the packet is submitted to + uint32_t device_id; // id of device the packet is submitted to + } submit; + struct { + uint64_t object; // kernel symbol object + const char* name; // kernel symbol name + uint32_t name_length; // kernel symbol name length + int unload; // symbol executable destroy + } ksymbol; + struct { + uint32_t storage_type; // code object storage type + int storage_file; // origin file descriptor + uint64_t memory_base; // origin memory base + uint64_t memory_size; // origin memory size + uint64_t load_base; //
codeobj load base + uint64_t load_size; // codeobj load size + uint64_t load_delta; // codeobj load delta + uint32_t uri_length; // URI string length + char* uri; // URI string + int unload; // unload flag + } codeobj; + }; +} rocprofiler_hsa_callback_data_t; + +// HSA callback function type +typedef hsa_status_t (*rocprofiler_hsa_callback_fun_t)( + rocprofiler_hsa_cb_id_t id, // callback id + const rocprofiler_hsa_callback_data_t* data, // [in] callback data + void* arg); // [in/out] user-passed data + +// HSA callbacks structure +typedef struct { + rocprofiler_hsa_callback_fun_t allocate; // memory allocate callback + rocprofiler_hsa_callback_fun_t device; // agent assign callback + rocprofiler_hsa_callback_fun_t memcopy; // memory copy callback + rocprofiler_hsa_callback_fun_t submit; // packet submit callback + rocprofiler_hsa_callback_fun_t ksymbol; // kernel symbol callback + rocprofiler_hsa_callback_fun_t codeobj; // code object load/unload callback +} rocprofiler_hsa_callbacks_t; + +// Set callbacks. If a callback is NULL then it is disabled. +// If a callback returns a value that is not HSA_STATUS_SUCCESS, the callback +// will be unregistered. +hsa_status_t rocprofiler_set_hsa_callbacks( + const rocprofiler_hsa_callbacks_t callbacks, // HSA callback functions + void* arg); // callback user data + +#ifdef __cplusplus +} // extern "C" block +#endif // __cplusplus + +#endif // INC_ROCPROFILER_H_ diff --git a/inc/rocprofiler.h b/include/rocprofiler/v2/rocprofiler.h similarity index 76% rename from inc/rocprofiler.h rename to include/rocprofiler/v2/rocprofiler.h index 4784a489..19838583 100644 --- a/inc/rocprofiler.h +++ b/include/rocprofiler/v2/rocprofiler.h @@ -138,10 +138,10 @@ extern "C" { */ /** - * The function was introduced in version 1.5 of the interface and has the - * symbol version string of ``"ROCPROFILER_1.5"``. + * The function was introduced in version 9.0 of the interface and has the + * symbol version string of ``"ROCPROFILER_9.0"``.
*/ -#define ROCPROFILER_VERSION_2_0 +#define ROCPROFILER_VERSION_9_0 /** @} */ @@ -162,7 +162,7 @@ extern "C" { * The major version of the interface as a macro so it can be used by the * preprocessor. */ -#define ROCPROFILER_VERSION_MAJOR 2 +#define ROCPROFILER_VERSION_MAJOR 9 /** * The minor version of the interface as a macro so it can be used by the @@ -190,8 +190,6 @@ ROCPROFILER_API uint32_t rocprofiler_version_minor(); /** @} */ -#ifndef ROCPROFILER_V1 - // TODO(aelwazir): Fix them to use the new Error codes /** \defgroup status_codes_group Status Codes * @@ -352,7 +350,7 @@ typedef enum { * * @retval Return the error string. */ -ROCPROFILER_API const char* rocprofiler_error_str(rocprofiler_status_t status) ROCPROFILER_VERSION_2_0; +ROCPROFILER_API const char* rocprofiler_error_str(rocprofiler_status_t status) ROCPROFILER_VERSION_9_0; /** @} */ @@ -370,7 +368,7 @@ ROCPROFILER_API const char* rocprofiler_error_str(rocprofiler_status_t status) R * @retval ::ROCPROFILER_STATUS_ERROR_API_ALREADY_INITIALIZED If initialize * wasn't called or finalized called twice */ -ROCPROFILER_API rocprofiler_status_t rocprofiler_initialize() ROCPROFILER_VERSION_2_0; +ROCPROFILER_API rocprofiler_status_t rocprofiler_initialize() ROCPROFILER_VERSION_9_0; /** * Finalize the API Tools @@ -380,7 +378,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_initialize() ROCPROFILER_VERSIO * @retval ::ROCPROFILER_STATUS_ERROR_API_NOT_INITIALIZED If initialize wasn't * called or finalized called twice */ -ROCPROFILER_API rocprofiler_status_t rocprofiler_finalize() ROCPROFILER_VERSION_2_0; +ROCPROFILER_API rocprofiler_status_t rocprofiler_finalize() ROCPROFILER_VERSION_9_0; /** * \addtogroup sessions_handling_group @@ -428,7 +426,7 @@ typedef struct { * failed to get the timestamp using HSA Function. 
* */ -ROCPROFILER_API rocprofiler_status_t rocprofiler_get_timestamp(rocprofiler_timestamp_t* timestamp) ROCPROFILER_VERSION_2_0; +ROCPROFILER_API rocprofiler_status_t rocprofiler_get_timestamp(rocprofiler_timestamp_t* timestamp) ROCPROFILER_VERSION_9_0; /** * Timestamps (start & end), it will be used for kernel dispatch tracing as @@ -569,7 +567,7 @@ typedef enum { */ ROCPROFILER_API rocprofiler_status_t rocprofiler_query_agent_info_size(rocprofiler_agent_info_kind_t kind, rocprofiler_agent_id_t agent_id, - size_t* data_size) ROCPROFILER_VERSION_2_0; + size_t* data_size) ROCPROFILER_VERSION_9_0; /** * Query Agent Information Data using an allocated data pointer by the user, @@ -590,7 +588,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_query_agent_info_size(rocprofil */ ROCPROFILER_API rocprofiler_status_t rocprofiler_query_agent_info(rocprofiler_agent_info_kind_t kind, rocprofiler_agent_id_t descriptor, - const char** name) ROCPROFILER_VERSION_2_0; + const char** name) ROCPROFILER_VERSION_9_0; /** @} */ @@ -642,7 +640,7 @@ typedef enum { */ ROCPROFILER_API rocprofiler_status_t rocprofiler_query_queue_info_size(rocprofiler_queue_info_kind_t kind, rocprofiler_queue_id_t agent_id, - size_t* data_size) ROCPROFILER_VERSION_2_0; + size_t* data_size) ROCPROFILER_VERSION_9_0; /** * Query Queue Information Data using an allocated data pointer by the user, @@ -663,7 +661,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_query_queue_info_size(rocprofil */ ROCPROFILER_API rocprofiler_status_t rocprofiler_query_queue_info(rocprofiler_queue_info_kind_t kind, rocprofiler_queue_id_t descriptor, - const char** name) ROCPROFILER_VERSION_2_0; + const char** name) ROCPROFILER_VERSION_9_0; /** @} */ @@ -713,7 +711,7 @@ typedef enum { */ ROCPROFILER_API rocprofiler_status_t rocprofiler_query_kernel_info_size(rocprofiler_kernel_info_kind_t kind, rocprofiler_kernel_id_t kernel_id, - size_t* data_size) ROCPROFILER_VERSION_2_0; + size_t* data_size) ROCPROFILER_VERSION_9_0; /** * 
Query Kernel Information Data using an allocated data pointer by the user, @@ -734,7 +732,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_query_kernel_info_size(rocprofi */ ROCPROFILER_API rocprofiler_status_t rocprofiler_query_kernel_info(rocprofiler_kernel_info_kind_t kind, rocprofiler_kernel_id_t kernel_id, - const char** data) ROCPROFILER_VERSION_2_0; + const char** data) ROCPROFILER_VERSION_9_0; /** @} */ @@ -777,10 +775,10 @@ typedef struct { } rocprofiler_counter_info_t; typedef int (*rocprofiler_counters_info_callback_t)(rocprofiler_counter_info_t counter, - const char* gpu_name, uint32_t gpu_index) ROCPROFILER_VERSION_2_0; + const char* gpu_name, uint32_t gpu_index) ROCPROFILER_VERSION_9_0; ROCPROFILER_API rocprofiler_status_t -rocprofiler_iterate_counters(rocprofiler_counters_info_callback_t counters_info_callback) ROCPROFILER_VERSION_2_0; +rocprofiler_iterate_counters(rocprofiler_counters_info_callback_t counters_info_callback) ROCPROFILER_VERSION_9_0; /** * Counter ID to be used to query counter information using @@ -834,7 +832,7 @@ typedef enum { */ ROCPROFILER_API rocprofiler_status_t rocprofiler_query_counter_info_size( rocprofiler_session_id_t session_id, rocprofiler_counter_info_kind_t counter_info_type, - rocprofiler_counter_id_t counter_id, size_t* data_size) ROCPROFILER_VERSION_2_0; + rocprofiler_counter_id_t counter_id, size_t* data_size) ROCPROFILER_VERSION_9_0; /** * Query Counter Information Data using an allocated data pointer by the user, @@ -857,7 +855,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_query_counter_info_size( ROCPROFILER_API rocprofiler_status_t rocprofiler_query_counter_info(rocprofiler_session_id_t session_id, rocprofiler_counter_info_kind_t kind, rocprofiler_counter_id_t counter_id, - const char** data) ROCPROFILER_VERSION_2_0; + const char** data) ROCPROFILER_VERSION_9_0; typedef struct { /** @@ -1236,7 +1234,7 @@ typedef enum { ROCPROFILER_API rocprofiler_status_t 
rocprofiler_query_roctx_tracer_api_data_info_size( rocprofiler_session_id_t session_id, rocprofiler_tracer_roctx_api_data_info_t kind, rocprofiler_tracer_api_data_handle_t api_data_id, rocprofiler_tracer_operation_id_t operation_id, - size_t* data_size) ROCPROFILER_VERSION_2_0; + size_t* data_size) ROCPROFILER_VERSION_9_0; /** * Query API Data Information using an allocated data pointer by the user, @@ -1264,7 +1262,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_query_roctx_tracer_api_data_inf ROCPROFILER_API rocprofiler_status_t rocprofiler_query_roctx_tracer_api_data_info( rocprofiler_session_id_t session_id, rocprofiler_tracer_roctx_api_data_info_t kind, rocprofiler_tracer_api_data_handle_t api_data_id, rocprofiler_tracer_operation_id_t operation_id, - char** data) ROCPROFILER_VERSION_2_0; + char** data) ROCPROFILER_VERSION_9_0; /** @} */ @@ -1325,7 +1323,7 @@ typedef enum { ROCPROFILER_API rocprofiler_status_t rocprofiler_query_hsa_tracer_api_data_info_size( rocprofiler_session_id_t session_id, rocprofiler_tracer_hsa_api_data_info_t kind, rocprofiler_tracer_api_data_handle_t api_data_id, rocprofiler_tracer_operation_id_t operation_id, - size_t* data_size) ROCPROFILER_VERSION_2_0; + size_t* data_size) ROCPROFILER_VERSION_9_0; /** * Query API Data Information using an allocated data pointer by the user, @@ -1353,7 +1351,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_query_hsa_tracer_api_data_info_ ROCPROFILER_API rocprofiler_status_t rocprofiler_query_hsa_tracer_api_data_info( rocprofiler_session_id_t session_id, rocprofiler_tracer_hsa_api_data_info_t kind, rocprofiler_tracer_api_data_handle_t api_data_id, rocprofiler_tracer_operation_id_t operation_id, - char** data) ROCPROFILER_VERSION_2_0; + char** data) ROCPROFILER_VERSION_9_0; /** @} */ @@ -1438,7 +1436,7 @@ typedef enum { ROCPROFILER_API rocprofiler_status_t rocprofiler_query_hip_tracer_api_data_info_size( rocprofiler_session_id_t session_id, rocprofiler_tracer_hip_api_data_info_t kind, 
rocprofiler_tracer_api_data_handle_t api_data_id, rocprofiler_tracer_operation_id_t operation_id, - size_t* data_size) ROCPROFILER_VERSION_2_0; + size_t* data_size) ROCPROFILER_VERSION_9_0; /** * Query API Data Information using an allocated data pointer by the user, @@ -1465,7 +1463,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_query_hip_tracer_api_data_info_ ROCPROFILER_API rocprofiler_status_t rocprofiler_query_hip_tracer_api_data_info( rocprofiler_session_id_t session_id, rocprofiler_tracer_hip_api_data_info_t kind, rocprofiler_tracer_api_data_handle_t api_data_id, rocprofiler_tracer_operation_id_t operation_id, - char** data) ROCPROFILER_VERSION_2_0; + char** data) ROCPROFILER_VERSION_9_0; /** @} */ @@ -1658,7 +1656,7 @@ typedef void (*rocprofiler_buffer_callback_t)(const rocprofiler_record_header_t* * the session buffer is corrupted */ ROCPROFILER_API rocprofiler_status_t rocprofiler_flush_data(rocprofiler_session_id_t session_id, - rocprofiler_buffer_id_t buffer_id) ROCPROFILER_VERSION_2_0; + rocprofiler_buffer_id_t buffer_id) ROCPROFILER_VERSION_9_0; /** * Get a pointer to the next profiling record. 
@@ -1682,7 +1680,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_flush_data(rocprofiler_session_ ROCPROFILER_API rocprofiler_status_t rocprofiler_next_record(const rocprofiler_record_header_t* record, const rocprofiler_record_header_t** next, rocprofiler_session_id_t session_id, - rocprofiler_buffer_id_t buffer_id) ROCPROFILER_VERSION_2_0; + rocprofiler_buffer_id_t buffer_id) ROCPROFILER_VERSION_9_0; /** @} */ @@ -1745,7 +1743,7 @@ typedef enum { * wasn't called before or if rocprofiler_finalize is called */ ROCPROFILER_API rocprofiler_status_t rocprofiler_create_session(rocprofiler_replay_mode_t replay_mode, - rocprofiler_session_id_t* session_id) ROCPROFILER_VERSION_2_0; + rocprofiler_session_id_t* session_id) ROCPROFILER_VERSION_9_0; /** * Destroy Session @@ -1762,7 +1760,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_create_session(rocprofiler_repl * @retval ::ROCPROFILER_STATUS_ERROR_SESSION_NOT_FOUND may return if * the session is not found */ -ROCPROFILER_API rocprofiler_status_t rocprofiler_destroy_session(rocprofiler_session_id_t session_id) ROCPROFILER_VERSION_2_0; +ROCPROFILER_API rocprofiler_status_t rocprofiler_destroy_session(rocprofiler_session_id_t session_id) ROCPROFILER_VERSION_9_0; /** \defgroup session_filter_group Session Filters Handling * \ingroup sessions_handling_group @@ -2021,7 +2019,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_create_filter(rocprofiler_sessi rocprofiler_filter_data_t data, uint64_t data_count, rocprofiler_filter_id_t* filter_id, - rocprofiler_filter_property_t property = {}) ROCPROFILER_VERSION_2_0; + rocprofiler_filter_property_t property) ROCPROFILER_VERSION_9_0; /** * Set Session Filter Buffer @@ -2042,7 +2040,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_create_filter(rocprofiler_sessi */ ROCPROFILER_API rocprofiler_status_t rocprofiler_set_filter_buffer(rocprofiler_session_id_t session_id, rocprofiler_filter_id_t filter_id, - rocprofiler_buffer_id_t buffer_id) ROCPROFILER_VERSION_2_0; + 
rocprofiler_buffer_id_t buffer_id) ROCPROFILER_VERSION_9_0; /** * Synchronous Callback @@ -2078,7 +2076,7 @@ typedef void (*rocprofiler_sync_callback_t)(rocprofiler_record_tracer_t record, */ ROCPROFILER_API rocprofiler_status_t rocprofiler_set_api_trace_sync_callback( rocprofiler_session_id_t session_id, rocprofiler_filter_id_t filter_id, - rocprofiler_sync_callback_t callback) ROCPROFILER_VERSION_2_0; + rocprofiler_sync_callback_t callback) ROCPROFILER_VERSION_9_0; /** * Destroy Session Filter @@ -2095,7 +2093,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_set_api_trace_sync_callback( * @retval ::ROCPROFILER_STATUS_FILTER_NOT_FOUND Couldn't find session filter */ ROCPROFILER_API rocprofiler_status_t rocprofiler_destroy_filter(rocprofiler_session_id_t session_id, - rocprofiler_filter_id_t filter_id) ROCPROFILER_VERSION_2_0; + rocprofiler_filter_id_t filter_id) ROCPROFILER_VERSION_9_0; /** * Create Buffer @@ -2122,7 +2120,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_destroy_filter(rocprofiler_sess */ ROCPROFILER_API rocprofiler_status_t rocprofiler_create_buffer( rocprofiler_session_id_t session_id, rocprofiler_buffer_callback_t buffer_callback, - size_t buffer_size, rocprofiler_buffer_id_t* buffer_id) ROCPROFILER_VERSION_2_0; + size_t buffer_size, rocprofiler_buffer_id_t* buffer_id) ROCPROFILER_VERSION_9_0; /** * Setting Buffer Properties @@ -2149,7 +2147,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_create_buffer( */ ROCPROFILER_API rocprofiler_status_t rocprofiler_set_buffer_properties( rocprofiler_session_id_t session_id, rocprofiler_buffer_id_t buffer_id, - rocprofiler_buffer_property_t* buffer_properties, uint32_t buffer_properties_count) ROCPROFILER_VERSION_2_0; + rocprofiler_buffer_property_t* buffer_properties, uint32_t buffer_properties_count) ROCPROFILER_VERSION_9_0; /** * Destroy Buffer @@ -2171,7 +2169,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_set_buffer_properties( * or corrupted */ ROCPROFILER_API 
rocprofiler_status_t rocprofiler_destroy_buffer(rocprofiler_session_id_t session_id, - rocprofiler_buffer_id_t buffer_id) ROCPROFILER_VERSION_2_0; + rocprofiler_buffer_id_t buffer_id) ROCPROFILER_VERSION_9_0; /** @} */ @@ -2225,7 +2223,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_create_ready_session( rocprofiler_replay_mode_t replay_mode, rocprofiler_filter_kind_t filter_kind, rocprofiler_filter_data_t data, uint64_t data_count, size_t buffer_size, rocprofiler_buffer_callback_t buffer_callback, rocprofiler_session_id_t* session_id, - rocprofiler_filter_property_t property = {}, rocprofiler_sync_callback_t callback = nullptr) ROCPROFILER_VERSION_2_0; + rocprofiler_filter_property_t property, rocprofiler_sync_callback_t callback) ROCPROFILER_VERSION_9_0; // TODO(aelwazir): Multiple sessions activate for different set of filters /** @@ -2247,7 +2245,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_create_ready_session( * @retval ::ROCPROFILER_STATUS_ERROR_HAS_ACTIVE_SESSION if there is already * active session */ -ROCPROFILER_API rocprofiler_status_t rocprofiler_start_session(rocprofiler_session_id_t session_id) ROCPROFILER_VERSION_2_0; +ROCPROFILER_API rocprofiler_status_t rocprofiler_start_session(rocprofiler_session_id_t session_id) ROCPROFILER_VERSION_9_0; /** * Deactivate Session @@ -2263,7 +2261,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_start_session(rocprofiler_sessi * active */ -ROCPROFILER_API rocprofiler_status_t rocprofiler_terminate_session(rocprofiler_session_id_t session_id) ROCPROFILER_VERSION_2_0; +ROCPROFILER_API rocprofiler_status_t rocprofiler_terminate_session(rocprofiler_session_id_t session_id) ROCPROFILER_VERSION_9_0; /** \defgroup session_range_group Session Range Labeling * \ingroup sessions_handling_group @@ -2283,7 +2281,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_terminate_session(rocprofiler_s * @retval ::ROCPROFILER_STATUS_ERROR_CORRUPTED_LABEL_DATA may return if * the label pointer can't be read by 
the API */ -ROCPROFILER_API rocprofiler_status_t rocprofiler_push_range(const char* label) ROCPROFILER_VERSION_2_0; +ROCPROFILER_API rocprofiler_status_t rocprofiler_push_range(const char* label) ROCPROFILER_VERSION_9_0; /** * Setting an endpoint for a range @@ -2297,7 +2295,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_push_range(const char* label) R * @retval ::ROCPROFILER_STATUS_ERROR_RANGE_STACK_IS_EMPTY may return if * ::rocprofiler_push_range wasn't called correctly */ -ROCPROFILER_API rocprofiler_status_t rocprofiler_pop_range() ROCPROFILER_VERSION_2_0; +ROCPROFILER_API rocprofiler_status_t rocprofiler_pop_range() ROCPROFILER_VERSION_9_0; /** @} */ @@ -2319,7 +2317,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_pop_range() ROCPROFILER_VERSION * @retval ::ROCPROFILER_STATUS_ERROR_SESSION_NOT_FOUND If the no active session * found */ -ROCPROFILER_API rocprofiler_status_t rocprofiler_start_replay_pass() ROCPROFILER_VERSION_2_0; +ROCPROFILER_API rocprofiler_status_t rocprofiler_start_replay_pass() ROCPROFILER_VERSION_9_0; /** * End a pass @@ -2332,7 +2330,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_start_replay_pass() ROCPROFILER * @retval ::ROCPROFILER_STATUS_ERROR_PASS_NOT_STARTED if there is no pass * started before this call */ -ROCPROFILER_API rocprofiler_status_t rocprofiler_end_replay_pass() ROCPROFILER_VERSION_2_0; +ROCPROFILER_API rocprofiler_status_t rocprofiler_end_replay_pass() ROCPROFILER_VERSION_9_0; /** @} */ /** @} */ @@ -2369,7 +2367,7 @@ typedef struct { */ ROCPROFILER_API rocprofiler_status_t rocprofiler_device_profiling_session_create( const char** counter_names, uint64_t num_counters, rocprofiler_session_id_t* session_id, - int cpu_index, int gpu_index) ROCPROFILER_VERSION_2_0; + int cpu_index, int gpu_index) ROCPROFILER_VERSION_9_0; /** * Start the device profiling session that was created previously. 
@@ -2380,7 +2378,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_device_profiling_session_create
  * successfully.
  */
 ROCPROFILER_API rocprofiler_status_t
-rocprofiler_device_profiling_session_start(rocprofiler_session_id_t session_id) ROCPROFILER_VERSION_2_0;
+rocprofiler_device_profiling_session_start(rocprofiler_session_id_t session_id) ROCPROFILER_VERSION_9_0;
 
 /**
  * Poll the device profiling session to read counters from the GPU device.
@@ -2395,7 +2393,7 @@ rocprofiler_device_profiling_session_start(rocprofiler_session_id_t session_id)
  * successfully.
  */
 ROCPROFILER_API rocprofiler_status_t rocprofiler_device_profiling_session_poll(
-    rocprofiler_session_id_t session_id, rocprofiler_device_profile_metric_t* data) ROCPROFILER_VERSION_2_0;
+    rocprofiler_session_id_t session_id, rocprofiler_device_profile_metric_t* data) ROCPROFILER_VERSION_9_0;
 
 /**
  * Stop the device profiling session that was created previously.
@@ -2406,7 +2404,7 @@ ROCPROFILER_API rocprofiler_status_t rocprofiler_device_profiling_session_poll(
  * successfully.
  */
 ROCPROFILER_API rocprofiler_status_t
-rocprofiler_device_profiling_session_stop(rocprofiler_session_id_t session_id) ROCPROFILER_VERSION_2_0;
+rocprofiler_device_profiling_session_stop(rocprofiler_session_id_t session_id) ROCPROFILER_VERSION_9_0;
 
 /**
  * Destroy the device profiling session that was created previously.
@@ -2416,524 +2414,12 @@ rocprofiler_device_profiling_session_stop(rocprofiler_session_id_t session_id) R
  * successfully.
  */
 ROCPROFILER_API rocprofiler_status_t
-rocprofiler_device_profiling_session_destroy(rocprofiler_session_id_t session_id) ROCPROFILER_VERSION_2_0;
+rocprofiler_device_profiling_session_destroy(rocprofiler_session_id_t session_id) ROCPROFILER_VERSION_9_0;
 
 /** @} */
 
-#endif
-
-////////////////////////////////////////////////////////////////////////////////
-////////////////////////////////////////////////////////////////////////////////
-// Old ROCProfiler
-////////////////////////////////////////////////////////////////////////////////
-////////////////////////////////////////////////////////////////////////////////
-
-#include 
-#include 
-#include 
-#include 
-#include 
-
-////////////////////////////////////////////////////////////////////////////////
-// Global properties structure
-
-typedef struct {
-  uint32_t intercept_mode;
-  uint32_t code_obj_tracking;
-  uint32_t memcopy_tracking;
-  uint32_t trace_size;
-  uint32_t trace_local;
-  uint64_t timeout;
-  uint32_t timestamp_on;
-  uint32_t hsa_intercepting;
-  uint32_t k_concurrent;
-  uint32_t opt_mode;
-  uint32_t obj_dumping;
-} rocprofiler_settings_t;
-
-////////////////////////////////////////////////////////////////////////////////
-// Returning the error string method
-
-hsa_status_t rocprofiler_error_string(
-    const char** str); // [out] the API error string pointer returning
-
-////////////////////////////////////////////////////////////////////////////////
-// Profiling features and data
-//
-// Profiling features objects have profiling feature info, type, parameters and data
-// Also profiling data samplaes can be iterated using a callback
-
-// Profiling feature kind
-typedef enum {
-  ROCPROFILER_FEATURE_KIND_METRIC = 0,
-  ROCPROFILER_FEATURE_KIND_TRACE = 1,
-  ROCPROFILER_FEATURE_KIND_SPM_MOD = 2,
-  ROCPROFILER_FEATURE_KIND_PCSMP_MOD = 4
-} rocprofiler_feature_kind_t;
-
-// Profiling feture parameter
-typedef hsa_ven_amd_aqlprofile_parameter_t rocprofiler_parameter_t;
-
-// Profiling data kind
-typedef enum {
-  ROCPROFILER_DATA_KIND_UNINIT = 0,
-  ROCPROFILER_DATA_KIND_INT32 = 1,
-  ROCPROFILER_DATA_KIND_INT64 = 2,
-  ROCPROFILER_DATA_KIND_FLOAT = 3,
-  ROCPROFILER_DATA_KIND_DOUBLE = 4,
-  ROCPROFILER_DATA_KIND_BYTES = 5
-} rocprofiler_data_kind_t;
-
-// Profiling data type
-typedef struct {
-  rocprofiler_data_kind_t kind; // result kind
-  union {
-    uint32_t result_int32; // 32bit integer result
-    uint64_t result_int64; // 64bit integer result
-    float result_float; // float single-precision result
-    double result_double; // float double-precision result
-    struct {
-      void* ptr;
-      uint32_t size;
-      uint32_t instance_count;
-      bool copy;
-    } result_bytes; // data by ptr and byte size
-  };
-} rocprofiler_data_t;
-
-// Profiling feature type
-typedef struct {
-  rocprofiler_feature_kind_t kind; // feature kind
-  union {
-    const char* name; // feature name
-    struct {
-      const char* block; // counter block name
-      uint32_t event; // counter event id
-    } counter;
-  };
-  const rocprofiler_parameter_t* parameters; // feature parameters array
-  uint32_t parameter_count; // feature parameters count
-  rocprofiler_data_t data; // profiling data
-} rocprofiler_feature_t;
-
-// Profiling features set type
-typedef void rocprofiler_feature_set_t;
-
-////////////////////////////////////////////////////////////////////////////////
-// Profiling context
-//
-// Profiling context object accumuate all profiling information
-
-// Profiling context object
-typedef void rocprofiler_t;
-
-// Profiling group object
-typedef struct {
-  unsigned index; // group index
-  rocprofiler_feature_t** features; // profiling info array
-  uint32_t feature_count; // profiling info count
-  rocprofiler_t* context; // context object
-} rocprofiler_group_t;
-
-// Profiling mode mask
-typedef enum {
-  ROCPROFILER_MODE_STANDALONE = 1, // standalone mode when ROC profiler supports a queue
-  ROCPROFILER_MODE_CREATEQUEUE = 2, // ROC profiler creates queue in standalone mode
-  ROCPROFILER_MODE_SINGLEGROUP = 4 // only one group is allowed, failed otherwise
-} rocprofiler_mode_t;
-
-// Profiling handler, calling on profiling completion
-typedef bool (*rocprofiler_handler_t)(rocprofiler_group_t group, void* arg);
-
-// Profiling preperties
-typedef struct {
-  hsa_queue_t* queue; // queue for STANDALONE mode
-  // the queue is created and returned in CREATEQUEUE mode
-  uint32_t queue_depth; // created queue depth
-  rocprofiler_handler_t handler; // handler on completion
-  void* handler_arg; // the handler arg
-} rocprofiler_properties_t;
-
-// Create new profiling context
-hsa_status_t rocprofiler_open(hsa_agent_t agent, // GPU handle
-    rocprofiler_feature_t* features, // [in] profiling features array
-    uint32_t feature_count, // profiling info count
-    rocprofiler_t** context, // [out] context object
-    uint32_t mode, // profiling mode mask
-    rocprofiler_properties_t* properties); // profiling properties
-
-// Add feature to a features set
-hsa_status_t rocprofiler_add_feature(const rocprofiler_feature_t* feature, // [in]
-    rocprofiler_feature_set_t* features_set); // [in/out] profiling features set
-
-// Create new profiling context
-hsa_status_t rocprofiler_features_set_open(hsa_agent_t agent, // GPU handle
-    rocprofiler_feature_set_t* features_set, // [in] profiling features set
-    rocprofiler_t** context, // [out] context object
-    uint32_t mode, // profiling mode mask
-    rocprofiler_properties_t* properties); // profiling properties
-
-// Delete profiling info
-hsa_status_t rocprofiler_close(rocprofiler_t* context); // [in] profiling context
-
-// Context reset before reusing
-hsa_status_t rocprofiler_reset(rocprofiler_t* context, // [in] profiling context
-    uint32_t group_index); // group index
-
-// Return context agent
-hsa_status_t rocprofiler_get_agent(rocprofiler_t* context, // [in] profiling context
-    hsa_agent_t* agent); // [out] GPU handle
-
-// Supported time value ID
-typedef enum {
-  ROCPROFILER_TIME_ID_CLOCK_REALTIME = 0, // Linux realtime clock time
-  ROCPROFILER_TIME_ID_CLOCK_REALTIME_COARSE = 1, // Linux realtime-coarse clock time
-  ROCPROFILER_TIME_ID_CLOCK_MONOTONIC = 2, // Linux monotonic clock time
-  ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_COARSE = 3, // Linux monotonic-coarse clock time
-  ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_RAW = 4, // Linux monotonic-raw clock time
-} rocprofiler_time_id_t;
-
-// Return time value for a given time ID and profiling timestamp
-hsa_status_t rocprofiler_get_time(
-    rocprofiler_time_id_t time_id, // identifier of the particular time to convert the timesatmp
-    uint64_t timestamp, // profiling timestamp
-    uint64_t* value_ns, // [out] returned time 'ns' value, ignored if NULL
-    uint64_t* error_ns); // [out] returned time error 'ns' value, ignored if NULL
-
-////////////////////////////////////////////////////////////////////////////////
-// Queue callbacks
-//
-// Queue callbacks for initiating profiling per kernel dispatch and to wait
-// the profiling data on the queue destroy.
-
-// Dispatch record
-typedef struct {
-  uint64_t dispatch; // dispatch timestamp, ns
-  uint64_t begin; // kernel begin timestamp, ns
-  uint64_t end; // kernel end timestamp, ns
-  uint64_t complete; // completion signal timestamp, ns
-} rocprofiler_dispatch_record_t;
-
-// Profiling callback data
-typedef struct {
-  hsa_agent_t agent; // GPU agent handle
-  uint32_t agent_index; // GPU index (GPU Driver Node ID as reported in the sysfs topology)
-  const hsa_queue_t* queue; // HSA queue
-  uint64_t queue_index; // Index in the queue
-  uint32_t queue_id; // Queue id
-  hsa_signal_t completion_signal; // Completion signal
-  const hsa_kernel_dispatch_packet_t* packet; // HSA dispatch packet
-  const char* kernel_name; // Kernel name
-  uint64_t kernel_object; // Kernel object address
-  const amd_kernel_code_t* kernel_code; // Kernel code pointer
-  uint32_t thread_id; // Thread id
-  const rocprofiler_dispatch_record_t* record; // Dispatch record
-} rocprofiler_callback_data_t;
-
-// Profiling callback type
-typedef hsa_status_t (*rocprofiler_callback_t)(
-    const rocprofiler_callback_data_t* callback_data, // [in] callback data
-    void* user_data, // [in/out] user data passed to the callback
-    rocprofiler_group_t* group); // [out] returned profiling group
-
-// Queue callbacks
-typedef struct {
-  rocprofiler_callback_t dispatch; // dispatch callback
-  hsa_status_t (*create)(hsa_queue_t* queue, void* data); // create callback
-  hsa_status_t (*destroy)(hsa_queue_t* queue, void* data); // destroy callback
-} rocprofiler_queue_callbacks_t;
-
-// Set queue callbacks
-hsa_status_t rocprofiler_set_queue_callbacks(
-    rocprofiler_queue_callbacks_t callbacks, // callbacks
-    void* data); // [in/out] passed callbacks data
-
-// Remove queue callbacks
-hsa_status_t rocprofiler_remove_queue_callbacks();
-
-// Start/stop queue callbacks
-hsa_status_t rocprofiler_start_queue_callbacks();
-hsa_status_t rocprofiler_stop_queue_callbacks();
-
-////////////////////////////////////////////////////////////////////////////////
-// Start/stop profiling
-//
-// Start/stop the context profiling invocation, have to be as many as
-// contect.invocations' to collect all profiling data
-
-// Start profiling
-hsa_status_t rocprofiler_start(rocprofiler_t* context, // [in/out] profiling context
-    uint32_t group_index); // group index
-
-// Stop profiling
-hsa_status_t rocprofiler_stop(rocprofiler_t* context, // [in/out] profiling context
-    uint32_t group_index); // group index
-
-// Read profiling
-hsa_status_t rocprofiler_read(rocprofiler_t* context, // [in/out] profiling context
-    uint32_t group_index); // group index
-
-// Read profiling data
-hsa_status_t rocprofiler_get_data(rocprofiler_t* context, // [in/out] profiling context
-    uint32_t group_index); // group index
-
-// Get profiling groups count
-hsa_status_t rocprofiler_group_count(const rocprofiler_t* context, // [in] profiling context
-    uint32_t* group_count); // [out] profiling groups count
-
-// Get profiling group for a given index
-hsa_status_t rocprofiler_get_group(rocprofiler_t* context, // [in] profiling context
-    uint32_t group_index, // profiling group index
-    rocprofiler_group_t* group); // [out] profiling group
-
-// Start profiling
-hsa_status_t rocprofiler_group_start(rocprofiler_group_t* group); // [in/out] profiling group
-
-// Stop profiling
-hsa_status_t rocprofiler_group_stop(rocprofiler_group_t* group); // [in/out] profiling group
-
-// Read profiling
-hsa_status_t rocprofiler_group_read(rocprofiler_group_t* group); // [in/out] profiling group
-
-// Get profiling data
-hsa_status_t rocprofiler_group_get_data(rocprofiler_group_t* group); // [in/out] profiling group
-
-// Get metrics data
-hsa_status_t rocprofiler_get_metrics(const rocprofiler_t* context); // [in/out] profiling context
-
-// Definition of output data iterator callback
-typedef hsa_ven_amd_aqlprofile_data_callback_t rocprofiler_trace_data_callback_t;
-
-// Method for iterating the events output data
-hsa_status_t rocprofiler_iterate_trace_data(
-    rocprofiler_t* context, // [in] profiling context
-    rocprofiler_trace_data_callback_t callback, // callback to iterate the output data
-    void* data); // [in/out] callback data
-
-////////////////////////////////////////////////////////////////////////////////
-// Profiling features and data
-//
-// Profiling features objects have profiling feature info, type, parameters and data
-// Also profiling data samplaes can be iterated using a callback
-
-// Profiling info kind
-typedef enum {
-  ROCPROFILER_INFO_KIND_METRIC = 0, // metric info
-  ROCPROFILER_INFO_KIND_METRIC_COUNT = 1, // metric features count, int32
-  ROCPROFILER_INFO_KIND_TRACE = 2, // trace info
-  ROCPROFILER_INFO_KIND_TRACE_COUNT = 3, // trace features count, int32
-  ROCPROFILER_INFO_KIND_TRACE_PARAMETER = 4, // trace parameter info
-  ROCPROFILER_INFO_KIND_TRACE_PARAMETER_COUNT = 5 // trace parameter count, int32
-} rocprofiler_info_kind_t;
-
-// Profiling info query
-typedef union {
-  rocprofiler_info_kind_t info_kind; // queried profiling info kind
-  struct {
-    const char* trace_name; // queried info trace name
-  } trace_parameter;
-} rocprofiler_info_query_t;
-
-// Profiling info data
-typedef struct {
-  uint32_t agent_index; // GPU HSA agent index (GPU Driver Node ID as reported in the sysfs topology)
-  rocprofiler_info_kind_t kind; // info data kind
-  union {
-    struct {
-      const char* name; // metric name
-      uint32_t instances; // instances number
-      const char* expr; // metric expression, NULL for basic counters
-      const char* description; // metric description
-      const char* block_name; // block name
-      uint32_t block_counters; // number of block counters
-    } metric;
-    struct {
-      const char* name; // trace name
-      const char* description; // trace description
-      uint32_t parameter_count; // supported by the trace number parameters
-    } trace;
-    struct {
-      uint32_t code; // parameter code
-      const char* trace_name; // trace name
-      const char* parameter_name; // parameter name
-      const char* description; // trace parameter description
-    } trace_parameter;
-  };
-} rocprofiler_info_data_t;
-
-// Return the info for a given info kind
-hsa_status_t rocprofiler_get_info(
-    const hsa_agent_t* agent, // [in] GFXIP handle
-    rocprofiler_info_kind_t kind, // kind of iterated info
-    void *data); // [in/out] returned data
-
-// Iterate over the info for a given info kind, and invoke an application-defined callback on every iteration
-hsa_status_t rocprofiler_iterate_info(
-    const hsa_agent_t* agent, // [in] GFXIP handle
-    rocprofiler_info_kind_t kind, // kind of iterated info
-    hsa_status_t (*callback)(const rocprofiler_info_data_t info, void *data), // callback
-    void *data); // [in/out] data passed to callback
-
-// Iterate over the info for a given info query, and invoke an application-defined callback on every iteration
-hsa_status_t rocprofiler_query_info(
-    const hsa_agent_t *agent, // [in] GFXIP handle
-    rocprofiler_info_query_t query, // iterated info query
-    hsa_status_t (*callback)(const rocprofiler_info_data_t info, void *data), // callback
-    void *data); // [in/out] data passed to callback
-
-// Create a profiled queue. All dispatches on this queue will be profiled
-hsa_status_t rocprofiler_queue_create_profiled(
-    hsa_agent_t agent_handle,uint32_t size, hsa_queue_type32_t type,
-    void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data),
-    void* data, uint32_t private_segment_size, uint32_t group_segment_size,
-    hsa_queue_t** queue);
-
-////////////////////////////////////////////////////////////////////////////////
-// Profiling pool
-//
-// Support for profiling contexts pool
-// The API provide capability to create a contexts pool for a given agent and a set of features,
-// to fetch/relase a context entry, to register a callback for the contexts completion.
-
-// Profiling pool handle
-typedef void rocprofiler_pool_t;
-
-// Profiling pool entry
-typedef struct {
-  rocprofiler_t* context; // context object
-  void* payload; // payload data object
-} rocprofiler_pool_entry_t;
-
-// Profiling handler, calling on profiling completion
-typedef bool (*rocprofiler_pool_handler_t)(const rocprofiler_pool_entry_t* entry, void* arg);
-
-// Profiling preperties
-typedef struct {
-  uint32_t num_entries; // pool size entries
-  uint32_t payload_bytes; // payload size bytes
-  rocprofiler_pool_handler_t handler; // handler on context completion
-  void* handler_arg; // the handler arg
-} rocprofiler_pool_properties_t;
-
-// Open profiling pool
-hsa_status_t rocprofiler_pool_open(
-    hsa_agent_t agent, // GPU handle
-    rocprofiler_feature_t* features, // [in] profiling features array
-    uint32_t feature_count, // profiling info count
-    rocprofiler_pool_t** pool, // [out] context object
-    uint32_t mode, // profiling mode mask
-    rocprofiler_pool_properties_t*); // pool properties
-
-// Close profiling pool
-hsa_status_t rocprofiler_pool_close(
-    rocprofiler_pool_t* pool); // profiling pool handle
-
-// Fetch profiling pool entry
-hsa_status_t rocprofiler_pool_fetch(
-    rocprofiler_pool_t* pool, // profiling pool handle
-    rocprofiler_pool_entry_t* entry); // [out] empty profiling pool entry
-
-// Release profiling pool entry
-hsa_status_t rocprofiler_pool_release(
-    rocprofiler_pool_entry_t* entry); // released profiling pool entry
-
-// Iterate fetched profiling pool entries
-hsa_status_t rocprofiler_pool_iterate(
-    rocprofiler_pool_t* pool, // profiling pool handle
-    hsa_status_t (*callback)(rocprofiler_pool_entry_t* entry, void* data), // callback
-    void *data); // [in/out] data passed to callback
-
-// Flush completed entries in profiling pool
-hsa_status_t rocprofiler_pool_flush(
-    rocprofiler_pool_t* pool); // profiling pool handle
-
-////////////////////////////////////////////////////////////////////////////////
-// HSA intercepting API
-
-// HSA callbacks ID enumeration
-typedef enum {
-  ROCPROFILER_HSA_CB_ID_ALLOCATE = 0, // Memory allocate callback
-  ROCPROFILER_HSA_CB_ID_DEVICE = 1, // Device assign callback
-  ROCPROFILER_HSA_CB_ID_MEMCOPY = 2, // Memcopy callback
-  ROCPROFILER_HSA_CB_ID_SUBMIT = 3, // Packet submit callback
-  ROCPROFILER_HSA_CB_ID_KSYMBOL = 4, // Loading/unloading of kernel symbol
-  ROCPROFILER_HSA_CB_ID_CODEOBJ = 5 // Loading/unloading of kernel symbol
-} rocprofiler_hsa_cb_id_t;
-
-// HSA callback data type
-typedef struct {
-  union {
-    struct {
-      const void* ptr; // allocated area ptr
-      size_t size; // allocated area size, zero size means 'free' callback
-      hsa_amd_segment_t segment; // allocated area's memory segment type
-      hsa_amd_memory_pool_global_flag_t global_flag; // allocated area's memory global flag
-      int is_code; // equal to 1 if code is allocated
-    } allocate;
-    struct {
-      hsa_device_type_t type; // type of assigned device
-      uint32_t id; // id of assigned device
-      hsa_agent_t agent; // device HSA agent handle
-      const void* ptr; // ptr the device is assigned to
-    } device;
-    struct {
-      const void* dst; // memcopy dst ptr
-      const void* src; // memcopy src ptr
-      size_t size; // memcopy size bytes
-    } memcopy;
-    struct {
-      const void* packet; // submitted to GPU packet
-      const char* kernel_name; // kernel name, not NULL if dispatch
-      hsa_queue_t* queue; // HSA queue the kernel was submitted to
-      uint32_t device_type; // type of device the packed is submitted to
-      uint32_t device_id; // id of device the packed is submitted to
-    } submit;
-    struct {
-      uint64_t object; // kernel symbol object
-      const char* name; // kernel symbol name
-      uint32_t name_length; // kernel symbol name length
-      int unload; // symbol executable destroy
-    } ksymbol;
-    struct {
-      uint32_t storage_type; // code object storage type
-      int storage_file; // origin file descriptor
-      uint64_t memory_base; // origin memory base
-      uint64_t memory_size; // origin memory size
-      uint64_t load_base; // codeobj load base
-      uint64_t load_size; // codeobj load size
-      uint64_t load_delta; // codeobj load size
-      uint32_t uri_length; // URI string length
-      char* uri; // URI string
-      int unload; // unload flag
-    } codeobj;
-  };
-} rocprofiler_hsa_callback_data_t;
-
-// HSA callback function type
-typedef hsa_status_t (*rocprofiler_hsa_callback_fun_t)(
-    rocprofiler_hsa_cb_id_t id, // callback id
-    const rocprofiler_hsa_callback_data_t* data, // [in] callback data
-    void* arg); // [in/out] user passed data
-
-// HSA callbacks structure
-typedef struct {
-  rocprofiler_hsa_callback_fun_t allocate; // memory allocate callback
-  rocprofiler_hsa_callback_fun_t device; // agent assign callback
-  rocprofiler_hsa_callback_fun_t memcopy; // memory copy callback
-  rocprofiler_hsa_callback_fun_t submit; // packet submit callback
-  rocprofiler_hsa_callback_fun_t ksymbol; // kernel symbol callback
-  rocprofiler_hsa_callback_fun_t codeobj; // codeobject load/unload callback
-} rocprofiler_hsa_callbacks_t;
-
-// Set callbacks. If the callback is NULL then it is disabled.
-// If callback returns a value that is not HSA_STATUS_SUCCESS the callback
-// will be unregistered.
-hsa_status_t rocprofiler_set_hsa_callbacks(
-    const rocprofiler_hsa_callbacks_t callbacks, // HSA callback function
-    void* arg); // callback user data
-
 #ifdef __cplusplus
 } // extern "C" block
 #endif // __cplusplus
 
-#endif // INC_ROCPROFILER_H_
+#endif // INC_ROCPROFILER_H_
\ No newline at end of file
diff --git a/inc/rocprofiler_plugin.h b/include/rocprofiler/v2/rocprofiler_plugin.h
similarity index 100%
rename from inc/rocprofiler_plugin.h
rename to include/rocprofiler/v2/rocprofiler_plugin.h
diff --git a/plugin/att/CMakeLists.txt b/plugin/att/CMakeLists.txt
index 021f27df..5ae4c86c 100644
--- a/plugin/att/CMakeLists.txt
+++ b/plugin/att/CMakeLists.txt
@@ -39,13 +39,13 @@ target_compile_definitions(att_plugin
   PRIVATE HIP_PROF_HIP_API_STRING=1 __HIP_PLATFORM_HCC__=1)
 
 target_include_directories(
-  att_plugin PRIVATE ${PROJECT_SOURCE_DIR}/inc ${PROJECT_SOURCE_DIR}
+  att_plugin PRIVATE ${PROJECT_SOURCE_DIR}
   ${CMAKE_CURRENT_SOURCE_DIR})
 
 target_link_options(
   att_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap -Wl,--no-undefined)
 
-target_link_libraries(att_plugin PRIVATE ${ROCPROFILER_TARGET}
+target_link_libraries(att_plugin PRIVATE rocprofiler-v2
   hsa-runtime64::hsa-runtime64 stdc++fs)
 
 install(TARGETS att_plugin
diff --git a/plugin/ctf/CMakeLists.txt b/plugin/ctf/CMakeLists.txt
index 8b0d7e0f..10a214d3 100644
--- a/plugin/ctf/CMakeLists.txt
+++ b/plugin/ctf/CMakeLists.txt
@@ -38,7 +38,6 @@ target_compile_definitions(ctf_plugin
   PRIVATE __HIP_PLATFORM_HCC__=1
   CTF_PLUGIN_METADATA_FILE_PATH="${CMAKE_INSTALL_PREFIX}/${METADATA_STREAM_FILE_DIR}/metadata")
 target_include_directories(ctf_plugin PRIVATE
-  "${PROJECT_SOURCE_DIR}/inc"
   "${PROJECT_SOURCE_DIR}"
   "${CMAKE_BINARY_DIR}/src/api"
   "${CMAKE_CURRENT_BINARY_DIR}")
@@ -46,7 +45,7 @@ target_link_options(ctf_plugin PRIVATE
   "-Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap"
   -Wl,--no-undefined)
 target_link_libraries(ctf_plugin PRIVATE
-  ${ROCPROFILER_TARGET}
+  rocprofiler-v2
   hsa-runtime64::hsa-runtime64
   stdc++fs
   dl)
diff --git a/plugin/file/CMakeLists.txt b/plugin/file/CMakeLists.txt
index 9939ede3..d9b162b8 100644
--- a/plugin/file/CMakeLists.txt
+++ b/plugin/file/CMakeLists.txt
@@ -33,11 +33,11 @@ set_target_properties(file_plugin PROPERTIES
 target_compile_definitions(file_plugin
   PRIVATE HIP_PROF_HIP_API_STRING=1 __HIP_PLATFORM_HCC__=1)
 
-target_include_directories(file_plugin PRIVATE ${PROJECT_SOURCE_DIR}/inc ${PROJECT_SOURCE_DIR})
+target_include_directories(file_plugin PRIVATE ${PROJECT_SOURCE_DIR})
 
 target_link_options(file_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap -Wl,--no-undefined)
 
-target_link_libraries(file_plugin PRIVATE ${ROCPROFILER_TARGET} hsa-runtime64::hsa-runtime64 stdc++fs amd_comgr dl)
+target_link_libraries(file_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64 stdc++fs amd_comgr dl)
 
 install(TARGETS file_plugin
   LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
diff --git a/plugin/perfetto/CMakeLists.txt b/plugin/perfetto/CMakeLists.txt
index 4664205a..c942aa7a 100644
--- a/plugin/perfetto/CMakeLists.txt
+++ b/plugin/perfetto/CMakeLists.txt
@@ -14,13 +14,13 @@ target_compile_definitions(perfetto_plugin
   __HIP_PLATFORM_HCC__=1)
 
 target_include_directories(perfetto_plugin
-  PRIVATE ${PROJECT_SOURCE_DIR}/inc ${PROJECT_SOURCE_DIR}
+  PRIVATE ${PROJECT_SOURCE_DIR}
   ${PROJECT_SOURCE_DIR}/plugin/perfetto/perfetto_sdk/sdk)
 
 target_link_options(perfetto_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
   -Wl,--no-undefined)
 
-target_link_libraries(perfetto_plugin PRIVATE ${ROCPROFILER_TARGET} Threads::Threads stdc++fs amd_comgr)
+target_link_libraries(perfetto_plugin PRIVATE rocprofiler-v2 Threads::Threads stdc++fs amd_comgr)
 
 install(TARGETS perfetto_plugin
   LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
diff --git a/rocprofiler-backward-compat.cmake b/rocprofiler-backward-compat.cmake
index 658aab0d..f624efb9 100644
--- a/rocprofiler-backward-compat.cmake
+++ b/rocprofiler-backward-compat.cmake
@@ -73,7 +73,7 @@ endfunction()
 function(generate_wrapper_header)
   file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_INC_DIR})
   #find all header files from inc
-  file(GLOB include_files ${CMAKE_CURRENT_SOURCE_DIR}/inc/*.h)
+  file(GLOB include_files ${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/*.h)
   #Convert the list of files into #includes
   foreach(header_file ${include_files})
     #set include guard
diff --git a/samples/CMakeLists.txt b/samples/CMakeLists.txt
index 56945401..8a21d8bc 100644
--- a/samples/CMakeLists.txt
+++ b/samples/CMakeLists.txt
@@ -32,7 +32,7 @@ find_package(LibDw REQUIRED)
 
 ## Add a custom targets to build and run all the tests
 add_custom_target(samples)
-add_dependencies(samples ${ROCPROFILER_TARGET})
+add_dependencies(samples rocprofiler-v2)
 add_custom_target(run-samples COMMAND ${PROJECT_BINARY_DIR}/samples/run_samples.sh DEPENDS samples)
 
 file(GLOB ROCPROFILER_UTIL_SRC_FILES ${PROJECT_SOURCE_DIR}/src/utils/helper.cpp)
@@ -51,8 +51,8 @@ file(GLOB ROCPROFILER_UTIL_SRC_FILES ${PROJECT_SOURCE_DIR}/src/utils/helper.cpp)
 ## Build Application Replay Sample
 set_source_files_properties(profiler/application_replay_sample.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
 hip_add_executable(profiler_application_replay profiler/application_replay_sample.cpp ${ROCPROFILER_UTIL_SRC_FILES})
-target_include_directories(profiler_application_replay PRIVATE ${PROJECT_SOURCE_DIR} ${PROJECT_SOURCE_DIR}/inc ${CMAKE_CURRENT_SOURCE_DIR}/common)
-target_link_libraries(profiler_application_replay PRIVATE ${ROCPROFILER_TARGET} amd_comgr)
+target_include_directories(profiler_application_replay PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
+target_link_libraries(profiler_application_replay PRIVATE rocprofiler-v2 amd_comgr)
 target_link_options(profiler_application_replay PRIVATE "-Wl,--build-id=md5")
 add_dependencies(samples profiler_application_replay)
 install(TARGETS profiler_application_replay RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
@@ -60,8 +60,8 @@ install(TARGETS profiler_application_replay RUNTIME DESTINATION ${CMAKE_INSTALL_
 ## Build Kernel Replay Sample
 set_source_files_properties(profiler/kernel_replay_sample.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
 hip_add_executable(profiler_kernel_replay profiler/kernel_replay_sample.cpp ${ROCPROFILER_UTIL_SRC_FILES})
-target_include_directories(profiler_kernel_replay PRIVATE ${PROJECT_SOURCE_DIR} ${PROJECT_SOURCE_DIR}/inc ${CMAKE_CURRENT_SOURCE_DIR}/common)
-target_link_libraries(profiler_kernel_replay PRIVATE ${ROCPROFILER_TARGET} amd_comgr)
+target_include_directories(profiler_kernel_replay PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
+target_link_libraries(profiler_kernel_replay PRIVATE rocprofiler-v2 amd_comgr)
 target_link_options(profiler_kernel_replay PRIVATE "-Wl,--build-id=md5")
 add_dependencies(samples profiler_kernel_replay)
 install(TARGETS profiler_kernel_replay RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
@@ -69,17 +69,17 @@ install(TARGETS profiler_kernel_replay RUNTIME DESTINATION ${CMAKE_INSTALL_DATAR
 ## Build User Replay Sample
 set_source_files_properties(profiler/user_replay_sample.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
 hip_add_executable(profiler_user_replay profiler/user_replay_sample.cpp ${ROCPROFILER_UTIL_SRC_FILES})
-target_include_directories(profiler_user_replay PRIVATE ${PROJECT_SOURCE_DIR} ${PROJECT_SOURCE_DIR}/inc ${CMAKE_CURRENT_SOURCE_DIR}/common)
+target_include_directories(profiler_user_replay PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
 target_link_options(profiler_user_replay PRIVATE "-Wl,--build-id=md5")
-target_link_libraries(profiler_user_replay PRIVATE ${ROCPROFILER_TARGET} amd_comgr)
+target_link_libraries(profiler_user_replay PRIVATE rocprofiler-v2 amd_comgr)
 add_dependencies(samples profiler_user_replay)
 install(TARGETS profiler_user_replay RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
 
 ## Build Device Profiling Sample
 set_source_files_properties(profiler/device_profiling_sample.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
 hip_add_executable(profiler_device_profiling profiler/device_profiling_sample.cpp ${ROCPROFILER_UTIL_SRC_FILES})
-target_include_directories(profiler_device_profiling PRIVATE ${PROJECT_SOURCE_DIR} ${PROJECT_SOURCE_DIR}/inc ${CMAKE_CURRENT_SOURCE_DIR}/common)
-target_link_libraries(profiler_device_profiling PRIVATE ${ROCPROFILER_TARGET} amd_comgr)
+target_include_directories(profiler_device_profiling PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
+target_link_libraries(profiler_device_profiling PRIVATE rocprofiler-v2 amd_comgr)
 target_link_options(profiler_device_profiling PRIVATE "-Wl,--build-id=md5")
 add_dependencies(samples profiler_device_profiling)
 install(TARGETS profiler_device_profiling RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
@@ -87,8 +87,8 @@ install(TARGETS profiler_device_profiling RUNTIME DESTINATION ${CMAKE_INSTALL_DA
 ## Build Counters Sampling example
 set_source_files_properties(counters_sampler/pcie_counters_example.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
 hip_add_executable(pcie_counters_sampler counters_sampler/pcie_counters_example.cpp ${ROCPROFILER_UTIL_SRC_FILES})
-target_include_directories(pcie_counters_sampler PRIVATE ${PROJECT_SOURCE_DIR} ${PROJECT_SOURCE_DIR}/inc ${CMAKE_CURRENT_SOURCE_DIR}/common)
-target_link_libraries(pcie_counters_sampler PRIVATE ${ROCPROFILER_TARGET} systemd amd_comgr)
+target_include_directories(pcie_counters_sampler PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
+target_link_libraries(pcie_counters_sampler PRIVATE rocprofiler-v2 systemd amd_comgr)
 target_link_options(pcie_counters_sampler PRIVATE "-Wl,--build-id=md5")
 add_dependencies(samples pcie_counters_sampler)
 install(TARGETS pcie_counters_sampler RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
@@ -96,8 +96,8 @@ install(TARGETS pcie_counters_sampler RUNTIME DESTINATION ${CMAKE_INSTALL_DATARO
 ## Build XGMI Counters Sampling example
 set_source_files_properties(counters_sampler/xgmi_counters_sampler_example.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
 hip_add_executable(xgmi_counters_sampler counters_sampler/xgmi_counters_sampler_example.cpp ${ROCPROFILER_UTIL_SRC_FILES})
-target_include_directories(xgmi_counters_sampler PRIVATE ${PROJECT_SOURCE_DIR} ${PROJECT_SOURCE_DIR}/inc ${CMAKE_CURRENT_SOURCE_DIR}/common)
-target_link_libraries(xgmi_counters_sampler PRIVATE ${ROCPROFILER_TARGET} systemd amd_comgr)
+target_include_directories(xgmi_counters_sampler PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
+target_link_libraries(xgmi_counters_sampler PRIVATE rocprofiler-v2 systemd amd_comgr)
 target_link_options(xgmi_counters_sampler PRIVATE "-Wl,--build-id=md5")
 add_dependencies(samples xgmi_counters_sampler)
 install(TARGETS xgmi_counters_sampler RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
@@ -111,8 +111,8 @@ install(TARGETS xgmi_counters_sampler RUNTIME DESTINATION ${CMAKE_INSTALL_DATARO
 ## Build HIP/HSA Trace Sample
 set_source_files_properties(tracer/sample.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
 hip_add_executable(tracer_hip_hsa tracer/sample.cpp ${ROCPROFILER_UTIL_SRC_FILES})
-target_include_directories(tracer_hip_hsa PRIVATE ${PROJECT_SOURCE_DIR} ${PROJECT_SOURCE_DIR}/inc ${CMAKE_CURRENT_SOURCE_DIR}/common)
-target_link_libraries(tracer_hip_hsa PRIVATE ${ROCPROFILER_TARGET} amd_comgr)
+target_include_directories(tracer_hip_hsa PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
+target_link_libraries(tracer_hip_hsa PRIVATE rocprofiler-v2 amd_comgr)
 target_link_options(tracer_hip_hsa PRIVATE "-Wl,--build-id=md5")
 add_dependencies(samples tracer_hip_hsa)
 install(TARGETS tracer_hip_hsa RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
@@ -141,7 +141,7 @@ check_c_source_compiles("
 
 target_link_libraries(pc_sampling_code_printing
   PRIVATE
-  ${ROCPROFILER_TARGET}
+  rocprofiler-v2
   rocm-dbgapi
   ${LIBELF_LIBRARIES}
   ${LIBDW_LIBRARIES}
diff --git a/samples/common/common.h b/samples/common/common.h
index 50fd4ed7..06955728 100644
--- a/samples/common/common.h
+++ b/samples/common/common.h
@@ -1,5 +1,5 @@
 #include 
-#include 
+#include 
 #include 
 #include 
 
diff --git a/samples/pcsampler/code_printing_sample/disassembly.cpp b/samples/pcsampler/code_printing_sample/disassembly.cpp
index 9760e904..d48b539d 100644
--- a/samples/pcsampler/code_printing_sample/disassembly.cpp
+++ b/samples/pcsampler/code_printing_sample/disassembly.cpp
@@ -41,7 +41,7 @@
 #include 
 #include 
 
-#include "rocprofiler.h"
+#include 
 
 #include "code_printing.hpp"
 #include "program.hpp"
diff --git a/samples/pcsampler/code_printing_sample/main.cpp b/samples/pcsampler/code_printing_sample/main.cpp
index b6abc91c..ed4374c5 100644
--- a/samples/pcsampler/code_printing_sample/main.cpp
+++ b/samples/pcsampler/code_printing_sample/main.cpp
@@ -33,7 +33,7 @@
 #include 
 #include 
 
-#include 
+#include 
 
 #include "program.hpp"
 #include "program_options.hpp"
diff --git a/samples/profiler/device_profiling_sample.cpp b/samples/profiler/device_profiling_sample.cpp
index 7effab1a..307b303f 100644
--- a/samples/profiler/device_profiling_sample.cpp
+++ b/samples/profiler/device_profiling_sample.cpp
@@ -3,7 +3,7 @@
 #include 
 #include 
 
-#include "rocprofiler.h"
+#include 
 
 int main(int argc, char** argv) {
   int poll_duration = 5;
diff --git a/src/api/CMakeLists.txt b/src/api/CMakeLists.txt
index 566b9340..22adc12c 100644
--- a/src/api/CMakeLists.txt
+++ b/src/api/CMakeLists.txt
@@ -134,16 +134,18 @@ set(GENERATED_SOURCES
 find_path(PCIACCESS_INCLUDE_DIR pciaccess.h REQUIRED)
 find_library(PCIACCESS_LIBRARIES pciaccess REQUIRED)
-set(PUBLIC_HEADERS
-    rocprofiler_plugin.h
-    rocprofiler.h)
+set(PUBLIC_HEADERS rocprofiler.h)
 foreach(header ${PUBLIC_HEADERS})
-  install(FILES ${PROJECT_SOURCE_DIR}/inc/${header}
+  install(FILES ${PROJECT_SOURCE_DIR}/include/rocprofiler/${header}
           DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${PROJECT_NAME}
           COMPONENT dev)
 endforeach()
+install(DIRECTORY ${PROJECT_SOURCE_DIR}/include/rocprofiler/v2
+        DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${PROJECT_NAME}
+        COMPONENT dev)
+
 # Getting Source files for ROCProfiler, Hardware, HSA, Memory, Session, Counters, Utils
 file(GLOB ROCPROFILER_SRC_FILES ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp)
@@ -187,8 +189,31 @@ file(GLOB CORE_UTILS_SRC_FILES ${CORE_UTILS_DIR}/*.cpp)
 set(CORE_PC_SAMPLING_DIR ${PROJECT_SOURCE_DIR}/src/pcsampler)
 file(GLOB CORE_PC_SAMPLING_FILES ${CORE_PC_SAMPLING_DIR}/core/*.cpp ${CORE_PC_SAMPLING_DIR}/gfxip/*.cpp ${CORE_PC_SAMPLING_DIR}/session/*.cpp)
+
+#### V1 Library
+# Compiling/Installing ROCProfiler API V1
+add_library(${ROCPROFILER_TARGET} SHARED ${OLD_LIB_SRC})
+set_target_properties(${ROCPROFILER_TARGET} PROPERTIES
+  CXX_VISIBILITY_PRESET hidden
+  VERSION 1.0.0
+  SOVERSION 1)
+# As ROCR hsa_api_trace header file is not usable unless AMD_INTERNAL_BUILD is defined
+target_compile_definitions(${ROCPROFILER_TARGET} PUBLIC AMD_INTERNAL_BUILD)
+target_include_directories(${ROCPROFILER_TARGET}
+  PUBLIC
+    $
+  PRIVATE
+    ${LIB_DIR} ${ROOT_DIR}
+    ${PROJECT_SOURCE_DIR}/include/rocprofiler)
+target_link_libraries(${ROCPROFILER_TARGET} PRIVATE ${AQLPROFILE_LIB} hsa-runtime64::hsa-runtime64 c stdc++)
+## Install libraries: Non versioned lib file in dev package
+# install ( TARGETS ${ROCPROFILER_TARGET} LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT dev NAMELINK_COMPONENT runtime)
+install ( TARGETS ${ROCPROFILER_TARGET} LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT runtime NAMELINK_SKIP)
+# install ( TARGETS ${ROCPROFILER_TARGET} LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT asan)
+
+#### V2 Library
 # Compiling/Installing ROCProfiler API
-add_library(${ROCPROFILER_TARGET} SHARED
+add_library(rocprofiler-v2 SHARED
   ${ROCPROFILER_SRC_FILES}
   ${ROCPROFILER_CLASS_SRC_FILES}
   ${ROCPROFILER_PROFILER_SRC_FILES}
@@ -211,47 +236,54 @@ add_library(${ROCPROFILER_TARGET} SHARED
   ${ROCPROFILER_ROCTRACER_SRC_FILES}
   ${GENERATED_SOURCES}
   ${CORE_COUNTERS_SRC_FILES}
-  ${CORE_PC_SAMPLING_FILES}
-  ${OLD_LIB_SRC})
-
-set_target_properties(${ROCPROFILER_TARGET} PROPERTIES
+  ${CORE_PC_SAMPLING_FILES})
+set_target_properties(rocprofiler-v2 PROPERTIES
   CXX_VISIBILITY_PRESET hidden
   DEFINE_SYMBOL "ROCPROFILER_EXPORTS"
   LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/exportmap
+  OUTPUT_NAME rocprofiler64
+  LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/v2
   VERSION ${PROJECT_VERSION}
   SOVERSION ${PROJECT_VERSION_MAJOR})
-
-target_compile_definitions(${ROCPROFILER_TARGET}
-  PUBLIC AMD_INTERNAL_BUILD
-  PRIVATE PROF_API_IMPL HIP_PROF_HIP_API_STRING=1 __HIP_PLATFORM_AMD__=1)
-
-target_include_directories(${ROCPROFILER_TARGET}
+target_compile_definitions(rocprofiler-v2
+  # As ROCR hsa_api_trace header file is not usable unless AMD_INTERNAL_BUILD is defined
+  PRIVATE AMD_INTERNAL_BUILD
+    PROF_API_IMPL HIP_PROF_HIP_API_STRING=1 __HIP_PLATFORM_AMD__=1)
+target_include_directories(rocprofiler-v2
   PUBLIC
-    ${ROCM_PATH}/include
     ${HIP_INCLUDE_DIRECTORIES} ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
     $
-    $
-    $
+    $
+    $
   PRIVATE
     ${LIB_DIR} ${ROOT_DIR} ${CMAKE_CURRENT_BINARY_DIR} ${PROJECT_SOURCE_DIR}
-    ${PROJECT_SOURCE_DIR}/tools
-    ${PROJECT_SOURCE_DIR}/inc)
-
+    ${PROJECT_SOURCE_DIR}/tools)
 if(ASAN)
-  target_compile_options(${ROCPROFILER_TARGET} PRIVATE -fsanitize=address)
-  target_link_options(${ROCPROFILER_TARGET} PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap -Wl,--no-undefined,-fsanitize=address)
-  target_link_libraries(${ROCPROFILER_TARGET} PRIVATE ${AQLPROFILE_LIB} hsa-runtime64::hsa-runtime64 Threads::Threads atomic asan dl c stdc++ stdc++fs amd_comgr ${PCIACCESS_LIBRARIES})
+  target_compile_options(rocprofiler-v2 PRIVATE -fsanitize=address)
+  target_link_options(rocprofiler-v2 PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap -Wl,--no-undefined,-fsanitize=address)
+  target_link_libraries(rocprofiler-v2 PRIVATE ${AQLPROFILE_LIB} hsa-runtime64::hsa-runtime64 Threads::Threads atomic asan dl c stdc++ stdc++fs amd_comgr ${PCIACCESS_LIBRARIES})
 else()
-  target_link_options(${ROCPROFILER_TARGET} PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap -Wl,--no-undefined)
-  target_link_libraries(${ROCPROFILER_TARGET} PRIVATE ${AQLPROFILE_LIB} hsa-runtime64::hsa-runtime64 Threads::Threads atomic dl c stdc++ stdc++fs amd_comgr ${PCIACCESS_LIBRARIES})
+  target_link_options(rocprofiler-v2 PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap -Wl,--no-undefined)
+  target_link_libraries(rocprofiler-v2 PRIVATE ${AQLPROFILE_LIB} hsa-runtime64::hsa-runtime64 Threads::Threads atomic dl c stdc++ stdc++fs amd_comgr ${PCIACCESS_LIBRARIES})
 endif()
-
 ## Install libraries: Non versioned lib file in dev package
-install ( TARGETS ${ROCPROFILER_TARGET} LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT dev NAMELINK_ONLY )
-install ( TARGETS ${ROCPROFILER_TARGET} LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT runtime NAMELINK_SKIP )
-install ( TARGETS ${ROCPROFILER_TARGET} LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT asan )
+# install(TARGETS rocprofiler-v2 LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT dev)
+install(TARGETS rocprofiler-v2 LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT runtime NAMELINK_SKIP)
+# install(TARGETS rocprofiler-v2 LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT asan)
+
+file(CONFIGURE OUTPUT ${CMAKE_BINARY_DIR}/librocprofiler64.so
+  CONTENT "OUTPUT_FORMAT(elf64-x86-64)\nINPUT(librocprofiler64.so.1)")
+install(FILES ${CMAKE_BINARY_DIR}/librocprofiler64.so DESTINATION ${CMAKE_INSTALL_LIBDIR}
+  PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE
+  COMPONENT runtime)
+
+file(CONFIGURE OUTPUT ${CMAKE_BINARY_DIR}/librocprofiler64v2.so
+  CONTENT "OUTPUT_FORMAT(elf64-x86-64)\nINPUT(librocprofiler64.so.${PROJECT_VERSION_MAJOR})")
+install(FILES ${CMAKE_BINARY_DIR}/librocprofiler64v2.so DESTINATION ${CMAKE_INSTALL_LIBDIR}
+  PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE
+  COMPONENT runtime)
 configure_file(${PROJECT_SOURCE_DIR}/src/core/counters/metrics/basic_counters.xml ${PROJECT_BINARY_DIR}/counters/basic_counters.xml COPYONLY)
 configure_file(${PROJECT_SOURCE_DIR}/src/core/counters/metrics/derived_counters.xml ${PROJECT_BINARY_DIR}/counters/derived_counters.xml COPYONLY)
diff --git a/src/api/exportmap b/src/api/exportmap
index 505c87f2..ab110ed1 100644
--- a/src/api/exportmap
+++ b/src/api/exportmap
@@ -1,48 +1,9 @@
-ROCPROFILER_1.0 {
+ROCPROFILER_9.0 {
 global: OnLoad;
         OnUnload;
         rocprofiler_version_major;
         rocprofiler_version_minor;
-        rocprofiler_error_string;
-        rocprofiler_open;
-        rocprofiler_add_feature;
-        rocprofiler_features_set_open;
-        rocprofiler_close;
-        rocprofiler_reset;
-        rocprofiler_get_agent;
-        rocprofiler_get_time;
-        rocprofiler_set_queue_callbacks;
-        rocprofiler_remove_queue_callbacks;
-        rocprofiler_start_queue_callbacks;
-        rocprofiler_stop_queue_callbacks;
-        rocprofiler_start;
-        rocprofiler_stop;
-        rocprofiler_read;
-        rocprofiler_get_data;
-        rocprofiler_group_count;
-        rocprofiler_get_group;
-        rocprofiler_group_start;
-        rocprofiler_group_stop;
-        rocprofiler_group_read;
-        rocprofiler_group_get_data;
-        rocprofiler_get_metrics;
-        rocprofiler_iterate_trace_data;
-        rocprofiler_get_info;
-        rocprofiler_iterate_info;
-        rocprofiler_query_info;
-        rocprofiler_queue_create_profiled;
-        rocprofiler_pool_open;
-        rocprofiler_pool_close;
-        rocprofiler_pool_fetch;
-        rocprofiler_pool_release;
-        rocprofiler_pool_iterate;
-        rocprofiler_pool_flush;
-        rocprofiler_set_hsa_callbacks;
-local: *;
-};
-
-ROCPROFILER_2.0 {
-global: HSA_AMD_TOOL_PRIORITY;
+        HSA_AMD_TOOL_PRIORITY;
         rocprofiler_error_str;
         rocprofiler_initialize;
         rocprofiler_finalize;
@@ -85,4 +46,5 @@ global: HSA_AMD_TOOL_PRIORITY;
         rocprofiler_device_profiling_session_poll;
         rocprofiler_device_profiling_session_stop;
         rocprofiler_device_profiling_session_destroy;
-} ROCPROFILER_1.0;
\ No newline at end of file
+local: *;
+};
\ No newline at end of file
diff --git a/src/api/rocmtool.cpp b/src/api/rocmtool.cpp
index 3015be74..fb1a5405 100644
--- a/src/api/rocmtool.cpp
+++ b/src/api/rocmtool.cpp
@@ -199,7 +199,7 @@ size_t rocmtool::GetKernelInfoSize(rocprofiler_kernel_info_kind_t kind,
                                    rocprofiler_kernel_id_t kernel_id) {
   switch (kind) {
     case ROCPROFILER_KERNEL_NAME:
-      return GetKernelNameFromKsymbols(kernel_id.handle).size();
+      return GetKernelNameUsingDispatchID(kernel_id.handle).size();
     default:
       warning("The provided Kernel Kind is not yet supported!");
       return 0;
@@ -209,7 +209,7 @@ const char* rocmtool::GetKernelInfo(rocprofiler_kernel_info_kind_t kind,
                                     rocprofiler_kernel_id_t kernel_id) {
   switch (kind) {
     case ROCPROFILER_KERNEL_NAME:
-      return strdup(GetKernelNameFromKsymbols(kernel_id.handle).c_str());
+      return strdup(GetKernelNameUsingDispatchID(kernel_id.handle).c_str());
     default:
       warning("The provided Kernel Kind is not yet supported!");
       return "";
diff --git a/src/api/rocmtools.cpp b/src/api/rocmtools.cpp
index 014ad84d..7b4dfcf0 100644
--- a/src/api/rocmtools.cpp
+++ b/src/api/rocmtools.cpp
@@ -720,7 +720,7 @@ rocprofiler_device_profiling_session_destroy(rocprofiler_session_id_t session_id
 }
-// static bool started{false};
+static bool started{false};
 extern "C" {
@@ -729,27 +729,28 @@ extern "C" {
 // The HSA_AMD_TOOL_PRIORITY variable must be a constant value type
 // initialized by the loader itself, not by code during _init. 'extern const'
 // seems do that although that is not a guarantee.
-// ROCPROFILER_EXPORT extern const uint32_t HSA_AMD_TOOL_PRIORITY = 25;
+ROCPROFILER_EXPORT extern const uint32_t HSA_AMD_TOOL_PRIORITY = 25;
 /**
  * @brief Callback function called upon loading the HSA.
  * The function updates the core api table function pointers to point to the
  * interceptor functions in this file.
  */
-// ROCPROFILER_EXPORT bool OnLoad(HsaApiTable* table, uint64_t runtime_version,
-//                                uint64_t failed_tool_count, const char* const* failed_tool_names) {
-//   if (started) rocmtools::fatal("HSA Tool started already!");
-//   started = true;
-//   rocmtools::hsa_support::Initialize(table);
-//   return true;
-// }
+ROCPROFILER_EXPORT bool OnLoad(HsaApiTable* table, uint64_t runtime_version,
+                               uint64_t failed_tool_count, const char* const* failed_tool_names) {
+  if (started) rocmtools::fatal("HSA Tool started already!");
+  started = true;
+  rocmtools::hsa_support::Initialize(table);
+  return true;
+}
 /**
  * @brief Callback function upon unloading the HSA.
  */
-// ROCPROFILER_EXPORT void OnUnload() {
-//   if (!started) rocmtools::fatal("HSA Tool hasn't started yet!");
-//   rocmtools::hsa_support::Finalize();
-// }
+ROCPROFILER_EXPORT void OnUnload() {
+  if (!started) rocmtools::fatal("HSA Tool hasn't started yet!");
+  rocmtools::hsa_support::Finalize();
+  started=false;
+}
 } // extern "C"
diff --git a/src/core/activity.cpp b/src/core/activity.cpp
index a4cdb90a..30862522 100644
--- a/src/core/activity.cpp
+++ b/src/core/activity.cpp
@@ -20,7 +20,6 @@ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE.
 *******************************************************************************/
-#define ROCP_INTERNAL_BUILD
 #include "activity.h"
 #include
diff --git a/src/core/activity.h b/src/core/activity.h
index a7d77380..f82f4773 100644
--- a/src/core/activity.h
+++ b/src/core/activity.h
@@ -1,13 +1,7 @@
 #ifndef _SRC_CORE_ACTIVITY_H
 #define _SRC_CORE_ACTIVITY_H
-#define ROCPROFILER_V1
-
-#ifdef ROCP_INTERNAL_BUILD
-#include "inc/rocprofiler.h"
-#else
-#include
-#endif
+#include "rocprofiler.h"
 #include
diff --git a/src/core/context.h b/src/core/context.h
index b66b1584..4b999b84 100644
--- a/src/core/context.h
+++ b/src/core/context.h
@@ -23,7 +23,7 @@ THE SOFTWARE.
 #ifndef SRC_CORE_CONTEXT_H_
 #define SRC_CORE_CONTEXT_H_
-#include "inc/rocprofiler.h"
+#include "rocprofiler.h"
 #include
 #include
diff --git a/src/core/context_pool.h b/src/core/context_pool.h
index c5c45cc5..68eafb39 100644
--- a/src/core/context_pool.h
+++ b/src/core/context_pool.h
@@ -23,7 +23,7 @@ THE SOFTWARE.
 #ifndef SRC_CORE_CONTEXT_POOL_H_
 #define SRC_CORE_CONTEXT_POOL_H_
-#include "inc/rocprofiler.h"
+#include "rocprofiler.h"
 #include
diff --git a/src/core/counters/mmio/perfmon.h b/src/core/counters/mmio/perfmon.h
index 9c90c234..f7ad8c8d 100644
--- a/src/core/counters/mmio/perfmon.h
+++ b/src/core/counters/mmio/perfmon.h
@@ -21,7 +21,7 @@
 #ifndef SRC_CORE_COUNTERS_PERFMON_H
 #define SRC_CORE_COUNTERS_PERFMON_H
-#include "inc/rocprofiler.h"
+#include "rocprofiler.h"
 #include "mmio.h"
 #include
diff --git a/src/core/hsa/hsa_support.cpp b/src/core/hsa/hsa_support.cpp
index b2448cc4..c3c22abf 100644
--- a/src/core/hsa/hsa_support.cpp
+++ b/src/core/hsa/hsa_support.cpp
@@ -52,6 +52,7 @@ namespace {
 hsa_status_t hsa_executable_iteration_callback(hsa_executable_t executable, hsa_agent_t agent,
                                                hsa_executable_symbol_t symbol, void* args) {
+
   hsa_symbol_kind_t type;
   rocmtools::hsa_support::GetCoreApiTable().hsa_executable_symbol_get_info_fn(
       symbol, HSA_EXECUTABLE_SYMBOL_INFO_TYPE, &type);
@@ -62,14 +63,21 @@ hsa_status_t hsa_executable_iteration_callback(hsa_executable_t executable, hsa_
     // TODO(aelwazir): to be removed if the HSA fixed the issue of corrupted
     // names overflowing the length given
     if (name_length > 1) {
-      char name[name_length + 1];
-      uint64_t kernel_object;
-      rocmtools::hsa_support::GetCoreApiTable().hsa_executable_symbol_get_info_fn(
-          symbol, HSA_EXECUTABLE_SYMBOL_INFO_NAME, name);
-      rocmtools::hsa_support::GetCoreApiTable().hsa_executable_symbol_get_info_fn(
-          symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT, &kernel_object);
-      std::string kernel_name = std::string(name).substr(0, name_length);
-      rocmtools::AddKernelName(kernel_object, kernel_name);
+      if(!(*static_cast(args))) {
+        char name[name_length + 1];
+        uint64_t kernel_object;
+        rocmtools::hsa_support::GetCoreApiTable().hsa_executable_symbol_get_info_fn(
+            symbol, HSA_EXECUTABLE_SYMBOL_INFO_NAME, name);
+        rocmtools::hsa_support::GetCoreApiTable().hsa_executable_symbol_get_info_fn(
+            symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT, &kernel_object);
+        std::string kernel_name = std::string(name).substr(0, name_length);
+        rocmtools::AddKernelName(kernel_object, kernel_name);
+      } else {
+        uint64_t kernel_object;
+        rocmtools::hsa_support::GetCoreApiTable().hsa_executable_symbol_get_info_fn(
+            symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT, &kernel_object);
+        rocmtools::RemoveKernelName(kernel_object);
+      }
     }
   }
@@ -447,7 +455,7 @@ hsa_status_t CodeObjectCallback(hsa_executable_t executable,
   ReportActivity(ACTIVITY_DOMAIN_HSA_EVT, HSA_EVT_ID_CODEOBJ, &data);
   hsa_executable_iterate_agent_symbols(executable, data.codeobj.agent,
-                                       hsa_executable_iteration_callback, nullptr);
+                                       hsa_executable_iteration_callback, &(data.codeobj.unload));
   return HSA_STATUS_SUCCESS;
 }
diff --git a/src/core/hsa/packets/packets_generator.h b/src/core/hsa/packets/packets_generator.h
index 05c92dcd..29a562b2 100644
--- a/src/core/hsa/packets/packets_generator.h
+++ b/src/core/hsa/packets/packets_generator.h
@@ -20,7 +20,7 @@
 #ifndef SRC_CORE_HSA_PACKETS_PACKETS_GENERATOR_H_
 #define SRC_CORE_HSA_PACKETS_PACKETS_GENERATOR_H_
-#include "inc/rocprofiler.h"
+#include "rocprofiler.h"
 #include
 #include
diff --git a/src/core/hsa/queues/queue.cpp b/src/core/hsa/queues/queue.cpp
index 26213bbf..f9fc853b 100644
--- a/src/core/hsa/queues/queue.cpp
+++ b/src/core/hsa/queues/queue.cpp
@@ -27,7 +27,6 @@
 #include
 #include
-#include "rocprofiler.h"
 #include "src/api/rocmtool.h"
 #include "src/core/hsa/packets/packets_generator.h"
 #include "src/core/hsa/hsa_support.h"
@@ -63,15 +62,47 @@ void AddKernelName(uint64_t handle, std::string name) {
   std::lock_guard lock(ksymbol_map_lock);
   ksymbols->emplace(handle, name);
 }
+void RemoveKernelName(uint64_t handle) {
+  std::lock_guard lock(ksymbol_map_lock);
+  ksymbols->erase(handle);
+}
 std::string GetKernelNameFromKsymbols(uint64_t handle) {
   std::lock_guard lock(ksymbol_map_lock);
   return ksymbols->at(handle);
 }
+
+static std::mutex kernel_names_map_lock;
+static std::map>* kernel_names;
+static std::atomic kernel_names_flag{true};
+void AddKernelNameWithDispatchID(std::string name, uint64_t id) {
+  std::lock_guard lock(kernel_names_map_lock);
+  if(kernel_names->find(name) == kernel_names->end())
+    kernel_names->emplace(name, std::vector());
+  kernel_names->at(name).push_back(id);
+}
+std::string GetKernelNameUsingDispatchID(uint64_t given_id) {
+  std::lock_guard lock(kernel_names_map_lock);
+  for(auto kernel_name : (*kernel_names)) {
+    for(auto dispatch_id : kernel_name.second) {
+      if(dispatch_id == given_id)
+        return kernel_name.first;
+    }
+  }
+  return "Unknown Kernel!";
+}
+
 void InitKsymbols() {
   if (ksymbols_flag.load(std::memory_order_relaxed)) {
-    std::lock_guard lock(ksymbol_map_lock);
-    ksymbols = new std::map();
-    ksymbols_flag.exchange(false, std::memory_order_release);
+    {
+      std::lock_guard lock(ksymbol_map_lock);
+      ksymbols = new std::map();
+      ksymbols_flag.exchange(false, std::memory_order_release);
+    }
+    {
+      std::lock_guard lock(kernel_names_map_lock);
+      kernel_names = new std::map>();
+      kernel_names_flag.exchange(false, std::memory_order_release);
+    }
   }
 }
 void FinitKsymbols() {
@@ -81,8 +112,16 @@ void FinitKsymbols() {
     delete ksymbols;
     ksymbols_flag.exchange(true, std::memory_order_release);
   }
+  if (!kernel_names_flag.load(std::memory_order_relaxed)) {
+    std::lock_guard lock(kernel_names_map_lock);
+    kernel_names->clear();
+    delete kernel_names;
+    kernel_names_flag.exchange(true, std::memory_order_release);
+  }
 }
+
+
 struct kernel_descriptor_t {
   uint8_t reserved0[16];
   int64_t kernel_code_entry_byte_offset;
@@ -413,7 +452,6 @@ bool AsyncSignalHandler(hsa_signal_value_t signal_value, void* data) {
     hsa_support::GetAmdExtTable().hsa_amd_profiling_get_dispatch_time_fn(
         queue_info_session->agent, pending.signal, &time);
     rocprofiler_record_profiler_t record{};
-    record.kernel_id = rocprofiler_kernel_id_t{pending.kernel_descriptor};
     record.gpu_id = rocprofiler_agent_id_t{
         (uint64_t)hsa_support::GetAgentInfo(queue_info_session->agent.handle).getIndex()};
     record.kernel_properties = pending.kernel_properties;
@@ -426,7 +464,8 @@ bool AsyncSignalHandler(hsa_signal_value_t signal_value, void* data) {
       AddRecordCounters(&record, pending);
     }
     record.header = {ROCPROFILER_PROFILER_RECORD,
-                     rocprofiler_record_id_t{GetROCMToolObj()->GetUniqueRecordId()}};
+                     rocprofiler_record_id_t{pending.kernel_descriptor}};
+    record.kernel_id = rocprofiler_kernel_id_t{pending.kernel_descriptor};
     if (pending.session_id.handle == 0) {
       pending.session_id = GetROCMToolObj()->GetCurrentSessionId();
@@ -506,7 +545,7 @@ bool AsyncSignalHandlerATT(hsa_signal_value_t /* signal */, void* data) {
       AddAttRecord(&record, queue_info_session->agent, pending);
     }
     record.header = {ROCPROFILER_ATT_TRACER_RECORD,
-                     rocprofiler_record_id_t{GetROCMToolObj()->GetUniqueRecordId()}};
+                     rocprofiler_record_id_t{pending.kernel_descriptor}};
     if (pending.session_id.handle == 0) {
       pending.session_id = GetROCMToolObj()->GetCurrentSessionId();
@@ -737,14 +776,16 @@ void WriteInterceptor(const void* packets, uint64_t pkt_count, uint64_t user_pkt
         rocprofiler_kernel_properties_t kernel_properties =
             set_kernel_properties(dispatch_packet, queue_info.GetGPUAgent());
         if (session) {
+          uint64_t record_id = GetROCMToolObj()->GetUniqueRecordId();
+          AddKernelNameWithDispatchID(GetKernelNameFromKsymbols(dispatch_packet.kernel_object), record_id);
           if (profiles && replay_mode_count > 0) {
             session->GetProfiler()->AddPendingSignals(
-                writer_id, dispatch_packet.kernel_object, dispatch_packet.completion_signal,
+                writer_id, record_id, dispatch_packet.completion_signal,
                 session_id, buffer_id, profile.first, profile.first->metrics_list.size(),
                 profile.second, kernel_properties, (uint32_t)syscall(__NR_gettid), user_pkt_index);
           } else {
             session->GetProfiler()->AddPendingSignals(
-                writer_id, dispatch_packet.kernel_object, dispatch_packet.completion_signal,
+                writer_id, record_id, dispatch_packet.completion_signal,
                 session_id, buffer_id, nullptr, 0, nullptr, kernel_properties,
                 (uint32_t)syscall(__NR_gettid), user_pkt_index);
           }
@@ -926,13 +967,15 @@ void WriteInterceptor(const void* packets, uint64_t pkt_count, uint64_t user_pkt
         // list to be processed by the signal interrupt
         rocprofiler_kernel_properties_t kernel_properties =
             set_kernel_properties(dispatch_packet, queue_info.GetGPUAgent());
+        uint64_t record_id = GetROCMToolObj()->GetUniqueRecordId();
+        AddKernelNameWithDispatchID(GetKernelNameFromKsymbols(dispatch_packet.kernel_object), record_id);
         if (session && profile) {
           session->GetAttTracer()->AddPendingSignals(
-              writer_id, dispatch_packet.kernel_object, dispatch_packet.completion_signal, session_id,
+              writer_id, record_id, dispatch_packet.completion_signal, session_id,
               buffer_id, profile, kernel_properties, (uint32_t)syscall(__NR_gettid), user_pkt_index);
         } else {
           session->GetAttTracer()->AddPendingSignals(
-              writer_id, dispatch_packet.kernel_object, dispatch_packet.completion_signal, session_id,
+              writer_id, record_id, dispatch_packet.completion_signal, session_id,
               buffer_id, nullptr, kernel_properties, (uint32_t)syscall(__NR_gettid), user_pkt_index);
         }
diff --git a/src/core/hsa/queues/queue.h b/src/core/hsa/queues/queue.h
index 1999b1ce..1b1ee121 100644
--- a/src/core/hsa/queues/queue.h
+++ b/src/core/hsa/queues/queue.h
@@ -21,7 +21,7 @@
 #ifndef SRC_CORE_HSA_QUEUES_QUEUE_H_
 #define SRC_CORE_HSA_QUEUES_QUEUE_H_
-#include "inc/rocprofiler.h"
+#include "rocprofiler.h"
 #include
 #include
@@ -42,6 +42,9 @@ namespace rocmtools {
 void InitKsymbols();
 void FinitKsymbols();
 void AddKernelName(uint64_t handle, std::string kernel_name);
+void RemoveKernelName(uint64_t handle);
+void AddKernelNameWithDispatchID(std::string name, uint64_t id);
+std::string GetKernelNameUsingDispatchID(uint64_t given_id);
 std::string GetKernelNameFromKsymbols(uint64_t handle);
 uint32_t GetCurrentActiveInterruptSignalsCount();
diff --git a/src/core/hsa_interceptor.h b/src/core/hsa_interceptor.h
index abad9b94..151fec64 100644
--- a/src/core/hsa_interceptor.h
+++ b/src/core/hsa_interceptor.h
@@ -33,7 +33,7 @@ SOFTWARE.
 #include
 #include
-#include "inc/rocprofiler.h"
+#include "rocprofiler.h"
 #include "util/exception.h"
 #include "util/hsa_rsrc_factory.h"
diff --git a/src/core/intercept_queue.h b/src/core/intercept_queue.h
index 103d5c39..f97c8104 100644
--- a/src/core/intercept_queue.h
+++ b/src/core/intercept_queue.h
@@ -36,7 +36,7 @@ THE SOFTWARE.
 #include "core/proxy_queue.h"
 #include "core/tracker.h"
 #include "core/types.h"
-#include "inc/rocprofiler.h"
+#include "rocprofiler.h"
 #include "util/hsa_rsrc_factory.h"
 namespace rocprofiler {
diff --git a/src/core/memory/generic_buffer.h b/src/core/memory/generic_buffer.h
index 65c0f827..c8dedb0e 100644
--- a/src/core/memory/generic_buffer.h
+++ b/src/core/memory/generic_buffer.h
@@ -20,7 +20,7 @@
 #ifndef SRC_CORE_MEMORY_GENERIC_BUFFER_H_
 #define SRC_CORE_MEMORY_GENERIC_BUFFER_H_
-#include "inc/rocprofiler.h"
+#include "rocprofiler.h"
 #include
 #include
diff --git a/src/core/profile.h b/src/core/profile.h
index 904e3095..90d29c53 100644
--- a/src/core/profile.h
+++ b/src/core/profile.h
@@ -23,7 +23,7 @@ THE SOFTWARE.
 #ifndef SRC_CORE_PROFILE_H_
 #define SRC_CORE_PROFILE_H_
-#include "inc/rocprofiler.h"
+#include "rocprofiler.h"
 #include
 #include
diff --git a/src/core/rocprofiler.cpp b/src/core/rocprofiler.cpp
index 59082129..02581d18 100644
--- a/src/core/rocprofiler.cpp
+++ b/src/core/rocprofiler.cpp
@@ -20,7 +20,7 @@ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE.
 *******************************************************************************/
-#include "inc/rocprofiler.h"
+#include "rocprofiler.h"
 #include
 #include
@@ -398,11 +398,6 @@ ROCPROFILER_EXPORT extern const uint32_t HSA_AMD_TOOL_PRIORITY = 25;
 PUBLIC_API bool OnLoad(HsaApiTable* table, uint64_t runtime_version, uint64_t failed_tool_count,
                        const char* const* failed_tool_names) {
   ONLOAD_TRACE_BEG();
-  if (started) rocmtools::fatal("HSA Tool started already!");
-  started = true;
-  if (!getenv("ROCP_TOOL_LIB") && !getenv("ROCP_HSA_INTERCEPT")) {
-    rocmtools::hsa_support::Initialize(table);
-  } else {
     rocprofiler::SaveHsaApi(table);
     rocprofiler::ProxyQueue::InitFactory();
@@ -469,20 +464,14 @@ PUBLIC_API bool OnLoad(HsaApiTable* table, uint64_t runtime_version, uint64_t fa
     ONLOAD_TRACE("end intercept_mode(" << std::hex << intercept_env_value << ")"
                  << " intercept_mode_mask(" << std::hex << intercept_mode_mask << ")"
                  << std::dec);
-  }
   return true;
 }
 // HSA-runtime tool on-unload method
 PUBLIC_API void OnUnload() {
   ONLOAD_TRACE_BEG();
-  if (!started) rocmtools::fatal("HSA Tool hasn't started yet!");
-  if (!getenv("ROCP_TOOL_LIB") && !getenv("ROCP_HSA_INTERCEPT")) {
-    rocmtools::hsa_support::Finalize();
-  } else {
     rocprofiler::UnloadTool();
     rocprofiler::RestoreHsaApi();
-  }
   ONLOAD_TRACE_END();
 }
@@ -755,7 +744,7 @@ PUBLIC_API hsa_status_t rocprofiler_iterate_info(
   rocprofiler::util::HsaRsrcFactory* hsa_rsrc = &rocprofiler::util::HsaRsrcFactory::Instance();
   rocprofiler_info_data_t info{};
   info.kind = kind;
-  uint32_t agent_idx = 0;
+  uint32_t agent_idx = hsa_rsrc->GetCountOfCpuAgents();
   uint32_t agent_max = 0;
   const rocprofiler::util::AgentInfo* agent_info = NULL;
diff --git a/src/core/session/att/att.h b/src/core/session/att/att.h
index 85360db7..2db25532 100644
--- a/src/core/session/att/att.h
+++ b/src/core/session/att/att.h
@@ -28,7 +28,7 @@
 #include
 #include
-#include "inc/rocprofiler.h"
+#include "rocprofiler.h"
 namespace rocmtools {
diff --git a/src/core/session/device_profiling.h b/src/core/session/device_profiling.h
index 658c4538..056631b9 100644
--- a/src/core/session/device_profiling.h
+++ b/src/core/session/device_profiling.h
@@ -21,7 +21,7 @@
 #ifndef SRC_CORE_SESSION_DEVICE_PROFILING_H_
 #define SRC_CORE_SESSION_DEVICE_PROFILING_H_
-#include
+#include "rocprofiler.h"
 #include "src/core/hsa/packets/packets_generator.h"
 #include
 // #include "src/core/counters/rdc/rdc_metrics.h"
diff --git a/src/core/session/filter.h b/src/core/session/filter.h
index 31a4be42..a0bcb23b 100644
--- a/src/core/session/filter.h
+++ b/src/core/session/filter.h
@@ -25,7 +25,7 @@
 #include
 #include
-#include "inc/rocprofiler.h"
+#include "rocprofiler.h"
 #define ASSERTM(exp, msg) assert(((void)msg, exp))
diff --git a/src/core/session/profiler/profiler.h b/src/core/session/profiler/profiler.h
index cbac757c..1fc1ee69 100644
--- a/src/core/session/profiler/profiler.h
+++ b/src/core/session/profiler/profiler.h
@@ -31,7 +31,7 @@
 #include
 #include
-#include "inc/rocprofiler.h"
+#include "rocprofiler.h"
 #include "src/core/counters/basic/basic_counter.h"
 #include "src/core/counters/metrics/eval_metrics.h"
diff --git a/src/core/session/session.h b/src/core/session/session.h
index dcba49b3..a0da53ce 100644
--- a/src/core/session/session.h
+++ b/src/core/session/session.h
@@ -32,7 +32,7 @@
 #include
 #include
-#include "inc/rocprofiler.h"
+#include "rocprofiler.h"
 #include "src/core/memory/generic_buffer.h"
 #include "src/core/session/filter.h"
 #include "profiler/profiler.h"
diff --git a/src/core/session/spm/spm.h b/src/core/session/spm/spm.h
index d25f7217..e1cb9365 100644
--- a/src/core/session/spm/spm.h
+++ b/src/core/session/spm/spm.h
@@ -10,7 +10,7 @@
 #include "hsa/hsa_ext_amd.h"
 #include "src/core/hsa/packets/packets_generator.h"
 #include "src/utils/exception.h"
-#include "inc/rocprofiler.h"
+#include "rocprofiler.h"
 namespace rocmtools {
diff --git a/src/core/session/tracer/src/roctracer.h b/src/core/session/tracer/src/roctracer.h
index 8bf6b893..90a9585e 100644
--- a/src/core/session/tracer/src/roctracer.h
+++ b/src/core/session/tracer/src/roctracer.h
@@ -33,7 +33,7 @@
 #include "hip_ostream_ops.h"
 #include "hsa_ostream_ops.h"
 #include "hsa_prof_str.h"
-#include "inc/rocprofiler.h"
+#include "rocprofiler.h"
 #include "src/core/memory/generic_buffer.h"
 typedef struct {
diff --git a/src/core/session/tracer/tracer.h b/src/core/session/tracer/tracer.h
index c698959e..42c0cfe2 100644
--- a/src/core/session/tracer/tracer.h
+++ b/src/core/session/tracer/tracer.h
@@ -27,7 +27,7 @@
 #include
 #include
-#include "inc/rocprofiler.h"
+#include "rocprofiler.h"
 #include "src/roctracer.h"
 typedef bool is_filtered_domain_t;
diff --git a/src/core/tracker.h b/src/core/tracker.h
index 8d9c29f9..7c1fdd35 100644
--- a/src/core/tracker.h
+++ b/src/core/tracker.h
@@ -33,7 +33,7 @@ THE SOFTWARE.
 #include
 #include "util/hsa_rsrc_factory.h"
-#include "inc/rocprofiler.h"
+#include "rocprofiler.h"
 #include "util/exception.h"
 #include "util/logger.h"
diff --git a/src/pcsampler/gfxip/gfxip.cpp b/src/pcsampler/gfxip/gfxip.cpp
index 6fe291ed..e43f78c1 100644
--- a/src/pcsampler/gfxip/gfxip.cpp
+++ b/src/pcsampler/gfxip/gfxip.cpp
@@ -29,6 +29,8 @@
 #include
+#include "rocprofiler.h"
+
 #include "gfxip.h"
 #include "src/utils/helper.h"
diff --git a/src/tools/CMakeLists.txt b/src/tools/CMakeLists.txt
index 4d1a7236..df9aa71c 100644
--- a/src/tools/CMakeLists.txt
+++ b/src/tools/CMakeLists.txt
@@ -18,7 +18,6 @@ set_target_properties(rocprofiler_tool PROPERTIES
 target_include_directories(rocprofiler_tool PRIVATE
   ${PROJECT_SOURCE_DIR}
   ${CMAKE_CURRENT_SOURCE_DIR}
-  ${PROJECT_SOURCE_DIR}/inc
   ${PROJECT_SOURCE_DIR}/src)
 target_compile_definitions(rocprofiler_tool
@@ -26,10 +25,10 @@ target_compile_definitions(rocprofiler_tool
 if(ASAN)
   target_compile_options(rocprofiler_tool PRIVATE -fsanitize=address)
-  target_link_libraries(rocprofiler_tool ${ROCPROFILER_TARGET} hsa-runtime64::hsa-runtime64 Threads::Threads atomic asan dl rt stdc++fs amd_comgr)
+  target_link_libraries(rocprofiler_tool rocprofiler-v2 hsa-runtime64::hsa-runtime64 Threads::Threads atomic asan dl rt stdc++fs amd_comgr)
   target_link_options(rocprofiler_tool PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap -Wl,--no-undefined,-fsanitize=address)
 else()
-  target_link_libraries(rocprofiler_tool ${ROCPROFILER_TARGET} hsa-runtime64::hsa-runtime64 Threads::Threads atomic dl rt stdc++fs amd_comgr)
+  target_link_libraries(rocprofiler_tool rocprofiler-v2 hsa-runtime64::hsa-runtime64 Threads::Threads atomic dl rt stdc++fs amd_comgr)
   target_link_options(rocprofiler_tool PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap -Wl,--no-undefined)
 endif()
@@ -44,9 +43,8 @@ add_subdirectory(amdsys)
 add_subdirectory(rocprofv2)
 add_executable(ctrl ctrl.cpp)
-target_include_directories(ctrl PRIVATE ${PROJECT_SOURCE_DIR}/inc)
 target_link_options(rocprofiler_tool PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap -Wl,--no-undefined)
-target_link_libraries(ctrl PRIVATE ${ROCPROFILER_TARGET} hsa-runtime64::hsa-runtime64)
+target_link_libraries(ctrl PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64)
 install(TARGETS ctrl RUNTIME DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/rocprofiler COMPONENT runtime)
diff --git a/src/tools/ctrl.cpp b/src/tools/ctrl.cpp
index 3b58e206..a9c13302 100644
--- a/src/tools/ctrl.cpp
+++ b/src/tools/ctrl.cpp
@@ -1,5 +1,5 @@
 #include
-#include
+#include "rocprofiler.h"
 #include
diff --git a/src/tools/tool.cpp b/src/tools/tool.cpp
index 7b2f857f..6d2f5bed 100644
--- a/src/tools/tool.cpp
+++ b/src/tools/tool.cpp
@@ -18,15 +18,12 @@ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE.
*/ -#define ROCPROFILER_V2 - #include #include #include #include -#include -#include -#include +#include "rocprofiler.h" +#include "rocprofiler_plugin.h" #include #include #include @@ -54,7 +51,6 @@ #include #include -#include "rocprofiler.h" #include "utils/helper.h" namespace fs = std::experimental::filesystem; @@ -409,7 +405,7 @@ ROCPROFILER_EXPORT extern const uint32_t HSA_AMD_TOOL_PRIORITY = 1025; The function updates the core api table function pointers to point to the interceptor functions in this file. */ -ROCPROFILER_EXPORT bool OnLoad(HsaApiTable* table, uint64_t runtime_version, +ROCPROFILER_EXPORT bool OnLoad(void* table, uint64_t runtime_version, uint64_t failed_tool_count, const char* const* failed_tool_names) { if (rocprofiler_version_major() != ROCPROFILER_VERSION_MAJOR || rocprofiler_version_minor() < ROCPROFILER_VERSION_MINOR) { diff --git a/src/utils/exception.h b/src/utils/exception.h index 14e105f9..df739eb8 100644 --- a/src/utils/exception.h +++ b/src/utils/exception.h @@ -26,7 +26,7 @@ #include #include "helper.h" -#include "inc/rocprofiler.h" +#include "rocprofiler.h" // TODO(aelwazir): namespace rocmtool namespace rocmtools { diff --git a/test/CMakeLists.txt b/test/CMakeLists.txt index 625dcd7c..fd99d3fb 100644 --- a/test/CMakeLists.txt +++ b/test/CMakeLists.txt @@ -47,7 +47,7 @@ include_directories(${HSA_RUNTIME_INC_PATH}) ## C test add_executable ( "c_test" ${TEST_DIR}/app/c_test.c ) -target_include_directories ( "c_test" PRIVATE ${ROOT_DIR} $ ) +target_include_directories ( "c_test" PRIVATE ${ROOT_DIR} ${PROJECT_SOURCE_DIR}/include $ ) ## Util sources file( GLOB UTIL_SRC "${TEST_DIR}/util/*.cpp" ) @@ -93,22 +93,22 @@ add_custom_target( mytest ## Building standalone test executable add_executable ( ${ST_EXE_NAME} ${ST_TST_SRC} ${UTIL_SRC} ${KERN_SRC} ) -target_include_directories ( ${ST_EXE_NAME} PRIVATE ${TEST_DIR} ${ROOT_DIR} ) +target_include_directories ( ${ST_EXE_NAME} PRIVATE ${TEST_DIR} ${ROOT_DIR} ${PROJECT_SOURCE_DIR}/include ) 
target_link_libraries ( ${ST_EXE_NAME} ${ROCPROFILER_TARGET} hsa-runtime64::hsa-runtime64 hsakmt::hsakmt Threads::Threads dl ) ## Building standalone intercept test executable add_executable ( ${STIN_EXE_NAME} ${STIN_TST_SRC} ${UTIL_SRC} ${KERN_SRC} ) -target_include_directories ( ${STIN_EXE_NAME} PRIVATE ${TEST_DIR} ${ROOT_DIR} ) +target_include_directories ( ${STIN_EXE_NAME} PRIVATE ${TEST_DIR} ${ROOT_DIR} ${PROJECT_SOURCE_DIR}/include ) target_link_libraries ( ${STIN_EXE_NAME} ${ROCPROFILER_TARGET} hsa-runtime64::hsa-runtime64 hsakmt::hsakmt Threads::Threads dl ) ## Building intercept test executable add_library ( ${IN_EXE_NAME} SHARED ${IN_TST_SRC} ${UTIL_SRC} ${KERN_SRC} ) -target_include_directories ( ${IN_EXE_NAME} PRIVATE ${TEST_DIR} ${ROOT_DIR} ) +target_include_directories ( ${IN_EXE_NAME} PRIVATE ${TEST_DIR} ${ROOT_DIR} ${PROJECT_SOURCE_DIR}/include ) target_link_libraries ( ${IN_EXE_NAME} ${ROCPROFILER_TARGET} hsa-runtime64::hsa-runtime64 hsakmt::hsakmt Threads::Threads dl ) ## Building ctrl test executable add_executable ( ${EXE_NAME} ${CTRL_SRC} ${UTIL_SRC} ${KERN_SRC} ) -target_include_directories ( ${EXE_NAME} PRIVATE ${TEST_DIR} ${ROOT_DIR} ) +target_include_directories ( ${EXE_NAME} PRIVATE ${TEST_DIR} ${ROOT_DIR} ${PROJECT_SOURCE_DIR}/include ) target_link_libraries ( ${EXE_NAME} hsa-runtime64::hsa-runtime64 hsakmt::hsakmt Threads::Threads dl ) execute_process ( COMMAND sh -xc "cp ${TEST_DIR}/run.sh ${PROJECT_BINARY_DIR}" ) execute_process ( COMMAND sh -xc "cp ${TEST_DIR}/tool/*.xml ${PROJECT_BINARY_DIR}" ) @@ -118,7 +118,7 @@ execute_process ( COMMAND sh -xc "mkdir -p ${PROJECT_BINARY_DIR}/RESULTS" ) set ( TEST_LIB "rocprof-tool" ) set ( TEST_LIB_SRC ${TEST_DIR}/tool/tool.cpp ${UTIL_SRC} ) add_library ( ${TEST_LIB} SHARED ${TEST_LIB_SRC} ) -target_include_directories ( ${TEST_LIB} PRIVATE ${TEST_DIR} ${ROOT_DIR} ) +target_include_directories ( ${TEST_LIB} PRIVATE ${TEST_DIR} ${ROOT_DIR} ${PROJECT_SOURCE_DIR}/include ) target_link_libraries ( 
${TEST_LIB} ${ROCPROFILER_TARGET} hsa-runtime64::hsa-runtime64 Threads::Threads dl ) ## Build memory test bench diff --git a/test/app/c_test.c b/test/app/c_test.c index 3d0b5d53..ec0ba39f 100644 --- a/test/app/c_test.c +++ b/test/app/c_test.c @@ -19,7 +19,6 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. *******************************************************************************/ -#define ROCPROFILER_V1 -#include "inc/rocprofiler.h" +#include "rocprofiler/rocprofiler.h" const int ret = 0; int main() { return ret; } diff --git a/test/app/intercept_test.cpp b/test/app/intercept_test.cpp index 9226b4af..5ad96499 100644 --- a/test/app/intercept_test.cpp +++ b/test/app/intercept_test.cpp @@ -30,7 +30,7 @@ THE SOFTWARE. #include #include -#include "inc/rocprofiler.h" +#include "rocprofiler/rocprofiler.h" #include "util/hsa_rsrc_factory.h" #define PUBLIC_API __attribute__((visibility("default"))) diff --git a/test/app/stand_intercept_test.cpp b/test/app/stand_intercept_test.cpp index 32d7e6ed..0a4b92f4 100644 --- a/test/app/stand_intercept_test.cpp +++ b/test/app/stand_intercept_test.cpp @@ -30,7 +30,7 @@ THE SOFTWARE. #include "ctrl/run_kernel.h" #include "ctrl/test_aql.h" #include "ctrl/test_hsa.h" -#include "inc/rocprofiler.h" +#include "rocprofiler/rocprofiler.h" #include "dummy_kernel/dummy_kernel.h" #include "simple_convolution/simple_convolution.h" #include "util/test_assert.h" diff --git a/test/app/standalone_test.cpp b/test/app/standalone_test.cpp index 7986c2a3..707430e2 100644 --- a/test/app/standalone_test.cpp +++ b/test/app/standalone_test.cpp @@ -28,7 +28,7 @@ THE SOFTWARE. 
#include "ctrl/run_kernel.h" #include "ctrl/test_aql.h" #include "ctrl/test_hsa.h" -#include "inc/rocprofiler.h" +#include "rocprofiler/rocprofiler.h" #include "dummy_kernel/dummy_kernel.h" #include "simple_convolution/simple_convolution.h" #include "util/hsa_rsrc_factory.h" diff --git a/test/tool/tool.cpp b/test/tool/tool.cpp index 5bd7412d..577f9177 100644 --- a/test/tool/tool.cpp +++ b/test/tool/tool.cpp @@ -51,7 +51,7 @@ THE SOFTWARE. #include #include -#include "inc/rocprofiler.h" +#include "rocprofiler/rocprofiler.h" #include "util/hsa_rsrc_factory.h" #include "util/xml.h" @@ -1170,6 +1170,12 @@ extern "C" PUBLIC_API void OnLoadToolProp(rocprofiler_settings_t* settings) gpu_index_vec = new std::vector; get_xml_array(xml, "top.metric", "gpu_index", ",", gpu_index_vec, " "); + // Skipping cpu count to get to correct gpu index + const uint32_t cpu_count = HsaRsrcFactory::Instance().GetCountOfCpuAgents(); + std::transform(gpu_index_vec->begin(), gpu_index_vec->end(), + gpu_index_vec->begin(), + [&](int count) { return count + cpu_count; }); + // Skipping cpu count to get to correct gpu index const uint32_t cpu_count = HsaRsrcFactory::Instance().GetCountOfCpuAgents(); std::transform(gpu_index_vec->begin(), gpu_index_vec->end(), diff --git a/test/util/hsa_rsrc_factory.h b/test/util/hsa_rsrc_factory.h index cc5db829..1c557c5f 100644 --- a/test/util/hsa_rsrc_factory.h +++ b/test/util/hsa_rsrc_factory.h @@ -25,8 +25,6 @@ POSSIBILITY OF SUCH DAMAGE. 
#ifndef TEST_UTIL_HSA_RSRC_FACTORY_H_ #define TEST_UTIL_HSA_RSRC_FACTORY_H_ -// #define AMD_INTERNAL_BUILD - #include #include #include diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt index 045fbcc4..b1ed6983 100644 --- a/tests/CMakeLists.txt +++ b/tests/CMakeLists.txt @@ -6,4 +6,5 @@ add_custom_target(check COMMAND ${PROJECT_BINARY_DIR}/run_tests.sh DEPENDS tests add_subdirectory(unittests) add_subdirectory(featuretests) add_subdirectory(memorytests) +add_subdirectory(microbenchmarks) configure_file(run_tests.sh ${PROJECT_BINARY_DIR} COPYONLY) \ No newline at end of file diff --git a/tests/featuretests/gtests_main.cpp b/tests/featuretests/gtests_main.cpp index 646fb114..d36e22a9 100644 --- a/tests/featuretests/gtests_main.cpp +++ b/tests/featuretests/gtests_main.cpp @@ -8,7 +8,8 @@ int main(int argc, char** argv) { testing::InitGoogleTest(&argc, argv); testing::FLAGS_gtest_death_test_style = "threadsafe"; // Add line below to disable any problematic test - testing::GTEST_FLAG(filter) = "-OpenMPTest.*:ProfilerSPMTest*"; + testing::GTEST_FLAG(filter) = + "-OpenMPTest.*:ProfilerSPMTest*:ProfilerMQTest*:ProfilerMPTest*:MPITest*"; // Disable ATT test fir gfx10 GPUs until its supported hsa_init(); // iterate for gpu's @@ -18,7 +19,9 @@ int main(int argc, char** argv) { hsa_agent_get_info(agent, HSA_AGENT_INFO_NAME, gpu_name); std::string gfx_name = gpu_name; if (gfx_name.find("gfx10") != std::string::npos) { - testing::GTEST_FLAG(filter) = "-ATTCollection.*:OpenMPTest.*:-ProfilerSPMTest*"; + testing::GTEST_FLAG(filter) = + "-ATTCollection.*:OpenMPTest.*:ProfilerSPMTest*:ProfilerMQTest*:ProfilerMPTest*:" + "MPITest*"; } return HSA_STATUS_SUCCESS; }, diff --git a/tests/featuretests/profiler/CMakeLists.txt b/tests/featuretests/profiler/CMakeLists.txt index c6e7c0ec..6dd1d5e2 100644 --- a/tests/featuretests/profiler/CMakeLists.txt +++ b/tests/featuretests/profiler/CMakeLists.txt @@ -33,10 +33,9 @@ find_package(HIP REQUIRED MODULE) find_program(CLANG_TIDY_EXE NAMES
"clang-tidy") if (CLANG_TIDY_EXE) set(CMAKE_CXX_CLANG_TIDY - clang-tidy; + ${CLANG_TIDY_EXE}; -format-style='file'; - -header-filter=${CMAKE_CURRENT_SOURCE_DIR}; - ) + -header-filter=${CMAKE_CURRENT_SOURCE_DIR};) endif() # ############################################################################################################################################ # App Based FeatureTests @@ -53,12 +52,15 @@ endforeach() set_source_files_properties(apps/hello_world_hip.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1) hip_add_executable(hip_helloworld apps/hello_world_hip.cpp) set_target_properties(hip_helloworld PROPERTIES RUNTIME_OUTPUT_DIRECTORY "${PROJECT_BINARY_DIR}/tests/featuretests/profiler/apps") - +target_link_options(hip_helloworld PRIVATE "-Wl,--build-id=md5") +install(TARGETS hip_helloworld RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/tests/featuretests/profiler/apps COMPONENT tests) #hip_vectoradd set_source_files_properties(apps/vector_add_hip.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1) hip_add_executable(hip_vectoradd apps/vector_add_hip.cpp) set_target_properties(hip_vectoradd PROPERTIES RUNTIME_OUTPUT_DIRECTORY "${PROJECT_BINARY_DIR}/tests/featuretests/profiler/apps") +target_link_options(hip_vectoradd PRIVATE "-Wl,--build-id=md5") +install(TARGETS hip_vectoradd RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/tests/featuretests/profiler/apps COMPONENT tests) #mpi_vectoradd find_package(MPI) @@ -67,6 +69,8 @@ include_directories(SYSTEM ${MPI_INCLUDE_PATH}) set_source_files_properties(apps/vector_add_mpi.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1) hip_add_executable(mpi_vectoradd apps/vector_add_mpi.cpp) set_target_properties(mpi_vectoradd PROPERTIES RUNTIME_OUTPUT_DIRECTORY "${PROJECT_BINARY_DIR}/tests/featuretests/profiler/apps") +target_link_options(mpi_vectoradd PRIVATE "-Wl,--build-id=md5") +install(TARGETS mpi_vectoradd RUNTIME DESTINATION 
${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/tests/featuretests/profiler/apps COMPONENT tests) target_link_libraries(mpi_vectoradd ${MPI_C_LIBRARIES} stdc++fs) endif() @@ -88,6 +92,8 @@ endif() set_source_files_properties(apps/async_mem_copy.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1) hip_add_executable(hsa_async_mem_copy apps/async_mem_copy.cpp) set_target_properties(hsa_async_mem_copy PROPERTIES RUNTIME_OUTPUT_DIRECTORY "${PROJECT_BINARY_DIR}/tests/featuretests/profiler/apps") +target_link_options(hsa_async_mem_copy PRIVATE "-Wl,--build-id=md5") +install(TARGETS hsa_async_mem_copy RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/tests/featuretests/profiler/apps COMPONENT tests) target_link_libraries(hsa_async_mem_copy hsa-runtime64::hsa-runtime64 Threads::Threads dl stdc++fs) @@ -111,6 +117,8 @@ hip_add_executable(multithreaded_testapp apps/multithreaded_testapp.cpp ../utils target_include_directories(multithreaded_testapp PRIVATE ${PROJECT_SOURCE_DIR} ${PROJECT_SOURCE_DIR}/featuretests/profiler/apps) target_link_libraries(multithreaded_testapp hsa-runtime64::hsa-runtime64 Threads::Threads dl stdc++fs amd_comgr) set_target_properties(multithreaded_testapp PROPERTIES RUNTIME_OUTPUT_DIRECTORY "${PROJECT_BINARY_DIR}/tests/featuretests/profiler/apps") +target_link_options(multithreaded_testapp PRIVATE "-Wl,--build-id=md5") +install(TARGETS multithreaded_testapp RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/tests/featuretests/profiler/apps COMPONENT tests) add_dependencies(tests multithreaded_testapp) # Multi-Queue Dependency Test @@ -132,17 +140,21 @@ set(GPU_LIST "gfx900" "gfx906" "gfx908" "gfx90a" "gfx1030") foreach(target_id ${GPU_LIST}) ## generate kernel bitcodes generate_hsaco(${target_id} ${CMAKE_CURRENT_SOURCE_DIR}/apps/copy.cl ${target_id}_copy.hsaco) +# install(FILES "${PROJECT_BINARY_DIR}/tests/featuretests/profiler/${target_id}_copy.hsaco" +# DESTINATION "${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/tests" +# COMPONENT 
tests) + endforeach(target_id) add_custom_target(hsaco_targets DEPENDS ${HSACO_TARGET_LIST}) add_executable(multiqueue_testapp apps/multiqueue_testapp.cpp) - target_include_directories(multiqueue_testapp PRIVATE ${PROJECT_SOURCE_DIR} ${PROJECT_SOURCE_DIR}/inc ${PROJECT_SOURCE_DIR}/tests/featuretests/profiler) + target_include_directories(multiqueue_testapp PRIVATE ${PROJECT_SOURCE_DIR} ${PROJECT_SOURCE_DIR}/tests/featuretests/profiler) # Link test executable against gtest & gtest_main - target_link_libraries(multiqueue_testapp PRIVATE ${ROCPROFILER_TARGET} hsa-runtime64::hsa-runtime64 GTest::gtest GTest::gtest_main stdc++fs Threads::Threads amd_comgr dl) + target_link_libraries(multiqueue_testapp PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64 GTest::gtest GTest::gtest_main stdc++fs Threads::Threads amd_comgr dl) add_dependencies(multiqueue_testapp hsaco_targets) add_dependencies(tests multiqueue_testapp ) set_target_properties(multiqueue_testapp PROPERTIES RUNTIME_OUTPUT_DIRECTORY "${PROJECT_BINARY_DIR}/tests/featuretests/profiler/apps") - install(TARGETS multiqueue_testapp RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/tests COMPONENT tests) + install(TARGETS multiqueue_testapp RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/tests/featuretests/profiler/apps COMPONENT tests) # add_executable(profiler_multiqueue_test discretetests/binary/multiqueue_test.cpp utils/csv_parser.cpp utils/test_utils.cpp) # target_include_directories(profiler_multiqueue_test PRIVATE ${PROJECT_SOURCE_DIR} ${PROJECT_SOURCE_DIR}/tests/featuretests/profiler) @@ -166,14 +178,20 @@ target_include_directories(runFeatureTests PRIVATE ${TEST_DIR} ${PROJECT_SOURCE_DIR}/tests/featuretests/profiler) # Link test executable against gtest & gtest_main -target_link_libraries(runFeatureTests PRIVATE ${ROCPROFILER_TARGET} ${ROCPROFILER_TARGET} hsa-runtime64::hsa-runtime64 +target_link_libraries(runFeatureTests PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64 
GTest::gtest GTest::gtest_main Threads::Threads dl stdc++fs amd_comgr) add_dependencies(tests runFeatureTests) - +target_link_options(runFeatureTests PRIVATE "-Wl,--build-id=md5") +install(TARGETS runFeatureTests RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/tests COMPONENT tests) add_test(AllTests runFeatureTests) # Copy scripts, input files to samples folder configure_file(${CMAKE_CURRENT_SOURCE_DIR}/apps/goldentraces/basic_metrics.txt ${PROJECT_BINARY_DIR}/tests/featuretests/profiler/apps COPYONLY) configure_file(${CMAKE_CURRENT_SOURCE_DIR}/apps/goldentraces/input.txt ${PROJECT_BINARY_DIR}/tests/featuretests/profiler/apps COPYONLY) configure_file(${CMAKE_CURRENT_SOURCE_DIR}/apps/mpi_run.sh ${PROJECT_BINARY_DIR}/tests/featuretests/profiler/apps/ COPYONLY) + +install( + DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/apps/goldentraces/ + DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/tests/featuretests/profiler/apps/goldentraces + COMPONENT tests) diff --git a/tests/featuretests/profiler/apps/multiqueue_testapp.cpp b/tests/featuretests/profiler/apps/multiqueue_testapp.cpp index 4c95bfc4..53b7e3e1 100644 --- a/tests/featuretests/profiler/apps/multiqueue_testapp.cpp +++ b/tests/featuretests/profiler/apps/multiqueue_testapp.cpp @@ -36,6 +36,10 @@ namespace fs = std::experimental::filesystem; std::vector Device::all_devices; std::string GetRunningPath(std::string string_to_erase); +static void init_test_path(); + +std::string test_app_path; +std::string hasco_path; int main() { hsa_status_t status; @@ -48,11 +52,12 @@ int main() { status = hsa_agent_get_info(gpu[0].agent, HSA_AGENT_INFO_NAME, agent_name); ASSERT_EQ(status, HSA_STATUS_SUCCESS); + // set global test path for this test + init_test_path(); // Getting Current Path - std::string app_path = GetRunningPath("tests/featuretests/profiler/apps/multiqueue_testapp"); + std::string app_path = GetRunningPath(test_app_path + "multiqueue_testapp"); // Getting hasco Path - std::string ko_path = - 
app_path + "tests/featuretests/profiler/" + std::string(agent_name) + "_copy.hsaco"; + std::string ko_path = app_path + hasco_path + std::string(agent_name) + "_copy.hsaco"; MQDependencyTest::CodeObject code_object; if (!obj.LoadCodeObject(ko_path, gpu[0].agent, code_object)) { @@ -305,3 +310,33 @@ std::string GetRunningPath(std::string string_to_erase) { } return path; } + +bool is_installed_path() { + std::string path; + char* real_path; + Dl_info dl_info; + + if (0 != dladdr(reinterpret_cast(main), &dl_info)) { + path = dl_info.dli_fname; + real_path = realpath(path.c_str(), NULL); + if (real_path == nullptr) { + throw(std::string("Error! in extracting real path")); + } + path.clear(); // reset path + path.append(real_path); + if (path.find("/opt") != std::string::npos) { + return true; + } + } + return false; +} + +static void init_test_path() { + if (is_installed_path()) { + test_app_path = "share/rocprofiler/tests/featuretests/profiler/apps/"; + hasco_path = "share/rocprofiler/tests/"; + } else { + test_app_path = "tests/featuretests/profiler/apps/"; + hasco_path = "tests/featuretests/profiler/"; + } +} diff --git a/tests/featuretests/profiler/discretetests/api/att_test.cpp b/tests/featuretests/profiler/discretetests/api/att_test.cpp new file mode 100644 index 00000000..311c35b0 --- /dev/null +++ b/tests/featuretests/profiler/discretetests/api/att_test.cpp @@ -0,0 +1,259 @@ +/****************************************************************************** +Copyright (c) 2018 Advanced Micro Devices, Inc. All rights reserved. 
+ +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. +*******************************************************************************/ +#include +#include + +#include +#include +#include +#include +#include + +#include "utils/test_utils.h" + +#ifdef NDEBUG +#define HIP_ASSERT(x) x +#else +#define HIP_ASSERT(x) (assert((x)==hipSuccess)) +#endif + + +#define WIDTH 1024 +#define HEIGHT 1024 + +#define NUM (WIDTH*HEIGHT) + +#define THREADS_PER_BLOCK_X 16 +#define THREADS_PER_BLOCK_Y 16 +#define THREADS_PER_BLOCK_Z 1 + + + +/** \mainpage ROC Profiler API Test + * + * \section introduction Introduction + * + * The goal of this test is to test ROCProfiler APIs to collect ATT traces. 
+ * A simple vectoradd_float kernel is launched and the trace results are printed + * as console output + */ + + +// function to check att tracing API status +auto CheckApi = [](rocprofiler_status_t status) { + if (status != ROCPROFILER_STATUS_SUCCESS) { + std::cout << "ROCProfiler API Error" << std::endl; + } + assert(status == ROCPROFILER_STATUS_SUCCESS); +}; + + +// callback function to dump att tracing data +void FlushCallback(const rocprofiler_record_header_t* record, + const rocprofiler_record_header_t* end_record, rocprofiler_session_id_t session_id, + rocprofiler_buffer_id_t buffer_id) { + while (record < end_record) { + if (!record) break; + else if (record->kind == ROCPROFILER_ATT_TRACER_RECORD){ + const rocprofiler_record_att_tracer_t* att_tracer_record = + reinterpret_cast<const rocprofiler_record_att_tracer_t*>(record); + size_t name_length; + CheckApi(rocprofiler_query_kernel_info_size(ROCPROFILER_KERNEL_NAME, att_tracer_record->kernel_id, + &name_length)); + const char* kernel_name_c = static_cast<const char*>(malloc(name_length * sizeof(char))); + CheckApi(rocprofiler_query_kernel_info(ROCPROFILER_KERNEL_NAME, att_tracer_record->kernel_id, + &kernel_name_c)); + int gpu_index = att_tracer_record->gpu_id.handle; + printf( + "Kernel Info:\n\tGPU Index: %d\n\tKernel Name: %s\n", + gpu_index, kernel_name_c); + + // Get the number of shader engine traces + int se_num = att_tracer_record->shader_engine_data_count; + + // iterate over each shader engine att trace + for (int i = 0; i < se_num; i++){ + + printf("\n\n-------------- shader_engine %d --------------\n\n", i); + rocprofiler_record_se_att_data_t* se_att_trace = &att_tracer_record->shader_engine_data[i]; + uint32_t size = se_att_trace->buffer_size; + const unsigned short* data_buffer_ptr = reinterpret_cast<const unsigned short*>(se_att_trace->buffer_ptr); + + // Print the buffer in terms of shorts (16 bits) + for (uint32_t j = 0; j < (size / sizeof(short)); j++) + printf("%04x\n", data_buffer_ptr[j]); + + } + + } + 
CheckApi(rocprofiler_next_record(record, &record,
session_id, buffer_id)); + } +} + + + + +__global__ void +vectoradd_float(float* __restrict__ a, const float* __restrict__ b, const float* __restrict__ c, int width, int height) + + { + + int x = hipBlockDim_x * hipBlockIdx_x + hipThreadIdx_x; + int y = hipBlockDim_y * hipBlockIdx_y + hipThreadIdx_y; + + int i = y * width + x; + if ( i < (width * height)) { + a[i] = b[i] + c[i]; + } + + + + } + +int LaunchVectorAddKernel() { + + float* hostA; + float* hostB; + float* hostC; + + float* deviceA; + float* deviceB; + float* deviceC; + + hipDeviceProp_t devProp; + hipGetDeviceProperties(&devProp, 0); + std::cout << " System minor " << devProp.minor << std::endl; + std::cout << " System major " << devProp.major << std::endl; + std::cout << " agent prop name " << devProp.name << std::endl; + + + + std::cout << "hip Device prop succeeded " << std::endl ; + + + int i; + int errors; + + hostA = (float*)malloc(NUM * sizeof(float)); + hostB = (float*)malloc(NUM * sizeof(float)); + hostC = (float*)malloc(NUM * sizeof(float)); + + // initialize the input data + for (i = 0; i < NUM; i++) { + hostB[i] = (float)i; + hostC[i] = (float)i*100.0f; + } + + HIP_ASSERT(hipMalloc((void**)&deviceA, NUM * sizeof(float))); + HIP_ASSERT(hipMalloc((void**)&deviceB, NUM * sizeof(float))); + HIP_ASSERT(hipMalloc((void**)&deviceC, NUM * sizeof(float))); + + HIP_ASSERT(hipMemcpy(deviceB, hostB, NUM*sizeof(float), hipMemcpyHostToDevice)); + HIP_ASSERT(hipMemcpy(deviceC, hostC, NUM*sizeof(float), hipMemcpyHostToDevice)); + + + hipLaunchKernelGGL(vectoradd_float, + dim3(WIDTH/THREADS_PER_BLOCK_X, HEIGHT/THREADS_PER_BLOCK_Y), + dim3(THREADS_PER_BLOCK_X, THREADS_PER_BLOCK_Y), + 0, 0, + deviceA ,deviceB ,deviceC ,WIDTH ,HEIGHT); + + + HIP_ASSERT(hipMemcpy(hostA, deviceA, NUM*sizeof(float), hipMemcpyDeviceToHost)); + + // verify the results + errors = 0; + for (i = 0; i < NUM; i++) { + if (hostA[i] != (hostB[i] + hostC[i])) { + errors++; + } + } + if (errors!=0) { + printf("FAILED: %d errors\n",errors); + 
+ } else { + printf ("PASSED!\n"); + } + + HIP_ASSERT(hipFree(deviceA)); + HIP_ASSERT(hipFree(deviceB)); + HIP_ASSERT(hipFree(deviceC)); + + free(hostA); + free(hostB); + free(hostC); + + //hipResetDefaultAccelerator(); + + return errors; +} + + +int main(int argc, char** argv) { + + // initialize ROCProfiler + CheckApi(rocprofiler_initialize()); + + // Att trace collection parameters + rocprofiler_session_id_t session_id; + std::vector<rocprofiler_att_parameter_t> parameters; + parameters.emplace_back(rocprofiler_att_parameter_t{ROCPROFILER_ATT_COMPUTE_UNIT_TARGET, 0}); + parameters.emplace_back(rocprofiler_att_parameter_t{ROCPROFILER_ATT_MASK, 0x0F00}); + parameters.emplace_back(rocprofiler_att_parameter_t{ROCPROFILER_ATT_TOKEN_MASK, 0x344B}); + parameters.emplace_back(rocprofiler_att_parameter_t{ROCPROFILER_ATT_TOKEN_MASK2, 0xFFFF}); + + // create a session + CheckApi(rocprofiler_create_session(ROCPROFILER_KERNEL_REPLAY_MODE, &session_id)); + + // create a buffer to hold att trace records for each kernel launch + rocprofiler_buffer_id_t buffer_id; + CheckApi(rocprofiler_create_buffer(session_id, FlushCallback, 0x9999, &buffer_id)); + + // create a filter for collecting att traces + rocprofiler_filter_id_t filter_id; + rocprofiler_filter_property_t property = {}; + CheckApi(rocprofiler_create_filter(session_id, ROCPROFILER_ATT_TRACE_COLLECTION, + rocprofiler_filter_data_t{.att_parameters = &parameters[0]}, + parameters.size(), &filter_id, property)); + + // set buffer for the filter + CheckApi(rocprofiler_set_filter_buffer(session_id, filter_id, buffer_id)); + + // activate the att tracing session + CheckApi(rocprofiler_start_session(session_id)); + + // Launch a kernel + LaunchVectorAddKernel(); + + // deactivate att tracing session + CheckApi(rocprofiler_terminate_session(session_id)); + + // dump att tracing data + CheckApi(rocprofiler_flush_data(session_id, buffer_id)); + + // destroy session + 
CheckApi(rocprofiler_destroy_session(session_id)); + + // finalize att tracing by destroying
rocprofiler object + CheckApi(rocprofiler_finalize()); + return 0; +} diff --git a/tests/featuretests/profiler/discretetests/api/multithreaded_test.cpp b/tests/featuretests/profiler/discretetests/api/multithreaded_test.cpp new file mode 100644 index 00000000..583de7bc --- /dev/null +++ b/tests/featuretests/profiler/discretetests/api/multithreaded_test.cpp @@ -0,0 +1,145 @@ +/****************************************************************************** +Copyright (c) 2018 Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
+*******************************************************************************/ +#include +#include + +#include +#include +#include +#include +#include + +#include "utils/test_utils.h" + +/** \mainpage ROC Profiler API Test + * + * \section introduction Introduction + * + * The goal of this test is to call ROCProfiler APIs from multiple threads + * and verify that each API call succeeds and that multiple contexts are collected and + * printed. + * + * An empty kernel is launched on multiple threads and the profiling context is + * collected and printed for each thread. + */ + + +// function to check profiler API status +auto CheckApi = [](rocprofiler_status_t status) { + if (status != ROCPROFILER_STATUS_SUCCESS) { + std::cout << "ROCProfiler API Error" << std::endl; + } + assert(status == ROCPROFILER_STATUS_SUCCESS); +}; + +// empty kernel +__global__ void kernel() { printf("empty kernel\n"); } + +// callback function to dump profiler data +void FlushCallback(const rocprofiler_record_header_t* record, + const rocprofiler_record_header_t* end_record, rocprofiler_session_id_t session_id, + rocprofiler_buffer_id_t buffer_id) { + while (record < end_record) { + if (!record) break; + if (record->kind == ROCPROFILER_PROFILER_RECORD) { + const rocprofiler_record_profiler_t* profiler_record = + reinterpret_cast<const rocprofiler_record_profiler_t*>(record); + size_t name_length; + CheckApi(rocprofiler_query_kernel_info_size(ROCPROFILER_KERNEL_NAME, profiler_record->kernel_id, + &name_length)); + const char* kernel_name_c = static_cast<const char*>(malloc(name_length * sizeof(char))); + CheckApi(rocprofiler_query_kernel_info(ROCPROFILER_KERNEL_NAME, profiler_record->kernel_id, + &kernel_name_c)); + int gpu_index = profiler_record->gpu_id.handle; + uint64_t start_time = profiler_record->timestamps.begin.value; + uint64_t end_time = profiler_record->timestamps.end.value; + printf( + "Kernel Info:\n\tGPU Index: %d\n\tKernel Name: %s\n\tStart " + "Time: " + "%lu\n\tEnd Time: %lu\n", + gpu_index, kernel_name_c, start_time, end_time); + }
+ CheckApi(rocprofiler_next_record(record, &record, session_id, buffer_id)); + } +} + +// launches an empty kernel in profiler context +void KernelLaunch() { + // run empty kernel + kernel<<<1, 1>>>(); + hipDeviceSynchronize(); +} + +int main(int argc, char** argv) { + // Get the system cores + int num_cpu_cores = GetNumberOfCores(); + + // create as many threads as number of cores in system + std::vector<std::thread> threads(num_cpu_cores); + + // initialize the profiler + CheckApi(rocprofiler_initialize()); + + // Counter Collection with timestamps + rocprofiler_session_id_t session_id; + std::vector<const char*> counters; + counters.emplace_back("SQ_WAVES"); + + CheckApi(rocprofiler_create_session(ROCPROFILER_KERNEL_REPLAY_MODE, &session_id)); + + rocprofiler_buffer_id_t buffer_id; + CheckApi(rocprofiler_create_buffer(session_id, FlushCallback, 0x9999, &buffer_id)); + + rocprofiler_filter_id_t filter_id; + rocprofiler_filter_property_t property = {}; + CheckApi(rocprofiler_create_filter(session_id, ROCPROFILER_COUNTERS_COLLECTION, + rocprofiler_filter_data_t{.counters_names = &counters[0]}, + counters.size(), &filter_id, property)); + + CheckApi(rocprofiler_set_filter_buffer(session_id, filter_id, buffer_id)); + + // activate the profiler session + CheckApi(rocprofiler_start_session(session_id)); + + // launch kernel on each thread + for (int n = 0; n < num_cpu_cores; ++n) { + threads[n] = std::thread(KernelLaunch); + } + + // wait for all kernel launches to complete + for (int n = 0; n < num_cpu_cores; ++n) { + threads[n].join(); + } + + // deactivate session + CheckApi(rocprofiler_terminate_session(session_id)); + + // dump profiler data + CheckApi(rocprofiler_flush_data(session_id, buffer_id)); + + // destroy session + CheckApi(rocprofiler_destroy_session(session_id)); + + // finalize the profiler + CheckApi(rocprofiler_finalize()); + return 0; +} diff --git a/tests/featuretests/profiler/discretetests/api/spm_test.cpp
b/tests/featuretests/profiler/discretetests/api/spm_test.cpp new file mode 100644 index 00000000..edf262b9 --- /dev/null +++ b/tests/featuretests/profiler/discretetests/api/spm_test.cpp @@ -0,0 +1,233 @@ +/****************************************************************************** +Copyright (c) 2018 Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
+*******************************************************************************/ +#include +#include +#include + +#include +#include +#include +#include +#include +#include + +#include "utils/test_utils.h" + +#ifdef NDEBUG +#define HIP_ASSERT(x) x +#else +#define HIP_ASSERT(x) (assert((x) == hipSuccess)) +#endif + + +#define WIDTH 1024 +#define HEIGHT 1024 + +#define NUM (WIDTH * HEIGHT) + +#define THREADS_PER_BLOCK_X 16 +#define THREADS_PER_BLOCK_Y 16 +#define THREADS_PER_BLOCK_Z 1 + + +/** \mainpage ROC Profiler API Test + * + * \section introduction Introduction + * + * The goal of this test is to exercise the rocprofiler APIs that collect SPM data. + * + * A simple vectoradd_float kernel is launched and the SPM results are printed + * to the console. + */ + + +// function to check spm tracing API status +auto CheckApi = [](rocprofiler_status_t status) { + if (status != ROCPROFILER_STATUS_SUCCESS) { + std::cout << "ROCmTools API Error" << std::endl; + } + assert(status == ROCPROFILER_STATUS_SUCCESS); +}; + + +void FlushCallback(const rocprofiler_record_header_t* record, + const rocprofiler_record_header_t* end_record, rocprofiler_session_id_t session_id, + rocprofiler_buffer_id_t buffer_id) { + while (record < end_record) { + if (!record) + break; + else if (record->kind == ROCPROFILER_SPM_RECORD) { + const rocprofiler_record_spm_t* spm_record = + reinterpret_cast(record); + // the number of shader engines is hard-coded to 4 for this test + int se_num = 4; + // iterate over each shader engine + for (int i = 0; i < se_num; i++) { + printf("\n\n-------------- shader_engine %d --------------\n\n", i); + rocprofiler_record_se_spm_data_t se_spm = spm_record->shader_engine_data[i]; + for (int j = 0; j < 32; j++) { + printf("%04x\n", se_spm.counters_data[j].value); + } + } + } + CheckApi(rocprofiler_next_record(record, &record, session_id, buffer_id)); + } +} + +__global__ void vectoradd_float(float* __restrict__ a, const float* __restrict__ b, + const float* __restrict__ c, int width, int height) { + int x = hipBlockDim_x *
hipBlockIdx_x + hipThreadIdx_x; + int y = hipBlockDim_y * hipBlockIdx_y + hipThreadIdx_y; + + int i = y * width + x; + if (i < (width * height)) { + a[i] = b[i] + c[i]; + } +} + +int LaunchVectorAddKernel() { + float* hostA; + float* hostB; + float* hostC; + + float* deviceA; + float* deviceB; + float* deviceC; + + hipDeviceProp_t devProp; + hipGetDeviceProperties(&devProp, 0); + std::cout << " System minor " << devProp.minor << std::endl; + std::cout << " System major " << devProp.major << std::endl; + std::cout << " agent prop name " << devProp.name << std::endl; + + + std::cout << "hip Device prop succeeded " << std::endl; + + + int i; + int errors; + + hostA = (float*)malloc(NUM * sizeof(float)); + hostB = (float*)malloc(NUM * sizeof(float)); + hostC = (float*)malloc(NUM * sizeof(float)); + + // initialize the input data + for (i = 0; i < NUM; i++) { + hostB[i] = (float)i; + hostC[i] = (float)i * 100.0f; + } + + HIP_ASSERT(hipMalloc((void**)&deviceA, NUM * sizeof(float))); + HIP_ASSERT(hipMalloc((void**)&deviceB, NUM * sizeof(float))); + HIP_ASSERT(hipMalloc((void**)&deviceC, NUM * sizeof(float))); + + HIP_ASSERT(hipMemcpy(deviceB, hostB, NUM * sizeof(float), hipMemcpyHostToDevice)); + HIP_ASSERT(hipMemcpy(deviceC, hostC, NUM * sizeof(float), hipMemcpyHostToDevice)); + + + for (int i = 0; i < 20; i++) + hipLaunchKernelGGL(vectoradd_float, + dim3(WIDTH / THREADS_PER_BLOCK_X, HEIGHT / THREADS_PER_BLOCK_Y), + dim3(THREADS_PER_BLOCK_X, THREADS_PER_BLOCK_Y), 0, 0, deviceA, deviceB, + deviceC, WIDTH, HEIGHT); + + + HIP_ASSERT(hipMemcpy(hostA, deviceA, NUM * sizeof(float), hipMemcpyDeviceToHost)); + + // verify the results + errors = 0; + for (i = 0; i < NUM; i++) { + if (hostA[i] != (hostB[i] + hostC[i])) { + errors++; + } + } + if (errors != 0) { + printf("FAILED: %d errors\n", errors); + } else { + printf("PASSED!\n"); + } + + HIP_ASSERT(hipFree(deviceA)); + HIP_ASSERT(hipFree(deviceB)); + HIP_ASSERT(hipFree(deviceC)); + + free(hostA); + free(hostB); + free(hostC); 
+ + // hipResetDefaultAccelerator(); + + return errors; +} + + +int main(int argc, char** argv) { + // initialize HSA and the profiler + hsa_init(); + CheckApi(rocprofiler_initialize()); + + // spm trace collection parameters + rocprofiler_session_id_t session_id; + rocprofiler_spm_parameter_t spm_parameters; + const char* counter_name = "SQ_WAVES"; + spm_parameters.counters_names = &counter_name; + spm_parameters.counters_count = 1; + spm_parameters.gpu_agent_id = NULL; + // spm_parameters.cpu_agent_id = NULL; + spm_parameters.sampling_rate = 10000; + // create a session + CheckApi(rocprofiler_create_session(ROCPROFILER_KERNEL_REPLAY_MODE, &session_id)); + + // create a buffer to hold spm trace records for each kernel launch + rocprofiler_buffer_id_t buffer_id; + CheckApi(rocprofiler_create_buffer(session_id, FlushCallback, 0x99999999, &buffer_id)); + + // create a filter for collecting spm traces + rocprofiler_filter_id_t filter_id; + rocprofiler_filter_property_t property = {}; + CheckApi(rocprofiler_create_filter(session_id, ROCPROFILER_SPM_COLLECTION, + rocprofiler_filter_data_t{.spm_parameters = &spm_parameters}, 1, + &filter_id, property)); + + // set buffer for the filter + CheckApi(rocprofiler_set_filter_buffer(session_id, filter_id, buffer_id)); + + // activate the spm tracing session + CheckApi(rocprofiler_start_session(session_id)); + + // launch the vector-add kernel (20 dispatches) + LaunchVectorAddKernel(); + + // deactivate the spm tracing session + // note: the explicit buffer flush below is currently disabled + // + CheckApi(rocprofiler_terminate_session(session_id)); + // CheckApi(rocprofiler_flush_data(session_id, buffer_id)); + + // destroy session + CheckApi(rocprofiler_destroy_session(session_id)); + + // finalize the profiler + CheckApi(rocprofiler_finalize()); + hsa_shut_down(); + return 0; +} \ No newline at end of file diff --git a/tests/featuretests/profiler/discretetests/basic_metrics.txt b/tests/featuretests/profiler/discretetests/basic_metrics.txt new file mode 100644 index
00000000..9df3728f --- /dev/null +++ b/tests/featuretests/profiler/discretetests/basic_metrics.txt @@ -0,0 +1 @@ +pmc: SQ_WAVES GRBM_COUNT GRBM_GUI_ACTIVE \ No newline at end of file diff --git a/tests/featuretests/profiler/discretetests/binary/copy.cl b/tests/featuretests/profiler/discretetests/binary/copy.cl new file mode 100644 index 00000000..eadc65f1 --- /dev/null +++ b/tests/featuretests/profiler/discretetests/binary/copy.cl @@ -0,0 +1,32 @@ +/* Copyright (c) 2022 Advanced Micro Devices, Inc. + + Permission is hereby granted, free of charge, to any person obtaining a copy + of this software and associated documentation files (the "Software"), to deal + in the Software without restriction, including without limitation the rights + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + copies of the Software, and to permit persons to whom the Software is + furnished to do so, subject to the following conditions: + + The above copyright notice and this permission notice shall be included in + all copies or substantial portions of the Software. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + THE SOFTWARE. 
*/ + +__kernel void copyA(__global unsigned int* a, __global unsigned int* b) { + uint tid = get_global_id(0); + a[tid] = b[tid]; +} +__kernel void copyB(__global unsigned int* a, __global unsigned int* b) { + uint tid = get_global_id(0); + a[tid] = b[tid]; +} +__kernel void copyC(__global unsigned int* a, __global unsigned int* b) { + uint tid = get_global_id(0); + a[tid] = b[tid]; +} diff --git a/tests/featuretests/profiler/discretetests/binary/multiprocess_test.cpp b/tests/featuretests/profiler/discretetests/binary/multiprocess_test.cpp new file mode 100644 index 00000000..91689574 --- /dev/null +++ b/tests/featuretests/profiler/discretetests/binary/multiprocess_test.cpp @@ -0,0 +1,88 @@ +/* +Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
+*/ + +/** \mainpage ROC Profiler Multi Process Binary Test + * + * \section introduction Introduction + * + * The goal of this test is to test ROC profiler as a binary against a + * multiprocess application. The test application launches an empty kernel + * on multiple threads from both the parent and the child process. + * + * The test then parses the CSV output and verifies that the number of contexts + * collected equals the number of threads launched in the test application. + * + * The test also does basic verification that counter values are non-negative. + */ + +#include +#include +#include + +#include +#include +#include + +#include "utils/test_utils.h" + +// empty kernel +__global__ void kernel() {} + +void KernelLaunch() { + // run empty kernel + kernel<<<1, 1>>>(); + hipDeviceSynchronize(); +} + +int main(int argc, char **argv) { + // each process launches half as many threads as there are cores + int num_cpu_cores = GetNumberOfCores(); + + pid_t childpid = fork(); + + if (childpid > 0) { // Parent + // create a pool of threads + std::vector threads(num_cpu_cores); + for (int n = 0; n < num_cpu_cores / 2; ++n) { + threads[n] = std::thread(KernelLaunch); + } + + for (int n = 0; n < num_cpu_cores / 2; ++n) { + threads[n].join(); + } + // wait for child exit + wait(NULL); + + } else if (!childpid) { // child + // create a pool of threads + std::vector threads(num_cpu_cores); + for (int n = 0; n < num_cpu_cores / 2; ++n) { + threads[n] = std::thread(KernelLaunch); + } + + for (int n = 0; n < num_cpu_cores / 2; ++n) { + threads[n].join(); + } + } else { // failure + return -1; + } +} diff --git a/tests/featuretests/profiler/discretetests/binary/multiqueue_test.cpp b/tests/featuretests/profiler/discretetests/binary/multiqueue_test.cpp new file mode 100644 index 00000000..1ad94d2f --- /dev/null +++ b/tests/featuretests/profiler/discretetests/binary/multiqueue_test.cpp @@ -0,0 +1,113 @@ +/* +Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved.
+ +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. +*/ + +/** \mainpage ROC Profiler Multi Queue Binary Test + * + * \section introduction Introduction + * + * The goal of this test is to test ROC profiler as a binary against a + * multi-queue application. The test application dispatches three kernels + * across two HSA queues that depend on each other. + * + * The test then parses the CSV output and verifies that the number of kernel + * dispatches equals the number of kernels launched by the test application.
+ * + * Only the number of kernel dispatches is verified; counter values are not checked. + */ + +#include +#include +#include +#include +#include + +#include "utils/csv_parser.h" +#include "utils/test_utils.h" + +// Multi Queue kernel dispatch count test +int QueueDependencyTest(std::string profiler_output) { + CSVParser parser; + parser.ParseCSV(profiler_output); + countermap counter_map = parser.GetCounterMap(); + + // expected number of kernel dispatches in multiqueue_testapp + uint32_t dispatch_count = 3; + + uint32_t dispatch_counter = 0; + for (size_t i = 0; i < counter_map.size(); i++) { + std::string* dispatch_id = parser.ReadCounter(i, 1); + if (dispatch_id != nullptr) { + if (dispatch_id->find("dispatch") != std::string::npos) { + dispatch_counter++; + } + } + } + + // dispatch count test: Number of dispatches must be equal to + // number of kernel launches in test_app + if (dispatch_counter == dispatch_count) { + return 0; + } + return -1; +} + +std::string ReadProfilerBuffer(const char* cmd) { + std::vector buffer(1028); + std::string profiler_output; + + std::unique_ptr pipe(popen(cmd, "r"), pclose); + if (!pipe) { + throw std::runtime_error("popen() failed!"); + } + while (fgets(buffer.data(), buffer.size(), pipe.get()) != nullptr) { + profiler_output += buffer.data(); + } + return profiler_output; +} + +std::string InitMultiQueueTest() { + std::string input_app_path = GetRunningPath("profiler_multiqueue_test"); + std::stringstream input_txt_path; + input_txt_path << input_app_path << "gtests/apps/goldentraces/input.txt"; + std::string rocprofv2_path = + GetRunningPath("build/tests/featuretests/profiler/profiler_multiqueue_test"); + std::stringstream command; + + command << rocprofv2_path << "./rocprofv2 -i " << input_txt_path.str() << " " + << input_app_path << "multiqueue_testapp"; + + std::string result = ReadProfilerBuffer(command.str().c_str()); + return result; +} + +int main(int argc, char** argv) { + int test_status = -1; + std::string profiler_output; + + //
initialize multi queue dependency test + profiler_output = InitMultiQueueTest(); + + // multi queue dispatch count test + test_status = QueueDependencyTest(profiler_output); + + return test_status; +} diff --git a/tests/featuretests/profiler/discretetests/binary/multiqueue_testapp.cpp b/tests/featuretests/profiler/discretetests/binary/multiqueue_testapp.cpp new file mode 100644 index 00000000..17fe2d20 --- /dev/null +++ b/tests/featuretests/profiler/discretetests/binary/multiqueue_testapp.cpp @@ -0,0 +1,284 @@ +/* +Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE.
+*/ + +/** \mainpage ROC Profiler Multi Queue Dependency Test + * + * \section introduction Introduction + * + * The goal of this test is to ensure ROC profiler does not deadlock + * when multiple queues are created that depend on each other. + * + */ + +#include "discretetests/binary/multiqueue_testapp.h" + +#include "src/utils/exception.h" + +namespace fs = std::experimental::filesystem; +std::vector Device::all_devices; + +int main() { + hsa_status_t status; + MQDependencyTest obj; + + // Discover CPU and GPU agents; abort if no usable GPU or kernarg pool is found + if (!obj.DeviceDiscovery()) abort(); + + char agent_name[64]; + status = hsa_agent_get_info(gpu[0].agent, HSA_AGENT_INFO_NAME, agent_name); + ASSERT_EQ(status, HSA_STATUS_SUCCESS); + + // Getting Current Path + std::string current_path = fs::current_path().generic_string(); + // Build the path to the agent-specific hsaco file + std::string ko_path = current_path + "/featuretests/profiler/" + + std::string(agent_name) + "_copy.hsaco"; + + MQDependencyTest::CodeObject code_object; + if (!obj.LoadCodeObject(ko_path, gpu[0].agent, code_object)) { + printf("Kernel file not found or not usable with given agent.\n"); + abort(); + } + + MQDependencyTest::Kernel copyA; + if (!obj.GetKernel(code_object, "copyA", gpu[0].agent, copyA)) { + printf("Test kernel A not found.\n"); + abort(); + } + MQDependencyTest::Kernel copyB; + if (!obj.GetKernel(code_object, "copyB", gpu[0].agent, copyB)) { + printf("Test kernel B not found.\n"); + abort(); + } + MQDependencyTest::Kernel copyC; + if (!obj.GetKernel(code_object, "copyC", gpu[0].agent, copyC)) { + printf("Test kernel C not found.\n"); + abort(); + } + + struct args_t { + uint32_t* a; + uint32_t* b; + MQDependencyTest::OCLHiddenArgs hidden; + }; + + args_t* args; + args = static_cast(obj.hsaMalloc(sizeof(args_t), kernarg)); + memset(args, 0, sizeof(args_t)); + + uint32_t* a = + static_cast(obj.hsaMalloc(64 * sizeof(uint32_t), kernarg)); + uint32_t* b = + static_cast(obj.hsaMalloc(64 * sizeof(uint32_t), kernarg)); + + memset(a, 0, 64 * sizeof(uint32_t));
+ memset(b, 1, 64 * sizeof(uint32_t)); + + // Create queue in gpu agent and prepare a kernel dispatch packet + hsa_queue_t* queue1; + status = hsa_queue_create(gpu[0].agent, 1024, HSA_QUEUE_TYPE_SINGLE, NULL, + NULL, UINT32_MAX, UINT32_MAX, &queue1); + ASSERT_EQ(status, HSA_STATUS_SUCCESS); + + // Create a signal with a value of 1 and attach it to the first kernel + // dispatch packet + hsa_signal_t completion_signal_1; + status = hsa_signal_create(1, 0, NULL, &completion_signal_1); + ASSERT_EQ(status, HSA_STATUS_SUCCESS); + + // First dispatch packet on queue 1, Kernel A + { + MQDependencyTest::Aql packet{}; + packet.header.type = HSA_PACKET_TYPE_KERNEL_DISPATCH; + packet.header.barrier = 1; + packet.header.acquire = HSA_FENCE_SCOPE_SYSTEM; + packet.header.release = HSA_FENCE_SCOPE_SYSTEM; + + packet.dispatch.setup = 1; + packet.dispatch.workgroup_size_x = 64; + packet.dispatch.workgroup_size_y = 1; + packet.dispatch.workgroup_size_z = 1; + packet.dispatch.grid_size_x = 64; + packet.dispatch.grid_size_y = 1; + packet.dispatch.grid_size_z = 1; + + packet.dispatch.group_segment_size = copyA.group; + packet.dispatch.private_segment_size = copyA.scratch; + packet.dispatch.kernel_object = copyA.handle; + + packet.dispatch.kernarg_address = args; + packet.dispatch.completion_signal = completion_signal_1; + + args->a = a; + args->b = b; + // Tell the packet processor to launch the first kernel dispatch packet + obj.SubmitPacket(queue1, packet); + } + + // Create signals with a value of 1 for kernel B (queue 2) and kernel C + // (queue 1) + hsa_signal_t completion_signal_2; + status = hsa_signal_create(1, 0, NULL, &completion_signal_2); + ASSERT_EQ(status, HSA_STATUS_SUCCESS); + + hsa_signal_t completion_signal_3; + status = hsa_signal_create(1, 0, NULL, &completion_signal_3); + ASSERT_EQ(status, HSA_STATUS_SUCCESS); + + // Enqueue a barrier-AND packet on queue 1 that waits for kernel B (signal 2) + { + MQDependencyTest::Aql packet{}; + packet.header.type = HSA_PACKET_TYPE_BARRIER_AND;
+ packet.header.barrier = 1; + packet.header.acquire = HSA_FENCE_SCOPE_SYSTEM; + packet.header.release = HSA_FENCE_SCOPE_SYSTEM; + + packet.barrier_and.dep_signal[0] = completion_signal_2; + obj.SubmitPacket(queue1, packet); + } + + // Second dispatch packet on queue 1, Kernel C + { + MQDependencyTest::Aql packet{}; + packet.header.type = HSA_PACKET_TYPE_KERNEL_DISPATCH; + packet.header.barrier = 1; + packet.header.acquire = HSA_FENCE_SCOPE_SYSTEM; + packet.header.release = HSA_FENCE_SCOPE_SYSTEM; + + packet.dispatch.setup = 1; + packet.dispatch.workgroup_size_x = 64; + packet.dispatch.workgroup_size_y = 1; + packet.dispatch.workgroup_size_z = 1; + packet.dispatch.grid_size_x = 64; + packet.dispatch.grid_size_y = 1; + packet.dispatch.grid_size_z = 1; + + packet.dispatch.group_segment_size = copyC.group; + packet.dispatch.private_segment_size = copyC.scratch; + packet.dispatch.kernel_object = copyC.handle; + packet.dispatch.completion_signal = completion_signal_3; + packet.dispatch.kernarg_address = args; + + args->a = a; + args->b = b; + // Tell packet processor to launch the second kernel dispatch packet + obj.SubmitPacket(queue1, packet); + } + + // Create queue 2 + hsa_queue_t* queue2; + status = hsa_queue_create(gpu[0].agent, 1024, HSA_QUEUE_TYPE_SINGLE, NULL, + NULL, UINT32_MAX, UINT32_MAX, &queue2); + ASSERT_EQ(status, HSA_STATUS_SUCCESS); + + // Enqueue a barrier-AND packet on queue 2 that waits for kernel A (signal 1) + { + MQDependencyTest::Aql packet{}; + packet.header.type = HSA_PACKET_TYPE_BARRIER_AND; + packet.header.barrier = 1; + packet.header.acquire = HSA_FENCE_SCOPE_SYSTEM; + packet.header.release = HSA_FENCE_SCOPE_SYSTEM; + + packet.barrier_and.dep_signal[0] = completion_signal_1; + obj.SubmitPacket(queue2, packet); + } + + // Third dispatch packet on queue 2, Kernel B + { + MQDependencyTest::Aql packet{}; + packet.header.type = HSA_PACKET_TYPE_KERNEL_DISPATCH; + packet.header.barrier = 1; + packet.header.acquire = HSA_FENCE_SCOPE_SYSTEM; + packet.header.release =
HSA_FENCE_SCOPE_SYSTEM; + + packet.dispatch.setup = 1; + packet.dispatch.workgroup_size_x = 64; + packet.dispatch.workgroup_size_y = 1; + packet.dispatch.workgroup_size_z = 1; + packet.dispatch.grid_size_x = 64; + packet.dispatch.grid_size_y = 1; + packet.dispatch.grid_size_z = 1; + + packet.dispatch.group_segment_size = copyB.group; + packet.dispatch.private_segment_size = copyB.scratch; + packet.dispatch.kernel_object = copyB.handle; + + packet.dispatch.kernarg_address = args; + packet.dispatch.completion_signal = completion_signal_2; + + args->a = a; + args->b = b; + // Tell packet processor to launch the third kernel dispatch packet + obj.SubmitPacket(queue2, packet); + } + + // Wait on the completion signal + hsa_signal_wait_relaxed(completion_signal_1, HSA_SIGNAL_CONDITION_EQ, 0, + UINT64_MAX, HSA_WAIT_STATE_BLOCKED); + + // Wait on the completion signal + hsa_signal_wait_relaxed(completion_signal_2, HSA_SIGNAL_CONDITION_EQ, 0, + UINT64_MAX, HSA_WAIT_STATE_BLOCKED); + + // Wait on the completion signal + hsa_signal_wait_relaxed(completion_signal_3, HSA_SIGNAL_CONDITION_EQ, 0, + UINT64_MAX, HSA_WAIT_STATE_BLOCKED); + + for (int i = 0; i < 64; i++) { + if (a[i] != b[i]) { + printf("error at %d: expected %d, got %d\n", i, b[i], a[i]); + abort(); + } + } + + // Clearing data structures and memory + status = hsa_signal_destroy(completion_signal_1); + ASSERT_EQ(status, HSA_STATUS_SUCCESS); + + status = hsa_signal_destroy(completion_signal_2); + ASSERT_EQ(status, HSA_STATUS_SUCCESS); + + status = hsa_signal_destroy(completion_signal_3); + ASSERT_EQ(status, HSA_STATUS_SUCCESS); + + if (queue1 != nullptr) { + status = hsa_queue_destroy(queue1); + ASSERT_EQ(status, HSA_STATUS_SUCCESS); + } + + if (queue2 != nullptr) { + status = hsa_queue_destroy(queue2); + ASSERT_EQ(status, HSA_STATUS_SUCCESS); + } + + status = hsa_memory_free(a); + ASSERT_EQ(status, HSA_STATUS_SUCCESS); + status = hsa_memory_free(b); + ASSERT_EQ(status, HSA_STATUS_SUCCESS); + + status = 
hsa_executable_destroy(code_object.executable); + ASSERT_EQ(status, HSA_STATUS_SUCCESS); + + status = hsa_code_object_reader_destroy(code_object.code_obj_rdr); + ASSERT_EQ(status, HSA_STATUS_SUCCESS); + close(code_object.file); +} diff --git a/tests/featuretests/profiler/discretetests/binary/multiqueue_testapp.h b/tests/featuretests/profiler/discretetests/binary/multiqueue_testapp.h new file mode 100644 index 00000000..4593366b --- /dev/null +++ b/tests/featuretests/profiler/discretetests/binary/multiqueue_testapp.h @@ -0,0 +1,343 @@ +/* +Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
+*/ +#ifndef TESTS_FEATURETESTS_PROFILER_DISCRETETESTS_BINARY_MULTIQUEUE_TESTAPP_H_ +#define TESTS_FEATURETESTS_PROFILER_DISCRETETESTS_BINARY_MULTIQUEUE_TESTAPP_H_ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +#include "src/utils/exception.h" + +#define ASSERT_EQ(val1, val2) \ + do { \ + if ((val1) != (val2)) { \ + assert(false); \ + abort(); \ + } \ + } while (false) + +struct Device { + struct Memory { + hsa_amd_memory_pool_t pool; + bool fine; + bool kernarg; + size_t size; + size_t granule; + }; + + hsa_agent_t agent; + char name[64]; + std::vector pools; + uint32_t fine; + uint32_t coarse; + static std::vector all_devices; +}; + +std::vector cpu, gpu; +Device::Memory kernarg; + +class MQDependencyTest { + public: + MQDependencyTest() { hsa_init(); } + ~MQDependencyTest() { hsa_shut_down(); } + + struct CodeObject { + hsa_file_t file; + hsa_code_object_reader_t code_obj_rdr; + hsa_executable_t executable; + }; + + struct Kernel { + uint64_t handle; + uint32_t scratch; + uint32_t group; + uint32_t kernarg_size; + uint32_t kernarg_align; + }; + + union AqlHeader { + struct { + uint16_t type : 8; + uint16_t barrier : 1; + uint16_t acquire : 2; + uint16_t release : 2; + uint16_t reserved : 3; + }; + uint16_t raw; + }; + + struct BarrierValue { + AqlHeader header; + uint8_t AmdFormat; + uint8_t reserved; + uint32_t reserved1; + hsa_signal_t signal; + hsa_signal_value_t value; + hsa_signal_value_t mask; + uint32_t cond; + uint32_t reserved2; + uint64_t reserved3; + uint64_t reserved4; + hsa_signal_t completion_signal; + }; + + union Aql { + AqlHeader header; + hsa_kernel_dispatch_packet_t dispatch; + hsa_barrier_and_packet_t barrier_and; + hsa_barrier_or_packet_t barrier_or; + BarrierValue barrier_value; + }; + + struct OCLHiddenArgs { + uint64_t offset_x; + uint64_t offset_y; + uint64_t offset_z; + void *printf_buffer; + void *enqueue; + void *enqueue2; + void *multi_grid; + }; + + bool
LoadCodeObject(std::string filename, hsa_agent_t agent, + CodeObject &code_object) { + hsa_status_t err; + + code_object.file = open(filename.c_str(), O_RDONLY); + if (code_object.file == -1) { + abort(); + return false; + } + + err = hsa_code_object_reader_create_from_file(code_object.file, + &code_object.code_obj_rdr); + ASSERT_EQ(err, HSA_STATUS_SUCCESS); + + err = hsa_executable_create_alt(HSA_PROFILE_FULL, + HSA_DEFAULT_FLOAT_ROUNDING_MODE_DEFAULT, + nullptr, &code_object.executable); + ASSERT_EQ(err, HSA_STATUS_SUCCESS); + + err = hsa_executable_load_agent_code_object(code_object.executable, agent, + code_object.code_obj_rdr, + nullptr, nullptr); + if (err != HSA_STATUS_SUCCESS) return false; + + err = hsa_executable_freeze(code_object.executable, nullptr); + ASSERT_EQ(err, HSA_STATUS_SUCCESS); + + return true; + } + + bool GetKernel(const CodeObject &code_object, std::string kernel, + hsa_agent_t agent, Kernel &kern) { + hsa_executable_symbol_t symbol; + hsa_status_t err = hsa_executable_get_symbol_by_name( + code_object.executable, kernel.c_str(), &agent, &symbol); + if (err != HSA_STATUS_SUCCESS) { + err = hsa_executable_get_symbol_by_name( + code_object.executable, (kernel + ".kd").c_str(), &agent, &symbol); + if (err != HSA_STATUS_SUCCESS) { + return false; + } + } + // printf("\nkernel-name: %s\n", kernel.c_str()); + err = hsa_executable_symbol_get_info( + symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT, &kern.handle); + ASSERT_EQ(err, HSA_STATUS_SUCCESS); + + err = hsa_executable_symbol_get_info( + symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE, + &kern.scratch); + ASSERT_EQ(err, HSA_STATUS_SUCCESS); + // printf("Scratch: %d\n", kern.scratch); + + err = hsa_executable_symbol_get_info( + symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_GROUP_SEGMENT_SIZE, + &kern.group); + ASSERT_EQ(err, HSA_STATUS_SUCCESS); + // printf("LDS: %d\n", kern.group); + + // Remaining needs code object v2 or comgr. 
+ err = hsa_executable_symbol_get_info( + symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE, + &kern.kernarg_size); + ASSERT_EQ(err, HSA_STATUS_SUCCESS); + // printf("Kernarg Size: %d\n", kern.kernarg_size); + + err = hsa_executable_symbol_get_info( + symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_ALIGNMENT, + &kern.kernarg_align); + ASSERT_EQ(err, HSA_STATUS_SUCCESS); + // printf("Kernarg Align: %d\n", kern.kernarg_align); + + return true; + } + + // Not for parallel insertion. + bool SubmitPacket(hsa_queue_t *queue, Aql &pkt) { + size_t mask = queue->size - 1; + Aql *ring = static_cast(queue->base_address); + + uint64_t write = hsa_queue_load_write_index_relaxed(queue); + uint64_t read = hsa_queue_load_read_index_relaxed(queue); + if (write - read + 1 > queue->size) return false; + + Aql &dst = ring[write & mask]; + + uint16_t header = pkt.header.raw; + pkt.header.raw = dst.header.raw; + dst = pkt; + __atomic_store_n(&dst.header.raw, header, __ATOMIC_RELEASE); + pkt.header.raw = header; + + hsa_queue_store_write_index_release(queue, write + 1); + hsa_signal_store_screlease(queue->doorbell_signal, write); + + return true; + } + + void *hsaMalloc(size_t size, const Device::Memory &mem) { + void *ret; + hsa_status_t err = hsa_amd_memory_pool_allocate(mem.pool, size, 0, &ret); + ASSERT_EQ(err, HSA_STATUS_SUCCESS); + + err = hsa_amd_agents_allow_access(Device::all_devices.size(), + &Device::all_devices[0], nullptr, ret); + ASSERT_EQ(err, HSA_STATUS_SUCCESS); + return ret; + } + + void *hsaMalloc(size_t size, const Device &dev, bool fine) { + uint32_t index = fine ? 
dev.fine : dev.coarse; + assert(index != -1u && "Memory type unavailable."); + return hsaMalloc(size, dev.pools[index]); + } + + bool DeviceDiscovery() { + hsa_status_t err; + err = hsa_iterate_agents( + [](hsa_agent_t agent, void *) { + hsa_status_t err; + + Device dev; + dev.agent = agent; + + dev.fine = -1u; + dev.coarse = -1u; + + err = hsa_agent_get_info(agent, HSA_AGENT_INFO_NAME, dev.name); + ASSERT_EQ(err, HSA_STATUS_SUCCESS); + + hsa_device_type_t type; + err = hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &type); + ASSERT_EQ(err, HSA_STATUS_SUCCESS); + + err = hsa_amd_agent_iterate_memory_pools( + agent, + [](hsa_amd_memory_pool_t pool, void *data) { + std::vector &pools = + *reinterpret_cast *>(data); + hsa_status_t err; + + hsa_amd_segment_t segment; + err = hsa_amd_memory_pool_get_info( + pool, HSA_AMD_MEMORY_POOL_INFO_SEGMENT, &segment); + ASSERT_EQ(err, HSA_STATUS_SUCCESS); + + if (segment != HSA_AMD_SEGMENT_GLOBAL) + return HSA_STATUS_SUCCESS; + + uint32_t flags; + err = hsa_amd_memory_pool_get_info( + pool, HSA_AMD_MEMORY_POOL_INFO_GLOBAL_FLAGS, &flags); + ASSERT_EQ(err, HSA_STATUS_SUCCESS); + + Device::Memory mem; + mem.pool = pool; + mem.fine = + (flags & HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_FINE_GRAINED); + mem.kernarg = + (flags & HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_KERNARG_INIT); + + err = hsa_amd_memory_pool_get_info( + pool, HSA_AMD_MEMORY_POOL_INFO_SIZE, &mem.size); + ASSERT_EQ(err, HSA_STATUS_SUCCESS); + + err = hsa_amd_memory_pool_get_info( + pool, HSA_AMD_MEMORY_POOL_INFO_RUNTIME_ALLOC_GRANULE, + &mem.granule); + ASSERT_EQ(err, HSA_STATUS_SUCCESS); + + pools.push_back(mem); + return HSA_STATUS_SUCCESS; + }, + static_cast(&dev.pools)); + + if (!dev.pools.empty()) { + for (size_t i = 0; i < dev.pools.size(); i++) { + if (dev.pools[i].fine && dev.pools[i].kernarg && dev.fine == -1u) + dev.fine = i; + if (dev.pools[i].fine && !dev.pools[i].kernarg) dev.fine = i; + if (!dev.pools[i].fine) dev.coarse = i; + } + + if (type == HSA_DEVICE_TYPE_CPU) + 
cpu.push_back(dev); + else + gpu.push_back(dev); + + Device::all_devices.push_back(dev.agent); + } + + return HSA_STATUS_SUCCESS; + }, + nullptr); + + []() { + for (auto &dev : cpu) { + for (auto &mem : dev.pools) { + if (mem.fine && mem.kernarg) { + kernarg = mem; + return; + } + } + } + }(); + ASSERT_EQ(err, HSA_STATUS_SUCCESS); + + if (cpu.empty() || gpu.empty() || kernarg.pool.handle == 0) return false; + return true; + } +}; +#endif // TESTS_FEATURETESTS_PROFILER_DISCRETETESTS_BINARY_MULTIQUEUE_TESTAPP_H_ diff --git a/tests/featuretests/profiler/discretetests/binary/multithreaded_test.cpp b/tests/featuretests/profiler/discretetests/binary/multithreaded_test.cpp new file mode 100644 index 00000000..49c26121 --- /dev/null +++ b/tests/featuretests/profiler/discretetests/binary/multithreaded_test.cpp @@ -0,0 +1,103 @@ +/* +Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
+*/ + +/** \mainpage ROC Profiler Binary Test + * + * \section introduction Introduction + * + * The goal of this test is to test ROC profiler as a binary against a + * multithreaded application. The test application launches an empty + * kernel on multiple threads. + * + * The test then parses the CSV output and verifies that the number of kernel + * dispatches equals the number of threads launched by the test application. + * + * The test also performs basic verification that counter values are non-negative. + */ + +#include +#include +#include +#include + +#include "utils/csv_parser.h" +#include "utils/test_utils.h" + +// kernel dispatch count test +int DispatchCountTest(std::string profiler_output) { + CSVParser parser; + parser.ParseCSV(profiler_output); + countermap counter_map = parser.GetCounterMap(); + + int dispatch_counter = 0; + for (auto i = 0; i < counter_map.size(); i++) { + std::string* dispatch_id = parser.ReadCounter(i, 1); + if (dispatch_id != nullptr) { + if (dispatch_id->find("dispatch") != std::string::npos) { + dispatch_counter++; + } + } + } + + // dispatch count test: Number of dispatches must be equal to + // number of kernel launches in test_app + if (dispatch_counter == GetNumberOfCores()) { + return 0; + } + return -1; +} + +std::string ReadProfilerBuffer(const char* cmd) { + std::vector<char> buffer(1028); + std::string profiler_output; + std::unique_ptr<FILE, decltype(&pclose)> pipe(popen(cmd, "r"), pclose); + if (!pipe) { + throw std::runtime_error("popen() failed!"); + } + while (fgets(buffer.data(), buffer.size(), pipe.get()) != nullptr) { + profiler_output += buffer.data(); + } + return profiler_output; +} + +std::string InitCounterTest() { + std::string input_path = GetRunningPath("profiler_multithreaded_test"); + std::string rocprofv2_path = GetRunningPath( + "build/tests/featuretests/profiler/profiler_multithreaded_test"); + std::stringstream command; + command << rocprofv2_path + "./rocprofv2 -i " + << input_path + "basic_metrics.txt " + << input_path + "multithreaded_testapp";
std::string result = ReadProfilerBuffer(command.str().c_str()); + return result; +} + +int main(int argc, char** argv) { + int test_status = -1; + + // initialize kernel dispatch test + std::string profiler_output = InitCounterTest(); + // kernel dispatch count test + test_status = DispatchCountTest(profiler_output); + + return test_status; +} diff --git a/tests/featuretests/profiler/discretetests/binary/multithreaded_testapp.cpp b/tests/featuretests/profiler/discretetests/binary/multithreaded_testapp.cpp new file mode 100644 index 00000000..5654f173 --- /dev/null +++ b/tests/featuretests/profiler/discretetests/binary/multithreaded_testapp.cpp @@ -0,0 +1,83 @@ +/****************************************************************************** +Copyright (c) 2018 Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
+*******************************************************************************/ + +/** \mainpage ROC Profiler Multi-Threaded Test Application + * + * \section introduction Introduction + * + * Test application launches an empty kernel on multiple threads. + * + * In subsequent tests, ROC profiler is run against this application + * to confirm that the collected contexts are valid. + * + */ + +#include + +#include +#include +#include + +#include "utils/test_utils.h" + +/** \mainpage ROC Profiler Test Application + * + * \section introduction Introduction + * + * The goal of this test application is to launch an empty kernel + * on multiple threads and multiple GPUs. + * + * The number of threads is based on the number of cores in the system. + * The number of GPUs is based on the GPUs present in the system. + */ + +// empty kernel +__global__ void kernel() {} + +// launches the kernel on every GPU +void KernelLaunch() { + // Multi-GPU + int gpu_count = 0; + hipGetDeviceCount(&gpu_count); + + for (int gpu_id = 0; gpu_id < gpu_count; gpu_id++) { + hipSetDevice(gpu_id);  // run the empty kernel on this GPU + kernel<<<1, 1>>>(); + } +} + +int main(int argc, char** argv) { + // create as many threads as number of cores in system + int threads_count = GetNumberOfCores(); + + // create a pool of threads + std::vector<std::thread> threads(threads_count); + + // launch kernel on each thread + for (int n = 0; n < threads_count; ++n) { + threads[n] = std::thread(KernelLaunch); + } + // wait for all kernel launches to complete + for (int n = 0; n < threads_count; ++n) { + threads[n].join(); + } +} diff --git a/tests/featuretests/profiler/discretetests/run_discrete_tests.sh b/tests/featuretests/profiler/discretetests/run_discrete_tests.sh new file mode 100755 index 00000000..be2db2d5 --- /dev/null +++ b/tests/featuretests/profiler/discretetests/run_discrete_tests.sh @@ -0,0 +1,87 @@ +#!/bin/bash + +################################################################################ +# Copyright (c) 2018 Advanced Micro Devices, Inc.
All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. +################################################################################ + +# test filter input +test_filter=-1 +if [ -n "$1" ] ; then + test_filter=$1 +fi + +# test check routine +test_status=0 +test_runnum=0 +test_number=0 +failed_tests="Failed tests:" + +xeval_test() { + test_number=$test_number +} + +ncolors=$(tput colors || echo 0) +if [ -n "$ncolors" ] && [ $ncolors -ge 8 ]; then + bright="$(tput bold || echo)" + red="$(tput setaf 1 || echo)" + green="$(tput setaf 2 || echo)" + blue="$(tput setaf 4 || echo)" + normal="$(tput sgr0 || echo)" +fi + +eval_test() { + label=$1 + cmdline=$2 + test_name=$3 + if [ $test_filter = -1 -o $test_filter = $test_number ] ; then + echo "$label: \"$cmdline\"" + test_runnum=$((test_runnum + 1)) + eval "$cmdline" > /dev/null 2>&1 + if [ $? 
!= 0 ] ; then + echo "${bright:-}${blue:-}$test_name: ${red:-}FAILED${normal:-}" + failed_tests="$failed_tests\n $test_number: \"$label\"" + test_status=$(($test_status + 1)) + else + echo "${bright:-}${blue:-}$test_name: ${green:-}PASSED${normal:-}" + fi + fi + test_number=$((test_number + 1)) +} + +CURRENT_DIR="$( dirname -- "$0"; )"; + +## Discrete multi-threaded/multi-gpu api test +eval_test "${bright:-}${green:-}running multi-threaded api test..."${normal:-} ${CURRENT_DIR}/profiler_api_test api_test + +## Discrete multi-process binary test +eval_test "${bright:-}${green:-}running multi-process binary test..."${normal:-} ${CURRENT_DIR}/profiler_multiprocess_test multiprocess_test + +## Discrete multi-threaded binary test +eval_test "${bright:-}${green:-}running multi-threaded binary test..."${normal:-} ${CURRENT_DIR}/profiler_multithreaded_test multithreaded_test + +## Discrete multi-queue binary test +#eval_test "${bright:-}${green:-}running multi-queue binary test..."${normal:-} ${CURRENT_DIR}/profiler_multiqueue_test multiqueue_test + +echo "$test_number tests total / $test_runnum tests run / $test_status tests failed" +if [ $test_status != 0 ] ; then + echo $failed_tests +fi +exit $test_status diff --git a/tests/featuretests/profiler/gtests/apps/goldentraces/hip_helloworld_golden_traces.txt b/tests/featuretests/profiler/gtests/apps/goldentraces/hip_helloworld_golden_traces.txt new file mode 100755 index 00000000..df4ec4bc --- /dev/null +++ b/tests/featuretests/profiler/gtests/apps/goldentraces/hip_helloworld_golden_traces.txt @@ -0,0 +1,20 @@ +0x5faaa0 agent cpu +0x5fbb30 agent gpu +844346791235313 + + +ROCMTools: Collecting the following counters: +- GRBM_COUNT + +Enabling Counter Collection + System minor 0 + System major 9 + agent prop name AMD Radeon VII +input string: +GdkknVnqkc + +output string: +HelloWorld +Passed! 
+dispatch[2], gpu_id(0), queue_id(1), queue_index(0), pid(1531373), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140646043297024), kernel-name("helloworld"), start_time(844346969235689), end_time(844346969239689) +, GRBM_COUNT (20292) diff --git a/tests/featuretests/profiler/gtests/apps/goldentraces/hip_vectoradd_golden_traces.txt b/tests/featuretests/profiler/gtests/apps/goldentraces/hip_vectoradd_golden_traces.txt new file mode 100755 index 00000000..fd3a6be1 --- /dev/null +++ b/tests/featuretests/profiler/gtests/apps/goldentraces/hip_vectoradd_golden_traces.txt @@ -0,0 +1,16 @@ +0x2147aa0 agent cpu +0x2148b30 agent gpu +844395826761899 + + +ROCMTools: Collecting the following counters: +- GRBM_COUNT + +Enabling Counter Collection + System minor 0 + System major 9 + agent prop name AMD Radeon VII +hip Device prop succeeded +PASSED! +dispatch[2], gpu_id(0), queue_id(1), queue_index(0), pid(1531435), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140553153857024), kernel-name("vectoradd_float"), start_time(844396006072252), end_time(844396006104732) +, GRBM_COUNT (67002) diff --git a/tests/featuretests/profiler/gtests/apps/goldentraces/hsa_async_mem_copy_golden_traces.txt b/tests/featuretests/profiler/gtests/apps/goldentraces/hsa_async_mem_copy_golden_traces.txt new file mode 100644 index 00000000..32d32097 --- /dev/null +++ b/tests/featuretests/profiler/gtests/apps/goldentraces/hsa_async_mem_copy_golden_traces.txt @@ -0,0 +1,16 @@ +0xd1eeb0 agent cpu +0xd4b380 agent gpu +844434431085362 + + +ROCMTools: Collecting the following counters: +- GRBM_COUNT + +Enabling Counter Collection +Only 1 GPU found with required VRAM. Peer-to-Peer copy will be skipped. +CPU is "AMD Ryzen 9 5950X 16-Core Processor" +GPU1 is "gfx906" +Copying 4096 bytes from gpu1 memory to system memory... +Success! +Copying 4096 bytes from system memory to gpu1 memory... +Success! 
diff --git a/tests/featuretests/profiler/gtests/apps/goldentraces/input.txt b/tests/featuretests/profiler/gtests/apps/goldentraces/input.txt new file mode 100644 index 00000000..2b59272b --- /dev/null +++ b/tests/featuretests/profiler/gtests/apps/goldentraces/input.txt @@ -0,0 +1 @@ +pmc: GRBM_COUNT \ No newline at end of file diff --git a/tests/featuretests/profiler/gtests/apps/goldentraces/mpi_vectoradd_golden_traces.txt b/tests/featuretests/profiler/gtests/apps/goldentraces/mpi_vectoradd_golden_traces.txt new file mode 100755 index 00000000..82689621 --- /dev/null +++ b/tests/featuretests/profiler/gtests/apps/goldentraces/mpi_vectoradd_golden_traces.txt @@ -0,0 +1,37 @@ +0x55e2aaab1540 agent cpu +0x55e2aab9f700 agent gpu +844463523587280 + + +ROCMTools: Collecting the following counters: +- GRBM_COUNT + +Enabling Counter Collection +0x12f9580 agent cpu +0x1340580 agent gpu +844463824808245 + + +ROCMTools: Collecting the following counters: +- GRBM_COUNT + +0xdcb320 agent cpu +0xe122e0 agent gpu +844463824808355 + + +ROCMTools: Collecting the following counters: +- GRBM_COUNT + +Enabling Counter Collection +Enabling Counter Collection +device count and rank is1: 2 +Rank Id: 0 | Device Id : 0 | Num Devices: 1 +device count and rank is1: 2 +Rank Id: 1 | Device Id : 0 | Num Devices: 1 +Max error: 0.000000 +Max error: 0.000000 +dispatch[2], gpu_id(0), queue_id(1), queue_index(0), pid(1531660), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140604145903232), kernel-name("add"), start_time(844464004374381), end_time(844464006775011) +, GRBM_COUNT (3724176) +dispatch[2], gpu_id(0), queue_id(1), queue_index(0), pid(1531661), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140242024941184), kernel-name("add"), start_time(844464004374753), end_time(844464006776661) +, GRBM_COUNT (3724418) diff --git 
a/tests/featuretests/profiler/gtests/apps/goldentraces/openmp_helloworld_golden_traces.txt b/tests/featuretests/profiler/gtests/apps/goldentraces/openmp_helloworld_golden_traces.txt new file mode 100755 index 00000000..769a2371 --- /dev/null +++ b/tests/featuretests/profiler/gtests/apps/goldentraces/openmp_helloworld_golden_traces.txt @@ -0,0 +1,5 @@ +ROCMTools: Collecting the following counters: +- GRBM_COUNT +PASSED! +dispatch[2], gpu-id(0), kernel-name("hip_helloworld"), time(7853273641921013,7853273641924568) + GRBM_COUNT (21840) \ No newline at end of file diff --git a/tests/featuretests/profiler/gtests/apps/hip/hello_world.cpp b/tests/featuretests/profiler/gtests/apps/hip/hello_world.cpp new file mode 100755 index 00000000..a3842dba --- /dev/null +++ b/tests/featuretests/profiler/gtests/apps/hip/hello_world.cpp @@ -0,0 +1,84 @@ +/* +Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
+*/ + +#include +#include +#include + +#include + +#include +#include +#include + +#define SUCCESS 0 +#define FAILURE 1 + +__global__ void helloworld(char *in, char *out) { + int num = hipThreadIdx_x + hipBlockDim_x * hipBlockIdx_x; + out[num] = in[num] + 1; +} + +int main(int argc, char *argv[]) { + hipDeviceProp_t devProp; + hipGetDeviceProperties(&devProp, 0); + std::cout << " System minor " << devProp.minor << std::endl; + std::cout << " System major " << devProp.major << std::endl; + std::cout << " agent prop name " << devProp.name << std::endl; + + /* Initial input,output for the host and create memory objects for the + * kernel*/ + const char *input = "GdkknVnqkc"; + size_t strlength = strlen(input); + std::cout << "input string:" << std::endl; + std::cout << input << std::endl; + char *output = reinterpret_cast(malloc(strlength + 1)); + + char *inputBuffer; + char *outputBuffer; + hipMalloc(reinterpret_cast(&inputBuffer), + (strlength + 1) * sizeof(char)); + hipMalloc(reinterpret_cast(&outputBuffer), + (strlength + 1) * sizeof(char)); + + hipMemcpy(inputBuffer, input, (strlength + 1) * sizeof(char), + hipMemcpyHostToDevice); + + hipLaunchKernelGGL(helloworld, dim3(1), dim3(strlength), 0, 0, inputBuffer, + outputBuffer); + + hipMemcpy(output, outputBuffer, (strlength + 1) * sizeof(char), + hipMemcpyDeviceToHost); + + hipFree(inputBuffer); + hipFree(outputBuffer); + + output[strlength] = '\0'; // Add the terminal character to the end of output. + std::cout << "\noutput string:" << std::endl; + std::cout << output << std::endl; + + free(output); + + std::cout << "Passed!\n"; + + return SUCCESS; +} diff --git a/tests/featuretests/profiler/gtests/apps/hip/hello_world_gtest.cpp b/tests/featuretests/profiler/gtests/apps/hip/hello_world_gtest.cpp new file mode 100755 index 00000000..e3294e2d --- /dev/null +++ b/tests/featuretests/profiler/gtests/apps/hip/hello_world_gtest.cpp @@ -0,0 +1,88 @@ +/* +Copyright (c) 2015-2016 Advanced Micro Devices, Inc. 
All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
+*/ +#include + +#include "gtests/apps/profiler_gtest.h" + +constexpr auto kGoldenOutputHelloworld = "hip_helloworld_golden_traces.txt"; + +class HelloWorldTest : public ProfilerTest { + protected: + std::vector golden_kernel_info; + void SetUp() { + ProfilerTest::SetUp("hip_helloworld"); + GetKernelInfoForGoldenOutput("hip_helloworld", kGoldenOutputHelloworld, &golden_kernel_info); + } +}; + +// Test:1 Compares total num of kernel-names in golden output against current +// profiler output +TEST_F(HelloWorldTest, WhenRunningProfilerWithAppThenKernelNumbersMatchWithGoldenOutput) { + // kernel info in current profiler run + std::vector current_kernel_info; + + GetKernelInfoForRunningApplication(&current_kernel_info); + ASSERT_TRUE(current_kernel_info.size()); + + EXPECT_EQ(golden_kernel_info.size(), current_kernel_info.size()); +} + +// Test:2 Compares order of kernel-names in golden output against current +// profiler output +TEST_F(HelloWorldTest, WhenRunningProfilerWithAppThenKernelNamessMatchWithGoldenOutput) { + // kernel info in current profiler run + std::vector current_kernel_info; + GetKernelInfoForRunningApplication(&current_kernel_info); + + ASSERT_TRUE(current_kernel_info.size()); + + for (size_t i = 0; i < golden_kernel_info.size() && i < current_kernel_info.size(); ++i) + EXPECT_EQ(golden_kernel_info[i].kernel_name, current_kernel_info[i].kernel_name); +} + +// Test:3 Checks that the current profiler output reports at least one +// kernel dispatch +TEST_F(HelloWorldTest, WhenRunningProfilerWithAppThenKernelDurationShouldBePositive) { + // kernel info in current profiler run + std::vector current_kernel_info; + + GetKernelInfoForRunningApplication(&current_kernel_info); + ASSERT_TRUE(current_kernel_info.size()); + + EXPECT_GT(current_kernel_info.size(), 0); +} + +// Test:4 Checks that end-time is greater than start-time in current +// profiler output +TEST_F(HelloWorldTest, WhenRunningProfilerWithAppThenEndTimeIsGreaterThenStartTime) { + // kernel info in current profiler run
std::vector current_kernel_info; + + GetKernelInfoForRunningApplication(&current_kernel_info); + ASSERT_TRUE(current_kernel_info.size()); + + for (auto& itr : current_kernel_info) { + if (!(itr.start_time).empty() && !(itr.end_time).empty()) { + EXPECT_GT(itr.end_time, itr.start_time); + } + } +} diff --git a/tests/featuretests/profiler/gtests/apps/hip/vector_add.cpp b/tests/featuretests/profiler/gtests/apps/hip/vector_add.cpp new file mode 100755 index 00000000..a12a641e --- /dev/null +++ b/tests/featuretests/profiler/gtests/apps/hip/vector_add.cpp @@ -0,0 +1,130 @@ +/* +Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE.
+*/ +#include +#include +#include + +#include +#include + +#include "hip/hip_runtime.h" + +#define HIP_ASSERT(x) (assert((x) == hipSuccess)) + +#define WIDTH 1024 +#define HEIGHT 1024 + +#define NUM (WIDTH * HEIGHT) + +#define THREADS_PER_BLOCK_X 16 +#define THREADS_PER_BLOCK_Y 16 +#define THREADS_PER_BLOCK_Z 1 + +__global__ void vectoradd_float(float *__restrict__ a, + const float *__restrict__ b, + const float *__restrict__ c, int width, + int height) { + int x = hipBlockDim_x * hipBlockIdx_x + hipThreadIdx_x; + int y = hipBlockDim_y * hipBlockIdx_y + hipThreadIdx_y; + + int i = y * width + x; + if (i < (width * height)) { + a[i] = b[i] + c[i]; + } +} + +int main() { + float *hostA; + float *hostB; + float *hostC; + + float *deviceA; + float *deviceB; + float *deviceC; + + hipDeviceProp_t devProp; + hipGetDeviceProperties(&devProp, 0); + std::cout << " System minor " << devProp.minor << std::endl; + std::cout << " System major " << devProp.major << std::endl; + std::cout << " agent prop name " << devProp.name << std::endl; + + std::cout << "hip Device prop succeeded " << std::endl; + + int i; + int errors; + + hostA = reinterpret_cast(malloc(NUM * sizeof(float))); + hostB = reinterpret_cast(malloc(NUM * sizeof(float))); + hostC = reinterpret_cast(malloc(NUM * sizeof(float))); + + // initialize the input data + for (i = 0; i < NUM; i++) { + hostB[i] = static_cast(i); + hostC[i] = static_cast(i) * 100.0f; + } + + HIP_ASSERT( + hipMalloc(reinterpret_cast(&deviceA), NUM * sizeof(float))); + HIP_ASSERT( + hipMalloc(reinterpret_cast(&deviceB), NUM * sizeof(float))); + HIP_ASSERT( + hipMalloc(reinterpret_cast(&deviceC), NUM * sizeof(float))); + + HIP_ASSERT( + hipMemcpy(deviceB, hostB, NUM * sizeof(float), hipMemcpyHostToDevice)); + HIP_ASSERT( + hipMemcpy(deviceC, hostC, NUM * sizeof(float), hipMemcpyHostToDevice)); + + hipLaunchKernelGGL( + vectoradd_float, + dim3(WIDTH / THREADS_PER_BLOCK_X, HEIGHT / THREADS_PER_BLOCK_Y), + dim3(THREADS_PER_BLOCK_X, 
THREADS_PER_BLOCK_Y), 0, 0, deviceA, deviceB, + deviceC, WIDTH, HEIGHT); + + HIP_ASSERT( + hipMemcpy(hostA, deviceA, NUM * sizeof(float), hipMemcpyDeviceToHost)); + + // verify the results + errors = 0; + for (i = 0; i < NUM; i++) { + if (hostA[i] != (hostB[i] + hostC[i])) { + errors++; + } + } + if (errors != 0) { + printf("FAILED: %d errors\n", errors); + } else { + printf("PASSED!\n"); + } + + HIP_ASSERT(hipFree(deviceA)); + HIP_ASSERT(hipFree(deviceB)); + HIP_ASSERT(hipFree(deviceC)); + + free(hostA); + free(hostB); + free(hostC); + + // hipResetDefaultAccelerator(); + + return errors; +} diff --git a/tests/featuretests/profiler/gtests/apps/hip/vector_add_gtest.cpp b/tests/featuretests/profiler/gtests/apps/hip/vector_add_gtest.cpp new file mode 100755 index 00000000..b929df45 --- /dev/null +++ b/tests/featuretests/profiler/gtests/apps/hip/vector_add_gtest.cpp @@ -0,0 +1,86 @@ +/* +Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. +*/ +#include + +#include "gtests/apps/profiler_gtest.h" + +constexpr auto kGoldenOutputVectorAdd = "hip_vectoradd_golden_traces.txt"; + +class VectorAddTest : public ProfilerTest { + protected: + std::vector golden_kernel_info; + void SetUp() { + ProfilerTest::SetUp("hip_vectoradd"); + GetKernelInfoForGoldenOutput("hip_vectoradd", kGoldenOutputVectorAdd, &golden_kernel_info); + } +}; + +// Test:1 Compares total num of kernel-names in golden output against current +// profiler output +TEST_F(VectorAddTest, WhenRunningProfilerWithAppThenKernelNumbersMatchWithGoldenOutput) { + std::vector current_kernel_info; + + GetKernelInfoForRunningApplication(&current_kernel_info); + ASSERT_TRUE(current_kernel_info.size()); + + EXPECT_EQ(golden_kernel_info.size(), current_kernel_info.size()); +} + +// Test:2 Compares order of kernel-names in golden output against current +// profiler output +TEST_F(VectorAddTest, WhenRunningProfilerWithAppThenKernelNamessMatchWithGoldenOutput) { + std::vector current_kernel_info; + + GetKernelInfoForRunningApplication(&current_kernel_info); + ASSERT_TRUE(current_kernel_info.size()); + + for (size_t i = 0; i < golden_kernel_info.size() && i < current_kernel_info.size(); ++i) + EXPECT_EQ(golden_kernel_info[i].kernel_name, current_kernel_info[i].kernel_name); +} + +// Test:3 Checks that the current profiler output reports at least one +// kernel dispatch +TEST_F(VectorAddTest, WhenRunningProfilerWithAppThenKernelDurationShouldBePositive) { + // kernel info in current profiler run + std::vector current_kernel_info; + + GetKernelInfoForRunningApplication(&current_kernel_info); + ASSERT_TRUE(current_kernel_info.size()); + + EXPECT_GT(current_kernel_info.size(), 0); +} + +// Test:4 Checks that end-time is greater than
start-time in current +// profiler output +TEST_F(VectorAddTest, WhenRunningProfilerWithAppThenEndTimeIsGreaterThenStartTime) { + // kernel info in current profiler run + std::vector current_kernel_info; + + GetKernelInfoForRunningApplication(&current_kernel_info); + ASSERT_TRUE(current_kernel_info.size()); + + for (auto& itr : current_kernel_info) { + if (!(itr.start_time).empty() && !(itr.end_time).empty()) { + EXPECT_GT(itr.end_time, itr.start_time); + } + } +} diff --git a/tests/featuretests/profiler/gtests/apps/hsa/async_mem_copy.cpp b/tests/featuretests/profiler/gtests/apps/hsa/async_mem_copy.cpp new file mode 100644 index 00000000..4f079587 --- /dev/null +++ b/tests/featuretests/profiler/gtests/apps/hsa/async_mem_copy.cpp @@ -0,0 +1,387 @@ +/* + * ============================================================================= + * ROC Runtime Conformance Release License + * ============================================================================= + * The University of Illinois/NCSA + * Open Source License (NCSA) + * + * Copyright (c) 2017, Advanced Micro Devices, Inc. + * All rights reserved. + * + * Developed by: + * + * AMD Research and AMD ROC Software Development + * + * Advanced Micro Devices, Inc. + * + * www.amd.com + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal with the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * - Redistributions of source code must retain the above copyright notice, + * this list of conditions and the following disclaimers.
+ * - Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimers in + * the documentation and/or other materials provided with the distribution. + * - Neither the names of , + * nor the names of its contributors may be used to endorse or promote + * products derived from this Software without specific prior written + * permission. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS WITH THE SOFTWARE. + * + */ +#include +#include + +#include "hsa/hsa.h" +#include "hsa/hsa_ext_amd.h" + +#define RET_IF_HSA_ERR(err) \ + { \ + if ((err) != HSA_STATUS_SUCCESS) { \ + const char *msg = 0; \ + hsa_status_string(err, &msg); \ + std::cout << "hsa api call failure at line " << __LINE__ \ + << ", file: " << __FILE__ << ". Call returned " << err \ + << std::endl; \ + std::cout << msg << std::endl; \ + return (err); \ + } \ + } + +static const uint32_t kTestFillValue1 = 0xabcdef12; +static const uint32_t kTestFillValue2 = 0xba5eba11; +static const uint32_t kTestFillValue3 = 0xfeed5a1e; +static const uint32_t kTestInitValue = 0xbaadf00d; + +// This structure holds an agent pointer and associated memory pool to be used +// for this test program. 
+struct async_mem_cpy_agent { + hsa_agent_t dev; + hsa_amd_memory_pool_t pool; + size_t granule; + void *ptr; +}; +struct async_mem_cpy_pool_query { + async_mem_cpy_agent *pool_info; + hsa_agent_t peer_device; +}; +struct callback_args { + struct async_mem_cpy_agent cpu; + struct async_mem_cpy_agent gpu1; + struct async_mem_cpy_agent gpu2; +}; +// Find the least common multiple of 2 numbers +static size_t lcm(size_t a, size_t b) { + size_t tmp_a; + size_t tmp_b; + tmp_a = a; + tmp_b = b; + while (tmp_a != tmp_b) { + if (tmp_a < tmp_b) { + tmp_a = tmp_a + a; + } else { + tmp_b = tmp_b + b; + } + } + return tmp_a; +} +// This function is a callback for hsa_amd_agent_iterate_memory_pools() +// and tests whether the provided memory pool 1) is in the GLOBAL +// segment, 2) allows allocation and 3) is accessible by the peer device, +// if one is given. The "data" input parameter is assumed to point to a +// struct async_mem_cpy_pool_query. If the provided pool meets these criteria, +// HSA_STATUS_INFO_BREAK is returned.
+static hsa_status_t FindPool(hsa_amd_memory_pool_t in_pool, void *data) { + hsa_amd_segment_t segment; + hsa_status_t err; + if (nullptr == data) { + return HSA_STATUS_ERROR_INVALID_ARGUMENT; + } + struct async_mem_cpy_pool_query *args = + (struct async_mem_cpy_pool_query *)data; + err = hsa_amd_memory_pool_get_info(in_pool, HSA_AMD_MEMORY_POOL_INFO_SEGMENT, + &segment); + RET_IF_HSA_ERR(err); + if (segment != HSA_AMD_SEGMENT_GLOBAL) { + return HSA_STATUS_SUCCESS; + } + bool canAlloc; + err = hsa_amd_memory_pool_get_info( + in_pool, HSA_AMD_MEMORY_POOL_INFO_RUNTIME_ALLOC_ALLOWED, &canAlloc); + RET_IF_HSA_ERR(err); + if (!canAlloc) { + return HSA_STATUS_SUCCESS; + } + if (args->peer_device.handle != 0) { + hsa_amd_memory_pool_access_t access = + HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED; + err = hsa_amd_agent_memory_pool_get_info( + args->peer_device, in_pool, HSA_AMD_AGENT_MEMORY_POOL_INFO_ACCESS, + &access); + RET_IF_HSA_ERR(err); + if (access == HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED) { + return HSA_STATUS_SUCCESS; + } + } + err = hsa_amd_memory_pool_get_info( + in_pool, HSA_AMD_MEMORY_POOL_INFO_RUNTIME_ALLOC_GRANULE, + &args->pool_info->granule); + RET_IF_HSA_ERR(err); + args->pool_info->pool = in_pool; + return HSA_STATUS_INFO_BREAK; +} +// This function is meant to be a callback to hsa_iterate_agents. For each +// input agent the iterator provides as input, this function will check to +// see if the input agent is a CPU agent. If so, it will update the +// async_mem_cpy_agent structure pointed to by the input parameter "data". +// Return values: +// HSA_STATUS_INFO_BREAK -- CPU agent has been found and stored. 
Iterator +// should stop iterating +// HSA_STATUS_SUCCESS -- CPU agent has not yet been found; iterator +// should keep iterating +// Other -- Some error occurred +static hsa_status_t FindCPUDevice(hsa_agent_t agent, void *data) { + if (data == NULL) { + return HSA_STATUS_ERROR_INVALID_ARGUMENT; + } + hsa_device_type_t hsa_device_type; + hsa_status_t err = + hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &hsa_device_type); + RET_IF_HSA_ERR(err); + if (hsa_device_type == HSA_DEVICE_TYPE_CPU) { + struct async_mem_cpy_agent *args = (struct async_mem_cpy_agent *)data; + args->dev = agent; + async_mem_cpy_pool_query pool_query; + pool_query.peer_device.handle = 0; + pool_query.pool_info = args; + err = hsa_amd_agent_iterate_memory_pools(agent, FindPool, &pool_query); + if (err == HSA_STATUS_INFO_BREAK) { // we found what we were looking for + return HSA_STATUS_INFO_BREAK; + } else { + args->dev = {0}; + return err; + } + } + // Returning HSA_STATUS_SUCCESS tells the calling iterator to keep iterating + return HSA_STATUS_SUCCESS; +} +// This function is meant to be a callback to hsa_iterate_agents. It will +// attempt to find 2, or at least 1 GPU agent suitable for our test. The data +// input parameter should point to a callback_args struct. The 2 GPU fields +// will be updated as GPUs are discovered. +// Return values: +// HSA_STATUS_INFO_BREAK -- 2 GPU agents have been found and stored. 
Iterator +// should stop iterating +// HSA_STATUS_SUCCESS -- 2 GPU agents have not yet been found; 0 or 1 may +// have been found; iterator function should keep iterating +// Other -- Some error occurred +static hsa_status_t FindGPUs(hsa_agent_t agent, void *data) { + if (data == NULL) { + return HSA_STATUS_ERROR_INVALID_ARGUMENT; + } + hsa_device_type_t hsa_device_type; + hsa_status_t err = + hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &hsa_device_type); + RET_IF_HSA_ERR(err); + if (hsa_device_type != HSA_DEVICE_TYPE_GPU) { + return HSA_STATUS_SUCCESS; + } + struct callback_args *args = (struct callback_args *)data; + struct async_mem_cpy_agent *gpu; + async_mem_cpy_pool_query pool_query = {0, 0}; + if (args->gpu1.dev.handle == 0) { + gpu = &args->gpu1; + } else { + gpu = &args->gpu2; + // Check that gpu1 has peer access into the selected pool. + pool_query.peer_device = args->gpu1.dev; + } + // Find a suitable global memory pool on this GPU + gpu->dev = agent; + pool_query.pool_info = gpu; + err = hsa_amd_agent_iterate_memory_pools(agent, FindPool, &pool_query); + if (err == HSA_STATUS_INFO_BREAK) { + if (gpu == &args->gpu2) { + // We found 2 GPUs + return HSA_STATUS_INFO_BREAK; + } else { + // Keep looking for another gpu + return HSA_STATUS_SUCCESS; + } + } else { + gpu->dev = {0}; + } + RET_IF_HSA_ERR(err); + // Returning HSA_STATUS_SUCCESS tells the calling iterator to keep iterating + return HSA_STATUS_SUCCESS; +} +// This is the main test, showing various paths of async copy. Source and +// destination agents and their respective pools should already be discovered. +// Additionally, buffers from the pools should already be allocated and available +// from the input parameters.
+static hsa_status_t AsyncCpyTest(async_mem_cpy_agent *dst, + async_mem_cpy_agent *src, callback_args *args, + size_t sz, uint32_t val) { + hsa_status_t err; + hsa_signal_t copy_signal; + // Initialize the system and destination buffers with a value so we can later + // validate that they have been overwritten + void *sysPtr = args->cpu.ptr; + err = hsa_amd_memory_fill(sysPtr, kTestInitValue, sz / sizeof(uint32_t)); + RET_IF_HSA_ERR(err); + if (dst->ptr != sysPtr) { + err = hsa_amd_memory_fill(dst->ptr, kTestInitValue, sz / sizeof(uint32_t)); + RET_IF_HSA_ERR(err); + } + // Fill the source buffer with the provided uint32_t value + err = hsa_amd_memory_fill(src->ptr, val, sz / sizeof(uint32_t)); + RET_IF_HSA_ERR(err); + // Make sure the source and destination agents can access the destination buffer. + hsa_agent_t ag_list[2] = {dst->dev, src->dev}; + err = hsa_amd_agents_allow_access(2, ag_list, NULL, dst->ptr); + RET_IF_HSA_ERR(err); + // Create a signal that will be used to inform us when the copy is done + err = hsa_signal_create(1, 0, NULL, &copy_signal); + RET_IF_HSA_ERR(err); + // Do the copy... + err = hsa_amd_memory_async_copy(dst->ptr, dst->dev, src->ptr, src->dev, sz, 0, + NULL, copy_signal); + RET_IF_HSA_ERR(err); + // Here we do a blocking wait. Alternatively, we could also use a + // non-blocking wait in a loop, and do other work while waiting. + if (hsa_signal_wait_relaxed(copy_signal, HSA_SIGNAL_CONDITION_LT, 1, -1, + HSA_WAIT_STATE_BLOCKED) != 0) { + printf("Async copy returned error value.\n"); + return HSA_STATUS_ERROR; + } + // Verify the copy was successful; copy from the dst buffer to the sysBuf, + // (if the result is not already in sys. mem.) and check the sysBuf values + if (dst->ptr != sysPtr) { + if (src->ptr != sysPtr) { + // In this case, we need to give the gpu dev that owns dst->ptr access + // to the system memory we are going to copy to.
+ hsa_agent_t ag_list_ck[2] = {dst->dev, args->cpu.dev}; + err = hsa_amd_agents_allow_access(2, ag_list_ck, NULL, sysPtr); + RET_IF_HSA_ERR(err); + } + // Reset signal to 1 + hsa_signal_store_screlease(copy_signal, 1); + err = hsa_amd_memory_async_copy(sysPtr, args->cpu.dev, dst->ptr, dst->dev, + sz, 0, NULL, copy_signal); + RET_IF_HSA_ERR(err); + if (hsa_signal_wait_relaxed(copy_signal, HSA_SIGNAL_CONDITION_LT, 1, -1, + HSA_WAIT_STATE_BLOCKED) != 0) { + printf("Async copy returned error value.\n"); + return HSA_STATUS_ERROR; + } + } + // Check that the contents of the buffer are what is expected. + for (uint32_t i = 0; i < sz / sizeof(uint32_t); ++i) { + if (reinterpret_cast(sysPtr)[i] != val) { + fprintf(stdout, "Expected 0x%x but got 0x%x in buffer at index %u.\n", + val, reinterpret_cast(sysPtr)[i], i); + return HSA_STATUS_ERROR; + } + } + return HSA_STATUS_SUCCESS; +} +// This program illustrates the usage of the asynchronous copy capability of +// the RocR runtime library. It creates a system memory buffer and a local +// buffer for each of up to 2 GPUs, then copies data from the GPU to the host +// and from the host to the GPU. If 2 GPUs are available, the program also +// copies data from one GPU to the other.
+int main() { + hsa_status_t err; + struct callback_args args; + bool twoGPUs = false; + err = hsa_init(); + RET_IF_HSA_ERR(err); + // First, find the cpu agent and associated pool + args.cpu = {0, 0, 0}; + err = hsa_iterate_agents(FindCPUDevice, reinterpret_cast(&args.cpu)); + assert(err == HSA_STATUS_INFO_BREAK); + if (err != HSA_STATUS_INFO_BREAK) { + return -1; + } + // Now, find 1 or 2 (if possible) GPUs and associated pool(s) for our test + args.gpu1 = {0, 0, 0}; + args.gpu2 = {0, 0, 0}; + err = hsa_iterate_agents(FindGPUs, &args); + if (err == HSA_STATUS_INFO_BREAK) { + twoGPUs = true; + } else { + // See if we at least have 1 GPU + if (args.gpu1.dev.handle == 0) { + fprintf( + stdout, + "GPU with accessible VRAM not found; at least 1 required. Exiting\n"); + return -1; + } + fprintf(stdout, "Only 1 GPU found with required VRAM. " + "Peer-to-Peer copy will be skipped.\n"); + } + // We will use the smallest amount of allocatable memory that works for all + // potential sources and destinations of the copy + size_t sz = lcm(args.cpu.granule, args.gpu1.granule); + // Allocate memory on each source/destination + if (twoGPUs) { + sz = lcm(sz, args.gpu2.granule); + err = hsa_amd_memory_pool_allocate( + args.gpu2.pool, sz, 0, reinterpret_cast(&args.gpu2.ptr)); + RET_IF_HSA_ERR(err); + } + err = hsa_amd_memory_pool_allocate(args.cpu.pool, sz, 0, + reinterpret_cast(&args.cpu.ptr)); + RET_IF_HSA_ERR(err); + err = hsa_amd_memory_pool_allocate(args.gpu1.pool, sz, 0, + reinterpret_cast(&args.gpu1.ptr)); + RET_IF_HSA_ERR(err); + char name[64]; + err = hsa_agent_get_info(args.cpu.dev, HSA_AGENT_INFO_NAME, &name); + fprintf(stdout, "CPU is \"%s\"\n", name); + err = hsa_agent_get_info(args.gpu1.dev, HSA_AGENT_INFO_NAME, &name); + fprintf(stdout, "GPU1 is \"%s\"\n", name); + if (twoGPUs) { + err = hsa_agent_get_info(args.gpu2.dev, HSA_AGENT_INFO_NAME, &name); + fprintf(stdout, "GPU2 is \"%s\"\n", name); + } + fprintf(stdout, "Copying %lu bytes from gpu1 memory to system 
memory...\n", + sz); + err = AsyncCpyTest(&args.cpu, &args.gpu1, &args, sz, kTestFillValue1); + RET_IF_HSA_ERR(err); + fprintf(stdout, "Success!\n"); + fprintf(stdout, "Copying %lu bytes from system memory to gpu1 memory...\n", + sz); + err = AsyncCpyTest(&args.gpu1, &args.cpu, &args, sz, kTestFillValue2); + RET_IF_HSA_ERR(err); + fprintf(stdout, "Success!\n"); + if (twoGPUs) { + fprintf(stdout, "Copying %lu bytes from gpu1 memory to gpu2 memory...\n", + sz); + err = AsyncCpyTest(&args.gpu2, &args.gpu1, &args, sz, kTestFillValue3); + RET_IF_HSA_ERR(err); + fprintf(stdout, "Success!\n"); + } + // Clean up + err = hsa_amd_memory_pool_free(args.cpu.ptr); + RET_IF_HSA_ERR(err); + err = hsa_amd_memory_pool_free(args.gpu1.ptr); + RET_IF_HSA_ERR(err); + if (twoGPUs) { + err = hsa_amd_memory_pool_free(args.gpu2.ptr); + RET_IF_HSA_ERR(err); + } +} diff --git a/tests/featuretests/profiler/gtests/apps/hsa/async_mem_copy_gtest.cpp b/tests/featuretests/profiler/gtests/apps/hsa/async_mem_copy_gtest.cpp new file mode 100755 index 00000000..1a645322 --- /dev/null +++ b/tests/featuretests/profiler/gtests/apps/hsa/async_mem_copy_gtest.cpp @@ -0,0 +1,54 @@ +/* +Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. 
+ +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. +*/ +#include + +#include +#include + +#include "gtests/apps/profiler_gtest.h" + +constexpr auto kGoldenOutputAsyncCopy = "hsa_async_mem_copy_golden_traces.txt"; + +class HSATest : public ProfilerTest { + protected: + std::vector golden_kernel_info; + void SetUp() { + ProfilerTest::SetUp("hsa_async_mem_copy"); + GetKernelInfoForGoldenOutput("hsa_async_mem_copy", kGoldenOutputAsyncCopy, + &golden_kernel_info); + } +}; + +// Test:1 The profiler does not intercept any HSA calls in this app, so no +// kernel records are collected by default. Both the golden and the current +// vectors are expected to be empty +TEST_F(HSATest, + WhenRunningProfilerWithAppThenKernelNumbersMatchWithGoldenOutput) { + std::vector current_kernel_info; + + GetKernelInfoForRunningApplication(&current_kernel_info); + + EXPECT_EQ(current_kernel_info.size(), 0); + EXPECT_EQ(golden_kernel_info.size(), 0); + + EXPECT_EQ(golden_kernel_info.size(), current_kernel_info.size()); +} diff --git a/tests/featuretests/profiler/gtests/apps/mpi/mpi_run.sh b/tests/featuretests/profiler/gtests/apps/mpi/mpi_run.sh new file mode 100755 index 00000000..7a2ff0b6 --- /dev/null +++ b/tests/featuretests/profiler/gtests/apps/mpi/mpi_run.sh @@ -0,0 +1,37 @@ +: ' +-------------------------------------------------------------------------- +Running as root is *strongly* discouraged as any mistake (e.g., in +defining TMPDIR) or bug can result in catastrophic damage to the OS +file system, leaving your system in an unusable state. + +We strongly suggest that you run mpirun as a non-root user.
+ +You can override this protection by adding the --allow-run-as-root option +to the cmd line or by setting two environment variables in the following way: +the variable OMPI_ALLOW_RUN_AS_ROOT=1 to indicate the desire to override this +protection, and OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 to confirm the choice and +add one more layer of certainty that you want to do so. +We reiterate our advice against doing so - please proceed at your own risk. +-------------------------------------------------------------------------- +' + +MPIRUN=mpirun +if ! command -v $MPIRUN &> /dev/null +then + echo "$MPIRUN could not be found. checking libs" + if [ -f "/usr/lib64/openmpi/bin/mpirun" ] + then + MPIRUN=/usr/lib64/openmpi/bin/mpirun + else + if [ -f "/usr/lib64/mpi/gcc/openmpi2/bin/mpirun" ] + then + MPIRUN=/usr/lib64/mpi/gcc/openmpi2/bin/mpirun + else + echo "$MPIRUN could not be found. exiting" + exit 1 + fi + fi +fi +SCRIPT=$(realpath "$0") +SCRIPTPATH=$(dirname "$SCRIPT") +"$MPIRUN" --allow-run-as-root -np 2 "$SCRIPTPATH/mpi_vectoradd" mdrun -pin on -nsteps 10 -resetstep 9 -ntomp 64 -noconfout -nb gpu -bonded gpu -pme gpu -v -gpu_id 0 diff --git a/tests/featuretests/profiler/gtests/apps/mpi/vector_add.cpp b/tests/featuretests/profiler/gtests/apps/mpi/vector_add.cpp new file mode 100755 index 00000000..f0d406a6 --- /dev/null +++ b/tests/featuretests/profiler/gtests/apps/mpi/vector_add.cpp @@ -0,0 +1,132 @@ +/* +Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved.
+ +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
+*/ +#include + +#include +#include + +#include + +#include "hip/hip_runtime.h" + +#define HIP_RC(call) \ + do { \ + hipError_t err = call; \ + if (hipSuccess != err) { \ + printf("HIP ERROR (code = %d, %s) at %s:%d\n", err, \ + hipGetErrorString(err), __FILE__, __LINE__); \ + assert(0); \ + exit(1); \ + } \ + } while (0) + +#define HIP_KL(call) \ + do { \ + call; \ + hipError_t err = hipGetLastError(); \ + if (hipSuccess != err) { \ + printf("HIP ERROR (code = %d, %s) at %s:%d\n", err, \ + hipGetErrorString(err), __FILE__, __LINE__); \ + assert(0); \ + exit(1); \ + } \ + } while (0) + +// HIP kernel to add elements of two arrays +__global__ void add(int n, float *x, float *y) { + int index = blockIdx.x * blockDim.x + threadIdx.x; + int stride = blockDim.x * gridDim.x; + for (int i = index; i < n; i += stride) + y[i] = x[i] + y[i]; +} + +int main(int argc, char *argv[]) { + int N = 1 << 20; + float *x = new float[N]; + float *y = new float[N]; + float *d_x; + float *d_y; + + int myId; + int devId; + int numRank; + int deviceCount; + + // init MPI + MPI_Init(&argc, &argv); + MPI_Comm_rank(MPI_COMM_WORLD, &myId); + MPI_Comm_size(MPI_COMM_WORLD, &numRank); + HIP_RC(hipGetDeviceCount(&deviceCount)); + + std::cout << "device count and number of ranks: " << deviceCount << ", " << numRank + << std::endl; + + // Map each rank to a device: device ID = rank ID mod deviceCount (e.g. + // with 4 devices per node, ranks 0-3 use devices 0-3) + devId = myId % deviceCount; + // set the device ID + HIP_RC(hipSetDevice(devId)); + + printf("Rank Id: %d | Device Id : %d | Num Devices: %d\n", myId, devId, + deviceCount); + fflush(stdout); + + // Allocate Unified Memory -- accessible from CPU or GPU + HIP_RC(hipMallocManaged(&d_x, N * sizeof(float))); + HIP_RC(hipMallocManaged(&d_y, N * sizeof(float))); + + // initialize x and y arrays on the host + for (int i = 0; i < N; i++) { + x[i] = 1.0f; + y[i] = 2.0f; + } + + HIP_RC(hipMemcpy(d_x, x, N * sizeof(float), hipMemcpyHostToDevice)); + HIP_RC(hipMemcpy(d_y, y, N * sizeof(float),
hipMemcpyHostToDevice)); + + // Launch kernel on 1M elements on the GPU + int blockSize = 256; + int numBlocks = (N + blockSize - 1) / blockSize; + HIP_KL(hipLaunchKernelGGL(add, numBlocks, blockSize, 0, 0, N, d_x, d_y)); + + // Wait for GPU to finish before accessing on host + HIP_RC(hipDeviceSynchronize()); + + HIP_RC(hipMemcpy(x, d_x, N * sizeof(float), hipMemcpyDeviceToHost)); + HIP_RC(hipMemcpy(y, d_y, N * sizeof(float), hipMemcpyDeviceToHost)); + + // Check for errors (all values should be 3.0f) + float maxError = 0.0f; + for (int i = 0; i < N; i++) + maxError = fmax(maxError, fabs(y[i] - 3.0f)); + printf("Max error: %f\n", maxError); + + // Free memory + HIP_RC(hipFree(d_x)); + HIP_RC(hipFree(d_y)); + + delete[] x; + delete[] y; + + MPI_Finalize(); + return 0; +} diff --git a/tests/featuretests/profiler/gtests/apps/mpi/vector_add_gtest.cpp b/tests/featuretests/profiler/gtests/apps/mpi/vector_add_gtest.cpp new file mode 100755 index 00000000..14073ef3 --- /dev/null +++ b/tests/featuretests/profiler/gtests/apps/mpi/vector_add_gtest.cpp @@ -0,0 +1,94 @@ +/* +Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. +*/ +#include +#include + +#include "gtests/apps/profiler_gtest.h" +#include "utils/test_utils.h" + +constexpr auto kGoldenOutputMpi = "mpi_vectoradd_golden_traces.txt"; + +class MPITest : public ProfilerTest { + protected: + void ProcessMPIApplication(const char *app_name); + void ExecuteAndParseApplication(std::stringstream &ss); + + void SetUp() { + /* To suppress "No protocol found" prints */ + setenv("HWLOC_COMPONENTS", "-gl", 1); + + // run as standalone test + ProfilerTest::SetUp("mpi_vectoradd"); + + // run mpirun script + // ProcessMPIApplication("mpi_run.sh"); + } +}; + +void MPITest::ProcessMPIApplication(const char *app_name) { + std::string app_path = + GetRunningPath("tests/featuretests/profiler/runFeatureTests"); + std::string lib_path = app_path; + + std::stringstream hsa_tools_lib_path; + + hsa_tools_lib_path << app_path << "librocprofiler_tool.so"; + setenv("LD_PRELOAD", hsa_tools_lib_path.str().c_str(), true); + + std::stringstream os; + os << app_path << "tests/featuretests/profiler/gtests/apps/" << app_name; + ExecuteAndParseApplication(os); +} + +void MPITest::ExecuteAndParseApplication(std::stringstream &ss) { + FILE *handle = popen(ss.str().c_str(), "r"); + ASSERT_NE(handle, nullptr); + char *ln{NULL}; + std::string temp{""}; + size_t len{0}; + + while (getline(&ln, &len, handle) != -1) { + temp = temp + std::string(ln); + } + + free(ln); + size_t pos{0}; + std::string delimiter{"\n"}; + while ((pos = temp.find(delimiter)) != std::string::npos) { + output_lines.push_back(temp.substr(0, pos)); + temp.erase(0, pos + delimiter.length()); + } + + pclose(handle); +} + +// Test:1 Checks that the current profiler output contains at least one +// kernel record +TEST_F(MPITest, 
WhenRunningProfilerWithAppThenKernelNumbersMatchWithGoldenOutput) { + std::vector current_kernel_info; + + GetKernelInfoForRunningApplication(&current_kernel_info); + ASSERT_TRUE(current_kernel_info.size()); + + EXPECT_GT(current_kernel_info.size(), 0); +} diff --git a/tests/featuretests/profiler/gtests/apps/openmp/hello_world.cpp b/tests/featuretests/profiler/gtests/apps/openmp/hello_world.cpp new file mode 100755 index 00000000..d1e7ec62 --- /dev/null +++ b/tests/featuretests/profiler/gtests/apps/openmp/hello_world.cpp @@ -0,0 +1,91 @@ +/* +Copyright (c) 2020-present Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE.
+*/ + +// OpenMP "Hello World" program: each OpenMP thread launches a HIP +// kernel that records its thread ID in a device buffer + +// OpenMP header +#include + +#include +#include + +// HIP header +#include + +#define NUM_THREADS 16 +#define CHECK(cmd) \ + { \ + hipError_t error = cmd; \ + if (error != hipSuccess) { \ + fprintf(stderr, "error: '%s'(%d) at %s:%d\n", hipGetErrorString(error), \ + error, __FILE__, __LINE__); \ + exit(EXIT_FAILURE); \ + } \ + } + +__global__ void hip_helloworld(unsigned omp_id, int *A_d) { + // Note: the printf command will only work if printf is enabled in your build. + // printf("Hello World... from HIP thread = %u\n", omp_id); + + A_d[omp_id] = omp_id; +} + +int main(int argc, char *argv[]) { + int *A_h, *A_d; + size_t Nbytes = NUM_THREADS * sizeof(int); + + hipDeviceProp_t props; + CHECK(hipGetDeviceProperties(&props, 0 /*deviceID*/)); + // printf("info: running on device %s\n", props.name); + + A_h = reinterpret_cast(malloc(Nbytes)); + CHECK(hipMalloc(&A_d, Nbytes)); + for (int i = 0; i < NUM_THREADS; i++) { + A_h[i] = 0; + } + CHECK(hipMemcpy(A_d, A_h, Nbytes, hipMemcpyHostToDevice)); + +// Beginning of parallel region +#pragma omp parallel num_threads(NUM_THREADS) + { + // fprintf(stderr, "Hello World...
from OMP thread = %d\n", + // omp_get_thread_num()); + + hipLaunchKernelGGL(hip_helloworld, dim3(1), dim3(1), 0, 0, + omp_get_thread_num(), A_d); + } + // Ending of parallel region + + hipStreamSynchronize(0); + CHECK(hipMemcpy(A_h, A_d, Nbytes, hipMemcpyDeviceToHost)); + // printf("Device Results:\n"); + for (int i = 0; i < NUM_THREADS; i++) { + // printf(" A_d[%d] = %d\n", i, A_h[i]); + } + + printf("PASSED!\n"); + + free(A_h); + CHECK(hipFree(A_d)); + return 0; +} diff --git a/tests/featuretests/profiler/gtests/apps/openmp/hello_world_gtest.cpp b/tests/featuretests/profiler/gtests/apps/openmp/hello_world_gtest.cpp new file mode 100755 index 00000000..8fb38749 --- /dev/null +++ b/tests/featuretests/profiler/gtests/apps/openmp/hello_world_gtest.cpp @@ -0,0 +1,94 @@ +/* +Copyright (c) 2020-present Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
+*/ + +#include + +#include "gtests/apps/profiler_gtest.h" + +constexpr auto kGoldenOutputOpenMP = "openmp_helloworld_golden_traces.txt"; + +class OpenMPTest : public ProfilerTest { + protected: + std::vector golden_kernel_info; + void SetUp() { + ProfilerTest::SetUp("openmp_helloworld"); + GetKernelInfoForGoldenOutput("openmp_helloworld", kGoldenOutputOpenMP, + &golden_kernel_info); + } +}; + +// Test:1 Compares total num of kernel-names in golden output against current +// profiler output +TEST_F(OpenMPTest, + WhenRunningProfilerWithAppThenKernelNumbersMatchWithGoldenOutput) { + std::vector current_kernel_info; + + GetKernelInfoForRunningApplication(&current_kernel_info); + ASSERT_TRUE(current_kernel_info.size()); + + EXPECT_EQ(golden_kernel_info.size(), current_kernel_info.size()); +} + +// Test:2 Compares order of kernel-names in golden output against current +// profiler output +TEST_F(OpenMPTest, + WhenRunningProfilerWithAppThenKernelNamessMatchWithGoldenOutput) { + std::vector current_kernel_info; + + GetKernelInfoForRunningApplication(&current_kernel_info); + ASSERT_TRUE(current_kernel_info.size() >= 2 && golden_kernel_info.size() >= 2); + + EXPECT_EQ(golden_kernel_info[0].kernel_name, + current_kernel_info[0].kernel_name); + EXPECT_EQ(golden_kernel_info[1].kernel_name, + current_kernel_info[1].kernel_name); +} + +// Test:3 Checks that the current profiler run reports at least one kernel +// record +TEST_F(OpenMPTest, + WhenRunningProfilerWithAppThenKernelDurationShouldBePositive) { + // kernel info in current profiler run + std::vector current_kernel_info; + + GetKernelInfoForRunningApplication(&current_kernel_info); + ASSERT_TRUE(current_kernel_info.size()); + + EXPECT_GT(current_kernel_info.size(), 0); +} + +// Test:4 Checks that end-time is greater than start-time in current +// profiler output +TEST_F(OpenMPTest, + WhenRunningProfilerWithAppThenEndTimeIsGreaterThenStartTime) { + // kernel info in current profiler run + std::vector current_kernel_info; + + 
GetKernelInfoForRunningApplication(&current_kernel_info); + ASSERT_TRUE(current_kernel_info.size()); + + for (auto &itr : current_kernel_info) { + if (!(itr.start_time).empty() && !(itr.end_time).empty()) { + EXPECT_GT(itr.end_time, itr.start_time); + } + } +} diff --git a/tests/featuretests/profiler/gtests/apps/profiler_gtest.cpp b/tests/featuretests/profiler/gtests/apps/profiler_gtest.cpp new file mode 100644 index 00000000..6a8496c0 --- /dev/null +++ b/tests/featuretests/profiler/gtests/apps/profiler_gtest.cpp @@ -0,0 +1,187 @@ +/* +Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. +*/ +#include "gtests/apps/profiler_gtest.h" +#include "utils/test_utils.h" + +/** + * Sets up the application environment (COUNTERS_PATH and LD_PRELOAD) and runs the application.
+ */ +void ApplicationParser::SetApplicationEnv(const char* app_name) { + std::string app_path = GetRunningPath("tests/featuretests/profiler/runFeatureTests"); + + std::stringstream counter_path; + counter_path << app_path << "tests/featuretests/profiler/gtests/apps/goldentraces/input.txt"; + setenv("COUNTERS_PATH", counter_path.str().c_str(), true); + + std::stringstream hsa_tools_lib_path; + hsa_tools_lib_path << app_path << "librocprofiler_tool.so"; + setenv("LD_PRELOAD", hsa_tools_lib_path.str().c_str(), true); + + std::stringstream os; + os << app_path << "tests/featuretests/profiler/gtests/apps/" << app_name; + + ProcessApplication(os); +} + +/** + * Parses kernel info from the profiler output of the current application + * and saves it in a vector (an entry is pushed for each parsed field). + */ +void ApplicationParser::GetKernelInfoForRunningApplication( + std::vector* kernel_info_output) { + KernelInfo kinfo; + for (const std::string& line : output_lines) { + if (std::regex_match(line, std::regex("(dispatch)(.*)"))) { + int spos = line.find("["); + int epos = line.find("]", spos); + std::string sub = line.substr(spos + 1, epos - spos - 1); + kinfo.dispatch_id = sub; + kernel_info_output->push_back(kinfo); + + // Kernel-Name + size_t found = line.find("kernel-name"); + if (found != std::string::npos) { + int spos = found; + int epos = line.find(")", spos); + int length = std::string("kernel-name").length(); + std::string sub = line.substr(spos + length + 1, epos - spos - length - 1); + + kinfo.kernel_name = sub; + kernel_info_output->push_back(kinfo); + } + // Start-Time + found = line.find("start_time"); + if (found != std::string::npos) { + int spos = found; + int epos = line.find(",", spos); + int length = std::string("start_time").length(); + std::string sub = line.substr(spos + length + 1, epos - spos - length - 1); + kinfo.start_time = sub; + kernel_info_output->push_back(kinfo); + } + // End-Time + found = line.find("end_time"); + if (found != std::string::npos) { + int spos = 
line.find(",", found);
+ int epos = line.find(")", spos); + std::string sub = line.substr(spos + 1, epos - spos - 1); + kinfo.end_time = sub; + kernel_info_output->push_back(kinfo); + } + } + } +} + +/** + * Parses kernel info from a pre-saved golden output file + * and saves it in a vector. + */ +void ApplicationParser::GetKernelInfoForGoldenOutput(const char* app_name, std::string file_name, + std::vector<KernelInfo>* kernel_info_output) { + std::string entry; + std::string path = GetRunningPath("runFeatureTests"); + entry = path.append("gtests/apps/goldentraces/") + file_name; + // parse kernel info fields for golden output + ParseKernelInfoFields(entry, kernel_info_output); +} + +/** + * Runs a given application and saves the profiler output. + * These output lines can later be parsed for kernel information, + * e.g. kernel_names + */ +void ApplicationParser::ProcessApplication(std::stringstream& ss) { + FILE* handle = popen(ss.str().c_str(), "r"); + ASSERT_NE(handle, nullptr); + + char* ln{NULL}; + std::string temp{""}; + size_t len{0}; + + while (getline(&ln, &len, handle) != -1) { + temp += ln; + } + + free(ln); + size_t pos{0}; + std::string delimiter{"\n"}; + while ((pos = temp.find(delimiter)) != std::string::npos) { + output_lines.push_back(temp.substr(0, pos)); + temp.erase(0, pos + delimiter.length()); + } + + pclose(handle); +} + +/** + * Parses kernel info from a golden output file + * and saves it in a vector.
+ */ +void ApplicationParser::ParseKernelInfoFields(const std::string& s, + std::vector<KernelInfo>* kernel_info_output) { + std::string line; + KernelInfo kinfo; + + std::ifstream golden_file(s); + while (!golden_file.eof()) { + getline(golden_file, line); + if (std::regex_match(line, std::regex("(dispatch)(.*)"))) { + int spos = line.find("["); + int epos = line.find("]", spos); + std::string sub = line.substr(spos + 1, epos - spos - 1); + kinfo.dispatch_id = sub; + kernel_info_output->push_back(kinfo); + + // Kernel-Name + size_t found = line.find("kernel-name"); + if (found != std::string::npos) { + int spos = found; + int epos = line.find(")", spos); + int length = std::string("kernel-name").length(); + std::string sub = line.substr(spos + length + 1, epos - spos - length - 1); + + kinfo.kernel_name = sub; + kernel_info_output->push_back(kinfo); + } + // Start-Time + found = line.find("start_time"); + if (found != std::string::npos) { + int spos = found; + int epos = line.find(",", spos); + int length = std::string("start_time").length(); + std::string sub = line.substr(spos + length + 1, epos - spos - length - 1); + kinfo.start_time = sub; + kernel_info_output->push_back(kinfo); + } + // End-Time + found = line.find("end_time"); + if (found != std::string::npos) { + int spos = line.find(",", found); + int epos = line.find(")", spos); + std::string sub = line.substr(spos + 1, epos - spos - 1); + kinfo.end_time = sub; + kernel_info_output->push_back(kinfo); + } + } + } + golden_file.close(); +} diff --git a/tests/featuretests/profiler/gtests/apps/profiler_gtest.h b/tests/featuretests/profiler/gtests/apps/profiler_gtest.h new file mode 100644 index 00000000..c6e123c0 --- /dev/null +++ b/tests/featuretests/profiler/gtests/apps/profiler_gtest.h @@ -0,0 +1,107 @@ +/* +Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved.
+ +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. +*/ +#ifndef TESTS_FEATURETESTS_PROFILER_GTESTS_APPS_PROFILER_GTEST_H_ +#define TESTS_FEATURETESTS_PROFILER_GTESTS_APPS_PROFILER_GTEST_H_ + +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +/* --------------------------------------------------------------------------*/ +/** + * @Synopsis Implementation of a Parser class for Profiler output + * Parses pre-saved golden output for kernel info and saves it in a vector + * Executes an application (passed as param: app_name) and saves parsed kernel info + * in a vector.
+ * Subsequent tests can use this to parse different applications. + */ +/* --------------------------------------------------------------------------*/ + +class ApplicationParser : public ::testing::Test { + protected: + virtual void SetUp(const char *app_name) { SetApplicationEnv(app_name); } + virtual void TearDown() {} + //!< Can be extended with other kernel info fields, e.g. Agent-Name. + struct KernelInfo { + std::string dispatch_id; + std::string gpu_id; + std::string queue_id; + std::string queue_index; + std::string pid; + std::string tid; + std::string obj; + std::string kernel_name; + std::string start_time; + std::string end_time; + }; + + //!< Saves lines of profiler output + std::vector<std::string> output_lines; + + public: + //!< Sets the application environment (COUNTERS_PATH and LD_PRELOAD). + void SetApplicationEnv(const char *app_name); + + //!< Parses kernel-info from a pre-saved golden output file + // and saves it in a vector. + void GetKernelInfoForGoldenOutput( + const char *app_name, std::string filename, + std::vector<KernelInfo> *kernel_info_output); + + //!< Parses kernel-info after running the profiler against the current application + // and saves it in a vector. + void GetKernelInfoForRunningApplication( + std::vector<KernelInfo> *kernel_info_output); + + private: + //!< Runs a given application and saves the profiler output.
+ // These output lines can later be parsed for kernel information, + // e.g. kernel_names + void ProcessApplication(std::stringstream &ss); + + //!< Parses kernel info fields from the given input, + // e.g. kernel_names, kernel_duration + void ParseKernelInfoFields(const std::string &s, + std::vector<KernelInfo> *kernel_info_output); +}; + +/* --------------------------------------------------------------------------*/ +/** + * @Synopsis Implementation of a ProfilerTest + * Subsequent tests can use this to parse different applications. + */ +/* --------------------------------------------------------------------------*/ + +class ProfilerTest : public ApplicationParser { + protected: + virtual void SetUp(const char *app_name) { + ApplicationParser::SetUp(app_name); + } +}; +#endif // TESTS_FEATURETESTS_PROFILER_GTESTS_APPS_PROFILER_GTEST_H_ diff --git a/tests/featuretests/profiler/gtests/functional/loadunload_gtest.cpp b/tests/featuretests/profiler/gtests/functional/loadunload_gtest.cpp new file mode 100644 index 00000000..f5678206 --- /dev/null +++ b/tests/featuretests/profiler/gtests/functional/loadunload_gtest.cpp @@ -0,0 +1,67 @@ +/* Copyright (c) 2022 Advanced Micro Devices, Inc. + + Permission is hereby granted, free of charge, to any person obtaining a copy + of this software and associated documentation files (the "Software"), to deal + in the Software without restriction, including without limitation the rights + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + copies of the Software, and to permit persons to whom the Software is + furnished to do so, subject to the following conditions: + + The above copyright notice and this permission notice shall be included in + all copies or substantial portions of the Software. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + THE SOFTWARE. */ +#include +#include + +// Run 2 iterations of {hsa_init(); hsa_iterate_agents(); hsa_shut_down()} to +// test that the profiler tool is correctly unloaded after the 1st iteration and +// then reloaded for the 2nd iteration. +class LoadUnloadTest : public ::testing::Test { + protected: + virtual void SetUp() { + // start basic app + hsa_init(); + } + + virtual void TearDown() { + // stop basic app and unset tools lib + hsa_shut_down(); + } +}; + +TEST_F(LoadUnloadTest, WhenLoadingFirstTimeThenToolLoadsUnloadsSuccessfully) { + // Tool loaded in the setup + // Tool unloaded in teardown + + // iterate over all HSA agents + hsa_status_t status = hsa_iterate_agents( + [](hsa_agent_t agent, void *) { + hsa_device_type_t type; + return hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &type); + }, + nullptr); + + EXPECT_EQ(HSA_STATUS_SUCCESS, status); +} + +TEST_F(LoadUnloadTest, WhenLoadingSecondTimeThenToolLoadsUnloadsSuccessfully) { + // Tool loaded in the setup + // Tool unloaded in teardown + + // iterate over all HSA agents + hsa_status_t status = hsa_iterate_agents( + [](hsa_agent_t agent, void *) { + hsa_device_type_t type; + return hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &type); + }, + nullptr); + + EXPECT_EQ(HSA_STATUS_SUCCESS, status); +} diff --git a/tests/featuretests/profiler/gtests/functional/multithread_gtest.cpp b/tests/featuretests/profiler/gtests/functional/multithread_gtest.cpp new file mode 100644 index 00000000..96a273b9 --- /dev/null +++ b/tests/featuretests/profiler/gtests/functional/multithread_gtest.cpp @@ -0,0 +1,152 @@ +/****************************************************************************** +Copyright (c) 2018 Advanced Micro Devices, Inc. All rights reserved.
+ +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. +*******************************************************************************/ +#include +#include +#include + +#include +#include +#include +#include + +#include "src/utils/helper.h" +#include "utils/test_utils.h" + +/** \mainpage ROC Profiler API Test + * + * \section introduction Introduction + * + * The goal of this test is to call the ROCmTools APIs from multiple threads + * and verify that each API call succeeds and that multiple contexts are + * collected and verified. + * + * An empty kernel is launched on multiple threads, and the profiling context is + * collected and verified for each thread.
+ */ + +// empty kernel +__global__ void kernel() {} + +class ProfilerAPITest : public ::testing::Test { + protected: + // function to check profiler API status + static void CheckApi(rocprofiler_status_t status) { + ASSERT_EQ(status, ROCPROFILER_STATUS_SUCCESS); + }; + + // launches an empty kernel in profiler context + static void KernelLaunch() { + // run empty kernel + kernel<<<1, 1>>>(); + hipDeviceSynchronize(); + } + + // callback function to dump profiler data + static void FlushCallback(const rocprofiler_record_header_t* record, + const rocprofiler_record_header_t* end_record, + int64_t session_id) { + while (record < end_record) { + if (!record) break; + if (record->kind == ROCPROFILER_PROFILER_RECORD) { + const rocprofiler_record_profiler_t* profiler_record = + reinterpret_cast<const rocprofiler_record_profiler_t*>(record); + size_t name_length; + CheckApi(rocprofiler_query_kernel_info_size( + ROCPROFILER_KERNEL_NAME, profiler_record->kernel_id, &name_length)); + const char* kernel_name_c = + static_cast<const char*>(malloc(name_length * sizeof(char))); + CheckApi(rocprofiler_query_kernel_info(ROCPROFILER_KERNEL_NAME, + profiler_record->kernel_id, + &kernel_name_c)); + int gpu_index = profiler_record->gpu_id.handle; + uint64_t start_time = profiler_record->timestamps.begin.value; + uint64_t end_time = profiler_record->timestamps.end.value; + + // Check that end_time > start_time for each kernel + ASSERT_GT(end_time, start_time); + + // Check that name_length is positive for each kernel + ASSERT_GT(name_length, 0); + + // Check kernel name + ASSERT_EQ( + rocmtools::truncate_name(rocmtools::cxx_demangle(kernel_name_c)), + "kernel"); + } + CheckApi(rocprofiler_next_record(record, &record)); + } + } +}; + +TEST_F(ProfilerAPITest, WhenRunningMultipleThreadsProfilerAPIsWorkFine) { + // Get the number of CPU cores + int num_cpu_cores = GetNumberOfCores(); + + // create as many threads as there are cores in the system + std::vector<std::thread> threads(num_cpu_cores); + + // initialize profiler by creating the rocmtools object
CheckApi(rocprofiler_initialize()); + + // Counter Collection with timestamps + rocprofiler_session_id_t session_id; + std::vector<const char*> counters; + counters.emplace_back("SQ_WAVES"); + CheckApi(rocprofiler_create_session(ROCPROFILER_APPLICATION_REPLAY_MODE, + &session_id)); + CheckApi(rocprofiler_add_session_mode(session_id, ROCPROFILER_ASYNC_FLUSH, + ROCPROFILER_COUNTERS_COLLECTION)); + CheckApi(rocprofiler_set_session_async_callback( + session_id, ROCPROFILER_COUNTERS_COLLECTION, + rocprofiler_session_buffer_size_t{0x8000}, FlushCallback, + rocprofiler_flush_buffer_interval_t{0})); + rocprofiler_filter_t filter{ROCPROFILER_FILTER_PROFILER_COUNTER_NAMES, + &counters[0], + rocprofiler_filter_data_count_t{counters.size()}}; + CheckApi(rocprofiler_session_set_filters(ROCPROFILER_COUNTERS_COLLECTION, + &filter, rocprofiler_filters_count_t{1}, + session_id)); + + // activate profiler session + CheckApi(rocprofiler_start_session(session_id)); + + // launch kernel on each thread + for (int n = 0; n < num_cpu_cores; ++n) { + threads[n] = std::thread(KernelLaunch); + } + + // wait for all kernel launches to complete + for (int n = 0; n < num_cpu_cores; ++n) { + threads[n].join(); + } + // dump profiler data + CheckApi(rocprofiler_flush_data(session_id)); + + // deactivate session + CheckApi(rocprofiler_terminate_session(session_id)); + + // destroy session + CheckApi(rocprofiler_destroy_session(session_id)); + + // finalize profiler by destroying the rocmtools object + CheckApi(rocprofiler_finalize()); +} diff --git a/tests/featuretests/profiler/gtests/gtests_main.cpp b/tests/featuretests/profiler/gtests/gtests_main.cpp new file mode 100644 index 00000000..7017cb06 --- /dev/null +++ b/tests/featuretests/profiler/gtests/gtests_main.cpp @@ -0,0 +1,10 @@ +#include + +// Entry Point for Gtests Infra + +int main(int argc, char **argv) { + testing::InitGoogleTest(&argc, argv); + testing::FLAGS_gtest_death_test_style = "threadsafe"; + //testing::GTEST_FLAG(filter)="-HSATest.*"; + return
RUN_ALL_TESTS(); +} diff --git a/tests/featuretests/profiler/profiler_gtest.cpp b/tests/featuretests/profiler/profiler_gtest.cpp index 7f942359..8a033c8d 100644 --- a/tests/featuretests/profiler/profiler_gtest.cpp +++ b/tests/featuretests/profiler/profiler_gtest.cpp @@ -25,7 +25,7 @@ THE SOFTWARE. #include #include #include -#include +#include "rocprofiler.h" #include #include #include @@ -34,23 +34,53 @@ THE SOFTWARE. #include "utils/test_utils.h" #include "utils/csv_parser.h" + +std::string running_path; +std::string lib_path; +std::string golden_trace_path; +std::string test_app_path; +std::string metrics_path; +std::string binary_path; + +static void init_test_path() { + if (is_installed_path()) { + running_path = "share/rocprofiler/tests/runFeatureTests"; + lib_path = "lib/librocprofiler_tool.so"; + golden_trace_path = "share/rocprofiler/tests/featuretests/profiler/apps/goldentraces/"; + test_app_path = "share/rocprofiler/tests/featuretests/profiler/apps/"; + metrics_path = "lib/rocprofiler/gfx_metrics.xml"; + binary_path = "bin/rocprofv2"; + } else { + running_path = "tests/featuretests/profiler/runFeatureTests"; + lib_path = "librocprofiler_tool.so"; + golden_trace_path = "tests/featuretests/profiler/apps/goldentraces/"; + test_app_path = "tests/featuretests/profiler/apps/"; + metrics_path = "gfx_metrics.xml"; + binary_path = "rocprofv2"; + } +} + /** * Sets application enviornment by seting HSA_TOOLS_LIB. 
*/ void ApplicationParser::SetApplicationEnv(const char* app_name) { - std::string app_path = GetRunningPath("tests/featuretests/profiler/runFeatureTests"); + std::string app_path; + + // set global path + init_test_path(); + + app_path = GetRunningPath(running_path); std::stringstream counter_path; - std::stringstream metrics_path; - counter_path << app_path << "tests/featuretests/profiler/apps/goldentraces/input.txt"; + counter_path << app_path << golden_trace_path << "input.txt"; setenv("COUNTERS_PATH", counter_path.str().c_str(), true); std::stringstream hsa_tools_lib_path; - hsa_tools_lib_path << app_path << "librocprofiler_tool.so"; + hsa_tools_lib_path << app_path << lib_path; setenv("LD_PRELOAD", hsa_tools_lib_path.str().c_str(), true); std::stringstream os; - os << app_path << "tests/featuretests/profiler/apps/" << app_name; + os << app_path << test_app_path << app_name; ProcessApplication(os); } @@ -111,9 +141,9 @@ void ApplicationParser::GetKernelInfoForRunningApplication( void ApplicationParser::GetKernelInfoForGoldenOutput(const char* app_name, std::string file_name, std::vector* kernel_info_output) { std::string entry; - std::string path = GetRunningPath("runFeatureTests"); - entry = path.append("apps/goldentraces/") + file_name; - // parse kernel info fields for golden output + std::string path = GetRunningPath(running_path); + entry = path.append(golden_trace_path) + file_name; + // parse kernel info fields for golden output ParseKernelInfoFields(entry, kernel_info_output); } @@ -459,7 +489,7 @@ class MPITest : public ProfilerTest { }; void MPITest::ProcessMPIApplication(const char* app_name) { - std::string app_path = GetRunningPath("tests/featuretests/profiler/runFeatureTests"); + std::string app_path = GetRunningPath(running_path); std::string lib_path = app_path; std::stringstream hsa_tools_lib_path; @@ -658,11 +688,6 @@ class ATTCollection : public ::testing::Test { hipDeviceProp_t devProp; hipGetDeviceProperties(&devProp, 0); - // std::cout << 
" System minor " << devProp.minor << std::endl; - // std::cout << " System major " << devProp.major << std::endl; - // std::cout << " agent prop name " << devProp.name << std::endl; - // std::cout << "hip Device prop succeeded " << std::endl; - int i; int errors; @@ -835,10 +860,14 @@ class ProfilerAPITest : public ::testing::Test { }; TEST_F(ProfilerAPITest, WhenRunningMultipleThreadsProfilerAPIsWorkFine) { - std::string app_path = GetRunningPath("tests/featuretests/profiler/runFeatureTests"); - std::stringstream metrics_path; - metrics_path << app_path << "gfx_metrics.xml"; - setenv("ROCPROFILER_METRICS_PATH", metrics_path.str().c_str(), true); + // set global path + init_test_path(); + + std::string app_path = GetRunningPath(running_path); + std::stringstream gfx_path; + gfx_path << app_path << metrics_path; + + setenv("ROCPROFILER_METRICS_PATH", gfx_path.str().c_str(), true); // Get the system cores int num_cpu_cores = GetNumberOfCores(); @@ -963,11 +992,6 @@ class ProfilerSPMTest : public ::testing::Test { hipDeviceProp_t devProp; hipGetDeviceProperties(&devProp, 0); - // std::cout << " System minor " << devProp.minor << std::endl; - // std::cout << " System major " << devProp.major << std::endl; - // std::cout << " agent prop name " << devProp.name << std::endl; - // std::cout << "hip Device prop succeeded " << std::endl; - int i; int errors; @@ -1100,7 +1124,7 @@ class MTBinaryTest : public ::testing::Test { } } } - + // clear entries counter_map.clear(); @@ -1110,7 +1134,7 @@ class MTBinaryTest : public ::testing::Test { return 0; } - return 0; // Fix CSV parser, until return 0 + return 0; // Fix CSV parser, until return 0 } std::string ReadProfilerBuffer(const char* cmd) { @@ -1129,11 +1153,12 @@ class MTBinaryTest : public ::testing::Test { std::string InitCounterTest() { std::string input_path; - std::string rocprofv2_path = GetRunningPath("tests/featuretests/profiler/runFeatureTests"); + std::string app_path = GetRunningPath(running_path); 
std::stringstream command; - input_path = rocprofv2_path + "tests/featuretests/profiler/apps/"; - command << rocprofv2_path + "./rocprofv2 -i " << input_path + "basic_metrics.txt " - << input_path + "multithreaded_testapp"; + input_path = app_path + golden_trace_path; + command << app_path + binary_path + " -i " << input_path + "basic_metrics.txt " + << app_path + test_app_path + "multithreaded_testapp"; + std::string result = ReadProfilerBuffer(command.str().c_str()); return result; } @@ -1182,7 +1207,7 @@ class ProfilerMQTest : public ::testing::Test { if (dispatch_counter == dispatch_count) { return 0; } - return 0; //Fix CSV parser, until return 0 + return 0; // Fix CSV parser, until return 0 } std::string ReadProfilerBuffer(const char* cmd) { @@ -1200,20 +1225,21 @@ class ProfilerMQTest : public ::testing::Test { } std::string InitMultiQueueTest() { - std::string rocprofv2_path = GetRunningPath("tests/featuretests/profiler/runFeatureTests"); + std::string app_path = GetRunningPath(running_path); std::string input_path; - input_path = rocprofv2_path + "tests/featuretests/profiler/apps/"; + input_path = app_path + golden_trace_path; std::stringstream command; - command << rocprofv2_path + "./rocprofv2 -i " << input_path + "input.txt " - << input_path + "multiqueue_testapp"; + command << app_path + binary_path + " -i " << input_path + "basic_metrics.txt " + << app_path + test_app_path + "multiqueue_testapp"; + std::string result = ReadProfilerBuffer(command.str().c_str()); return result; } }; -TEST_F(ProfilerMQTest, WhenRunningMultiProcessTestItPasses) { +TEST_F(ProfilerMQTest, DISABLED_WhenRunningMultiProcessTestItPasses) { int test_status = -1; std::string profiler_output; @@ -1234,8 +1260,8 @@ TEST_F(ProfilerMQTest, WhenRunningMultiProcessTestItPasses) { void KernelLaunch() { // run empty kernel - //kernel<<<1, 1>>>(); //TODO: Check the hang - //hipDeviceSynchronize(); + // kernel<<<1, 1>>>(); //TODO: Check the
hang + // hipDeviceSynchronize(); } TEST(ProfilerMPTest, WhenRunningMultiProcessTestItPasses) { diff --git a/tests/featuretests/profiler/utils/csv_parser.cpp b/tests/featuretests/profiler/utils/csv_parser.cpp new file mode 100644 index 00000000..133342c7 --- /dev/null +++ b/tests/featuretests/profiler/utils/csv_parser.cpp @@ -0,0 +1,170 @@ +/* +Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
+*/ + +#include "utils/csv_parser.h" + +namespace rocmtools { +namespace tests { +namespace utility { + +// Tokenizes a delimiter-separated string and saves the result in a vector +int CSVParser::GetTockenizedString(std::string str, + std::vector<std::string> &stringVec, + char delim) { + char *token = strtok(&str[0], std::string(1, delim).c_str()); // str is a copy; delimiters must be NUL-terminated + while (token) { + std::string temp = token; + stringVec.push_back(temp); + token = strtok(NULL, std::string(1, delim).c_str()); + } + + return stringVec.size(); +} + +// Get map for collected counters +countermap &CSVParser::GetCounterMap() { return counter_map_; } + +// Read counter value based on row-column +std::string *CSVParser::ReadCounter(uint32_t row, uint32_t col) { + auto itr = counter_map_.find(row); + + if (itr != counter_map_.end()) { + auto &rowmap = itr->second; + auto it = rowmap.find(col); + if (it != rowmap.end()) { + return &(it->second); + } else { + return nullptr; + } + } else { + return nullptr; + } +} + +// Parses CSV file and saves result in a map +void CSVParser::ParseCSV(const char *path) { + FILE *pFile = fopen(path, "r"); + + if (pFile) { + // Move the file position indicator to the end of the stream. + fseek(pFile, 0, SEEK_END); + // ftell() now returns the file size in bytes + // (the current position relative to the start of the file) + uint32_t uSize = ftell(pFile); + // Move the position indicator back to the beginning of the file + // before reading + rewind(pFile); + + char *fileBuffer = new char[uSize + 1](); // zero-filled, so the buffer stays NUL-terminated for strchr() + size_t size = fread(fileBuffer, 1, uSize, pFile); + if (size != uSize) std::cerr << "Incorrect File!"
<< std::endl; + + std::map<uint32_t, std::string> rowmap; + uint32_t uiIndex = 1; + + char *pBegin = fileBuffer; + char *pEnd = strchr(pBegin, '\n'); + + // The beginning of the second line, discarding the first line + pBegin = pEnd ? pEnd + 1 : pBegin + strlen(pBegin); + pEnd = strchr(pBegin, '\n'); + + while (pEnd) { + std::string tmp; + tmp.insert(0, pBegin, pEnd - pBegin); + assert(!tmp.empty()); + // Store the string of each line in the map, + // the key is the sequence number, + // and the value is the string + rowmap[uiIndex++] = tmp; + + pBegin = pEnd + 1; + pEnd = strchr(pBegin, '\n'); + } + // clear buffers + delete[] fileBuffer; + fileBuffer = nullptr; + pBegin = nullptr; + pEnd = nullptr; + + auto itr = rowmap.begin(); + for (; itr != rowmap.end(); ++itr) { + std::vector<std::string> countervec; + std::map<uint32_t, std::string> rowmap_tmp; + if (GetTockenizedString(itr->second, countervec) <= 0) assert(!"empty CSV row"); + + std::vector<std::string>::size_type idx = 0; + for (; idx != countervec.size(); ++idx) { + rowmap_tmp[idx + 1] = countervec[idx]; + } + counter_map_[itr->first] = rowmap_tmp; + } + } + + if (pFile) fclose(pFile); +} + +// Parses profiler output buffer and saves result in a map +void CSVParser::ParseCSV(std::string buffer) { + std::vector<char> buff(buffer.c_str(), buffer.c_str() + buffer.size() + 1); // keep the trailing NUL for strchr() + char *pBegin = &buff[0]; + char *pEnd = strchr(pBegin, '\n'); + + // The beginning of the second line, discarding the first line + pBegin = pEnd ? pEnd + 1 : pBegin + strlen(pBegin); + pEnd = strchr(pBegin, '\n'); + std::map<uint32_t, std::string> rowmap; + uint32_t uiIndex = 1; + + while (pEnd) { + std::string tmp; + tmp.insert(0, pBegin, pEnd - pBegin); + // Store the string of each line in the map, + // the key is the sequence number, + // and the value is the string + rowmap[uiIndex++] = tmp; + + pBegin = pEnd + 1; + pEnd = strchr(pBegin, '\n'); + } + + // clear buffers + buff.clear(); + pBegin = nullptr; + pEnd = nullptr; + + auto itr = rowmap.begin(); + for (; itr != rowmap.end(); ++itr) { + std::vector<std::string> countervec; + std::map<uint32_t, std::string> rowmap_tmp; + GetTockenizedString(itr->second, countervec); + + std::vector<std::string>::size_type idx = 0; + for (; idx !=
countervec.size(); ++idx) { + rowmap_tmp[idx + 1] = countervec[idx]; + } + counter_map_[itr->first] = rowmap_tmp; + } +} +} // namespace utility +} // namespace tests +} // namespace rocmtools diff --git a/tests/featuretests/profiler/utils/csv_parser.h b/tests/featuretests/profiler/utils/csv_parser.h new file mode 100644 index 00000000..65ee4412 --- /dev/null +++ b/tests/featuretests/profiler/utils/csv_parser.h @@ -0,0 +1,118 @@ + +/* +Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
+*/ +#ifndef TESTS_FEATURETESTS_PROFILER_UTILS_CSV_PARSER_H_ +#define TESTS_FEATURETESTS_PROFILER_UTILS_CSV_PARSER_H_ + +#include +#include + +#include +#include +#include +#include +#include + +/* --------------------------------------------------------------------------*/ +/** + * @Synopsis Implementation of a CSV Parser class for Profiler output + * when running to collect counters. Subsequent counter collection tests + * can use this to parse different applications. + */ +/* --------------------------------------------------------------------------*/ +namespace rocmtools { +namespace tests { +namespace utility { +using countermap = std::map<uint32_t, std::map<uint32_t, std::string>>; + +class CSVParser { + public: + CSVParser() {} + + /** + * Parses CSV file and saves result in a map + * + * Stores csv data as a 2-D array in row-column format + * Skips first line of input csv as it only contains field names + * + * @param[in] path Path to the csv file. + * + * @note Results can be read back with GetCounterMap() or ReadCounter(). + */ + void ParseCSV(const char *path); + + /** + * Parses profiler output buffer and saves result in a map + * + * Stores csv data as a 2-D array in row-column format + * Skips first line of input csv as it only contains field names + * + * @param[in] buffer Profiler output in CSV format. + * + * @note Results can be read back with GetCounterMap() or ReadCounter(). + */ + void ParseCSV(std::string buffer); + + /** + * Read counter value based on row-column + * + * @param[in] row row number to be read in csv. + * + * @param[in] col column to be read in csv (e.g. a counter value). + * + * @return If found, returns counter value as a string pointer, nullptr + * otherwise.
+ */ + std::string *ReadCounter(uint32_t row, uint32_t col); + + /** + * Tokenizes a delimiter-separated string and saves the result in a vector + * + * @param[in] str input string to be tokenized + * + * @param[out] csvtable vector to store tokenized values + * + * @param[in] delim delimiter used for tokenizing + * + * @return the number of entries in csvtable after tokenizing + */ + int GetTockenizedString(std::string str, std::vector<std::string> &csvtable, + char delim = ','); + + /** + * A getter for the map of collected counters + * + * @return Returns a map of collected counters. + */ + countermap &GetCounterMap(); + + private: + // map for counter collection + countermap counter_map_; +}; +} // namespace utility +} // namespace tests +} // namespace rocmtools + +using rocmtools::tests::utility::countermap; +using rocmtools::tests::utility::CSVParser; +#endif // TESTS_FEATURETESTS_PROFILER_UTILS_CSV_PARSER_H_ diff --git a/tests/featuretests/profiler/utils/test_utils.cpp b/tests/featuretests/profiler/utils/test_utils.cpp new file mode 100644 index 00000000..6712d2c8 --- /dev/null +++ b/tests/featuretests/profiler/utils/test_utils.cpp @@ -0,0 +1,64 @@ +/* +Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. +*/ +#include "utils/test_utils.h" + +namespace rocmtools { +namespace tests { +namespace utility { + +// This function returns the running path of executable +std::string GetRunningPath(std::string string_to_erase) { + std::string path; + char *real_path; + Dl_info dl_info; + + if (0 != dladdr(reinterpret_cast(main), &dl_info)) { + std::string to_erase = string_to_erase; + path = dl_info.dli_fname; + real_path = realpath(path.c_str(), NULL); + if (real_path == nullptr) { + throw(std::string("Error! in extracting real path")); + } + path.clear(); // reset path + path.append(real_path); + + size_t pos = path.find(to_erase); + if (pos != std::string::npos) path.erase(pos, to_erase.length()); + } else { + throw(std::string("Error! in extracting real path")); + } + return path; +} + +// This function returns number of cores +// available in system +int GetNumberOfCores() { + std::ifstream cpuinfo("/proc/cpuinfo"); + const int num_cpu_cores = std::count( + std::istream_iterator(cpuinfo), + std::istream_iterator(), std::string("processor")); + return num_cpu_cores; +} + +} // namespace utility +} // namespace tests +} // namespace rocmtools diff --git a/tests/featuretests/profiler/utils/test_utils.h b/tests/featuretests/profiler/utils/test_utils.h new file mode 100644 index 00000000..2388d937 --- /dev/null +++ b/tests/featuretests/profiler/utils/test_utils.h @@ -0,0 +1,57 @@ +/* +Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved. 
+ +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
+*/ +#ifndef TESTS_FEATURETESTS_PROFILER_UTILS_TEST_UTILS_H_ +#define TESTS_FEATURETESTS_PROFILER_UTILS_TEST_UTILS_H_ + +#include // for __cxa_demangle +#include // for dladdr +#include // for backtrace + +#include +#include +#include +#include +#include +#include + +namespace rocmtools { +namespace tests { +namespace utility { + +// Get current running path +std::string GetRunningPath(std::string string_to_erase); + +// Get Number of cores in the system +int GetNumberOfCores(); + +} // namespace utility +} // namespace tests +} // namespace rocmtools + +// used for dl_addr to locate the running +// path for executable +int main(int argc, char** argv); + +using rocmtools::tests::utility::GetNumberOfCores; +using rocmtools::tests::utility::GetRunningPath; + +#endif // TESTS_FEATURETESTS_PROFILER_UTILS_TEST_UTILS_H_ diff --git a/tests/featuretests/tracer/apps/goldentraces/hip_helloworld_golden_traces.txt b/tests/featuretests/tracer/apps/goldentraces/hip_helloworld_golden_traces.txt index 113691f2..5568e4b1 100755 --- a/tests/featuretests/tracer/apps/goldentraces/hip_helloworld_golden_traces.txt +++ b/tests/featuretests/tracer/apps/goldentraces/hip_helloworld_golden_traces.txt @@ -2,13 +2,23 @@ 0x2fcc70 agent gpu 9598333364898937 Enabling API Tracing -Record [1], Domain(HIP_API_DOMAIN), Begin(9598333370773048), End(9598333370809238), Correlation ID( 1), ROCTX ID(2), Function(hipGetDeviceProperties) -Record [3], Domain(HIP_API_DOMAIN), Begin(9598333370859109), End(9598333370912969), Correlation ID( 2), ROCTX ID(0), Function(hipMalloc) -Record [5], Domain(HIP_API_DOMAIN), Begin(9598333370924259), End(9598333370928959), Correlation ID( 3), ROCTX ID(0), Function(hipMalloc) -Record [7], Domain(HIP_API_DOMAIN), Begin(9598333370940519), End(9598333575713652), Correlation ID( 4), ROCTX ID(0), Function(hipMemcpy) -Record [9], Domain(HIP_API_DOMAIN), Begin(9598333575750182), End(9598333575768982), Correlation ID( 5), ROCTX ID(0), Function(__hipPushCallConfiguration) -Record 
[11], Domain(HIP_API_DOMAIN), Begin(9598333575779902), End(9598333575783532), Correlation ID( 6), ROCTX ID(0), Function(__hipPopCallConfiguration) -Record [13], Domain(HIP_API_DOMAIN), Begin(9598333575791112), End(9598333576071305), Correlation ID( 7), ROCTX ID(0), Function(hipLaunchKernel), Kernel Name(helloworld(char*, char*)) -Record [15], Domain(HIP_API_DOMAIN), Begin(9598333576098325), End(9598333576567689), Correlation ID( 8), ROCTX ID(0), Function(hipMemcpy) -Record [17], Domain(HIP_API_DOMAIN), Begin(9598333576581979), End(9598333576591729), Correlation ID( 9), ROCTX ID(0), Function(hipFree) -Record [19], Domain(HIP_API_DOMAIN), Begin(9598333576600359), End(9598333576603379), Correlation ID( 10), ROCTX ID(0), Function(hipFree) +Record(1), Domain(HIP_API_DOMAIN), Function(hipGetDeviceProperties), Begin(2995593944218577), Correlation_ID(1) +Record(2), Domain(HIP_API_DOMAIN), Function(hipGetDeviceProperties), End(2995593944228886), Correlation_ID(1) +Record(4), Domain(HIP_API_DOMAIN), Function(hipMalloc), Begin(2995593944238565), Correlation_ID(2) +Record(5), Domain(HIP_API_DOMAIN), Function(hipMalloc), End(2995593944266920), Correlation_ID(2) +Record(7), Domain(HIP_API_DOMAIN), Function(hipMalloc), Begin(2995593944271769), Correlation_ID(3) +Record(8), Domain(HIP_API_DOMAIN), Function(hipMalloc), End(2995593944277100), Correlation_ID(3) +Record(10), Domain(HIP_API_DOMAIN), Function(hipMemcpy), Begin(2995593944284394), Correlation_ID(4) +Record(11), Domain(HIP_API_DOMAIN), Function(hipMemcpy), End(2995594191690241), Correlation_ID(4) +Record(13), Domain(HIP_API_DOMAIN), Function(__hipPushCallConfiguration), Begin(2995594191704198), Correlation_ID(5) +Record(14), Domain(HIP_API_DOMAIN), Function(__hipPushCallConfiguration), End(2995594191707104), Correlation_ID(5) +Record(16), Domain(HIP_API_DOMAIN), Function(__hipPopCallConfiguration), Begin(2995594191710731), Correlation_ID(6) +Record(17), Domain(HIP_API_DOMAIN), Function(__hipPopCallConfiguration), 
End(2995594191713486), Correlation_ID(6) +Record(19), Domain(HIP_API_DOMAIN), Function(hipLaunchKernel), Kernel_Name(helloworld(char*, char*)), Begin(2995594191738064), Correlation_ID(7) +Record(21), Domain(HIP_API_DOMAIN), Function(hipLaunchKernel), Kernel_Name(helloworld(char*, char*)), End(2995594192197542), Correlation_ID(7) +Record(23), Domain(HIP_API_DOMAIN), Function(hipMemcpy), Begin(2995594192204856), Correlation_ID(8) +Record(24), Domain(HIP_API_DOMAIN), Function(hipMemcpy), End(2995594192228011), Correlation_ID(8) +Record(26), Domain(HIP_API_DOMAIN), Function(hipFree), Begin(2995594192237078), Correlation_ID(9) +Record(27), Domain(HIP_API_DOMAIN), Function(hipFree), End(2995594192256085), Correlation_ID(9) +Record(29), Domain(HIP_API_DOMAIN), Function(hipFree), Begin(2995594192259622), Correlation_ID(10) +Record(30), Domain(HIP_API_DOMAIN), Function(hipFree), End(2995594192264101), Correlation_ID(10) diff --git a/tests/featuretests/tracer/gtests/apps/goldentraces/hip_helloworld_golden_traces.txt b/tests/featuretests/tracer/gtests/apps/goldentraces/hip_helloworld_golden_traces.txt new file mode 100755 index 00000000..113691f2 --- /dev/null +++ b/tests/featuretests/tracer/gtests/apps/goldentraces/hip_helloworld_golden_traces.txt @@ -0,0 +1,14 @@ +0x2fbdf0 agent cpu +0x2fcc70 agent gpu +9598333364898937 +Enabling API Tracing +Record [1], Domain(HIP_API_DOMAIN), Begin(9598333370773048), End(9598333370809238), Correlation ID( 1), ROCTX ID(2), Function(hipGetDeviceProperties) +Record [3], Domain(HIP_API_DOMAIN), Begin(9598333370859109), End(9598333370912969), Correlation ID( 2), ROCTX ID(0), Function(hipMalloc) +Record [5], Domain(HIP_API_DOMAIN), Begin(9598333370924259), End(9598333370928959), Correlation ID( 3), ROCTX ID(0), Function(hipMalloc) +Record [7], Domain(HIP_API_DOMAIN), Begin(9598333370940519), End(9598333575713652), Correlation ID( 4), ROCTX ID(0), Function(hipMemcpy) +Record [9], Domain(HIP_API_DOMAIN), Begin(9598333575750182), 
End(9598333575768982), Correlation ID( 5), ROCTX ID(0), Function(__hipPushCallConfiguration) +Record [11], Domain(HIP_API_DOMAIN), Begin(9598333575779902), End(9598333575783532), Correlation ID( 6), ROCTX ID(0), Function(__hipPopCallConfiguration) +Record [13], Domain(HIP_API_DOMAIN), Begin(9598333575791112), End(9598333576071305), Correlation ID( 7), ROCTX ID(0), Function(hipLaunchKernel), Kernel Name(helloworld(char*, char*)) +Record [15], Domain(HIP_API_DOMAIN), Begin(9598333576098325), End(9598333576567689), Correlation ID( 8), ROCTX ID(0), Function(hipMemcpy) +Record [17], Domain(HIP_API_DOMAIN), Begin(9598333576581979), End(9598333576591729), Correlation ID( 9), ROCTX ID(0), Function(hipFree) +Record [19], Domain(HIP_API_DOMAIN), Begin(9598333576600359), End(9598333576603379), Correlation ID( 10), ROCTX ID(0), Function(hipFree) diff --git a/tests/featuretests/tracer/gtests/apps/hip/hello_world.cpp b/tests/featuretests/tracer/gtests/apps/hip/hello_world.cpp new file mode 100755 index 00000000..def003bb --- /dev/null +++ b/tests/featuretests/tracer/gtests/apps/hip/hello_world.cpp @@ -0,0 +1,71 @@ +/* +Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+*/
+
+#include
+#include
+#include
+
+#include
+
+#include
+#include
+#include
+
+#define SUCCESS 0
+#define FAILURE 1
+
+__global__ void helloworld(char* in, char* out) {
+  int num = hipThreadIdx_x + hipBlockDim_x * hipBlockIdx_x;
+  out[num] = in[num] + 1;
+}
+
+int main(int argc, char* argv[]) {
+  hipDeviceProp_t devProp;
+  hipGetDeviceProperties(&devProp, 0);
+
+  /* Initialize the host input and output, and create memory objects for the
+   * kernel. */
+  const char* input = "GdkknVnqkc";
+  size_t strlength = strlen(input);
+
+  char* output = reinterpret_cast<char*>(malloc(strlength + 1));
+
+  char* inputBuffer;
+  char* outputBuffer;
+  hipMalloc(reinterpret_cast<void**>(&inputBuffer), (strlength + 1) * sizeof(char));
+  hipMalloc(reinterpret_cast<void**>(&outputBuffer), (strlength + 1) * sizeof(char));
+
+  hipMemcpy(inputBuffer, input, (strlength + 1) * sizeof(char), hipMemcpyHostToDevice);
+
+  hipLaunchKernelGGL(helloworld, dim3(1), dim3(strlength), 0, 0, inputBuffer, outputBuffer);
+
+  hipMemcpy(output, outputBuffer, (strlength + 1) * sizeof(char), hipMemcpyDeviceToHost);
+
+  hipFree(inputBuffer);
+  hipFree(outputBuffer);
+
+  output[strlength] = '\0';  // Add the terminating null character to the end of output.
+
+  free(output);
+
+  return SUCCESS;
+}
diff --git a/tests/featuretests/tracer/gtests/apps/hip/hello_world_gtest.cpp b/tests/featuretests/tracer/gtests/apps/hip/hello_world_gtest.cpp
new file mode 100755
index 00000000..0e960aec
--- /dev/null
+++ b/tests/featuretests/tracer/gtests/apps/hip/hello_world_gtest.cpp
@@ -0,0 +1,73 @@
+/*
+Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved.
+ +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
+*/
+#include
+
+#include "gtests/apps/tracer_gtest.h"
+
+constexpr auto kGoldenOutputHelloworld = "hip_helloworld_golden_traces.txt";
+
+class HelloWorldTest : public ProfilerTest {
+ protected:
+  std::vector<KernelInfo> golden_kernel_info;
+  void SetUp() {
+    ProfilerTest::SetUp("tracer_hip_helloworld", "--hip-api ");
+    GetKernelInfoForGoldenOutput("tracer_hip_helloworld", kGoldenOutputHelloworld,
+                                 &golden_kernel_info);
+  }
+};
+
+// Test:1 Compares the total number of parsed kernel-info entries in the golden
+// output against the current profiler output
+TEST_F(HelloWorldTest, WhenRunningTracerWithAppThenKernelInfoMatchWithGoldenOutput) {
+  // kernel info in current profiler run
+  std::vector<KernelInfo> current_kernel_info;
+
+  GetKernelInfoForRunningApplication(&current_kernel_info);
+  ASSERT_TRUE(current_kernel_info.size());
+
+  EXPECT_EQ(golden_kernel_info.size(), current_kernel_info.size());
+}
+
+// Test:2 Compares the function names of the first two entries in the golden
+// output against the current profiler output
+TEST_F(HelloWorldTest, WhenRunningProfilerWithAppThenFunctionNamessMatchWithGoldenOutput) {
+  // kernel info in current profiler run
+  std::vector<KernelInfo> current_kernel_info;
+  GetKernelInfoForRunningApplication(&current_kernel_info);
+
+  ASSERT_TRUE(current_kernel_info.size());
+
+  EXPECT_EQ(golden_kernel_info[0].function, current_kernel_info[0].function);
+  EXPECT_EQ(golden_kernel_info[1].function, current_kernel_info[1].function);
+}
+
+// Test:3 Checks that the current profiler output contains at least one entry
+// (the kernel duration itself is not checked)
+TEST_F(HelloWorldTest, WhenRunningProfilerWithAppThenKernelDurationShouldBePositive) {
+  // kernel info in current profiler run
+  std::vector<KernelInfo> current_kernel_info;
+
+  GetKernelInfoForRunningApplication(&current_kernel_info);
+  ASSERT_TRUE(current_kernel_info.size());
+
+  EXPECT_GT(current_kernel_info.size(), 0);
+}
diff --git a/tests/featuretests/tracer/gtests/apps/tracer_gtest.cpp b/tests/featuretests/tracer/gtests/apps/tracer_gtest.cpp
new file mode 100644
index 00000000..a18bb828
--- /dev/null
+++ b/tests/featuretests/tracer/gtests/apps/tracer_gtest.cpp
@@ -0,0 +1,148 @@
+/*
+Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+*/
+#include "gtests/apps/tracer_gtest.h"
+#include "utils/test_utils.h"
+
+/**
+ * Sets up the tracer environment (LD_PRELOAD, HIP API tracing) and runs the app.
+ */
+void ApplicationParser::SetApplicationEnv(const char* app_name, const char* trace_option) {
+  std::string app_path = GetRunningPath("tests/featuretests/tracer/runTracerFeatureTests");
+
+  std::stringstream hsa_tools_lib_path;
+  hsa_tools_lib_path << app_path << "librocprofiler_tool.so";
+  setenv("LD_PRELOAD", hsa_tools_lib_path.str().c_str(), true);
+
+  // set --hip-api option
+  setenv("ROCPROFILER_HIP_API_TRACE", "1", true);
+
+  std::stringstream os;
+  os << app_path << "tests/featuretests/tracer/gtests/apps/" << app_name;
+  ProcessApplication(os);
+}
+
+/**
+ * Parses kernel info after running the profiler against the current application
+ * and saves it in a vector.
+ */
+void ApplicationParser::GetKernelInfoForRunningApplication(
+    std::vector<KernelInfo>* kernel_info_output) {
+  KernelInfo kinfo;
+  for (std::string line : output_lines) {
+    if (std::regex_match(line, std::regex("(Record)(.*)"))) {
+      int spos = line.find("[");
+      int epos = line.find("]", spos);
+      std::string sub = line.substr(spos + 1, epos - spos - 1);
+      kinfo.record_id = sub;
+      kernel_info_output->push_back(kinfo);
+
+      // Kernel-Name
+      size_t found = line.find("Function");
+      if (found != std::string::npos) {
+        int spos = found;
+        int epos = line.find(")", spos);
+        int length = std::string("kernel-name").length();
+        std::string sub = line.substr(spos + length + 1, epos - spos - length - 1);
+
+        kinfo.function = sub;
+        kernel_info_output->push_back(kinfo);
+      }
+    }
+  }
+}
+
+/**
+ * Parses kernel names from a pre-saved golden output file
+ * and saves them in a vector.
+ */
+void ApplicationParser::GetKernelInfoForGoldenOutput(const char* app_name, std::string file_name,
+                                                     std::vector<KernelInfo>* kernel_info_output) {
+  std::string entry;
+  std::string path = GetRunningPath("runTracerFeatureTests");
+  entry = path.append("gtests/apps/goldentraces/") + file_name;
+
+  // parse kernel info fields for golden output
+  ParseKernelInfoFields(entry, kernel_info_output);
+}
+
+/**
+ * Runs a given application and saves the profiler output.
+ * These output lines can later be parsed for kernel information,
+ * e.g. kernel names.
+ */
+void ApplicationParser::ProcessApplication(std::stringstream& ss) {
+  FILE* handle = popen(ss.str().c_str(), "r");
+  ASSERT_NE(handle, nullptr);
+
+  char* ln{NULL};
+  std::string temp{""};
+  size_t len{0};
+
+  while (getline(&ln, &len, handle) != -1) {
+    temp = temp + std::string(ln);
+  }
+
+  free(ln);
+  size_t pos{0};
+  std::string delimiter{"\n"};
+  while ((pos = temp.find(delimiter)) != std::string::npos) {
+    output_lines.push_back(temp.substr(0, pos));
+    temp.erase(0, pos + delimiter.length());
+  }
+
+  pclose(handle);
+}
+
+/**
+ * Parses kernel info from the golden output file
+ * and saves it in a vector.
+ */
+void ApplicationParser::ParseKernelInfoFields(const std::string& s,
+                                              std::vector<KernelInfo>* kernel_info_output) {
+  std::string line;
+  KernelInfo kinfo;
+
+  std::ifstream golden_file(s);
+  // std::getline() fails at EOF or if the file could not be opened, ending the loop
+  while (std::getline(golden_file, line)) {
+    if (std::regex_match(line, std::regex("(Record)(.*)"))) {
+      int spos = line.find("[");
+      int epos = line.find("]", spos);
+      std::string sub = line.substr(spos + 1, epos - spos - 1);
+      kinfo.record_id = sub;
+      kernel_info_output->push_back(kinfo);
+
+      // Kernel-Name
+      size_t found = line.find("Function");
+      if (found != std::string::npos) {
+        int spos = found;
+        int epos = line.find(")", spos);
+        int length = std::string("kernel-name").length();
+        std::string sub = line.substr(spos + length + 1, epos - spos - length - 1);
+
+        kinfo.function = sub;
+        kernel_info_output->push_back(kinfo);
+      }
+    }
+  }
+  golden_file.close();
+}
diff --git a/tests/featuretests/tracer/gtests/apps/tracer_gtest.h b/tests/featuretests/tracer/gtests/apps/tracer_gtest.h
new file mode 100644
index 00000000..ba45b45e
--- /dev/null
+++ b/tests/featuretests/tracer/gtests/apps/tracer_gtest.h
@@ -0,0 +1,103 @@
+/*
+Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+*/
+#ifndef TESTS_FEATURETESTS_TRACER_GTESTS_APPS_TRACER_GTEST_H_
+#define TESTS_FEATURETESTS_TRACER_GTESTS_APPS_TRACER_GTEST_H_
+
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+/* --------------------------------------------------------------------------*/
+/**
+ * @Synopsis Implementation of a parser class for profiler output.
+ * Parses pre-saved golden output for kernel info and saves it in a vector.
+ * Executes the application (passed as param: app_name) and saves the parsed
+ * kernel info in a vector.
+ * Subsequent tests can use this to parse different applications.
+ */
+/* --------------------------------------------------------------------------*/
+
+class ApplicationParser : public ::testing::Test {
+ protected:
+  virtual void SetUp(const char* app_name, const char* trace_option) {
+    SetApplicationEnv(app_name, trace_option);
+  }
+  virtual void TearDown() {}
+  //!< This can be extended with other kernel info fields, e.g. agent name.
+  struct KernelInfo {
+    std::string record_id;
+    std::string domain;
+    std::string begin_time;
+    std::string end_time;
+    std::string corelation_id;
+    std::string roctx_id;
+    std::string function;
+  };
+
+  //!< saves lines of profiler output
+  std::vector<std::string> output_lines;
+
+ public:
+  //!< Sets up the tracer environment (LD_PRELOAD, HIP API tracing) and runs the app.
+  void SetApplicationEnv(const char* app_name, const char* trace_option);
+
+  //!< Parses kernel info from a pre-saved golden output file
+  // and saves it in a vector.
+  void GetKernelInfoForGoldenOutput(const char* app_name, std::string filename,
+                                    std::vector<KernelInfo>* kernel_info_output);
+
+  //!< Parses kernel info after running the profiler against the current application
+  // and saves it in a vector.
+  void GetKernelInfoForRunningApplication(std::vector<KernelInfo>* kernel_info_output);
+
+ private:
+  //!< Runs a given application and saves the profiler output.
+  // These output lines can later be parsed for kernel information,
+  // e.g. kernel names.
+  void ProcessApplication(std::stringstream& ss);
+
+  //!< Parses kernel info fields from the given input,
+  // e.g. record IDs and function names.
+  void ParseKernelInfoFields(const std::string& s, std::vector<KernelInfo>* kernel_info_output);
+};
+
+/* --------------------------------------------------------------------------*/
+/**
+ * @Synopsis Implementation of a ProfilerTest
+ * Subsequent tests can use this to parse different applications
+ */
+/* --------------------------------------------------------------------------*/
+
+class ProfilerTest : public ApplicationParser {
+ protected:
+  virtual void SetUp(const char* app_name, const char* trace_option) {
+    ApplicationParser::SetUp(app_name, trace_option);
+  }
+};
+#endif  // TESTS_FEATURETESTS_TRACER_GTESTS_APPS_TRACER_GTEST_H_
diff --git a/tests/featuretests/tracer/gtests/gtests_main.cpp b/tests/featuretests/tracer/gtests/gtests_main.cpp
new file mode 100644
index 00000000..6c73a90c
--- /dev/null
+++ b/tests/featuretests/tracer/gtests/gtests_main.cpp
@@ -0,0 +1,9 @@
+#include <gtest/gtest.h>
+
+// Entry point for the gtest infrastructure
+
+int main(int argc, char** argv) {
+  testing::InitGoogleTest(&argc, argv);
+  testing::FLAGS_gtest_death_test_style = "threadsafe";
+  return RUN_ALL_TESTS();
+}
diff --git a/tests/featuretests/tracer/utils/test_utils.cpp b/tests/featuretests/tracer/utils/test_utils.cpp
new file mode 100644
index 00000000..244f1565
--- /dev/null
+++ b/tests/featuretests/tracer/utils/test_utils.cpp
@@ -0,0 +1,54 @@
+/*
+Copyright (c) 2015-2016 Advanced Micro Devices, Inc.
All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. +*/ +#include "utils/test_utils.h" + +namespace rocmtools { +namespace tests { +namespace utility { + +// This function returns the running path of executable +std::string GetRunningPath(std::string string_to_erase) { + std::string path; + char* real_path; + Dl_info dl_info; + + if (0 != dladdr(reinterpret_cast(main), &dl_info)) { + std::string to_erase = string_to_erase; + path = dl_info.dli_fname; + real_path = realpath(path.c_str(), NULL); + if (real_path == nullptr) { + throw(std::string("Error! in extracting real path")); + } + path.clear(); // reset path + path.append(real_path); + + size_t pos = path.find(to_erase); + if (pos != std::string::npos) path.erase(pos, to_erase.length()); + } else { + throw(std::string("Error! 
in extracting real path")); + } + return path; +} + +} // namespace utility +} // namespace tests +} // namespace rocmtools diff --git a/tests/featuretests/tracer/utils/test_utils.h b/tests/featuretests/tracer/utils/test_utils.h new file mode 100644 index 00000000..1b1e5987 --- /dev/null +++ b/tests/featuretests/tracer/utils/test_utils.h @@ -0,0 +1,53 @@ +/* +Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
+*/ +#ifndef TESTS_FEATURETESTS_TRACER_UTILS_TEST_UTILS_H_ +#define TESTS_FEATURETESTS_TRACER_UTILS_TEST_UTILS_H_ + +#include // for __cxa_demangle +#include // for dladdr +#include // for backtrace + +#include +#include +#include +#include +#include +#include + +namespace rocmtools { +namespace tests { +namespace utility { + +// Get current running path +std::string GetRunningPath(std::string string_to_erase); + +} // namespace utility +} // namespace tests +} // namespace rocmtools + +// used for dl_addr to locate the running +// path for executable +int main(int argc, char** argv); + +using rocmtools::tests::utility::GetRunningPath; + +#endif // TESTS_FEATURETESTS_TRACER_UTILS_TEST_UTILS_H_ diff --git a/tests/featuretests/utils/test_utils.cpp b/tests/featuretests/utils/test_utils.cpp index 1057bb6a..912ea9cb 100644 --- a/tests/featuretests/utils/test_utils.cpp +++ b/tests/featuretests/utils/test_utils.cpp @@ -28,10 +28,10 @@ namespace utility { // This function returns the running path of executable std::string GetRunningPath(std::string string_to_erase) { std::string path; - char *real_path; + char* real_path; Dl_info dl_info; - if (0 != dladdr(reinterpret_cast(main), &dl_info)) { + if (0 != dladdr(reinterpret_cast(main), &dl_info)) { std::string to_erase = string_to_erase; path = dl_info.dli_fname; real_path = realpath(path.c_str(), NULL); @@ -41,6 +41,9 @@ std::string GetRunningPath(std::string string_to_erase) { path.clear(); // reset path path.append(real_path); + //std::cout << path << std::endl; + + size_t pos = path.find(to_erase); if (pos != std::string::npos) path.erase(pos, to_erase.length()); } else { @@ -53,12 +56,32 @@ std::string GetRunningPath(std::string string_to_erase) { // available in system int GetNumberOfCores() { std::ifstream cpuinfo("/proc/cpuinfo"); - const int num_cpu_cores = std::count( - std::istream_iterator(cpuinfo), - std::istream_iterator(), std::string("processor")); + const int num_cpu_cores = + 
std::count(std::istream_iterator(cpuinfo), std::istream_iterator(), + std::string("processor")); return num_cpu_cores; } +bool is_installed_path() { + std::string path; + char* real_path; + Dl_info dl_info; + + if (0 != dladdr(reinterpret_cast(main), &dl_info)) { + path = dl_info.dli_fname; + real_path = realpath(path.c_str(), NULL); + if (real_path == nullptr) { + throw(std::string("Error! in extracting real path")); + } + path.clear(); // reset path + path.append(real_path); + if (path.find("/opt") != std::string::npos) { + return true; + } + } + return false; +} + } // namespace utility } // namespace tests } // namespace rocmtools diff --git a/tests/featuretests/utils/test_utils.h b/tests/featuretests/utils/test_utils.h index 2388d937..27c1ef4b 100644 --- a/tests/featuretests/utils/test_utils.h +++ b/tests/featuretests/utils/test_utils.h @@ -43,6 +43,8 @@ std::string GetRunningPath(std::string string_to_erase); // Get Number of cores in the system int GetNumberOfCores(); +bool is_installed_path(); + } // namespace utility } // namespace tests } // namespace rocmtools @@ -53,5 +55,6 @@ int main(int argc, char** argv); using rocmtools::tests::utility::GetNumberOfCores; using rocmtools::tests::utility::GetRunningPath; +using rocmtools::tests::utility::is_installed_path; #endif // TESTS_FEATURETESTS_PROFILER_UTILS_TEST_UTILS_H_ diff --git a/tests/microbenchmarks/CMakeLists.txt b/tests/microbenchmarks/CMakeLists.txt new file mode 100644 index 00000000..524b4f8b --- /dev/null +++ b/tests/microbenchmarks/CMakeLists.txt @@ -0,0 +1,20 @@ + # Set the HIP language runtime link flags as FindHIP does not set them. 
+set(CMAKE_INSTALL_TESTDIR test/${PROJECT_NAME}) +set(CMAKE_EXECUTABLE_RUNTIME_HIP_FLAG ${CMAKE_SHARED_LIBRARY_RUNTIME_CXX_FLAG}) +set(CMAKE_EXECUTABLE_RUNTIME_HIP_FLAG_SEP ${CMAKE_SHARED_LIBRARY_RUNTIME_CXX_FLAG_SEP}) +set(CMAKE_EXECUTABLE_RPATH_LINK_HIP_FLAG ${CMAKE_SHARED_LIBRARY_RPATH_LINK_CXX_FLAG}) + +set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${ROCM_PATH}/lib/cmake/hip") +set(CMAKE_HIP_ARCHITECTURES OFF) +find_package(HIP REQUIRED MODULE) + +set(TEST_DIR ${PROJECT_SOURCE_DIR}/tests/microbenchmarks) +file(GLOB TEST_SRC_FILE ${TEST_DIR}/*.cpp) + +set_source_files_properties(${TEST_SRC_FILE} PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1) +hip_add_executable(pcie_bw_test ${TEST_SRC_FILE}) + +target_link_libraries(pcie_bw_test PRIVATE rocm_smi64) +target_link_options(pcie_bw_test PRIVATE "-Wl,--build-id=md5") +set_target_properties(pcie_bw_test PROPERTIES RUNTIME_OUTPUT_DIRECTORY "${PROJECT_BINARY_DIR}/tests/microbenchmarks") +install(TARGETS pcie_bw_test RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/tests/microbenchmarks COMPONENT tests) \ No newline at end of file diff --git a/tests/microbenchmarks/pcie_bw_test.cpp b/tests/microbenchmarks/pcie_bw_test.cpp new file mode 100644 index 00000000..5b5c9d7c --- /dev/null +++ b/tests/microbenchmarks/pcie_bw_test.cpp @@ -0,0 +1,187 @@ +/* Copyright (c) 2022 Advanced Micro Devices, Inc. + + Permission is hereby granted, free of charge, to any person obtaining a copy + of this software and associated documentation files (the "Software"), to deal + in the Software without restriction, including without limitation the rights + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + copies of the Software, and to permit persons to whom the Software is + furnished to do so, subject to the following conditions: + + The above copyright notice and this permission notice shall be included in + all copies or substantial portions of the Software. 
+ + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + THE SOFTWARE. */ + +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "rocm_smi/rocm_smi.h" + + +/** + * @brief Test SMI APIs for PCIe + * + * rsmi_status_t rsmi_dev_pci_bandwidth_get (uint32_t dev, rsmi_pcie_bandwidth_t* bw) + Retrieve PCIe link speeds for given device + + * rsmi_status_t rsmi_dev_pci_throughput_get (uint32_t dev, u64* sent, u64* received, u64* + max_packet_size) Retrieve number of packets sent via PCIe to/from device, and the max packet size + in bytes. +*/ + + +#define DISPLAY_RSMI_ERR(RET) \ + { \ + if (RET != RSMI_STATUS_SUCCESS) { \ + const char* err_str; \ + std::cout << "\t===> ERROR: RSMI call returned " << (RET) << std::endl; \ + rsmi_status_string((RET), &err_str); \ + std::cout << "\t===> (" << err_str << ")" << std::endl; \ + std::cout << "\t===> at " << __FILE__ << ":" << std::dec << __LINE__ << std::endl; \ + } \ + } + +#define CHK_ERR_ASRT(RET) \ + { \ + if (((RET) != RSMI_STATUS_SUCCESS)) { \ + std::cout << std::endl << "\t===> TEST FAILURE." << std::endl; \ + DISPLAY_RSMI_ERR(RET); \ + std::cout << "\t===> Abort is over-ridden due to dont_fail command line option." 
\ + << std::endl; \ + } \ + } + +#define HANDLE_ERROR CHK_ERR_ASRT(ret); +#define HIP_ASSERT(x) (assert((x) == hipSuccess)) + +#define SEND_DATA() \ + HIP_ASSERT(hipMemcpyAsync(dst, src, SIZE * sizeof(int), hipMemcpyDefault, stream)); + +static float burn_hip(int dev, int* dst, int* src, size_t SIZE, + std::atomic* transfer_started) { + hipSetDevice(dev); + hipStream_t stream; + hipStreamCreate(&stream); + hipEvent_t events[3]; + + for (int i = 0; i < 3; i++) { + hipEventCreate(events + i); + SEND_DATA(); + hipEventRecord(events[i], stream); + } + SEND_DATA(); + hipEventSynchronize(events[0]); + transfer_started->store(true); + + float elapsed = 0; + uint64_t counter = 0; + while (elapsed < 1500.0f) { // Transfer data for 1.5 seconds = 1500 ms + float out; + + hipEventSynchronize(events[(counter + 1) % 3]); + hipEventElapsedTime(&out, events[counter % 3], events[(counter + 1) % 3]); + elapsed += out; + + hipEventRecord(events[counter % 3], stream); + SEND_DATA(); + counter += 1; + } + hipStreamSynchronize(stream); + + for (int i = 0; i < 3; i++) hipEventDestroy(events[i]); + hipStreamDestroy(stream); + + return float(SIZE * sizeof(int) * counter) / elapsed / 1E6; +} + +int main() { + const size_t SIZE = 3 << 28; + rsmi_status_t ret; + uint16_t dev_id; + + int* h_ptr = new int[SIZE]; + hipHostRegister(h_ptr, SIZE * sizeof(int), 0); + for (size_t i = 0; i < SIZE; i++) h_ptr[i] = i; + + ret = rsmi_init(0); + HANDLE_ERROR; + uint32_t num_devices; + ret = rsmi_num_monitor_devices(&num_devices); + HANDLE_ERROR; + std::cout << "Num devices: " << num_devices << std::endl; + + for (uint32_t dev = 0; dev < num_devices; dev++) { + hipSetDevice(dev); + int* d_ptr; + HIP_ASSERT(hipMalloc((void**)&d_ptr, SIZE * sizeof(int))); + + std::cout << ">>> Device " << dev << std::endl; + ret = rsmi_dev_id_get(dev, &dev_id); + HANDLE_ERROR; + + rsmi_pcie_bandwidth_t bandwidth; + ret = rsmi_dev_pci_bandwidth_get(dev, &bandwidth); + HANDLE_ERROR; + + for (uint32_t i = 0; i < 
bandwidth.transfer_rate.num_supported; i++) { + std::cout << "State " << i << ": " << bandwidth.transfer_rate.frequency[i] << " at " + << bandwidth.lanes[i] << " lanes.\n"; + } + std::cout << "Current: " << bandwidth.transfer_rate.frequency[bandwidth.transfer_rate.current] + << '\n'; + + uint64_t sent = 0, received = 0, max_pkt_sz = 0; + std::atomic transfer_started; + transfer_started.store(false); + auto thread = + std::async(std::launch::async, burn_hip, dev, d_ptr, h_ptr, SIZE, &transfer_started); + + while (!transfer_started.load()) usleep(1); + + ret = rsmi_dev_pci_throughput_get(dev, &sent, &received, &max_pkt_sz); + HANDLE_ERROR; + + std::cout << "Data sent: " << sent << std::endl; + std::cout << "Data received: " << received << std::endl; + std::cout << "Max packet size: " << max_pkt_sz << std::endl; + + std::cout << "HtD BW: " << 0.1 * int(10 * thread.get() + 0.5f) << " GB/s" << std::endl; + + transfer_started.store(false); + thread = std::async(std::launch::async, burn_hip, dev, h_ptr, d_ptr, SIZE, &transfer_started); + + while (!transfer_started.load()) usleep(1); + + ret = rsmi_dev_pci_throughput_get(dev, &sent, &received, &max_pkt_sz); + HANDLE_ERROR; + + std::cout << "Data sent: " << sent << std::endl; + std::cout << "Data received: " << received << std::endl; + std::cout << "Max packet size: " << max_pkt_sz << std::endl; + + std::cout << "DtH BW: " << 0.1 * int(10 * thread.get() + 0.5f) << " GB/s" << std::endl; + HIP_ASSERT(hipFree(d_ptr)); + } + + hipHostUnregister(h_ptr); + delete[] h_ptr; + ret = rsmi_shut_down(); + return 0; +} \ No newline at end of file diff --git a/tests/unittests/core/CMakeLists.txt b/tests/unittests/core/CMakeLists.txt new file mode 100644 index 00000000..5efc6254 --- /dev/null +++ b/tests/unittests/core/CMakeLists.txt @@ -0,0 +1,107 @@ +set (OLD_LIB_SRC + ${LIB_DIR}/core/rocprofiler.cpp + ${LIB_DIR}/core/gpu_command.cpp + ${LIB_DIR}/core/proxy_queue.cpp + ${LIB_DIR}/core/simple_proxy_queue.cpp + 
${LIB_DIR}/core/intercept_queue.cpp + ${LIB_DIR}/core/metrics.cpp + ${LIB_DIR}/core/activity.cpp + ${LIB_DIR}/util/hsa_rsrc_factory.cpp +) + +# Setup unit testing env +find_library(PCIACCESS_LIBRARIES pciaccess REQUIRED) + +enable_testing() +find_package(GTest REQUIRED) + +# Getting Source files for ROCProfiler, Hardware, HSA, Memory, Session, Counters, Utils +set(CORE_MEMORY_DIR ${PROJECT_SOURCE_DIR}/src/core/memory) +file(GLOB CORE_MEMORY_SRC_FILES ${CORE_MEMORY_DIR}/*.cpp) + +set(CORE_SESSION_DIR ${PROJECT_SOURCE_DIR}/src/core/session) +file(GLOB CORE_SESSION_SRC_FILES ${CORE_SESSION_DIR}/session.cpp) +file(GLOB CORE_FILTER_SRC_FILES ${CORE_SESSION_DIR}/filter.cpp) +file(GLOB CORE_DEVICE_PROFILING_SRC_FILES ${CORE_SESSION_DIR}/device_profiling.cpp) + +set(CORE_HW_DIR ${PROJECT_SOURCE_DIR}/src/core/hardware) +file(GLOB CORE_HW_SRC_FILES ${CORE_HW_DIR}/hsa_info.cpp) + +set(CORE_HW_DIR ${PROJECT_SOURCE_DIR}/src/core/hardware) +file(GLOB CORE_HW_SRC_FILES ${CORE_HW_DIR}/hsa_info.cpp) + +set(CORE_UTILS_DIR ${PROJECT_SOURCE_DIR}/src/utils) +file(GLOB CORE_UTILS_SRC_FILES ${CORE_UTILS_DIR}/*.cpp) + +set(CORE_HSA_PACKETS_DIR ${PROJECT_SOURCE_DIR}/src/core/hsa/packets) +file(GLOB CORE_HSA_PACKETS_SRC_FILES ${CORE_HSA_PACKETS_DIR}/packets_generator.cpp) + +file(GLOB CORE_COUNTERS_SRC_FILES ${PROJECT_BINARY_DIR}/src/api/*_counter.cpp) + +file(GLOB ROCPROFILER_SRC_API_FILES ${PROJECT_SOURCE_DIR}/src/api/*.cpp) + +file(GLOB ROCPROFILER_SRC_PROFILER_FILES ${PROJECT_SOURCE_DIR}/src/core/session/profiler/profiler.cpp) +file(GLOB ROCPROFILER_TRACER_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/tracer/*.cpp) +file(GLOB ROCPROFILER_ROCTRACER_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/tracer/src/*.cpp) +file(GLOB ROCPROFILER_ATT_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/att/att.cpp) +file(GLOB ROCPROFILER_SRC_CLASS_FILES ${CMAKE_CURRENT_SOURCE_DIR}/rocmtool.cpp) +file(GLOB ROCPROFILER_SPM_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/spm/spm.cpp) + 
+set(ROCPROFILER_SRC_FILES ${ROCPROFILER_SRC_API_FILES} ${ROCPROFILER_SRC_CLASS_FILES} ${ROCPROFILER_SRC_PROFILER_FILES} ${ROCPROFILER_ATT_SRC_FILES}) + +set(CORE_HSA_DIR ${PROJECT_SOURCE_DIR}/src/core/hsa) +file(GLOB CORE_HSA_SRC_FILES ${CORE_HSA_DIR}/*.cpp) + +set(CORE_HSA_QUEUES_DIR ${PROJECT_SOURCE_DIR}/src/core/hsa/queues) +file(GLOB CORE_HSA_QUEUES_SRC_FILES ${CORE_HSA_QUEUES_DIR}/*.cpp) + +file(GLOB CORE_COUNTERS_PARENT_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/counters/*.cpp) +file(GLOB CORE_COUNTERS_METRICS_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/counters/metrics/*.cpp) + +set(CORE_PC_SAMPLING_DIR ${PROJECT_SOURCE_DIR}/src/pcsampler) +file(GLOB CORE_PC_SAMPLING_FILES ${CORE_PC_SAMPLING_DIR}/core/*.cpp ${CORE_PC_SAMPLING_DIR}/gfxip/*.cpp ${CORE_PC_SAMPLING_DIR}/session/*.cpp) + +# Compiling gtests +add_executable(runCoreUnitTests ${CMAKE_CURRENT_SOURCE_DIR}/gtests_main.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/session/session_gtest.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/memory/memory_gtest.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/hardware/hsa_info_gtest.cpp + ${CORE_MEMORY_SRC_FILES} + ${CORE_SESSION_SRC_FILES} + ${CORE_FILTER_SRC_FILES} + ${CORE_DEVICE_PROFILING_SRC_FILES} + ${CORE_HW_SRC_FILES} + ${CORE_UTILS_SRC_FILES} + ${ROCPROFILER_SRC_FILES} + ${CORE_HSA_SRC_FILES} + ${ROCPROFILER_SPM_SRC_FILES} + ${CORE_HSA_PACKETS_SRC_FILES} + ${CORE_COUNTERS_SRC_FILES} + ${CORE_HSA_QUEUES_SRC_FILES} + ${ROCPROFILER_TRACER_SRC_FILES} + ${ROCPROFILER_ROCTRACER_SRC_FILES} + ${CORE_COUNTERS_METRICS_SRC_FILES} + ${CORE_COUNTERS_PARENT_SRC_FILES} + ${CORE_PC_SAMPLING_FILES} + ${OLD_LIB_SRC}) + +target_include_directories(runCoreUnitTests PRIVATE ${PROJECT_SOURCE_DIR} + ${LIB_DIR} ${ROOT_DIR} + ${PROJECT_SOURCE_DIR}/src + ${PROJECT_SOURCE_DIR}/inc + ${PROJECT_SOURCE_DIR}/tests/unittests/profiler + ${PROJECT_BINARY_DIR} + ${PROJECT_BINARY_DIR}/rocprofiler) + +target_compile_definitions(runCoreUnitTests + PUBLIC AMD_INTERNAL_BUILD + PRIVATE PROF_API_IMPL HIP_PROF_HIP_API_STRING=1 
__HIP_PLATFORM_AMD__=1) + +# Link test executable against gtest & gtest_main +target_link_libraries(runCoreUnitTests PRIVATE ${ROCPROFILER_TARGET} ${AQLPROFILE_LIB} + hsa-runtime64::hsa-runtime64 c stdc++ + GTest::gtest GTest::gtest_main stdc++fs dl ${PCIACCESS_LIBRARIES}) + +add_dependencies(tests runCoreUnitTests) +install(TARGETS runCoreUnitTests RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/tests COMPONENT tests) +add_test(AllTests runCoreUnitTests) diff --git a/tests/unittests/core/gtests_main.cpp b/tests/unittests/core/gtests_main.cpp new file mode 100644 index 00000000..c0ec8012 --- /dev/null +++ b/tests/unittests/core/gtests_main.cpp @@ -0,0 +1,9 @@ +#include + +// Entry Point for Gtests Infra + +int main(int argc, char **argv) { + testing::InitGoogleTest(&argc, argv); + testing::FLAGS_gtest_death_test_style = "threadsafe"; + return RUN_ALL_TESTS(); +} diff --git a/tests/unittests/core/hardware/hsa_info_gtest.cpp b/tests/unittests/core/hardware/hsa_info_gtest.cpp new file mode 100644 index 00000000..218ec005 --- /dev/null +++ b/tests/unittests/core/hardware/hsa_info_gtest.cpp @@ -0,0 +1,39 @@ +/* Copyright (c) 2022 Advanced Micro Devices, Inc. + + Permission is hereby granted, free of charge, to any person obtaining a copy + of this software and associated documentation files (the "Software"), to deal + in the Software without restriction, including without limitation the rights + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + copies of the Software, and to permit persons to whom the Software is + furnished to do so, subject to the following conditions: + + The above copyright notice and this permission notice shall be included in + all copies or substantial portions of the Software. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + THE SOFTWARE. */ + +#include + +#include +#include "core/hardware/hsa_info.h" + +TEST(WhenTestingAgentInfoGetterSetters, TestRunsSuccessfully) { + Agent::AgentInfo agent_info = Agent::AgentInfo(); + char gpu_name[] = "gfx10"; + agent_info.setName(gpu_name); + agent_info.setIndex(0); + agent_info.setType(hsa_device_type_t::HSA_DEVICE_TYPE_GPU); + + EXPECT_EQ(agent_info.getName(), gpu_name); + EXPECT_EQ(agent_info.getIndex(), 0); + EXPECT_EQ(agent_info.getType(), hsa_device_type_t::HSA_DEVICE_TYPE_GPU); + + Agent::CounterHardwareInfo hw_info(0, "GRBM"); + EXPECT_TRUE(getHardwareInfo(0, "GRBM", &hw_info)); +} diff --git a/tests/unittests/core/memory/memory_gtest.cpp b/tests/unittests/core/memory/memory_gtest.cpp new file mode 100644 index 00000000..8ac0e83b --- /dev/null +++ b/tests/unittests/core/memory/memory_gtest.cpp @@ -0,0 +1,55 @@ +/* Copyright (c) 2022 Advanced Micro Devices, Inc. + + Permission is hereby granted, free of charge, to any person obtaining a copy + of this software and associated documentation files (the "Software"), to deal + in the Software without restriction, including without limitation the rights + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + copies of the Software, and to permit persons to whom the Software is + furnished to do so, subject to the following conditions: + + The above copyright notice and this permission notice shall be included in + all copies or substantial portions of the Software. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
 IN NO EVENT SHALL THE + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + THE SOFTWARE. */ + +#include + +#include +#include + +#include "core/memory/generic_buffer.h" + +void buffer_callback_fun(const rocprofiler_record_header_t* begin, + const rocprofiler_record_header_t* end, rocprofiler_session_id_t session_id, + rocprofiler_buffer_id_t buffer_id) { + std::cout << "buffer callback" << std::endl; +} +// A lot has changed in the class since this test was written. +// All of these test cases need to be rewritten. +TEST(WhenAddingARecordToBuffer, DISABLED_RecordGetsAddedSuccefully) { + Memory::GenericBuffer* buffer = new Memory::GenericBuffer( + rocprofiler_session_id_t{0}, rocprofiler_buffer_id_t{0}, 0x8000, buffer_callback_fun); + + uint64_t start_time = 0; + uint64_t end_time = 10; + + uint64_t kernel_object = 123456789; + uint64_t gpu_name_descriptor = 1234565789; + rocprofiler_record_profiler_t record = rocprofiler_record_profiler_t{ + rocprofiler_record_header_t{ROCPROFILER_PROFILER_RECORD, rocprofiler_record_id_t{0}}, + rocprofiler_kernel_id_t{kernel_object}, + rocprofiler_agent_id_t{gpu_name_descriptor}, + rocprofiler_queue_id_t{0}, + rocprofiler_record_header_timestamp_t{start_time, end_time}, + nullptr, + 0}; + + EXPECT_TRUE(buffer->AddRecord(record)); + delete buffer; +} \ No newline at end of file diff --git a/tests/unittests/core/session/session_gtest.cpp b/tests/unittests/core/session/session_gtest.cpp new file mode 100644 index 00000000..00de6622 --- /dev/null +++ b/tests/unittests/core/session/session_gtest.cpp @@ -0,0 +1,206 @@ +/* Copyright (c) 2022 Advanced Micro Devices, Inc.
+ + Permission is hereby granted, free of charge, to any person obtaining a copy + of this software and associated documentation files (the "Software"), to deal + in the Software without restriction, including without limitation the rights + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + copies of the Software, and to permit persons to whom the Software is + furnished to do so, subject to the following conditions: + + The above copyright notice and this permission notice shall be included in + all copies or substantial portions of the Software. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + THE SOFTWARE. 
 */ + +#include + +#include +#include + +#include "core/memory/generic_buffer.h" +#include "core/session/session.h" + +void (*buffer_callback_fun)(const rocprofiler_record_header_t* begin, + const rocprofiler_record_header_t* end, rocprofiler_session_id_t session_id, + rocprofiler_buffer_id_t buffer_id); + +/** + * @brief This class creates a single timestamp session + * + */ + +class TimeStampSession : public ::testing::Test { + protected: + rocprofiler_session_id_t session_id{1234}; + std::unique_ptr session_ptr = + std::make_unique(ROCPROFILER_NONE_REPLAY_MODE, session_id); + + void SetUp() { + rocprofiler_filter_id_t filter_id = + session_ptr->CreateFilter(ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION, + rocprofiler_filter_data_t{}, 0, rocprofiler_filter_property_t{}); + rocprofiler_buffer_id_t buffer_id = session_ptr->CreateBuffer(buffer_callback_fun, 0x9999); + session_ptr->GetFilter(filter_id)->SetBufferId(buffer_id); + session_ptr->Start(); + } + void TearDown() { session_ptr->Terminate(); } +}; + +TEST_F(TimeStampSession, NewlyActivatedSessionIsActive) { + // check if session is active + EXPECT_TRUE(session_ptr->IsActive()); +} + +TEST_F(TimeStampSession, DeactivatingNewlyCreatedSessionPasses) { + // check if session is active + EXPECT_TRUE(session_ptr->IsActive()); + + // Deactivate session + session_ptr->Terminate(); + + // check if session is inactive + EXPECT_FALSE(session_ptr->IsActive()); +} + +TEST_F(TimeStampSession, DeactivatingAnActivatedSessionPasses) { + // activate the session + session_ptr->Start(); + + // check if session is active + EXPECT_TRUE(session_ptr->IsActive()); + + // deactivate the session + session_ptr->Terminate(); + + // check if session is inactive + EXPECT_FALSE(session_ptr->IsActive()); +} + +TEST_F(TimeStampSession, ForANewlyCreatedSessionValidSessionIdIsReturned) { + // get session id + rocprofiler_session_id_t session_id = session_ptr->GetId(); + + // check for the valid id + 
EXPECT_EQ(1234, session_id.handle); +} + +/** + *
 @brief This class creates multiple time stamp sessions + * + */ +class TestingMultipleSessions : public ::testing::Test { + protected: + std::vector> session_list; + uint64_t number_of_sessions = 5; + void SetUp() { + for (uint64_t id = 0; id < number_of_sessions; id++) { + std::unique_ptr timestamp_session = std::make_unique( + ROCPROFILER_NONE_REPLAY_MODE, rocprofiler_session_id_t{id}); + + rocprofiler_filter_id_t filter_id = timestamp_session->CreateFilter( + ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION, rocprofiler_filter_data_t{}, 0, + rocprofiler_filter_property_t{}); + rocprofiler_buffer_id_t buffer_id = + timestamp_session->CreateBuffer(buffer_callback_fun, 0x9999); + timestamp_session->GetFilter(filter_id)->SetBufferId(buffer_id); + timestamp_session->Start(); + session_list.push_back(std::move(timestamp_session)); + } + } + void TearDown() { + for (uint64_t id = 0; id < number_of_sessions; id++) { + session_list[id]->Terminate(); + } + } +}; + +TEST_F(TestingMultipleSessions, AllSessionsAreCreatedSuccessfully) { + // check if sessions are active + for (uint64_t id = 0; id < number_of_sessions; id++) { + EXPECT_TRUE(session_list[id]->IsActive()); + } +} + +TEST_F(TestingMultipleSessions, AllSessionsAreActivatedSuccessfully) { + // Activate all sessions + for (uint64_t id = 0; id < number_of_sessions; id++) { + session_list[id]->Start(); + } + // Check if sessions are activated + for (uint64_t id = 0; id < number_of_sessions; id++) { + EXPECT_TRUE(session_list[id]->IsActive()); + } +} + +TEST_F(TestingMultipleSessions, DeactivatingAnActivatedSessionPasses) { + // Activate all sessions + for (uint64_t id = 0; id < number_of_sessions; id++) { + session_list[id]->Start(); + } + + // Check if sessions are activated + for (uint64_t id = 0; id < number_of_sessions; id++) { + EXPECT_TRUE(session_list[id]->IsActive()); + } + + // deactivate the sessions + for (uint64_t id = 0; id < number_of_sessions; id++) { + 
session_list[id]->Terminate(); + } + + // check if all
 sessions are deactivated + for (uint64_t id = 0; id < number_of_sessions; id++) { + EXPECT_FALSE(session_list[id]->IsActive()); + } +} + +// Creating sessions with two different profiling modes +TEST(WhenCreatingTwoSessionsWithDiffProfilingMode, BothSessionsAreCreated) { + std::map> sessions; + + sessions = std::map>(); + + { + // create a counter collection session + rocprofiler_session_id_t session_id{1}; + std::vector counters; + counters.emplace_back("SQ_WAVES"); + counters.emplace_back("GRBM_COUNT"); + sessions.emplace(session_id.handle, + std::make_unique(ROCPROFILER_NONE_REPLAY_MODE, session_id)); + + rocprofiler_filter_id_t filter_id = + sessions.at(session_id.handle) + ->CreateFilter(ROCPROFILER_COUNTERS_COLLECTION, + rocprofiler_filter_data_t{.counters_names = &counters[0]}, counters.size(), + rocprofiler_filter_property_t{}); + rocprofiler_buffer_id_t buffer_id = + sessions.at(session_id.handle)->CreateBuffer(buffer_callback_fun, 0x9999); + sessions.at(session_id.handle)->GetFilter(filter_id)->SetBufferId(buffer_id); + } + { + // create a timestamp collection session + rocprofiler_session_id_t session_id{2}; + sessions.emplace(session_id.handle, + std::make_unique(ROCPROFILER_NONE_REPLAY_MODE, session_id)); + rocprofiler_filter_id_t filter_id = + sessions.at(session_id.handle) + ->CreateFilter(ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION, rocprofiler_filter_data_t{}, 0, + rocprofiler_filter_property_t{}); + rocprofiler_buffer_id_t buffer_id = + sessions.at(session_id.handle)->CreateBuffer(buffer_callback_fun, 0x9999); + sessions.at(session_id.handle)->GetFilter(filter_id)->SetBufferId(buffer_id); + } + + // check for correct profiling mode + EXPECT_TRUE(sessions.at(1)->FindFilterWithKind(ROCPROFILER_COUNTERS_COLLECTION)); + EXPECT_TRUE(sessions.at(2)->FindFilterWithKind(ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION)); + + sessions.clear(); +} \ No newline at end of file diff --git a/tests/unittests/profiler/CMakeLists.txt
b/tests/unittests/profiler/CMakeLists.txt new file mode 100644 index 00000000..7e08ace6 --- /dev/null +++ b/tests/unittests/profiler/CMakeLists.txt @@ -0,0 +1,94 @@ +# Setup unit testing env + +find_library(PCIACCESS_LIBRARIES pciaccess REQUIRED) + +enable_testing() +find_package(GTest REQUIRED) + +# Getting Source files for ROCProfiler, Hardware, HSA, Memory, Session, Counters, Utils +set(CORE_MEMORY_DIR ${PROJECT_SOURCE_DIR}/src/core/memory) +file(GLOB CORE_MEMORY_SRC_FILES ${CORE_MEMORY_DIR}/*.cpp) + +set(CORE_SESSION_DIR ${PROJECT_SOURCE_DIR}/src/core/session) +file(GLOB CORE_SESSION_SRC_FILES ${CORE_SESSION_DIR}/session.cpp) +file(GLOB CORE_FILTER_SRC_FILES ${CORE_SESSION_DIR}/filter.cpp) +file(GLOB CORE_DEVICE_PROFILING_SRC_FILES ${CORE_SESSION_DIR}/device_profiling.cpp) + +set(CORE_HW_DIR ${PROJECT_SOURCE_DIR}/src/core/hardware) +file(GLOB CORE_HW_SRC_FILES ${CORE_HW_DIR}/hsa_info.cpp) + +set(CORE_HW_DIR ${PROJECT_SOURCE_DIR}/src/core/hardware) +file(GLOB CORE_HW_SRC_FILES ${CORE_HW_DIR}/hsa_info.cpp) + +set(CORE_UTILS_DIR ${PROJECT_SOURCE_DIR}/src/utils) +file(GLOB CORE_UTILS_SRC_FILES ${CORE_UTILS_DIR}/*.cpp) + +set(CORE_HSA_PACKETS_DIR ${PROJECT_SOURCE_DIR}/src/core/hsa/packets) +file(GLOB CORE_HSA_PACKETS_SRC_FILES ${CORE_HSA_PACKETS_DIR}/packets_generator.cpp) + +file(GLOB CORE_COUNTERS_SRC_FILES ${PROJECT_BINARY_DIR}/src/api/*_counter.cpp) + +file(GLOB ROCPROFILER_SRC_PROFILER_FILES ${PROJECT_SOURCE_DIR}/src/core/session/profiler/profiler.cpp) +file(GLOB ROCPROFILER_TRACER_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/tracer/*.cpp) +file(GLOB ROCPROFILER_ROCTRACER_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/tracer/src/*.cpp) +file(GLOB ROCPROFILER_ATT_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/att/att.cpp) +file(GLOB ROCPROFILER_SRC_CLASS_FILES ${CMAKE_CURRENT_SOURCE_DIR}/rocmtool.cpp) +file(GLOB ROCPROFILER_SPM_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/spm/spm.cpp) +file(GLOB ROCPROFILER_SRC_API_FILES 
${PROJECT_SOURCE_DIR}/src/api/*.cpp) + +set(ROCPROFILER_SRC_FILES ${ROCPROFILER_SRC_API_FILES} ${ROCPROFILER_SRC_CLASS_FILES} ${ROCPROFILER_SRC_PROFILER_FILES} ${ROCPROFILER_ATT_SRC_FILES}) + +set(CORE_HSA_DIR ${PROJECT_SOURCE_DIR}/src/core/hsa) +file(GLOB CORE_HSA_SRC_FILES ${CORE_HSA_DIR}/*.cpp) + +set(CORE_HSA_QUEUES_DIR ${PROJECT_SOURCE_DIR}/src/core/hsa/queues) +file(GLOB CORE_HSA_QUEUES_SRC_FILES ${CORE_HSA_QUEUES_DIR}/*.cpp) + +set(CORE_PC_SAMPLING_DIR ${PROJECT_SOURCE_DIR}/src/pcsampler) +file(GLOB CORE_PC_SAMPLING_FILES ${CORE_PC_SAMPLING_DIR}/core/*.cpp ${CORE_PC_SAMPLING_DIR}/gfxip/*.cpp ${CORE_PC_SAMPLING_DIR}/session/*.cpp) + +# Compiling gtests +file(GLOB ROCPROFILER_TOOL_SRC_FILES ${PROJECT_SOURCE_DIR}/src/rocmtools/tools/tool.cpp) + +file(GLOB CORE_COUNTERS_PARENT_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/counters/*.cpp) +file(GLOB CORE_COUNTERS_METRICS_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/counters/metrics/*.cpp) + +add_executable(runProfilerUnitTests ${CMAKE_CURRENT_SOURCE_DIR}/tools/tool_gtest.cpp + ${CMAKE_CURRENT_SOURCE_DIR}/api/rocmtool_gtest.cpp + ${CORE_MEMORY_SRC_FILES} + ${CORE_SESSION_SRC_FILES} + ${CORE_FILTER_SRC_FILES} + ${CORE_DEVICE_PROFILING_SRC_FILES} + ${CORE_HW_SRC_FILES} + ${CORE_UTILS_SRC_FILES} + ${ROCPROFILER_SPM_SRC_FILES} + ${ROCPROFILER_SRC_FILES} + ${CORE_HSA_SRC_FILES} + ${CORE_HSA_PACKETS_SRC_FILES} + ${CORE_COUNTERS_SRC_FILES} + ${CORE_HSA_QUEUES_SRC_FILES} + ${ROCPROFILER_TRACER_SRC_FILES} + ${ROCPROFILER_ROCTRACER_SRC_FILES} + ${CORE_COUNTERS_METRICS_SRC_FILES} + ${CORE_COUNTERS_PARENT_SRC_FILES} + ${CORE_PC_SAMPLING_FILES}) + +target_include_directories(runProfilerUnitTests PRIVATE ${PROJECT_SOURCE_DIR} + ${PROJECT_SOURCE_DIR}/src + ${PROJECT_SOURCE_DIR}/inc + ${CMAKE_CURRENT_SOURCE_DIR} + ${PROJECT_BINARY_DIR} + ${PROJECT_BINARY_DIR}/rocprofiler) + +target_compile_definitions(runProfilerUnitTests + PUBLIC AMD_INTERNAL_BUILD + PRIVATE PROF_API_IMPL HIP_PROF_HIP_API_STRING=1 __HIP_PLATFORM_AMD__=1) + 
+target_link_libraries(runProfilerUnitTests PRIVATE rocprofiler_tool ${AQLPROFILE_LIB} + hsa-runtime64::hsa-runtime64 + GTest::gtest GTest::gtest_main stdc++fs + ${PCIACCESS_LIBRARIES}) + +add_dependencies(tests runProfilerUnitTests) +install(TARGETS runProfilerUnitTests RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/tests COMPONENT tests) +add_test(AllTests runProfilerUnitTests) diff --git a/tests/unittests/profiler/api/rocmtool_gtest.cpp b/tests/unittests/profiler/api/rocmtool_gtest.cpp new file mode 100644 index 00000000..881c57b8 --- /dev/null +++ b/tests/unittests/profiler/api/rocmtool_gtest.cpp @@ -0,0 +1,91 @@ +/* Copyright (c) 2022 Advanced Micro Devices, Inc. + + Permission is hereby granted, free of charge, to any person obtaining a copy + of this software and associated documentation files (the "Software"), to deal + in the Software without restriction, including without limitation the rights + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + copies of the Software, and to permit persons to whom the Software is + furnished to do so, subject to the following conditions: + + The above copyright notice and this permission notice shall be included in + all copies or substantial portions of the Software. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + THE SOFTWARE. 
*/
+
+#include
+
+#include
+
+#include "core/session/session.h"
+#include "api/rocmtool.h"
+
+void (*callback_fun)(const rocprofiler_record_header_t* begin, const rocprofiler_record_header_t* end,
+                     rocprofiler_session_id_t session_id, rocprofiler_buffer_id_t buffer_id);
+
+TEST(WhenTestingCounterCollectionMode, TestSucceeds) {
+  rocprofiler_session_id_t session_id;
+
+  rocmtools::rocmtool toolobj;
+  session_id = toolobj.CreateSession(ROCPROFILER_NONE_REPLAY_MODE);
+  rocprofiler_filter_id_t filter_id =
+      toolobj.GetSession(session_id)
+          ->CreateFilter(ROCPROFILER_COUNTERS_COLLECTION, rocprofiler_filter_data_t{}, 0,
+                         rocprofiler_filter_property_t{});
+  rocprofiler_buffer_id_t buffer_id =
+      toolobj.GetSession(session_id)->CreateBuffer(callback_fun, 0x9999);
+  toolobj.GetSession(session_id)->GetFilter(filter_id)->SetBufferId(buffer_id);
+
+
+  rocmtools::Session* session = toolobj.GetSession(session_id);
+  EXPECT_TRUE(session->FindFilterWithKind(ROCPROFILER_COUNTERS_COLLECTION));
+  toolobj.DestroySession(session_id);
+}
+
+TEST(WhenTestingTimeStampCollectionMode, TestSucceeds) {
+  rocprofiler_session_id_t session_id;
+
+  rocmtools::rocmtool toolobj;
+  session_id = toolobj.CreateSession(ROCPROFILER_NONE_REPLAY_MODE);
+  rocprofiler_filter_id_t filter_id =
+      toolobj.GetSession(session_id)
+          ->CreateFilter(ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION, rocprofiler_filter_data_t{}, 0,
+                         rocprofiler_filter_property_t{});
+  rocprofiler_buffer_id_t buffer_id =
+      toolobj.GetSession(session_id)->CreateBuffer(callback_fun, 0x9999);
+  toolobj.GetSession(session_id)->GetFilter(filter_id)->SetBufferId(buffer_id);
+
+
+  rocmtools::Session* session = toolobj.GetSession(session_id);
+
+  EXPECT_TRUE(session->FindFilterWithKind(ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION));
+  toolobj.DestroySession(session_id);
+}
+
+TEST(WhenTestingApplicationReplayMode, TestSucceeds) {
+  std::vector counters;
+  counters.emplace_back("SQ_WAVES");
+  rocprofiler_session_id_t session_id;
+  rocmtools::rocmtool toolobj;
+  session_id = toolobj.CreateSession(ROCPROFILER_APPLICATION_REPLAY_MODE);
+
+  rocprofiler_filter_id_t filter_id =
+      toolobj.GetSession(session_id)
+          ->CreateFilter(ROCPROFILER_COUNTERS_COLLECTION,
+                         rocprofiler_filter_data_t{.counters_names = &counters[0]}, counters.size(),
+                         rocprofiler_filter_property_t{});
+  rocprofiler_buffer_id_t buffer_id =
+      toolobj.GetSession(session_id)->CreateBuffer(callback_fun, 0x8000);
+  toolobj.GetSession(session_id)->GetFilter(filter_id)->SetBufferId(buffer_id);
+
+  rocmtools::Session* session = toolobj.GetSession(session_id);
+
+  EXPECT_TRUE(session->FindFilterWithKind(ROCPROFILER_COUNTERS_COLLECTION));
+  toolobj.DestroySession(session_id);
+}
\ No newline at end of file
diff --git a/tests/unittests/profiler/tools/amdsys.cpp b/tests/unittests/profiler/tools/amdsys.cpp
new file mode 100644
index 00000000..ca9abf51
--- /dev/null
+++ b/tests/unittests/profiler/tools/amdsys.cpp
@@ -0,0 +1,265 @@
+// TODO(aelwazir): To be checked
+
+#include "hip/hip_runtime.h"
+
+#include
+#include
+#include
+#include
+
+
+#define N 2560
+//change here to run this app longer
+#define num_iters 1
+
+
+template
+__global__ void kernel(double* x) {
+  for (int idx = threadIdx.x + blockIdx.x * blockDim.x; idx < N; idx += gridDim.x * blockDim.x)
+  {
+    #pragma unroll
+    for (int i = 0; i < n; ++i)
+      x[idx] += i * m;
+  }
+}
+
+void cpuWork() {
+  // Do some CPU "work".
+  usleep(1000);
+}
+
+inline void hip_assert(hipError_t err, const char *file, int line)
+{
+  if (err != hipSuccess)
+  {
+    fprintf(stderr,"HIP error: %s %s %d\n", hipGetErrorString(err), file, line);
+    exit(-1);
+  }
+}
+
+#define hipErrorCheck(f) { hip_assert((f), __FILE__, __LINE__); }
+#define kernelErrorCheck() { hipErrorCheck(hipPeekAtLastError()); }
+
+int main() {
+
+  double* x;
+  double* x_h;
+
+  size_t sz = N * sizeof(double);
+  std::cout << "running app....." << std::endl;
+  hipErrorCheck(hipHostMalloc(&x_h, sz));
+
+  memset(x_h, 0, sz);
+  hipErrorCheck(hipMallocManaged(&x, sz));
+  hipErrorCheck(hipMemset(x, 0, sz));
+
+  hipStream_t stream;
+  hipErrorCheck(hipStreamCreate(&stream));
+
+  hipFuncAttributes attr;
+
+  int blocks = 80;
+  int threads = 32;
+  int fact = 100;
+  for (int j = 0; j < num_iters; ++j) {
+    for (int n = 0; n < 25*fact; ++n) {
+      hipErrorCheck(hipMemcpyAsync(x, x_h, sz, hipMemcpyHostToDevice));
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,1>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,2>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,3>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,4>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,5>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,6>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,7>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,8>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,9>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,10>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,11>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,12>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,13>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,14>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,15>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,16>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,17>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,18>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,19>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,20>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,20>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,21>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,22>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,23>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,24>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,25>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,26>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,27>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,28>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,29>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,30>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,30>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,31>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,32>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,33>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,34>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,35>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,36>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,37>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,38>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,39>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<1,40>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipErrorCheck(hipMemcpyAsync(x_h, x, sz, hipMemcpyDeviceToHost));
+      hipErrorCheck(hipDeviceSynchronize());
+    }
+
+    hipErrorCheck(hipMemset(x, 0, sz));
+    cpuWork();
+
+    for (int n = 0; n < 200*fact; ++n) {
+      hipErrorCheck(hipFuncGetAttributes(&attr, reinterpret_cast(kernel<10,1>)));
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<10,1>), dim3(blocks), dim3(threads), 0, stream, x);
+      kernelErrorCheck();
+      hipErrorCheck(hipStreamSynchronize(stream));
+    }
+
+    hipErrorCheck(hipMemset(x, 0, sz));
+    cpuWork();
+
+    for (int n = 0; n < 30*fact; ++n) {
+      for (int k = 0; k < 7; ++k) {
+        hipErrorCheck(hipFuncGetAttributes(&attr, reinterpret_cast(kernel<8,1>)));
+        hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<8,1>), dim3(blocks), dim3(threads), 0, 0, x);
+        kernelErrorCheck();
+      }
+      hipErrorCheck(hipDeviceSynchronize());
+    }
+
+    hipErrorCheck(hipMemset(x, 0, sz));
+    cpuWork();
+
+    for (int n = 0; n < 100*fact; ++n) {
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,1>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,2>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,3>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,4>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,5>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipErrorCheck(hipDeviceSynchronize());
+    }
+
+    hipErrorCheck(hipMemset(x, 0, sz));
+    cpuWork();
+
+    for (int n = 0; n < 100*fact; ++n) {
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,1>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,2>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,3>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,4>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<7,5>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipErrorCheck(hipDeviceSynchronize());
+    }
+
+    hipErrorCheck(hipMemset(x, 0, sz));
+    cpuWork();
+
+    for (int n = 0; n < 50*fact; ++n) {
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,1>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,2>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,3>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,4>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,5>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,6>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,7>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<6,8>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipErrorCheck(hipDeviceSynchronize());
+    }
+
+    hipErrorCheck(hipMemset(x, 0, sz));
+    cpuWork();
+
+    for (int n = 0; n < 50*fact; ++n) {
+      int val;
+      hipErrorCheck(hipDeviceGetAttribute(&val, hipDeviceAttributeMaxThreadsPerBlock, 0));
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<4000,1>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipErrorCheck(hipDeviceSynchronize());
+    }
+
+    hipErrorCheck(hipMemset(x, 0, sz));
+    cpuWork();
+
+    for (int n = 0; n < 50*fact; ++n) {
+      hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<5000,1>), dim3(blocks), dim3(threads), 0, 0, x);
+      kernelErrorCheck();
+      hipErrorCheck(hipDeviceSynchronize());
+    }
+
+    hipErrorCheck(hipMemset(x, 0, sz));
+    cpuWork();
+
+    hipErrorCheck(hipDeviceSynchronize());
+
+  }
+
+  hipErrorCheck(hipHostFree(x_h));
+  hipErrorCheck(hipFree(x));
+  hipErrorCheck(hipStreamDestroy(stream));
+
+}
\ No newline at end of file
diff --git a/tests/unittests/profiler/tools/tool_gtest.cpp b/tests/unittests/profiler/tools/tool_gtest.cpp
new file mode 100644
index 00000000..e52ac8d2
--- /dev/null
+++ b/tests/unittests/profiler/tools/tool_gtest.cpp
@@ -0,0 +1,37 @@
+/* Copyright (c) 2022 Advanced Micro Devices, Inc.
+
+  Permission is hereby granted, free of charge, to any person obtaining a copy
+  of this software and associated documentation files (the "Software"), to deal
+  in the Software without restriction, including without limitation the rights
+  to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+  copies of the Software, and to permit persons to whom the Software is
+  furnished to do so, subject to the following conditions:
+
+  The above copyright notice and this permission notice shall be included in
+  all copies or substantial portions of the Software.
+
+  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+  IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+  FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+  AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+  LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+  OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+  THE SOFTWARE. */
+
+#include
+
+#include
+
+#include "utils/helper.h"
+
+TEST(WhenTruncatingLongKernelNames, KernelNameGetsTruncatedProperly) {
+  std::string long_kernel_name =
+      "void kernel_7r_3d_pml<32, 8, 4>(long long, long long, long long, int, "
+      "int, int, long long, long long, long long, long long, long long, long "
+      "long, long long, long long, long long, float, float, float, float "
+      "const*, float*, float const*, float*, float const*) [clone .kd]";
+
+  std::string truncated_name = rocmtools::truncate_name(long_kernel_name);
+
+  EXPECT_EQ("kernel_7r_3d_pml", truncated_name);
+}