@edopao edopao commented Jan 16, 2026

This PR does not add any new feature; it only changes how an existing feature, introduced in #2345, is enabled.

COLLECT_METRICS_LEVEL is meant to be a runtime configuration that enables or disables metrics collection. Moreover, the current GPU trace does not produce any metrics.

The NVTX/ROC-TX traces require inserting some calls into the generated code. Therefore, we need a separate configuration variable, checked at lowering/compile time, that lets the user add the GPU TX markers to the generated code.
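
To illustrate the distinction between a runtime metrics switch and a lowering-time flag, here is a minimal sketch. It assumes the flag is read from the `GT4PY_ENABLE_GPU_TRACE` environment variable named in the PR title; `env_flag` and `lower_kernel` are hypothetical helpers made up for this example and are not gt4py's actual configuration or code generation API. The emitted `nvtxRangePushA`/`nvtxRangePop` calls only stand in for whatever markers the real lowering inserts.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Parse a boolean environment variable (hypothetical helper, not gt4py's API)."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def lower_kernel(body: str, *, gpu: bool, enable_gpu_trace: bool) -> str:
    """Toy lowering step: wrap the generated code in range markers when tracing is on.

    The decision is made while the code is generated, so flipping the flag
    afterwards has no effect on code that has already been produced.
    """
    if gpu and enable_gpu_trace:
        return f'nvtxRangePushA("kernel");\n{body}\nnvtxRangePop();'
    return body


# Assumed wiring: the flag is read once and then passed to the lowering step.
ENABLE_GPU_TRACE = env_flag("GT4PY_ENABLE_GPU_TRACE")
generated = lower_kernel("launch_kernel();", gpu=True, enable_gpu_trace=ENABLE_GPU_TRACE)
```

Because the markers end up in the generated source, changing the variable only takes effect for code that is lowered again.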

@iomaganaris iomaganaris left a comment

Thanks for doing this properly 👍

@egparedes egparedes changed the title from "feat[next]: Add config variable GT4PY_ENABLE_GPU_TRACE" to "refactor[next]: Add config variable GT4PY_ENABLE_GPU_TRACE" Jan 16, 2026
)

- if (config.COLLECT_METRICS_LEVEL == metrics.GPU_TX_MARKERS) and gpu:
+ if gpu and config.ENABLE_GPU_TRACE:
@egparedes egparedes Jan 16, 2026

How does this interact with the workflow cache key(s)? I mean, the code generated with traces should have a different key in the cached steps than the code generated without traces, but there doesn't seem to be any state in the code generator tracking this...

Contributor Author

As discussed offline, this is a limitation of the cache in gt4py-next. The same limitation applies to other configuration options such as CMAKE_BUILD_TYPE and UNSTRUCTURED_HORIZONTAL_HAS_UNIT_STRIDE.
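
For context on the concern raised in this thread, one common way to avoid stale cache hits is to fold every code-generation-affecting option into the cache key. The sketch below is only an illustration under that assumption: `cache_key` is a hypothetical function, and nothing here reflects how the gt4py-next workflow cache is actually implemented.

```python
import hashlib
import json


def cache_key(source: str, codegen_options: dict[str, object]) -> str:
    """Hash the program source together with the options that change the generated code.

    Hypothetical helper: the option names are taken from this thread, but the
    keying scheme itself is an assumption made for illustration.
    """
    # sort_keys makes the key independent of the order in which options are listed.
    payload = json.dumps({"source": source, "options": codegen_options}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_plain = cache_key("program", {"ENABLE_GPU_TRACE": False, "CMAKE_BUILD_TYPE": "release"})
key_traced = cache_key("program", {"ENABLE_GPU_TRACE": True, "CMAKE_BUILD_TYPE": "release"})
```

With such a scheme, `key_plain` and `key_traced` differ, so code generated with traces would never be served from an entry built without them.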
