Skip to content

Conversation

@dimaqq
Copy link
Contributor

@dimaqq dimaqq commented Jul 3, 2025

This PR adds manual instrumentation to the apt charm library

  • wraps every subprocess calls in a span
  • wraps potentially large file operations in spans

The rationale for selecting these very points to instrument:

  • what's done to the system is recorded
  • if something even potentially takes a long time, it's recorded.

The charm library now requires opentelemetry-api (any version).

@dimaqq dimaqq marked this pull request as ready for review July 4, 2025 06:37
@dimaqq
Copy link
Contributor Author

dimaqq commented Jul 4, 2025

cc @Perfect5th and @chanchiwai-ray who contributed to the library.

Copy link
Contributor

@tonyandrewmeyer tonyandrewmeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about adding spans around the top-level public functions? (Not the ones in classes, just the functions at the module level.) It feels like that's the equivalent of putting in an __init__ span in an Object style lib -- and that it would be useful to know when these things (adding a package, removing a package, etc) is happening - basically at the point(s) you're entering this lib.

I assume no new tests because it's awkward to do that when this doesn't use Scenario, and we're generally happy to trust that adding tracing spans like this is very low risk.

If I use this with an old version of ops, do the traces just silently stay in memory and then vanish after the hook? Do any warnings or anything show?

@dimaqq
Copy link
Contributor Author

dimaqq commented Jul 4, 2025

What about adding spans around the top-level public functions?

I've decided against that, because these functions lead directly to a subprocess call or file modification.

For a simple enough charm that only needs a single package from apt (hardcoded package name), it should be pretty obvious what happens. If some charm needs a couple of packages, it's still relatively easy to see what package was affected.

Only if some charm needs to handle many packages, or if there's another charm lib that wraps this (and package name is parametrised), would it really help.

I think that in those cases, that caller code should be instrumented instead.

Basically, I'm following the same pattern we've settled on in ops.

Do you buy this argument, @tonyandrewmeyer ?

@tonyandrewmeyer
Copy link
Contributor

Do you buy this argument, @tonyandrewmeyer ?

Sure.

Copy link
Contributor

@james-garner-canonical james-garner-canonical left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned about introducing a new dependency here.

For a charm library that already uses ops and expect a reasonably up-to-date ops version, adding tracing doesn't technically introduce a new dependency (since ops now uses opentelemetry-api).

However, this library previously had zero dependencies outside the stdlib, and didn't use ops at all. So in this case, we are introducing a dependency. This could be problematic, as this is a Charmhub-hosted lib, meaning that users will need to rely on docs / PYDEPS.

We know that charms sometimes pin older versions of ops, and that charmcraft fetch-libs is used to ensure Charmhub-hosted libraries are kept up-to-date. If they're not pinning apt by patch version (and they presumably aren't), this change could cause some friction.

Treating tracing as a soft dependency might be the best of both worlds, importing it if available, and falling back to a lightweight stub implementation otherwise. The downside is the extra setup code, but I think it would be worth it to avoid any breakages.

Perhaps something like:

import contextlib

class _FakeSpan:
    def set_attribute(self, *args):
        return

class _FakeTracer:
    @contextlib.contextmanager
    def start_as_current_span(self, *args):
        yield _FakeSpan()

try:
    import opentelemetry.trace
except ImportError:
    tracer = _FakeTracer()
else:
    tracer = opentelemetry.trace.get_tracer(__name__)

@dimaqq
Copy link
Contributor Author

dimaqq commented Jul 9, 2025

Re: extra dependency and shims.

If we wanted to do that, we'd do so in ops instead.

Recall the general direction is that eventually all charms should be traced.

Additionally, latest version of ops (same major series) should always be safe to use, thus pinning ops to some arbitrary version should not be a thing. If some charm requires that, let them come to us and explain why.

@dimaqq dimaqq merged commit b6bf7ae into canonical:main Jul 16, 2025
8 checks passed
@dimaqq dimaqq deleted the feat-instrument-the-apt-lib branch July 16, 2025 03:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants