OTEP: Allow multiple Resources in an SDK#4665
OTEP: Allow multiple Resources in an SDK#4665jsuereth wants to merge 26 commits intoopen-telemetry:mainfrom
Conversation
| - `Resource` *remains immutable* | ||
| - Building on [OTEP 264`](0264-resource-and-entities.md), identifying attirbutes | ||
| are clearly outlined in Resource going forward, addressing unclear real world usage of Resource attributes ([e,g, identifying attributes in OpAMP](https://github.com/open-telemetry/opamp-spec/blob/main/specification.md#agentdescriptionidentifying_attributes)). | ||
| - SDK will be given an explicit initialization stage where `Resource` is not in a complete state, addressing OpenTelemetry JS concerns around async resource detection. |
There was a problem hiding this comment.
I think this concern is relevant to any case where you need to begin collecting telemetry before an async resource detection is resolved. Even in languages which can block, it may be necessary in situations like lambda functions to begin immediately collecting telemetry without blocking startup in order to prevent adding latency to user applications.
There was a problem hiding this comment.
True - I think it's just IMPOSSIBLE in today's specification where you can't block.
I'll rephrase
| Internally, the entities discovered via resource detection MUST be merged in | ||
| the order provided to create the initial resource. |
There was a problem hiding this comment.
in the order provided does this refer to the order in the detectors list parameter?
There was a problem hiding this comment.
Yes - will clarify
| ### What is the expected impact on Collector components? | ||
|
|
||
| How should standard Collector processors (like the attributes processor, | ||
| resource processor, or filter processor) be updated to interact with Entities |
There was a problem hiding this comment.
Should this also include the transform processor?
There was a problem hiding this comment.
These are just examples, but YES - I think anything which interacts with OTTL is in-scope to be looked at.
|
|
||
| TODO: **In Progress** | ||
|
|
||
| ## Future possibilities |
There was a problem hiding this comment.
For me another future use case would be a way to describe connections between scopes. For instance this instrumentation scope is a continuation of session xyz which is a use case for browser/mobile.
This also applies to cicd where a pipeline task run scope could be a child of the pipeline run scope.
Potentially Having this described in the messages makes it easier for vendors to implement as well as enabling other signals/sigs to leverage it due to the generic approach.
| Going forward, these operations (`Get a Logger`, `Get a Meter`, `Get a Tracer`) | ||
| will be updated to include an additional optional parameter: | ||
|
|
||
| * `entities` (optional): Specifies the `Entity` set to associate with |
There was a problem hiding this comment.
Do I understand it correctly that every time an entity changes (including if only descriptive attributes change) you need to re-obtain a new Meter/Tracer? Or do we allow the entity to mutate and that mutation is reflected in subsequent exporting of data exported via existing Meter/Trace?
There was a problem hiding this comment.
So if the identity of the entitty is different, it's a different entity. You need a different Meter/Trace by identity.
What to do about descriptive attributes is an open question. I'm inclined to not update them right now, as I think this is what the Entity-Signal should be doing going forward, and Entities used in Resource or scope should only be leveraging descriptive attributes that are "unlikely to change, but valuable".
|
|
||
| - Reporting data against "mutable" or "changing" entities, where `Resource` | ||
| is currently defined as an "immutable" identity in the data model. | ||
| - Providing true multi-tenant capabilities, where, e.g. metrics about one tenant |
There was a problem hiding this comment.
I am confused by the usage of word "tenant" here. Do you refer to instrumentation libraries as tenants here?
There was a problem hiding this comment.
No, a tenant would be something beyond that.
E.g. let's look at older Java Application Servers. One AppServer may have multiple Applications.
The instrumentation library may be for "jboss" or "tomcat" or whatever the app server is. The tenant would be the application itself. I'd instantiate one scope for each tenant vs. one for the entire app server in this world.
|
Does this OTEP imply removing EntityRefs from the Resource in proto? |
|
I like this direction. |
| ### Protocol Details | ||
|
|
||
| The OpenTelemetry Protocol (OTLP) will be modified such that | ||
| `InstrumentationScope` will support carrying `Entity` objects. There are two |
There was a problem hiding this comment.
Why does this need to involve a data model change? Couldn't we just emit multiple resources?
There was a problem hiding this comment.
Great question. I'll actually add it to open questions.
Pros of data model change:
- Entity used in Scope can leverage context of Resource (e..g a process in the scope of a host), as entities only have locally-unique ids.
- We have less repeated data in OTLP
- We preserve the Resource -> InstrumentationScope -> Signal model for
{Signal}Provider -> {Signal} -> {operation}we have in our API today.
Cons of data model change:
- Many implementations ignore
scopetoday, and its usage for identity is still relatively new to the data model. This would increase the need for them to interact with scope. - Entity information would be split between resource + scope. When would two metric streams be the same, e.g.
- a Resource with Entity A + B and empty Instrumentation scope
- a Resource with Entity A and Instrumentation Scope with Entity B
I'm not sure I have strong opinion here, but I still lean towards the data model change.
There was a problem hiding this comment.
I'm also concerned about the additional complexity introduced in this OTEP. From the user’s perspective, recreating entities from EntityRefs is already not straightforward. Adding yet another source to retrieve them creates even more overhead for processing entities in the Collector and reconstructing them on the backend. I would therefore favor a solution that does not require adding EntityRef to the Instrumentation Scope.
smith
left a comment
There was a problem hiding this comment.
Just a few typo suggestions. There's a lot here but it's an understandable summary of the proposal.
| - `MeterProvider` MUST treat entity found on InstrumentationScope as identifying, | ||
| an aggregate reported events separately by scope. Note: this is the case in | ||
| the specification today, however many implementations do not yet respect | ||
| InstrumentationScope loose |
There was a problem hiding this comment.
I think something got lost here,
however many implementations do not yet respect InstrumentationScope [and] lose...
|
|
||
| The `ResourceListener` MUST provide the following operations in the SDK: | ||
|
|
||
| * `On Resource Initialize` |
There was a problem hiding this comment.
Why does this one have a space and the other two do not?
There was a problem hiding this comment.
Probably my fat fingers hitting space by accident, will fix :)
No. EntityRef in resource would interact with Entity on InstrumentationScope, and we need to sort out how. |
| - Building on [OTEP 264](0264-resource-and-entities.md), identifying attributes | ||
| are clearly outlined in Resource going forward, addressing unclear real world usage of Resource attributes ([e,g, identifying attributes in OpAMP](https://github.com/open-telemetry/opamp-spec/blob/main/specification.md#agentdescriptionidentifying_attributes)). | ||
| - SDK will be given an explicit initialization stage where `Resource` is not in a complete state, addressing OpenTelemetry JS concerns around async resource detection. | ||
| - `InstrumentationScope` is enhanced with `Entity` similar to `Resource`. |
There was a problem hiding this comment.
If there is an entity on the scope, is there a specific implied relationship to the entities on the resource? In the call Thursday you used “runs-on”, but there could be a more generic way to phrase it that makes it more clear to everyone when you should use a scope attribute? “Nested-within” is sort of what i’m going for, but i can’t think of a good generic relationship.
There was a problem hiding this comment.
I agree there's some term here. "runs-on", "in-context-of" or "nested-within" are all options. Yes - there's some kind of implicit relationship here.
|
|
||
| Should we consider this a failure or a feature? | ||
|
|
||
| For simplicity in modeling, we plan to prototype where this is *disallowed* |
There was a problem hiding this comment.
If an instrumentation creates a scope with an entity, how do they know if there is already that entity on the resource? It may be impossible for the instrumentation author to know if they are following this restriction or not.
For example, an instrumentation which monitors memory usage wants to ensure there is a process entity to report against and calls meter.for(processEntity) but the top-level resource detector may or may not have already added a process entity.
There was a problem hiding this comment.
That's the open question.
My naive initial proposal is - that would turn into a failure. We basically have two options:
- IT's a failure, we consider that use case problematic
- We create a "smart"
meter.for(entity), where any conflicts creates a new "identity" (combination of resource + scope) such that the conflicting higher-level is ignored for the lower-level. (So with Resource + Scope, the Scope entity overrides Resource, if we decide to flatten resource as @dashpole suggests, the Meter-level entity would override the global entity when reportingResource.
I'm actually fine with either, caveat if we go with (2) I'm leaning against having Entity on Scope and instead have a 'layered' Resource in the SDK.
dyladan
left a comment
There was a problem hiding this comment.
I've been thinking about the meeting yesterday and I want to write down my understanding of what we discussed and how I'm currently thinking about it. Below in no particular order are some of the discussion points we hit.
No entity on scope
Instead of adding entity to InstrumentationScope, we discussed adding it directly to the core resource as a "layer" and reporting multiple resources with their associated telemetry. One possibility for lifetime management is by creating new providers and using the existing shutdown methods.
// this reports metrics against the "core" resource
const meterProvider = new api.metrics.MeterProvider();
// Scope a provider to a session entity
// This "layers" a new resource on top of the core resource using the existing merge rules
const sessionEntity = getEntityForSession();
const sessionMeterProvider = meterProvider.for(sessionEntity);
// TODO: can providers be arbitrarily layered? e.g
// const pageViewMeterProvider = sessionMeterProvider.for(getPageViewEntity());
// Use the scoped provider to report metrics against the session entity
sessionMeterProvider.getMeter('example-meter').createCounter('example_counter').add(1);
// use the core provider to report metrics which are not specific to the session
meterProvider.getMeter('example-meter').createCounter('example_counter').add(1);Does the provider need to listen to lifetime events?
The current proposal assumes the *Provider receives the EntityInitialized event and does something. It's not clear to me what that something is intended to be. The current specification requires processors to be called synchronously with API method invocations. For example SpanProcessor.onStart() is invoked synchronously with tracer.StartSpan(). Because of this, any spans started before the resource is resolved still need to have some sort of resource attached.
In JS this is solved by allowing individual attributes of the resource to be a Promise<AttributeValue>. It is incumbent upon the processor to ensure all resource attributes are resolved before first export (by awaiting resource.waitForAsyncAttributes(). Accessing attributes before async resource is resolved results in a logged warning. Resources are merged together in the order their detectors are configured. One drawback of the current strategy is that you have to know all possible attribute keys in advance. This could be improved by allowing the key to also resolve asynchronously.
If both above suggestions are accepted, using provider.for() to merge entities and allowing async resource attributes, I believe there may be no need for a ResourceProvider or ResourceInitializer.
- Resources are merged in the SDK before being passed to the
*Providerconstructors, async resource attributes continue to resolve even after this occurs - New entities, including potentially async attribute values, are layered on the resource using
provider.for(...). - The export pipeline ensures all resource attributes are resolved before any export. If resource is resolved, wait is a noop. If a new entity was layered on top of the core resource with async attributes, this mechanism continues to work.
This is actually how the JS entity prototype currently works, except that *Provider.for is not yet implemented.
| `On Change` registers a `ResourceListener` to be called when a Resource has | ||
| been detected or detection has failed. | ||
|
|
||
| If the `EntityProvider` is already initialized, then it MUST call |
There was a problem hiding this comment.
EntityProvider is not defined anywhere?
|
|
||
| The SDK is updated to, explicitly, include three new components: | ||
|
|
||
| - `ResourceInitializer`: A formalized component that is responsible for |
There was a problem hiding this comment.
The name ResourceInitializer implies it handles only initialization and not the rest of the lifetime. Do we want to consider a name like ResourceManager to leave the door open for future extensions like change/add/remove entities?
type ResourceStatus = 'detecting' | 'initialized';
type ResourceChangeCallback = (
resource: Resource,
status: ResourceStatus
) => void;
interface ResourceProvider {
readonly status: ResourceStatus;
onResourceInitialize(callback: ResourceChangeCallback): void;
onResourceChange(callback: ResourceChangeCallback): void;
}|
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
|
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
|
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
| The set of `Entity` provided to these operations MUST only include one `Entity` | ||
| per `type`. |
There was a problem hiding this comment.
What should the API/SDK do if this is not being followed? Consider rephrase it to "the API MUST" or "the SDK MUST" so the requirement is clear.
| - If `ForceFlush` or `Shutdown` is called on a `Provider` it MUST also flush all | ||
| of its child `Entity Bound Providers`. |
There was a problem hiding this comment.
What does this mean? Is "flush a child provider" a defined behavior?
| ### SDK Details | ||
|
|
||
| When `For Entity` operation is received by a provider, A new child | ||
| `Entity Bound Provider` of the same type MUST be created and returned with the |
There was a problem hiding this comment.
I think we should have this return an API provider and not an SDK provider (it is unclear which is meant here, and the restrictions below make it seem like it is an SDK provider). This is required to prevent the API's For Entity operation from depending on the SDK.
That would remove some of the more confusing restrictions below, since the API does not include ForceFlush or Shutdown.
There was a problem hiding this comment.
Agree, however it may be impractical to do so in code.
One major concern I have here is that we need something to "close" or otherwise release the "sub" provider so that memory can be reclaimed. This originally proposed using the shutdown method for that purpose. We may need to do something the API for this.
There was a problem hiding this comment.
One major concern I have here is that we need something to "close" or otherwise release the "sub" provider so that memory can be reclaimed. This originally proposed using the shutdown method for that purpose. We may need to do something the API for this.
My proposal was to model this after metric callback registrations, and return a mechanism to "UnBind" the entity binding. I'm a bit hesitant to call this shutdown, so that it isn't confused with the actual shutdown.
Co-authored-by: Nathan L Smith <nathan.smith@elastic.co>
1b501d1 to
7bf8a73
Compare
|
Go prototype: open-telemetry/opentelemetry-go#7723. I took some creative liberties, but the idea of an entity-bound provider works in the Go SDK from what I can tell. My main realization is that we can't include the Bind function in the API since it would add a dependency on resource, which is part of the SDK. |
|
This PR was marked stale. It will be closed in 14 days without additional activity. |
| # Multiple Resource in SDK | ||
|
|
||
| Allow multiple `Resource` instances per SDK. |
There was a problem hiding this comment.
Is this allowing multiple resources? It seems like this is still requiring one resource, but allowing entities to be associated with providers.
It seems like this title needs to be updated.
There was a problem hiding this comment.
When you create a new provider with a new entity, effectively you get a new resource you're reporting against.
@dashpole can you add this comment somewhere on the diff so we can keep track of it as a thread? If this is a deal breaker it is a big problem for the OTEP as written because instrumentations are expected to be able to bind Entities to *Providers with a dependency on only the API. The bind function need only accept an Entity, not a Resource. The Resource and how the Entity is merged into it is considered to be an SDK detail not exposed via API. Is it not possible to add a bind method to the API which accepts a new type? |
|
Approving this after reviewing the Java POC, where |
|
JS prototype now up to date open-telemetry/opentelemetry-js#6357 |
| usable as an identity independent of `Entity`. | ||
| - Consumers should now expect SDKs reporting multiple resources in the same | ||
| batch. Theoretically, this SHOULD already be supported due to how OTLP is | ||
| deisgned to allow aggregation / batching of data at any point. |
There was a problem hiding this comment.
| deisgned to allow aggregation / batching of data at any point. | |
| designed to allow aggregation / batching of data at any point. |
| the configuration object between `Providers`. | ||
| - A `Bound Provider` MUST NOT be directly configurable. | ||
| All configuration comes from its parent. | ||
| - If `ForceFlush` or `Shutdown` is called on a `Provider` it MUST also flush all |
There was a problem hiding this comment.
Does this mean that a Parent Provider needs to keep track of its child Entity Bound Providers?
| following restrictions: | ||
|
|
||
| - `Entity Bound Provider` MUST be associated with a newly created `Resource` | ||
| which is the result of the incoming `Entity` set merged into the original |
There was a problem hiding this comment.
What happens when the Entity is no longer valid? For example, you mention Session could be an entity in a Browser, but if the session ends, do you also need to close/flush the bounded provider? And related to https://github.com/open-telemetry/opentelemetry-specification/pull/4665/changes#r2858808978, do you also need to tell the parent provider that the child entity is no longer needed?
|
If I understand correctly, this proposal works when the caller of In addition, when a browser session is rotated, the new session entity needs to be updated for all instrumentations. Instrumentations (typically) use the globally-registered providers. They would all either need to be notified when it changes, or the global provider itself would need to allow the entity to be updated. Is this use case purposefully excluded from the proposal? Would this be part of the API or SDK? How would this work with multiple entities? Is the intent to chain |
jack-berg
left a comment
There was a problem hiding this comment.
Approve, but having learned that the browser SIG doesn't think this solves their problem, I don't think this is high enough priority to act on right now.
reyang
left a comment
There was a problem hiding this comment.
I support the direction of this OTEP. I've left several comments, but these are not blocking, we can sort these out when we actually hammer out the specification.
Alternative to #4316 -
We propose allowing
{Signal}Providerto be exended with additional Entity, instead of allowing Resource to be mutable as a forward path for modelling "session" in browsers/devices. Additionally, we believe this will help with multi-tenant telemetry use cases.E.g. This OTEP proposes the following
instead of: