
docs: add 0033 agent registry#96

Open
zmstone wants to merge 18 commits into main from 0033-agent-reg

Conversation

Member

@zmstone zmstone commented Feb 3, 2026

No description provided.

@zmstone zmstone force-pushed the 0033-agent-reg branch 2 times, most recently from 41318a5 to 2c448be Compare February 3, 2026 11:10

2. **Schema Validation**: All Agent Cards are validated against a JSON schema
before acceptance, preventing malformed or malicious registrations.
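For illustration, a hand-rolled sketch of what such a pre-acceptance check could look like (the required-field list and the mqtt(s) URL rule are assumptions for the example, not the actual EMQX validator):

```python
# Minimal Agent Card check; a sketch only, not the real schema validator.
# The required-field set below is an assumption based on the card examples.
REQUIRED_FIELDS = {"protocolBinding", "protocolVersion", "url"}

def validate_card(card: dict) -> list[str]:
    """Return a list of validation errors; an empty list means accepted."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS - card.keys()]
    if "url" in card and not card["url"].startswith(("mqtt://", "mqtts://")):
        errors.append("url must be an mqtt(s) URL")
    return errors
```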

Contributor

Maybe we could reuse the existing schema validation feature? Though, this specific validation would show up in the validations list. 🤔

Member Author

yes.

{allow, all, publish, ["a2a/v1/request/#"]}.

# Allow each client to receive only its own responses
{allow, all, subscribe, ["a2a/v1/response/${username}/#"]}.
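To make the intent of these rules concrete, a minimal matcher sketch (hand-rolled; the real EMQX ACL engine also handles `+` and other features omitted here):

```python
def acl_allows(pattern: str, username: str, topic: str) -> bool:
    """Expand ${username} in an ACL topic filter and match it against a
    concrete topic, honoring a '#' multi-level wildcard. Sketch of the
    rule's intent only, not EMQX's actual matcher."""
    filt = pattern.replace("${username}", username).split("/")
    levels = topic.split("/")
    for i, part in enumerate(filt):
        if part == "#":
            return True  # '#' matches this level and everything below it
        if i >= len(levels) or part != levels[i]:
            return False
    return len(levels) == len(filt)
```

With username `org1/unit1/agentA`, the subscribe rule then admits only that agent's own response subtree.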
Contributor

Q: this implies that username should be $org_id/$unit_id/$agent_id?

Member Author

yes


# Allow each client to receive only its own responses
{allow, all, subscribe, ["a2a/v1/response/${username}/#"]}.
```
Contributor

Each client should be restricted to registering only its username, too?

Member Author

true.

Comment on lines +164 to +176
"oauth2": {
"oauth2SecurityScheme": {
"description": "OAuth2 for agent invocation.",
"flows": {
"clientCredentials": {
"tokenUrl": "https://id.example.com/oauth2/token",
"scopes": {
"a2a:invoke": "Invoke A2A operations."
}
}
}
}
}
Contributor

Unclear what kind of "auth" it is, and thus a bit hard to imagine how it's supposed to work. Is it agent-to-agent authentication? Why do we need an out-of-band (e.g. OAuth / HTTPS) mechanism for that, to conform to some emerging practices?

Member Author

this is for the requester agent to be authenticated by the responder agent.

typical steps:

  1. Requester authenticates itself with EMQX
  2. Requester discovers responder
  3. Requester authenticates itself with responder (out-of-band)
  4. Requester sends requests
  5. and so on
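The out-of-band step 3 above, if done with the OAuth2 client-credentials flow from the card's security scheme, boils down to one token request; a sketch that only builds the request (the endpoint and scope values are illustrative, taken from the example card):

```python
from urllib.parse import urlencode

def client_credentials_request(token_url: str, client_id: str,
                               client_secret: str, scope: str) -> tuple[str, str]:
    """Build the POST target and form body for an OAuth2 client-credentials
    grant (RFC 6749 section 4.4). Sending the request and caching the token
    are left out of this sketch."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    })
    return token_url, body
```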

Member Author

see use case 01

Member Author

{
"protocolBinding": "MQTT5+JSONRPC",
"protocolVersion": "1.0",
"url": "mqtts://broker.example.com:8883/a2a/v1",
Contributor

Q: Since this is a protocol for in-broker communication, I wonder why it's a full URL, which an agent needs to know beforehand?

Member Author
@zmstone zmstone Feb 9, 2026

removed the /a2a/v1 suffix, clarified why url is needed in notes.

Member Author

2 reasons:

  1. maybe reroute traffic for affinity in a cluster (not a good reason for EMQX)
  2. the card can be discovered externally (e.g. at an http endpoint, then use EMQX as transport)

Comment on lines +141 to +150
{
"uri": "urn:a2a:mqtt-profile:v1",
"description": "Broker registry metadata extension.",
"required": false,
"params": {
"securityMetadata": {
"jwksUri": "https://keys.example.com/.well-known/jwks.json"
}
}
}
Contributor

Raises a lot of questions. Maybe it's best to illustrate with a completely artificial "example" extension without focus on vague security concerns?

Contributor

Those stories seem to concern only agents themselves rather than the registry, right?

Whether the agents encrypt their payloads or not, whether they authenticate with each other, etc., doesn't seem to affect the registry itself, if I'm understanding it correctly.

Member Author

cards (including extensions in the cards) are only used by agents.

the broker should validate the cards against the schema before accepting the registration, so the card schema is not completely transparent to the broker.

4. **Automatic Lifecycle Reflection**: Broker-managed status attached via MQTT
v5 User Properties (`a2a-status`, `a2a-status-source`) when forwarding
discovery messages
5. **Security Metadata**: Agent Cards include public key / JWKS metadata for
Contributor

Unclear what for, mostly "untrusted public broker" scenarios? Then the trust should be managed elsewhere I imagine: e.g. a trusted agent means that the recipient has pre-existing knowledge of this agent's identity, perhaps a certificate or public key. Another option I guess is to perform challenge-response authentication during Agent Card registration, if the agent runs on a well-known domain.

Anyway I feel that might be better separated into its own section in the document or even a separate EIP.

Member Author
@zmstone zmstone Feb 9, 2026

It's not about trust between agents after they discover each other and authenticate each other.

It's the sensitive payload being plaintext to a broker, which may have a subscriber subscribed to a2a/#.

```

Where:
- `a2a/v1/discovery`: A2A discovery prefix
Contributor

Q: More of a general question, if we want agent to discover other "actors" running on this broker, for example sensors, bridges, control and/or feedback channels, do we need another prefix and separate convention ("profile") for them?

Member Author

that's not in this EIP proposal; technically we can create a generic Context Registry, and admins or clients can register any information for agents to discover.

Comment on lines +303 to +307
- **Optional Binary Artifact Mode**: Requesters MAY set
`a2a-artifact-mode=binary` to receive chunked binary artifacts. Binary chunks
include required metadata (`a2a-event-type`, `a2a-task-id`,
`a2a-artifact-id`, `a2a-chunk-seqno`, `a2a-last-chunk`) and use payload format
indicator `0` with appropriate `Content Type`.
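For what the chunk metadata enables, a reassembly sketch on the requester side (the message shape here is an assumption; the property values are assumed to arrive as MQTT v5 User Property strings):

```python
def assemble_artifact(chunks: list[dict]) -> bytes:
    """Reassemble a binary artifact from chunk messages carrying the
    a2a-chunk-seqno / a2a-last-chunk user properties. Property names are
    from the proposal; the dict-per-message shape is an assumption."""
    ordered = sorted(chunks, key=lambda c: int(c["a2a-chunk-seqno"]))
    if ordered[-1]["a2a-last-chunk"] != "true":
        raise ValueError("final chunk not yet received")
    # Detect gaps in the sequence before concatenating payloads.
    seqnos = [int(c["a2a-chunk-seqno"]) for c in ordered]
    if seqnos != list(range(seqnos[0], seqnos[0] + len(seqnos))):
        raise ValueError("missing chunk(s)")
    return b"".join(c["payload"] for c in ordered)
```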
Contributor

Perhaps refer to File Transfer instead? Unless I'm misunderstanding the purpose, mostly because it's hard to imagine how LLM-based agents would handle assembling and then processing hundred-megabytes large binary blobs.

Member Author

it's like requesting to download a large artifact.

Member Author

we can maybe support File Transfer in a separate EIP.
This one aims to implement the broker-neutral A2A-over-MQTT spec.

Comment on lines 296 to 299
- **Optional Shared Pool Dispatch**: Requesters MAY publish to
`a2a/v1/request/{org_id}/{unit_id}/pool/{pool_id}`. Pool members consume via
shared subscriptions and responders MUST include `a2a-responder-agent-id` in
pooled responses so requesters can route follow-ups to the concrete responder.
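A sketch of the topic construction this bullet implies, including the MQTT v5 shared-subscription filter a pool member would use (the `$share` group name is an assumption, not fixed by the proposal):

```python
def pool_request_topic(org_id: str, unit_id: str, pool_id: str) -> str:
    """Topic a requester publishes to for shared pool dispatch."""
    return f"a2a/v1/request/{org_id}/{unit_id}/pool/{pool_id}"

def pool_shared_subscription(group: str, org_id: str, unit_id: str,
                             pool_id: str) -> str:
    """MQTT v5 shared-subscription filter for a pool member; exactly one
    member of the '$share' group receives each published request."""
    return f"$share/{group}/{pool_request_topic(org_id, unit_id, pool_id)}"
```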
Contributor

Nit. Perhaps just allow arbitrary topic tail, and mandate responders to always include both this tail and a2a-responder-agent-id? To avoid pulling pool-related specifics in the protocol.

replies to the provided reply topic using QoS 1 and MUST echo
`Correlation Data`. `Correlation Data` is transport correlation and MUST NOT
be used as an A2A task id. For newly created tasks, responders MUST return a
server-generated `Task.id`, which requesters use for subsequent operations.
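A sketch of the separation this paragraph mandates: Correlation Data is minted per request and matched only at the transport layer, while the server-generated `Task.id` is taken from the response payload (names and payload shape here are illustrative):

```python
import uuid

def new_request() -> tuple[bytes, dict]:
    """Requester side: mint fresh Correlation Data for transport-level
    matching. The A2A Task.id is unknown until the response arrives."""
    corr = uuid.uuid4().bytes
    pending = {"correlation_data": corr, "task_id": None}
    return corr, pending

def on_response(pending: dict, corr: bytes, payload: dict) -> None:
    """Match on the echoed Correlation Data, then record the
    server-generated Task.id from the JSON-RPC result payload."""
    if corr != pending["correlation_data"]:
        raise ValueError("correlation data mismatch")
    pending["task_id"] = payload["result"]["id"]
```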
Contributor

Q: In a User Property? Perhaps it should be a2a-responder-task-id then?

Member Author

Task.id is part of the a2a payload.

Comment on lines +157 to +158
"protocolBinding": "MQTT5+JSONRPC",
"protocolVersion": "1.0",
Contributor

Nit. Also a bit unclear:

  1. Why JSON-RPC?
  2. Why not JSON-RPC 2.0?
  3. How do all the concepts map into each other: JSON-RPC id vs MQTT Correlation Data, JSON-RPC method vs Agent Skill.

Granted it's mostly protocol details but I think it should help building a coarse but consistent understanding of the feature.

Member Author

This is A2A protocol version, not JSON-RPC.

For newly accepted cards, EMQX persists the retained card payload without
injecting broker-managed status fields. Invalid registrations are rejected with
a PUBACK reason code indicating the validation error.

Contributor

If agent A registers itself successfully, and later agent B publishes a card to the same topic as agent A's, does agent B "take over" and EMQX starts tracking agent B's connector for liveness?

Member Author

each agent can only register to its self-topic (with its id in the topic).

Contributor
@thalesmg thalesmg Feb 23, 2026

how can the broker know that a certain id belongs to the clientid publishing the card?

in the sections above, it's only recommended that either or both of clientid and username follow the proposed convention. what if the publishing client does not follow it?

it sounds like it cannot be enforced if they don't abide by the recommendation.

Member Author

right, maybe move the username constraints to clientid. i'll update the spec and proposal.
or do you have some other idea?

Contributor

the issue arises from the convention being just recommended instead of required, and the text already suggests that clientid and/or username should follow it:

Recommended MQTT identity mapping:

  • Client ID or Username format: {org_id}/{unit_id}/{agent_id} (do not include / in the IDs)

if it were required/enforced, then indeed we could check whether the id inferred from the topic matches the id inferred from the clientid/username. otherwise, we can't reject any client from publishing to a card topic that was already claimed by an agent.
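If the convention were enforced, the broker-side check would reduce to comparing the topic tail with the Client ID; a sketch (the card topic is assumed, for the example, to end in `{org_id}/{unit_id}/{agent_id}`):

```python
def card_topic_matches_client(card_topic: str, clientid: str) -> bool:
    """Cross-check: the {org_id}/{unit_id}/{agent_id} tail of the card's
    registration topic must equal the publishing Client ID. Only meaningful
    if the identity-mapping convention is mandatory (topic layout assumed)."""
    tail = "/".join(card_topic.split("/")[-3:])
    return tail == clientid
```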

Contributor

again, if we don't enforce it, then maybe we could at least speculate what to do in such cases where someone pushes a new card that should belong to a live agent?

Member Author

I'll tighten it up as a MUST for Client ID.

discovery messages to subscribers:

- `a2a-status = online` when registration is accepted or agent is active
- `a2a-status = offline` when agent is observed offline
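A sketch of the forwarding-time injection this describes: the stored card's properties stay untouched, and the status pair is appended only on the copy sent to a discovery subscriber (the `a2a-status-source` value "broker" is an assumption for the example):

```python
def forward_properties(stored_props: list[tuple[str, str]],
                       online: bool) -> list[tuple[str, str]]:
    """Build the User Properties for the forwarded copy of a retained card.
    The stored card is never mutated; status is attached on the way out."""
    status = "online" if online else "offline"
    return stored_props + [("a2a-status", status),
                           ("a2a-status-source", "broker")]
```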
Contributor

This implies that an agent's liveness must be tracked across the cluster, by all nodes, it seems.
I recall seeing something about requiring usage of Last Will Testament to track their liveness, but now we'll have to track them implicitly?

Member Author

yes, the latest proposal requires the broker to update the agent status in the retained storage after a client goes offline.

the original proposal was to use a will message with no payload to act as a DELETE op for the retained card.
it would require the registration to be mandatory after each CONNECT for each agent, like the BIRTH messages in Sparkplug B.

Contributor

yes, latest proposal requires the broker to update agent status in the retained storage after a client goes offline.

do you mean in the index? the 2nd paragraph above this list claims that we treat the retained messages as immutable (which would also help avoid transactions/races, especially since we use dirty ops for them):

EMQX does not auto-clean retained cards when an agent disconnects and does not
mutate Agent Card payloads for status.

Member Author

it does not have to mutate the retained message, just inject user properties before sending it to the subscriber.

Member Author

btw. it's retained in the protocol, but I would not recommend that EMQX store it as retained.
we should instead create a distributed table to store the cards and status.
this will make the HTTP interface easier to implement.
we will probably also need some different (vs retainer) indexing.

Contributor

... another complication from keeping it separate is that it might introduce a certain coupling between the retainer and this app: the retainer will need to ignore subscriptions to discovery topics if the a2a registry is enabled, and otherwise work as usual. 🤔

Member Author

emqx should not let $a2a messages enter the retainer.

the spec only defines the MQTT client/broker contract, not the management APIs of the broker.

Contributor

emqx should not let $a2a messages enter retainer.

even if a2a_registry is disabled? that would somewhat break the spec for those specific topics, I think. this is the coupling I was referring to.

also, it sounds like it could still be a bit confusing if the user attempts to clear all retained messages in hopes of clearing all cards (since they are retained from the MQTT point of view), but that would have no effect on the cards?

Member Author

good point, maybe with feature flag enable=false we should allow $a2a as regular messages passing through.

publishing with the retained flag but an empty payload should make emqx delete the card.
but "clear all retained messages" is a management API of EMQX, which is not specified in any spec. if we provide a management UI for cards and an API to "clear all cards", I do not think it would be confusing.

Contributor

good point, maybe with feature flag enable=false we should allow $a2a as regular messages passing through.

Right, in order to preserve the consistency of retained message behavior, we would need to do that. What is a bit unfortunate is that it introduces coupling between the apps: emqx_retainer will need to consult the configuration of a foreign app to decide whether it should handle the subscription.

but "clear all retained messages" is a management API from EMQX?

yes, I mean this one (and the equivalent cli counterpart):

(screenshot of the dashboard's "clear all retained messages" action)

It might not be specified in any spec. But, from the point of view of a user that publishes a retained message, and observes the effect of subscribing to a topic and receiving the retained message, it could be jarring to use an API called "clear all retained messages" and then continue to observe the "retained" message being delivered when they subscribe to said topic.


#### 4. Broker-Managed Status via MQTT User Properties

EMQX does not auto-clean retained cards when an agent disconnects and does not
Contributor

EMQX does not auto-clean retained cards when an agent disconnects

should the cards survive a full cluster restart? i.e., should they be persisted to disk?

also, I assume we will need to respect Message-Expiry-Interval like an ordinary retained message, and have some kind of gc just like retainer?

Member Author

This part should be updated to respect the expiry interval; after all, MQTT v5 is a MUST per the spec.
Auto-clean was the previous design (delete with a will message); it was updated to 'does not auto-clean', but I forgot about the retained message expiry interval.
