feat: add dag-jose format #269
Changes from all commits
6514c0d
9cd7771
28edd8d
ef94dc5
575840a
81be217
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,96 @@ | ||
| # Specification: DAG-JOSE | ||
|
|
||
| **Status: Descriptive - Draft** | ||
|
|
||
| JOSE is a standard for signing and encrypting JSON objects. The various specifications for JOSE can be found in the [IETF datatracker](https://datatracker.ietf.org/wg/jose/documents/). | ||
|
|
||
| ## Format | ||
|
|
||
| There are two kinds of JOSE objects: JWS ([JSON web signature](https://datatracker.ietf.org/doc/rfc7515/?include_text=1)) and JWE ([JSON web encryption](https://datatracker.ietf.org/doc/rfc7516/?include_text=1)). These two objects are primitives in JOSE and can be used to create other objects such as JWT and JWM. The IETF RFCs specify a JSON encoding of JOSE objects. This specification maps the JSON encoding to CBOR. Upon encountering the `dag-jose` multiformat, implementations can be sure that the block contains dag-cbor encoded data which matches the IPLD schema we specify below. | ||
|
|
||
| ### Mapping from the JOSE general JSON serialization to dag-jose serialization | ||
|
|
||
| Both JWS and JWE support three different serialization formats: `Compact Serialization`, `Flattened JSON Serialization`, and `General JSON Serialization`. The first two are more concise, but they only allow for one recipient. Therefore DAG-JOSE always uses the `General JSON Serialization`, which ensures maximum compatibility with minimum ambiguity. Libraries implementing serialization should accept all JOSE formats, including the `Decoded Representation` (see below), and convert them if necessary. | ||
|
|
||
| To map the general JSON serialization to CBOR we do the following: | ||
|
|
||
| - Any field which is represented as `base64url(<data>)` we map directly to `Bytes`. For fields like `header` and `protected`, which are specified as `base64url(ascii(<some json>))`, that means that the value is the `ascii(<some json>)` bytes. | ||
| - For JWS we specify that the `payload` property MUST be a CID, and we set the `payload` of the encoded JOSE object to `Bytes` containing the bytes of the CID. Applications for which an additional network request to retrieve the linked content is undesirable should use an `identity` multihash. | ||
| - For JWE objects the `ciphertext` must decrypt to a cleartext which is the bytes of a CID. The reason is the same as for the `payload` being a CID. The same approach of using an `identity` multihash can be used, and it will most likely be the only way to retain the confidentiality of the data. | ||
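The JWS half of the mapping above can be sketched in Python. This is a minimal illustration, not the reference implementation: the function names `b64url_decode` and `encode_jws` are hypothetical, and the result is a plain `dict` shaped like `EncodedJWS` that would still have to be passed to a dag-cbor encoder.

```python
import base64


def b64url_decode(text: str) -> bytes:
    """Decode unpadded base64url text, as used throughout JOSE."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


def encode_jws(general_jws: dict) -> dict:
    """Map a JWS in general JSON serialization to the EncodedJWS shape.

    Every base64url field becomes raw bytes; the unprotected `header`
    is already a JSON object and is copied unchanged.
    """
    signatures = []
    for sig in general_jws["signatures"]:
        encoded = {"signature": b64url_decode(sig["signature"])}
        if "protected" in sig:
            # The value is the ascii(<some json>) bytes, not parsed JSON.
            encoded["protected"] = b64url_decode(sig["protected"])
        if "header" in sig:
            encoded["header"] = sig["header"]
        signatures.append(encoded)
    return {
        # Assumed to be the bytes of a CID, as the spec requires.
        "payload": b64url_decode(general_jws["payload"]),
        "signatures": signatures,
    }
```

The sketch does not check that the payload bytes actually parse as a CID; a real implementation would presumably validate that before encoding.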
|
|
||
| Below we present an IPLD schema representing the encoded JOSE objects. Note that there are two IPLD schemas, `EncodedJWE` and `EncodedJWS`. The actual wire format is a single struct which contains all the keys from both the `EncodedJWE` and the `EncodedJWS` structs. Implementors should follow [section 9 of the JWE spec](https://tools.ietf.org/html/rfc7516#section-9) and distinguish between these two branches by checking which attribute exists: if `payload` is present you have a JWS, and if `ciphertext` is present you have a JWE. | ||
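As a rough illustration of the branch check described above, the following Python sketch inspects an already-decoded dag-cbor map. The function name `jose_kind` is hypothetical, and raising on a map with neither key is an assumption of this sketch rather than something the spec text states.

```python
def jose_kind(node: dict) -> str:
    """Classify a decoded dag-jose map as "JWS" or "JWE".

    Mirrors the check described above: a `payload` key means JWS,
    a `ciphertext` key means JWE.
    """
    if "payload" in node:
        return "JWS"
    if "ciphertext" in node:
        return "JWE"
    raise ValueError("neither `payload` nor `ciphertext` is present")
```

For example, `jose_kind({"payload": b"...", "signatures": []})` returns `"JWS"`.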
|
|
||
| **Encoded JOSE** | ||
|
|
||
| ```ipldsch | ||
| type EncodedSignature struct { | ||
| header optional {String:Any} | ||
| protected optional Bytes | ||
| signature Bytes | ||
| } | ||
|
|
||
| type EncodedRecipient struct { | ||
| encrypted_key optional Bytes | ||
| header optional {String:Any} | ||
| } | ||
|
|
||
| type EncodedJWE struct { | ||
| aad optional Bytes | ||
|
||
| ciphertext Bytes | ||
| iv optional Bytes | ||
| protected optional Bytes | ||
| recipients [EncodedRecipient] | ||
| tag optional Bytes | ||
| unprotected optional {String:Any} | ||
| } | ||
|
|
||
| type EncodedJWS struct { | ||
| payload optional Bytes | ||
| signatures [EncodedSignature] | ||
| } | ||
| ``` | ||
|
|
||
| ## Padding for encryption | ||
|
|
||
| Applications may need to pad the cleartext when encrypting to avoid leaking the size of the cleartext. This raises the question of how the application knows what part of the decrypted cleartext is padding. Here we rely on the fact that the cleartext MUST be a valid CID: implementations should parse the cleartext as a CID and discard any content beyond the multihash digest, which we assume to be the padding. | ||
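One way to picture the padding rule is the Python sketch below. It assumes the binary CID layout (CIDv1 as `varint(version) varint(codec)` followed by a multihash of `varint(code) varint(length) digest`, and CIDv0 as a bare 34-byte sha2-256 multihash). The helper names `read_varint` and `strip_padding` are hypothetical, and a real implementation would more likely rely on an existing CID library.

```python
from typing import Tuple


def read_varint(data: bytes, offset: int) -> Tuple[int, int]:
    """Read an unsigned varint; return (value, offset after the varint)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, offset
        shift += 7


def strip_padding(cleartext: bytes) -> bytes:
    """Return only the CID bytes, dropping anything after the digest."""
    if cleartext[:2] == b"\x12\x20":
        # CIDv0: sha2-256 multihash, 2 prefix bytes plus a 32-byte digest.
        end = 34
    else:
        version, pos = read_varint(cleartext, 0)
        if version != 1:
            raise ValueError("unsupported CID version")
        _codec, pos = read_varint(cleartext, pos)
        _hash_code, pos = read_varint(cleartext, pos)
        digest_len, pos = read_varint(cleartext, pos)
        end = pos + digest_len
    if end > len(cleartext):
        raise ValueError("cleartext is shorter than the CID it declares")
    return cleartext[:end]
```

With an `identity` multihash the digest is the embedded data itself, so the declared digest length is what separates the data from the padding.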
|
|
||
|
|
||
| ## Decoded JOSE | ||
|
|
||
| Typically implementations will want to decode this format into something more useful for applications. Exactly what that will look like depends on the language of the implementation. Here we use the IPLD schema language to give a somewhat language-agnostic description of what the decoded representation might look like at runtime. Note that everything which is specified as `base64url(ascii(<some JSON>))` in the JOSE specs, and which we encode as `Bytes` in the wire format, is here decoded to a `String`. We also add a `link` attribute of type `&Any` to the `DecodedJWS`, which allows applications to easily retrieve the authenticated content. | ||
|
|
||
| Also note that, as with the encoded representation, there are two different representations: `DecodedJWE` and `DecodedJWS`. Applications can distinguish between these two branches in the same way as with the encoded representation described above. | ||
|
|
||
| ```ipldsch | ||
| type DecodedSignature struct { | ||
| header optional {String:Any} | ||
| protected optional String | ||
| signature String | ||
| } | ||
|
|
||
| type DecodedJWS struct { | ||
| payload String | ||
| signatures [DecodedSignature] | ||
| link &Any | ||
|
Contributor
@warpfork any preference here on using
Contributor
I think I tend towards likely
Member
there's |
||
| } | ||
|
|
||
| type DecodedRecipient struct { | ||
| encrypted_key optional String | ||
| header optional {String:Any} | ||
| } | ||
|
|
||
| type DecodedJWE struct { | ||
| aad optional String | ||
| ciphertext String | ||
| iv String | ||
| protected String | ||
| recipients [DecodedRecipient] | ||
| tag String | ||
| unprotected optional {String:Any} | ||
| } | ||
| ``` | ||
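To make the relationship between the encoded and decoded forms concrete, here is a hedged Python sketch for the JWS branch. It assumes that the decoded `String` fields carry the base64url text used by the JOSE general JSON serialization. The names `b64url_encode` and `decode_jws` are hypothetical, and `link` is left as raw CID bytes where a real implementation would build its language's CID or link type.

```python
import base64


def b64url_encode(data: bytes) -> str:
    """Encode bytes as unpadded base64url text, as JOSE expects."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_jws(encoded: dict) -> dict:
    """Turn an EncodedJWS-shaped dict into a DecodedJWS-shaped dict."""
    signatures = []
    for sig in encoded["signatures"]:
        decoded = {"signature": b64url_encode(sig["signature"])}
        if "protected" in sig:
            decoded["protected"] = b64url_encode(sig["protected"])
        if "header" in sig:
            decoded["header"] = sig["header"]
        signatures.append(decoded)
    return {
        "payload": b64url_encode(encoded["payload"]),
        "signatures": signatures,
        # Placeholder: the payload bytes are the CID the link points to.
        "link": encoded["payload"],
    }
```

Under that assumption the `payload` and `signatures` fields line up with the general JSON serialization, so they could be handed to an ordinary JOSE library for verification.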
|
|
||
| ## Implementations | ||
|
|
||
| - [JavaScript](https://github.com/oed/js-dag-jose) | ||
| - [Go](https://github.com/alexjg/go-dag-jose) | ||
One more remark, but this one is just for future-facing, and is not a change request or a blocker to merge this as-is:
Most of the IPLD team has become what I'll call "not bullish" about the use of Identity Multihashes in new data. It's specified, and using it in this spec is fine, but we're starting to prefer to avoid them where possible.
(There are various reasons we've become "not bullish" on Identity Multihash: because we've found that not all implementations have a great time with Identity Multihash in practice; there's a variety of underspecified corner cases around size limits in Identity Multihash (and how to manage size expectations for blocks containing anything that could be an Identity Multihash!); it's just plain a "bad penny" that keeps seeming to turn out to be a leaky abstraction that generates a lot of discussion and a mysteriously large number of "tweaks" and "bugfixes" over time whenever used; and there may be a few other issues we've encountered but unfortunately not kept good notes on. None of that is unique to this PR or particularly evidenced here, but until such a time as we have a good point in our documentation to link to about this, I figure I should give at least a short sample of reasons we've become "not bullish".)
Inline CIDs have the interesting property of letting one switch codecs while still embedding data in the same block. But we're very curious if this is actually going to be used very often in practice.
In the future: I suspect it's possible that we could update this spec to describe the `payload` as being a kinded union, in which the resident data can be either a CID linking to another block, or be some other data like a map or list or plain string (or etc -- any regular data) that's simply in the same block. If we wanted to make this change in the future, it should be easy to do so: a schema with a kinded union here will match all of the existing data that's only got links in this position.
Because it should be so easy to make such a change that increases the range of accepted data in the future, I'm entirely fine with deferring any more discussion of this. Just wanted to make a quick mention of it, in case anyone is starting to think about this in the future and wants to know if we thought about it way back now. :)
(I think this is in a similar vein to what @vmx has just discussed as well.)
("prefer to avoid them" might be a little strong; we're actively discussing this in the IPLD weekly call right now and deciding how to describe our ambivalence is itself a topic :))
I'm fine with removing the wording about suggesting to use them for JWS. For JWE the use of inline CIDs is essential. Otherwise you wouldn't know how to interpret the decrypted bytes.