-
Notifications
You must be signed in to change notification settings - Fork 93
feat: add dag-jose format #269
Conversation
|
@mikeal In the case of the protected header it would totally make sense to decode the Since the implementation needs to bundle decryption and verification functionality anyway it might make sense to decode all things to binary anyway. I think the most important thing to support is that any existing JOSE object should be possible to serialize using the codec, but that seems separate. Will think some more about this and make another update! |
It might be easier to just inherit the ordering rules from |
|
Since this codec requires data to be encoded in a very specific data model representation I think we need an IPLD Schema for it, similar to what we have for |
I should clarify a bit what needs to be in the spec vs what is an implementation decision. I take “any existing JOSE object” to mean “objects returned from a JOSE signing/encryption library.” Those objects are going to vary a bit in terms of structure between libraries and especially between languages. Codec implementations are free to decide what they do with these inputs and that exact behavior doesn’t need to be part of this spec. All that we really need specified are the rules for writing out the binary representation of the Block and how that Block data is then decoded into IPLD Data Model. Codecs are free to decide how to take any given input and turn it into the format specified here. For instance, the IPLD Schema for this codec should specify |
Good point. I'll make this change.
Makes sense. I'm not super familiar with IPLD Schema. Any pointers except for dag-pb?
Ah, this clarifies your point about having the payload be a separate object and link to it. Let me think about it a bit more! |
|
|
There's an authoring guide for schemas if you follow @vmx' link but there's also a smattering of examples in this repo. Do a |
|
Just updated the spec to be much more clear and well defined. Hopefully it should be much easier to interpret now. |
block-layer/codecs/dag-jose.md
Outdated
| type JOSE union { | ||
| | JWS jws | ||
| | JWE jwe | ||
| } representation kinded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit with the union doesn't quite "compile". When using a kinded representation, the only keywords permissible in the union block are kind keywords (int, map, etc), and those must also match the kind of the types they're tagging.
The JWS and JWE types are both structs, and having no representation stated, structs default to being a map representation. So kinded unions can't be applied here anyway.
You probably want one of keyed, envelope, or inline representations for this union.
I always advocate for keyed wherever possible, because tend to be the friendliest to parser performance; envelope as second choice because it's at least still very clear; and inline is my least favorite because it's just the one with the most edge cases around it by far (it's easy to see if they'll affect any given situation or not, but personally, I just prefer to make a habit of not using them).
But the choice of which of these seems suitable is up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... I've written the above naively assuming that this part of the format is something you have any control over.
At any rate, I do hope we can find some clear and explicit mechanism which disambiguates which of these two types of data is in a block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In discussion in another band, @oed linked me to https://www.rfc-editor.org/rfc/rfc7515.html#section-7 and https://www.rfc-editor.org/rfc/rfc7516.html#section-7 , and indicates that we do indeed have limited control over the format here if we're complying with those existing specs.
I'm writing up a new issue to describe what it seems like we might be wanting from union representations if we were to describe this stuff in IPLD Schemas: #278
However, it's also possible that we just... won't describe this top level union in IPLD Schemas, and will just have to talk about this detail (... at length) and how to disambiguate these structures in prose in this doc instead. That would be a bit unfortunate, because it's just, well, more prose... but that's okay. We can do that. We're trying to use schemas here for clarity of docs (and I'm glad we did, because it's highlighted this issue nicely), but there's no technical mandate that we have to have a schema if this is going to be implemented as a codec (nor would there be if it was implemented as an ADL, either).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think in this case, for now, it's a descriptive device and should just be accompanied by docs to clarify it (either as something inline or with an inline comment pointing to something else). I had to do similar things in the Bitcoin docs I'm working on because I can't control the format but want to use Schemas as a documentation device. I have unions that we can't deal with yet so I called them "faux-unions" in the docs.
block-layer/codecs/dag-jose.md
Outdated
| unprotected optional {String:Any} | ||
| } | ||
| type GeneralJOSE union { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have a discussion going on elsewhere, I think in the recently merged FlexibleByteLayout PR, that we should make sure that any entry points are listed at the top of a schema doc for clarity, something like "reverse C function definition order", i.e. list them as a top-down dependency tree where your root is where you generally expect users to enter. In this case I think GeneralJOSE is your expected root type that everything falls under, so you probably need to invert both of your schema definitions. Of course accompanying this with some descriptive text is a nice touch too but probably optional.
|
A couple of more general points of feedback:
|
|
Mostly on a similar page to @rvagg but less worried about a couple things:
|
|
A question before I go though and update the spec based on our discussion @mikeal @vmx @rvagg Right now the data that is put into the block is just serialized by stringifying the json data. Does it make sense to actually serialize this as CBOR instead? This would be a bit more space efficient and would not affect the input data of the encode function or the output of the decode function. |
|
This seems like a win win to me. |
|
I've spent a bit of time working on a javascript implementation ( I did not publish anything yet though ) and I have thought on the payload thingy: I think we should be sticking as closely to the RFC as possible, without defining "weird" ipld quirks and edge cases, especially when it comes to the payload, I'd say implementations should be free to support specific "cty" mime types such as "DAG-JSON" and apply the correct encoding and decoding as they see fit, for example, an implementation might see a JWE/JWS with the "cty" header "application/DAG-JSON" and decode it for the user ( i.e. decoding json-ified CIDs etc. ).
My quick benchmark showed that for the given JWE/JWS structure CBOR reduces the size by only 20 Bytes while being up to 5x slower than JSON ( I used borc and fast-json-stringify |
|
Hey @JonasKruckenberg! We've already got a working javascript implementation here: https://github.com/ceramicnetwork/js-dag-jose Unfortunately this spec is out of date (we will be updating it shortly). Main thing to note is that we've limited the payload to only be allowed to be a CID. Also, 20 bytes can become quite a lot if you have a lot of objects. |
|
Hey! Thanks for the quick reply, it's not working for me sadly, that's why I'm working on my own. Maybe I can look into dag-jose at some point to figure out whats wrong and create an issue. Also I don't think it should be limited to just CIDs. |
You're right about this one yeah, I didn't think about having a lot yet. |
|
@JonasKruckenberg Yeah, we are doing the same on Ceramic. The syncing issue can be mitigated in multiple ways, including DagSync in the future. A hack you could do to get around the only CID "issue" is to use an identity multihash (and thus store the whole content in the CID), this is what we will do for encryption for example. Please file an issue on the js-dag-jose repo if you are having trouble with it! Also note that we probably gain more than 20 bytes since we encode the |
|
Hey folks, I've been working on the Go implementation of this. I've updated the spec to match current implementations. Specifically I've made changes that:
|
- Specify that `payload` of JWS and plaintext of JWE ciphertext must be
CIDs
- Use `Bytes` instead of `String` {String:Any} for authenticated headers
- Add clarifying prose around purpose of the encoded and decoded schemas
- Add clarifying prose for encrypted padding
- Add Go implementation
Maybe vaguely relates to a conversation @warpfork and @willscott have been having about some funky Bytes<->String map keys that Filecoin has been doing. https://github.com/alexjg/go-dag-jose/blob/4ef144c0cad71149125cd2b5393b70cf6f35db7c/dagjose/dagjose.go#L88 I don't really know how this moves forward, but the fact that you have a go-ipld-prime implementation based on this suggests to me that it's most likely doing the "correct" things since it should be much more unforgiving than what you could do in JavaScript. |
I'm a little confused by this statement now. Protected headers (authenticated) were encoded as |
|
There is a typo in that comment. It should read "use To clarify, the original schema looked like this (just including the signature, but it's the same for anything with an authenticated header): And I've changed that to type EncodedSignature struct {
header optional {String:Any}
protected optional Bytes
signature Bytes
}So you can see that the |
|
Makes sense, thanks for clarifying! |
|
Hey all, anyone had a chance to look at this updated spec? |
| type DecodedJWS struct { | ||
| payload String | ||
| signatures [DecodedSignature] | ||
| link: &Any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@warpfork any preference here on using &Any vs Link since they are both effectively the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I tend towards likely &Any but it's true that either is acceptable. I don't think any of our tools make a functional distinction at all, as things presently stand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's : on this line which is a syntax error, I don't suppose someone in this issue wants to PR a fix?
|
LGTM except for one comment I’d like @warpfork to weigh in on. |
block-layer/codecs/dag-jose.md
Outdated
| payload optional Bytes | ||
| } | ||
| type EncodedJOSE union { EncodedJWE | EncodedJWS } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've still got to do something about this "union". The problem discussed in the earlier comment thread still exists.
- This doesn't "compile". A union type in an IPLD Schema must have a representation strategy.
- This doesn't give me instructions on how to figure out which of these I'm dealing with. (Which is the reason that the above is true.)
The former we could shrug off and just say "this is for approximate documentation purposes and doesn't compile"... but the second keeps me up either way.
I think my proposal is actually that we just drop this type from the schema part of the document, and do it in prose instead... along with prose instructions for how to actually distinguish these.
Ideally it would be nice to have an explicit decision tree, like: "EncodedJWE is distinguished when the first key of the top level map is 'aad'; EncodedJWS is distinguished when the first key of the top level map is 'signatures'; any other first key is a parse error for DAG-JOSE data." Stating the rejection cases is something we've seen to be important to cover up front, or incompatibilities are very likely to breed in the grey areas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, your suggestion makes sense and I actually thought we had updated the doc to say this.. Will make sure to fix.
|
I left one more comment which I'm afraid is fairly blocking -- but other than that, the rest of this looks good to me. Wanna disclaim that I don't have a deep understanding of JOSE itself (and I'm definitely assuming any cryptographic semantics have been reviewed by the JOSE specifiers, and I'm not checking that in the slightest), but what I can do is review for whether this spec seems to explain it to me well enough to work with, and I think the answer to that (modulo comment) is... "yes". So that's good enough for me! |
|
Pushed an update that removes the union @warpfork 🙏 " |
|
Ah, I missed that. Mea culpa, thanks for the knock upside the head. ...(after reading)... Uuf, I have to confess I don't actually like the JWE spec here.
(Emphasis mine.) I'm not a fan of that kind of ambiguity, because ISTM it's a pretty serious bug-breeder-reactor. But... if that's the IETF RFC, hokay. I'm not gonna argue. |
|
I had a good chat with @oed about inline CIDs in regardst to this spec. To me the usage of inline CIDs should be discouraged (TODO a proper write up why). With IPLD Schemas we have other ways to solve the some of the problems that inline CIDs solve. We talked about JWS as well es JWE, so I split this comment into two parts as although have very similar solutions are still distinct problems. JWSThe IPLD Schema currently contains: And if you want to inline data, you could use an inline CID. I propose instead using a kinded union for that: An inline CID gives you an indication (via the IPLD Codec of the CID).how the inline data should be decoded. We don't want to lose that functionality. Would an IPLD Path traversing this structure still be the same as with an inline CID? Is that better than in inline CID ? Or does an inline CID exactly serve this purpose (which is really very similar, except that it contains 2 varints more that are not strictly needed)? Or should we have an DIC (direct inline content) instead which is specified as JWEJWE is encrypting data which might be a CID or inlined. You want to be able to pad the data with arbitrary bytes for cryptographic purpose. You could specify the Your decoder (let's expect a binary encoding that is using varints for now) would need to look at the first varint to determine whether the encrypted data is a CID or whether it was inlined. It leads to the same questions as for JWS, whether this is better than an inline CID. Identified problemsInline CIDs support transparent pathing even over different IPLD codecs. Can we provide something similar with the current Schemas? |
| - For JWS we specify that the `payload` property MUST be a CID, and we set the `payload` of the encoded JOSE object to `Bytes` containing the bytes of the CID. For applications where an additional network request to retrieve the linked content is undesirable then an `identity` multihash should be used. | ||
| - For JWE objects the `ciphertext` must decrypt to a cleartext which is the bytes of a CID. This is for the same reason as the `payload` being a CID, and the same approach of using an `identity` multihash can be used, and most likely will be the only way to retain the confidentiality of data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more remark, but this one is just for future-facing, and is not a change request or a blocker to merge this as-is:
Most of the IPLD team has become what I'll call "not bullish" about the use of Identity Multihashes in new data. It's specified, and using it in this spec is fine, but we're starting to prefer to avoid them where possible.
(There are various reasons we've become "not bullish" on Identity Multihash: because we've found that not all implementations have a great time with Identity Multihash in practice; there's a variety of underspecified corner cases around size limits in Identity Multihash (and how to manage size expectations for blocks containing anything that could be an Identity Multihash!); it's just plain a "bad penny" that keeps seeming to turn out to be leaky abstraction that generates a lot of discussion and a mysteriously large number of "tweaks" and "bugfixes" over time whenever used; and there may be a few other issues we've encountered but unfortunately not kept good notes on. None of that is unique to this PR or particularly evidenced here, but until such a time as we have a good point in our documentation to link to about this, I figure I should give at least a short sample of reasons we've become "not bullish".)
Inline CIDs have the interesting property of letting one switch codecs while still embedding data in the same block. But we're very curious if this is actually going to be used very often in practice.
In the future: I suspect it's possible that we could update this spec to describe the payload as being a kinded union, in which the resident data can be either a CID linking to another block, or be some other data like a map or list or plain string (or etc -- any regular data) that's simply in the same block. If we wanted to make this change in the future, it should be easy to do so: a schema with a kinded union here will match all of the existing data that's only got links in this position.
Because it should be so easy to make such a change that increases the range of accepted data in the future, I'm entirely fine with deferring any more discussion of this. Just wanted to make a quick mention of it, in case anyone is starting to think about this in the future and wants to know if we thought it about way back now. :)
(I think is in a similar vein to what @vmx has just discussed as well.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
("prefer to avoid them" might be a little strong; we're actively discussing this in the IPLD weekly call right now and deciding how to describe our ambivalence is itself a topic :))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the IPLD team has become what I'll call "not bullish" about the use of Identity Multihashes in new data. It's specified, and using it in this spec is fine, but we're starting to prefer to avoid them where possible.
I'm fine with removing the wording about suggesting to use them for JWS. For JWE the use of inline CIDs are essential. Otherwise you wouldn't know how to interpret the decrypted bytes.
vmx
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@warpfork already described the things we talked about in yesterdays meeting (there aren't really notes, but the recording is linked and the discussion about it starts around 20:20).
To me the conclusion is. Let's go ahead with this PR as it is. We might revisit details in a non breaking matter in the future.
Thanks everyone
|
Thanks @vmx and everyone involved. It's great having this merged 🎉 |
|
Great news! Just a quick thought however, since this codec standard has potentially far reaching consequences if IPFS is to actually take off and be used by the masses we should put A LOT of care into getting this right. |
This adds a specification for
dag-josewhich is a format for signed and encrypted JSON.The
dag-josecodec will allow JWS and JWE data structures to represent IPLD dag nodes. JWS stores the payload encoded asbase64urlwhich makes it impossible to follow links in signed data without first decoding the payload. The implementation of this codec should automatically decode the payload of JWS object so that links can be followed. JWE objects can't be read before they are encrypted, however the decrypted ciphertext should still be able to contain links. The implementation of this codec should include a decryption function that outputs an object with valid IPLD links.Please let me know if you have any questions or if anything could be clarified!