This repository was archived by the owner on Jan 8, 2026. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 93
exploration report on encryption in IPLD #348
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
262 changes: 262 additions & 0 deletions
262
design/history/exploration-reports/2021.01-encryption.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,262 @@ | ||
| Encryption and IPLD, 2021 | ||
| ========================= | ||
|
|
||
| This is an exploration report about the role and relationship of encryption relating to IPLD, | ||
| gathering some thoughts and recent updates in early 2021. | ||
| It's meant to be useful as a reference piece for further discussion at this time. | ||
|
|
||
| This document takes input from tons of people. | ||
| It's written by warpfork in the immediate aftermath of close conversation with Mikeal, | ||
| but also has tons of input from Carson of Textile (via the 2021.01.11 IPLD Weekly meeting), | ||
| and also draws on other notes exchanged over time with project such as the Ceramic Network, 3Box, Peergos, Qri, and others. | ||
| (Even if you don't see your name here, it's likely you've contributed something -- | ||
| this topic has just been a long time brewing, so attributing all inspirations completely is now hard!) | ||
| Thank you to all these folks for their efforts. | ||
|
|
||
|
|
||
| Overview | ||
| -------- | ||
|
|
||
| People frequently want to implement encryption as part of decentralized systems. | ||
| So, it's no surprise that it's also frequent that people want a story for how encryption and IPLD should interact. | ||
|
|
||
| For a long time, IPLD has been agnostic to any sort of encryption. | ||
| (We've been afraid of doing a _wrong_ thing and baking it into specs.) | ||
| Instead, we've asked that people using IPLD figure out how to compose IPLD and encryption on their own. | ||
| It may now be time for this to change, as we gather lots of input from the community. | ||
|
|
||
| In this document, we're going to cover three major topic groups: | ||
|
|
||
| - 1. A proposal for encryption primitives in IPLD, and a plan for how to use multicodec indicators for encryption! | ||
| - 2. A section about usage conventions we see which have repeatedly emerged, and seem useful, and thus now seem worth identifying and creating vocabulary for. | ||
| - 3. A section for gathering notes about use cases, tradeoff notes, and general cautions about general applied cryptography. | ||
|
|
||
| Comments and feedback on each of these topic groups are welcome. | ||
|
|
||
| At the end of this document, _we will still expect IPLD to be agnostic to encryption_ in the sense that you can bring your own concepts and layer them in IPLD as you please, | ||
| but we _may_ also have some ideas for cryptographic primitives we might give some extra support and coordination for. | ||
|
|
||
|
|
||
| Encryption Codecs | ||
| ----------------- | ||
|
|
||
| The IPLD team is now considering encryption which is signaled by multicodec indicators | ||
| (and thus works anywhere CIDs are used), | ||
| and works in the natural way an IPLD codec is expected to work. | ||
|
|
||
| (This is a big change in stance. | ||
| Previously, we've considered it unclear whether codecs are the right place for this.) | ||
|
|
||
| There's a couple of details about how we expect this to work which are recent realizations, | ||
| and so this document might be nearly the first description of them: | ||
|
|
||
|
|
||
| ### encryption codecs use multicodec indicators | ||
|
|
||
| As stated in the summary above: encryption will use multicodec indicators. | ||
|
|
||
| This means we'll reserve new numbers in the multicodec table. | ||
| We'll expect to see values like "AES-GCM" appear in the same table as "DAG-CBOR". | ||
|
|
||
|
|
||
| ### encryption codecs are still codecs of the usual contracts | ||
|
|
||
| Codecs which do encryption will look like regular IPLD codecs. | ||
|
|
||
| What does this mean? Well, in our recent improvements to formalizations, we now describe a codec as | ||
| the operation "decode" -- `function (rawByteStream) -> (ipldDataModelNode | error)` -- | ||
| and the operation "encode" -- `function (ipldDataModelNode, writeableBytestream) -> (error)`. | ||
| (Loosely. This is psuedocode, not any particular programming language.) | ||
|
|
||
| (Okay, what did _that_ mean? ;) ...I'll do it again in plain language.) | ||
|
|
||
| The key detail that is important for IPLD's soundness is: | ||
| the encoded data stream must be transformable to a "node" -- which must be describable _entirely_ and _purely_ by the IPLD Data Model -- | ||
| and then back again, from that "node" to an encoded data stream. | ||
|
|
||
| Okay, background established. Now: why does this matter to encryption? | ||
|
|
||
| Two reasons: | ||
|
|
||
| - that contract means *no additional parameters* are allowed. So, for encryption, it means keys don't -- *can't* -- enter into this yet. | ||
| - that contract means we always have to be able to transform the encoded form into *something*. | ||
|
|
||
|
|
||
| #### encryption codecs are defined as destructuring ciphertext | ||
|
|
||
| ... *not* as yielding cleartext. This may be unintuitive, but is important. | ||
|
|
||
| First, an example: many encryption schemes have two components in their ciphertext: | ||
| some sort of "initialization vector" (commonly known as an "IV"), which is a number unique to that ciphertext; | ||
| and then the ciphertext body itself. | ||
| So, for such an encryption scheme, the relevant IPLD codec would probably produce a _map_, matching this schema: | ||
|
|
||
| ```ipldsch | ||
| type CodecResult struct { | ||
| iv Bytes | ||
| body Bytes | ||
| } representation map | ||
| ``` | ||
|
|
||
| (The actual serial form may look like anything it wants (likely, some binary length-prefixed format), | ||
| because that's the responsibility of the codec implementation to define; | ||
| this small schema just describes the Data Model view we might expect to be yielded.) | ||
|
|
||
| This is neat in several ways: | ||
|
|
||
| - It means that processing the data into Data Model is always *defined* -- even if you don't have key material. | ||
| - This in turn means IPLD Selectors, and all sorts of other stuff, *work normally* over encrypted data. | ||
| (Not over the cleartext, obviously -- then the encryption wouldn't be doing much, would it? | ||
| But their operation is *defined*, so they can be used safely and predictably.) | ||
| - It means we have a way to access the ciphertext. | ||
| - ... That may not sound like a big deal, but it's been a weird and interesting buggaboo in a lot of other previous proposals about how to fit encryption into IPLD. | ||
| - It means we don't have to solve the problem of how to get key material into a codec. | ||
| - This is a big deal because it means, well, a bunch of our abstraction layers in IPLD don't... uh, shatter. Good. | ||
|
|
||
| Okay, but how do we get to cleartext then? Let us proceed to the next section! | ||
|
|
||
| #### getting to cleartext when using encryption codecs involves feature detection | ||
|
|
||
| Encryption codecs in IPLD libraries will have extra methods on them, and support some kind of "feature detection" to advertise this. | ||
| Those additional methods will accept key material as a parameter, and return an IPLD Data Model Node... of the *cleartext*. | ||
| (E.g., the "node" returned here, and the "node" returned by the codec alone, will be *very* different data.) | ||
|
|
||
| How exactly this looks will vary by langauge and library implementation; | ||
| different languages will have different idioms for doing feature detection. | ||
|
|
||
|
|
||
| ### key management is out of band | ||
|
|
||
| Keys still need to be supplied to the encrypt and decrypt methods of an encryption codec when they are used. | ||
| This key supply and management is something that must be handled "out of band". | ||
|
|
||
| We don't have a total strategy for automatic application of keys in large graphs. | ||
| And we probably won't, either. | ||
| We expect that most applications using cryptography will have some key management strategy that is unique to them, | ||
| and will probably _not_ want their IPLD library dictating anything about key management. | ||
| (For example, many complex applications using cryptography may involve key derivation strategies, | ||
| which can even be content or data-organization aware -- we cannot possibly specify such things in IPLD; we need to just accept instructions on that.) | ||
|
|
||
| IPLD will be open to future work on library functions for how to handle key management in practice. | ||
| If we can find sufficiently common patterns, they may be worth library features. | ||
| However, we should be comfortable understanding that there may actually not be single answers to key management, | ||
| and the number of features relating to it that belong in IPLD libraries might be correspondingly minimal. | ||
|
|
||
|
|
||
| ### encryption codecs can be used recursively | ||
|
|
||
| TODO (this emerges fairly naturally but deserves comment and example) | ||
|
|
||
|
|
||
| ### limitations of this approach | ||
|
|
||
| #### double hashing | ||
|
|
||
| This approach is roughly "mac-then-encrypt-then-mac" (if you're from the era of crypto education which called things "MAC" rather than "MIC" (which would make much more sense (but, I digress))). | ||
|
|
||
| In other words: we hash things twice, and one of the hashes ends up in the output data body (because it's inside the ciphertext). | ||
|
|
||
| There's nothing wrong with this (it's certainly cryptographically sound!); it's just slightly excessive and does spend a few bytes. | ||
|
|
||
| #### selectors don't work over cleartext | ||
|
|
||
| Because these codecs don't immediately yield cleartext, selectors applied to data yielded from these codecs won't be working on the cleartext either. | ||
|
|
||
| _However_, this isn't necessarily a big problem, and we actually have a good remedy available: | ||
| _ADLs could still be composed with this approach._ | ||
| An ADL could handle the key management issues, | ||
| use the codec which is only yielding ciphertext internally (e.g. these nodes would be the ADL "substrate"), | ||
| apply the decryption, and then yield the cleartext as the ADL's output. | ||
| Traversals, selectors, and all the other goodies that are expected to work on IPLD Data Model Nodes could then continue to work upon this data. | ||
| (The key management problem has merely been pushed around, arguably, but critically, it's been pushed out of the area from where multicodec constraints made it unsolvable.) | ||
|
|
||
|
|
||
|
|
||
| Conventions and Usage Patterns around Encryption in IPLD | ||
| --------------------------------------------------------- | ||
|
|
||
| General notice: there are not single solutions to how to compose crypto systems. | ||
| Many tradeoffs exist in design of applications using encryption. | ||
| In some situations, metadata and size and access pattern concealment don't matter; | ||
| in others, they're critically important, and an infinite amount of performance penalty is an acceptable trade. | ||
| We can't make these decisions for applications. | ||
| In this document, we'll limit our scope to talking about patterns that we've seen, | ||
| and building some vocabulary around them, and sharing the ideas that seem to have good results. | ||
|
|
||
|
|
||
| ### Desirable traits | ||
|
|
||
| Some frequently identified desires when working with encrypted data include: | ||
|
|
||
| - ability to use "pinning" services without special integrations or disclosure of key material | ||
| - ability to tersely identify subtrees, e.g. for purposes such as network transfer | ||
|
|
||
| These are things which are well-provided for when using IPLD without encryption, | ||
| but require some additional design when using IPLD with encryption, since the link structure of documents is generally itself encrypted. | ||
|
|
||
| Mind: these goals are complicated: if they didn't require information that is _encrypted_, they wouldn't be worth special mention in the first place. | ||
| It's very important to be sure you also consider the [cryptography caveats](#introduction-to-cryptography-caveats) when working with these goals. | ||
|
|
||
|
|
||
| ### Pattern: Cleartext Manifest over CIDs of Encrypted Data | ||
|
|
||
| Key concepts: | ||
|
|
||
| - All content is encrypted at block level (using the systems described in the [Encryption Codecs](#encryption-codecs) section) (so, we have a set of CIDs, all of which have a multicodec indicator that indicates some kind of encryption codec). | ||
| - We still want to be able to pin the whole set, or fetch the whole set using one query. | ||
|
|
||
| The solution to this is pretty clear: we want some merkle tree of cleartext IPLD objects, and that tree should just link to the encrypted data CIDs. | ||
|
|
||
| #### manifest tree structure can be any form | ||
|
|
||
| An interesting trait of the manifest pattern is that to provide its key benefits -- | ||
| e.g. being able to refer to the whole set of data at once -- | ||
| it doesn't actually _matter_ exactly _what_ tree structure or layout algorithm is used. | ||
|
|
||
| HAMTs or Chunky Trees can both be used; or for small enough data, a plain map in a single block. | ||
| Anything that reaches the goals works; there's little or no need to standardize on this. | ||
|
|
||
|
|
||
|
|
||
| introduction to cryptography caveats | ||
| ------------------------------------ | ||
|
|
||
| Designing cryptographic systems is _tricky_ -- to put it mildly. | ||
|
|
||
| We can't always offer complete systems and complete guidance to cryptographic work. | ||
| What we can do in IPLD is offer some components and, sometimes, some patterns of suggested use. | ||
| How to put those things together (and do so safely) is still fundamentally the responsibility of the application developer. | ||
|
|
||
| We also can't provide a complete introduction and set of coursework on how to compose cryptographic systems! | ||
| Those are educational resources you'll need to find access to elsewhere if you haven't gotten it already. | ||
|
|
||
| With all those caveats made, though, we'd like to offer a few pointers to topics you should at least be aware of. | ||
| These topics are also especially relevant to the combination of encryption and IPLD because of how they involve tradeoffs | ||
| (and, some of those tradeoffs are things that inform *why* we don't move certain kinds of features into IPLD specs -- it's because there's more than one way to go about it). | ||
|
|
||
| ### access patterns of ciphertext can leak hints about the cleartext | ||
|
|
||
| // more description of this would be welcome | ||
|
|
||
| ### size of ciphertext may leak hints about the cleartext | ||
|
|
||
| // more description of this would be welcome | ||
|
|
||
| ### these are example headings, not an exhaustive list | ||
|
|
||
| // it's unclear how much we should offer a primer in cryptography | ||
|
|
||
|
|
||
|
|
||
| Postscript: What Actually Happened | ||
| ---------------------------------- | ||
|
|
||
| The conversation about encryption and its relationship to IPLD is probably still not finished | ||
| (but this document is, because as an exploration report, at some point, we call it done; and if the conversation continues, it'll be with a new document). | ||
|
|
||
| Encryption discussion is still ongoing in PRs: | ||
| in particular, in https://github.com/ipld/specs/pull/349#issuecomment-763901167 it seems we may be backing away from making multicodec indicators do double-duty, | ||
| and instead using a single multicodec indicator to describe a codec that handles the ciphertext in a standard way, | ||
| then creating a new numeric 'code' field for indicating which cipher mechanism is used, and putting that 'code' field in the codec that's handling the ciphertext. | ||
|
|
||
| It will probably remain the case that there will be more than one way to go about encryption when working with IPLD. | ||
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this section might want to be updated - the current approach is to have a single new
ipldcodec which encodes the cipher as a multicodec integer, so we have one new codec (for now) and a bunch of ciphers in the table.