This repository was archived by the owner on Jan 8, 2026. It is now read-only.
hashmap: differentiate serialization of string and byte keys #192
This is an alternative to both #180 and #184; I'd like to retire those discussions.
The current state of keys in the HashMap spec: the algorithm accepts both `string` and `bytes` as keys. They are hashed as `bytes` for the purpose of indexing (this is implied but not explicitly stated by the current form of the spec), and for serialisation into block form they are stored as `bytes` regardless of whether you provide `string` or `bytes`.

The primary problem with this approach is that we lose the ability to differentiate when we deserialise. You require context to know whether keys should be used as `bytes` or converted back into `string`. The algorithm has to be agnostic to this, so it ends up getting pushed up the application stack. In naive usage, where you don't have much context or haven't brought that context along for the ride (perhaps you're inspecting objects through the IPLD explorer), you just get byte arrays, even if you were using them as strings. I believe it's fair to say that common usage of this data structure will be as it is in most programming languages: string keys. So being able to differentiate would be nice.

The proposed solution here is to (1) explicitly allow both `string` and `bytes` in the spec, (2) define some basic rules for how these should be consistently hashed, and (3) serialise them in their original form. So on the block, a `string` key would be stored as a `string`, and a byte array provided as a key would be stored as `bytes`.

Minor complications exist if you use a HashMap with both `string` and `bytes` keys. I don't expect this will happen much in reality, particularly in typed languages, where you should have a consistent interface (especially if such interfaces are defined through schemas, where you'd hopefully do something like `type MyMap { String : Foo } representation advanced HashMap`; there's your context). Implementations have to do some awkward things: sorting buckets of mixed types requires a bit of care, and checking for the existence of a key also requires care, because the same key could be provided as `bytes` or `string` and the hash would be the same, but you have to make sure that "does this already exist?" works properly. IMO these should be left to the implementation for now, and implementations should probably also carry suggestions against mixed types, which I'm doing here: https://github.com/rvagg/iamap/pull/8/files#diff-04c6e90faac2675aa89e2176d2eec7d8R244
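
To make the mixed-key concern concrete, here is a rough Python sketch of one way an implementation might handle it. It is not taken from the spec or from iamap. The helper names (`key_bytes`, `key_hash`, `same_key`, `bucket_set`) are hypothetical, UTF-8 encoding of string keys and SHA-256 as the hash function are assumptions, and treating `"foo"` and `b"foo"` as the same key is only one possible policy, since this PR leaves that choice to implementations.

```python
import hashlib
from typing import Any, List, Tuple, Union

Key = Union[str, bytes]
Entry = Tuple[Key, Any]


def key_bytes(key: Key) -> bytes:
    """Byte form used for hashing and comparison.

    Assumption: string keys are encoded as UTF-8 before hashing.
    """
    if isinstance(key, str):
        return key.encode("utf-8")
    if isinstance(key, (bytes, bytearray)):
        return bytes(key)
    raise TypeError("key must be str or bytes")


def key_hash(key: Key) -> bytes:
    # SHA-256 is a stand-in here; the real hash function is a HashMap parameter.
    return hashlib.sha256(key_bytes(key)).digest()


def same_key(a: Key, b: Key) -> bool:
    # One possible policy: keys are equal when their byte forms are equal,
    # so "foo" and b"foo" refer to the same entry.
    return key_bytes(a) == key_bytes(b)


def bucket_set(bucket: List[Entry], key: Key, value: Any) -> List[Entry]:
    """Return a new bucket with `key` set, sorted by the byte form of each key.

    The key is stored in the type it was provided in, so a string stays a
    string and bytes stay bytes when the bucket is serialised.
    """
    kept = [(k, v) for k, v in bucket if not same_key(k, key)]
    kept.append((key, value))
    return sorted(kept, key=lambda entry: key_bytes(entry[0]))
```

With this policy, `key_hash("foo")` and `key_hash(b"foo")` are identical, and setting `b"foo"` in a bucket that already holds `"foo"` replaces the existing entry rather than adding a second one. An implementation that wants to keep the two apart would need a different equality rule and a tie-break on type when sorting.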