From 4d89ae04fbde4b6eae0e0a0b0a3bbc743d22e56d Mon Sep 17 00:00:00 2001
From: dignifiedquire
Date: Tue, 2 Jul 2019 13:41:59 +0200
Subject: [PATCH 1/3] feat: initial description of AMT

---
 schema-layer/data-structures/amt.md | 130 ++++++++++++++++++++++++++++
 1 file changed, 130 insertions(+)
 create mode 100644 schema-layer/data-structures/amt.md

diff --git a/schema-layer/data-structures/amt.md b/schema-layer/data-structures/amt.md
new file mode 100644
index 00000000..a15ffb3f
--- /dev/null
+++ b/schema-layer/data-structures/amt.md
@@ -0,0 +1,130 @@
+# Specification: ArrayMappedTrie
+
+**Status: Draft**
+
+* [Introduction](#Introduction)
+* [Useful references](#Useful-references)
+* [Summary](#Summary)
+* [Structure](#Structure)
+  * [Constants](#Constants)
+  * [Parameters](#Parameters)
+  * [Node properties](#Node-properties)
+  * [Schema](#Schema)
+* [Algorithm in detail](#Algorithm-in-detail)
+  * [`Get(index)`](#Getindex)
+  * [`Expand()`](#Expand)
+  * [`Set(index, value)`](#Setindex-value)
+  * [`Delete(index)`](#Deleteindex)
+  * [`Keys()`, `Values()` and `Entries()`](#Keys-Values-and-Entries)
+
+## Introduction
+
+The `SectorSet` is an integer set implemented with a simple array mapped tree.
+Integer indexes range from 0 to infinity (TODO practical bounds / bounds implied by encoding?).
+
+## Structure
+
+### Constants
+
+- `S = 256`
+
+### Parameters
+
+- `bitWidth`
+- `maxDepth`
+
+### Node Properties
+
+### Schema
+
+```sh
+# Root node layout
+type AmtRoot struct {
+  bitWidth UInt
+  maxDepth UInt
+  map Bytes
+  data [ Element ]
+}
+
+# Non-root node layout
+type AmtNode struct {
+  map Bytes
+  data [ Element ]
+}
+
+type Element union {
+  | Link link
+  | Value value
+} representation keyed
+
+type Value union {
+  | Bool bool
+  | String string
+  | Bytes bytes
+  | Int int
+  | Float float
+  | Map map
+  | List list
+  | Link link
+} representation kinded
+```
+
+## Algorithm in detail
+
+### `Get(index)`
+
+Lookup takes in an integer sectorID and returns a LeafNode value if this index
+is stored in the SectorSet. Each node has a `height`, a node's child has a
+`height` one less than its own height and the first node has a `height` of the
+root node's max depth. Leaf nodes have a height of 1.
+
+At each node the next child is chosen by examining the index and determining
+which ordered subtree the index fits into. This can be calculated by taking
+the quotient `index / S^(h - 1)`. The index for the recursive search on the
+child node is set to the remainder `index % S^(h-1)`
+
+1. Return `RecursiveGet(index, currentHeight, rootNode)`
+
+#### `RecursiveGet(index, currentHeight, currentNode)`
+
+1. Set `childRange` to `S`<sup>`currentHeight - 1`</sup>
+2. Set `elementIndex` to `index / childRange`
+3. If `currentHeight` is equal to `1`, return `currentNode.data[elementIndex]`
+4. Return `RecursiveGet(index % childRange, currentHeight - 1, currentNode.data[elementIndex])`
+
+### `Set(index, value)`
+
+First Expand the tree as needed given the input value.
+
+Now run the Lookup traversal. If the traversal leads to a node at the max depth
+(height of 1), then set the `Value` field at `index % childRange` to the insert value.
+
+If the traversal needs to resolve a pointer link but that link does not exist,
+then create the remaining necessary nodes, update them to point to a path
+of nodes until reaching the leaf node and set the node's pointer Value at
+`index % childRange` to the insert value.
+
+#### `Expand()`
+
+As the `SectorSet` grows it becomes necessary to expand the tree to insert
+values with higher indexes. When given an index `b` that exceeds the tree's
+capacity, the `SectorSet` adds enough parent nodes to the node pointed to by
+the root that the `SectorSet` has capacity for its existing indices and `b`.
+Pointers are then updated in these new nodes so that there is a path from
+the new node with the highest height to the existing node pointed to by root.
+Finally the root node updates to point to the node with highest height.
+
+### `Delete(index)`
+
+Run the Lookup traversal. If the value is found delete its value from the
+`Pointers` array. If the `Pointers` array is empty after this deletion then
+update the parent pointer to have a nil link. Continue checking if parents
+are empty of links and removing until reaching a parent that is not empty.
+
+### `Keys(), Values() and Entries()`
+
+The storage allows for efficient in-order traversal, so these must be implemented.
+
+- `Keys()`: returns an in-order iterator over all `keys`.
+- `Values()`: returns an in-order iterator over all `values`.
+- `Entries()`: returns an in-order iterator over all `(key, value)` pairs.
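Reviewer sketch (not part of the patch): the `Expand()` description above can be read as "wrap the current root in new parents until the capacity exceeds the index". The Python below is a minimal illustration under stated assumptions: nodes are plain lists with `s` slots, capacity at a given height is taken to be `s ** height` (which follows from the quotient rule used by `Get`), and the names `required_height` and `expand` are hypothetical.

```python
from typing import List, Tuple


def required_height(index: int, s: int, height: int = 1) -> int:
    """Smallest height >= `height` whose capacity s**height exceeds `index`."""
    while index >= s ** height:
        height += 1
    return height


def expand(root: List, height: int, index: int, s: int) -> Tuple[List, int]:
    """Wrap `root` in new parent nodes until `index` fits.

    Assumption: a node is a list of `s` slots. The old root always goes in
    slot 0, because every existing index is below the old capacity and its
    quotient by that capacity is therefore 0.
    """
    target = required_height(index, s, height)
    while height < target:
        parent = [None] * s
        parent[0] = root
        root = parent
        height += 1
    return root, height


leaf = ["a", None, None, "d"]          # height-1 node holding indices 0..3
new_root, new_height = expand(leaf, 1, 20, 4)
```

With `s = 4`, index 20 needs capacity 64, so two parents are added and the original leaf ends up at `new_root[0][0]`.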
From d6b8ac239f53417d3646fccfb2ee8304d16577a2 Mon Sep 17 00:00:00 2001
From: dignifiedquire
Date: Tue, 2 Jul 2019 13:54:31 +0200
Subject: [PATCH 2/3] fixup

---
 schema-layer/data-structures/amt.md | 37 +++++++++++++++--------------
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/schema-layer/data-structures/amt.md b/schema-layer/data-structures/amt.md
index a15ffb3f..ec1446bb 100644
--- a/schema-layer/data-structures/amt.md
+++ b/schema-layer/data-structures/amt.md
@@ -19,8 +19,10 @@
 
 ## Introduction
 
-The `SectorSet` is an integer set implemented with a simple array mapped tree.
-Integer indexes range from 0 to infinity (TODO practical bounds / bounds implied by encoding?).
+The `AMT` is an array mapped trie, used to efficiently represent sparse sets of data. It is used in
+IPLD by specifying `{UInt:}`. So the keys must be unsigned integers and the values can be
+any type. The keys are interpreted as the indices of the values.
+
 
 ## Structure
 
@@ -73,15 +75,15 @@ type Value union {
 
 ### `Get(index)`
 
-Lookup takes in an integer sectorID and returns a LeafNode value if this index
-is stored in the SectorSet. Each node has a `height`, a node's child has a
-`height` one less than its own height and the first node has a `height` of the
-root node's max depth. Leaf nodes have a height of 1.
+`Get` takes in a `UInt` and returns a `Value` if this index is stored. Otherwise, it returns an empty value (as appropriate for the implementation platform).
+
+Each node has a `height`, a node's child has a `height` one less than its own height and the first
+node has a `height` of the root node's max depth. Leaf nodes have a height of 1.
 
 At each node the next child is chosen by examining the index and determining
-which ordered subtree the index fits into. This can be calculated by taking
-the quotient `index / S^(h - 1)`. The index for the recursive search on the
-child node is set to the remainder `index % S^(h-1)`
+which ordered subtree the index fits into.
This can be calculated by taking
+the quotient `index / S`<sup>`(h - 1)`</sup>. The index for the recursive search on the
+child node is set to the remainder `index % S`<sup>`(h-1)`</sup>
 
 1. Return `RecursiveGet(index, currentHeight, rootNode)`
 
@@ -96,7 +98,7 @@ child node is set to the remainder `index % S^(h-1)`
 
 First Expand the tree as needed given the input value.
 
-Now run the Lookup traversal. If the traversal leads to a node at the max depth
+Now run the `Get(index)` traversal. If the traversal leads to a node at the max depth
 (height of 1), then set the `Value` field at `index % childRange` to the insert value.
 
 If the traversal needs to resolve a pointer link but that link does not exist,
@@ -106,20 +108,19 @@ of nodes until reaching the leaf node and set the node's pointer Value at
 
 #### `Expand()`
 
-As the `SectorSet` grows it becomes necessary to expand the tree to insert
-values with higher indexes. When given an index `b` that exceeds the tree's
-capacity, the `SectorSet` adds enough parent nodes to the node pointed to by
-the root that the `SectorSet` has capacity for its existing indices and `b`.
+As the `AMT` grows it becomes necessary to expand the tree to insert
+values with higher indexes. When given an index `b` that exceeds the tree's
+capacity, the `AMT` adds enough parent nodes to the node pointed to by
+the root that the `AMT` has capacity for its existing indices and `b`.
 Pointers are then updated in these new nodes so that there is a path from
 the new node with the highest height to the existing node pointed to by root.
 Finally the root node updates to point to the node with highest height.
 
 ### `Delete(index)`
 
-Run the Lookup traversal. If the value is found delete its value from the
-`Pointers` array. If the `Pointers` array is empty after this deletion then
-update the parent pointer to have a nil link. Continue checking if parents
-are empty of links and removing until reaching a parent that is not empty.
+1. Run the `Get` traversal.
+2.
If the value is found, delete it from the `data` list.
+  2.1. If the `data` list is now empty, prune links until a non-empty `data` entry is reached.
 
 ### `Keys(), Values() and Entries()`
 
From b1bfa061a1abb109608845637550841be4d3ad66 Mon Sep 17 00:00:00 2001
From: dignifiedquire
Date: Tue, 2 Jul 2019 14:02:00 +0200
Subject: [PATCH 3/3] reanmes

---
 schema-layer/data-structures/amt.md | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/schema-layer/data-structures/amt.md b/schema-layer/data-structures/amt.md
index ec1446bb..96f41ff4 100644
--- a/schema-layer/data-structures/amt.md
+++ b/schema-layer/data-structures/amt.md
@@ -26,13 +26,10 @@ any type. The keys are interpreted as the indices of the values.
 
 ## Structure
 
-### Constants
-
-- `S = 256`
-
 ### Parameters
 
-- `bitWidth`
+- `s`
+- `width`
 - `maxDepth`
 
 ### Node Properties
@@ -42,7 +39,8 @@ any type. The keys are interpreted as the indices of the values.
 ```sh
 # Root node layout
 type AmtRoot struct {
-  bitWidth UInt
+  s UInt
+  width UInt
   maxDepth UInt
   map Bytes
   data [ Element ]
@@ -82,14 +80,14 @@ node has a `height` of the root node's max depth. Leaf nodes have a height of 1.
 
 At each node the next child is chosen by examining the index and determining
 which ordered subtree the index fits into. This can be calculated by taking
-the quotient `index / S`<sup>`(h - 1)`</sup>. The index for the recursive search on the
-child node is set to the remainder `index % S`<sup>`(h-1)`</sup>
+the quotient `index / s`<sup>`(h - 1)`</sup>. The index for the recursive search on the
+child node is set to the remainder `index % s`<sup>`(h-1)`</sup>
 
 1. Return `RecursiveGet(index, currentHeight, rootNode)`
 
 #### `RecursiveGet(index, currentHeight, currentNode)`
 
-1. Set `childRange` to `S`<sup>`currentHeight - 1`</sup>
+1. Set `childRange` to `s`<sup>`currentHeight - 1`</sup>
 2. Set `elementIndex` to `index / childRange`
 3. If `currentHeight` is equal to `1`, return `currentNode.data[elementIndex]`
 4.
Return `RecursiveGet(index % childRange, currentHeight - 1, currentNode.data[elementIndex])`
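
Reviewer sketch (not part of the patch): the four `RecursiveGet` steps can be traced with a small Python function. This is an illustration under assumptions, not the specified encoding: nodes are plain lists of `s` slots with `None` standing in for a missing link or value, the `map` bitfield is ignored, and the argument order follows the declared signature `RecursiveGet(index, currentHeight, currentNode)`.

```python
from typing import Any, List, Optional


def recursive_get(index: int, height: int, node: Optional[List], s: int) -> Any:
    """Return the value stored at `index`, or None if it is absent."""
    if node is None:
        return None                       # missing link: nothing stored here
    child_range = s ** (height - 1)       # step 1: indices covered by each slot
    element_index = index // child_range  # step 2: slot to follow
    if element_index >= len(node):
        return None                       # index exceeds this tree's capacity
    if height == 1:
        return node[element_index]        # step 3: leaf, return the value
    # step 4: recurse, using the remainder as the index within the child
    return recursive_get(index % child_range, height - 1, node[element_index], s)


# A height-2 tree with s = 4: slot 0 covers indices 0..3, slot 2 covers 8..11.
leaf0 = ["a", None, None, "d"]
leaf2 = [None, "j", None, None]
root = [leaf0, None, leaf2, None]
value_at_9 = recursive_get(9, 2, root, 4)
```

For index 9 the quotient `9 // 4` selects slot 2 and the remainder `9 % 4` selects position 1 in that leaf, so the lookup reaches `"j"`.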