This code is not owned by EY and EY provides no warranty and disclaims any and all liability for use of this code. Users must conduct their own diligence with respect to use for their purposes and any and all usage is on an as-is basis and at your own risk.
Construct a Merkle Tree database from Ethereum logs.
Originator: iAmMichaelConnor
- Keep track of a Merkle Tree's root on-chain;
- Minimise the gas cost of adding leaves to the Tree;
- Minimise the gas cost of updating the root;
- Ensure data availability;
Leaves are submitted to a MerkleTree
smart-contract by users. Submitting multiple leaves in-bulk results in considerable gas cost savings per leaf.
Only the root
and a small frontier
of nodes is stored on-chain. New leaves are not stored on-chain; they're emitted as events.
A local merkle-tree database (off-chain) is populated with the leaves and nodes of the Tree, based on NewLeaf
events emitted by the smart-contract.
The database can then be queried, e.g. for sibling-paths in order to provide set-membership proofs.
Disclaimer: Note that this code has not yet completed a security review and therefore we strongly recommend that you do not use it in production. We take no responsibility for any loss you may incur through the use of this code.
./docker-compose.yml <-- 'startup' configuration
|
./merkle-tree/ <-- the 'main' microservice of this repo
|
config/
|
default.js <-- specify important config parameters
|
src/
|
db/ <-- database services for managing the merkle-tree mongodb
|
routes/ <-- API routes for externally accessing this microservice
|
filter-controller.js <-- Ethereum event filter
|
merkle-tree-controller.js <-- merkle-tree update & calculation controller
|
./deployer/ <-- example 'stub' microservice, demonstrating how one would use
'timber' as part of an application
|
contracts/ <-- example MerkleTree.sol contracts for efficient merkle-tree updates
The merkle-tree miscroservice requires the following software to run:
- Docker
- Node.js (tested with node 10.15.3) with npm and node-gyp.
- Xcode Command line tools:
- If running macOS, install Xcode then run
xcode-select --install
to install command line tools.
- If running macOS, install Xcode then run
Clone the merkle-tree repository and use a terminal to enter the directory.
Start Docker:
- On Mac, open Docker.app.
If you have pulled new changes from the repo, then first run:
docker-compose build
Start all microservices:
docker-compose up
It's up and running!
The logging level in Timber can be set to any of: error; warn; info; http; verbose; debug; silly (default: info), via the environment variable LOG_LEVEL
.
Submit loads of leaves to the MerkleTree.sol
smart contract. We can do this easily in another terminal window:
docker-compose run --rm deployer npx mocha --exit --require @babel/register 'test/MerkleTreeController.test.js'
will submit many leaves.
Interact with the merkle-tree microservice through its API. A postman collection (for local testing) is provided at ./merkle-tree/test/postman-collections/.
Send a post
request to http://localhost:9000/start
to start the merkle-tree's event filters for NewLeaf
and NewLeaves
events. Any 'new leaf' events will be picked up by the filters, and cause the new leaf data to be inserted into the mongodb.
You can ask Timber to start automatically by setting the environment variable AUTOSTART
to anything that is truthy. This is useful when you are deploying another Timber instance to an already-running ZKP solution. In this case, Timber will take contract (name, address, treeId) from its config file. Depending on your configuration, not all of these may be needed. Often Timber can infer the contract address from build artefacts.
If AUTOSTART
is used, it will attempt to obtain a contract address and will try to do this AUTOSTART_RETRIES
(default 20) times, every three seconds, before giving up. This is useful if the contracts are not deployed before Timber is started.
Send a patch
request to http://localhost:9000/update
. Given the leaves now stored in the mongodb, this update
command will calculate all of the intermediate nodes from the leaves to the root, and store all of these nodes in the mongodb.
Send a get
request to http://localhost:9000/siblingPath/3
. This will retrieve from the mongodb the sibling-path for the leaf with leafIndex = 3
.
If you want to close the application, make sure to stop containers and remove containers, networks, volumes, and images which were created by up
, using
docker-compose down -v
See the deployment README for unit tests.
There are several startup options which we can specify for Timber.
Some are specified in Timber's docker-compose.<...>.yml
file; others are specified in the config file mounted to app/config/default.js
.
We provide four example docker-compose.<...>.yml
files to demonstrate the four start-up options (based on CONTRACT_ORIGIN
).
Timber relies on being 'fed' details about the contracts it should subscribe to. When we POST to the /start
endpoint of Timber, it will search for contract details at the CONTRACT_ORIGIN
location we have specified.
Importantly, Timber expects smart contract interface json filesin /app/build/contracts
to have the following format:
{
abi: "...",
evm:
bytecode:
object: "0x..."
}
0x/sol-compiler
outputs JSON in this format:
{
compiledOutput:
abi: "...",
evm:
bytecode:
object: "0x..."
}
truffle compile
outputs JSON in this format:
{
abi: "...",
bytecode: "0x...."
}
Timber uses the abi
field but does not currently use bytecode
, so truffle's format works without any translation.
We use the CONTRACT_ORIGIN
environment variable to specify one of four options:
Timber will look for solc-compiled contract interface json files in /app/build/contracts
. You'll need to mount the relevant interface json files to this location.
Timber will GET
the solc-compiled contract interface json files from some external container. We call this external container 'deployer' (because, presumably it has deployed the contracts).
You will need to specify two extra environment variables:
DEPLOYER_HOST: http://<host name of deployer>
DEPLOYER_PORT: 80
Timber will get the solc-compiled contract interface json files from a mongodb collection.
If no solc-compiled json files are available, then we'll need Timber to compile the contracts itself (using solc) from solidity files. You'll need to mount the relevant solidity files (including dependencies) to app/contracts/
. Timber will then compile the .sol
files itself, and store the contract interface json files in app/build/contracts
.
For hashing 'up the merkle tree', there are currently two supported hash types:
sha
or mimc
. You can also choose which elliptic curve mimc
hashing is computed over. The 'default' option (when no CURVE
is defined in the startup options) is ALT_BN_254.
HASH_TYPE | Algorithm | Curve | Cost* (gas)) | Proving time | Constraints** |
---|---|---|---|---|---|
sha |
sha256 | n/a | ~90k | slow | ~25,000 |
mimc |
MiMC-p/p | ALT_BN_254 | ~1.4m | fast | ~740 |
mimc |
MiMC-p/p | BLS12_377 | ~1.8m | fast | ~1000 |
*cost of hashing 32 times up a merkle tree. **for a single hash of two inputs
Many options can be specified in the config file mounted to app/config/default.js
. Some important options are highlighted below:
Specify which contracts, events, and event parameters to subscribe to. E.g.:
// contracts to filter:
contracts: {
// contract name:
MerkleTreeControllerSHA: {
events: {
// filter for the following event names:
NewLeaf: {
// filter for these event parameters:
parameters: ['leafIndex', 'leafValue'],
},
NewLeaves: {
// filter for these event parameters:
parameters: ['minLeafIndex', 'leafValues'],
},
},
},
},
Note: Requests to the API must specify the contractName
in the body of the request.
By specifying the contractName
, Timber will know: which contract to interact with; and which db collection to get from, insert to, or update.
E.g. "contractName": "MerkleTreeControllerSHA"
The structure above assumes you have one tree per contract. If your application requires multiple trees per contract, you can specify a treeId
(and height) for each. This tells Timber to create a separate db collection for each id.
E.g.:
// contracts to filter:
contracts: {
// contract name:
MultipleMerkleTreesControllerSHA: {
treeId: {
a: {
treeHeight: 16,
events: {
// filter for the following event names:
NewLeafA: {
// filter for these event parameters:
parameters: ['leafIndex', 'leafValue'],
},
NewLeavesA: {
// filter for these event parameters:
parameters: ['minLeafIndex', 'leafValues'],
},
},
},
b: {
treeHeight: 10,
events: {
// filter for the following event names:
NewLeafB: {
// filter for these event parameters:
parameters: ['leafIndex', 'leafValue'],
},
NewLeavesB: {
// filter for these event parameters:
parameters: ['minLeafIndex', 'leafValues'],
},
},
},
},
},
},
Note: If the contract has multiple trees, all requests to the API must specify the treeId
in the body of the request.
E.g. { "contractName": "MultipleMerkleTreesController", "treeId": "a" }
If you submit a request to a contract with multiple trees without specifying treeId
, Timber will throw an error. This is because it does not know which tree to direct the request to.
The event names must differ between trees so Timber knows which db collection to add the new leaves to. As long as the first event adds single leaves and the second adds multiple leaves, they can have any structure you like.
E.g. in config:
events: {
// filter for the following event names:
NewLeaf: {
// filter for these event parameters adding a single leaf:
parameters: ['leafIndex', 'leafValue'],
},
NewLeaves: {
// filter for these event parameters adding multiple leaves:
parameters: ['minLeafIndex', 'leafValues'],
},
},
By specifying the contractName
and treeId
, Timber will know: which contract to interact with; and which db collection to get from, insert to, or update.
The height of the merkle tree.
Note that this height must match the height specified elsewhere in your application (e.g. the tree height in the merkle tree contract, or inside any zk-snark circuits).
If you have multiple merkle trees in one smart contract, be sure to define each tree's height under treeId
in the config.
The merkle-tree
container (or 'service') exposes several useful endpoints.
If the microservices are started with the default ./docker-compose.yml
file, these endpoints can be accessed by other containers on the same docker network through http://merkle-tree:80.
To access the merkle-tree
service from your local machine (which is not in the docker network), use http://localhost:9000 by default.
A postman collection (for local testing) is provided at ./merkle-tree/test/postman-collections/.
Note: Requests to the API must specify the contractName
in the body of the request.
By specifying the contractName
, Timber will know: which contract to interact with; and which db collection to get from, insert to, or update.
E.g. "contractName": "MerkleTreeControllerSHA"
See ./merkle-tree/src/routes
for all api routes.
For interacting 'broadly' with Timber:
Starts the event filter. Ensure to pass the relevant contractName
in the req.body. If you have multiple trees in one contract, specify which treeId
you want to start the filter for.
Updates the entire merkle-tree db.
Once started (via /start
), Timber will be listening to contract events for new leaves. These leaves will be stored in the db, but the nodes of the db (i.e. tree data 'above' the leaves) won't be updated automatically, because that would be a waste of computation (node data would be constantly overwritten with each new leaf). Any time we want to GET up-to-date node information from the db, we must first call /update
in order to update all nodes of the tree, based on the current set of leaves in the tree.
Note: when calling the /path
or /siblingPath
getters, an /update
is done automatically, so that the path being returned is up-to-date.
Get a siblingPath (an array of nodes) for a particular leafIndex. This endpoint will be used frequently, for merkle inclusion proofs.
Get a path (an array of nodes) for a particular leafIndex.
There are several /metadata/
endpoints for getting/setting data about the tree. E.g. contractAddress
, contractInterface
, latestLeaf
, latestRecalculation
.
For getting/inserting/updating information relating to leaves of the tree.
GET
/leaf/index/:leafIndex
Get a leaf by leafIndex/leaf/index
Get a leaf by leafIndex/leaf/value
Get a leaf by leafValue
Note that the latter two are not restful endpoints and the leafIndex or the leafValue should be passed in as GET query parameters {contractName, value}
POST
- WARNING: these POST requests were built for quick db testing only. In practice, insertions of leaves into Timber's merkle-tree db should only be done via the event subscriptions to MerkleTree smart contract(s). (See the
/start
endpoint for starting an event subscription). /leaf
Insert a leaf into the merkle-tree db.
- WARNING: these POST requests were built for quick db testing only. In practice, insertions of leaves into Timber's merkle-tree db should only be done via the event subscriptions to MerkleTree smart contract(s). (See the
GET
/leaves
Get information about multiple leaves at once. There are several options, specified through the request body (DEPRECATED) or via query parameters:{ "leafIndices": [10, 15, 100000, 10000000] }
Specify a selection of leaves, by their leafIndices.{ "values": ["0x1234", "hello", "0x12345678"] }
Specify a selection of leaves, by their values.{ "minIndex": 10, "maxIndex": 1000 }
Specify a range of leaves, by the lower and upper bounds of the index range.{}
Get information about all leaves.
/leaves/check
Run some broad checks on the leaves of the tree, to check for corrupted filtering, or leaf tracking./leaves/count
Get the number of leaves currently in the merkle-tree mongodb.
POST
- WARNING: these POST requests were built for quick db testing only. In practice, insertions of leaves into Timber's merkle-tree db should only be done via the event subscriptions to MerkleTree smart contract(s). (See the
/start
endpoint for starting an event subscription). /leaves
Insert multiple leaves into the merkle-tree db.
- WARNING: these POST requests were built for quick db testing only. In practice, insertions of leaves into Timber's merkle-tree db should only be done via the event subscriptions to MerkleTree smart contract(s). (See the
For getting/inserting/updating information relating to nodes of the tree. A leaf is a special type of node (in that it has no children).
GET
/node/index/:nodeIndex
Get a node by nodeIndex/node/index
Get a node by nodeIndex/node/value
Get a node by nodeValue
POST
- WARNING: these POST requests were built for quick db testing only. In practice, insertions of nodes into Timber's merkle-tree db should only be done via the
/update
endpoint. /node
Insert a node into the merkle-tree db.
- WARNING: these POST requests were built for quick db testing only. In practice, insertions of nodes into Timber's merkle-tree db should only be done via the
PATCH
- WARNING: these PATCH requests were built for quick db testing only. In practice, updates of nodes of Timber's merkle-tree db should only be done via the
/update
endpoint. /node
Update the nodes of the tree (see/update
for a more detailed explanation).
- WARNING: these PATCH requests were built for quick db testing only. In practice, updates of nodes of Timber's merkle-tree db should only be done via the
GET
/nodes
Get information about multiple nodes at once. There are several options, specified through the request body (DEPRECATED) or via query parameters:{ "nodeIndices": [10, 15, 100000, 10000000] }
Specify a selection of nodes, by their nodeIndices (can include leaves).{ "values": ["0x1234", "hello", "0x12345678"] }
Specify a selection of nodes, by their values (can include leaves).{ "minIndex": 10, "maxIndex": 1000 }
Specify a range of nodes, by the lower and upper bounds of the index range (can include leaves).{}
Get information about all nodes (includes leaves).
/nodes/count
Get the number of nodes currently in the merkle-tree mongodb (includes leaves).
POST
- WARNING: these POST requests were built for quick db testing only. In practice, insertions of nodes into Timber's merkle-tree db should only be done via the
/update
endpoint. /nodes
Insert multiple nodes into the merkle-tree db.
- WARNING: these POST requests were built for quick db testing only. In practice, insertions of nodes into Timber's merkle-tree db should only be done via the
PATCH
- WARNING: these PATCH requests were built for quick db testing only. In practice, updates of nodes of Timber's merkle-tree db should only be done via the
/update
endpoint. /nodes
Update the nodes of the tree (see/update
for a more detailed explanation).
- WARNING: these PATCH requests were built for quick db testing only. In practice, updates of nodes of Timber's merkle-tree db should only be done via the
GET
/root
Get the root of the tree.
The following gives gas cost measurements for inserting leaves into the MerkleTree.sol contract.
treeHeight = 32
- 32 sha256() hashes. We use assembly to minimise the cost of calling the sha256 precompiled contract (if HASH_TYPE = mimc, the gas cost will increase).
- 1,024 leaves inserted, one-at-a-time, from left to right.
- Notice how there is a jump in cost every
2**n
leaves, when a new level of thefrontier
is written to for the first time. - The very first transaction costs the most, due to initialising of the
leafCount
andlatestRoot
parameters. - Gas values shown include the 21KGas transaction cost.
-
treeHeight = 32
-
We explore the cost of inserting leaves in batches of varying sizes (doubling the batch size each time).
-
We insert each batch into an empty tree each time.
-
Gas values shown include the 21KGas transaction cost each time.
-
The Gas cost per leaf reduces asymptotically with batch size.
-
Batches of 128 leaves and over appear to start levelling out around 10,000 gas per leaf.
- We also explore the cost of inserting a fixed total of 512 leaves, but in batches of varying sizes (doubling the batch size each time).
- I.e.:
- 512 transactions of batches of 1 leaf
- 256 transactions of batches of 2 leaves
- 64 transactions of batches of 4 leaves ...
- 1 transaction of a batch of 512 leaves
- We begin each set of transactions with an empty tree each time.
- Gas values shown have been adjusted to exclude the 21KGas transaction cost each time. This helps hone in on the 'internal contract' gas costs, but massively understates the costs of multiple transactions of small batches (e.g. we're understating the cost of '512 transactions of batches of 1' by over 10MGas).
- The gas cost for inserting a given number of leaves reduces asymptotically as batch size increases.
- Beyond batches of 128 leaves, the gas savings begin to level out.
On-chain
The leaves of the tree are not stored on-chain, they're emitted as events.
The intermediate-nodes of the tree (between the leaves and the root) are not stored on-chain.
Only a 'frontier' is stored on-chain (see the detailed explanation below).
Off-chain
We filter the blockchain for NewLeaf
event emissions, which contain:
NewLeafEvent: {
leafIndex: 1234,
leafValue: '0xacb5678',
root: '0xdef9012'
}
We then insert each leaf into a local mongodb database.
With this database, we can reconstruct the entire merkle tree.
We consistently use the following indexing throughout the codebase:
Level Row nodeIndices frontier indices
4 0 0 [4,
/ \
3 1 1 2 3,
/ \ / \
2 2 3 4 5 6 2,
/ \ / \ / \ / \
1 3 7 8 9 10 11 12 13 14 1,
/ \ / \ / \ / \ / \ / \ / \ / \
0 4 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 0]
leafIndices: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
We start with an empty frontier = [ , , , , ]
.
The frontier will represent the right-most fixed nodeValues at each level of the tree. So frontier[0]
will be the right-most fixed nodeValue at level 0
, and so on up the tree. By 'fixed nodeValue', we mean that the nodeValue
will never again change; it is permanently fixed regardless of future leaf appends.
A user submits the 0th leaf (leafIndex = 0
) to the MerkleTree
smart contract.
We add it to leafIndex = 0
(nodeIndex = 15
) in the contract's local stack (but not to persistent storage, because we can more cheaply emit this leaf's data as an event).
Let's provide a visualisation:
// Inserting a leaf with nodeValue = '15.0' into the tree
nodeValues frontier
0 [ ,
/ \
0 0 ,
/ \ / \
0 0 0 0 ,
/ \ / \ / \ / \
0 0 0 0 0 0 0 0 ,
/ \ / \ / \ / \ / \ / \ / \ / \
15.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 <-- leafIndices
We use the unusual notation 15.0
to mean "the nodeValue
from the 0th update of nodeIndex 15
".
We now need to hash up the merkle tree to update the root. In order to do this, we need the nodeValues
of the sibling-path of leafIndex = 0
.
The 0th leaf is an easy case where the sibling-path nodes are always to the right of the leaf's path:
// Showing the sibling-path of leafIndex = 0
nodeValues frontier
0 [ ,
/ \
0 *0* ,
/ \ / \
0 *0* 0 0 ,
/ \ / \ / \ / \
0 *0* 0 0 0 0 0 0 ,
/ \ / \ / \ / \ / \ / \ / \ / \
15.0 *0* 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 <-- leafIndices
*0* = "sibling-path nodeValue"
Hashing up the tree is easy in this case; if a sibling-node is to the right, then it must never have been updated before, and hence must have nodeValue
0
.
So it's easy to update the tree:
// Hashing computation to update the root for a newly inserted leaf at leafIndex = 0
frontier nodeValue = hash ( left input , right input ) zeros
[ , 7.0 = hash ( 15.0 , 0 ) <-- 0
, 3.0 = hash ( 7.0 , 0 ) <-- 0
, 1.0 = hash ( 3.0 , 0 ) <-- 0
, 0.0 = hash ( 1.0 , 0 ) <-- 0
]
^ these are the values in our recalculated path from leafIndex = 0
We will only use the frontier
to inject sibling-nodes which are to the left of a leaf's path. More on that later.
Our updated tree can be visualised like this:
// Updating the path from leafIndex = 0 to the root:
nodeValues frontier
0.0 [ ,
/ \
1.0 0 ,
/ \ / \
3.0 0 0 0 ,
/ \ / \ / \ / \
7.0 0 0 0 0 0 0 0 ,
/ \ / \ / \ / \ / \ / \ / \ / \
15.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 <-- leafIndices
By 7.0
, we mean "the nodeValue
from the 0th update of nodeIndex 7
", etc.
Note that nothing has yet been stored in persistent storage on-chain.
Notice now, that the nodeValue
15.0
will never change in future. Notice also that when we come to insert a new leaf to leafIndex = 1
, its sister-path will include nodeValue
15.0
on its left. Therefore, when we come to update the root to include the new leaf, we will need to left-inject nodeValue
15.0
into our hashing computation.
Now the purpose of the frontier
starts to become clear. We will add nodeValue
15.0
to frontier[0]
(persistent storage), so that we can later left-inject it into our hashing computation when we come to insert leafIndex = 1
.
// Adding a nodeValue to frontier[0]
nodeValues frontier
0.0 [ ,
/ \
1.0 0 ,
/ \ / \
3.0 0 0 0 ,
/ \ / \ / \ / \
7.0 0 0 0 0 0 0 0 ,
/ \ / \ / \ / \ / \ / \ / \ / \
15.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 --> 15.0]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 <-- leafIndices
The smart contract emits the leaf value as an event NewLeaf
, which is then picked-up by timber
's event listener and added to the mongodb.
That completes the insertion of leaf 0.
Let's add some more leaves (always appending them from left to right):
A user submits the 1th leaf (leafIndex = 1
) to the MerkleTree
smart contract.
We add it to leafIndex = 1
(nodeIndex = 16
) in the contract's local stack (but not to persistent storage, because we can more cheaply emit this leaf's data as an event).
Let's provide a visualisation:
// Inserting a leaf with nodeValue = '16.0' into the tree
nodeValues frontier
0.0 [ ,
/ \
1.0 0 ,
/ \ / \
3.0 0 0 0 ,
/ \ / \ / \ / \
7.0 0 0 0 0 0 0 0 ,
/ \ / \ / \ / \ / \ / \ / \ / \
15.0 16.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15.0]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 <-- leafIndices
Note, we haven't yet recalculated the path from leafIndex = 1 (nodeValue = 16.0) to the root.
The above visualisation is misleading, because most of this data wasn't stored in persistent storage. In actual fact all the smart contract knows is:
// Data known by the smart contract (or implied, in the case of all the 0's):
nodeValues frontier
? [ ,
/ \
? 0 ,
/ \ / \
? 0 0 0 ,
/ \ / \ / \ / \
? 0 0 0 0 0 0 0 ,
/ \ / \ / \ / \ / \ / \ / \ / \
? 16.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15.0]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 <-- leafIndices
In order to insert nodeValue
16.0
into the tree and update its path, we will need the nodeValues
of the sibling-path of leafIndex = 1
, and we will also need to know whether those sibling-nodes are on the left or the right of the path up the tree:
// Hashing computation to update the root for a newly inserted leaf at leafIndex = 1
frontier nodeValue = hash ( left input , right input ) zeros
[15.0, 7.1 = hash ( ? , ? ) 0
, 3.1 = hash ( ? , ? ) 0
, 1.1 = hash ( ? , ? ) 0
, 0.1 = hash ( ? , ? ) 0
]
^ these are the values in our recalculated path from leafIndex = 1
We can actually deduce the 'left-ness' or 'right-ness' of a leaf's path up the tree from the binary representation the leaf's leafIndex:
binary leafIndices
*
/ \
>> 0 1
/ \ / \
>> 00 01 10 11
/ \ / \ / \ / \
>> 000 001 010 011 100 101 110 111
/ \ / \ / \ / \ / \ / \ / \ / \
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 <-- leafIndices
Notice how for leafIndex = 1 = 0b0001
the path is on the right, left, left, left as we work up the tree? (Associate a binary 1
with right
and a binary 0
with left
, and you'll see a pattern for the 'left-ness' or 'right-ness' of the path up the tree from a particular leafIndex):
// Path up the tree
nodeValues
root
/ \
left S
/ \ / \
left S 0 0
/ \ / \ / \ / \
left S 0 0 0 0 0 0
/ \ / \ / \ / \ / \ / \ / \ / \
S right
0b0001 = "left <- left <- left <- right"
S = "sibling-node"
Now we can hash up the tree, by injecting the sister-path to the opposing 'left, right, right, right' positions (as indicated by the arrows below):
// Hashing computation to update the root for a newly inserted leaf at leafIndex = 1
frontier nodeValue = hash ( left input , right input ) zeros
[15.0, --> 7.1 = hash ( 15.0 , 16.0 ) 0
, 3.1 = hash ( 7.1 , 0 ) <-- 0
, 1.1 = hash ( 3.1 , 0 ) <-- 0
, 0.1 = hash ( 1.1 , 0 ) <-- 0
]
^ these are the values in our recalculated path from leafIndex = 1
We can visualise the tree after updating the path (but remember the smart contract isn't actually storing anything except the frontier!):
// Inserting a leaf with nodeValue = '16.0' into the tree
nodeValues frontier
0.1 [ ,
/ \
1.1 0 ,
/ \ / \
3.1 0 0 0 ,
/ \ / \ / \ / \
7.1 0 0 0 0 0 0 0 ,
/ \ / \ / \ / \ / \ / \ / \ / \
15.0 16.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15.0]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 <-- leafIndices
Now we've udpated the tree, how do we decide which nodeValue to add to the frontier
?
We use the following algorithm to decide which index (or storage 'slot') of the frontier
to update:
// Which level of the path to store in the frontier, after adding each leafIndex:
// pseudocode:
- After adding leafIndex = l and recalculating its path to the root:
if l == 0 (mod 2) return 0; // store the 0th level of the path in frontier[0].
if l + 1 - 2**1 == 0 (mod 2**2) return 1; // store the 1st level of the path in frontier[1].
if l + 1 - 2**2 == 0 (mod 2**3) return 2; // store the 2nd level of the path in frontier[2].
if l + 1 - 2**3 == 0 (mod 2**4) return 3; // store the 3rd level of the path in frontier[3].
...
// solidity:
function getFrontierSlot(uint leafIndex) private pure returns (uint slot) {
slot = 0;
if ( leafIndex % 2 == 1 ) {
uint exp1 = 1;
uint pow1 = 2;
uint pow2 = pow1 << 1;
while (slot == 0) {
if ( (leafIndex + 1 - pow1) % pow2 == 0 ) {
slot = exp1;
} else {
pow1 = pow2;
pow2 = pow2 << 1;
exp1++;
}
}
}
}
// Which level of the path to store in the frontier, after adding each leafIndex:
level nodeIndices frontier
4 15-> [4,
/ \
3 7-> . 3,
/ \ / \
2 3-> . 11-> . 2,
/ \ / \ / \ / \
1 1-> . 5-> . 9-> . 13-> . 1,
/ \ / \ / \ / \ / \ / \ / \ / \
0 0-> . 2-> . 4-> . 6-> . 8-> . 10-> . 12-> . 14-> . 0]
leafIndices: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
/*
I.e.:
- After adding leaf 0 and recalculating its path to the root, we store the 0th
level of this path in the frontier.
- After adding leaf 1 and recalculating its path, we store the 1st level of
this path in the frontier.
...
- After adding leaf 7 and recalculating its path, we store the 3rd level of this
path in the frontier.
*/
After adding the 1st leaf and updating the root, the smart contract now has stored:
// Data known by the smart contract:
frontier = [ 15.0, 7.1, , , ];
leafCount = 2;
That's very lightweight in terms of storage costs!
We can project forward and show how the frontier
will progress as new leaves are added:
After adding the 1st leaf and updating the root, the smart contract now has stored:
// Change in the frontier after successively adding leaves to the tree:
// (recall x.y means the nodeValue of nodeIndex = x after its yth overwrite)
After adding leaf 0: frontier = [ 15.0, , , , ];
After adding leaf 1: frontier = [ 15.0, 7.1, , , ];
After adding leaf 2: frontier = [ 17.0, 7.1, , , ];
After adding leaf 3: frontier = [ 17.0, 7.1, 3.3, , ];
After adding leaf 4: frontier = [ 19.0, 7.1, 3.3, , ];
After adding leaf 5: frontier = [ 19.0, 9.1, 3.3, , ];
After adding leaf 6: frontier = [ 21.0, 9.1, 3.3, , ];
After adding leaf 7: frontier = [ 21.0, 9.1, 3.3, 1.7, ];
After adding leaf 8: frontier = [ 23.0, 9.1, 3.3, 1.7, ];
...