Merkle Patricia Tree

Merkle Patricia Tree

Introduction

The Merkle Patricia Trie (MPT) is a fundamental data structure used in Ethereum for organizing and storing key-value pairs. It combines properties of both a Merkle Tree and a Patricia Trie to provide a cryptographically secure and efficient way to manage data.

Why Ethereum uses MPT?

Data Integrity and Security

The MPT uses cryptographic hashing to generate a root representing the whole data stored in the tree to ensure data integrity. Any modification in the data results in a change in the root hash, making tampering easily detectable.

Proof

The MPT enables efficient generation of proofs for the inclusion or exclusion of data. This is crucial for verifying transactions and states without having to download the entire blockchain.

Efficiency

The Patricia Trie component of the MPT optimizes for efficient storage and retrieval of data by compressing paths. This reduces the amount of data that needs to be stored and transmitted.
The MPT's structure supports efficient scaling as the blockchain grows, handling large volumes of data and frequent updates without significant performance degradation.

Consistency and Determinism

The MPT provides a deterministic structure, meaning the same set of data will always produce the same MPT. This is important for ensuring all nodes in the network have a consistent view of the blockchain state so that they can reach consensus.

Key Features

  1. Merkle Tree Properties: The MPT utilizes cryptographic hashing to ensure the integrity and authenticity of data. Each node in the tree contains a hash of its children, leading to a single root hash that represents the entire dataset. This root hash changes whenever any part of the data changes, making it easy to verify the integrity of the entire data set.
  1. Patricia Trie Properties: The Patricia Trie aspect of MPT optimizes storage by compressing paths of nodes with only one child. This reduces the overall size of the tree and speeds up the retrieval of data.

How it Works

  1. Key Representation: Keys in MPT are converted into a hexadecimal format and split into nibbles (4-bit units). This splitting is crucial for the traversal of the trie.
  1. Nodes: The MPT is composed of several types of nodes:
      • Branch Nodes: Have up to 16 children and store key-value pairs.
      • Leaf Nodes: Store the actual data at the end of a key's path.
      • Extension Nodes: Compress long paths of nodes with a single child.
  1. Root Hash: The root of the MPT is a single hash that represents the entire state of the trie. Any change in the underlying data results in a different root hash, providing a secure way to verify data integrity.
  1. State Trie: In Ethereum, the MPT is used to store the state trie, which includes all account balances, contract storage, and other data relevant to the current state of the blockchain.

Benefits

  1. Efficiency: The use of Patricia Trie reduces the storage space required and speeds up data retrieval.
  1. Security: The Merkle Tree structure ensures data integrity and enables efficient proof of inclusion or exclusion.
  1. Scalability: The MPT allows for efficient updates and scaling of the blockchain as new transactions are added.
  1. Provability: The Merkle Tree structure enables efficient generation of proofs for the inclusion or exclusion of data. The proof verification process is also fast.

Node

notion image

Short Node (shortNode):

  • Structure: A short node consists of a key (a sequence of nibbles) and a single value node.
  • Purpose: Used to represent a path that consists of a sequence of nibbles that do not branch out into multiple paths. This node type helps in compressing the trie by collapsing non-branching paths into a single node.
Fields:
  • Key: The sequence of nibbles that represents the path.
  • Val: The child node that the key leads to.
  • Flags: Metadata about the node, such as whether it is dirty (i.e., has been modified).
Example:
shortNode { Key: [1, 2, 3], Val: valueNode }

Full Node (fullNode):

Structure: A full node contains an array of 17 children nodes.
Purpose: Used to represent a node that can branch out into multiple paths. Each child node corresponds to a possible nibble (0-15), with the 17th child (index 16) being a potential value node.
Fields:
  • Children: An array of 17 child nodes.
  • Flags: Metadata about the node.

Value Node (valueNode):

Structure: A value node contains the actual data stored at the end of a path in the trie.
Purpose: Represents the value associated with a key in the trie.
Fields:
  • A byte slice representing the value.
Example:
valueNode: [value]

Hash Node (hashNode):

Structure: A hash node is a reference to a node that is stored elsewhere, identified by its hash.
Purpose: Used to reference nodes that are not immediately available in memory, allowing for efficient storage and retrieval.
Fields:
  • A byte slice representing the hash of the node.
Example:
hashNode: [hash]

Code Analysis

New Trie

trie.NewEmpty creates an empty Trie. db is the database used to store data.
/// go-ethereum/trie/trie.go // NewEmpty is a shortcut to create empty tree. It's mostly used in tests. func NewEmpty(db database.Database) *Trie { tr, _ := New(TrieID(types.EmptyRootHash), db) return tr }
 
TrieID is the identifier of Trie. In the construction of Trie, root is the hash representation of the Trie which can be used to fetch the whole Trie data from database.
/// go-ethereum/trie/trie_id.go // TrieID constructs an identifier for a standard trie(not a second-layer trie) // with provided root. It's mostly used in tests and some other tries like CHT trie. func TrieID(root common.Hash) *ID { return &ID{ StateRoot: root, Owner: common.Hash{}, Root: root, } } /// go-ethereum/common/types.go // Lengths of hashes and addresses in bytes. const ( // HashLength is the expected length of the hash HashLength = 32 // AddressLength is the expected length of the address AddressLength = 20 ) // Hash represents the 32 byte Keccak256 hash of arbitrary data. type Hash [HashLength]byte
 
New constructs Trie instance using Trie ID and database.
  • calls newTrieReader to fetch trie reader of the Trie. reader is the handler trie can retrieve nodes from.
  • if the root of Trie is not empty node, then it will call trie.resolveAndTrack to load the root node using trie reader.
  • return Trie instance.
/// go-ethereum/trie/trie.go // Trie is a Merkle Patricia Trie. Use New to create a trie that sits on // top of a database. Whenever trie performs a commit operation, the generated // nodes will be gathered and returned in a set. Once the trie is committed, // it's not usable anymore. Callers have to re-create the trie with new root // based on the updated trie database. // // Trie is not safe for concurrent use. type Trie struct { root node owner common.Hash // Flag whether the commit operation is already performed. If so the // trie is not usable(latest states is invisible). committed bool // Keep track of the number leaves which have been inserted since the last // hashing operation. This number will not directly map to the number of // actually unhashed nodes. unhashed int // reader is the handler trie can retrieve nodes from. reader *trieReader // tracer is the tool to track the trie changes. // It will be reset after each commit operation. tracer *tracer } // New creates the trie instance with provided trie id and the read-only // database. The state specified by trie id must be available, otherwise // an error will be returned. The trie root specified by trie id can be // zero hash or the sha3 hash of an empty string, then trie is initially // empty, otherwise, the root node must be present in database or returns // a MissingNodeError if not. func New(id *ID, db database.Database) (*Trie, error) { reader, err := newTrieReader(id.StateRoot, id.Owner, db) if err != nil { return nil, err } trie := &Trie{ owner: id.Owner, reader: reader, tracer: newTracer(), } if id.Root != (common.Hash{}) && id.Root != types.EmptyRootHash { rootnode, err := trie.resolveAndTrack(id.Root[:], nil) if err != nil { return nil, err } trie.root = rootnode } return trie, nil }
 
newTrieReader generates trie reader which can be used to fetch node of the Trie.
If the stateRoot is EmptyRootHash, which means that the Trie is an empty Trie, then there is no need for reader. Otherwise it will call db.Reader to construct reader for the Trie.
/// go-ethereum/trie/trie_reader.go // trieReader is a wrapper of the underlying node reader. It's not safe // for concurrent usage. type trieReader struct { owner common.Hash reader database.Reader banned map[string]struct{} // Marker to prevent node from being accessed, for tests } // newTrieReader initializes the trie reader with the given node reader. func newTrieReader(stateRoot, owner common.Hash, db database.Database) (*trieReader, error) { if stateRoot == (common.Hash{}) || stateRoot == types.EmptyRootHash { if stateRoot == (common.Hash{}) { log.Error("Zero state root hash!") } return &trieReader{owner: owner}, nil } reader, err := db.Reader(stateRoot) if err != nil { return nil, &MissingNodeError{Owner: owner, NodeHash: stateRoot, err: err} } return &trieReader{owner: owner, reader: reader}, nil }
 
resolveAndTrack uses trie reader to load and decode node, and calls tracer.onRead to record the load operation.
/// go-ethereum/trie/trie.go // resolveAndTrack loads node from the underlying store with the given node hash // and path prefix and also tracks the loaded node blob in tracer treated as the // node's original value. The rlp-encoded blob is preferred to be loaded from // database because it's easy to decode node while complex to encode node to blob. func (t *Trie) resolveAndTrack(n hashNode, prefix []byte) (node, error) { blob, err := t.reader.node(prefix, common.BytesToHash(n)) if err != nil { return nil, err } t.tracer.onRead(prefix, blob) return mustDecodeNode(n, blob), nil } /// go-ethereum/trie/tracer.go // onRead tracks the newly loaded trie node and caches the rlp-encoded // blob internally. Don't change the value outside of function since // it's not deep-copied. func (t *tracer) onRead(path []byte, val []byte) { t.accessList[string(path)] = val }
 
tracer is the tool to track the trie changes. It will be reset after each commit operation.
/// go-ethereum/trie/tracer.go // tracer tracks the changes of trie nodes. During the trie operations, // some nodes can be deleted from the trie, while these deleted nodes // won't be captured by trie.Hasher or trie.Committer. Thus, these deleted // nodes won't be removed from the disk at all. Tracer is an auxiliary tool // used to track all insert and delete operations of trie and capture all // deleted nodes eventually. // // The changed nodes can be mainly divided into two categories: the leaf // node and intermediate node. The former is inserted/deleted by callers // while the latter is inserted/deleted in order to follow the rule of trie. // This tool can track all of them no matter the node is embedded in its // parent or not, but valueNode is never tracked. // // Besides, it's also used for recording the original value of the nodes // when they are resolved from the disk. The pre-value of the nodes will // be used to construct trie history in the future. // // Note tracer is not thread-safe, callers should be responsible for handling // the concurrency issues by themselves. type tracer struct { inserts map[string]struct{} deletes map[string]struct{} accessList map[string][]byte } // newTracer initializes the tracer for capturing trie changes. func newTracer() *tracer { return &tracer{ inserts: make(map[string]struct{}), deletes: make(map[string]struct{}), accessList: make(map[string][]byte), } }

Insert data

Trie.MustUpdate can insert key, value into the Trie. It calls Trie.Update and checks whether there is error.
/// go-ethereum/trie/trie.go // MustUpdate is a wrapper of Update and will omit any encountered error but // just print out an error message. func (t *Trie) MustUpdate(key, value []byte) { if err := t.Update(key, value); err != nil { log.Error("Unhandled trie error in Trie.Update", "err", err) } }
 
Trie.Update checks whether the Trie has been commited. If it has, the Trie is immutable, can’t insert data. Otherwise, call Trie.update to insert value. We will analyze the commit process detailedly later.
/// go-ethereum/trie/trie.go // Update associates key with value in the trie. Subsequent calls to // Get will return value. If value has length zero, any existing value // is deleted from the trie and calls to Get will return nil. // // The value bytes must not be modified by the caller while they are // stored in the trie. // // If the requested node is not present in trie, no error will be returned. // If the trie is corrupted, a MissingNodeError is returned. func (t *Trie) Update(key, value []byte) error { // Short circuit if the trie is already committed and not usable. if t.committed { return ErrCommitted } return t.update(key, value) }
 
Trie.update:
  • increase Trie.unhashed by 1 which tracks the number of leaves which have been inserted since the last hashing operation.
  • convert key to nibbles.
  • calls Trie.insert or Trie.delete according to whether the value is empty.
  • update the root of Trie.
 
Purpose of Trie.unhashed
  • Counting Modifications: The unhashed field is incremented each time a leaf node is inserted/updated/deleted.
  • Efficiency Consideration: By tracking the number of unhashed nodes, the implementation can decide when to perform certain operations like committing the trie.
/// go-ethereum/trie/trie.go func (t *Trie) update(key, value []byte) error { t.unhashed++ // convert key to nibbles k := keybytesToHex(key) // if value is empty which means insert operation, then call t.insert if len(value) != 0 { _, n, err := t.insert(t.root, nil, k, valueNode(value)) if err != nil { return err } // update the root of Trie. t.root = n } else { // if value is empty which means delete operation, then call t.delete _, n, err := t.delete(t.root, nil, k) if err != nil { return err } // update the root of Trie. t.root = n } return nil }
 
keybytesToHex converts byte array to nibble array. nibble is of type byte but only uses the first 4 bits with range [0, ). Note it adds trailing byte with value 16 which is used to indicate whether or not the node at the key contains a value. 16 will be used in branchNode to store data in the 16th position of branchNode.
/// go-ethereum/trie/encoding.go func keybytesToHex(str []byte) []byte { l := len(str)*2 + 1 var nibbles = make([]byte, l) for i, b := range str { nibbles[i*2] = b / 16 nibbles[i*2+1] = b % 16 } nibbles[l-1] = 16 return nibbles }
 
Trie.insert inserts key and value into the Trie.
It has three parameters:
  • n: The current node in the trie where the insertion process is taking place. It can be of various types, such as shortNode, fullNode, valueNode, or hashNode. Different kinds of nodes mean differnet insertion operation.
  • prefix: This is the prefix of the key leading up to the current node. It represents the path taken from the root of the trie to the current node. This parameter helps in tracking the path and identifying where the insertion should continue.
  • key: This is the remaining part of the key that needs to be inserted into the trie. As the insertion process moves deeper into the trie, this key gets shortened until it becomes empty, indicating that the exact position for the new value has been found.
  • value: This is the value to be inserted into the trie. It is typically a valueNode, which stores the actual data corresponding to the key.
It has three return values:
  • bool (Dirty Flag): Indicates whether the trie was modified as a result of the insertion. If it is, then the upper caller should update its sub node.
  • node (Updated Node): Represents the updated (if has update else original node) node after the insertion.
  • error: Indicates if any issues occurred during the insertion process.
 
The insert function recursively updates the trie from the leaf node back to the root node. Each insertion step involves:
1.Determining the Node Type:
  • Based on the type of the current node (n), the function follows specific rules to handle the insertion.
2.Handling Leaf Nodes:
  • If the key length is zero, it means the exact position for the new value has been found:
    • If the current node (n) is a valueNode, compare it with the new value.
    • If they differ, replace the existing value with the new value.
    • Return the updated value node and mark the trie as dirty if changes were made.
3.Handling Short Nodes:
  • Calculate the matching length between the key and the node’s key.
  • If the entire key matches, recursively insert the remaining key into the value of the shortNode.
  • If the keys differ, create a fullNode to branch out at the point where they differ.
  • Replace the shortNode with the new branch node if necessary, or update it to point to the new branch node.
4. Handling Full Nodes:
  • Identify the child node corresponding to the first byte of the key.
  • Recursively insert the remaining key into this child node.
  • Update the current node with the returned updated child node.
  • Return the modified fullNode.
5. Handling Nil Nodes:
  • If the current node is nil, create a new shortNode with the remaining key and the value.
  • Track the insertion in the tracer.
6. Handling Hash Nodes:
  • If a hashNode is encountered, resolve it to the actual node.
  • Recursively insert into the resolved node.
  • Return the updated node if the trie is modified.
7. Propagating Changes Upwards:
  • Each recursive call updates the sub-node and propagates the changes upwards.
  • The process continues until the root node is updated, ensuring the entire path from the leaf to the root reflects the new insertion.
 
/// go-ethereum/trie/trie.go func (t *Trie) insert(n node, prefix, key []byte, value node) (bool, node, error) { // if key is empty, which means we have found the valueNode if len(key) == 0 { // update Dirty Flag according to whether valueNode data is the same as // the value to update, if v, ok := n.(valueNode); ok { return !bytes.Equal(v, value.(valueNode)), value, nil } // if n is not valueNode, then it is nil, we should insert new valueNode. // so set Dirty Flag to be true. return true, value, nil } // get type of n, and update according to node type switch n := n.(type) { case *shortNode: // ... case *fullNode: // ... case nil: // ... case hashNode: // ... default: panic(fmt.Sprintf("%T: invalid node: %v", n, n)) } }
 
There are four kinds of node.
fullNode is branch node which has 16 children and its own value (valueNode).
shortNode is extension node whose Key stores mutual prefix of different child nodes, and Val stores the next node.
valueNode stores the value.
hashNode stores hash of node whose data is stored in database rather then memory.
/// go-ethereum/trie/node.go type ( fullNode struct { Children [17]node // Actual trie node data to encode/decode (needs custom encoder) flags nodeFlag } shortNode struct { Key []byte Val node flags nodeFlag } hashNode []byte valueNode []byte ) // nodeFlag contains caching-related metadata about a node. type nodeFlag struct { hash hashNode // cached hash of the node (may be nil) dirty bool // whether the node has changes that must be written to the database }
 

ShortNode

When dealing with shortNode, it first uses prefixLen to calculate the length of mutual leading bytes of remaining key and key of the shortNode.
  • if the matched length equals key length of shortNode, then insert on the node of the shortNode
  • if the matched length is smaller than the key length of shortNode , then create a fullNode to branch out at the point where they differ. Create two child nodes and insert them into the newly created branch node, one is the node of the shortNode, the other is new node contains value to be inserted.
Note when the matched length is one less then the key length of shortNode, due to the last nibble of the shortNode key being 16, the valueNode/hashNode of the shortNode will be stored in the 16th position of branchNode.
/// go-ethereum/trie/trie.go case *shortNode: matchlen := prefixLen(key, n.Key) // If the whole key matches, keep this short node as is // and only update the value. if matchlen == len(n.Key) { dirty, nn, err := t.insert(n.Val, append(prefix, key[:matchlen]...), key[matchlen:], value) if !dirty || err != nil { return false, n, err } return true, &shortNode{n.Key, nn, t.newFlag()}, nil } // Otherwise branch out at the index where they differ. branch := &fullNode{flags: t.newFlag()} var err error _, branch.Children[n.Key[matchlen]], err = t.insert(nil, append(prefix, n.Key[:matchlen+1]...), n.Key[matchlen+1:], n.Val) if err != nil { return false, nil, err } _, branch.Children[key[matchlen]], err = t.insert(nil, append(prefix, key[:matchlen+1]...), key[matchlen+1:], value) if err != nil { return false, nil, err } // Replace this shortNode with the branch if it occurs at index 0. if matchlen == 0 { return true, branch, nil } // New branch node is created as a child of the original short node. // Track the newly inserted node in the tracer. The node identifier // passed is the path from the root node. t.tracer.onInsert(append(prefix, key[:matchlen]...)) // Replace it with a short node leading up to the branch. return true, &shortNode{key[:matchlen], branch, t.newFlag()}, nil
 
prefixLen calcualtes the length of common leading bytes of two byte array.
/// go-ethereum/trie/encoding.go // prefixLen returns the length of the common prefix of a and b. func prefixLen(a, b []byte) int { var i, length = 0, len(a) if len(b) < length { length = len(b) } for ; i < length; i++ { if a[i] != b[i] { break } } return i }

FullNode

When dealing with fullNode, it just update the child node of the fullNode according to key[0] which specifies the child node’s position.
case *fullNode: dirty, nn, err := t.insert(n.Children[key[0]], append(prefix, key[0]), key[1:], value) if !dirty || err != nil { return false, n, err } n = n.copy() n.flags = t.newFlag() n.Children[key[0]] = nn return true, n, nil

Nil

There are several scenarios when it comes to the nil branch. For example, when add a new child node of a branch node. In such case, there is no node before, so it just creates a new shortNode contains the value, and its key absorbs all the remaining key.
/// go-ethereum/trie/trie.go case nil: // New short node is created and track it in the tracer. The node identifier // passed is the path from the root node. Note the valueNode won't be tracked // since it's always embedded in its parent. t.tracer.onInsert(prefix) return true, &shortNode{key, value, t.newFlag()}, nil

HashNode

hashNode only stores the hash of original node which saves memory space. When dealing with hashNode, we need to first load the original node from database, and just insert on it.
/// go-ethereum/trie/trie.go case hashNode: // We've hit a part of the trie that isn't loaded yet. Load // the node and insert into it. This leaves all child nodes on // the path to the value in the trie. rn, err := t.resolveAndTrack(n, prefix) if err != nil { return false, nil, err } dirty, nn, err := t.insert(rn, prefix, key, value) if !dirty || err != nil { return false, rn, err } return true, nn, nil

Update data

Trie.MustUpdate can also update data. Just pass the key and corresponding value.
Given a shortNode as an example, assume the shortNode's val is valueNode. During update, matchlen == len(n.Key), so it calls Trie.insert on shortNode’s valueNode with empty key.
In the Trie.insert, if key is empty, it will check whether the value to update equals the shortNode’s valueNode value. If them differ, the dirty flag will be set true, and in the upper call, a new shortNode will be constructed with new valueNode.
// go-ethereum/trie/trie.go func (t *Trie) insert(n node, prefix, key []byte, value node) (bool, node, error) { if len(key) == 0 { if v, ok := n.(valueNode); ok { return !bytes.Equal(v, value.(valueNode)), value, nil } return true, value, nil } switch n := n.(type) { case *shortNode: matchlen := prefixLen(key, n.Key) // If the whole key matches, keep this short node as is // and only update the value. if matchlen == len(n.Key) { dirty, nn, err := t.insert(n.Val, append(prefix, key[:matchlen]...), key[matchlen:], value) if !dirty || err != nil { return false, n, err } return true, &shortNode{n.Key, nn, t.newFlag()}, nil } // Otherwise branch out at the index where they differ. branch := &fullNode{flags: t.newFlag()} var err error _, branch.Children[n.Key[matchlen]], err = t.insert(nil, append(prefix, n.Key[:matchlen+1]...), n.Key[matchlen+1:], n.Val) if err != nil { return false, nil, err } _, branch.Children[key[matchlen]], err = t.insert(nil, append(prefix, key[:matchlen+1]...), key[matchlen+1:], value) if err != nil { return false, nil, err } // Replace this shortNode with the branch if it occurs at index 0. if matchlen == 0 { return true, branch, nil } // New branch node is created as a child of the original short node. // Track the newly inserted node in the tracer. The node identifier // passed is the path from the root node. t.tracer.onInsert(append(prefix, key[:matchlen]...)) // Replace it with a short node leading up to the branch. return true, &shortNode{key[:matchlen], branch, t.newFlag()}, nil // ... }

Get data

We can call Trie.MustGet to get value of key.
Inside Trie.MustGet, it calls Trie.Get to get value and checks whether there is error.
/// go-ethereum/trie/trie.go // MustGet is a wrapper of Get and will omit any encountered error but just // print out an error message. func (t *Trie) MustGet(key []byte) []byte { res, err := t.Get(key) if err != nil { log.Error("Unhandled trie error in Trie.Get", "err", err) } return res }
 
Trie.Get checks whether the Trie has been committed, if already committed, then it cannot be used to fetch value.
Then it calls Trie.get to fetch value.
If there is any node resolve happens, the didResolve will be true, it will update the root to the new root which includes resolved nodes.
Note it uses keybytesToHex to convert bytes to nibbles which matches the insertion operation.
/// go-ethereum/trie/trie.go // Get returns the value for key stored in the trie. // The value bytes must not be modified by the caller. // // If the requested node is not present in trie, no error will be returned. // If the trie is corrupted, a MissingNodeError is returned. func (t *Trie) Get(key []byte) ([]byte, error) { // Short circuit if the trie is already committed and not usable. if t.committed { return nil, ErrCommitted } value, newroot, didResolve, err := t.get(t.root, keybytesToHex(key), 0) if err == nil && didResolve { t.root = newroot } return value, err }
 
Trie.get fetches value based on node, key and pos.
  • node is current node based on which to recursively locate node.
  • key is the original key used to locate node.
  • pos is the first position of the remaining key.
Trie.get recursively locates the node/value of the key, and track whether there is resolved hashNode.
/// go-ethereum/trie/trie.go func (t *Trie) get(origNode node, key []byte, pos int) (value []byte, newnode node, didResolve bool, err error) { switch n := (origNode).(type) { case nil: return nil, nil, false, nil case valueNode: return n, n, false, nil case *shortNode: if len(key)-pos < len(n.Key) || !bytes.Equal(n.Key, key[pos:pos+len(n.Key)]) { // key not found in trie return nil, n, false, nil } value, newnode, didResolve, err = t.get(n.Val, key, pos+len(n.Key)) if err == nil && didResolve { n = n.copy() n.Val = newnode } return value, n, didResolve, err case *fullNode: value, newnode, didResolve, err = t.get(n.Children[key[pos]], key, pos+1) if err == nil && didResolve { n = n.copy() n.Children[key[pos]] = newnode } return value, n, didResolve, err case hashNode: child, err := t.resolveAndTrack(n, key[:pos]) if err != nil { return nil, n, true, err } value, newnode, _, err := t.get(child, key, pos) return value, newnode, true, err default: panic(fmt.Sprintf("%T: invalid node: %v", origNode, origNode)) } }

Delete data

Call Trie.MustDelete to delete value specified by key.
/// go-ethereum/trie/trie.go // MustDelete is a wrapper of Delete and will omit any encountered error but // just print out an error message. func (t *Trie) MustDelete(key []byte) { if err := t.Delete(key); err != nil { log.Error("Unhandled trie error in Trie.Delete", "err", err) } }
 
Trie.Delete first checks whether the Trie has been committed, committed Trie is unmodifiable.
Then it increases Trie.unhashed by 1, calculates the nibble represent of the key, then call Trie.delete to delete the value.
After the deletion, it updates the root.
/// go-ethereum/trie/trie.go // Delete removes any existing value for key from the trie. // // If the requested node is not present in trie, no error will be returned. // If the trie is corrupted, a MissingNodeError is returned. func (t *Trie) Delete(key []byte) error { // Short circuit if the trie is already committed and not usable. if t.committed { return ErrCommitted } t.unhashed++ k := keybytesToHex(key) _, n, err := t.delete(t.root, nil, k) if err != nil { return err } t.root = n return nil }
 
Trie.delete recursively deletes key. It checks the type of current node, and execute corresponding logic based on node type.
/// go-ethereum/trie/trie.go // delete returns the new root of the trie with key deleted. // It reduces the trie to minimal form by simplifying // nodes on the way up after deleting recursively. func (t *Trie) delete(n node, prefix, key []byte) (bool, node, error) { switch n := n.(type) { case *shortNode: // ... case *fullNode: // ... case hashNode: // ... default: panic(fmt.Sprintf("%T: invalid node: %v (%v)", n, n, key)) } }

ShortNode

When dealing with shortNode :
  • calculate matchlen as the length of the mutual leading bytes of key and shortNode’s key
  • if matchlen is smaller then shortNode’s Key length, meaning there is no node corresponds to the key, so there is no deletion.
  • if matchlen is same as the key, meaning the shortNode is the node of the key, then delete it.
  • else, calls Trie.delete on child node of shortNode, get the updated node. If the updated node’s type is shortNode, then merges them and mark the parent shortNode as deleted in tracer. Otherwise updates the child node.
/// go-ethereum/trie/trie.go case *shortNode: matchlen := prefixLen(key, n.Key) if matchlen < len(n.Key) { return false, n, nil // don't replace n on mismatch } if matchlen == len(key) { // The matched short node is deleted entirely and track // it in the deletion set. The same the valueNode doesn't // need to be tracked at all since it's always embedded. t.tracer.onDelete(prefix) return true, nil, nil // remove n entirely for whole matches } // The key is longer than n.Key. Remove the remaining suffix // from the subtrie. Child can never be nil here since the // subtrie must contain at least two other values with keys // longer than n.Key. dirty, child, err := t.delete(n.Val, append(prefix, key[:len(n.Key)]...), key[len(n.Key):]) if !dirty || err != nil { return false, n, err } switch child := child.(type) { case *shortNode: // The child shortNode is merged into its parent, track // is deleted as well. t.tracer.onDelete(append(prefix, n.Key...)) // Deleting from the subtrie reduced it to another // short node. Merge the nodes to avoid creating a // shortNode{..., shortNode{...}}. Use concat (which // always creates a new slice) instead of append to // avoid modifying n.Key since it might be shared with // other nodes. return true, &shortNode{concat(n.Key, child.Key...), child.Val, t.newFlag()}, nil default: return true, &shortNode{n.Key, child, t.newFlag()}, nil }

FullNode

  • call Trie.delete to delete on child and get updated child node. Update n with the new child node.
  • if child node is not nil, then there must exist two child node. So n is still fullNode, just return the updated fullNode to caller.
  • if child node is nil, we should consider the possibility the there is only one child node left. In such case, n should be converted into shortNode.
  • if the only one child node is at position 16, then it is a valueNode, just construct a shortNode
  • if the only one child node is not at position 16, we should check whether it’s a shortNode, if it is, they should be merged.
/// go-ethereum/trie/trie.go case *fullNode: dirty, nn, err := t.delete(n.Children[key[0]], append(prefix, key[0]), key[1:]) if !dirty || err != nil { return false, n, err } n = n.copy() n.flags = t.newFlag() n.Children[key[0]] = nn // Because n is a full node, it must've contained at least two children // before the delete operation. If the new child value is non-nil, n still // has at least two children after the deletion, and cannot be reduced to // a short node. if nn != nil { return true, n, nil } // Reduction: // Check how many non-nil entries are left after deleting and // reduce the full node to a short node if only one entry is // left. Since n must've contained at least two children // before deletion (otherwise it would not be a full node) n // can never be reduced to nil. // // When the loop is done, pos contains the index of the single // value that is left in n or -2 if n contains at least two // values. pos := -1 for i, cld := range &n.Children { if cld != nil { if pos == -1 { pos = i } else { pos = -2 break } } } if pos >= 0 { if pos != 16 { // If the remaining entry is a short node, it replaces // n and its key gets the missing nibble tacked to the // front. This avoids creating an invalid // shortNode{..., shortNode{...}}. Since the entry // might not be loaded yet, resolve it just for this // check. cnode, err := t.resolve(n.Children[pos], append(prefix, byte(pos))) if err != nil { return false, nil, err } if cnode, ok := cnode.(*shortNode); ok { // Replace the entire full node with the short node. // Mark the original short node as deleted since the // value is embedded into the parent now. t.tracer.onDelete(append(prefix, byte(pos))) k := append([]byte{byte(pos)}, cnode.Key...) return true, &shortNode{k, cnode.Val, t.newFlag()}, nil } } // Otherwise, n is replaced by a one-nibble short node // containing the child. return true, &shortNode{[]byte{byte(pos)}, n.Children[pos], t.newFlag()}, nil } // n still contains at least two values and cannot be reduced. return true, n, nil

ValueNode

If it is a valueNode, just return nil to delete it.
/// go-ethereum/trie/trie.go case valueNode: return true, nil, nil

Nil

nil node means there is no such node to delete. So just return false to indicate there is no corresponding node.
/// go-ethereum/trie/trie.go case nil: return false, nil, nil

HashNode

For hashNode, first resolve it and call Trie.delete on the resolved node.
/// go-ethereum/trie/trie.go case hashNode: // We've hit a part of the trie that isn't loaded yet. Load // the node and delete from it. This leaves all child nodes on // the path to the value in the trie. rn, err := t.resolveAndTrack(n, prefix) if err != nil { return false, nil, err } dirty, nn, err := t.delete(rn, prefix, key) if !dirty || err != nil { return false, rn, err } return true, nn, nil

Hash

Trie.Hash calls Trie.hashRoot to hash the Trie and get the hashed node, then convert it to 32 bytes array and return. #To what’s the hash and cached.
/// go-ethereum/trie/trie.go // Hash returns the root hash of the trie. It does not write to the // database and can be used even if the trie doesn't have one. func (t *Trie) Hash() common.Hash { hash, cached := t.hashRoot() t.root = cached return common.BytesToHash(hash.(hashNode)) } /// go-ethereum/common/types.go // Lengths of hashes and addresses in bytes. const ( // HashLength is the expected length of the hash HashLength = 32 // AddressLength is the expected length of the address AddressLength = 20 ) // Hash represents the 32 byte Keccak256 hash of arbitrary data. type Hash [HashLength]byte
 
BytesToHash converts byte array to 32 length byte array.
  • If the byte array length is bigger than 32, it will be kept trailing 32 bytes.
  • If the byte array length is smaller than 32, it will occupy the trailing part of the 32 bytes.
/// go-ethereum/common/types.go // Lengths of hashes and addresses in bytes. const ( // HashLength is the expected length of the hash HashLength = 32 // AddressLength is the expected length of the address AddressLength = 20 ) // Hash represents the 32 byte Keccak256 hash of arbitrary data. type Hash [HashLength]byte // BytesToHash sets b to hash. // If b is larger than len(h), b will be cropped from the left. func BytesToHash(b []byte) Hash { var h Hash h.SetBytes(b) return h } // SetBytes sets the hash to the value of b. // If b is larger than len(h), b will be cropped from the left. func (h *Hash) SetBytes(b []byte) { if len(b) > len(h) { b = b[len(b)-HashLength:] } copy(h[HashLength-len(b):], b) }
 
Inside the Trie.hashRoot :
  • if the root is nil, then it just returns a hashNode with value of empty Trie’s hash.
  • it uses pool to reuse hasher which hashes node. If the Trie.unhashed is larger than 100, it will function in a parallel way. As illustrated above, Trie.unhashed records modifications of a Trie durng insert, update, delete.
  • hashRoot recursively hashes the whole Trie, it first calls h.hash on the root.
/// go-ethereum/trie/trie.go // hashRoot calculates the root hash of the given trie func (t *Trie) hashRoot() (node, node) { if t.root == nil { return hashNode(types.EmptyRootHash.Bytes()), nil } // If the number of changes is below 100, we let one thread handle it h := newHasher(t.unhashed >= 100) defer func() { returnHasherToPool(h) t.unhashed = 0 }() hashed, cached := h.hash(t.root, true) return hashed, cached } /// go-ethereum/core/types/hashes.go EmptyRootHash = common.HexToHash("56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421") /// go-ethereum/trie/hasher.go // hasher is a type used for the trie Hash operation. A hasher has some // internal preallocated temp space type hasher struct { sha crypto.KeccakState tmp []byte encbuf rlp.EncoderBuffer parallel bool // Whether to use parallel threads when hashing } // hasherPool holds pureHashers var hasherPool = sync.Pool{ New: func() interface{} { return &hasher{ tmp: make([]byte, 0, 550), // cap is as large as a full fullNode. sha: sha3.NewLegacyKeccak256().(crypto.KeccakState), encbuf: rlp.NewEncoderBuffer(nil), } }, } func newHasher(parallel bool) *hasher { h := hasherPool.Get().(*hasher) h.parallel = parallel return h } func returnHasherToPool(h *hasher) { hasherPool.Put(h) }
 
Empty root hash is calculated assuming the empty root’s value is empty string. Use RLP to encode an empty string will get 0x80, then use keccak256 to get the hashed value.
import { keccak256 } from "ethers"; async function main() { // The RLP encoding of an empty string is 0x80 const emptyRLP = "0x80"; const emptyRootHash = keccak256(emptyRLP); console.log(emptyRootHash); // Should print '0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421' } main();
 
The hash function in the hasher type is used to recursively hash a trie node. It collapses the node into a hash and also returns a copy of the original node with the computed hash.
Parameters
  • n node: The trie node to be hashed.
  • force bool: A boolean flag indicating whether to force the hashing of the node even if it is small enough to be stored inline. "stored inline" refers to the optimization where small nodes are not hashed and stored as separate entries. Instead, they are embedded directly within their parent node.
 
Return Values
  • hashed node: The resulting node after hashing. This will be a hashNode if the node was hashed or the original node if it was stored inline.
  • cached node: A copy of the original node with the computed hash stored. This helps avoid recomputing the hash in future operations.
 
The cached node is essentially a copy of the original node with the computed hash stored. This helps in avoiding repeated hash computations. Here’s how it is used:
  1. When a Node is Hashed:
      • The hash function calculates the hash of the node and updates the node's flags.hash with the computed hash.
      • The cached node is returned with this updated hash, so subsequent operations can directly use the cached hash without recomputing it.
  1. When a Node is Stored Inline:
      • If the node is not hashed (i.e., stored inline because it is small), the cached node will be the same as the original node.
      • This indicates that no hash computation is needed for this node, and it can be directly used in its current form.
Note in the insertion operation, when there is modification on fullNode or shortNode, it will new a flag which drops the cached hash if there is one.
/// go-ethereum/trie/hasher.go // hash collapses a node down into a hash node, also returning a copy of the // original node initialized with the computed hash to replace the original one. func (h *hasher) hash(n node, force bool) (hashed node, cached node) { // Return the cached hash if it's available if hash, _ := n.cache(); hash != nil { return hash, n } // Trie not processed yet, walk the children switch n := n.(type) { case *shortNode: // ... case *fullNode: // ... default: // Value and hash nodes don't have children, so they're left as were return n, n } }
 
only fullNode and shortNode may have cached hash. hashNode itself is hash of node. valueNode doesn’t need to be hashed.
/// go-ethereum/trie/node.go // nodeFlag contains caching-related metadata about a node. type nodeFlag struct { hash hashNode // cached hash of the node (may be nil) dirty bool // whether the node has changes that must be written to the database } func (n *fullNode) cache() (hashNode, bool) { return n.flags.hash, n.flags.dirty } func (n *shortNode) cache() (hashNode, bool) { return n.flags.hash, n.flags.dirty } func (n hashNode) cache() (hashNode, bool) { return nil, true } func (n valueNode) cache() (hashNode, bool) { return nil, true }
 

Hash Short Node

  • hashShortNodeChildren collapses the child node of the shortNode to hashNode. It hashes the children of the shortNode and get the hashNode which replaces the original child node. Also it update the Key of the shortNode in hex compact encoding.
  • then it encode the collapsed node to get hash representation of the shortNode.
  • finally it updates the hash flag of the shortNode
/// go-ethereum/trie/hasher.go case *shortNode: collapsed, cached := h.hashShortNodeChildren(n) hashed := h.shortnodeToHash(collapsed, force) // We need to retain the possibly _not_ hashed node, in case it was too // small to be hashed if hn, ok := hashed.(hashNode); ok { cached.flags.hash = hn } else { cached.flags.hash = nil } return hashed, cached
 
hashShortNodeChildren:
  • shallow copy the shortNode, and update the collapsed node’s key using hex compact encoding(refer).
  • If the child node of the shortNode is of type fullNode or shortNode, it will recursively hash the child node and get corresponding hash node and cached node of the child node.
  • update collapsed and cached's Val.
/// go-ethereum/trie/hasher.go // hashShortNodeChildren collapses the short node. The returned collapsed node // holds a live reference to the Key, and must not be modified. func (h *hasher) hashShortNodeChildren(n *shortNode) (collapsed, cached *shortNode) { // Hash the short node's child, caching the newly hashed subtree collapsed, cached = n.copy(), n.copy() collapsed.Key = hexToCompact(n.Key) // Unless the child is a valuenode or hashnode, hash it switch n.Val.(type) { case *fullNode, *shortNode: collapsed.Val, cached.Val = h.hash(n.Val, false) } return collapsed, cached } /// go-ethereum/trie/encoding.go func hexToCompact(hex []byte) []byte { terminator := byte(0) if hasTerm(hex) { terminator = 1 hex = hex[:len(hex)-1] } buf := make([]byte, len(hex)/2+1) buf[0] = terminator << 5 // the flag byte if len(hex)&1 == 1 { buf[0] |= 1 << 4 // odd flag buf[0] |= hex[0] // first nibble is contained in the first byte hex = hex[1:] } decodeNibbles(hex, buf[1:]) return buf }
 
shortnodeToHash hashes the collapsed node to get final hashNode.
  1. Encode the Short Node:
      • The shortNode is first encoded using RLP (Recursive Length Prefix) encoding. This is done using the encode method of the shortNode, which writes the encoded data into the encbuf buffer.
  1. Retrieve the Encoded Bytes:
      • After encoding the shortNode, the encoded bytes are retrieved from the encbuf buffer. This is done using the encodedBytes method of the hasher.
  1. Check the Length of the Encoded Data:
      • The length of the encoded data is checked. If the length is less than 32 bytes and the force flag is not set to true, the node is considered small enough to be stored inline within its parent node. Therefore, the original shortNode is returned.
  1. Hash the Encoded Data:
      • If the length of the encoded data is 32 bytes or more, or if the force flag is true, the encoded data is hashed. This is done using the hashData method of the hasher, which performs the Keccak-256 hashing of the encoded data.
  1. Return the Hash Node:
      • The result of the hashData method is a hashNode, which is returned as the representation of the shortNode.
/// go-ethereum/trie/hasher.go // shortnodeToHash creates a hashNode from a shortNode. The supplied shortnode // should have hex-type Key, which will be converted (without modification) // into compact form for RLP encoding. // If the rlp data is smaller than 32 bytes, `nil` is returned. func (h *hasher) shortnodeToHash(n *shortNode, force bool) node { // encode shortNode n.encode(h.encbuf) // retrieve encoded bytes enc := h.encodedBytes() if len(enc) < 32 && !force { return n // Nodes smaller than 32 bytes are stored inside their parent } // hash encoded byte array return h.hashData(enc) } // hashData hashes the provided data func (h *hasher) hashData(data []byte) hashNode { n := make(hashNode, 32) h.sha.Reset() h.sha.Write(data) h.sha.Read(n) return n }
 
encoding implementation of diffrent nodes.
FullNode:
  • encode all childrens
ShortNode:
  • encode Key and Val
HashNode & ValueNode:
  • encode the byte array
/// go-ethereum/trie/node_enc.go func (n *fullNode) encode(w rlp.EncoderBuffer) { offset := w.List() for _, c := range n.Children { if c != nil { c.encode(w) } else { w.Write(rlp.EmptyString) } } w.ListEnd(offset) } func (n *shortNode) encode(w rlp.EncoderBuffer) { offset := w.List() w.WriteBytes(n.Key) if n.Val != nil { n.Val.encode(w) } else { w.Write(rlp.EmptyString) } w.ListEnd(offset) } func (n hashNode) encode(w rlp.EncoderBuffer) { w.WriteBytes(n) } func (n valueNode) encode(w rlp.EncoderBuffer) { w.WriteBytes(n) } func (n rawNode) encode(w rlp.EncoderBuffer) { w.Write(n) } /// go-ethereum/rlp/encode.go // EmptyString is the encoding of an empty string. EmptyString = []byte{0x80}
 

Hash Full Node

  • first hash all the child nodes of the fullNode , and replace the original child node with the corresponding hashNodes.
  • then encode fullNode to get hashNode
  • update the hash flag with the hashNode.
/// go-ethereum/trie/hasher.go case *fullNode: collapsed, cached := h.hashFullNodeChildren(n) hashed = h.fullnodeToHash(collapsed, force) if hn, ok := hashed.(hashNode); ok { cached.flags.hash = hn } else { cached.flags.hash = nil } return hashed, cached
 
if hasher.parallel is true, then it uses go routine to hash all child nodes concurrently, else hashes child nodes one by one.
/// go-ethereum/trie/hasher.go func (h *hasher) hashFullNodeChildren(n *fullNode) (collapsed *fullNode, cached *fullNode) { // Hash the full node's children, caching the newly hashed subtrees cached = n.copy() collapsed = n.copy() if h.parallel { var wg sync.WaitGroup wg.Add(16) for i := 0; i < 16; i++ { go func(i int) { hasher := newHasher(false) if child := n.Children[i]; child != nil { collapsed.Children[i], cached.Children[i] = hasher.hash(child, false) } else { collapsed.Children[i] = nilValueNode } returnHasherToPool(hasher) wg.Done() }(i) } wg.Wait() } else { for i := 0; i < 16; i++ { if child := n.Children[i]; child != nil { collapsed.Children[i], cached.Children[i] = h.hash(child, false) } else { collapsed.Children[i] = nilValueNode } } } return collapsed, cached }
 
after having hashed all child nodes, it has generated the collapsed version of the original fullNode with all child nodes replaced by corresponding hashNode. Then will call fullnodeToHash to calculate the corresponding hashNode of the collapsed fullNode.
It encodes all child hashNodes to get the RLP encoding of the collapsed fullNode and hashes it to get the hashNode.
/// go-ethereum/trie/hasher.go // fullnodeToHash is used to create a hashNode from a fullNode, (which // may contain nil values) func (h *hasher) fullnodeToHash(n *fullNode, force bool) node { n.encode(h.encbuf) enc := h.encodedBytes() if len(enc) < 32 && !force { return n // Nodes smaller than 32 bytes are stored inside their parent } return h.hashData(enc) }
/// go-ethereum/trie/node_enc.go func (n *fullNode) encode(w rlp.EncoderBuffer) { offset := w.List() for _, c := range n.Children { if c != nil { c.encode(w) } else { w.Write(rlp.EmptyString) } } w.ListEnd(offset) }

Commit

Overview

The commit process involves hashing and storing the modified nodes to ensure that changes to the trie are persisted and verifiable.
  1. Dirty Node Tracking:
      • Throughout trie operations (insertions, deletions, updates), nodes that are modified are marked as dirty. These dirty nodes need to be processed during the commit.
  1. Commit Operation:
      • The commit operation starts by initiating a committer, which will capture all the dirty nodes and hash them.
      • The process begins at the root of the trie and propagates changes down to the leaves, ensuring that every modified node is hashed and stored.
  1. Node Storage:
      • Nodes are stored in a persistent storage (like a database). The hash of each node is used as the key in the storage. For shortNode, the compact key and hashNode of the child node are stored. For fullNode, hashNodes of all child nodes are stored.
      • If a node is embedded in its parent (due to path compression), it will not be stored. Instead, it is managed as part of its parent node.
  1. Updating the Root Hash:
      • Once all the dirty nodes have been hashed and stored, the root hash of the trie is updated.
      • The root hash represents the current state of the trie and can be used to verify the integrity of the entire data structure.
  1. Tracing and Cleanup:
      • The commit process uses a tracer to track all changes and manage the state of the trie. After the commit, the tracer is reset to prepare for future operations.
      • The trie is marked as committed and becomes read-only, ensuring that the state cannot be modified further until a new trie instance is created based on the updated state.
So later if we want to load the trie and fetch node of certain key. We need to first load the root of the Trie which is a hashNode, and we should resolve all nodes lead to the target node from the database to reach the wanted node.
 
For example, if we initially load the hashNode of the Trie root into memory, and we update valueNode#A, delete the f#A_c#1(fullNode#A ’s 1th child node). Then during the commit process, the NodeSet will record that the shortNode#A, fullNode#A and shortNode#B are updated, f#A_c#1 is deleted. Those modification recorded in NodeSet will be udpated to database.
notion image
 
Commit is the entry to commit a Trie: Parameter:
  • collectLeaf: A boolean flag indicating whether to collect leaf nodes.If set true, then leave nodes(valueNode) will be collected stored in the trienode.NodeSet.
Return:
  • common.Hash: The root hash of the committed trie.
  • trienode.NodeSet: A set of all nodes that were committed.
  • error: An error if the commit process fails.
Inside:
  • defer to reset the tracer and update the Trie.committed to false. Because tracer is used to record the modification on a Trie, once the Trie has been committed, all modifications have been updated to database, so there is no need for those records. Also Trie.committed records whether the Trie has been committed, if it has, then it can’t be read or modified anymore. To read or modify, another new Trie with the root should be created along with other setups like a new tracer.
  • If the root is nil, which means that the Trie is intially nil, or all nodes have been deleted. In such case, only need to record deleted nodes. It calls deletedNodes to get pathes of all deleted nodes and record them in NodeSet.
  • Hash the Trie. Because it needs to store modified node’s hash into database.
  • check whether the root is dirty, if not, which means the whole Trie hasn’t been modified, so there is no need to commit.
  • commit the tree
    • add deleted nodes: loop deleted nodes recorded in tracer, add them to NodeSet
    • add updated nodes: commit the Trie from Trie.root
/// go-ethereum/trie/trie.go // Commit collects all dirty nodes in the trie and replaces them with the // corresponding node hash. All collected nodes (including dirty leaves if // collectLeaf is true) will be encapsulated into a nodeset for return. // The returned nodeset can be nil if the trie is clean (nothing to commit). // Once the trie is committed, it's not usable anymore. A new trie must // be created with new root and updated trie database for following usage func (t *Trie) Commit(collectLeaf bool) (common.Hash, *trienode.NodeSet, error) { defer t.tracer.reset() defer func() { t.committed = true }() // Trie is empty and can be classified into two types of situations: // (a) The trie was empty and no update happens => return nil // (b) The trie was non-empty and all nodes are dropped => return // the node set includes all deleted nodes if t.root == nil { paths := t.tracer.deletedNodes() if len(paths) == 0 { return types.EmptyRootHash, nil, nil // case (a) } nodes := trienode.NewNodeSet(t.owner) for _, path := range paths { nodes.AddNode([]byte(path), trienode.NewDeleted()) } return types.EmptyRootHash, nodes, nil // case (b) } // Derive the hash for all dirty nodes first. We hold the assumption // in the following procedure that all nodes are hashed. rootHash := t.Hash() // Do a quick check if we really need to commit. This can happen e.g. // if we load a trie for reading storage values, but don't write to it. if hashedNode, dirty := t.root.cache(); !dirty { // Replace the root node with the origin hash in order to // ensure all resolved nodes are dropped after the commit. t.root = hashedNode return rootHash, nil, nil } // start to comimit the Trie nodes := trienode.NewNodeSet(t.owner) // add deleted nodes for _, path := range t.tracer.deletedNodes() { nodes.AddNode([]byte(path), trienode.NewDeleted()) } // add updated nodes t.root = newCommitter(nodes, t.tracer, collectLeaf).Commit(t.root) return rootHash, nodes, nil }
 
Note that deletedNodes checks whether the deleted node’s path is in accessList, only those in accessList are considered deleted. accessList stores nodes fetched from database, if a deleted node’s path doesn’t exsit in accessList, meaning the node is embedded in its parent node, itself is not stored in database. So those nodes’ deletion has no need to be updated to database because there is no record for those nodes.
Give an example, assume a shortNode has a valueNode(embedded in the shortNode), during the commit process, the compact key of the shortNode and the valueNode are stored in single blob in database with shortNode ’s hash as key. The valueNode itself is not stored in database. During read of the valueNode, the shortNode will be loaded and the hash will be stored in the tracer.accessList. So if the node is not in accessList, its not read in database but embedded in other node, so there is no need to delete it.
/// go-ethereum/trie/tracer.go // deletedNodes returns a list of node paths which are deleted from the trie. func (t *tracer) deletedNodes() []string { var paths []string for path := range t.deletes { // It's possible a few deleted nodes were embedded // in their parent before, the deletions can be no // effect by deleting nothing, filter them out. _, ok := t.accessList[path] if !ok { continue } paths = append(paths, path) } return paths }
 
NewNodeSet initializes NodeSet which contains a set of nodes collected during the commit operation. Each node is keyed by path.
/// go-ethereum/trie/trienode/node.go // NewNodeSet initializes a node set. The owner is zero for the account trie and // the owning account address hash for storage tries. func NewNodeSet(owner common.Hash) *NodeSet { return &NodeSet{ Owner: owner, Nodes: make(map[string]*Node), } } // leaf represents a trie leaf node type leaf struct { Blob []byte // raw blob of leaf Parent common.Hash // the hash of parent node } // NodeSet contains a set of nodes collected during the commit operation. // Each node is keyed by path. It's not thread-safe to use. type NodeSet struct { Owner common.Hash Leaves []*leaf Nodes map[string]*Node updates int // the count of updated and inserted nodes deletes int // the count of deleted nodes } // Node is a wrapper which contains the encoded blob of the trie node and its // node hash. It is general enough that can be used to represent trie node // corresponding to different trie implementations. type Node struct { Hash common.Hash // Node hash, empty for deleted node Blob []byte // Encoded node blob, nil for the deleted node }
 
AddNode adds node to NodeSet. For deletede node, node.Blob is empty.
/// go-ethereum/trie/trienode/node.go // AddNode adds the provided node into set. func (set *NodeSet) AddNode(path []byte, n *Node) { if n.IsDeleted() { set.deletes += 1 } else { set.updates += 1 } set.Nodes[string(path)] = n } // IsDeleted returns the indicator if the node is marked as deleted. func (n *Node) IsDeleted() bool { return len(n.Blob) == 0 }
 
commiter is the tool used for the trie Commit operation.
/// go-ethereum/trie/committer.go // committer is the tool used for the trie Commit operation. The committer will // capture all dirty nodes during the commit process and keep them cached in // insertion order. type committer struct { nodes *trienode.NodeSet tracer *tracer collectLeaf bool } // newCommitter creates a new committer or picks one from the pool. func newCommitter(nodeset *trienode.NodeSet, tracer *tracer, collectLeaf bool) *committer { return &committer{ nodes: nodeset, tracer: tracer, collectLeaf: collectLeaf, } }
 
Inside the commit, it first check whether the node is dirty, if not and it has hash,meaning it has not been modified, just return the hash. Otherwise operate according to the node type.
/// go-ethereum/trie/committer.go // Commit collapses a node down into a hash node. func (c *committer) Commit(n node) hashNode { return c.commit(nil, n).(hashNode) } // commit collapses a node down into a hash node and returns it. func (c *committer) commit(path []byte, n node) node { // if this path is clean, use available cached data hash, dirty := n.cache() if hash != nil && !dirty { return hash } // Commit children, then parent, and remove the dirty flag. switch cn := n.(type) { case *shortNode: // ... case *fullNode: // ... case hashNode: return cn default: // nil, valuenode shouldn't be committed panic(fmt.Sprintf("%T: invalid node: %v", n, n)) } }

Commit shortNode

When committing shortNode:
  • first, if the child node is fullNode, it first recursively commits the child node and get the hashNode.
  • second, update the key using compact encoding.
  • finally, store the compact version shortNode in NodeSet, and return the hashNode representation to upper caller.
Note that during this process, if the child node is fullNode, it will call commit to recursively commit nodes.
/// go-ethereum/trie/committer.go case *shortNode: // Commit child collapsed := cn.copy() // If the child is fullNode, recursively commit, // otherwise it can only be hashNode or valueNode. if _, ok := cn.Val.(*fullNode); ok { collapsed.Val = c.commit(append(path, cn.Key...), cn.Val) } // The key needs to be copied, since we're adding it to the // modified nodeset. collapsed.Key = hexToCompact(cn.Key) hashedNode := c.store(path, collapsed) if hn, ok := hashedNode.(hashNode); ok { return hn } return collapsed
 
commiter.store stores path and corresponding node’s blob.
It encodes the node using RLP to get blob, then stores it in NodeSet with node path as key, node hash and blob as value. If collectLeaf is true meaning to also collect leaf nodes, it will check whether the node is shortNode(only shortNode may have valueNode stored) and the child node is valueNode, if it is, then stores the valueNode in NodeSet.Leaves array.
Note if the node’s hash is nil and it is recorded in accessList(loaded from database), record it to be deleted. This may happen when the shortNode's child node is a valueNode, and resolved from database before. But checking this, it make sure the embedded nodes will be deleted from database. (But current geth implementation wont store embedded nodes into database, so I guess this code may be an upgrade helps to delete embedded nodes in database which are stored due to previous geth implementation )
/// go-ethereum/trie/committer.go // store hashes the node n and adds it to the modified nodeset. If leaf collection // is enabled, leaf nodes will be tracked in the modified nodeset as well. func (c *committer) store(path []byte, n node) node { // Larger nodes are replaced by their hash and stored in the database. var hash, _ = n.cache() // This was not generated - must be a small node stored in the parent. // In theory, we should check if the node is leaf here (embedded node // usually is leaf node). But small value (less than 32bytes) is not // our target (leaves in account trie only). if hash == nil { // The node is embedded in its parent, in other words, this node // will not be stored in the database independently, mark it as // deleted only if the node was existent in database before. _, ok := c.tracer.accessList[string(path)] if ok { c.nodes.AddNode(path, trienode.NewDeleted()) } return n } // Collect the dirty node to nodeset for return. nhash := common.BytesToHash(hash) c.nodes.AddNode(path, trienode.New(nhash, nodeToBytes(n))) // Collect the corresponding leaf node if it's required. We don't check // full node since it's impossible to store value in fullNode. The key // length of leaves should be exactly same. if c.collectLeaf { if sn, ok := n.(*shortNode); ok { if val, ok := sn.Val.(valueNode); ok { c.nodes.AddLeaf(nhash, val) } } } return hash }
/// go-ethereum/trie/trienode/node.go type Node struct { Hash common.Hash // Node hash, empty for deleted node Blob []byte // Encoded node blob, nil for the deleted node } // New constructs a node with provided node information. func New(hash common.Hash, blob []byte) *Node { return &Node{Hash: hash, Blob: blob} } // AddNode adds the provided node into set. func (set *NodeSet) AddNode(path []byte, n *Node) { if n.IsDeleted() { set.deletes += 1 } else { set.updates += 1 } set.Nodes[string(path)] = n } type NodeSet struct { Owner common.Hash Leaves []*leaf Nodes map[string]*Node updates int // the count of updated and inserted nodes deletes int // the count of deleted nodes } // AddLeaf adds the provided leaf node into set. TODO(rjl493456442) how can // we get rid of it? func (set *NodeSet) AddLeaf(parent common.Hash, blob []byte) { set.Leaves = append(set.Leaves, &leaf{Blob: blob, Parent: parent}) }
/// go-ethereum/trie/node_enc.go func nodeToBytes(n node) []byte { w := rlp.NewEncoderBuffer(nil) n.encode(w) result := w.ToBytes() w.Flush() return result }

Commit fullNode

When dealing with committing fullNode :
  • first, commit all child nodes to get corresponding hashNode array.
  • second, store node’s path, hash and blob into NodeSet
  • finally, return hashNode
/// go-ethereum/trie/committer.go case *fullNode: /// commit all child nodes to get corresponding hashNode array. hashedKids := c.commitChildren(path, cn) collapsed := cn.copy() collapsed.Children = hashedKids // store path and node into NodeSet hashedNode := c.store(path, collapsed) if hn, ok := hashedNode.(hashNode); ok { return hn } return collapsed
 
committer.commitChildren recursively commits 16 child nodes of fullNode to get each hashNode.
For the 17th child, it only can be valueNode or nil, so there is no need to commit it separately, it is embededd in the fullNode.
/// go-ethereum/trie/committer.go // commitChildren commits the children of the given fullnode func (c *committer) commitChildren(path []byte, n *fullNode) [17]node { var children [17]node for i := 0; i < 16; i++ { child := n.Children[i] if child == nil { continue } // If it's the hashed child, save the hash value directly. // Note: it's impossible that the child in range [0, 15] // is a valueNode. if hn, ok := child.(hashNode); ok { children[i] = hn continue } // Commit the child recursively and store the "hashed" value. // Note the returned node can be some embedded nodes, so it's // possible the type is not hashNode. children[i] = c.commit(append(path, byte(i)), child) } // For the 17th child, it's possible the type is valuenode. if n.Children[16] != nil { children[16] = n.Children[16] } return children }

Prove data existence/inexistence

Overview

One of the big advantages of MPT is the easiness to prove certain data exsitence of a Trie. This is quite useful for light client to get confidence of certain transaction has been recorded in blockchain.
 

Generate Prove

Trie.Proveis to construct a Merkle proof for a given key in the trie. This proof includes all the encoded nodes along the path to the value associated with the key. If the key is present in the trie, the proof will end with the node containing the value. If the key is not present, the proof will end with the node that demonstrates the absence of the key, including all nodes of the longest existing prefix.
Inside the Trie.Prove:
  • first, get all nodes along the path to the value associated with the key
  • second, loop those nodes to get corresponding node’s hash and encoded data of the collapsed node ( Note that collapsed node’s child is hashNode). Record each node’s hash and collapsed node blob into proofDb for later proof verification.
/// go-ethereum/trie/proof.go // Prove constructs a merkle proof for key. The result contains all encoded nodes // on the path to the value at key. The value itself is also included in the last // node and can be retrieved by verifying the proof. // // If the trie does not contain a value for key, the returned proof contains all // nodes of the longest existing prefix of the key (at least the root node), ending // with the node that proves the absence of the key. func (t *Trie) Prove(key []byte, proofDb ethdb.KeyValueWriter) error { // Short circuit if the trie is already committed and not usable. if t.committed { return ErrCommitted } // Collect all nodes on the path to key. var ( prefix []byte nodes []node tn = t.root ) key = keybytesToHex(key) for len(key) > 0 && tn != nil { switch n := tn.(type) { case *shortNode: if len(key) < len(n.Key) || !bytes.Equal(n.Key, key[:len(n.Key)]) { // The trie doesn't contain the key. tn = nil } else { tn = n.Val prefix = append(prefix, n.Key...) key = key[len(n.Key):] } // even if the node doesn't exsit, it also stores the longest node along the path // into proofDb. So later we can prove the inexistence of this key. nodes = append(nodes, n) case *fullNode: tn = n.Children[key[0]] prefix = append(prefix, key[0]) key = key[1:] nodes = append(nodes, n) case hashNode: // Retrieve the specified node from the underlying node reader. // trie.resolveAndTrack is not used since in that function the // loaded blob will be tracked, while it's not required here since // all loaded nodes won't be linked to trie at all and track nodes // may lead to out-of-memory issue. blob, err := t.reader.node(prefix, common.BytesToHash(n)) if err != nil { log.Error("Unhandled trie error in Trie.Prove", "err", err) return err } // The raw-blob format nodes are loaded either from the // clean cache or the database, they are all in their own // copy and safe to use unsafe decoder. tn = mustDecodeNodeUnsafe(n, blob) default: panic(fmt.Sprintf("%T: invalid node: %v", tn, tn)) } } hasher := newHasher(false) defer returnHasherToPool(hasher) for i, n := range nodes { var hn node n, hn = hasher.proofHash(n) if hash, ok := hn.(hashNode); ok || i == 0 { // If the node's database encoding is a hash (or is the // root node), it becomes a proof element. enc := nodeToBytes(n) if !ok { hash = hasher.hashData(enc) } proofDb.Put(hash, enc) } } return nil }
 

Verify Proof

In the verification process, it starts from the root node, decodes the node from the blob recorded in proofDb, and try to reach the target node. In this process, it will recursively try to fetch the child node’ blob from proofDb according to the hash stored in parent node and decode it. If there is error, for example there is no matching blob in proofDb, the verification fails.
VerifyProof doesn’t check the hash and blob matches, in practice, it should be checked.
 
Verify node inexsitence:
In the proof generation process, if the key is not present, the proof will end with the node that demonstrates the absence of the key, including all nodes of the longest existing prefix. Storing the longest existing prefix is to make sure the Trie information stored in proofDb is enough to prove target key doesn’t exist in the Trie.
/// go-ethereum/trie/proof.go // VerifyProof checks merkle proofs. The given proof must contain the value for // key in a trie with the given root hash. VerifyProof returns an error if the // proof contains invalid trie nodes or the wrong value. func VerifyProof(rootHash common.Hash, key []byte, proofDb ethdb.KeyValueReader) (value []byte, err error) { key = keybytesToHex(key) wantHash := rootHash for i := 0; ; i++ { buf, _ := proofDb.Get(wantHash[:]) if buf == nil { return nil, fmt.Errorf("proof node %d (hash %064x) missing", i, wantHash) } n, err := decodeNode(wantHash[:], buf) if err != nil { return nil, fmt.Errorf("bad proof node %d: %v", i, err) } keyrest, cld := get(n, key, true) switch cld := cld.(type) { case nil: // The trie doesn't contain the key. return nil, nil case hashNode: key = keyrest copy(wantHash[:], cld) case valueNode: return cld, nil } } }
/// go-ethereum/trie/proof.go // get returns the child of the given node. Return nil if the // node with specified key doesn't exist at all. // // There is an additional flag `skipResolved`. If it's set then // all resolved nodes won't be returned. func get(tn node, key []byte, skipResolved bool) ([]byte, node) { for { switch n := tn.(type) { case *shortNode: if len(key) < len(n.Key) || !bytes.Equal(n.Key, key[:len(n.Key)]) { return nil, nil } tn = n.Val key = key[len(n.Key):] if !skipResolved { return key, tn } case *fullNode: tn = n.Children[key[0]] key = key[1:] if !skipResolved { return key, tn } case hashNode: return key, n case nil: return key, nil case valueNode: return nil, n default: panic(fmt.Sprintf("%T: invalid node: %v", tn, tn)) } } }
 
For example, to prove the valueNode#1 in the Trie below, we first call Trie.Prove to generate proof. The proofDb will contain three nodes’ hash and blob including ShortNode#A, FullNode#A, ShortNode#B.
 
notion image
 

Encode Node

nodeToBytes is used to encode node and return its blob(blob refers to the serialized form of a node). It calls rlp.NewEncoderBuffer to new a EncoderBuffer to help encode.
For shortNode, it encodes its key(compact encoding) and child node.
For fullNode, it encodes all 17 child nodes.
/// go-ethereum/trie/node_enc.go func nodeToBytes(n node) []byte { w := rlp.NewEncoderBuffer(nil) n.encode(w) result := w.ToBytes() w.Flush() return result } func (n *fullNode) encode(w rlp.EncoderBuffer) { offset := w.List() for _, c := range n.Children { if c != nil { c.encode(w) } else { w.Write(rlp.EmptyString) } } w.ListEnd(offset) } func (n *shortNode) encode(w rlp.EncoderBuffer) { offset := w.List() w.WriteBytes(n.Key) if n.Val != nil { n.Val.encode(w) } else { w.Write(rlp.EmptyString) } w.ListEnd(offset) } func (n hashNode) encode(w rlp.EncoderBuffer) { w.WriteBytes(n) } func (n valueNode) encode(w rlp.EncoderBuffer) { w.WriteBytes(n) } func (n rawNode) encode(w rlp.EncoderBuffer) { w.Write(n) }

Decode Node

When geth encounters hashNode during read operation, it will first load the blob of the hashNode according to the node prefix and hash, then decode it.
/// go-ethereum/trie/trie.go // resolveAndTrack loads node from the underlying store with the given node hash // and path prefix and also tracks the loaded node blob in tracer treated as the // node's original value. The rlp-encoded blob is preferred to be loaded from // database because it's easy to decode node while complex to encode node to blob. func (t *Trie) resolveAndTrack(n hashNode, prefix []byte) (node, error) { blob, err := t.reader.node(prefix, common.BytesToHash(n)) if err != nil { return nil, err } t.tracer.onRead(prefix, blob) return mustDecodeNode(n, blob), nil } /// go-ethereum/trie/node.go // mustDecodeNode is a wrapper of decodeNode and panic if any error is encountered. func mustDecodeNode(hash, buf []byte) node { n, err := decodeNode(hash, buf) if err != nil { panic(fmt.Sprintf("node %x: %v", hash, err)) } return n }
 
Inside the decodeNodeUnsafe, it counts the number of encoded values in the node blob, according to which it decides whether the node is a shortNode or fullNode. Then it calls corresponding decode method to decode the node.
/// go-ethereum/trie/node.go // decodeNode parses the RLP encoding of a trie node. It will deep-copy the passed // byte slice for decoding, so it's safe to modify the byte slice afterwards. The- // decode performance of this function is not optimal, but it is suitable for most // scenarios with low performance requirements and hard to determine whether the // byte slice be modified or not. func decodeNode(hash, buf []byte) (node, error) { return decodeNodeUnsafe(hash, common.CopyBytes(buf)) } // decodeNodeUnsafe parses the RLP encoding of a trie node. The passed byte slice // will be directly referenced by node without bytes deep copy, so the input MUST // not be changed after. func decodeNodeUnsafe(hash, buf []byte) (node, error) { if len(buf) == 0 { return nil, io.ErrUnexpectedEOF } elems, _, err := rlp.SplitList(buf) if err != nil { return nil, fmt.Errorf("decode error: %v", err) } switch c, _ := rlp.CountValues(elems); c { case 2: n, err := decodeShort(hash, elems) return n, wrapError(err, "short") case 17: n, err := decodeFull(hash, elems) return n, wrapError(err, "full") default: return nil, fmt.Errorf("invalid number of list elements: %v", c) } }

Key encodings in MPT

KEYBYTES encoding

Description:
  • KEYBYTES encoding contains the actual key and nothing else. This is the raw byte representation of the key.
  • This encoding is the input to most API functions and represents the most straightforward form of the key.
Usage:
  • Used for interactions with the trie API.
  • Represents the plain byte sequence of the key without any additional formatting.
Example:
  • A key 0x1234 in KEYBYTES encoding is simply the byte array [0x12, 0x34].

HEX encoding

Description:
  • HEX encoding contains one byte for each nibble (half-byte) of the key. It may include a trailing terminator byte (0x10) to indicate whether the node at the key contains a value.
  • This encoding is convenient for nodes loaded in memory because it allows easy access and manipulation of individual nibbles.
Usage:
  • Used internally in the trie for easy manipulation of keys.
  • The terminator byte helps distinguish nodes that contain values.
Example:
  • A key 0x1234 in HEX encoding would be [0x01, 0x02, 0x03, 0x04].
  • If this key represents a node containing a value, it would be [0x01, 0x02, 0x03, 0x04, 0x10].

Hex prefix encoding (Compact encoding)

Description:
  • COMPACT encoding, also known as "hex prefix encoding," compresses the key's nibbles into bytes and includes a flag byte at the beginning.
  • The flag byte indicates whether the length of the key is odd or even and whether the node at the key is a value node.
  • The high nibble of the first byte contains the flags:
    • The lowest bit of the high nibble encodes the oddness of the length.
    • The second-lowest bit indicates whether the node at the key is a value node.
  • The low nibble of the first byte is zero for even-length keys or contains the first nibble for odd-length keys.
  • The remaining nibbles are packed into the following bytes.
Usage:
  • Used for storing nodes on disk due to its space efficiency.
  • This encoding is defined by the Ethereum Yellow Paper.
Example:
  • A key 0x1234 with a value would be [0x20, 0x12, 0x34] in COMPACT encoding.
    • 0x20 indicates an even-length key with a value node. [0010 0000]
  • An odd-length key 0x123 would be [0x31, 0x23] in COMPACT encoding.
    • 0x31 indicates an odd-length key with a value node, and includes the first nibble 0x1. [0011 0x1]
In the yellow paper, the t represents whether the key represents a value node. For value node’s key, the second lowest bit of the first nibble in the first byte will be 1.
notion image
notion image