Appendix

Predict Node’s Self Address

Reasons for Predicting the Node's Current IP

  1. Public IP Is Used In Node Discovery
      • The predicted public IP address and port are added to the node's Ethereum Node Record (ENR). This record includes various pieces of information about the node, such as its public key, IP address, port, and other relevant metadata.
      • Whenever this information changes, the ENR is re-signed with an incremented sequence number so that it reflects the node's latest state.
      • The updated ENR is made available to other nodes through the Node Discovery Protocol (specifically, the Ethereum Discovery Protocol v4 or v5).
      • Other nodes can query the ENR to find out the public IP address and port of the node, allowing them to initiate direct connections.
  2. Node’s Public IP May Change
    1. NAT Traversal:
        • NAT Devices: NAT devices (such as routers) map private IP addresses to a public IP address. The node itself only knows its private IP address and not the public IP address assigned by the NAT device.
        • Public IP: The node needs to know its public IP address to communicate with external nodes effectively. The NAT device translates the private IP to a public IP and vice versa.
    2. Dynamic IP Addresses:
        • ISP Changes: Internet Service Providers (ISPs) often change the public IP address assigned to a home router. This dynamic assignment makes it difficult for the node to reliably know its public IP.
        • Update Frequency: Even if the node queries the public IP from an external service, the IP can change, necessitating frequent updates.

How Prediction Works

  1. Statements from External Hosts:
      • IP Statements: When the node makes an outbound connection to another node, the receiving node sees the public IP address and port from which the connection originates and can report them back. These reports, called statements, are collected and used to predict the node's public endpoint.
      • Consistency Check: By collecting multiple statements, the node can determine the most consistent public IP and port, filtering out temporary or incorrect observations.
  2. IP Tracker:
      • IPTracker: The IPTracker component records these statements and uses them to predict the node's public endpoint.
      • Algorithm: The prediction algorithm discards statements older than a configured time window, then picks the endpoint with the most remaining statements, provided it has at least minStatements of them.
  3. Fallback Mechanisms:
      • Static and Fallback IP: The node can be configured with static or fallback IP addresses and ports, which are used if prediction fails.
      • Endpoint Prediction: The predictAddr function utilizes the statements to predict the node's current public IP and port.

Predict Process

Here is how the node uses the prediction in practice:
  • Receiving Statements: Other nodes send statements about the observed public IP and port.
  • Updating Tracker: These statements are recorded by the IPTracker.
  • Predicting Endpoint: When needed, the node calls predictAddr to get the current public IP and port based on the collected statements.
  • Updating ENR: The node updates its Ethereum Node Record (ENR) with the predicted public endpoint, ensuring other nodes can connect to it.

Code

LocalNode.updateEndpoints applies the predicted endpoint to the node record.
Steps:
  1. Call endpoint4.get and endpoint6.get to obtain the external endpoint (IP and port) for IPv4 and IPv6.
  2. If an endpoint is available, store it in the ENR. Otherwise, delete the previous entry stored in LocalNode.entries.
In lnEndpoint.get, the IP is chosen in this order of precedence: 1. the static IP; 2. the predicted IP; 3. the fallback IP. The port is the predicted port when a prediction is used, and the fallback port otherwise.
Note that ln.set and ln.delete call ln.invalidate to clear the cached Node (ln.cur). Both methods modify the entries that make up the node record, so the current version of the record is no longer valid once the entries that define it have changed.
/// ---p2p/enode/localnode.go---

// updateEndpoints updates the record with predicted endpoints.
func (ln *LocalNode) updateEndpoints() {
    ip4, udp4 := ln.endpoint4.get()
    ip6, udp6 := ln.endpoint6.get()

    if ip4 != nil && !ip4.IsUnspecified() {
        ln.set(enr.IPv4(ip4))
    } else {
        ln.delete(enr.IPv4{})
    }
    if ip6 != nil && !ip6.IsUnspecified() {
        ln.set(enr.IPv6(ip6))
    } else {
        ln.delete(enr.IPv6{})
    }
    if udp4 != 0 {
        ln.set(enr.UDP(udp4))
    } else {
        ln.delete(enr.UDP(0))
    }
    if udp6 != 0 && udp6 != udp4 {
        ln.set(enr.UDP6(udp6))
    } else {
        ln.delete(enr.UDP6(0))
    }
}

// get returns the endpoint with highest precedence.
func (e *lnEndpoint) get() (newIP net.IP, newPort uint16) {
    newPort = e.fallbackUDP
    if e.fallbackIP != nil {
        newIP = e.fallbackIP
    }
    if e.staticIP != nil {
        newIP = e.staticIP
    } else if ip, port := predictAddr(e.track); ip != nil {
        newIP = ip
        newPort = port
    }
    return newIP, newPort
}

// store Entry in ln.entries.
func (ln *LocalNode) set(e enr.Entry) {
    val, exists := ln.entries[e.ENRKey()]
    if !exists || !reflect.DeepEqual(val, e) {
        ln.entries[e.ENRKey()] = e
        ln.invalidate()
    }
}

// delete previous Entry stored in ln.entries.
func (ln *LocalNode) delete(e enr.Entry) {
    _, exists := ln.entries[e.ENRKey()]
    if exists {
        delete(ln.entries, e.ENRKey())
        ln.invalidate()
    }
}

func (ln *LocalNode) invalidate() {
    ln.cur.Store((*Node)(nil))
}

/// ---p2p/enr/entries.go---

type Entry interface {
    ENRKey() string
}

/// ---go/src/net/ip.go---

// IsUnspecified reports whether ip is an unspecified address, either
// the IPv4 address "0.0.0.0" or the IPv6 address "::".
func (ip IP) IsUnspecified() bool {
    return ip.Equal(IPv4zero) || ip.Equal(IPv6unspecified)
}
 
predictAddr calls IPTracker.PredictEndpoint to obtain the predicted external endpoint (ep, the node's IP and port) in string form, then parses it into an IP and a port.
/// ---p2p/enode/localnode.go---

// predictAddr wraps IPTracker.PredictEndpoint, converting from its string-based
// endpoint representation to IP and port types.
func predictAddr(t *netutil.IPTracker) (net.IP, uint16) {
    ep := t.PredictEndpoint()
    if ep == "" {
        return nil, 0
    }
    ipString, portString, _ := net.SplitHostPort(ep)
    ip := net.ParseIP(ipString)
    port, err := strconv.ParseUint(portString, 10, 16)
    if err != nil {
        return nil, 0
    }
    return ip, uint16(port)
}
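To see the string-to-typed conversion in isolation, here is a minimal standalone sketch using only the standard library. parseEndpoint is a hypothetical helper, not a Geth function; unlike predictAddr it also reports malformed host parts, which is an illustrative choice rather than Geth's behavior.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// parseEndpoint splits an "ip:port" string into typed values.
// ok is false when the input is not a valid IP endpoint.
func parseEndpoint(ep string) (ip net.IP, port uint16, ok bool) {
	host, portStr, err := net.SplitHostPort(ep)
	if err != nil {
		return nil, 0, false
	}
	ip = net.ParseIP(host)
	// bitSize 16 makes ParseUint reject ports above 65535.
	p, err := strconv.ParseUint(portStr, 10, 16)
	if ip == nil || err != nil {
		return nil, 0, false
	}
	return ip, uint16(p), true
}

func main() {
	inputs := []string{
		"203.0.113.7:30303",   // IPv4 endpoint
		"[2001:db8::1]:30303", // IPv6 endpoints need brackets
		"not-an-endpoint",     // rejected: no port
	}
	for _, ep := range inputs {
		ip, port, ok := parseEndpoint(ep)
		fmt.Printf("%-22s ip=%v port=%d ok=%v\n", ep, ip, port, ok)
	}
}
```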
 
PredictEndpoint first garbage-collects external endpoint statements (IP and port) that are older than the configured time window. It then counts how often each remaining endpoint appears and returns the most frequent one as the predicted IP/port, provided it has at least minStatements statements.
/// ---p2p/netutil/iptrack.go---

// PredictEndpoint returns the current prediction of the external endpoint.
func (it *IPTracker) PredictEndpoint() string {
    it.gcStatements(it.clock.Now())

    // The current strategy is simple: find the endpoint with most statements.
    counts := make(map[string]int, len(it.statements))
    maxcount, max := 0, ""
    for _, s := range it.statements {
        c := counts[s.endpoint] + 1
        counts[s.endpoint] = c
        if c > maxcount && c >= it.minStatements {
            maxcount, max = c, s.endpoint
        }
    }
    return max
}

// IPTracker predicts the external endpoint, i.e. IP address and port, of the local host
// based on statements made by other hosts.
type IPTracker struct {
    window          time.Duration
    contactWindow   time.Duration
    minStatements   int
    clock           mclock.Clock
    statements      map[string]ipStatement
    contact         map[string]mclock.AbsTime
    lastStatementGC mclock.AbsTime
    lastContactGC   mclock.AbsTime
}

func (it *IPTracker) gcStatements(now mclock.AbsTime) {
    it.lastStatementGC = now
    cutoff := now.Add(-it.window)
    for host, s := range it.statements {
        if s.time < cutoff {
            delete(it.statements, host)
        }
    }
}

Why Register for NAT Port Mapping Only for Public Addresses?

  1. NAT Relevance:
      • NAT (Network Address Translation) is a technique used to allow devices on a private network to communicate with devices on the public internet. NAT routers modify the source and/or destination IP addresses in packets to enable this communication.
      • Port mapping (or port forwarding) is a specific feature of NAT that allows incoming traffic on a specific port to be directed to a particular device and port within the private network.
  2. Purpose of Port Mapping:
      • Port mapping is necessary when you want external devices to initiate communication with a device behind a NAT router. This is typical for services that need to be publicly accessible, like web servers or, in this case, Ethereum nodes.
  3. Loopback and Private Addresses:
      • A loopback address refers to the local machine. A service bound to a loopback address can only be reached from the same machine, which is common in development or when the service should be isolated from external access for security reasons. Example: if your Ethereum node is configured with the address 127.0.0.1:30303, only applications on the same machine can connect to it. Traffic to the node never leaves the host or passes through the NAT router, so no port mapping is needed.
      • Private IP addresses are used within a local network. These addresses are not routable on the public internet, meaning devices with these addresses cannot be accessed directly from outside the local network. Example: if the Ethereum node is configured with the address 192.168.1.5:30303, it is reachable only from other devices within the same local network (e.g., a home or office network).
  4. Public Addresses:
      • If the server has a public IP address, it is accessible from other networks. However, in many configurations, the device itself may still be behind a NAT router (e.g., in a data center or ISP setup), and explicit port mapping might still be required to ensure that incoming connections on specific ports are correctly routed to the server.
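The address classes discussed above can be told apart with Go's standard net package. The sketch below is illustrative only: classify is a hypothetical helper, not part of Geth, and it treats every address that is neither loopback, private, nor unspecified as "public", which glosses over special ranges such as link-local or multicast. net.IP.IsPrivate requires Go 1.17 or later.

```go
package main

import (
	"fmt"
	"net"
)

// classify reports which class an address falls into (hypothetical helper).
func classify(s string) string {
	ip := net.ParseIP(s)
	switch {
	case ip == nil:
		return "invalid"
	case ip.IsUnspecified():
		return "unspecified" // 0.0.0.0 or ::
	case ip.IsLoopback():
		return "loopback" // reachable only from the same machine
	case ip.IsPrivate():
		return "private" // RFC 1918 / RFC 4193, not routable on the internet
	default:
		return "public" // simplification: everything else
	}
}

func main() {
	for _, addr := range []string{"127.0.0.1", "192.168.1.5", "203.0.113.7", "0.0.0.0"} {
		fmt.Printf("%-12s %s\n", addr, classify(addr))
	}
}
```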

Elliptic Curve Diffie-Hellman (ECDH)

Elliptic Curve Diffie-Hellman (ECDH) is a key exchange protocol based on elliptic curve cryptography. It allows two parties to establish a shared secret over an insecure channel, which can then be used to encrypt subsequent communications.

How ECDH Works:

  1. Key Generation:
      • Each party generates an elliptic curve key pair (a private key and a corresponding public key).
  2. Key Exchange:
      • The parties exchange their public keys over an insecure channel.
  3. Shared Secret Computation:
      • Each party uses their own private key and the received public key to compute a shared secret.

Detailed Example:

Let's assume two parties, Alice and Bob, want to establish a shared secret using ECDH.
1. Key Generation
  • Alice generates a key pair:
    • Private key: a
    • Public key: A = a * G (where G is the base point on the elliptic curve)
  • Bob generates a key pair:
    • Private key: b
    • Public key: B = b * G
2. Key Exchange
  • Alice sends her public key A to Bob.
  • Bob sends his public key B to Alice.
3. Shared Secret Computation
  • Alice computes the shared secret using her private key a and Bob's public key B:
    • Shared secret: S = a * B = a * (b * G)
  • Bob computes the shared secret using his private key b and Alice's public key A:
    • Shared secret: S = b * A = b * (a * G)
Both computations yield the same shared secret S because scalar multiplication on the curve satisfies a * (b * G) = (a * b) * G = b * (a * G).
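The Alice and Bob exchange above can be reproduced with Go's standard crypto/ecdh package (Go 1.20 or later). This is a sketch under one notable assumption: it uses the NIST P-256 curve because the standard library does not ship secp256k1, the curve Ethereum uses. The function name sharedSecrets is hypothetical.

```go
package main

import (
	"bytes"
	"crypto/ecdh"
	"crypto/rand"
	"fmt"
)

// sharedSecrets runs one ECDH exchange and returns the secret computed
// independently by each side.
func sharedSecrets() (aliceSecret, bobSecret []byte, err error) {
	curve := ecdh.P256() // stand-in curve; Ethereum uses secp256k1

	// 1. Key generation: private keys a and b, public keys A = a*G and B = b*G.
	aliceKey, err := curve.GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	bobKey, err := curve.GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}

	// 2. Key exchange: only the public keys cross the insecure channel.
	// 3. Shared secret: Alice computes a*B, Bob computes b*A.
	if aliceSecret, err = aliceKey.ECDH(bobKey.PublicKey()); err != nil {
		return nil, nil, err
	}
	if bobSecret, err = bobKey.ECDH(aliceKey.PublicKey()); err != nil {
		return nil, nil, err
	}
	return aliceSecret, bobSecret, nil
}

func main() {
	a, b, err := sharedSecrets()
	if err != nil {
		fmt.Println("key exchange failed:", err)
		return
	}
	fmt.Printf("Alice: %x\nBob:   %x\nequal: %v\n", a, b, bytes.Equal(a, b))
}
```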
 

AES & MAC

AES (Advanced Encryption Standard) and MAC (Message Authentication Code) are cryptographic algorithms used within various security protocols to ensure the confidentiality, integrity, and authenticity of data.

AES (Advanced Encryption Standard)

AES is a symmetric encryption algorithm used to protect data by converting it into an unreadable format, which can only be reverted to its original form using a secret key.
  • Key Features:
    • Symmetric Encryption: Uses the same key for encryption and decryption.
    • Block Cipher: Encrypts data in fixed-size blocks (128 bits).
    • Key Sizes: Supports key sizes of 128, 192, and 256 bits.

MAC (Message Authentication Code)

MAC is a cryptographic technique used to ensure the integrity and authenticity of a message. It involves creating a small, fixed-size piece of data (the MAC) from a message and a secret key.
  • Key Features:
    • Integrity: Ensures that the message has not been altered.
    • Authenticity: Verifies that the message was sent by someone who knows the secret key.
    • Common Algorithms: HMAC (Hash-based MAC), CMAC (Cipher-based MAC).

Example Workflow in a TLS-like Secure Channel

1. Initial Handshake
  • Key Exchange: Client and server agree on a shared secret using asymmetric cryptography (e.g., ECDH).
  • Session Key Generation: Derive AES and MAC keys from the shared secret.
2. Data Encryption and Integrity
  • Data Encryption: AES is used to encrypt the plaintext data before transmission.
  • MAC Calculation: A MAC is calculated over the ciphertext and some additional data using HMAC.
  • Message Transmission: The encrypted data and its MAC are sent to the recipient.
3. Data Decryption and Verification
  • MAC Verification: The recipient calculates the MAC of the received data and compares it to the received MAC to ensure integrity and authenticity.
  • Data Decryption: If the MAC is valid, the recipient decrypts the data using AES to retrieve the original plaintext.
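The "Session Key Generation" step can be illustrated with a deliberately simplified key derivation: one shared secret is expanded into two independent keys by computing HMAC-SHA256 over distinct labels. This is a sketch only. It is not the key schedule that TLS actually uses, and deriveKey and its labels are hypothetical; a production system would use a standard KDF such as HKDF.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// deriveKey returns HMAC-SHA256(secret, label): a 32-byte key bound to label.
// Simplified stand-in for a real KDF such as HKDF.
func deriveKey(secret []byte, label string) []byte {
	m := hmac.New(sha256.New, secret)
	m.Write([]byte(label))
	return m.Sum(nil)
}

func main() {
	// In practice this would be the output of the ECDH exchange.
	secret := []byte("example shared secret from ECDH")

	aesKey := deriveKey(secret, "aes key") // 32 bytes: usable as an AES-256 key
	macKey := deriveKey(secret, "mac key") // separate key for the MAC

	fmt.Printf("AES key: %x\n", aesKey)
	fmt.Printf("MAC key: %x\n", macKey)
	fmt.Println("keys differ:", !hmac.Equal(aesKey, macKey))
}
```

Using separate keys for encryption and authentication means that a weakness in one use does not directly expose the other.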

Code Example

package main

import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/hmac"
    "crypto/rand"
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "io"
    "log"
)

// encryptAES encrypts the plaintext using AES in CTR mode and returns the
// ciphertext along with the randomly generated IV.
func encryptAES(plaintext, key []byte) (ciphertext, iv []byte, err error) {
    block, err := aes.NewCipher(key)
    if err != nil {
        return nil, nil, err
    }
    iv = make([]byte, aes.BlockSize)
    if _, err := io.ReadFull(rand.Reader, iv); err != nil {
        return nil, nil, err
    }
    stream := cipher.NewCTR(block, iv)
    ciphertext = make([]byte, len(plaintext))
    stream.XORKeyStream(ciphertext, plaintext)
    return ciphertext, iv, nil
}

// generateHMAC generates an HMAC using SHA-256.
func generateHMAC(message, key []byte) []byte {
    mac := hmac.New(sha256.New, key)
    mac.Write(message)
    return mac.Sum(nil)
}

// verifyHMAC recomputes the HMAC and compares it in constant time.
func verifyHMAC(message, key, receivedMAC []byte) bool {
    expectedMAC := generateHMAC(message, key)
    return hmac.Equal(expectedMAC, receivedMAC)
}

func main() {
    // AES key (must be 16, 24, or 32 bytes long)
    aesKey := []byte("thisisaverysecretkey1234") // 24 bytes
    // HMAC key (can be of any length, but ideally should be of sufficient length for security)
    hmacKey := []byte("anotherverysecretkey123")
    // The message to be encrypted
    plaintext := []byte("This is a secret message.")

    // Encrypt the plaintext
    ciphertext, iv, err := encryptAES(plaintext, aesKey)
    if err != nil {
        log.Fatalf("Encryption failed: %v", err)
    }

    // Generate HMAC for the ciphertext (named mac to avoid shadowing the hmac package)
    mac := generateHMAC(ciphertext, hmacKey)

    fmt.Printf("Plaintext: %s\n", plaintext)
    fmt.Printf("Ciphertext: %s\n", hex.EncodeToString(ciphertext))
    fmt.Printf("IV: %s\n", hex.EncodeToString(iv))
    fmt.Printf("HMAC: %s\n", hex.EncodeToString(mac))

    // To simulate the decryption process:
    // Verify the HMAC
    if verifyHMAC(ciphertext, hmacKey, mac) {
        fmt.Println("HMAC verified. Decrypting the message...")
        // Decrypt the ciphertext (CTR decryption is the same XOR operation as encryption)
        block, err := aes.NewCipher(aesKey)
        if err != nil {
            log.Fatalf("Decryption failed: %v", err)
        }
        decrypted := make([]byte, len(ciphertext))
        stream := cipher.NewCTR(block, iv)
        stream.XORKeyStream(decrypted, ciphertext)
        fmt.Printf("Decrypted message: %s\n", decrypted)
    } else {
        fmt.Println("HMAC verification failed. Message may have been tampered with.")
    }
}

Kademlia-like distributed hash table

The Kademlia protocol is designed for efficient node lookup in a distributed hash table. Nodes are identified by their IDs, and each node maintains its own node table. The table is divided into "buckets" that group known nodes by their logarithmic (XOR) distance from the local node. Each bucket covers a specific range of distances, which keeps searching for and storing nodes efficient.
Usage in the Table In Node Discovery Process
The Table struct uses the buckets to manage and look up nodes. When a new node is seen, it is added to the appropriate bucket based on its XOR distance from the local node. If the bucket is full, the new node may be added to the replacements list instead, and replacements can be promoted to the main entries list when existing nodes become inactive. This structure keeps the table balanced and lets it find and manage nodes efficiently, in line with the principles of the Kademlia protocol.
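The entries/replacements behavior described above can be sketched as follows. This is a simplified illustration, not Geth's implementation: the bucket here stores plain string IDs, ignores IP limits and liveness checks, and the add and remove methods are hypothetical names. The two constants match the values in p2p/discover/table.go.

```go
package main

import "fmt"

const (
	bucketSize      = 16 // live entries per bucket
	maxReplacements = 10 // size of the per-bucket replacement list
)

// bucket is a hypothetical, simplified bucket holding node IDs only.
type bucket struct {
	entries      []string // live nodes
	replacements []string // candidates used when a live node is removed
}

func contains(list []string, id string) bool {
	for _, x := range list {
		if x == id {
			return true
		}
	}
	return false
}

// add places id in entries if there is room, otherwise in replacements,
// dropping the oldest replacement when that list is full.
func (b *bucket) add(id string) {
	if contains(b.entries, id) || contains(b.replacements, id) {
		return
	}
	if len(b.entries) < bucketSize {
		b.entries = append(b.entries, id)
		return
	}
	if len(b.replacements) == maxReplacements {
		b.replacements = b.replacements[1:]
	}
	b.replacements = append(b.replacements, id)
}

// remove drops an inactive entry and promotes the most recently added
// replacement, if there is one.
func (b *bucket) remove(id string) {
	for i, e := range b.entries {
		if e != id {
			continue
		}
		b.entries = append(b.entries[:i], b.entries[i+1:]...)
		if n := len(b.replacements); n > 0 {
			b.entries = append(b.entries, b.replacements[n-1])
			b.replacements = b.replacements[:n-1]
		}
		return
	}
}

func main() {
	b := &bucket{}
	for i := 0; i < bucketSize+1; i++ {
		b.add(fmt.Sprintf("n%d", i))
	}
	fmt.Println("entries:", len(b.entries), "replacements:", len(b.replacements))

	b.remove("n0") // n0 became inactive; n16 is promoted from replacements
	fmt.Println("entries:", len(b.entries), "replacements:", len(b.replacements))
}
```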
 
In Geth, there are 17 buckets (nBuckets), and each bucket holds at most 16 (bucketSize) live nodes. A remote node is very unlikely to be close to the local node, so there is no need for many buckets covering the small-distance range. Geth therefore sets bucketMinDistance to 239: every node whose log distance is at most 239 (XOR distance below 2^239) is inserted into one single bucket, the closest one. The probability that a random node is that close is 2^239 / 2^256 = 2^-17.
/// ---p2p/discover/table.go---

const (
    alpha           = 3  // Kademlia concurrency factor
    bucketSize      = 16 // Kademlia bucket size
    maxReplacements = 10 // Size of per-bucket replacement list

    // We keep buckets for the upper 1/15 of distances because
    // it's very unlikely we'll ever encounter a node that's closer.
    hashBits          = len(common.Hash{}) * 8
    nBuckets          = hashBits / 15       // Number of buckets
    bucketMinDistance = hashBits - nBuckets // Log distance of closest bucket

    // IP address limits.
    bucketIPLimit, bucketSubnet = 2, 24 // at most 2 addresses from the same /24
    tableIPLimit, tableSubnet   = 10, 24

    copyNodesInterval = 30 * time.Second
    seedMinTableTime  = 5 * time.Minute
    seedCount         = 30
    seedMaxAge        = 5 * 24 * time.Hour
)

// Table is the 'node table', a Kademlia-like index of neighbor nodes. The table keeps
// itself up-to-date by verifying the liveness of neighbors and requesting their node
// records when announcements of a new record version are received.
type Table struct {
    // ...
    buckets [nBuckets]*bucket // index of known nodes by distance
    // ...
}

// bucket contains nodes, ordered by their last activity. the entry
// that was most recently active is the first element in entries.
type bucket struct {
    entries      []*node // live entries, sorted by time of last contact
    replacements []*node // recently seen nodes to be used if revalidation fails
    ips          netutil.DistinctNetSet
    index        int
}
 
Table.bucket determines which bucket a node belongs to from the logarithmic XOR distance between the remote node and the local node.
/// ---p2p/discover/table.go---

// bucket returns the bucket for the given node ID hash.
func (tab *Table) bucket(id enode.ID) *bucket {
    d := enode.LogDist(tab.self().ID(), id)
    return tab.bucketAtDistance(d)
}

func (tab *Table) bucketAtDistance(d int) *bucket {
    if d <= bucketMinDistance {
        return tab.buckets[0]
    }
    return tab.buckets[d-bucketMinDistance-1]
}

/// ---p2p/enode/node.go---

// LogDist returns the logarithmic distance between a and b, log2(a ^ b).
func LogDist(a, b ID) int {
    lz := 0
    for i := range a {
        x := a[i] ^ b[i]
        if x == 0 {
            lz += 8
        } else {
            lz += bits.LeadingZeros8(x)
            break
        }
    }
    return len(a)*8 - lz
}

/// ---go/src/math/bits/bits.go---

// LeadingZeros8 returns the number of leading zero bits in x; the result is 8 for x == 0.
func LeadingZeros8(x uint8) int {
    return 8 - Len8(x)
}
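To make the distance-to-bucket mapping concrete, the following standalone sketch re-implements the two functions with a local ID type and evaluates them on hand-picked IDs. The names logDist, bucketIndex, and ID are local to this example, and bucketIndex returns the index rather than the bucket itself; the constants follow the values shown above for a 256-bit hash.

```go
package main

import (
	"fmt"
	"math/bits"
)

const (
	hashBits          = 256
	nBuckets          = hashBits / 15       // 17
	bucketMinDistance = hashBits - nBuckets // 239
)

// ID is a 256-bit node identifier (local stand-in for enode.ID).
type ID [32]byte

// logDist mirrors enode.LogDist: the bit length of a XOR b.
func logDist(a, b ID) int {
	lz := 0
	for i := range a {
		x := a[i] ^ b[i]
		if x == 0 {
			lz += 8
		} else {
			lz += bits.LeadingZeros8(x)
			break
		}
	}
	return len(a)*8 - lz
}

// bucketIndex mirrors Table.bucketAtDistance but returns the index.
func bucketIndex(d int) int {
	if d <= bucketMinDistance {
		return 0
	}
	return d - bucketMinDistance - 1
}

func main() {
	var self, far, near ID
	far[0] = 0x80  // differs from self in the very first bit
	near[2] = 0x40 // shares the first 17 bits with self

	dFar, dNear := logDist(self, far), logDist(self, near)
	fmt.Println("far: ", dFar, "-> bucket", bucketIndex(dFar))   // 256 -> bucket 16
	fmt.Println("near:", dNear, "-> bucket", bucketIndex(dNear)) // 239 -> bucket 0
	fmt.Println("self:", logDist(self, self))                    // 0
}
```

Half of all random IDs differ in the first bit and land in the farthest bucket (index 16), while each closer bucket covers half as many IDs as the one before it. Note that, by the index formula, log distance 240 also maps to index 0, so the closest bucket collects every node at log distance 240 or less.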