RLP

Introduction

RLP, or Recursive Length Prefix, is a serialization method used in Ethereum to encode structured data. RLP standardizes the transfer of data between nodes in a space-efficient format. Its purpose is to encode arbitrarily nested arrays of binary data so that the encoded data is always uniquely decodable.

Ethereum uses it for several core purposes, including:

Encoding Transactions: RLP is used to encode transactions before they are sent to the Ethereum network. This includes the transaction’s nonce, gas price, gas limit, to address, value, data, and signature components.

Encoding Blocks: Blocks and block headers are also serialized using RLP when they are transmitted across the network or stored.

Account State: In Ethereum, the account state (nonce, balance, storage root, and code hash) is stored in the Ethereum state trie using RLP for serialization.

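Definition of structures to be encoded (from the Yellow Paper):

$$
\mathbb{T} \equiv \mathbb{L} \uplus \mathbb{B}
$$

$$
\mathbb{L} \equiv \{ \mathbf{t} : \mathbf{t} = (\mathbf{t}[0], \mathbf{t}[1], \ldots) \land \forall n < \lVert \mathbf{t} \rVert : \mathbf{t}[n] \in \mathbb{T} \}
$$

$$
\mathbb{B} \equiv \{ \mathbf{b} : \mathbf{b} = (\mathbf{b}[0], \mathbf{b}[1], \ldots) \land \forall n < \lVert \mathbf{b} \rVert : \mathbf{b}[n] \in \mathbb{O} \}
$$

where $\mathbb{O}$ is the set of 8-bit bytes.

From the definition, we can see that $\mathbb{T}$ is an arbitrarily nested byte array: an item is either a byte array ($\mathbb{B}$) or a list ($\mathbb{L}$) of items.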

Encoding Rule

In the encoding rules defined in the Yellow Paper, the core idea of RLP is to use the first byte to distinguish between encoding types, based on the data's type (byte array or list) and its length.

Byte arrays and nested byte arrays (lists) have different encoding schemes.

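For a byte array x, RLP chooses the first byte based on the size of the array:

if x is a single byte below 0x80 ($\lVert x \rVert = 1 \land x[0] < 128$), that byte is its own encoding, with no prefix.
otherwise, if x is shorter than 56 bytes ($\lVert x \rVert < 56$), the first byte is $128 + \lVert x \rVert$ (0x80 to 0xB7), followed by the bytes of x.
otherwise ($\lVert x \rVert \geq 56$), the first byte is $183 + \lVert \mathrm{BE}(\lVert x \rVert) \rVert$ (0xB8 to 0xBF), where $\mathrm{BE}(\lVert x \rVert)$ is the length of x as a big-endian byte array without leading zeros; it is followed by $\mathrm{BE}(\lVert x \rVert)$ and then by the bytes of x.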

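For a nested byte array (a list), RLP first encodes each child item and concatenates the results, then chooses the prefix based on the length of that concatenated payload: if it is shorter than 56 bytes, the first byte is 0xC0 + length (0xC0 to 0xF7); otherwise, the first byte is 0xF7 + the number of bytes needed to represent the length (0xF8 to 0xFF), followed by the big-endian length and then the payload.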
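The rules above can be sketched as a small standalone Go encoder. This is not geth's code: it is a minimal illustration that assumes an item is modeled either as a []byte (byte array) or as a []any (list of items), and the helper names encodeLength and encodeItem are hypothetical.

```go
package main

// encodeLength builds the RLP prefix for a payload of the given length.
// offset is 0x80 for byte arrays and 0xC0 for lists.
func encodeLength(length int, offset byte) []byte {
	if length < 56 {
		return []byte{offset + byte(length)}
	}
	// Big-endian bytes of length, without leading zeros.
	var be []byte
	for l := length; l > 0; l >>= 8 {
		be = append([]byte{byte(l)}, be...)
	}
	// offset+55 is 0xB7 for byte arrays and 0xF7 for lists.
	return append([]byte{offset + 55 + byte(len(be))}, be...)
}

// encodeItem encodes a byte array or an arbitrarily nested list of items.
func encodeItem(item any) []byte {
	switch v := item.(type) {
	case []byte:
		if len(v) == 1 && v[0] < 0x80 {
			return []byte{v[0]} // a single small byte is its own encoding
		}
		return append(encodeLength(len(v), 0x80), v...)
	case []any:
		// Encode children first, then prefix the concatenated payload.
		var payload []byte
		for _, child := range v {
			payload = append(payload, encodeItem(child)...)
		}
		return append(encodeLength(len(payload), 0xC0), payload...)
	default:
		panic("encodeItem: item must be []byte or []any")
	}
}
```

For example, encodeItem([]byte("dog")) yields 0x83 'd' 'o' 'g', and encodeItem([]any{[]byte("cat"), []byte("dog")}) yields 0xC8 followed by the two encoded strings, matching the commonly cited RLP examples.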

Code Analysis

Encode

Encode is the entry point for encoding a value.

Inside Encode, it basically:

tries to get an encBuffer from w (this succeeds when w is itself an EncoderBuffer or encBuffer); if that fails, it takes one from a pool. encBuffer is a struct that stores intermediate values during encoding, which are later used to compute the final encoding.

uses buf (an encBuffer) to encode the value, storing the intermediate values in the encBuffer struct.

calls buf.writeTo to combine the intermediate values and write the final encoding to w.

Note: encBuffer stores the intermediate values computed during encoding. From the definition of RLP, the first byte of an encoding depends on the type and size of the data, so it can only be determined once the size of the data is known. When encoding a nested array or struct, geth encodes the child items first and the parent afterwards; encBuffer keeps the children's encoded data so that the parent's header can be computed from their total size, and the children's data can then be placed after that header.


// go-ethereum/rlp/encode.go

// Encode writes the RLP encoding of val to w. Note that Encode may
// perform many small writes in some cases. Consider making w
// buffered.
//
// Please see package-level documentation of encoding rules.
func Encode(w io.Writer, val interface{}) error {
	// try to get encBuffer from w
	if buf := encBufferFromWriter(w); buf != nil {
		return buf.encode(val)
	}

	// if fails to get encBuffer from w, then get it from pool.
	buf := getEncBuffer()
	
	// after usage, return buf to pool for reuse.
	defer encBufferPool.Put(buf)
	
	// encode value
	if err := buf.encode(val); err != nil {
		return err
	}
	return buf.writeTo(w)
}

Get encBuffer from pool

Geth uses encBufferPool (a sync.Pool) to reuse encBuffer values. If it can't obtain an encBuffer from w, geth takes one from the pool, which avoids allocating a new buffer on every call.


// go-ethereum/rlp/encbuffer.go

// EncoderBuffer is a buffer for incremental encoding.
//
// The zero value is NOT ready for use. To get a usable buffer,
// create it using NewEncoderBuffer or call Reset.
type EncoderBuffer struct {
	buf *encBuffer
	dst io.Writer

	ownBuffer bool
}

type encBuffer struct {
	str     []byte     // string data, contains everything except list headers
	lheads  []listhead // all list headers
	lhsize  int        // sum of sizes of all encoded list headers
	sizebuf [9]byte    // auxiliary buffer for uint encoding
}

// The global encBuffer pool.
var encBufferPool = sync.Pool{
	New: func() interface{} { return new(encBuffer) },
}

func getEncBuffer() *encBuffer {
	buf := encBufferPool.Get().(*encBuffer)
	buf.reset()
	return buf
}

func (buf *encBuffer) reset() {
	buf.lhsize = 0
	buf.str = buf.str[:0]
	buf.lheads = buf.lheads[:0]
}

func encBufferFromWriter(w io.Writer) *encBuffer {
	switch w := w.(type) {
	case EncoderBuffer:
		return w.buf
	case *EncoderBuffer:
		return w.buf
	case *encBuffer:
		return w
	default:
		return nil
	}
}
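The get-reset-put pattern around encBufferPool can be isolated in a small sketch. The scratch type, scratchPool, getScratch and withScratch are hypothetical names, with scratch standing in for encBuffer. The important detail, as in getEncBuffer, is resetting an object taken from the pool, since it may still hold data from a previous use.

```go
package main

import "sync"

// scratch is a hypothetical stand-in for encBuffer: a reusable byte
// slice whose capacity survives between uses.
type scratch struct {
	data []byte
}

var scratchPool = sync.Pool{
	New: func() any { return new(scratch) },
}

// getScratch mirrors getEncBuffer: take an object from the pool and reset
// it, because a pooled object may still contain data from its last use.
func getScratch() *scratch {
	s := scratchPool.Get().(*scratch)
	s.data = s.data[:0] // keep capacity, drop old contents
	return s
}

// withScratch runs fn with a pooled buffer and returns a copy of what fn
// wrote, since the buffer is handed back to the pool afterwards.
func withScratch(fn func(s *scratch)) []byte {
	s := getScratch()
	defer scratchPool.Put(s)
	fn(s)
	out := make([]byte, len(s.data))
	copy(out, s.data)
	return out
}
```

Because the buffer goes back to the pool when withScratch returns, the result is copied out first; geth's Encode gets the same effect by writing to w via writeTo before the deferred Put runs.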

Load encoder from cache

RLP itself only knows byte arrays and lists, but Go values come in many types, each of which maps onto those two forms differently. So geth implements a corresponding encoder (writer) and decoder for each type.

Inside encode, geth gets the writer (encoder) that matches the value's type and uses it to encode the value.


// go-ethereum/rlp/encbuffer.go

func (buf *encBuffer) encode(val interface{}) error {
	rval := reflect.ValueOf(val)
	writer, err := cachedWriter(rval.Type())
	if err != nil {
		return err
	}
	return writer(rval, buf)
}

To avoid rebuilding writers and decoders through reflection on every call, geth caches them per type. cachedWriter looks up the type's typeinfo in the cache; if there is no entry for the type yet, a new one is generated.


// go-ethereum/rlp/typecache.go

// typekey is the key of a type in typeCache. It includes the struct tags because
// they might generate a different decoder.
type typekey struct {
	reflect.Type
	rlpstruct.Tags
}

// typeinfo is an entry in the type cache.
type typeinfo struct {
	decoder    decoder
	decoderErr error // error from makeDecoder
	writer     writer
	writerErr  error // error from makeWriter
}

// the up-to-date map is stored in cur, next is a staging value used during update of map
type typeCache struct {
	cur atomic.Value

	// This lock synchronizes writers.
	mu   sync.Mutex
	next map[typekey]*typeinfo
}

var theTC = newTypeCache()

func newTypeCache() *typeCache {
	c := new(typeCache)
	c.cur.Store(make(map[typekey]*typeinfo))
	return c
}

func cachedWriter(typ reflect.Type) (writer, error) {
	info := theTC.info(typ)
	return info.writer, info.writerErr
}

Note that typekey contains the struct Tags in addition to the value's Type. The same Go type can be encoded or decoded differently depending on the rlp struct tags of the field that holds it (for example `rlp:"nil"` or `rlp:"tail"`), so geth includes the tags in the key to keep these writers and decoders apart.


// go-ethereum/rlp/internal/rlpstruct/rlpstruct.go

// Tags represents struct tags.
type Tags struct {
	// rlp:"nil" controls whether empty input results in a nil pointer.
	// nilKind is the kind of empty value allowed for the field.
	NilKind NilKind
	NilOK   bool

	// rlp:"optional" allows for a field to be missing in the input list.
	// If this is set, all subsequent fields must also be optional.
	Optional bool

	// rlp:"tail" controls whether this field swallows additional list elements. It can
	// only be set for the last field, which must be of slice type.
	Tail bool

	// rlp:"-" ignores fields.
	Ignored bool
}

typeCache.info first tries to load the entry from the cache (top-level lookups use empty tags); if it does not exist, it generates a new one.


func (c *typeCache) info(typ reflect.Type) *typeinfo {
	key := typekey{Type: typ}
	if info := c.cur.Load().(map[typekey]*typeinfo)[key]; info != nil {
		return info
	}

	// Not in the cache, need to generate info for this type.
	return c.generate(typ, rlpstruct.Tags{})
}

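generate first acquires the mutex, then checks cur again, since another goroutine may have generated the entry in the meantime. If the entry still doesn't exist, it copies cur into next, generates the entry into next, and finally stores next as the new cur, so readers never see a map that is being modified. infoWhileGenerating also puts a placeholder into next before generating, so that recursive types can refer to their own entry without infinite recursion.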


// go-ethereum/rlp/typecache.go

func (c *typeCache) generate(typ reflect.Type, tags rlpstruct.Tags) *typeinfo {
	c.mu.Lock()
	defer c.mu.Unlock()

	cur := c.cur.Load().(map[typekey]*typeinfo)
	if info := cur[typekey{typ, tags}]; info != nil {
		return info
	}

	// Copy cur to next.
	c.next = maps.Clone(cur)

	// Generate.
	info := c.infoWhileGenerating(typ, tags)

	// next -> cur
	c.cur.Store(c.next)
	c.next = nil
	return info
}

func (c *typeCache) infoWhileGenerating(typ reflect.Type, tags rlpstruct.Tags) *typeinfo {
	key := typekey{typ, tags}
	if info := c.next[key]; info != nil {
		return info
	}
	// Put a dummy value into the cache before generating.
	// If the generator tries to lookup itself, it will get
	// the dummy value and won't call itself recursively.
	info := new(typeinfo)
	c.next[key] = info
	info.generate(typ, tags)
	return info
}

func (i *typeinfo) generate(typ reflect.Type, tags rlpstruct.Tags) {
	i.decoder, i.decoderErr = makeDecoder(typ, tags)
	i.writer, i.writerErr = makeWriter(typ, tags)
}

makeDecoder and makeWriter return the decoding and encoding functions that match the value's type and tags.


// go-ethereum/rlp/encode.go

// makeWriter creates a writer function for the given type.
func makeWriter(typ reflect.Type, ts rlpstruct.Tags) (writer, error) {
	kind := typ.Kind()
	switch {
	case typ == rawValueType:
		return writeRawValue, nil
	case typ.AssignableTo(reflect.PointerTo(bigInt)):
		return writeBigIntPtr, nil
	case typ.AssignableTo(bigInt):
		return writeBigIntNoPtr, nil
	case typ == reflect.PointerTo(u256Int):
		return writeU256IntPtr, nil
	case typ == u256Int:
		return writeU256IntNoPtr, nil
	case kind == reflect.Ptr:
		return makePtrWriter(typ, ts)
	case reflect.PointerTo(typ).Implements(encoderInterface):
		return makeEncoderWriter(typ), nil
	case isUint(kind):
		return writeUint, nil
	case kind == reflect.Bool:
		return writeBool, nil
	case kind == reflect.String:
		return writeString, nil
	case kind == reflect.Slice && isByte(typ.Elem()):
		return writeBytes, nil
	case kind == reflect.Array && isByte(typ.Elem()):
		return makeByteArrayWriter(typ), nil
	case kind == reflect.Slice || kind == reflect.Array:
		return makeSliceWriter(typ, ts)
	case kind == reflect.Struct:
		return makeStructWriter(typ)
	case kind == reflect.Interface:
		return writeInterface, nil
	default:
		return nil, fmt.Errorf("rlp: type %v is not RLP-serializable", typ)
	}
}


// go-ethereum/rlp/decode.go

func makeDecoder(typ reflect.Type, tags rlpstruct.Tags) (dec decoder, err error) {
	kind := typ.Kind()
	switch {
	case typ == rawValueType:
		return decodeRawValue, nil
	case typ.AssignableTo(reflect.PointerTo(bigInt)):
		return decodeBigInt, nil
	case typ.AssignableTo(bigInt):
		return decodeBigIntNoPtr, nil
	case typ == reflect.PointerTo(u256Int):
		return decodeU256, nil
	case typ == u256Int:
		return decodeU256NoPtr, nil
	case kind == reflect.Ptr:
		return makePtrDecoder(typ, tags)
	case reflect.PointerTo(typ).Implements(decoderInterface):
		return decodeDecoder, nil
	case isUint(kind):
		return decodeUint, nil
	case kind == reflect.Bool:
		return decodeBool, nil
	case kind == reflect.String:
		return decodeString, nil
	case kind == reflect.Slice || kind == reflect.Array:
		return makeListDecoder(typ, tags)
	case kind == reflect.Struct:
		return makeStructDecoder(typ)
	case kind == reflect.Interface:
		return decodeInterface, nil
	default:
		return nil, fmt.Errorf("rlp: type %v is not RLP-serializable", typ)
	}
}

Encode

Once the writer has been fetched, encode simply calls it to encode the value; the intermediate data is stored in encBuffer.

Generate result

writeTo computes the final encoded data from the intermediate values stored in encBuffer and writes it to w.


// go-ethereum/rlp/encbuffer.go

// writeTo writes the encoder output to w.
func (buf *encBuffer) writeTo(w io.Writer) (err error) {
	strpos := 0
	for _, head := range buf.lheads {
		// write string data before header
		if head.offset-strpos > 0 {
			n, err := w.Write(buf.str[strpos:head.offset])
			strpos += n
			if err != nil {
				return err
			}
		}
		// write the header
		enc := head.encode(buf.sizebuf[:])
		if _, err = w.Write(enc); err != nil {
			return err
		}
	}
	if strpos < len(buf.str) {
		// write string data after the last list header
		_, err = w.Write(buf.str[strpos:])
	}
	return err
}

Encode Big int

writeBigIntNoPtr first checks that the value is non-negative, because RLP cannot encode negative integers; for a negative value it returns ErrNegativeBigInt.


// go-ethereum/rlp/encode.go

func writeBigIntNoPtr(val reflect.Value, w *encBuffer) error {
	i := val.Interface().(big.Int)
	if i.Sign() == -1 {
		return ErrNegativeBigInt
	}
	w.writeBigInt(&i)
	return nil
}

In writeBigInt, geth:

checks whether the value fits into 64 bits; if it does, it calls buf.writeUint64 to encode it

otherwise, calculates the minimal number of bytes needed to represent the value

encodes the string header (the prefix bytes that precede the data itself)

copies the value itself into buf.str as a big-endian byte array


// go-ethereum/rlp/encbuffer.go

type listhead struct {
	offset int // index of this header in string data
	size   int // total size of encoded data (including list headers)
}

type encBuffer struct {
	str     []byte     // string data, contains everything except list headers
	lheads  []listhead // all list headers
	lhsize  int        // sum of sizes of all encoded list headers
	sizebuf [9]byte    // auxiliary buffer for uint encoding
}

// writeBigInt writes i as an integer.
func (buf *encBuffer) writeBigInt(i *big.Int) {
	// number of significant bits in i
	bitlen := i.BitLen()
	if bitlen <= 64 {
		buf.writeUint64(i.Uint64())
		return
	}
	
	// Integer is larger than 64 bits, encode from i.Bits().
	// The minimal byte length is bitlen rounded up to the next
	// multiple of 8, divided by 8.
	length := ((bitlen + 7) & -8) >> 3
	buf.encodeStringHeader(length)
	
	// Reserve length bytes at the end of buf.str, then fill them from the right with
	// i's big-endian representation, taking bytes from i.Bits() one word at a time.
	buf.str = append(buf.str, make([]byte, length)...)
	index := length
	bytesBuf := buf.str[len(buf.str)-length:]
	for _, d := range i.Bits() {
		for j := 0; j < wordBytes && index > 0; j++ {
			index--
			bytesBuf[index] = byte(d)
			d >>= 8
		}
	}
}

encodeStringHeader checks whether size is smaller than 56:

if it is, the header is the single byte 0x80 + size (the byte(size) conversion is needed because Go does not mix int and byte in arithmetic; since size < 56, the sum always fits in a byte)

if not, the header is the byte 0xB7 + sizesize (sizesize being the minimal number of bytes needed to represent size), followed by the big-endian bytes of size.


// go-ethereum/rlp/encbuffer.go

func (buf *encBuffer) encodeStringHeader(size int) {
	if size < 56 {
		buf.str = append(buf.str, 0x80+byte(size))
	} else {
		// putint writes size into the buffer as a big-endian byte array and
		// returns the minimal number of bytes needed to represent it
		sizesize := putint(buf.sizebuf[1:], uint64(size))
		buf.sizebuf[0] = 0xB7 + byte(sizesize)

		// buf.sizebuf[1:sizesize+1] holds the big-endian bytes of size
		buf.str = append(buf.str, buf.sizebuf[:sizesize+1]...)
	}
}


// go-ethereum/rlp/encode.go

// putint writes i to the beginning of b in big endian byte
// order, using the least number of bytes needed to represent i.
func putint(b []byte, i uint64) (size int) {
	switch {
	case i < (1 << 8):
		b[0] = byte(i)
		return 1
	case i < (1 << 16):
		b[0] = byte(i >> 8)
		b[1] = byte(i)
		return 2
	case i < (1 << 24):
		b[0] = byte(i >> 16)
		b[1] = byte(i >> 8)
		b[2] = byte(i)
		return 3
	case i < (1 << 32):
		b[0] = byte(i >> 24)
		b[1] = byte(i >> 16)
		b[2] = byte(i >> 8)
		b[3] = byte(i)
		return 4
	case i < (1 << 40):
		b[0] = byte(i >> 32)
		b[1] = byte(i >> 24)
		b[2] = byte(i >> 16)
		b[3] = byte(i >> 8)
		b[4] = byte(i)
		return 5
	case i < (1 << 48):
		b[0] = byte(i >> 40)
		b[1] = byte(i >> 32)
		b[2] = byte(i >> 24)
		b[3] = byte(i >> 16)
		b[4] = byte(i >> 8)
		b[5] = byte(i)
		return 6
	case i < (1 << 56):
		b[0] = byte(i >> 48)
		b[1] = byte(i >> 40)
		b[2] = byte(i >> 32)
		b[3] = byte(i >> 24)
		b[4] = byte(i >> 16)
		b[5] = byte(i >> 8)
		b[6] = byte(i)
		return 7
	default:
		b[0] = byte(i >> 56)
		b[1] = byte(i >> 48)
		b[2] = byte(i >> 40)
		b[3] = byte(i >> 32)
		b[4] = byte(i >> 24)
		b[5] = byte(i >> 16)
		b[6] = byte(i >> 8)
		b[7] = byte(i)
		return 8
	}
}
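encodeStringHeader and putint can be mirrored in a compact sketch. putintLoop is a hypothetical loop-based equivalent of the unrolled putint (it writes the same bytes and returns the same count), and stringHeaderSketch returns the header bytes instead of appending them to an encBuffer.

```go
package main

// putintLoop writes i into b in big-endian order using the fewest bytes
// and returns that count. Zero is written as a single 0x00 byte, matching
// putint's first case. b must have room for 8 bytes.
func putintLoop(b []byte, i uint64) int {
	size := 1
	for v := i >> 8; v > 0; v >>= 8 {
		size++
	}
	// Fill from the last byte backwards, least significant byte first.
	for k := size - 1; k >= 0; k-- {
		b[k] = byte(i)
		i >>= 8
	}
	return size
}

// stringHeaderSketch mirrors encodeStringHeader but returns the header.
func stringHeaderSketch(size int) []byte {
	if size < 56 {
		return []byte{0x80 + byte(size)}
	}
	var sizebuf [9]byte
	sizesize := putintLoop(sizebuf[1:], uint64(size))
	sizebuf[0] = 0xB7 + byte(sizesize)
	return append([]byte(nil), sizebuf[:sizesize+1]...)
}
```

For example, a 1024-byte string gets the header 0xB9 0x04 0x00: two length bytes, so the first byte is 0xB7 + 2.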

Encoding a big integer never touches buf.lheads, so writeTo simply writes buf.str, which already holds the final encoding, to the writer. buf.lheads comes into play when encoding nested lists such as structs.


Encode Struct

In makeWriter, if the type is a struct, it calls makeStructWriter to generate the writer.


// go-ethereum/rlp/encode.go

// makeWriter creates a writer function for the given type.
func makeWriter(typ reflect.Type, ts rlpstruct.Tags) (writer, error) {
	kind := typ.Kind()
	switch {
	// ...
	case kind == reflect.Struct:
		return makeStructWriter(typ)
	// ...
	}
}

Inside makeStructWriter, geth first resolves the struct's fields, including each field's index and typeinfo (its writer and decoder).


// go-ethereum/rlp/encode.go

type field struct {
	index    int
	info     *typeinfo
	optional bool
}

func makeStructWriter(typ reflect.Type) (writer, error) {
	fields, err := structFields(typ)
	if err != nil {
		return nil, err
	}
	for _, f := range fields {
		if f.info.writerErr != nil {
			return nil, structFieldError{typ, f.index, f.info.writerErr}
		}
	}

	var writer writer
	firstOptionalField := firstOptionalField(fields)
	if firstOptionalField == len(fields) {
		// This is the writer function for structs without any optional fields.
		writer = func(val reflect.Value, w *encBuffer) error {
			lh := w.list()
			for _, f := range fields {
				if err := f.info.writer(val.Field(f.index), w); err != nil {
					return err
				}
			}
			w.listEnd(lh)
			return nil
		}
	} else {
		// If there are any "optional" fields, the writer needs to perform additional
		// checks to determine the output list length.
		writer = func(val reflect.Value, w *encBuffer) error {
			lastField := len(fields) - 1
			for ; lastField >= firstOptionalField; lastField-- {
				if !val.Field(fields[lastField].index).IsZero() {
					break
				}
			}
			lh := w.list()
			for i := 0; i <= lastField; i++ {
				if err := fields[i].info.writer(val.Field(fields[i].index), w); err != nil {
					return err
				}
			}
			w.listEnd(lh)
			return nil
		}
	}
	return writer, nil
}

structFields resolves the typeinfo of all exported, non-ignored fields of the struct type. For each field, it gets or generates the typeinfo (which contains the writer) matching the field's type and tags; this writer is later used to encode the field.


// go-ethereum/rlp/typecache.go

// typeinfo is an entry in the type cache.
type typeinfo struct {
	decoder    decoder
	decoderErr error // error from makeDecoder
	writer     writer
	writerErr  error // error from makeWriter
}

type field struct {
	index    int
	info     *typeinfo
	optional bool
}

// structFields resolves the typeinfo of all public fields in a struct type.
func structFields(typ reflect.Type) (fields []field, err error) {
	// Convert fields to rlpstruct.Field.
	var allStructFields []rlpstruct.Field
	for i := 0; i < typ.NumField(); i++ {
		rf := typ.Field(i)
		allStructFields = append(allStructFields, rlpstruct.Field{
			Name:     rf.Name,
			Index:    i,
			Exported: rf.PkgPath == "",
			Tag:      string(rf.Tag),
			Type:     *rtypeToStructType(rf.Type, nil),
		})
	}

	// Filter/validate fields.
	structFields, structTags, err := rlpstruct.ProcessFields(allStructFields)
	if err != nil {
		if tagErr, ok := err.(rlpstruct.TagError); ok {
			tagErr.StructType = typ.String()
			return nil, tagErr
		}
		return nil, err
	}

	// Resolve typeinfo.
	for i, sf := range structFields {
		typ := typ.Field(sf.Index).Type
		tags := structTags[i]
		info := theTC.infoWhileGenerating(typ, tags)
		fields = append(fields, field{sf.Index, info, tags.Optional})
	}
	return fields, nil
}

ProcessFields validates the fields against their tags and returns only the fields (with their parsed tags) that need to be encoded. Fields of the same type but with different tags can therefore be encoded differently.


// go-ethereum/rlp/internal/rlpstruct/rlpstruct.go

// Field represents a struct field.
type Field struct {
	Name     string
	Index    int
	Exported bool
	Type     Type
	Tag      string
}

// NilKind is the RLP value encoded in place of nil pointers.
type NilKind uint8

const (
	NilKindString NilKind = 0x80
	NilKindList   NilKind = 0xC0
)

// Tags represents struct tags.
type Tags struct {
	// rlp:"nil" controls whether empty input results in a nil pointer.
	// nilKind is the kind of empty value allowed for the field.
	NilKind NilKind
	NilOK   bool

	// rlp:"optional" allows for a field to be missing in the input list.
	// If this is set, all subsequent fields must also be optional.
	Optional bool

	// rlp:"tail" controls whether this field swallows additional list elements. It can
	// only be set for the last field, which must be of slice type.
	Tail bool

	// rlp:"-" ignores fields.
	Ignored bool
}

// ProcessFields filters the given struct fields, returning only fields
// that should be considered for encoding/decoding.
func ProcessFields(allFields []Field) ([]Field, []Tags, error) {
	lastPublic := lastPublicField(allFields)

	// Gather all exported fields and their tags.
	var fields []Field
	var tags []Tags
	for _, field := range allFields {
		if !field.Exported {
			continue
		}
		
		// parse the rlp tag and validate it against the field's type and position
		ts, err := parseTag(field, lastPublic)
		if err != nil {
			return nil, nil, err
		}
		if ts.Ignored {
			continue
		}
		fields = append(fields, field)
		tags = append(tags, ts)
	}

	// Verify optional field consistency. If any optional field exists,
	// all fields after it must also be optional. Note: optional + tail
	// is supported.
	var anyOptional bool
	var firstOptionalName string
	for i, ts := range tags {
		name := fields[i].Name
		if ts.Optional || ts.Tail {
			if !anyOptional {
				firstOptionalName = name
			}
			anyOptional = true
		} else {
			if anyOptional {
				msg := fmt.Sprintf("must be optional because preceding field %q is optional", firstOptionalName)
				return nil, nil, TagError{Field: name, Err: msg}
			}
		}
	}
	return fields, tags, nil
}

// validate tag of field in struct and calculate corresponding tags.
func parseTag(field Field, lastPublic int) (Tags, error) {
	name := field.Name
	tag := reflect.StructTag(field.Tag)
	var ts Tags
	for _, t := range strings.Split(tag.Get("rlp"), ",") {
		switch t = strings.TrimSpace(t); t {
		case "":
			// empty tag is allowed for some reason
		case "-":
			ts.Ignored = true
		case "nil", "nilString", "nilList":
			ts.NilOK = true
			if field.Type.Kind != reflect.Ptr {
				return ts, TagError{Field: name, Tag: t, Err: "field is not a pointer"}
			}
			switch t {
			case "nil":
				ts.NilKind = field.Type.Elem.DefaultNilValue()
			case "nilString":
				ts.NilKind = NilKindString
			case "nilList":
				ts.NilKind = NilKindList
			}
		case "optional":
			ts.Optional = true
			if ts.Tail {
				return ts, TagError{Field: name, Tag: t, Err: `also has "tail" tag`}
			}
		case "tail":
			ts.Tail = true
			if field.Index != lastPublic {
				return ts, TagError{Field: name, Tag: t, Err: "must be on last field"}
			}
			if ts.Optional {
				return ts, TagError{Field: name, Tag: t, Err: `also has "optional" tag`}
			}
			if field.Type.Kind != reflect.Slice {
				return ts, TagError{Field: name, Tag: t, Err: "field type is not slice"}
			}
		default:
			return ts, TagError{Field: name, Tag: t, Err: "unknown tag"}
		}
	}
	return ts, nil
}

After geth has resolved all the fields, including each field's writer, it constructs the writer for this struct type. The writer's behavior depends on whether the struct has optional fields. A field tagged `rlp:"optional"` may be omitted from the encoding when it holds its zero value (as reported by reflect.Value.IsZero). ProcessFields checks that the optional tags are consistent: if any field is optional, all fields after it must also be optional (or tail). When encoding, geth finds the last optional field with a non-zero value and omits all fields after it; zero-valued optional fields that come before a non-zero one are still encoded.


// go-ethereum/rlp/encode.go

type field struct {
	index    int
	info     *typeinfo
	optional bool
}

func makeStructWriter(typ reflect.Type) (writer, error) {
	fields, err := structFields(typ)
	if err != nil {
		return nil, err
	}
	for _, f := range fields {
		if f.info.writerErr != nil {
			return nil, structFieldError{typ, f.index, f.info.writerErr}
		}
	}

	var writer writer
	firstOptionalField := firstOptionalField(fields)
	
	// there is no optional field
	if firstOptionalField == len(fields) {
		// This is the writer function for structs without any optional fields.
		writer = func(val reflect.Value, w *encBuffer) error {
			lh := w.list()
			for _, f := range fields {
				if err := f.info.writer(val.Field(f.index), w); err != nil {
					return err
				}
			}
			w.listEnd(lh)
			return nil
		}
	} else {
		// If there are any "optional" fields, the writer needs to perform additional
		// checks to determine the output list length.
		writer = func(val reflect.Value, w *encBuffer) error {
			lastField := len(fields) - 1
			// calculate the last optional field whose value is not zero.
			for ; lastField >= firstOptionalField; lastField-- {
				if !val.Field(fields[lastField].index).IsZero() {
					break
				}
			}
			
			// encode struct
			lh := w.list()
			for i := 0; i <= lastField; i++ {
				if err := fields[i].info.writer(val.Field(fields[i].index), w); err != nil {
					return err
				}
			}
			w.listEnd(lh)
			return nil
		}
	}
	return writer, nil
}
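The effect of the optional-field loop can be seen in a small sketch that encodes a flat struct of unsigned integers. encodeUint and encodeStructSketch are hypothetical helpers, not geth functions; fields at index firstOptional and beyond are treated as if tagged `rlp:"optional"`, and the sketch assumes the list payload stays under 56 bytes.

```go
package main

// encodeUint applies the RLP integer rule: 0 -> 0x80, 1..127 -> the byte
// itself, larger values -> 0x80+n followed by n minimal big-endian bytes.
// A uint64 needs at most 8 bytes, so the short header always applies.
func encodeUint(i uint64) []byte {
	if i == 0 {
		return []byte{0x80}
	}
	if i < 0x80 {
		return []byte{byte(i)}
	}
	var be []byte
	for ; i > 0; i >>= 8 {
		be = append([]byte{byte(i)}, be...)
	}
	return append([]byte{0x80 + byte(len(be))}, be...)
}

// encodeStructSketch encodes a flat struct of uint64 fields as an RLP list.
// Trailing zero-valued optional fields are dropped, mirroring the loop in
// makeStructWriter; required fields are always kept.
func encodeStructSketch(fields []uint64, firstOptional int) []byte {
	last := len(fields) - 1
	for ; last >= firstOptional; last-- {
		if fields[last] != 0 { // geth uses reflect.Value.IsZero here
			break
		}
	}
	var payload []byte
	for i := 0; i <= last; i++ {
		payload = append(payload, encodeUint(fields[i])...)
	}
	if len(payload) >= 56 {
		panic("sketch only handles list payloads shorter than 56 bytes")
	}
	return append([]byte{0xC0 + byte(len(payload))}, payload...)
}
```

With fields {1, 0, 0} and firstOptional = 1, the two trailing zeros are dropped, giving 0xC1 0x01; with {1, 0, 3} the zero in the middle is still encoded (as 0x80) because a later optional field is non-zero, giving 0xC3 0x01 0x80 0x03.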

In the struct writer, it first calls w.list, then loops over the fields, encoding each one with its corresponding writer, and finally calls w.listEnd.


// go-ethereum/rlp/encode.go

func makeStructWriter(typ reflect.Type) (writer, error) {
	// ... (field resolution and error checks elided)
	var writer writer
	writer = func(val reflect.Value, w *encBuffer) error {
		lh := w.list()
		for _, f := range fields {
			if err := f.info.writer(val.Field(f.index), w); err != nil {
				return err
			}
		}
		w.listEnd(lh)
		return nil
	}
	return writer, nil
}

list and listEnd are used to record the information needed to compute a struct's header.

When encoding a simple value such as an integer, geth can compute its encoding immediately and append it to encBuffer.str. Structs are more complex, because a struct's header depends on the total length of its encoded elements, which is not known until all elements have been encoded.

So when encoding a struct, geth first encodes all of its elements and appends their data to encBuffer.str, while list and listEnd record the information needed to compute the struct's header later. The headers themselves are not written into encBuffer.str.

Geth records this information in encBuffer.lheads. Because a struct's field may itself be a struct, geth needs both the start position of the struct's element data and the total size of the headers of all nested child structs. Adding the length of the element data to the length of the child headers gives the parent struct's payload length, from which the correct header can be computed.

list appends a new listhead to encBuffer.lheads that records the start position (offset) of the struct's encoded data in encBuffer.str, and temporarily stores in size the current total size of all list headers (buf.lhsize).

After the elements have been encoded, listEnd computes the size of the struct's payload, including all child structs' headers, as:

$$
\text{size} = (\text{len(buf.str)} - \text{offset}) + (\text{buf.lhsize}_{\text{end}} - \text{buf.lhsize}_{\text{start}})
$$

$\text{len(buf.str)} - \text{offset}$ is the length of the struct's encoded element data, excluding list headers.

$\text{buf.lhsize}_{\text{end}} - \text{buf.lhsize}_{\text{start}}$ is the total size of the headers of the child lists closed inside this struct.

Their sum is the byte length of the struct's RLP-encoded elements, which is exactly what listEnd computes as buf.size() - lh.offset - lh.size.


// go-ethereum/rlp/encbuffer.go

// list adds a new list header to the header stack. It returns the index of the header.
// Call listEnd with this index after encoding the content of the list.
func (buf *encBuffer) list() int {
	// len(buf.str) is the start position of this struct's rlp encoded data
	// buf.lhsize is the current size of all list headers
	buf.lheads = append(buf.lheads, listhead{offset: len(buf.str), size: buf.lhsize})
	return len(buf.lheads) - 1
}

func (buf *encBuffer) listEnd(index int) {
	lh := &buf.lheads[index]
	// calculate the byte array length of the struct elements' rlp encoded data.
	lh.size = buf.size() - lh.offset - lh.size

	// if data size is smaller than 56, then the header size is 1 byte
	if lh.size < 56 {
		buf.lhsize++ // length encoded into kind tag
	} else {
		// otherwise (56 bytes or more), the header is 1 byte plus the minimal big-endian byte array representing the payload length
		buf.lhsize += 1 + intsize(uint64(lh.size))
	}
}

// size returns the length of the encoded data.
func (buf *encBuffer) size() int {
	return len(buf.str) + buf.lhsize
}

// intsize computes the minimum number of bytes required to store i.
func intsize(i uint64) (size int) {
	for size = 1; ; size++ {
		if i >>= 8; i == 0 {
			return size
		}
	}
}

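The implementation of struct/list encoding is fairly complex, so let's use an example to illustrate the process. Assume we want to encode a struct whose first field is the integer 5 and whose second field is a nested struct containing the integer 10: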


{
	5,
	{
		10
	}
}

Geth first calls w.list() for the outer struct and then encodes its first field, the integer 5. At this point, the fields of encBuffer are:


str = [0x05]
lheads =[
	{
		offset:0,
		size:0
	}
]
lhsize = 0

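Next, geth encodes the second field, the inner struct, whose writer calls w.list(), encodes 10, and calls w.listEnd(). Afterwards, the fields of encBuffer are: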


str = [0x05,0x0A]
lheads =[
	{
		offset:0,
		size:0
	},
	{
		offset:1,
		size:1
	}
]
lhsize = 1

Finally, the outer struct's writer calls buf.listEnd(0) to update the outer struct's header:


str = [0x05,0x0A]
lheads =[
	{
		offset:0,
		size:3
	},
	{
		offset:1,
		size:1
	}
]
lhsize = 2

Based on the list headers, the inner struct's payload is 0x0A, which is 1 byte long, so its header is 0xC1 (0xC0, i.e. 192, plus the payload length 1), and the inner struct encodes to 0xC10A. The outer struct's payload (0x05 0xC1 0x0A) is 3 bytes long, so its prefix is 0xC3 (0xC0 plus 3), and the final encoding is 0xC305C10A.

After the writers have encoded all fields, geth calls encBuffer.writeTo to combine all the intermediate data (element data and list headers) into the final encoding of the struct.

writeTo computes the encoding of each list header and writes it at the header's offset, interleaved with the element data, producing the final encoding of the struct.


// go-ethereum/rlp/encbuffer.go

// writeTo writes the encoder output to w.
func (buf *encBuffer) writeTo(w io.Writer) (err error) {
	strpos := 0
	for _, head := range buf.lheads {
		// write the string data that precedes this header
		if head.offset-strpos > 0 {
			n, err := w.Write(buf.str[strpos:head.offset])
			strpos += n
			if err != nil {
				return err
			}
		}
		// calculate the header's encoding, and write into w
		enc := head.encode(buf.sizebuf[:])
		if _, err = w.Write(enc); err != nil {
			return err
		}
	}
	if strpos < len(buf.str) {
		// write string data after the last list header
		_, err = w.Write(buf.str[strpos:])
	}
	return err
}


// go-ethereum/rlp/encode.go

// encode writes head to the given buffer, which must be at least
// 9 bytes long. It returns the encoded bytes.
func (head *listhead) encode(buf []byte) []byte {
	return buf[:puthead(buf, 0xC0, 0xF7, uint64(head.size))]
}

// puthead writes a list or string header to buf.
// buf must be at least 9 bytes long.
func puthead(buf []byte, smalltag, largetag byte, size uint64) int {
	if size < 56 {
		buf[0] = smalltag + byte(size)
		return 1
	}
	
	// number of bytes needed to represent size in big-endian form
	sizesize := putint(buf[1:], size)
	buf[0] = largetag + byte(sizesize)
	return sizesize + 1
}

Encode Array/Slice


// go-ethereum/rlp/encode.go

// makeWriter creates a writer function for the given type.
func makeWriter(typ reflect.Type, ts rlpstruct.Tags) (writer, error) {
	kind := typ.Kind()
	switch {
	// ... (earlier cases handle special types, byte slices and byte arrays)
	case kind == reflect.Slice || kind == reflect.Array:
		return makeSliceWriter(typ, ts)
	// ...
	}
}

makeSliceWriter encodes a slice or array much like a struct (a struct can be seen as an array whose elements may have different types): it opens a list with w.list(), writes each element, and closes the list with w.listEnd(). The one difference is the Tail flag. A slice field tagged `rlp:"tail"`, which geth only accepts on the last field of a struct, gets the Tail flag, and geth then does not create a list header for it: its elements are written directly into the parent struct's list.

No separate header is needed because the slice's extent can be recovered from the parent struct's list header: whatever remains of the parent list after the preceding fields have been decoded belongs to the tail slice. This makes the encoding slightly more compact and lets a struct absorb a variable number of trailing list elements.


func makeSliceWriter(typ reflect.Type, ts rlpstruct.Tags) (writer, error) {
	etypeinfo := theTC.infoWhileGenerating(typ.Elem(), rlpstruct.Tags{})
	if etypeinfo.writerErr != nil {
		return nil, etypeinfo.writerErr
	}

	var wfn writer
	if ts.Tail {
		// This is for struct tail slices.
		// w.list is not called for them.
		wfn = func(val reflect.Value, w *encBuffer) error {
			vlen := val.Len()
			for i := 0; i < vlen; i++ {
				if err := etypeinfo.writer(val.Index(i), w); err != nil {
					return err
				}
			}
			return nil
		}
	} else {
		// This is for regular slices and arrays.
		wfn = func(val reflect.Value, w *encBuffer) error {
			vlen := val.Len()
			if vlen == 0 {
				w.str = append(w.str, 0xC0)
				return nil
			}
			listOffset := w.list()
			for i := 0; i < vlen; i++ {
				if err := etypeinfo.writer(val.Index(i), w); err != nil {
					return err
				}
			}
			w.listEnd(listOffset)
			return nil
		}
	}
	return wfn, nil
}

Decode

Decode is the entry point for decoding RLP data. It takes a Stream from a sync.Pool and uses it to read the input piece by piece.


// go-ethereum/rlp/decode.go

// Kind represents the kind of value contained in an RLP stream.
type Kind int8

const (
	Byte Kind = iota
	String
	List
)

// Stream can be used for piecemeal decoding of an input stream. This
// is useful if the input is very large or if the decoding rules for a
// type depend on the input structure. Stream does not keep an
// internal buffer. After decoding a value, the input reader will be
// positioned just before the type information for the next value.
//
// When decoding a list and the input position reaches the declared
// length of the list, all operations will return error EOL.
// The end of the list must be acknowledged using ListEnd to continue
// reading the enclosing list.
//
// Stream is not safe for concurrent use.
type Stream struct {
	r ByteReader

	remaining uint64   // number of bytes remaining to be read from r
	size      uint64   // size of value ahead
	kinderr   error    // error from last readKind
	stack     []uint64 // list sizes
	uintbuf   [32]byte // auxiliary buffer for integer decoding
	kind      Kind     // kind of value ahead
	byteval   byte     // value of single byte in type tag
	limited   bool     // true if input limit is in effect
}

var streamPool = sync.Pool{
	New: func() interface{} { return new(Stream) },
}

// Decode parses RLP-encoded data from r and stores the result in the value pointed to by
// val. Please see package-level documentation for the decoding rules. Val must be a
// non-nil pointer.
//
// If r does not implement ByteReader, Decode will do its own buffering.
//
// Note that Decode does not set an input limit for all readers and may be vulnerable to
// panics caused by huge value sizes. If you need an input limit, use
//
//	NewStream(r, limit).Decode(val)
func Decode(r io.Reader, val interface{}) error {
	stream := streamPool.Get().(*Stream)
	defer streamPool.Put(stream)

	stream.Reset(r, 0)
	return stream.Decode(val)
}
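Stream keeps the kind and size of the upcoming value in its kind, size and byteval fields, filled from the value's prefix byte. The sketch below shows that classification as a hypothetical standalone function, prefixKind, working on a byte slice instead of a ByteReader. It reuses the Byte/String/List kinds above and rejects long-form headers that should have used the short form, but it is an illustration, not geth's own implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Kind mirrors the three kinds of RLP values used by Stream.
type Kind int8

const (
	Byte Kind = iota
	String
	List
)

var errCanonSize = errors.New("non-canonical size information")

// prefixKind classifies the value at the start of in. For Byte the value is
// the prefix itself and size is 0; for String and List, size is the length
// of the payload that follows the header.
func prefixKind(in []byte) (Kind, uint64, error) {
	if len(in) == 0 {
		return 0, 0, errors.New("empty input")
	}
	b := in[0]
	switch {
	case b < 0x80:
		return Byte, 0, nil
	case b < 0xB8:
		return String, uint64(b - 0x80), nil
	case b < 0xC0:
		return readLongSize(in, String, int(b-0xB7))
	case b < 0xF8:
		return List, uint64(b - 0xC0), nil
	default:
		return readLongSize(in, List, int(b-0xF7))
	}
}

// readLongSize decodes the big-endian length that follows a long-form prefix.
func readLongSize(in []byte, k Kind, sizesize int) (Kind, uint64, error) {
	if len(in) < 1+sizesize {
		return 0, 0, errors.New("truncated length")
	}
	if in[1] == 0 {
		return 0, 0, errCanonSize // length has a leading zero byte
	}
	var size uint64
	for _, c := range in[1 : 1+sizesize] {
		size = size<<8 | uint64(c)
	}
	if size < 56 {
		return 0, 0, errCanonSize // short form should have been used
	}
	return k, size, nil
}

func main() {
	inputs := [][]byte{
		{0x05},                   // a single byte below 0x80
		{0x83, 'd', 'o', 'g'},    // a 3-byte string
		{0xC3, 0x05, 0xC1, 0x0A}, // the list from the encoding example
		{0xF8, 0x38},             // long-form list header, 56-byte payload
		{0xB8, 0x05},             // long form misused for a short string
	}
	for _, in := range inputs {
		kind, size, err := prefixKind(in)
		fmt.Printf("% X -> kind=%d size=%d err=%v\n", in, kind, size, err)
	}
}
```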

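Stream.Reset discards any information about the current decoding context and starts reading from r. When inputLimit is 0, it tries to infer a limit if r is a *bytes.Reader, *bytes.Buffer or *strings.Reader, and it wraps r in a bufio.Reader if r does not implement ByteReader.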


// go-ethereum/rlp/decode.go

// Reset discards any information about the current decoding context
// and starts reading from r. This method is meant to facilitate reuse
// of a preallocated Stream across many decoding operations.
//
// If r does not also implement ByteReader, Stream will do its own
// buffering.
func (s *Stream) Reset(r io.Reader, inputLimit uint64) {
	if inputLimit > 0 {
		s.remaining = inputLimit
		s.limited = true
	} else {
		// Attempt to automatically discover
		// the limit when reading from a byte slice.
		switch br := r.(type) {
		case *bytes.Reader:
			s.remaining = uint64(br.Len())
			s.limited = true
		case *bytes.Buffer:
			s.remaining = uint64(br.Len())
			s.limited = true
		case *strings.Reader:
			s.remaining = uint64(br.Len())
			s.limited = true
		default:
			s.limited = false
		}
	}
	// Wrap r with a buffer if it doesn't have one.
	bufr, ok := r.(ByteReader)
	if !ok {
		bufr = bufio.NewReader(r)
	}
	s.r = bufr
	// Reset the decoding context.
	s.stack = s.stack[:0]
	s.size = 0
	s.kind = -1
	s.kinderr = nil
	s.byteval = 0
	s.uintbuf = [32]byte{}
}

Stream.Decode performs sanity checks (val must be a non-nil pointer), loads the decoder function for the pointed-to type from the type cache, and runs it on the pointer's target.


// go-ethereum/rlp/decode.go

// Decode decodes a value and stores the result in the value pointed
// to by val. Please see the documentation for the Decode function
// to learn about the decoding rules.
func (s *Stream) Decode(val interface{}) error {
	if val == nil {
		return errDecodeIntoNil
	}
	rval := reflect.ValueOf(val)
	rtyp := rval.Type()
	if rtyp.Kind() != reflect.Ptr {
		return errNoPointer
	}
	if rval.IsNil() {
		return errDecodeIntoNil
	}
	decoder, err := cachedDecoder(rtyp.Elem())
	if err != nil {
		return err
	}

	err = decoder(s, rval.Elem())
	if decErr, ok := err.(*decodeError); ok && len(decErr.ctx) > 0 {
		// Add decode target type to error so context has more meaning.
		decErr.ctx = append(decErr.ctx, fmt.Sprint("(", rtyp.Elem(), ")"))
	}
	return err
}

As in the encoding process, the decode function for each type is chosen by makeDecoder, which first checks a few special types (RawValue, big.Int and uint256 values) and then switches on the type's kind.


// go-ethereum/rlp/decode.go

func makeDecoder(typ reflect.Type, tags rlpstruct.Tags) (dec decoder, err error) {
	kind := typ.Kind()
	switch {
	case typ == rawValueType:
		return decodeRawValue, nil
	case typ.AssignableTo(reflect.PointerTo(bigInt)):
		return decodeBigInt, nil
	case typ.AssignableTo(bigInt):
		return decodeBigIntNoPtr, nil
	case typ == reflect.PointerTo(u256Int):
		return decodeU256, nil
	case typ == u256Int:
		return decodeU256NoPtr, nil
	case kind == reflect.Ptr:
		return makePtrDecoder(typ, tags)
	case reflect.PointerTo(typ).Implements(decoderInterface):
		return decodeDecoder, nil
	case isUint(kind):
		return decodeUint, nil
	case kind == reflect.Bool:
		return decodeBool, nil
	case kind == reflect.String:
		return decodeString, nil
	case kind == reflect.Slice || kind == reflect.Array:
		return makeListDecoder(typ, tags)
	case kind == reflect.Struct:
		return makeStructDecoder(typ)
	case kind == reflect.Interface:
		return decodeInterface, nil
	default:
		return nil, fmt.Errorf("rlp: type %v is not RLP-serializable", typ)
	}
}

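Taking *big.Int as an example, decoding reverses the encoding: Stream.Kind reads the prefix to obtain the kind and payload size, the payload bytes are read, and canonical-form checks reject a single byte below 0x80 wrapped in a string header (ErrCanonSize) as well as integers with leading zero bytes (ErrCanonInt).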


// go-ethereum/rlp/decode.go

func decodeBigInt(s *Stream, val reflect.Value) error {
	i := val.Interface().(*big.Int)
	if i == nil {
		i = new(big.Int)
		val.Set(reflect.ValueOf(i))
	}

	err := s.decodeBigInt(i)
	if err != nil {
		return wrapStreamError(err, val.Type())
	}
	return nil
}

func (s *Stream) decodeBigInt(dst *big.Int) error {
	var buffer []byte
	kind, size, err := s.Kind()
	switch {
	case err != nil:
		return err
	case kind == List:
		return ErrExpectedString
	case kind == Byte:
		buffer = s.uintbuf[:1]
		buffer[0] = s.byteval
		s.kind = -1 // re-arm Kind
	case size == 0:
		// Avoid zero-length read.
		s.kind = -1
	case size <= uint64(len(s.uintbuf)):
		// For integers smaller than s.uintbuf, allocating a buffer
		// can be avoided.
		buffer = s.uintbuf[:size]
		if err := s.readFull(buffer); err != nil {
			return err
		}
		// Reject inputs where single byte encoding should have been used.
		if size == 1 && buffer[0] < 128 {
			return ErrCanonSize
		}
	default:
		// For large integers, a temporary buffer is needed.
		buffer = make([]byte, size)
		if err := s.readFull(buffer); err != nil {
			return err
		}
	}

	// Reject leading zero bytes.
	if len(buffer) > 0 && buffer[0] == 0 {
		return ErrCanonInt
	}
	// Set the integer bytes.
	dst.SetBytes(buffer)
	return nil
}
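The same checks can be exercised on a plain byte slice. The sketch below is a simplified, hypothetical decodeBigInt that takes a whole RLP item, supports only the single-byte and short-string forms, and returns errors analogous to ErrExpectedString, ErrCanonSize and ErrCanonInt (its own error values, not geth's).

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
)

var (
	errExpectedString = errors.New("expected string or byte")
	errCanonSize      = errors.New("non-canonical size information")
	errCanonInt       = errors.New("non-canonical integer (leading zero bytes)")
)

// decodeBigInt decodes one RLP item holding an unsigned big integer.
// Sketch only: strings of 56 bytes or more are not supported.
func decodeBigInt(in []byte) (*big.Int, error) {
	if len(in) == 0 {
		return nil, errors.New("empty input")
	}
	prefix := in[0]
	var payload []byte
	switch {
	case prefix < 0x80:
		payload = in[:1] // a single byte below 0x80 is its own value
	case prefix < 0xB8:
		size := int(prefix - 0x80)
		if len(in) < 1+size {
			return nil, errors.New("unexpected end of input")
		}
		payload = in[1 : 1+size]
		if size == 1 && payload[0] < 0x80 {
			return nil, errCanonSize // should have used the single-byte form
		}
	case prefix < 0xC0:
		return nil, errors.New("long strings are not handled in this sketch")
	default:
		return nil, errExpectedString // 0xC0 and above start a list
	}
	if len(payload) > 0 && payload[0] == 0 {
		return nil, errCanonInt
	}
	return new(big.Int).SetBytes(payload), nil // empty payload decodes to 0
}

func main() {
	inputs := [][]byte{
		{0x80},             // zero
		{0x0F},             // 15, single-byte form
		{0x82, 0x04, 0x00}, // 1024
		{0x81, 0x05},       // rejected: 5 must be the single byte 0x05
		{0x82, 0x00, 0x01}, // rejected: leading zero byte
		{0xC0},             // rejected: a list, not a string
	}
	for _, in := range inputs {
		v, err := decodeBigInt(in)
		fmt.Printf("% X -> %v, err=%v\n", in, v, err)
	}
}
```

For instance, 0x81 0x05 is rejected because 5 should have been encoded as the single byte 0x05, and 0x82 0x00 0x01 is rejected for its leading zero.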