Introduction
RLP, or Recursive Length Prefix, is a serialization method used in Ethereum to encode structured data. RLP standardizes the transfer of data between nodes in a space-efficient format. Its purpose is to encode arbitrarily nested arrays of binary data and to ensure that encoded data is uniquely decodable.
Ethereum uses it for several core purposes, including:
- Encoding Transactions: RLP is used to encode transactions before they are sent to the Ethereum network. This includes the transaction’s nonce, gas price, gas limit, to address, value, data, and signature components.
- Encoding Blocks: Blocks and block headers are also serialized using RLP when they are transmitted across the network or stored.
- Account State: In Ethereum, the account state (nonce, balance, storage root, and code hash) is stored in the Ethereum state trie using RLP for serialization.
Definition of structures to be encoded:
The yellow paper (Appendix B) defines the set of structures 𝕋 that RLP can encode as:
𝕋 ≡ 𝕃 ∪ 𝔹
𝕃 ≡ {t : t = (t[0], t[1], ...) ∧ ∀n < ‖t‖ : t[n] ∈ 𝕋}
𝔹 ≡ {b : b = (b[0], b[1], ...) ∧ ∀n < ‖b‖ : b[n] ∈ 𝕆}
where 𝕆 is the set of bytes. From this definition, we can see that 𝕋 is an arbitrarily nested array of bytes.
Encoding Rule
From the encoding rule defined in the yellow paper, we can see that the idea of RLP is to use the first byte (the prefix) to distinguish the different encoding cases, based on the data's type and byte length.
Byte arrays and nested byte arrays (lists) have different encoding schemes; a short example follows the list below.
- for a byte array, RLP chooses the first byte based on the size of the array
- if the array is a single byte with a value below 0x80, the encoding is the byte itself (the first byte contains the value).
- if the array is 0-55 bytes long, the first byte is 0x80 plus the length of the array, followed by the array itself.
- if the array is longer than 55 bytes, the first byte is 0xB7 plus the byte length of the array's length, followed by the big-endian length of the array, followed by the array itself.
- for a nested byte array (a list), RLP first encodes the child items and concatenates the results, then derives the first byte (0xC0/0xF7 analogously) from the size of the concatenated byte array.
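As a quick illustration of these rules, here is a minimal sketch using go-ethereum's rlp package (assuming it is importable as shown); the commented outputs follow directly from the rules above.

package main

import (
    "fmt"

    "github.com/ethereum/go-ethereum/rlp"
)

func main() {
    a, _ := rlp.EncodeToBytes(uint(15))               // single byte below 0x80
    b, _ := rlp.EncodeToBytes("dog")                  // 3-byte string
    c, _ := rlp.EncodeToBytes([]string{"cat", "dog"}) // list of two strings
    fmt.Printf("%x\n%x\n%x\n", a, b, c)
    // expected:
    // 0f                 (the byte itself, no prefix)
    // 83646f67           (0x80+3, then "dog")
    // c88363617483646f67 (0xc0+8, then the two encoded strings)
}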
Code Analysis
Encode
Encode is the entry point for encoding a value. Inside Encode, it basically:
- tries to get an encBuffer from w; if that fails, it takes one from a pool. encBuffer is a struct that stores intermediate values during encoding, from which the final encoded result is later computed.
- uses buf (the encBuffer) to encode the value; intermediate values are stored in the encBuffer struct
- calls buf.writeTo to assemble the intermediate values into the encoding result.
Note: the role of encBuffer is to store intermediate values computed during encoding. From the RLP definition, the first byte is derived from the type and size of the data, so the first byte can only be decided once the data's size is known. When encoding a nested array or struct, geth encodes the child arrays first and then the parent, so encBuffer holds the child arrays' encoded data, from which the parent array's first byte can be computed and the children concatenated.
// go-ethereum/rlp/encode.go

// Encode writes the RLP encoding of val to w. Note that Encode may
// perform many small writes in some cases. Consider making w
// buffered.
//
// Please see package-level documentation of encoding rules.
func Encode(w io.Writer, val interface{}) error {
    // try to get encBuffer from w
    if buf := encBufferFromWriter(w); buf != nil {
        return buf.encode(val)
    }
    // if fails to get encBuffer from w, then get it from pool.
    buf := getEncBuffer()
    // after usage, return buf to pool for reuse.
    defer encBufferPool.Put(buf)

    // encode value
    if err := buf.encode(val); err != nil {
        return err
    }
    return buf.writeTo(w)
}
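As a usage sketch (assuming the go-ethereum rlp package): Encode writes to any io.Writer, and since it may issue many small writes, a bytes.Buffer or buffered writer is a reasonable target.

package main

import (
    "bytes"
    "fmt"

    "github.com/ethereum/go-ethereum/rlp"
)

func main() {
    var buf bytes.Buffer
    // Encode writes the RLP encoding of the value into buf.
    if err := rlp.Encode(&buf, "hello"); err != nil {
        panic(err)
    }
    fmt.Printf("%x\n", buf.Bytes()) // expected: 8568656c6c6f (0x80+5, then "hello")
}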
Get encBuffer from cache
Geth uses encBufferPool to reuse encBuffer instances. If it cannot obtain an encBuffer
from w, geth gets it from pool which saves memory.// go-ethereum/rlp/encbuffer.go // EncoderBuffer is a buffer for incremental encoding. // // The zero value is NOT ready for use. To get a usable buffer, // create it using NewEncoderBuffer or call Reset. type EncoderBuffer struct { buf *encBuffer dst io.Writer ownBuffer bool } type encBuffer struct { str []byte // string data, contains everything except list headers lheads []listhead // all list headers lhsize int // sum of sizes of all encoded list headers sizebuf [9]byte // auxiliary buffer for uint encoding } // The global encBuffer pool. var encBufferPool = sync.Pool{ New: func() interface{} { return new(encBuffer) }, } func getEncBuffer() *encBuffer { buf := encBufferPool.Get().(*encBuffer) buf.reset() return buf } func (buf *encBuffer) reset() { buf.lhsize = 0 buf.str = buf.str[:0] buf.lheads = buf.lheads[:0] } func encBufferFromWriter(w io.Writer) *encBuffer { switch w := w.(type) { case EncoderBuffer: return w.buf case *EncoderBuffer: return w.buf case *encBuffer: return w default: return nil } }
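The pooling itself is a standard sync.Pool pattern. Below is a minimal, self-contained sketch of the same idea; the names scratchBuf and scratchPool are illustrative and not from go-ethereum. A buffer is fetched from the pool, reset, used, and returned.

package main

import (
    "fmt"
    "sync"
)

type scratchBuf struct {
    data []byte
}

// reset clears state left over from a previous user of the buffer.
func (b *scratchBuf) reset() { b.data = b.data[:0] }

// scratchPool hands out reusable buffers, mirroring encBufferPool.
var scratchPool = sync.Pool{
    New: func() interface{} { return new(scratchBuf) },
}

func main() {
    buf := scratchPool.Get().(*scratchBuf)
    buf.reset()
    buf.data = append(buf.data, 0x80) // pretend to encode something
    fmt.Printf("%x\n", buf.data)      // 80
    scratchPool.Put(buf)              // return the buffer for reuse
}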
Load encoder from cache
Because RLP has a different encoding scheme for each type, geth implements a corresponding encoder and decoder per type.
Inside encode, it fetches the corresponding writer (encoder) based on the value's type, and uses that writer to encode the value.
// go-ethereum/rlp/encbuffer.go
func (buf *encBuffer) encode(val interface{}) error {
    rval := reflect.ValueOf(val)
    writer, err := cachedWriter(rval.Type())
    if err != nil {
        return err
    }
    return writer(rval, buf)
}
To save memory, geth uses a cache to reuse encoders and decoders. cachedWriter constructs the type's cache key and tries to load the encoder from the cache; if there is no cached instance, it generates a new one.
// go-ethereum/rlp/typecache.go

// typekey is the key of a type in typeCache. It includes the struct tags because
// they might generate a different decoder.
type typekey struct {
    reflect.Type
    rlpstruct.Tags
}

// typeinfo is an entry in the type cache.
type typeinfo struct {
    decoder    decoder
    decoderErr error // error from makeDecoder
    writer     writer
    writerErr  error // error from makeWriter
}

// the up-to-date map is stored in cur, next is a staging value used during update of map
type typeCache struct {
    cur atomic.Value

    // This lock synchronizes writers.
    mu   sync.Mutex
    next map[typekey]*typeinfo
}

var theTC = newTypeCache()

func newTypeCache() *typeCache {
    c := new(typeCache)
    c.cur.Store(make(map[typekey]*typeinfo))
    return c
}

func cachedWriter(typ reflect.Type) (writer, error) {
    info := theTC.info(typ)
    return info.writer, info.writerErr
}
Note that typekey contains the value's Tags in addition to its Type. This is because struct values of the same type can be encoded differently depending on their field tags, so geth uses Tags
to differentiate those structs’ different encoders/decoders.// go-ethereum/rlp/internal/rlpstruct/rlpstruct.go // Tags represents struct tags. type Tags struct { // rlp:"nil" controls whether empty input results in a nil pointer. // nilKind is the kind of empty value allowed for the field. NilKind NilKind NilOK bool // rlp:"optional" allows for a field to be missing in the input list. // If this is set, all subsequent fields must also be optional. Optional bool // rlp:"tail" controls whether this field swallows additional list elements. It can // only be set for the last field, which must be of slice type. Tail bool // rlp:"-" ignores fields. Ignored bool }
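As an illustration of tags changing the encoding, here is a sketch assuming the go-ethereum rlp package; the record type and its fields are made up for this example.

package main

import (
    "fmt"

    "github.com/ethereum/go-ethereum/rlp"
)

type record struct {
    Value uint64
    Note  string `rlp:"-"`        // ignored by RLP entirely
    Extra uint64 `rlp:"optional"` // trailing field, may be omitted when zero
}

func main() {
    b1, _ := rlp.EncodeToBytes(record{Value: 7})
    b2, _ := rlp.EncodeToBytes(record{Value: 7, Extra: 1})
    fmt.Printf("%x\n%x\n", b1, b2)
    // expected: c107 (Extra omitted) and c20701 (Extra present)
}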
Inside typeCache.info, it tries to load the typeinfo from the cache; if it does not exist, it generates a new one.
func (c *typeCache) info(typ reflect.Type) *typeinfo {
    key := typekey{Type: typ}
    if info := c.cur.Load().(map[typekey]*typeinfo)[key]; info != nil {
        return info
    }

    // Not in the cache, need to generate info for this type.
    return c.generate(typ, rlpstruct.Tags{})
}
Inside generate, it first takes the lock, then tries to load the entry from cur; if the entry doesn't exist, it generates a new one and stores it in cur
.// go-ethereum/rlp/typecache.go func (c *typeCache) generate(typ reflect.Type, tags rlpstruct.Tags) *typeinfo { c.mu.Lock() defer c.mu.Unlock() cur := c.cur.Load().(map[typekey]*typeinfo) if info := cur[typekey{typ, tags}]; info != nil { return info } // Copy cur to next. c.next = maps.Clone(cur) // Generate. info := c.infoWhileGenerating(typ, tags) // next -> cur c.cur.Store(c.next) c.next = nil return info } func (c *typeCache) infoWhileGenerating(typ reflect.Type, tags rlpstruct.Tags) *typeinfo { key := typekey{typ, tags} if info := c.next[key]; info != nil { return info } // Put a dummy value into the cache before generating. // If the generator tries to lookup itself, it will get // the dummy value and won't call itself recursively. info := new(typeinfo) c.next[key] = info info.generate(typ, tags) return info } func (i *typeinfo) generate(typ reflect.Type, tags rlpstruct.Tags) { i.decoder, i.decoderErr = makeDecoder(typ, tags) i.writer, i.writerErr = makeWriter(typ, tags) }
makeDecoder and makeWriter
basically return the corresponding encode/decode function according to values’ type and tags.// go-ethereum/rlp/encode.go // makeWriter creates a writer function for the given type. func makeWriter(typ reflect.Type, ts rlpstruct.Tags) (writer, error) { kind := typ.Kind() switch { case typ == rawValueType: return writeRawValue, nil case typ.AssignableTo(reflect.PointerTo(bigInt)): return writeBigIntPtr, nil case typ.AssignableTo(bigInt): return writeBigIntNoPtr, nil case typ == reflect.PointerTo(u256Int): return writeU256IntPtr, nil case typ == u256Int: return writeU256IntNoPtr, nil case kind == reflect.Ptr: return makePtrWriter(typ, ts) case reflect.PointerTo(typ).Implements(encoderInterface): return makeEncoderWriter(typ), nil case isUint(kind): return writeUint, nil case kind == reflect.Bool: return writeBool, nil case kind == reflect.String: return writeString, nil case kind == reflect.Slice && isByte(typ.Elem()): return writeBytes, nil case kind == reflect.Array && isByte(typ.Elem()): return makeByteArrayWriter(typ), nil case kind == reflect.Slice || kind == reflect.Array: return makeSliceWriter(typ, ts) case kind == reflect.Struct: return makeStructWriter(typ) case kind == reflect.Interface: return writeInterface, nil default: return nil, fmt.Errorf("rlp: type %v is not RLP-serializable", typ) } }
// go-ethereum/rlp/decode.go func makeDecoder(typ reflect.Type, tags rlpstruct.Tags) (dec decoder, err error) { kind := typ.Kind() switch { case typ == rawValueType: return decodeRawValue, nil case typ.AssignableTo(reflect.PointerTo(bigInt)): return decodeBigInt, nil case typ.AssignableTo(bigInt): return decodeBigIntNoPtr, nil case typ == reflect.PointerTo(u256Int): return decodeU256, nil case typ == u256Int: return decodeU256NoPtr, nil case kind == reflect.Ptr: return makePtrDecoder(typ, tags) case reflect.PointerTo(typ).Implements(decoderInterface): return decodeDecoder, nil case isUint(kind): return decodeUint, nil case kind == reflect.Bool: return decodeBool, nil case kind == reflect.String: return decodeString, nil case kind == reflect.Slice || kind == reflect.Array: return makeListDecoder(typ, tags) case kind == reflect.Struct: return makeStructDecoder(typ) case kind == reflect.Interface: return decodeInterface, nil default: return nil, fmt.Errorf("rlp: type %v is not RLP-serializable", typ) } }
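One of the dispatch cases above checks whether a type implements the rlp Encoder interface (the encoderInterface case in makeWriter), in which case the type supplies its own encoding. A small sketch, assuming the go-ethereum rlp package; MyPair and its two-element list layout are illustrative.

package main

import (
    "fmt"
    "io"

    "github.com/ethereum/go-ethereum/rlp"
)

type MyPair struct {
    A, B uint64
}

// EncodeRLP makes MyPair satisfy rlp.Encoder, so makeWriter picks the
// encoder-based writer instead of the reflection-based struct writer.
func (p MyPair) EncodeRLP(w io.Writer) error {
    return rlp.Encode(w, []uint64{p.A, p.B})
}

func main() {
    out, err := rlp.EncodeToBytes(MyPair{A: 1, B: 2})
    if err != nil {
        panic(err)
    }
    fmt.Printf("%x\n", out) // expected: c20102
}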
Encode
After the encoder has been fetched, it simply calls the writer to encode the value; intermediate data is stored in encBuffer
.// go-ethereum/rlp/encbuffer.go func (buf *encBuffer) encode(val interface{}) error { rval := reflect.ValueOf(val) writer, err := cachedWriter(rval.Type()) if err != nil { return err } return writer(rval, buf) }
Generate result
writeTo assembles the final encoded output from the intermediate values stored in encBuffer
.// go-ethereum/rlp/encbuffer.go // writeTo writes the encoder output to w. func (buf *encBuffer) writeTo(w io.Writer) (err error) { strpos := 0 for _, head := range buf.lheads { // write string data before header if head.offset-strpos > 0 { n, err := w.Write(buf.str[strpos:head.offset]) strpos += n if err != nil { return err } } // write the header enc := head.encode(buf.sizebuf[:]) if _, err = w.Write(enc); err != nil { return err } } if strpos < len(buf.str) { // write string data after the last list header _, err = w.Write(buf.str[strpos:]) } return err }
Encode Big int
writeBigIntNoPtr checks that the value is non-negative; RLP does not support encoding negative values.
// go-ethereum/rlp/encode.go
func writeBigIntNoPtr(val reflect.Value, w *encBuffer) error {
    i := val.Interface().(big.Int)
    if i.Sign() == -1 {
        return ErrNegativeBigInt
    }
    w.writeBigInt(&i)
    return nil
}
Inside writeBigInt:
- check whether the value fits in 64 bits; if it does, call buf.writeUint64 to encode it
- otherwise, calculate the minimal byte length of the integer
- encode the string header (the header is the encoded prefix, i.e. everything except the original data)
- copy the data itself into buf.str
// go-ethereum/rlp/encbuffer.go type listhead struct { offset int // index of this header in string data size int // total size of encoded data (including list headers) } type encBuffer struct { str []byte // string data, contains everything except list headers lheads []listhead // all list headers lhsize int // sum of sizes of all encoded list headers sizebuf [9]byte // auxiliary buffer for uint encoding } // writeBigInt writes i as an integer. func (buf *encBuffer) writeBigInt(i *big.Int) { // get length of data in bit array format bitlen := i.BitLen() if bitlen <= 64 { buf.writeUint64(i.Uint64()) return } // Integer is larger than 64 bits, encode from i.Bits(). // The minimal byte length is bitlen rounded up to the next // multiple of 8, divided by 8. length := ((bitlen + 7) & -8) >> 3 buf.encodeStringHeader(length) // allocate memory to copy the data itself into buf.str. Because i is of big.Int type, // so geth needs to convert the big.Int format to big-endian byte array buf.str = append(buf.str, make([]byte, length)...) index := length bytesBuf := buf.str[len(buf.str)-length:] for _, d := range i.Bits() { for j := 0; j < wordBytes && index > 0; j++ { index-- bytesBuf[index] = byte(d) d >>= 8 } } }
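A small sketch exercising the writeBigInt path above with an integer wider than 64 bits (assuming the go-ethereum rlp package):

package main

import (
    "fmt"
    "math/big"

    "github.com/ethereum/go-ethereum/rlp"
)

func main() {
    v := new(big.Int).Lsh(big.NewInt(1), 80) // 2^80, an 11-byte integer
    out, _ := rlp.EncodeToBytes(v)
    // Header is 0x80+11 = 0x8b, followed by the 11-byte big-endian integer.
    fmt.Printf("% x\n", out) // expected: 8b 01 00 00 00 00 00 00 00 00 00 00
}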
Inside encodeStringHeader, it checks whether the size is smaller than 56:
- if it is, the header is 0x80 + size (the byte conversion ensures the result fits in a single byte and avoids overflow)
- if it is not, the header is 0xB7 + sizesize (the minimal byte length of the size), followed by the byte array representing the size of the original data.
// go-ethereum/rlp/encbuffer.go func (buf *encBuffer) encodeStringHeader(size int) { if size < 56 { buf.str = append(buf.str, 0x80+byte(size)) } else { // putint copies the size into buffer in byte array format and return the minimal length of the // byte array represents the size sizesize := putint(buf.sizebuf[1:], uint64(size)) buf.sizebuf[0] = 0xB7 + byte(sizesize) // buf.sizebuf[1:sizesize+1] is byte array represents the size buf.str = append(buf.str, buf.sizebuf[:sizesize+1]...) } }
// go-ethereum/rlp/encode.go // putint writes i to the beginning of b in big endian byte // order, using the least number of bytes needed to represent i. func putint(b []byte, i uint64) (size int) { switch { case i < (1 << 8): b[0] = byte(i) return 1 case i < (1 << 16): b[0] = byte(i >> 8) b[1] = byte(i) return 2 case i < (1 << 24): b[0] = byte(i >> 16) b[1] = byte(i >> 8) b[2] = byte(i) return 3 case i < (1 << 32): b[0] = byte(i >> 24) b[1] = byte(i >> 16) b[2] = byte(i >> 8) b[3] = byte(i) return 4 case i < (1 << 40): b[0] = byte(i >> 32) b[1] = byte(i >> 24) b[2] = byte(i >> 16) b[3] = byte(i >> 8) b[4] = byte(i) return 5 case i < (1 << 48): b[0] = byte(i >> 40) b[1] = byte(i >> 32) b[2] = byte(i >> 24) b[3] = byte(i >> 16) b[4] = byte(i >> 8) b[5] = byte(i) return 6 case i < (1 << 56): b[0] = byte(i >> 48) b[1] = byte(i >> 40) b[2] = byte(i >> 32) b[3] = byte(i >> 24) b[4] = byte(i >> 16) b[5] = byte(i >> 8) b[6] = byte(i) return 7 default: b[0] = byte(i >> 56) b[1] = byte(i >> 48) b[2] = byte(i >> 40) b[3] = byte(i >> 32) b[4] = byte(i >> 24) b[5] = byte(i >> 16) b[6] = byte(i >> 8) b[7] = byte(i) return 8 } }
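To see encodeStringHeader and putint together, here is a sketch (assuming the go-ethereum rlp package) that encodes a byte string longer than 55 bytes and inspects the header:

package main

import (
    "fmt"

    "github.com/ethereum/go-ethereum/rlp"
)

func main() {
    data := make([]byte, 1024) // 1024 zero bytes
    out, _ := rlp.EncodeToBytes(data)
    // 1024 >= 56, so the header is 0xb7+2 = 0xb9 followed by 0x04 0x00
    // (1024 in big-endian), then the payload.
    fmt.Printf("% x\n", out[:3]) // expected: b9 04 00
}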
Because the big-int encoding path never touches buf.lheads, writeTo simply writes the data in buf.str, which already is the final encoding, to the writer. buf.lheads is used when encoding nested arrays such as structs.
// go-ethereum/rlp/encbuffer.go

// writeTo writes the encoder output to w.
func (buf *encBuffer) writeTo(w io.Writer) (err error) {
    strpos := 0
    for _, head := range buf.lheads {
        // write string data before header
        if head.offset-strpos > 0 {
            n, err := w.Write(buf.str[strpos:head.offset])
            strpos += n
            if err != nil {
                return err
            }
        }
        // write the header
        enc := head.encode(buf.sizebuf[:])
        if _, err = w.Write(enc); err != nil {
            return err
        }
    }
    if strpos < len(buf.str) {
        // write string data after the last list header
        _, err = w.Write(buf.str[strpos:])
    }
    return err
}
Encode Struct
In makeWriter, if the type is a struct, it will call makeStructWriter to generate the encoder.
// go-ethereum/rlp/encode.go

// makeWriter creates a writer function for the given type.
func makeWriter(typ reflect.Type, ts rlpstruct.Tags) (writer, error) {
    kind := typ.Kind()
    switch {
    // ...
    case kind == reflect.Struct:
        return makeStructWriter(typ)
    // ...
    }
}
Inside makeStructWriter:
- get the field info of the struct, which includes each field's index and its encoder/decoder.
// go-ethereum/rlp/encode.go type field struct { index int info *typeinfo optional bool } func makeStructWriter(typ reflect.Type) (writer, error) { fields, err := structFields(typ) if err != nil { return nil, err } for _, f := range fields { if f.info.writerErr != nil { return nil, structFieldError{typ, f.index, f.info.writerErr} } } var writer writer firstOptionalField := firstOptionalField(fields) if firstOptionalField == len(fields) { // This is the writer function for structs without any optional fields. writer = func(val reflect.Value, w *encBuffer) error { lh := w.list() for _, f := range fields { if err := f.info.writer(val.Field(f.index), w); err != nil { return err } } w.listEnd(lh) return nil } } else { // If there are any "optional" fields, the writer needs to perform additional // checks to determine the output list length. writer = func(val reflect.Value, w *encBuffer) error { lastField := len(fields) - 1 for ; lastField >= firstOptionalField; lastField-- { if !val.Field(fields[lastField].index).IsZero() { break } } lh := w.list() for i := 0; i <= lastField; i++ { if err := fields[i].info.writer(val.Field(fields[i].index), w); err != nil { return err } } w.listEnd(lh) return nil } } return writer, nil }
structFields
resolves the typeinfo of all public fields in the struct type. Basically, it will get/generate corresponding typeinfo which contains writer based on the field type and tags in the struct. The resolved typeinfo(writer) will be used later to encode the field.// go-ethereum/rlp/typecache.go // typeinfo is an entry in the type cache. type typeinfo struct { decoder decoder decoderErr error // error from makeDecoder writer writer writerErr error // error from makeWriter } type field struct { index int info *typeinfo optional bool } // structFields resolves the typeinfo of all public fields in a struct type. func structFields(typ reflect.Type) (fields []field, err error) { // Convert fields to rlpstruct.Field. var allStructFields []rlpstruct.Field for i := 0; i < typ.NumField(); i++ { rf := typ.Field(i) allStructFields = append(allStructFields, rlpstruct.Field{ Name: rf.Name, Index: i, Exported: rf.PkgPath == "", Tag: string(rf.Tag), Type: *rtypeToStructType(rf.Type, nil), }) } // Filter/validate fields. structFields, structTags, err := rlpstruct.ProcessFields(allStructFields) if err != nil { if tagErr, ok := err.(rlpstruct.TagError); ok { tagErr.StructType = typ.String() return nil, tagErr } return nil, err } // Resolve typeinfo. for i, sf := range structFields { typ := typ.Field(sf.Index).Type tags := structTags[i] info := theTC.infoWhileGenerating(typ, tags) fields = append(fields, field{sf.Index, info, tags.Optional}) } return fields, nil }
ProcessFields
will validate fields according to tags and extract fields and corresponding tags needed to be encoded. Different tags of same type field lead to different encoding.// go-ethereum/rlp/internal/rlpstruct/rlpstruct.go // Field represents a struct field. type Field struct { Name string Index int Exported bool Type Type Tag string } // NilKind is the RLP value encoded in place of nil pointers. type NilKind uint8 const ( NilKindString NilKind = 0x80 NilKindList NilKind = 0xC0 ) // Tags represents struct tags. type Tags struct { // rlp:"nil" controls whether empty input results in a nil pointer. // nilKind is the kind of empty value allowed for the field. NilKind NilKind NilOK bool // rlp:"optional" allows for a field to be missing in the input list. // If this is set, all subsequent fields must also be optional. Optional bool // rlp:"tail" controls whether this field swallows additional list elements. It can // only be set for the last field, which must be of slice type. Tail bool // rlp:"-" ignores fields. Ignored bool } // ProcessFields filters the given struct fields, returning only fields // that should be considered for encoding/decoding. func ProcessFields(allFields []Field) ([]Field, []Tags, error) { lastPublic := lastPublicField(allFields) // Gather all exported fields and their tags. var fields []Field var tags []Tags for _, field := range allFields { if !field.Exported { continue } // check whether field should be encoded according the type and tag ts, err := parseTag(field, lastPublic) if err != nil { return nil, nil, err } if ts.Ignored { continue } fields = append(fields, field) tags = append(tags, ts) } // Verify optional field consistency. If any optional field exists, // all fields after it must also be optional. Note: optional + tail // is supported. var anyOptional bool var firstOptionalName string for i, ts := range tags { name := fields[i].Name if ts.Optional || ts.Tail { if !anyOptional { firstOptionalName = name } anyOptional = true } else { if anyOptional { msg := fmt.Sprintf("must be optional because preceding field %q is optional", firstOptionalName) return nil, nil, TagError{Field: name, Err: msg} } } } return fields, tags, nil } // validate tag of field in struct and calculate corresponding tags. func parseTag(field Field, lastPublic int) (Tags, error) { name := field.Name tag := reflect.StructTag(field.Tag) var ts Tags for _, t := range strings.Split(tag.Get("rlp"), ",") { switch t = strings.TrimSpace(t); t { case "": // empty tag is allowed for some reason case "-": ts.Ignored = true case "nil", "nilString", "nilList": ts.NilOK = true if field.Type.Kind != reflect.Ptr { return ts, TagError{Field: name, Tag: t, Err: "field is not a pointer"} } switch t { case "nil": ts.NilKind = field.Type.Elem.DefaultNilValue() case "nilString": ts.NilKind = NilKindString case "nilList": ts.NilKind = NilKindList } case "optional": ts.Optional = true if ts.Tail { return ts, TagError{Field: name, Tag: t, Err: `also has "tail" tag`} } case "tail": ts.Tail = true if field.Index != lastPublic { return ts, TagError{Field: name, Tag: t, Err: "must be on last field"} } if ts.Optional { return ts, TagError{Field: name, Tag: t, Err: `also has "optional" tag`} } if field.Type.Kind != reflect.Slice { return ts, TagError{Field: name, Tag: t, Err: "field type is not slice"} } default: return ts, TagError{Field: name, Tag: t, Err: "unknown tag"} } } return ts, nil }
After geth has resolved all the fields, including the corresponding writer of each field, it constructs the writer for this struct type. The writer also depends on whether the struct has optional fields. A struct field can be marked as optional, which means it can be omitted if its value is zero (the zero value depends on the type). Geth checks that the "optional" markers are consistent: if any optional field exists, all fields after it must also be optional. During encoding, geth locates the last optional field whose value is not zero and omits the zero-valued fields after it.
// go-ethereum/rlp/encode.go type field struct { index int info *typeinfo optional bool } func makeStructWriter(typ reflect.Type) (writer, error) { fields, err := structFields(typ) if err != nil { return nil, err } for _, f := range fields { if f.info.writerErr != nil { return nil, structFieldError{typ, f.index, f.info.writerErr} } } var writer writer firstOptionalField := firstOptionalField(fields) // there is no optional field if firstOptionalField == len(fields) { // This is the writer function for structs without any optional fields. writer = func(val reflect.Value, w *encBuffer) error { lh := w.list() for _, f := range fields { if err := f.info.writer(val.Field(f.index), w); err != nil { return err } } w.listEnd(lh) return nil } } else { // If there are any "optional" fields, the writer needs to perform additional // checks to determine the output list length. writer = func(val reflect.Value, w *encBuffer) error { lastField := len(fields) - 1 // calculate the last optional field whose value is not zero. for ; lastField >= firstOptionalField; lastField-- { if !val.Field(fields[lastField].index).IsZero() { break } } // encode struct lh := w.list() for i := 0; i <= lastField; i++ { if err := fields[i].info.writer(val.Field(fields[i].index), w); err != nil { return err } } w.listEnd(lh) return nil } } return writer, nil }
In the struct encoder, we can see that it first calls w.list, then loops over the fields, using each field's writer to encode it, and finally calls w.listEnd
.// go-ethereum/rlp/encode.go func makeStructWriter(typ reflect.Type) (writer, error) { writer = func(val reflect.Value, w *encBuffer) error { lh := w.list() for _, f := range fields { if err := f.info.writer(val.Field(f.index), w); err != nil { return err } } w.listEnd(lh) return nil } }
list and listEnd are used to generate the struct's header information.
When encoding a simple type such as an int, geth can directly calculate the encoded data and append it to encBuffer.str. Structs are more complex, because the struct's header cannot be calculated before the struct's elements have been encoded. So when encoding a struct, RLP first encodes all of the elements, appends their data to encBuffer.str, and uses list and listEnd to record the information needed to calculate the struct's header later.
Geth uses encBuffer.lheads to record that information. A struct's field may itself be a struct, so we need to record the start position of the struct's element data plus the header information of all child structs inside it; adding the byte length of the element data and the total length of the child headers gives the parent struct's payload length, from which the correct header can be calculated.
In list, geth appends a new listhead to encBuffer.lheads, which records the start position of this struct's encoded data (offset = len(buf.str)) and the current total size of all list headers (size = buf.lhsize).
After the elements have been encoded, listEnd calculates the size of the struct's encoded payload, including all child structs' headers, as:
lh.size = buf.size() - lh.offset - lh.size
        = (len(buf.str) - lh.offset) + (buf.lhsize - lh.size)
where lh.offset and the lh.size on the right-hand side are the values stored by list. Here len(buf.str) - lh.offset is the element data written since list was called (it contains no header data), and buf.lhsize - lh.size is the total size of all child structs' headers added since then. The sum of the two is the byte length of the struct's elements' RLP-encoded data.
// go-ethereum/rlp/encbuffer.go // list adds a new list header to the header stack. It returns the index of the header. // Call listEnd with this index after encoding the content of the list. func (buf *encBuffer) list() int { // len(buf.str) is the start position of this struct's rlp encoded data // buf.lhsize is the current size of all list headers buf.lheads = append(buf.lheads, listhead{offset: len(buf.str), size: buf.lhsize}) return len(buf.lheads) - 1 } func (buf *encBuffer) listEnd(index int) { lh := &buf.lheads[index] // calculate the byte array length of the struct elements' rlp encoded data. lh.size = buf.size() - lh.offset - lh.size // if data size is smaller than 56, then the header size is 1 byte if lh.size < 56 { buf.lhsize++ // length encoded into kind tag } else { // if data size is bigger than 56, then the header is 1 bytes plus minimal byte array reprensents the length of data buf.lhsize += 1 + intsize(uint64(lh.size)) } } // size returns the length of the encoded data. func (buf *encBuffer) size() int { return len(buf.str) + buf.lhsize } // intsize computes the minimum number of bytes required to store i. func intsize(i uint64) (size int) { for size = 1; ; size++ { if i >>= 8; i == 0 { return size } } }
The implementation of struct/list encoding is quite complex, so let's use an example to illustrate the process. Assume we want to encode the struct:
{ 5, { 10 } }
Geth first encodes the first field, the int 5. At this point the fields of encBuffer are:
str = [0x05]
lheads = [ { offset:0, size:0 } ]
lhsize = 0
After encoding the second field (the nested struct), the fields of encBuffer are:
str = [0x05, 0x0A]
lheads = [ { offset:0, size:0 }, { offset:1, size:1 } ]
lhsize = 1
Finally the outer writer calls buf.listEnd() to update the outer struct's header:
str = [0x05, 0x0A]
lheads = [ { offset:0, size:3 }, { offset:1, size:1 } ]
lhsize = 2
Based on the list headers, we can see that the child struct's element encoding is 0x0A, whose byte length is 1, so its header is 0xC1 (0xC0 plus byte length 1) and the child struct's encoding is 0xC10A. The parent struct's prefix is 0xC3 (0xC0 plus byte length 3), so the final encoding is 0xC305C10A.
After the writers have encoded all fields, geth calls encBuffer.writeTo to combine the intermediate data (element data and list headers) into the final encoding of the struct.
Inside writeTo, it essentially calculates each header's encoding and concatenates it with the element data to obtain the final encoding of the struct.
// go-ethereum/rlp/encbuffer.go

// writeTo writes the encoder output to w.
func (buf *encBuffer) writeTo(w io.Writer) (err error) {
    strpos := 0
    for _, head := range buf.lheads {
        // write front non-list element's data before header
        if head.offset-strpos > 0 {
            n, err := w.Write(buf.str[strpos:head.offset])
            strpos += n
            if err != nil {
                return err
            }
        }
        // calculate the header's encoding, and write into w
        enc := head.encode(buf.sizebuf[:])
        if _, err = w.Write(enc); err != nil {
            return err
        }
    }
    if strpos < len(buf.str) {
        // write string data after the last list header
        _, err = w.Write(buf.str[strpos:])
    }
    return err
}
// go-ethereum/rlp/encode.go // encode writes head to the given buffer, which must be at least // 9 bytes long. It returns the encoded bytes. func (head *listhead) encode(buf []byte) []byte { return buf[:puthead(buf, 0xC0, 0xF7, uint64(head.size))] } // puthead writes a list or string header to buf. // buf must be at least 9 bytes long. func puthead(buf []byte, smalltag, largetag byte, size uint64) int { if size < 56 { buf[0] = smalltag + byte(size) return 1 } // size of the minimal byte array represents the size sizesize := putint(buf[1:], size) buf[0] = largetag + byte(sizesize) return sizesize + 1 }
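To confirm the walkthrough, here is a sketch that encodes the same nested shape as a Go struct (type names are illustrative; assumes the go-ethereum rlp package):

package main

import (
    "fmt"

    "github.com/ethereum/go-ethereum/rlp"
)

type inner struct{ B uint64 }

type outer struct {
    A uint64
    I inner
}

func main() {
    out, _ := rlp.EncodeToBytes(outer{A: 5, I: inner{B: 10}})
    fmt.Printf("%x\n", out) // expected: c305c10a
}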
Encode Array/Slice
// makeWriter creates a writer function for the given type. func makeWriter(typ reflect.Type, ts rlpstruct.Tags) (writer, error) { kind := typ.Kind() // ... case kind == reflect.Slice || kind == reflect.Array: return makeSliceWriter(typ, ts) // ... }
In makeSliceWriter, the encoding is similar to a struct (a struct can be seen as an array whose elements may have different types). One difference is the Tail flag. If the slice/array is the last public field of a struct and carries the tail tag, the Tail flag is set and geth does not compute a list header for it. This is because the byte length of the array/slice's encoded elements can be derived from the parent struct's list header and the byte length of the preceding fields' encoded data, so this actually makes the RLP encoding more compact.
func makeSliceWriter(typ reflect.Type, ts rlpstruct.Tags) (writer, error) { etypeinfo := theTC.infoWhileGenerating(typ.Elem(), rlpstruct.Tags{}) if etypeinfo.writerErr != nil { return nil, etypeinfo.writerErr } var wfn writer if ts.Tail { // This is for struct tail slices. // w.list is not called for them. wfn = func(val reflect.Value, w *encBuffer) error { vlen := val.Len() for i := 0; i < vlen; i++ { if err := etypeinfo.writer(val.Index(i), w); err != nil { return err } } return nil } } else { // This is for regular slices and arrays. wfn = func(val reflect.Value, w *encBuffer) error { vlen := val.Len() if vlen == 0 { w.str = append(w.str, 0xC0) return nil } listOffset := w.list() for i := 0; i < vlen; i++ { if err := etypeinfo.writer(val.Index(i), w); err != nil { return err } } w.listEnd(listOffset) return nil } } return wfn, nil }
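Here is a sketch of the rlp:"tail" tag in action (the Tail branch above): the tail slice's elements are appended directly to the outer list instead of being wrapped in their own list. The message type is illustrative; assumes the go-ethereum rlp package.

package main

import (
    "fmt"

    "github.com/ethereum/go-ethereum/rlp"
)

type message struct {
    Kind uint64
    Rest []uint64 `rlp:"tail"` // must be the last field and a slice
}

func main() {
    out, _ := rlp.EncodeToBytes(message{Kind: 1, Rest: []uint64{2, 3}})
    // The tail elements 2 and 3 are appended directly to the outer list,
    // so the encoding is c3 01 02 03 rather than c4 01 c2 02 03.
    fmt.Printf("%x\n", out) // expected: c3010203
}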
Decode
Decode is the entry point for decoding RLP-encoded data. It uses the Stream struct
to help decode data.// go-ethereum/rlp/decode.go // Kind represents the kind of value contained in an RLP stream. type Kind int8 const ( Byte Kind = iota String List ) // Stream can be used for piecemeal decoding of an input stream. This // is useful if the input is very large or if the decoding rules for a // type depend on the input structure. Stream does not keep an // internal buffer. After decoding a value, the input reader will be // positioned just before the type information for the next value. // // When decoding a list and the input position reaches the declared // length of the list, all operations will return error EOL. // The end of the list must be acknowledged using ListEnd to continue // reading the enclosing list. // // Stream is not safe for concurrent use. type Stream struct { r ByteReader remaining uint64 // number of bytes remaining to be read from r size uint64 // size of value ahead kinderr error // error from last readKind stack []uint64 // list sizes uintbuf [32]byte // auxiliary buffer for integer decoding kind Kind // kind of value ahead byteval byte // value of single byte in type tag limited bool // true if input limit is in effect } streamPool = sync.Pool{ New: func() interface{} { return new(Stream) }, } // Decode parses RLP-encoded data from r and stores the result in the value pointed to by // val. Please see package-level documentation for the decoding rules. Val must be a // non-nil pointer. // // If r does not implement ByteReader, Decode will do its own buffering. // // Note that Decode does not set an input limit for all readers and may be vulnerable to // panics cause by huge value sizes. If you need an input limit, use // // NewStream(r, limit).Decode(val) func Decode(r io.Reader, val interface{}) error { stream := streamPool.Get().(*Stream) defer streamPool.Put(stream) stream.Reset(r, 0) return stream.Decode(val) }
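A usage sketch (assuming the go-ethereum rlp package) that round-trips a value through rlp.EncodeToBytes and rlp.DecodeBytes; the item struct is illustrative.

package main

import (
    "fmt"

    "github.com/ethereum/go-ethereum/rlp"
)

type item struct {
    Name  string
    Count uint64
}

func main() {
    enc, _ := rlp.EncodeToBytes(item{Name: "abc", Count: 7})

    var dec item
    if err := rlp.DecodeBytes(enc, &dec); err != nil {
        panic(err)
    }
    fmt.Printf("%x -> %+v\n", enc, dec)
}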
Stream.Reset discards any information about the current decoding context and starts reading from r
.// go-ethereum/rlp/decode.go // Reset discards any information about the current decoding context // and starts reading from r. This method is meant to facilitate reuse // of a preallocated Stream across many decoding operations. // // If r does not also implement ByteReader, Stream will do its own // buffering. func (s *Stream) Reset(r io.Reader, inputLimit uint64) { if inputLimit > 0 { s.remaining = inputLimit s.limited = true } else { // Attempt to automatically discover // the limit when reading from a byte slice. switch br := r.(type) { case *bytes.Reader: s.remaining = uint64(br.Len()) s.limited = true case *bytes.Buffer: s.remaining = uint64(br.Len()) s.limited = true case *strings.Reader: s.remaining = uint64(br.Len()) s.limited = true default: s.limited = false } } // Wrap r with a buffer if it doesn't have one. bufr, ok := r.(ByteReader) if !ok { bufr = bufio.NewReader(r) } s.r = bufr // Reset the decoding context. s.stack = s.stack[:0] s.size = 0 s.kind = -1 s.kinderr = nil s.byteval = 0 s.uintbuf = [32]byte{} }
Stream.Decode does sanity checks and loads the decoder function from the cache to decode the data.
// go-ethereum/rlp/decode.go

// Decode decodes a value and stores the result in the value pointed
// to by val. Please see the documentation for the Decode function
// to learn about the decoding rules.
func (s *Stream) Decode(val interface{}) error {
    if val == nil {
        return errDecodeIntoNil
    }
    rval := reflect.ValueOf(val)
    rtyp := rval.Type()
    if rtyp.Kind() != reflect.Ptr {
        return errNoPointer
    }
    if rval.IsNil() {
        return errDecodeIntoNil
    }
    decoder, err := cachedDecoder(rtyp.Elem())
    if err != nil {
        return err
    }

    err = decoder(s, rval.Elem())
    if decErr, ok := err.(*decodeError); ok && len(decErr.ctx) > 0 {
        // Add decode target type to error so context has more meaning.
        decErr.ctx = append(decErr.ctx, fmt.Sprint("(", rtyp.Elem(), ")"))
    }
    return err
}
Similar to the encode process, the decode process calls makeDecoder
to decide the corresponding decode function for certain type.// go-ethereum/rlp/decode.go func makeDecoder(typ reflect.Type, tags rlpstruct.Tags) (dec decoder, err error) { kind := typ.Kind() switch { case typ == rawValueType: return decodeRawValue, nil case typ.AssignableTo(reflect.PointerTo(bigInt)): return decodeBigInt, nil case typ.AssignableTo(bigInt): return decodeBigIntNoPtr, nil case typ == reflect.PointerTo(u256Int): return decodeU256, nil case typ == u256Int: return decodeU256NoPtr, nil case kind == reflect.Ptr: return makePtrDecoder(typ, tags) case reflect.PointerTo(typ).Implements(decoderInterface): return decodeDecoder, nil case isUint(kind): return decodeUint, nil case kind == reflect.Bool: return decodeBool, nil case kind == reflect.String: return decodeString, nil case kind == reflect.Slice || kind == reflect.Array: return makeListDecoder(typ, tags) case kind == reflect.Struct: return makeStructDecoder(typ) case kind == reflect.Interface: return decodeInterface, nil default: return nil, fmt.Errorf("rlp: type %v is not RLP-serializable", typ) } }
Using big-int decoding as an example, it basically does the reverse of the encoding to decode the data.
// go-ethereum/rlp/decode.go func decodeBigInt(s *Stream, val reflect.Value) error { i := val.Interface().(*big.Int) if i == nil { i = new(big.Int) val.Set(reflect.ValueOf(i)) } err := s.decodeBigInt(i) if err != nil { return wrapStreamError(err, val.Type()) } return nil } func (s *Stream) decodeBigInt(dst *big.Int) error { var buffer []byte kind, size, err := s.Kind() switch { case err != nil: return err case kind == List: return ErrExpectedString case kind == Byte: buffer = s.uintbuf[:1] buffer[0] = s.byteval s.kind = -1 // re-arm Kind case size == 0: // Avoid zero-length read. s.kind = -1 case size <= uint64(len(s.uintbuf)): // For integers smaller than s.uintbuf, allocating a buffer // can be avoided. buffer = s.uintbuf[:size] if err := s.readFull(buffer); err != nil { return err } // Reject inputs where single byte encoding should have been used. if size == 1 && buffer[0] < 128 { return ErrCanonSize } default: // For large integers, a temporary buffer is needed. buffer = make([]byte, size) if err := s.readFull(buffer); err != nil { return err } } // Reject leading zero bytes. if len(buffer) > 0 && buffer[0] == 0 { return ErrCanonInt } // Set the integer bytes. dst.SetBytes(buffer) return nil }
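Finally, a sketch of piecemeal decoding with rlp.NewStream, matching the Stream description above (assuming the go-ethereum rlp package and its Stream methods List, Uint64, and ListEnd):

package main

import (
    "bytes"
    "fmt"

    "github.com/ethereum/go-ethereum/rlp"
)

func main() {
    enc, _ := rlp.EncodeToBytes([]uint64{5, 10}) // c2050a
    s := rlp.NewStream(bytes.NewReader(enc), uint64(len(enc)))

    if _, err := s.List(); err != nil { // enter the list
        panic(err)
    }
    a, _ := s.Uint64()
    b, _ := s.Uint64()
    if err := s.ListEnd(); err != nil { // acknowledge the end of the list
        panic(err)
    }
    fmt.Println(a, b) // 5 10
}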