thoughts and learnings in software engineering by Rotem Tamir

Binary encoding of variable length options with Golang

Reading and writing TLV-encoded messages with Go

Recently, I’ve worked on implementing a high-throughput networking component at work. In an effort to conserve CPU time and reduce payload sizes, this service is using a binary encoding as the wire format. An attribute of the protocol we were implementing was that the messages could include optional fields of variable length. This, I learned, is not as simple as it sounds.

Adding an extra field to a message in a text-based encoding, such as CSV or JSON, is quite trivial. With JSON, for instance, if we want to add an extra field to a message such as:

{
  "hello": "world"
}

we simply need to append ,"favorite_number": 42 after the first line of data and before the closing curly brace }.

When working with binary protocols we usually don’t have the luxury of picking some character as a delimiter between fields or messages. This is because when transmitting a stream of arbitrary data, any delimiter you pick might be included in the data stream. You cannot reserve any of the 256 possible values of a byte for your delimiter, because each of them may legitimately appear in the data.

It is possible, of course, to pick a delimiter if we also decide on an escape character, in the same way that we can use quotation marks inside a JSON string if we prepend a backslash, for example: "giant \"laser\"".

a giant "laser"

However, by doing so, we must now pay the cost of actually reading the message byte-by-byte looking for these delimiters and accounting for escape characters. In my eyes, a major reason to use a binary protocol is being able to copy bytes from the network interface into memory directly without needing to use CPU to parse at all.
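To make that cost concrete, here is a minimal sketch of delimiter-based framing with an escape byte. The delimiter and escape values (0x00 and 0x1B) and the function names escape and unescape are assumptions picked for illustration; they do not come from any particular protocol. The thing to notice is that both directions have to look at every single byte.

```go
package main

import "fmt"

const (
	delim = 0x00 // hypothetical field delimiter
	esc   = 0x1B // hypothetical escape byte
)

// escape prefixes every delimiter or escape byte in data with esc,
// so that a bare delim can only mean "end of field".
func escape(data []byte) []byte {
	out := make([]byte, 0, len(data))
	for _, b := range data {
		if b == delim || b == esc {
			out = append(out, esc)
		}
		out = append(out, b)
	}
	return out
}

// unescape reverses escape. It also has to inspect each byte in turn.
func unescape(data []byte) []byte {
	out := make([]byte, 0, len(data))
	for i := 0; i < len(data); i++ {
		if data[i] == esc && i+1 < len(data) {
			i++ // drop the escape byte, keep the byte that follows it
		}
		out = append(out, data[i])
	}
	return out
}

func main() {
	raw := []byte{0x01, 0x00, 0x1B, 0x02}
	enc := escape(raw)
	fmt.Printf("% x\n", enc)           // 01 1b 00 1b 1b 02
	fmt.Printf("% x\n", unescape(enc)) // 01 00 1b 02
}
```

The escaped message is also longer than the original, and by an amount that depends on the data, which makes buffer sizing harder to reason about.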

As a consequence, binary protocols usually skip the indulgence of using delimiters to determine field boundaries and resort to message structures made of fixed-length fields, with a special field to denote the payload length. For instance, the fixed IPv6 header is always 40 bytes long, with bytes 5–6 reserved for a 16-bit unsigned integer holding the payload length.

This allows a parser of the packet to begin by reading the 40 bytes of the header, then peek at bytes 5–6 and know exactly how much further to read to reach the end of the packet.
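As a rough sketch of that peek, the snippet below pulls the payload length out of a 40-byte header. The function name payloadLength and the hand-built header are assumptions for illustration; a real parser would read the header from a socket or capture file.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const ipv6HeaderLen = 40 // size of the fixed IPv6 header in bytes

// payloadLength reads the 16-bit big-endian Payload Length field, which
// occupies the 5th and 6th bytes (slice offsets 4 and 5) of the header.
func payloadLength(header []byte) (uint16, error) {
	if len(header) < ipv6HeaderLen {
		return 0, errors.New("short IPv6 header")
	}
	return binary.BigEndian.Uint16(header[4:6]), nil
}

func main() {
	header := make([]byte, ipv6HeaderLen)
	header[0] = 0x60                  // version 6 in the top four bits
	header[4], header[5] = 0x01, 0x00 // payload length 0x0100

	n, err := payloadLength(header)
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // 256
}
```

No scanning is involved: the field sits at a known offset, so finding the end of the packet is a single two-byte read.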

3.  IPv6 Header Format

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version| Traffic Class |           Flow Label                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Payload Length        |  Next Header  |   Hop Limit   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                         Source Address                        +
   |                                                               |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                      Destination Address                      +
   |                                                               |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Version              4-bit Internet Protocol version number = 6.

   Traffic Class        8-bit traffic class field.  See section 7.

   Flow Label           20-bit flow label.  See section 6.

   Payload Length       16-bit unsigned integer.  Length of the IPv6
                        payload, i.e., the rest of the packet following
                        this IPv6 header, in octets.  (Note that any

(IPv6 Header structure, Source: IETF RFC 2460)

The problem at hand requires something a bit more elaborate. In our case, we have optional fields, and a multitude of them. Can we have optional, variable length fields without using delimiters?

A common solution for this is an encoding called TLV, shorthand for Type-Length-Value. It is very common in network protocols, and it’s actually the way options are specified within IPv6 extension headers.

A TLV-encoded message is made of 3 parts: bytes signifying its type, bytes signifying the payload length, and zero or more payload bytes. The sizes of the type field and the length field must be agreed upon before parsing the message, so that readers know where the boundaries of each field are.

Let’s see how we might implement something like this with Golang.

package tlv

import (
   "bytes"
   "encoding/binary"
   "io"
)

// ByteSize is the size of a field in bytes. Used to define the size of the type and length field in a message.
type ByteSize int

const (
   OneByte    ByteSize = 1
   TwoBytes   ByteSize = 2
   FourBytes  ByteSize = 4
   EightBytes ByteSize = 8
)

We start off by defining a custom ByteSize type. This type will be used in our configuration object, Codec, to indicate how large our type and length fields are. Picking the right size for these fields is a tradeoff between future compatibility and total payload size. For instance, if you pick OneByte for your length field size, you will be limited to payloads of up to 255 bytes. If you pick EightBytes for the type field in a protocol which you only expect to have a handful of different types, you will end up sending seven wasted zero bytes in every message. So pick what is right for your protocol.
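To put numbers on that tradeoff, here is a small sketch that prints the largest value each field size can hold. The helper name maxValue is an assumption for this example and is not part of the tlv package.

```go
package main

import "fmt"

// maxValue returns the largest unsigned integer that fits in size bytes.
// size is expected to be 1, 2, 4, or 8.
func maxValue(size int) uint64 {
	if size >= 8 {
		return ^uint64(0) // all 64 bits set
	}
	return uint64(1)<<(8*uint(size)) - 1
}

func main() {
	for _, size := range []int{1, 2, 4, 8} {
		fmt.Printf("%d byte(s): max value %d\n", size, maxValue(size))
	}

	// The fixed overhead of a record is the two field sizes added up.
	typeBytes, lenBytes := 2, 2
	fmt.Println("overhead per record:", typeBytes+lenBytes, "bytes")
}
```

A one-byte length field tops out at 255, a two-byte field at 65535, and so on. For the length field this is the largest payload a single record can carry; for the type field it is the highest type number you can assign.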

// Record represents a record of data encoded in the TLV message.
type Record struct {
   Payload []byte
   Type    uint
}

// Codec is the configuration for a specific TLV encoding/decoding task.
type Codec struct {
   // TypeBytes defines the size in bytes of the message type field.
   TypeBytes ByteSize

   // LenBytes defines the size in bytes of the message length field.
   LenBytes ByteSize
}

Our data container is called Record and has an unsigned int Type field (the T in TLV) and a byte slice for the Payload.

Let’s see how we encode some messages:

// Writer encodes records into TLV format using a Codec and writes them into a provided io.Writer

type Writer struct {
   writer io.Writer
   codec  *Codec
}

func NewWriter(w io.Writer, codec *Codec) *Writer {
   return &Writer{
      codec:  codec,
      writer: w,
   }
}

// Write encodes a single record as type, length and payload, in that order,
// and writes it to the underlying io.Writer.
func (w *Writer) Write(rec *Record) error {
   err := writeUint(w.writer, w.codec.TypeBytes, rec.Type)
   if err != nil {
      return err
   }

   ulen := uint(len(rec.Payload))
   err = writeUint(w.writer, w.codec.LenBytes, ulen)
   if err != nil {
      return err
   }

   _, err = w.writer.Write(rec.Payload)
   return err
}

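// sizeError is returned for a field size other than 1, 2, 4, or 8 bytes.
type sizeError ByteSize

func (e sizeError) Error() string { return "tlv: unsupported field size" }

// writeUint writes i as a big-endian unsigned integer that is b bytes wide.
// Values too large for b bytes are silently truncated.
func writeUint(w io.Writer, b ByteSize, i uint) error {
   var num interface{}
   switch b {
   case OneByte:
      num = uint8(i)
   case TwoBytes:
      num = uint16(i)
   case FourBytes:
      num = uint32(i)
   case EightBytes:
      num = uint64(i)
   default:
      // Without this case num would stay nil and binary.Write would fail.
      return sizeError(b)
   }
   return binary.Write(w, binary.BigEndian, num)
}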

Let’s run some test code to see how it works:

package main

import (
   "bytes"
   "encoding/hex"
   "fmt"
   "tlv"
)

func main() {
   buf := new(bytes.Buffer)
   codec := &tlv.Codec{TypeBytes: tlv.TwoBytes, LenBytes: tlv.TwoBytes}
   wr := tlv.NewWriter(buf, codec)

   record := &tlv.Record{
      Payload: []byte("hello, go!"),
      Type: 8,
   }

   if err := wr.Write(record); err != nil {
      panic(err)
   }

   fmt.Println(hex.Dump(buf.Bytes()))
}

which prints:

00000000  00 08 00 0a 68 65 6c 6c  6f 2c 20 67 6f 21        |....hello, go!|

Let’s analyze the output. Per our codec definition, the first two bytes indicate the type of the message: 00 08 in hexadecimal is 8 in decimal, which is what we wanted. The next two bytes indicate the length of the payload: 00 0a in hex is 10 in decimal, which is the length in bytes of hello, go!. Finally, the remaining bytes are our payload: 68 65 6c 6c 6f 2c 20 67 6f 21 are the hex values for hello, go! in ASCII (0x68 = 104 = h).
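The same analysis can be done in code. The sketch below takes the 14 bytes from the dump and splits them by hand, assuming the two-byte type and two-byte length from our codec. The function name split is an assumption for this example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// split breaks a single TLV message with a 2-byte type and a 2-byte
// length into its three parts.
func split(msg []byte) (typ, length uint16, value []byte, err error) {
	if len(msg) < 4 {
		return 0, 0, nil, errors.New("message shorter than its header")
	}
	typ = binary.BigEndian.Uint16(msg[0:2])
	length = binary.BigEndian.Uint16(msg[2:4])
	if len(msg) < 4+int(length) {
		return 0, 0, nil, errors.New("message shorter than its declared length")
	}
	return typ, length, msg[4 : 4+int(length)], nil
}

func main() {
	msg := []byte{
		0x00, 0x08, // type
		0x00, 0x0a, // length
		0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x2c, 0x20, 0x67, 0x6f, 0x21, // value
	}
	typ, length, value, err := split(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(typ, length, string(value)) // 8 10 hello, go!
}
```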

Our next step will be to write code to parse a TLV-encoded message back into the Record struct:

// Reader decodes records from TLV format using a Codec from provided io.Reader

type Reader struct {
   codec  *Codec
   reader io.Reader
}

func NewReader(reader io.Reader, codec *Codec) *Reader {
   return &Reader{codec: codec, reader: reader}
}

// Next tries to read a single Record from the io.Reader. It returns io.EOF
// when the stream ends cleanly before the start of a record.
func (r *Reader) Next() (*Record, error) {
   // get type; io.ReadFull keeps reading until the buffer is full,
   // whereas a single Read call may return fewer bytes than requested
   typeBytes := make([]byte, r.codec.TypeBytes)
   if _, err := io.ReadFull(r.reader, typeBytes); err != nil {
      return nil, err
   }
   typ := readUint(typeBytes, r.codec.TypeBytes)

   // get len; from here on, running out of bytes means a truncated record
   payloadLenBytes := make([]byte, r.codec.LenBytes)
   if _, err := io.ReadFull(r.reader, payloadLenBytes); err != nil {
      if err == io.EOF {
         err = io.ErrUnexpectedEOF
      }
      return nil, err
   }
   payloadLen := readUint(payloadLenBytes, r.codec.LenBytes)

   // get value; a zero-length payload is valid and reads nothing
   v := make([]byte, payloadLen)
   if _, err := io.ReadFull(r.reader, v); err != nil {
      if err == io.EOF {
         err = io.ErrUnexpectedEOF
      }
      return nil, err
   }

   return &Record{
      Type:    typ,
      Payload: v,
   }, nil
}

func readUint(b []byte, sz ByteSize) uint {
   reader := bytes.NewReader(b)
   switch sz {
   case OneByte:
      var i uint8
      binary.Read(reader, binary.BigEndian, &i)
      return uint(i)
   case TwoBytes:
      var i uint16
      binary.Read(reader, binary.BigEndian, &i)
      return uint(i)
   case FourBytes:
      var i uint32
      binary.Read(reader, binary.BigEndian, &i)
      return uint(i)
   case EightBytes:
      var i uint64
      binary.Read(reader, binary.BigEndian, &i)
      return uint(i)
   default:
      return 0
   }
}

Next(), our function which tries to move forward on the io.Reader and read a full Record, starts by reading the type bytes into the typ variable and continues by reading the payload length into payloadLen. Finally, once we know the payload length, we read that many bytes, put them into a Record object and return it.
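One detail worth keeping in mind when the io.Reader is a network connection rather than an in-memory buffer: a single Read call is allowed to return fewer bytes than the buffer can hold. The sketch below, which is independent of the tlv package, simulates this with iotest.OneByteReader from the standard library and shows how io.ReadFull fills the buffer anyway. The helper names slowReader and readField are assumptions for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"testing/iotest"
)

// slowReader wraps data in a reader that hands out one byte per Read call.
func slowReader(data []byte) io.Reader {
	return iotest.OneByteReader(bytes.NewReader(data))
}

// readField reads exactly size bytes or returns an error.
func readField(r io.Reader, size int) ([]byte, error) {
	buf := make([]byte, size)
	if _, err := io.ReadFull(r, buf); err != nil {
		return nil, err
	}
	return buf, nil
}

func main() {
	data := []byte{0x00, 0x08, 0x00, 0x0a}

	// A plain Read stops after the first byte.
	buf := make([]byte, 2)
	n, _ := slowReader(data).Read(buf)
	fmt.Println("plain Read filled", n, "of", len(buf), "bytes")

	// io.ReadFull keeps calling Read until the field is complete.
	field, err := readField(slowReader(data), 2)
	fmt.Printf("readField got % x, err: %v\n", field, err)
}
```

If the stream ends in the middle of a field, io.ReadFull reports io.ErrUnexpectedEOF, which is a convenient way to tell a truncated record apart from a clean end of stream.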

Let’s try parsing our TLV-encoded message:

func main() {
  // continuing from previous main(), omitted for brevity

  // ...

  reader := bytes.NewReader(buf.Bytes())
  tlvReader := tlv.NewReader(reader, codec)
  next, err := tlvReader.Next()
  if err != nil {
    panic(err)
  }
  fmt.Println("type:", next.Type)
  fmt.Println("payload: ", string(next.Payload))
}

Which prints out:

type: 8
payload:  hello, go!

Hooray! We managed to read our TLV-encoded message back into our Record!
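This brings us back to the original problem of optional fields. Because every record announces its own length, a reader can step over record types it does not recognize and still find the next one. Below is a self-contained sketch of that idea, assuming a fixed two-byte type and two-byte length. The type codes and the names readRecord and knownFields are hypothetical and exist only for this example.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// Hypothetical type codes, chosen for this example only.
const (
	typeGreeting = 8
	typeNumber   = 9
)

// readRecord reads one record with a 2-byte type and a 2-byte length.
// It returns io.EOF only when the stream ends cleanly between records.
func readRecord(r io.Reader) (typ uint16, payload []byte, err error) {
	var hdr [4]byte
	if _, err = io.ReadFull(r, hdr[:]); err != nil {
		return 0, nil, err
	}
	typ = binary.BigEndian.Uint16(hdr[0:2])
	payload = make([]byte, binary.BigEndian.Uint16(hdr[2:4]))
	if _, err = io.ReadFull(r, payload); err != nil {
		if err == io.EOF {
			err = io.ErrUnexpectedEOF // header present, payload missing
		}
		return 0, nil, err
	}
	return typ, payload, nil
}

// knownFields collects the payloads of the types this reader understands
// and skips everything else.
func knownFields(stream []byte) (map[uint16]string, error) {
	r := bytes.NewReader(stream)
	fields := make(map[uint16]string)
	for {
		typ, payload, err := readRecord(r)
		if err == io.EOF {
			return fields, nil
		}
		if err != nil {
			return nil, err
		}
		switch typ {
		case typeGreeting, typeNumber:
			fields[typ] = string(payload)
		default:
			// Unknown optional field: already consumed, nothing to do.
		}
	}
}

func main() {
	stream := []byte{
		0x00, 0x08, 0x00, 0x02, 'h', 'i', // type 8, known
		0x00, 0x63, 0x00, 0x03, 1, 2, 3, // type 99, unknown, skipped
		0x00, 0x09, 0x00, 0x02, '4', '2', // type 9, known
	}
	fields, err := knownFields(stream)
	if err != nil {
		panic(err)
	}
	fmt.Println(fields[typeGreeting], fields[typeNumber]) // hi 42
}
```

An older reader that only knows types 8 and 9 keeps working when a newer writer starts sending type 99, which is the forward-compatibility property that makes TLV attractive for optional fields.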

Let’s conclude with Borat’s traditional congratulation:

Great Success