Binary encoding of variable length options with Golang
Reading and writing TLV-encoded messages with Go
Recently, I’ve worked on implementing a high-throughput networking component at work. In an effort to conserve CPU time and reduce payload sizes, this service is using a binary encoding as the wire format. An attribute of the protocol we were implementing was that the messages could include optional fields of variable length. This, I learned, is not as simple as it sounds.
Adding an extra field to a message in a text-based encoding, such as CSV or JSON, is quite trivial. With JSON, for instance, if we want to add an extra field to an existing message, we simply need to append ,"favorite_number": 42 after the existing fields and before the closing curly brace }.
When working with binary protocols, we usually don’t have the luxury of picking some character as a delimiter between fields or messages. When transmitting a stream of arbitrary data, any delimiter you pick might also appear in the data itself. You cannot reserve any of the 256 possible values of a byte for your delimiter, because every one of them may legitimately show up in the payload.
It is possible, of course, to pick a delimiter if we also decide on an escape character. This is the same way we can use quotation marks inside a JSON string by prepending a backslash, for example: "giant \"laser\"".
However, by doing so, we must now pay the cost of actually reading the message byte-by-byte looking for these delimiters and accounting for escape characters. In my eyes, a major reason to use a binary protocol is being able to copy bytes from the network interface into memory directly without needing to use CPU to parse at all.
As a consequence, binary protocols usually skip the indulgence of delimiters for determining field boundaries. Instead, they use fixed-length fields, with a special field to denote the payload length. For instance, the fixed IPv6 header is always 40 bytes long, and bytes 5–6 are reserved for a 16-bit unsigned integer holding the payload length.
This allows a parser to begin by reading the 40 bytes of the header, then peek at bytes 5–6 and know exactly how much more to read to reach the end of the packet.
```text
3.  IPv6 Header Format

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| Traffic Class |           Flow Label                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Payload Length        |  Next Header  |   Hop Limit   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                         Source Address                        +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                      Destination Address                      +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Version              4-bit Internet Protocol version number = 6.
Traffic Class        8-bit traffic class field.  See section 7.
Flow Label           20-bit flow label.  See section 6.
Payload Length       16-bit unsigned integer.  Length of the IPv6
                     payload, i.e., the rest of the packet following
                     this IPv6 header, in octets.  (Note that any
                     extension headers [section 4] present are
                     considered part of the payload, i.e., included
                     in the length count.)
```
(IPv6 Header structure, Source: IETF RFC 2460)
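Here is a minimal Go sketch of that first parsing step. It assumes the 40 bytes of the fixed header have already been read into a slice, and ipv6PayloadLen is a hypothetical helper name. Like the other multi-byte IPv6 header fields, Payload Length is big-endian (network byte order), so encoding/binary's BigEndian handles the conversion.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// ipv6HeaderLen is the size of the fixed IPv6 header in bytes.
const ipv6HeaderLen = 40

// ipv6PayloadLen returns the Payload Length field of a fixed IPv6 header.
// The field occupies the 5th and 6th bytes (offsets 4 and 5) and is big-endian.
func ipv6PayloadLen(hdr []byte) (uint16, error) {
	if len(hdr) < ipv6HeaderLen {
		return 0, errors.New("short IPv6 header")
	}
	if hdr[0]>>4 != 6 { // the high 4 bits of the first byte are the version
		return 0, errors.New("not an IPv6 packet")
	}
	return binary.BigEndian.Uint16(hdr[4:6]), nil
}

func main() {
	hdr := make([]byte, ipv6HeaderLen)
	hdr[0] = 0x60               // version 6, traffic class and flow label zero
	hdr[4], hdr[5] = 0x05, 0xdc // payload length 0x05dc = 1500
	n, err := ipv6PayloadLen(hdr)
	fmt.Println(n, err) // 1500 <nil>
}
```

Once the length is known, the reader can fetch exactly that many further bytes, including any extension headers, without scanning for anything.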
The problem at hand requires something a bit more elaborate. In our case, we have optional fields, and a multitude of them. Can we have optional, variable length fields without using delimiters?
A common solution for this is an encoding called TLV, short for Type-Length-Value. It is very common in network protocols. In fact, it is how options are encoded within IPv6 extension headers.
A TLV-encoded message is made of three parts: bytes signifying its type, bytes signifying the payload length, and zero or more payload bytes. The sizes of the type field and the length field must be agreed upon before any message is parsed, so that readers know where each field ends.
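IPv6 extension-header options are themselves TLV-encoded, so a small sketch of walking them shows the idea on a real format. Per the IPv6 specification, each option consists of an 8-bit Option Type, an 8-bit Opt Data Len, and that many data bytes. There is one exception: Pad1 (type 0) is a single zero byte with no length or data. The names ipv6Option and parseIPv6Options are hypothetical, and the sketch ignores the action bits that the option type also encodes.

```go
package main

import (
	"errors"
	"fmt"
)

// ipv6Option is one TLV option: an 8-bit type and its data bytes
// (the 8-bit length is implied by len(Data)).
type ipv6Option struct {
	Type byte
	Data []byte
}

// parseIPv6Options walks the options area of a Hop-by-Hop or Destination
// Options header. Pad1 bytes are skipped and not returned.
func parseIPv6Options(b []byte) ([]ipv6Option, error) {
	var opts []ipv6Option
	for len(b) > 0 {
		typ := b[0]
		if typ == 0 { // Pad1: a lone zero byte, no length, no data
			b = b[1:]
			continue
		}
		if len(b) < 2 {
			return nil, errors.New("truncated option header")
		}
		n := int(b[1]) // length of the option data, in bytes
		if len(b) < 2+n {
			return nil, errors.New("truncated option data")
		}
		opts = append(opts, ipv6Option{Type: typ, Data: b[2 : 2+n]})
		b = b[2+n:]
	}
	return opts, nil
}

func main() {
	// PadN (type 1) with 2 data bytes, one Pad1 byte, then a made-up
	// option of type 0x3e carrying a single byte.
	raw := []byte{0x01, 0x02, 0x00, 0x00, 0x00, 0x3e, 0x01, 0xff}
	opts, err := parseIPv6Options(raw)
	fmt.Println(len(opts), err) // 2 <nil>
	for _, o := range opts {
		fmt.Printf("type=%d len=%d\n", o.Type, len(o.Data))
	}
}
```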
Let’s see how we might implement something like this with Golang.
We start off by defining a custom ByteSize type. Our configuration object, Codec, uses it to indicate how large the type and length fields are. Picking the right size for these fields is a tradeoff between future compatibility and total payload size. For instance, if you pick OneByte for your length field, your payloads are limited to at most 255 bytes. If you pick EightByte for the type field of a protocol in which you only expect a handful of different types, you will waste bandwidth on seven always-zero bytes in every message. So pick what is right for your protocol.
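Here is a minimal sketch of what these definitions might look like. Only ByteSize, Codec, OneByte and EightByte come from the text. TwoByte, FourByte, the TypeBytes/LenBytes field names and the MaxValue method are assumptions added to illustrate the tradeoff, namely the largest value each field width can carry.

```go
package main

import "fmt"

// ByteSize is the width, in bytes, of a fixed-size header field.
// TwoByte and FourByte are assumed names; the text mentions OneByte and EightByte.
type ByteSize int

const (
	OneByte   ByteSize = 1
	TwoByte   ByteSize = 2
	FourByte  ByteSize = 4
	EightByte ByteSize = 8
)

// MaxValue is the largest unsigned integer a field of this width can hold.
func (s ByteSize) MaxValue() uint64 {
	switch {
	case s <= 0:
		return 0
	case s >= EightByte:
		return ^uint64(0) // all 64 bits set
	default:
		return uint64(1)<<(8*uint(s)) - 1
	}
}

// Codec fixes, ahead of time, how wide the Type and Length fields are.
// The field names TypeBytes and LenBytes are hypothetical.
type Codec struct {
	TypeBytes ByteSize
	LenBytes  ByteSize
}

func main() {
	c := Codec{TypeBytes: TwoByte, LenBytes: TwoByte}
	fmt.Println("max type:", c.TypeBytes.MaxValue())   // 65535
	fmt.Println("max payload:", c.LenBytes.MaxValue()) // 65535
	fmt.Println("OneByte limit:", OneByte.MaxValue())  // 255
}
```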
Our data container is called Record and has an unsigned int Type field (the T in TLV) and a byte slice for the Payload.
Let’s see how we encode some messages:
Let’s run some test code to see how it works:
which prints:
00000000 00 08 00 0a 68 65 6c 6c 6f 2c 20 67 6f 21 |....hello, go!|
Let’s analyze the output. Per our codec definition, the first two bytes indicate the type of the message: 00 08 in hexadecimal is 8 in decimal, which is what we wanted. The next two bytes indicate the length of the payload: 00 0a in hex is 10 in decimal, which is the length in bytes of "hello, go!". Finally, the remaining bytes are our payload: 68 65 6c 6c 6f 2c 20 67 6f 21 are the hex values for "hello, go!" in ASCII (0x68 = 104 = h).
Our next step will be to write code to parse a TLV-encoded message back into the Record struct:
Next(), our function that tries to move forward on the io.Reader to read a full Record, starts by reading the first bytes into the typ variable. It continues by reading the payload length into payloadLenBytes. Finally, once it knows the payload length, it reads that many bytes, puts them into the Record object, and returns it.
Let’s try parsing our TLV-encoded message:
Which prints out:
type: 8
payload: hello, go!
Hooray! We managed to read our TLV-encoded message back into our Record!
Let’s conclude with Borat’s traditional congratulation: