thoughts and learnings in software engineering by Rotem Tamir

Creating a protoc plugin to generate Go code with protogen

Protocol Buffers (Protobufs) are a popular open-source interface definition language (IDL) that was originally developed at Google. Aside from their benefits as a serialization format (messages are fairly compact and fast to serialize and parse), Protobufs really shine because of their code generation facilities.

The basic idea of any IDL is that data structures (called “Messages” in Protobuf) are described using a relatively simple, language-agnostic declaration language, from which programming-language-specific code is generated automatically. Specifically with Protocol Buffers, if I were to define a simple message:

syntax = "proto3";

package example;

option go_package = "github.com/rotemtam/protoc-gen-go-ascii/example";

message Hello {
  string greeting = 1;
}

I could then use the Protobuf compiler (i.e. code generator), named protoc, to generate code (structs, classes, functions, etc.) in any supported language; the generated code can read and write the binary data into language-specific objects in my application code. In my case, I’m interested in Go code, so I need to run:

protoc --go_out=. --go_opt=paths=source_relative -I . example.proto

The protoc code-generator produces a new file next to example.proto, example.pb.go. The file has a lot going on inside, but among other things we can find the struct definition for example.Hello:

type Hello struct {
	state         protoimpl.MessageState
	sizeCache     protoimpl.SizeCache
	unknownFields protoimpl.UnknownFields

	Greeting string `protobuf:"bytes,1,opt,name=greeting,proto3" json:"greeting,omitempty"`
}

The Protobuf ecosystem works so well because language-specific code generators, including the one for Go, are decoupled from the protoc project: they are implemented as standalone executables called plugins. The documentation explains:

protoc (aka the Protocol Compiler) can be extended via plugins. A plugin is just a program that reads a CodeGeneratorRequest from stdin and writes a CodeGeneratorResponse to stdout. [..] A plugin executable needs only to be placed somewhere in the path. The plugin should be named “protoc-gen-$NAME”, and will then be used when the flag “--${NAME}_out” is passed to protoc.
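The naming rule in the quote can be made concrete with a tiny sketch. pluginForFlag is a hypothetical helper, not part of protoc: it maps a --NAME_out flag to the executable name protoc would look for. It deliberately ignores details such as protoc's built-in generators (e.g. --cpp_out) and the --plugin flag, which can point protoc at an executable explicitly.

```go
package main

import (
	"fmt"
	"strings"
)

// pluginForFlag is a hypothetical helper mirroring protoc's naming rule:
// a flag of the form --NAME_out[=...] asks protoc to run "protoc-gen-NAME".
// It returns false for flags that do not have that shape.
func pluginForFlag(arg string) (string, bool) {
	// Drop the flag value, if any: "--go-ascii_out=." -> "--go-ascii_out".
	name, _, _ := strings.Cut(arg, "=")
	if !strings.HasPrefix(name, "--") || !strings.HasSuffix(name, "_out") {
		return "", false
	}
	plugin := strings.TrimSuffix(strings.TrimPrefix(name, "--"), "_out")
	if plugin == "" {
		return "", false
	}
	return "protoc-gen-" + plugin, true
}

func main() {
	for _, arg := range []string{"--go_out=.", "--go-ascii_out=.", "--go_opt=paths=source_relative"} {
		if exe, ok := pluginForFlag(arg); ok {
			fmt.Printf("%s -> %s\n", arg, exe)
		}
	}
}
```

Running it prints a line for the two *_out flags and skips the *_opt flag, since options are passed to a plugin rather than selecting one.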

Broadly speaking, it works like this:

  • protoc parses and validates all of the .proto files passed to it, resolving their imports from the include paths passed via the -I flag as well as from the well-known types that are commonly shipped with protoc.
  • protoc looks at its command-line arguments; any flag that matches --<plugin>_out is considered a request to invoke that plugin. The compiler then looks for an executable named protoc-gen-<plugin> in the current PATH and runs it.
  • protoc serializes the descriptors of the files it parsed into a CodeGeneratorRequest message and writes it to the plugin via stdin.
  • The plugin parses the request, generates the source code that the user requested, and writes it back via stdout as a CodeGeneratorResponse serialized to its binary representation.

As you can see, creating custom protoc plugins is pretty straightforward, and in the rest of this post we will see how to create a new one in a few lines of code.

Creating our own protoc plugin

With the introduction complete, we can now get started implementing our very own protoc plugin in Go! The great news is that when the Go team released the v2 API for Protobuf, they included a wonderful little library named protogen that makes it super-easy to write protoc plugins. In this post we will use it to build a useless protoc plugin that adds a method to each generated Protobuf struct that returns its own type name as ASCII art. You can browse the final code at rotemtam/protoc-gen-go-ascii.

Our final goal is to have our protoc plugin invoked like this:

protoc --go-ascii_out=. --go-ascii_opt=paths=source_relative --go_out=. --go_opt=paths=source_relative -I . example.proto

And have it generate a file named example_ascii.pb.go with contents like:

// Code generated by protoc-gen-go-ascii. DO NOT EDIT.

package example

func (x *Hello) Ascii() string {
	return ` _   _        _  _
| | | |      | || |
| |_| |  ___ | || |  ___
|  _  | / _ \| || | / _ \
| | | ||  __/| || || (_) |
\_| |_/ \___||_||_| \___/
`
}

So when our users use our generated code they can:

package main

import (
	"fmt"

	"github.com/rotemtam/protoc-gen-go-ascii/example"
)

func main() {
	ex := &example.Hello{}
	fmt.Println("going to print example.Hello's ASCII art representation:")
	fmt.Println(ex.Ascii())
}
// Output:
// going to print example.Hello's ASCII art representation:
//  _   _        _  _
// | | | |      | || |
// | |_| |  ___ | || |  ___
// |  _  | / _ \| || | / _ \
// | | | ||  __/| || || (_) |
// \_| |_/ \___||_||_| \___/

To do this we are going to use a fun Go library named go-figure (github.com/common-nighthawk/go-figure) that can generate lovely ASCII art in over 140 different fonts!

Setting Up

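Prerequisites: a working Go installation, protoc, and the protoc-gen-go plugin (used by the --go_out flag below) available in your $PATH.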

We start by creating a new directory for our project and initializing a Go module in it:

mkdir protoc-gen-go-ascii
cd protoc-gen-go-ascii && go mod init protoc-gen-go-ascii

Install our dependencies:

go get -u github.com/common-nighthawk/go-figure
go get -u github.com/golang/protobuf
go get -u google.golang.org/protobuf

Create our example .proto file which we will feed into our protoc plugin. Under example/example.proto put:

syntax = "proto3";

package example;

option go_package = "protoc-gen-go-ascii/example";

message Hello {

}

Let’s write some skeleton code so we can test that everything is wired correctly. We will get deeper into it in a bit. Under cmd/protoc-gen-go-ascii/main.go put:

package main

import (
	"google.golang.org/protobuf/compiler/protogen"
)

func main() {

	protogen.Options{}.Run(func(gen *protogen.Plugin) error {
		for _, f := range gen.Files {
			if !f.Generate {
				continue
			}
			generateFile(gen, f)
		}
		return nil
	})
}

// generateFile generates a _ascii.pb.go file for the given .proto file.
func generateFile(gen *protogen.Plugin, file *protogen.File) *protogen.GeneratedFile {
	filename := file.GeneratedFilenamePrefix + "_ascii.pb.go"
	g := gen.NewGeneratedFile(filename, file.GoImportPath)
	g.P("// Code generated by protoc-gen-go-ascii. DO NOT EDIT.")
	g.P()
	g.P("package ", file.GoPackageName)
	g.P()

	return g
}

We will unpack what’s going on here soon, but first let’s set up our dev loop. With protoc installed, we need to compile our plugin, put it in our $PATH and then run protoc with a flag. Here’s a one-liner you can run while developing to run your code:

go install ./cmd/protoc-gen-go-ascii && \
    protoc --go_out=. --go_opt=paths=source_relative \
    --go-ascii_out=. --go-ascii_opt=paths=source_relative \
    example/example.proto

The first half of the command builds cmd/protoc-gen-go-ascii and installs the binary under $GOBIN (which should be in your $PATH). The second half runs protoc with two plugins: protoc-gen-go (invoked by the --go_out flag) and our very own protoc-gen-go-ascii (invoked by the --go-ascii_out flag). The paths=source_relative option tells each plugin to place its generated files next to the source .proto file, rather than in a directory derived from the Go import path.

After running the above command you should now find three files under example/:

example
├── example.pb.go
├── example.proto
└── example_ascii.pb.go
  • example.proto is our original .proto definition file which we created in an earlier step
  • example.pb.go contains the Protobuf structs for the Hello message we defined in our .proto file
  • example_ascii.pb.go was generated from our own plugin, and as we can see it currently only contains:
// Code generated by protoc-gen-go-ascii. DO NOT EDIT.

package example

What’s Happening Here?

Let’s trace back a bit and unpack what our code is doing:

protogen.Options{}.Run(func(gen *protogen.Plugin) error {
	for _, f := range gen.Files {
		if !f.Generate {
			continue
		}
		generateFile(gen, f)
	}
	return nil
})

protogen is a really cool library published by the Go team to help us easily build protoc plugins that generate Go code. According to its documentation, protogen.Options.Run:

.. executes a function as a protoc plugin. It reads a CodeGeneratorRequest message from os.Stdin, invokes the plugin function, and writes a CodeGeneratorResponse message to os.Stdout. If a failure occurs while reading or writing, Run prints an error to os.Stderr and calls os.Exit(1).

In other words, to write a protoc plugin all we need to do is implement a function that receives a *protogen.Plugin and returns an error if something fails. On this plugin object we can find a Files field, which according to the docs:

Files is the set of files to generate and everything they import. Files appear in topological order, so each file appears before any file that imports it.

Next, in our code, we iterate over gen.Files and generate an output file for each file whose Generate field is true, i.e. each file that was explicitly passed to protoc rather than merely imported. Here’s what we do with each file:

// generateFile generates a _ascii.pb.go file for the given .proto file.
func generateFile(gen *protogen.Plugin, file *protogen.File) *protogen.GeneratedFile {
	filename := file.GeneratedFilenamePrefix + "_ascii.pb.go"
	g := gen.NewGeneratedFile(filename, file.GoImportPath)
	g.P("// Code generated by protoc-gen-go-ascii. DO NOT EDIT.")
	g.P()
	g.P("package ", file.GoPackageName)
	g.P()

	return g
}

The Plugin has a function named NewGeneratedFile which creates a super-useful GeneratedFile. Through this object we generate our “response” to protoc. For each file we want to generate, we will create a new GeneratedFile.

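GeneratedFile has a “print-like” method named P(). It converts each argument to a string following the same rules as fmt.Print, so you can pass it strings and anything that implements fmt.Stringer, never inserts spaces between arguments, and ends the line with a newline. It also has a very useful feature: when you pass it a protogen.GoIdent, it qualifies the identifier with its package name and adds the corresponding import to the generated file for us. You can read all about it in the documentation.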

And indeed, when our code runs, it creates a GeneratedFile named <prefix>_ascii.pb.go containing the code-gen comment followed by the package declaration.

Our ASCII-art Generator

To complete our ASCII-art generator, we will use the go-figure package to generate our ASCII art. Notice that this happens at code-generation time, so our generated code does not need to know anything about this package. We modify generateFile as follows (remember to add "github.com/common-nighthawk/go-figure" to the imports in main.go):

// generateFile generates a _ascii.pb.go file with an Ascii() method for each message.
func generateFile(gen *protogen.Plugin, file *protogen.File) *protogen.GeneratedFile {
	filename := file.GeneratedFilenamePrefix + "_ascii.pb.go"
	g := gen.NewGeneratedFile(filename, file.GoImportPath)
	g.P("// Code generated by protoc-gen-go-ascii. DO NOT EDIT.")
	g.P()
	g.P("package ", file.GoPackageName)
	g.P()

	for _, msg := range file.Messages {
		fig := figure.NewFigure(msg.GoIdent.GoName, "doom", false)
		g.P("func (x *", msg.GoIdent, ") Ascii() string {")
		g.P("return `", fig.String(), "`")
		g.P("}")
	}

	return g
}

For each message in each file passed to us by protoc, we use P() to emit a method named Ascii() that returns the ASCII-art representation of the message name (in a font with the pleasant name of doom). Re-run our one-liner from above and observe the changes to example_ascii.pb.go:

// Code generated by protoc-gen-go-ascii. DO NOT EDIT.

package example

func (x *Hello) Ascii() string {
	return ` _   _        _  _
| | | |      | || |
| |_| |  ___ | || |  ___
|  _  | / _ \| || | / _ \
| | | ||  __/| || || (_) |
\_| |_/ \___||_||_| \___/
`
}

Pretty cool! We have just generated an extra receiver method on our Hello struct! Next let’s see how to pass and parse options to our plugin using protoc’s native capabilities.

Options

Users can pass custom options to each protoc plugin using the --<plugin>_opt flag. For our dummy plugin, it would be useful to choose which font go-figure uses (remember, there are over 140 of them!), so we’re aiming for something like --go-ascii_opt=font=coolfont. Luckily, this is straightforward with protogen: we can use Go’s native flag package and hook it into our protogen program through the ParamFunc field of protogen.Options, which must be a function with the signature func(name, value string) error. The cool thing is, as the docs state:

The (flag.FlagSet).Set method matches this function signature, so parameters can be converted into flags as in the following:

var flags flag.FlagSet
value := flags.Bool("param", false, "")
opts := &protogen.Options{
    ParamFunc: flags.Set,
}
opts.Run(func(p *protogen.Plugin) error {
    if *value {
        // ...
    }
    return nil
})

That’s really neat! What can I say, the Go team sure can write elegant Go libraries. Applied to our plugin, we modify our main function:

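import (
	"flag"

	"github.com/common-nighthawk/go-figure"
	"google.golang.org/protobuf/compiler/protogen"
)

// font holds the value of the font=<name> plugin parameter.
var font *string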

func main() {
	var flags flag.FlagSet
	font = flags.String("font", "doom", "font list available in github.com/common-nighthawk/go-figure")

	protogen.Options{
		ParamFunc: flags.Set,
	}.Run(func(gen *protogen.Plugin) error {
		for _, f := range gen.Files {
			if !f.Generate {
				continue
			}
			generateFile(gen, f)
		}
		return nil
	})
}

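We define a flag named font and hook it into our protogen.Options by passing flags.Set as the ParamFunc. protogen handles the parameters it understands itself, such as paths, and passes any others, like font, to ParamFunc.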

Next, we update our generateFile function to use this variable:

fig := figure.NewFigure(msg.GoIdent.GoName, *font, false)

If we change our flag to --go-ascii_opt=paths=source_relative,font=doh and re-run, we will see our generated code now looks like:

// Code generated by protoc-gen-go-ascii. DO NOT EDIT.

package example

func (x *Hello) Ascii() string {
	return `

HHHHHHHHH     HHHHHHHHH                    lllllll lllllll
H:::::::H     H:::::::H                    l:::::l l:::::l
H:::::::H     H:::::::H                    l:::::l l:::::l
HH::::::H     H::::::HH                    l:::::l l:::::l
  H:::::H     H:::::H      eeeeeeeeeeee     l::::l  l::::l    ooooooooooo
  H:::::H     H:::::H    ee::::::::::::ee   l::::l  l::::l  oo:::::::::::oo
  H::::::HHHHH::::::H   e::::::eeeee:::::ee l::::l  l::::l o:::::::::::::::o
  H:::::::::::::::::H  e::::::e     e:::::e l::::l  l::::l o:::::ooooo:::::o
  H:::::::::::::::::H  e:::::::eeeee::::::e l::::l  l::::l o::::o     o::::o
  H::::::HHHHH::::::H  e:::::::::::::::::e  l::::l  l::::l o::::o     o::::o
  H:::::H     H:::::H  e::::::eeeeeeeeeee   l::::l  l::::l o::::o     o::::o
  H:::::H     H:::::H  e:::::::e            l::::l  l::::l o::::o     o::::o
HH::::::H     H::::::HHe::::::::e          l::::::ll::::::lo:::::ooooo:::::o
H:::::::H     H:::::::H e::::::::eeeeeeee  l::::::ll::::::lo:::::::::::::::o
H:::::::H     H:::::::H  ee:::::::::::::e  l::::::ll::::::l oo:::::::::::oo
HHHHHHHHH     HHHHHHHHH    eeeeeeeeeeeeee  llllllllllllllll   ooooooooooo
`
}

Wow! With just a few lines of code, we can now receive custom options from our users to modify the code-generation behavior of our plugin.

Conclusion

Code generation is an extremely useful practice, especially when working with statically typed languages. It automates and standardizes the creation of boilerplate code without sacrificing compile-time type safety or the wonderful support modern IDEs give us in the form of code suggestions and auto-completion.

In organizations that embrace IDLs, the schema descriptions tend to become a central place to describe many aspects of the systems that they are developing. Being able to automatically generate code from these descriptions can be immensely useful: validation functions, technical documentation, and database DDL statements, to name a few, can all be generated from schema definitions. I hope this post demonstrated that, with the help of protogen, writing our own protoc plugins is easy and approachable.