Creating a protoc plugin to generate Go code with protogen
Protocol Buffers (Protobuf) is a popular open-source interface definition language (IDL) originally developed at Google. Aside from its benefits as a serialization format (messages are fairly compact and fast to serialize and parse), Protobuf really shines thanks to its code generation facilities.
The basic idea of any IDL is that data structures (called “Messages” in Protobuf) are described using a relatively simple language-agnostic declaration language, from which programming language specific code is generated automatically. Specifically with Protocol Buffers, if I were to define a simple message:
syntax = "proto3";
package example;
option go_package = "github.com/rotemtam/protoc-gen-go-ascii/example";
message Hello {
  string greeting = 1;
}
I could then use the Protobuf compiler (i.e. code generator) named protoc to generate code (structs, classes, functions, etc.) in any supported language, which can then read and write the binary data as language-specific objects in my application code. In my case, I’m interested in Go code, so I need to run:
protoc --go_out=. --go_opt=paths=source_relative -I . example.proto
The protoc code generator produces a new file, example.pb.go, next to example.proto. The file has a lot going on inside, but among other things we can find the struct definition for example.Hello:
type Hello struct {
	state         protoimpl.MessageState
	sizeCache     protoimpl.SizeCache
	unknownFields protoimpl.UnknownFields
	Greeting      string `protobuf:"bytes,1,opt,name=greeting,proto3" json:"greeting,omitempty"`
}
The Protobuf ecosystem works so well because the language-specific code generators are completely decoupled from the protoc project: they are implemented as standalone executables called plugins. The documentation explains:
protoc (aka the Protocol Compiler) can be extended via plugins. A plugin is just a program that reads a CodeGeneratorRequest from stdin and writes a CodeGeneratorResponse to stdout. [..] A plugin executable needs only to be placed somewhere in the path. The plugin should be named “protoc-gen-$NAME”, and will then be used when the flag “--${NAME}_out” is passed to protoc.
Broadly speaking, it works like this:
- protoc parses and validates all of the .proto files passed to it, resolving any dependencies via the -I flag and from the well-known types that are commonly shipped with protoc.
- protoc looks at its command-line arguments; any flag that matches --<plugin>_out is considered a request to invoke plugin. The compiler then looks for an executable named protoc-gen-<plugin> in the current PATH and runs it.
- protoc serializes the message descriptions it parsed into a CodeGeneratorRequest message and writes it to the plugin via stdin.
- The plugin parses the request, generates the source code that the user requested, and writes it back via stdout as a CodeGeneratorResponse serialized to its binary representation.
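To make the naming convention from the steps above concrete, here is a minimal sketch of how a --<plugin>_out flag maps to an executable name. The pluginExecutable helper is hypothetical and written only for illustration; it is not how protoc itself is implemented, and it ignores the generators that are built into protoc rather than shipped as plugins.

```go
package main

import (
	"fmt"
	"strings"
)

// pluginExecutable is a hypothetical helper (not part of protoc). It maps a
// command-line argument such as "--go-ascii_out=." to the executable name
// that would be looked up on PATH, and reports false for any argument that
// is not of the form --<plugin>_out.
func pluginExecutable(arg string) (string, bool) {
	if !strings.HasPrefix(arg, "--") {
		return "", false
	}
	// Drop the leading dashes and anything after "=" (the output directory).
	name := strings.TrimPrefix(arg, "--")
	if i := strings.Index(name, "="); i >= 0 {
		name = name[:i]
	}
	if !strings.HasSuffix(name, "_out") || name == "_out" {
		return "", false
	}
	return "protoc-gen-" + strings.TrimSuffix(name, "_out"), true
}

func main() {
	args := []string{"--go_out=.", "--go-ascii_out=.", "--go-ascii_opt=font=doh", "-I"}
	for _, arg := range args {
		if exe, ok := pluginExecutable(arg); ok {
			fmt.Printf("%s -> %s\n", arg, exe)
		}
	}
}
```

Only the two _out flags produce a match here; the _opt flag and -I are skipped, which mirrors the idea that options are forwarded to a plugin rather than selecting one.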
As you can see, creating custom protoc plugins is pretty straightforward, and in the rest of this post we will see how to create a new one in a few lines of code.
Creating our own protoc plugin
With the introduction complete, we can now get started on implementing our very own protoc plugin in Go! The great news is that when the Go team released the v2 API for Protobuf, they included a wonderful little library named protogen that makes it super-easy to write protoc plugins. In this post we will use it to build a useless protoc plugin that adds a method to each generated Protobuf struct, which returns the struct's type name as ASCII art. You can browse the final code at rotemtam/protoc-gen-go-ascii.
Our final goal is to have our protoc plugin invoked like this:
protoc --go-ascii_out=. --go-ascii_opt=paths=source_relative --go_out=. --go_opt=paths=source_relative -I . example.proto
And have it generate a file named example_ascii.pb.go with contents like:
// Code generated by protoc-gen-go-ascii. DO NOT EDIT.
package example
func (x *Hello) Ascii() string {
	return ` _   _        _  _
| | | |      | || |
| |_| |  ___ | || |  ___
|  _  | / _ \| || | / _ \
| | | ||  __/| || || (_) |
\_| |_/ \___||_||_| \___/
`
}
Users of our generated code can then write:
package main
import (
	"fmt"
	"github.com/rotemtam/protoc-gen-go-ascii/example"
)
func main() {
	ex := &example.Hello{}
	fmt.Println("going to print example.Hello's ASCII art representation:")
	fmt.Println(ex.Ascii())
}
// Output:
// going to print example.Hello's ASCII art representation:
//  _   _        _  _
// | | | |      | || |
// | |_| |  ___ | || |  ___
// |  _  | / _ \| || | / _ \
// | | | ||  __/| || || (_) |
// \_| |_/ \___||_||_| \___/
To do this we are going to use a fun Go library named go-figure (github.com/common-nighthawk/go-figure) that can generate lovely ASCII art in over 140 different fonts!
Setting Up
Prerequisites:
- Install protoc
- Install protoc-gen-go
We start by creating a new directory for our project and initializing a Go module in it:
mkdir protoc-gen-go-ascii
cd protoc-gen-go-ascii && go mod init protoc-gen-go-ascii
Install our dependencies:
go get -u github.com/common-nighthawk/go-figure
go get -u github.com/golang/protobuf
go get -u google.golang.org/protobuf
Create our example .proto file, which we will feed into our protoc plugin. Under example/example.proto put:
syntax = "proto3";
package example;
option go_package = "protoc-gen-go-ascii/example";
message Hello {
}
Let’s write some skeleton code so we can test that everything is wired correctly. We will get deeper into it in a bit. Under cmd/protoc-gen-go-ascii/main.go put:
package main
import (
	"google.golang.org/protobuf/compiler/protogen"
)
func main() {
	protogen.Options{}.Run(func(gen *protogen.Plugin) error {
		for _, f := range gen.Files {
			// Skip files that were only loaded as imports.
			if !f.Generate {
				continue
			}
			generateFile(gen, f)
		}
		return nil
	})
}
// generateFile generates a _ascii.pb.go file for the given source file.
func generateFile(gen *protogen.Plugin, file *protogen.File) {
	filename := file.GeneratedFilenamePrefix + "_ascii.pb.go"
	g := gen.NewGeneratedFile(filename, file.GoImportPath)
	g.P("// Code generated by protoc-gen-go-ascii. DO NOT EDIT.")
	g.P()
	g.P("package ", file.GoPackageName)
	g.P()
}
We will unpack what’s going on here soon, but first let’s set up our dev loop. With protoc installed, we need to compile our plugin, put it in our $PATH, and then run protoc with a flag that invokes it. Here’s a one-liner you can run from the project root while developing:
go install ./cmd/protoc-gen-go-ascii && \
  protoc --go_out=. --go_opt=paths=source_relative \
  --go-ascii_out=. --go-ascii_opt=paths=source_relative \
  example/example.proto
The first half of the command builds cmd/protoc-gen-go-ascii and puts it under $GOBIN (which should be in your $PATH). The second half runs protoc with two plugins: protoc-gen-go (invoked by the --go_out flag) and our very own protoc-gen-go-ascii (invoked by the --go-ascii_out flag). The paths=source_relative option tells each plugin to place its generated files relative to the source .proto file.
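To see what paths=source_relative changes, here is a small stdlib-only sketch of how the output location could be derived. The outputPath helper is hypothetical; it assumes that without source_relative the file is placed under the Go import path instead, which is a simplification of what protogen does.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// outputPath is a hypothetical helper that approximates where a generated
// file ends up. With sourceRelative it sits next to the .proto file;
// otherwise (assumption: the default "import" mode) it is placed in a
// directory named after the Go import path.
func outputPath(protoPath, goImportPath string, sourceRelative bool) string {
	// "example/example.proto" -> "example/example"
	prefix := strings.TrimSuffix(protoPath, path.Ext(protoPath))
	if !sourceRelative {
		prefix = path.Join(goImportPath, path.Base(prefix))
	}
	return prefix + "_ascii.pb.go"
}

func main() {
	const proto = "example/example.proto"
	const importPath = "protoc-gen-go-ascii/example"
	fmt.Println(outputPath(proto, importPath, true))
	fmt.Println(outputPath(proto, importPath, false))
}
```

With source_relative the result is example/example_ascii.pb.go, right beside the input file, which is the layout used throughout this post.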
After running the above command you should find three files under example/:
example
├── example.pb.go
├── example.proto
└── example_ascii.pb.go
- example.proto is our original .proto definition file, which we created in an earlier step.
- example.pb.go contains the Protobuf struct for the Hello message we defined in our .proto file.
- example_ascii.pb.go was generated by our own plugin, and as we can see it currently only contains:
// Code generated by protoc-gen-go-ascii. DO NOT EDIT.
package example
What’s Happening Here?
Let’s trace back a bit and unpack what our code is doing:
protogen.Options{}.Run(func(gen *protogen.Plugin) error {
	for _, f := range gen.Files {
		if !f.Generate {
			continue
		}
		generateFile(gen, f)
	}
	return nil
})
protogen is a really cool library published by the Go team to help us easily build protoc plugins that generate Go code. According to the documentation, protogen.Options.Run:
.. executes a function as a protoc plugin. It reads a CodeGeneratorRequest message from os.Stdin, invokes the plugin function, and writes a CodeGeneratorResponse message to os.Stdout. If a failure occurs while reading or writing, Run prints an error to os.Stderr and calls os.Exit(1).
In other words, to write a protoc plugin all we need to do is implement a function that receives a protogen.Plugin and returns an error if something fails. On this plugin object we can find a Files field, which according to the docs:
Files is the set of files to generate and everything they import. Files appear in topological order, so each file appears before any file that imports it.
Next, in our code, we iterate over Files and generate an output file for each input file that has Generate set to true. Here’s what we do with each file:
// generateFile generates a _ascii.pb.go file for the given source file.
func generateFile(gen *protogen.Plugin, file *protogen.File) {
	filename := file.GeneratedFilenamePrefix + "_ascii.pb.go"
	g := gen.NewGeneratedFile(filename, file.GoImportPath)
	g.P("// Code generated by protoc-gen-go-ascii. DO NOT EDIT.")
	g.P()
	g.P("package ", file.GoPackageName)
	g.P()
}
The Plugin has a method named NewGeneratedFile, which creates a super-useful GeneratedFile. Through this object we generate our “response” to protoc. For each file we want to generate, we create a new GeneratedFile.
GeneratedFile has a “print-like” method named P(). This method behaves very similarly to fmt.Print, so you can pass it strings and things that implement Stringer, but it also has a very useful feature: it can manage package imports and the qualifying of Go identifiers for us. You can read all about those in the documentation.
And indeed, when our code runs, it creates a GeneratedFile with a filename of <prefix>_ascii.pb.go that contains the code-gen comment followed by the package declaration.
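If you want to get a feel for how P() shapes the output without wiring up protoc, the following stdlib-only sketch mimics it. The miniFile type and the skeleton function are hypothetical stand-ins written for this illustration; the real GeneratedFile additionally qualifies Go identifiers and tracks imports, which this sketch does not attempt.

```go
package main

import (
	"bytes"
	"fmt"
)

// miniFile is a hypothetical, stdlib-only stand-in for
// protogen.GeneratedFile. It only collects printed lines in a buffer.
type miniFile struct {
	name string
	buf  bytes.Buffer
}

// P writes its arguments back to back, with no separators between them,
// and then ends the line.
func (f *miniFile) P(v ...interface{}) {
	for _, x := range v {
		fmt.Fprint(&f.buf, x)
	}
	fmt.Fprintln(&f.buf)
}

// skeleton mirrors the body of generateFile from the post, using the
// stand-in type instead of the real protogen API.
func skeleton(prefix, pkg string) *miniFile {
	f := &miniFile{name: prefix + "_ascii.pb.go"}
	f.P("// Code generated by protoc-gen-go-ascii. DO NOT EDIT.")
	f.P()
	f.P("package ", pkg)
	f.P()
	return f
}

func main() {
	f := skeleton("example/example", "example")
	fmt.Println("file:", f.name)
	fmt.Print(f.buf.String())
}
```

Calling P() with no arguments emits an empty line, which is why the skeleton ends up with a blank line after the header comment and another after the package clause.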
Our ASCII-art Generator
To complete our ASCII-art generator, we will use the go-figure package to render the art. Notice that this happens at code-generation time, so the generated code does not need to know anything about this package. We will modify generateFile:
// generateFile generates a _ascii.pb.go file with an Ascii() method per message.
// It needs the import "github.com/common-nighthawk/go-figure" in main.go.
func generateFile(gen *protogen.Plugin, file *protogen.File) *protogen.GeneratedFile {
	filename := file.GeneratedFilenamePrefix + "_ascii.pb.go"
	g := gen.NewGeneratedFile(filename, file.GoImportPath)
	g.P("// Code generated by protoc-gen-go-ascii. DO NOT EDIT.")
	g.P()
	g.P("package ", file.GoPackageName)
	g.P()
	for _, msg := range file.Messages {
		// Render the message name at generation time and embed it as a raw string.
		fig := figure.NewFigure(msg.GoIdent.GoName, "doom", false)
		g.P("func (x *", msg.GoIdent, ") Ascii() string {")
		g.P("return `", fig.String(), "`")
		g.P("}")
	}
	return g
}
For each message in each file passed to us by protoc, we use the P() method to print a method named Ascii() that returns the ASCII-art representation of the message name (in a font with the pleasant name of doom). Re-run our one-liner from above, and observe the changes to example_ascii.pb.go:
// Code generated by protoc-gen-go-ascii. DO NOT EDIT.
package example
func (x *Hello) Ascii() string {
	return ` _   _        _  _
| | | |      | || |
| |_| |  ___ | || |  ___
|  _  | / _ \| || | / _ \
| | | ||  __/| || || (_) |
\_| |_/ \___||_||_| \___/
`
}
Pretty cool! We have just generated an extra method on our Hello struct! Next, let’s see how to pass options to our plugin and parse them using protoc’s native capabilities.
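One caveat with embedding the art in a raw string: a Go raw string literal cannot contain a backtick, so art that happens to include one would produce code that does not compile. The sketch below shows one possible guard. The renderAsciiMethod helper is hypothetical and is not part of the plugin described in this post; it simply falls back to a quoted literal when needed.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// renderAsciiMethod is a hypothetical helper that returns the source of an
// Ascii() method for typeName. It uses a raw string literal when possible,
// and an interpreted (quoted) literal when the art contains a backtick.
func renderAsciiMethod(typeName, art string) string {
	lit := "`" + art + "`"
	if strings.Contains(art, "`") {
		lit = strconv.Quote(art)
	}
	return "func (x *" + typeName + ") Ascii() string {\n\treturn " + lit + "\n}\n"
}

func main() {
	fmt.Print(renderAsciiMethod("Hello", "plain art"))
	fmt.Print(renderAsciiMethod("Hello", "art with a ` in it"))
}
```

The raw string form keeps the generated file readable, so the quoted form is only used as a fallback.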
Options
Users can pass each protoc plugin custom options using the --<plugin>_opt flag. For our dummy plugin, it would be useful to choose which font we want go-figure to use (remember, there are over 140 of them!). So we’re aiming for something like --go-ascii_opt=font=coolfont. Luckily, this is straightforward to do with protogen. All we need to do is use Go’s native flag package and hook it into our protogen program using an Options field named ParamFunc, which must be a function with the signature func(name, value string) error. The cool thing is that, as the docs state:
The (flag.FlagSet).Set method matches this function signature, so parameters can be converted into flags as in the following:
	var flags flag.FlagSet
	value := flags.Bool("param", false, "")
	opts := &protogen.Options{
		ParamFunc: flags.Set,
	}
	protogen.Run(opts, func(p *protogen.Plugin) error {
		if *value { ... }
	})
That’s really neat! What can I say, the Go team sure can write elegant Go libraries. To apply this to our plugin, we modify our main function:
var font *string
func main() {
	var flags flag.FlagSet
	font = flags.String("font", "doom", "font list available in github.com/common-nighthawk/go-figure")
	protogen.Options{
		ParamFunc: flags.Set,
	}.Run(func(gen *protogen.Plugin) error {
		for _, f := range gen.Files {
			if !f.Generate {
				continue
			}
			generateFile(gen, f)
		}
		return nil
	})
}
We define a flag named font and hook it into our protogen.Options using flags.Set.
We then update our generateFile function to use this variable:
	fig := figure.NewFigure(msg.GoIdent.GoName, *font, false)
If we modify our --go-ascii_opt flag to be --go-ascii_opt=paths=source_relative,font=doh and re-run, we will see that our generated code now looks like:
// Code generated by protoc-gen-go-ascii. DO NOT EDIT.
package example
func (x *Hello) Ascii() string {
return `
HHHHHHHHH HHHHHHHHH lllllll lllllll
H:::::::H H:::::::H l:::::l l:::::l
H:::::::H H:::::::H l:::::l l:::::l
HH::::::H H::::::HH l:::::l l:::::l
H:::::H H:::::H eeeeeeeeeeee l::::l l::::l ooooooooooo
H:::::H H:::::H ee::::::::::::ee l::::l l::::l oo:::::::::::oo
H::::::HHHHH::::::H e::::::eeeee:::::ee l::::l l::::l o:::::::::::::::o
H:::::::::::::::::H e::::::e e:::::e l::::l l::::l o:::::ooooo:::::o
H:::::::::::::::::H e:::::::eeeee::::::e l::::l l::::l o::::o o::::o
H::::::HHHHH::::::H e:::::::::::::::::e l::::l l::::l o::::o o::::o
H:::::H H:::::H e::::::eeeeeeeeeee l::::l l::::l o::::o o::::o
H:::::H H:::::H e:::::::e l::::l l::::l o::::o o::::o
HH::::::H H::::::HHe::::::::e l::::::ll::::::lo:::::ooooo:::::o
H:::::::H H:::::::H e::::::::eeeeeeee l::::::ll::::::lo:::::::::::::::o
H:::::::H H:::::::H ee:::::::::::::e l::::::ll::::::l oo:::::::::::oo
HHHHHHHHH HHHHHHHHH eeeeeeeeeeeeee llllllllllllllll ooooooooooo
`
}
Wow! With just a few lines of code, we can now receive custom options from our users to modify the code-generation behavior of our plugin.
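To illustrate how a value such as paths=source_relative,font=doh can end up in a flag.FlagSet, here is a stdlib-only sketch of the dispatch. The applyParams and parseFont helpers are hypothetical and simplified; in particular, the sketch assumes that built-in options such as paths are consumed by protogen itself and never reach ParamFunc.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// applyParams is a hypothetical, simplified parameter loop: it splits the
// comma-separated option string and hands each name/value pair to paramFunc.
func applyParams(parameter string, paramFunc func(name, value string) error) error {
	for _, param := range strings.Split(parameter, ",") {
		if param == "" {
			continue
		}
		name, value := param, ""
		if i := strings.Index(param, "="); i >= 0 {
			name, value = param[:i], param[i+1:]
		}
		if name == "paths" {
			// Assumption: handled by the framework, not by the plugin's flags.
			continue
		}
		if err := paramFunc(name, value); err != nil {
			return err
		}
	}
	return nil
}

// parseFont wires a FlagSet with a single "font" flag into applyParams,
// the same way flags.Set is passed as ParamFunc in the post.
func parseFont(parameter string) (string, error) {
	var flags flag.FlagSet
	font := flags.String("font", "doom", "go-figure font name")
	if err := applyParams(parameter, flags.Set); err != nil {
		return "", err
	}
	return *font, nil
}

func main() {
	font, err := parseFont("paths=source_relative,font=doh")
	fmt.Println(font, err)
}
```

A nice side effect of using flags.Set is that an option the plugin does not define comes back as an error instead of being silently ignored.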
Conclusion
Code generation is an extremely useful practice, especially when working with statically-typed languages. Code-generation automates and standardizes the creation of boilerplate code without sacrificing compile time type-safety and the wonderful support modern IDEs give us in the form of code suggestion and auto-completion.
In organizations that embrace IDLs, the schema descriptions tend to become a central place to describe many aspects of the systems being developed. Being able to automatically generate code from these descriptions can be immensely useful. To name a few examples, validation functions, technical documentation, and database DDL statements can all be generated from schema definitions. I hope this post demonstrated that, with the help of protogen, writing our own protoc plugins is easy and approachable.