protocol-buffers: fully qualified import names

While reading https://developers.google.com/protocol-buffers/docs/proto I ran into the following piece of text:
The protocol compiler searches for imported files in a set of
directories specified on the protocol compiler command line using the
-I/--proto_path flag. If no flag was given, it looks in the directory in which the compiler was invoked. In general you should set the
--proto_path flag to the root of your project and use fully qualified names for all imports.
What exactly do they mean that --proto_path should point at the root of the project, and use fully-qualified import names?
For example my project tree has the following structure:
$HOME/
  my_project/
    docs/
    src/
    lib/
    proto/
      p1.proto
      p2.proto
proto is where I keep .proto schemas. So for the above layout, should I call protoc with --proto_path=$HOME/my_project/proto? And what should the import statements look like?

Yes, use --proto_path=$HOME/my_project/proto, and you should try this to prove it to yourself.
There are 2 "paths" that are important, and I don't think I've seen this documented:
1. the proto_path(s), which are the path(s) on the local file system to the root (!) of the import path of the protos
2. the import paths, which extend from the proto_path(s) to specific proto files.
Extending your example. If you wanted to namespace the protos foo.protobuf such that p1.proto and p2.proto included package foo.protobuf;, p1.proto and p2.proto would need to be placed in subdirectory foo/protobuf beneath my_project/proto and the proto_path would be unchanged.
This can be seen with Google's Well-Known Types. These are in protoc/include alongside protoc/bin, and each proto is in the subdirectory google/protobuf because each (e.g. Any) declares package google.protobuf;.
Unfortunately (!) and confusingly, the well-known types are usually not listed in proto_path because they are in a (well-)known location for protoc. They are included implicitly, though, and you can reference them explicitly with --proto_path=/path/to/protoc/include
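The relationship between the two "paths" can be sketched in a few lines of Go. This is only an illustration, not how protoc is implemented: importName is a hypothetical helper, and /home/me stands in for $HOME. It strips the proto_path root from a file's location to give the name other protos would use in their import statements.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// importName returns the name a .proto file would be imported by, given the
// --proto_path root it lives under. Both arguments use forward slashes.
// Hypothetical helper for illustration; it is not part of protoc.
func importName(protoPath, file string) (string, bool) {
	root := path.Clean(protoPath) + "/"
	file = path.Clean(file)
	if !strings.HasPrefix(file, root) {
		return "", false // the file is not under this proto_path
	}
	return strings.TrimPrefix(file, root), true
}

func main() {
	root := "/home/me/my_project/proto" // assumed value of $HOME/my_project/proto

	// Flat layout from the question, then the namespaced layout from the answer.
	for _, f := range []string{
		root + "/p1.proto",
		root + "/foo/protobuf/p1.proto",
	} {
		name, ok := importName(root, f)
		fmt.Println(name, ok)
	}
}
```

In the second case the import name is foo/protobuf/p1.proto while the proto_path is unchanged, which is the point made above.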

Related

VSCode look for Go packages in different directory

I successfully used rules_go to build a gRPC service:
go_proto_library(
    name = "processor_go_proto",
    compilers = ["@io_bazel_rules_go//proto:go_grpc"],
    importpath = "/path/to/proto/package",
    proto = ":processor_proto",
    deps = ["//services/shared/proto/common:common_go_proto"],
)
However, I'm not sure how to import the resulting file in VSCode. The generated file is nested under bazel-bin and under the original proto file path, so to import it, it seems I would need to write out the entire path (including the bazel-bin part) to the generated Go file. To my understanding, there doesn't seem to be a way to instruct VSCode to look under certain folders that only contain Go packages/files; everything seems to need a go.mod file. This makes it quite difficult to develop in.
For clarity, my directory structure looks something like this:
WORKSPACE
bazel-bin
  - path
    - to
      - generated_Go_file.go
go.mod
go.sum
proto
  - path
    - to
      - gRPC_proto.proto
main.go
main.go should use the generated_Go_file.go.
Is there a way around this?
I don't use Bazel and so cannot help with the Bazel configuration. There is likely a way to specify the generated code's location so that you can revise it to reflect your preference.
The layout you outline for the generated code is workable, though, and a common pattern. Often the generated proto|gRPC code is placed in a module's gen subdirectory.
This is somewhat similar to vendoring, where your code incorporates what may often be a 3rd-party's stubs (client|server). The stubs must reflect the proto(s) package(s) and, when these are 3rd-party, using gen or bazel-bin provides a way to keep potentially multiple namespaces discrete.
You're correct that the import in main.go could (!) be prefixed with the module name from go.mod (first line) followed by the folder path to the generated code. This is standard Go packaging and treats the generated code in a similar way to vendored modules.
Another approach is to use|place the generated code in a different module.
For code generated from 3rd-party protos, this may be preferable and the generated code may be provided by the 3rd-party in a module that you can go get or add to your go.mod.
An example of this approach is Google's Well-Known Types. The proto sources are bundled with protoc (include directory) and, when protoc compiles sources that reference any of these, the generated Go code includes imports that reference a Google-hosted location of the generated code (!) for these types (google.golang.org/protobuf/types/known).
Alternatively, you can replicate this behavior without having to use an external repo. The bazel-bin folder must be outside of the current module. Each distinct module in bazel-bin would need its own go.mod file. In your code's go.mod file you would require each of these modules and add a replace directive for each (replace name => path/to/module) to provide a local reference. You don't need to publish the module to an external repo.

Module XXX found, but does not contain package XXX

Not so familiar with Golang, it's probably a stupid mistake I made... But still, I can't for the life of me figure it out.
So, I got a proto3 file (let's call it file.proto), whose header is as follows:
syntax = "proto3";
package [package_name];
option go_package = "github.com/[user]/[repository]";
And I use protoc:
protoc --go_out=$GOPATH/src --go-grpc_out=$GOPATH/src file.proto
So far so good, I end up with two generated files (file.pb.go and file_grpc.pb.go) inside /go/src/github.com/[user]/[repository]/, and they are defined inside the package [package_name].
Then, the code I'm trying to build has the following import:
import (
"github.com/[user]/[repository]/[package_name]"
)
And I naively thought it would work. However, it produces the following error when running go mod tidy:
go: downloading github.com/[user]/[repository] v0.0.0-20211105185458-d7aab96b7629
go: finding module for package github.com/[user]/[repository]/[package_name]
example/xxx imports
github.com/[user]/[repository]/[package_name]: module github.com/[user]/[repository]@latest found (v0.0.0-20211105185458-d7aab96b7629), but does not contain package github.com/[user]/[repository]/[package_name]
Any idea what I'm doing wrong here? Go version is go1.19 linux/amd64 within Docker (golang:1.19-alpine).
Note: I also tried to only import github.com/[user]/[repository], same issue obviously.
UPDATE:
OK so what I do is that I get the proto file from the git repository that only contains the proto file:
wget https://raw.githubusercontent.com/[user]/[repository]/file.proto
Then I generate go files from that file with protoc:
protoc --go_out=. --go-grpc_out=. file.proto
Right now, in current directory, it looks like:
- directory
  | - process.go
  | - file.proto
  | - github.com
      | - [user]
          | - [repository]
              | - file.pb.go
              | - file_grpc.pb.go
In that same directory, I run:
go mod init xxx
go mod tidy
CGO_ENABLED=0 go build process.go
The import directive in process.go is as follows:
import (
"xxx/github.com/[user]/[repository]"
)
Now it looks like it finds it, but still getting a gRPC error, which is weird because nothing changed. I still have to figure out if it comes from the issue above or not. Thanks!
Your question is really a number of questions in one; I'll try to provide some info that will help. The initial issue you had was because
At least one file with the .go extension must be present in a directory for it to be considered a package.
This makes sense because importing github.com/[user]/[repository] would be fairly pointless if that repository does not contain any .go files (i.e. the go compiler could not really do anything with the files).
Your options are:
1. Copy the output from protoc directly into your project folder and change the package declarations to match your package. If you do this there is no need for any imports.
2. Copy (or set go_out argument to protoc) the output from protoc into a subfolder of your project. The import path will then be the value of the module declaration in your go.mod plus the path from the folder that the go.mod is in (this is what you have done).
3. Store the files in a repo (on github or somewhere else). This does not need to be the same repo as your .proto files if you "want it to be agnostic" (note that 2 & 3 can be combined if the generated files will only be used within one code base or the repo is accessible to all users).
Option 1 is simple but it's often beneficial to keep the generated code separate (it makes clear what you should not edit and improves editor autocomplete etc.).
Option 2 is OK (especially if protoc writes the files directly and you set go_package appropriately). However, issues may arise when the generated files will be used in multiple modules (e.g. as part of your customers' code) and your repo is private. They will need to change go_package before running protoc (or search/replace the package declarations), and importing other .proto files may not work well.
Option 3 is probably the best approach in most situations because this works with the go tooling. You can create github.com/[user]/goproto (or similar) and put all of your generated code in there. To use this your customers just need to import github.com/[user]/goproto (no need to run protoc etc).
Go Modules/package intro
The go spec does not detail the format of import paths, leaving it up to the implementation:
The interpretation of the ImportPath is implementation-dependent but it is typically a substring of the full file name of the compiled package and may be relative to a repository of installed packages.
As you are using Go modules (pretty much the default now), the implementation's rules for resolving package paths (a synonym for import paths) can be summarised as:
Each package within a module is a collection of source files in the same directory that are compiled together. A package path is the module path joined with the subdirectory containing the package (relative to the module root). For example, the module "golang.org/x/net" contains a package in the directory "html". That package’s path is "golang.org/x/net/html".
So if your "module path" (generally the top line in a go.mod) is set to xxx (go mod init xxx) then you would import the package in subfolder github.com/[user]/[repository] with import xxx/github.com/[user]/[repository] (as you have found). If you got rid of the intervening folders and put the files into the [repository] subfolder (directly off your main folder) then it would be import xxx/[repository]
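The rule quoted above (package path = module path joined with the package's subdirectory) can be sketched as a one-line helper. packagePath is a hypothetical name used only for illustration; the inputs are the examples already given in this answer.

```go
package main

import (
	"fmt"
	"path"
)

// packagePath joins a module path (the "module" line of go.mod) with the
// directory of a package relative to the module root.
// Hypothetical helper for illustration only.
func packagePath(modulePath, subdir string) string {
	return path.Join(modulePath, subdir)
}

func main() {
	// Example from the Go modules documentation quoted above.
	fmt.Println(packagePath("golang.org/x/net", "html"))
	// Module created with "go mod init xxx", generated code in nested folders.
	fmt.Println(packagePath("xxx", "github.com/[user]/[repository]"))
	// Same module, generated code moved directly under the module root.
	fmt.Println(packagePath("xxx", "[repository]"))
}
```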
You will note in the examples above that the module names I used are paths to a repo (as opposed to the xxx you used in go mod init xxx). This is intentional because it allows the go tooling to find the package when you import it from a different module. For example, if you had used go mod init github.com/[user]/[repository] and option go_package = "github.com/[user]/[repository]/myproto"; then the generated files should go into the myproto folder in your project and you import them with import github.com/[user]/[repository]/myproto.
While you do not have to follow this approach I'd highly recommend it (it will save you from a lot of pain!). It can take a while to understand the go way of doing this, but once you do, it works well and makes it very clear where a package is hosted.

Importing Protofile from a different path

I am trying to import a Proto file into another one from a different folder and haven't been successful in doing so. Here's the scenario:
I have a .Proto in folder ....\abc\protos\ProtoA.proto and another one in folder ....\def\protos\ProtoB.proto.
I need ProtoA to import ProtoB but it's in a different folder and using Import "....\def\protos\ProtoB.proto" doesn't work because it doesn't like "...." in the path.
What steps do I need to follow to import the file correctly from a different path?
It's confusing and I'm unsure whether it's effectively explained in the docs.
Proto file imports are paths relative to an import root (a --proto_path), not locations on disk, and by convention the directory structure beneath that root mirrors the proto package.
The absolute disk location only matters when running protoc, per @Brits' comment, so that the compiler can find the protos.
So, your import of ....\def\protos\ProtoB.proto should be the file's path relative to a proto_path root (written with forward slashes), not its disk location (which is what you're using).
Then, when you run protoc, you should pass --proto_path for each file system location that contains protos to be imported (absolute paths are the least ambiguous, though relative paths are accepted too).
Have a look at Any by way of example.
In a proto, you import "google/protobuf/any.proto" and then refer to the message as google.protobuf.Any: the import names the file, while the reference is the package plus the message name.
When you protoc it, Any is often already in the include path but, if it weren't, you'd need to --proto_path=/path/to/foo if foo is the root directory containing google/protobuf/any.proto; the proto file must be in a directory called protobuf in a directory called google for the import to work.
If you're familiar with Golang and GOPATH, this mirrors how Go packages are named by their directory (not file) name and referenced locally by their location being in the GOPATH; it's now different with Go Modules.

Understanding protobuf import and output relative paths

I am fairly certain this is operator error and I am at the point I am not thinking clearly.
Here is the setup:
$GOPATH/src/github.com/<company>/<service a>/proto/a.proto
$GOPATH/src/github.com/<company>/<service b>/proto/b.proto
etc.
Now in the proto file I am using imports similar to go (perhaps the issue) such that a.proto has:
import "github.com/<company>/<service b>/b.proto"
I have possibly two separate issues.
I cannot get the import to compile properly using go:generate protoc
I cannot get the output a.pb.go file to be placed in the $GOPATH/src/github.com/<company>/<service a>/proto/ path.
I have attempted multiple configurations probably not in the correct combination.
Using option go_package = "github.com/<company>/<service b>/proto" in each .proto file
Multiple variations of go generate;
go:generate protoc --proto_path=.:$GOPATH/src --go_out=$GOPATH/src a.proto
go:generate protoc --proto_path=.:$GOPATH/src --go_out=. a.proto
go:generate protoc --go_out=import_prefix=github.com/<company>/:. api.proto
I clearly have a poor understanding on how protoc looks at import paths and file outputs. Anyone point me in the direction of what I am doing wrong?
Thanks!
Update #1
In a.proto;
option go_package = "github.com/<company>/<service a>/proto";
import "github.com/<company>/<service b>/proto/b.proto";
and the go generate;
//go:generate protoc --proto_path=$GOPATH/src --go_out=$GOPATH/src/github.com/<company>/<service a>/proto a.proto
Which is called from a go file in the proto directory with the a.proto.
I received the error;
a.proto: File does not reside within any path specified using --proto_path (or -I). You must specify a --proto_path which encompasses this file. Note that the proto_path must be an exact prefix of the .proto file names -- protoc is too dumb to figure out when two paths (e.g. absolute and relative) are equivalent (it's harder than you think).
I have confirmed $GOPATH is to the expected location.
Solution
Thanks to Shivam Jindal for pointing me in the correct direction. The import is exactly as described in his solution. The output was a problem of my misusing both --go_out and option go_package. I used the go_package to specify the output location and --go_out to specify the root similar to --proto_path. Now everything works.
option go_package = "github.com/<company>/<service a>/proto";
and
//go:generate protoc --proto_path=$GOPATH/src/ --go_out=$GOPATH/src/ $GOPATH/src/github.com/<company>/<service a>/proto/a.proto
Thanks!
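The interplay of --go_out and go_package in the solution above can be sketched like this. It assumes protoc-gen-go's default paths=import output mode, where the file is written to the --go_out root plus the go_package import path; generatedFile is a hypothetical helper, and acme and service-a are made-up stand-ins for <company> and <service a>.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// generatedFile sketches where the .pb.go file ends up in the default
// paths=import mode: <go_out>/<go_package import path>/<name>.pb.go.
// Hypothetical helper; a model of the behaviour, not the plugin's code.
func generatedFile(goOut, goPackage, protoFile string) string {
	// go_package may carry an explicit package name after ';'; drop it.
	importPath := strings.SplitN(goPackage, ";", 2)[0]
	base := strings.TrimSuffix(path.Base(protoFile), ".proto")
	return path.Join(goOut, importPath, base+".pb.go")
}

func main() {
	pkg := "github.com/acme/service-a/proto" // assumed go_package value
	src := "github.com/acme/service-a/proto/a.proto"

	// --go_out used as the root (the working setup).
	fmt.Println(generatedFile("/go/src", pkg, src))
	// --go_out pointing at the final directory: the import path is appended again.
	fmt.Println(generatedFile("/go/src/github.com/acme/service-a/proto", pkg, src))
}
```

The second call shows why pointing --go_out at the final directory while also setting go_package produces a doubly nested output path.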
Firstly, option go_package is not meant for importing other dependencies at all; it's the Go package where the new proto bindings for Go (the a.pb.go file) will be placed.
Now, coming to the output file location, I can see you are using go-generate. If the path given to --go_out= is relative, the result depends on the directory from which you invoke it, so I would use absolute paths. If you want to put the output file in the location you mentioned, use --go_out=$GOPATH/src/github.com/<company>/<service a>/proto/ in go-generate.
To correctly import the other file b.proto in your a.proto, use the fully qualified import path as you have done. Just use --proto_path=$GOPATH/src in go-generate. Also, please update the question with the errors you are seeing in case it does not work.
Please see this for more information on import paths.

What is the purpose of the package declaration?

Every Go file starts with package <something>.
As far as I understand - and this is probably where I am missing some information - there are only two possible values for <something>: The name of the directory it is in*, or main. If it is main, all other files in that directory can only have main, too. If it is something else, the project is inconsistent/violating convention.
Now if it is the name of the directory, it's redundant, because the same information is, well, in the name of the directory.
If it is main, it's kind of useless, because as far as I can see there is no way to tell go build to "please build all main packages".
* Because, in other words, one directory is one package.
The name of the package does not have to coincide with the directory name. It is possible to have package foobar in the directory xyz/go-foobar. In this case, xyz/go-foobar becomes an import path, but the package name that you use to qualify the identifiers (functions, types etc.) would be foobar.
Here's an example to make it more concrete: I created a test package http://godoc.org/github.com/dmitris/go-foobar (source in https://github.com/dmitris/go-foobar) - you can see from the documentation page, that the import path is "github.com/dmitris/go-foobar" but the package name is foobar, so you would call the function it provides as foobar.Demo() (not go-foobar.Demo()).
A similar real-life example - the import path for the NSQ Messaging platform is "github.com/nsqio/go-nsq" while the package name is "nsq": http://godoc.org/github.com/nsqio/go-nsq. However, for the sake of user-friendliness and simplicity, the standard and recommended practice is to keep the last portions of the import path and the package name being the same whenever possible.
package main is not useless - it tells the Go compiler to create an executable as opposed to a .a library file (with go install or go get; go build discards the compilation result). The executable is named after the directory in which the package main file or files are placed. Again a concrete example - I made a test program https://github.com/dmitris/go-foobar-client, you install it with go get github.com/dmitris/go-foobar-client and you should get a go-foobar-client executable placed in your $GOPATH/bin directory. It is from the name of the directory where the package main file is placed that the Go compiler takes the name of the executable. The filename of the .go file that contains the main() function is not important - in the example above, we can rename main.go to client.go or something else, but as long as the enclosing directory is called go-foobar-client, that's how the resulting executable will be named.
For an additional accessible and practically oriented reading about Go packages, I recommend Dave Cheney's article "Five suggestions for setting up a Go project" http://dave.cheney.net/2014/12/01/five-suggestions-for-setting-up-a-go-project.
The missing information you "have" is that the package name does not need to be the same as the directory name.
It is perfectly fine to use a package name other than the folder name. If you do so, you still have to import the package based on the directory structure, but after the import you have to refer to it by the name you used in the package clause.
For example, if you have a folder $GOPATH/src/mypck, and in it you have a file a.go:
package apple
const Pi = 3.14
Using this package:
package main
import (
    "mypck"
    "fmt"
)
func main() {
    fmt.Println(apple.Pi)
}
Just as relative imports are allowed but not advisable, you may use a package name other than the name of the containing folder, but this is also not advisable, to avoid misunderstanding.
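To see that the package name comes from the package clause and not from the folder, here is a minimal sketch using the standard go/parser package. It parses the a.go source from the example above under the file name mypck/a.go; packageName is a hypothetical helper introduced only for this illustration.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// packageName parses only the package clause of src and returns the name
// declared there. The filename is used for error positions only.
func packageName(filename, src string) (string, error) {
	f, err := parser.ParseFile(token.NewFileSet(), filename, src, parser.PackageClauseOnly)
	if err != nil {
		return "", err
	}
	return f.Name.Name, nil
}

func main() {
	src := "package apple\n\nconst Pi = 3.14\n"
	name, err := packageName("mypck/a.go", src)
	if err != nil {
		panic(err)
	}
	// The folder is mypck, but the declared package name is what gets printed.
	fmt.Println(name)
}
```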
Note that the specification doesn't even require all files belonging to the same package to be in the same folder (but it may be an implementation requirement). Spec: Package clause:
A set of files sharing the same PackageName form the implementation of a package. An implementation may require that all source files for a package inhabit the same directory.
What's the use of this?
Simple. A package name is a Go identifier:
identifier = letter { letter | unicode_digit } .
Which allows unicode letters to be used in identifiers, e.g. αβ is a valid identifier in Go. Folder and file names are not handled by Go but by the Operating System, and different file systems have different restrictions. There are actually many file systems which would not allow all valid Go identifiers as folder names, so you would not be able to name your packages what otherwise the language spec would allow.
So on the one hand, not all valid Go identifiers may be valid folder names. On the other hand, not all valid folder names are valid Go identifiers: for example, go-math is a valid folder name in most (all?) file systems, but it's not a valid Go identifier (as identifiers cannot contain the dash - character).
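The identifier rule can be checked with the standard go/token package. A small sketch: validPackageName is a hypothetical helper that combines token.IsIdentifier with the spec's extra rule that a package name must not be the blank identifier.

```go
package main

import (
	"fmt"
	"go/token"
)

// validPackageName reports whether name could appear in a package clause:
// it must be a Go identifier (not a keyword) and must not be the blank
// identifier. Hypothetical helper for illustration.
func validPackageName(name string) bool {
	return name != "_" && token.IsIdentifier(name)
}

func main() {
	for _, name := range []string{"apple", "αβ", "go-math", "func"} {
		fmt.Printf("%s: %v\n", name, validPackageName(name))
	}
}
```

Here apple and αβ are accepted, while go-math (contains a dash) and func (a keyword) are rejected.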
Having the option to use package names different than their containing folders, you have the option to really name your packages what the language spec allows, regardless of the underlying operating and file system, and put it in a folder named anything that the underlying OS and file system allows - regardless of the package name.
