I am currently developing a gRPC service in Go with gRPC-Gateway as an HTTP proxy. I generate .pb.go bindings from my .proto files, but I noticed subtle changes to my bindings in two separate but related situations where I wouldn't expect any. Each binding file has a mysterious variable var fileDescriptorX = []byte{...}, where X is a number. Both unexpected changes happen with this variable and only this variable.
My big question: Are these bindings compatible with each other or are changes to this field considered breaking changes, rendering different versions of the bindings incompatible?
First, if I add another proto file to the same folder and it comes alphabetically before the existing protos, the fileDescriptor variable is renamed when I re-generate my Go bindings. The number X at the end of the name fileDescriptorX corresponds to the file's position relative to the other files in the folder. To be clear: if I have a folder with b.proto and b.pb.go, then add a.proto and re-run my compiler to create a.pb.go, b.pb.go's file descriptor gets bumped from 0 to 1, and a.pb.go gets the new fileDescriptor0.
Second, since I am using gRPC-Gateway, I wanted to change the paths in my HTTP options. Let's say I have an RPC in a.proto:
rpc GetFoo(GetFooRequest) returns (Foo) {
  option (google.api.http) = {
    get: "/v1alpha1/foo"
  };
}
When I change the path above to "/api/foo/v1alpha1/foo", a.pb.gw.go changes understandably, but the bytes in a.pb.go's fileDescriptor0 field change as well.
There doesn't seem to be any documentation discussing how these fields are used and whether changes to them are breaking changes that make different versions of the bindings incompatible. Any help is appreciated. Thanks!
These "file descriptors" are actually binary encodings of everything in your .proto file. The format is defined by descirptor.proto in the Protobuf source code. Any change you make to your .proto files is expected to cause the file descriptors to change.
This isn't documented because this is an internal implementation detail of the generated code. You don't need to worry about what's changing in the generated code. As long as your .proto changes follow the documented backwards-compatibility rules, your protocol will be compatible.
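If you're curious what those bytes actually encode, you can inspect the descriptor at runtime. Here's a minimal sketch in Go using the protoregistry API; the "a.proto" path and the commented-out generated-package import path are placeholders you'd swap for your own project. (This assumes the current google.golang.org/protobuf runtime; older generators stored the same information, compressed, in those fileDescriptorX variables.)

package main

import (
    "fmt"

    "google.golang.org/protobuf/reflect/protoregistry"
    // Importing your generated package registers its file descriptor.
    // The path below is a hypothetical example:
    // _ "example.com/myapp/gen/foopb"
)

func main() {
    // "a.proto" is whatever path the file was compiled under.
    fd, err := protoregistry.GlobalFiles.FindFileByPath("a.proto")
    if err != nil {
        panic(err)
    }
    // The descriptor carries everything from the source file: package,
    // messages, services, and options such as google.api.http. That is
    // why changing an HTTP path changes the descriptor bytes too.
    fmt.Println("package:", fd.Package())
    for i := 0; i < fd.Services().Len(); i++ {
        fmt.Println("service:", fd.Services().Get(i).FullName())
    }
}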
Related
I want to enforce that a .proto file uses only "approved" data types (custom-defined types are valid).
QUESTION
Is there a .proto-file-level option where I can say: use only fixed32, fixed64, and any custom messages?
Also, I would want to enforce that all bytes types use fixed_length = true
I know I can do this by parsing each file using Python but I'd prefer a builtin option.
The only way this could be enforced is through some form of style-enforcement (linting) when protos are checked in to your source control or prior to protoc compilation.
I don't use it myself, but buf lint may help.
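If you do end up rolling your own check instead, here's a rough sketch of the idea in Go (rather than Python), walking already-compiled descriptors and flagging field types outside the allowlist from the question. The "a.proto" path is a placeholder, and the sketch only covers top-level messages, not nested ones:

package main

import (
    "fmt"

    "google.golang.org/protobuf/reflect/protoreflect"
    "google.golang.org/protobuf/reflect/protoregistry"
)

// checkFields flags any field whose type is not fixed32, fixed64,
// or a user-defined message.
func checkFields(fd protoreflect.FileDescriptor) {
    msgs := fd.Messages()
    for i := 0; i < msgs.Len(); i++ {
        fields := msgs.Get(i).Fields()
        for j := 0; j < fields.Len(); j++ {
            f := fields.Get(j)
            switch f.Kind() {
            case protoreflect.Fixed32Kind, protoreflect.Fixed64Kind, protoreflect.MessageKind:
                // allowed
            default:
                fmt.Printf("disallowed type %v on field %s\n", f.Kind(), f.FullName())
            }
        }
    }
}

func main() {
    fd, err := protoregistry.GlobalFiles.FindFileByPath("a.proto")
    if err != nil {
        panic(err)
    }
    checkFields(fd)
}

Note that this runs after protoc compilation; a buf lint rule or a pre-commit hook would catch violations earlier.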
In the docs for FieldMask the paths use the field names (e.g., foo.bar.buzz), which means renaming the message field names can result in a breaking change.
Why doesn't FieldMask use the field numbers to define the path?
Something like 1.3.1?
You may want to consider filing an issue on the GitHub protocolbuffers repo for a definitive answer from the code's authors.
Your proposal seems logical. Using names may be a historical artifact. There's a possibly relevant comment on an issue thread in that repo:
https://github.com/protocolbuffers/protobuf/issues/3793#issuecomment-339734117
"You are right that if you use FieldMasks then you can't safely rename fields. But for that matter, if you use the JSON format or text format then you have the same issue that field names are significant and can't be changed easily. Changing field names really only works if you use the binary format only and avoid FieldMasks."
The answer to your question lies in the fact that FieldMasks are a convention/utility developed on top of the proto3 schema definition language, not a feature of it (and that utility is not present in all of the language bindings).
While you’re right in your observation that it can break easily (as schemas tend to evolve and change), you need to consider this design choice from a user-friendliness point of view:
If you’re building an API and want to allow the user to select the field set present inside the response payload (the common use case for field masks), it is much more convenient to allow that using field paths rather than binary field indices, as the latter would force the user of the gRPC/protocol generated code to be “aware” of the schema. That’s not always the desired case when providing an API as software packages.
While implementing this as a proto schema feature could give users the best of both worlds for binary encoding (specify field paths, have them encoded as binary indices), it would also:
Complicate code generation requirements
Still be an issue for plain text encoding.
So, you can understand why it was left as an “external utility”.
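To make the name coupling concrete, here's a minimal Go sketch (the "foo.bar.buzz" path is just the example from the question). Nothing in the mask refers to field numbers, which is exactly why a rename breaks existing callers:

package main

import (
    "fmt"

    "google.golang.org/protobuf/types/known/fieldmaskpb"
)

func main() {
    // A FieldMask is just a repeated string of field-name paths.
    mask := &fieldmaskpb.FieldMask{Paths: []string{"foo.bar.buzz"}}

    // The wire message carries these literal strings; if the server
    // renames "bar", this mask silently stops matching.
    fmt.Println(mask.GetPaths())
}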
Without any encryption, if the recipient has the serialized Protobuf file but does not have the generated Protobuf class (they don't have access to the .proto file that defines its structure), is it possible for them to get any data out of the binary?
If they have access to part of the .proto file (for example, just one related message in the file), can they get that part of the data out of the file while skipping the other, unknown parts?
yes, absolutely; the protoc tool can help with this (see: --decode_raw), as can https://protogen.marcgravell.com/decode - so it should not be treated as "secure" at all
yes, absolutely - that's a key part built into the protocol that allows messages to be extensible such that they can decode the bits they understand and either ignore or just store (for round-trip or "extension" fields) the bits they don't understand
protobuf is not a security device; to someone with the right tools it is just as readable as xml or json, with the slight issue that it can be uncertain how to interpret some values; but: you can infer and guess and reverse engineer
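for a sense of what --decode_raw is doing, here's a minimal sketch in Go using the protowire package: it walks the wire format with no schema at all and reports each field's number and wire type. the three bytes below encode "field 1 = varint 150", the example from the protobuf encoding docs:

package main

import (
    "fmt"

    "google.golang.org/protobuf/encoding/protowire"
)

// rawDump walks a protobuf payload without any schema, printing the
// field number and wire type of each field it finds.
func rawDump(b []byte) {
    for len(b) > 0 {
        num, typ, n := protowire.ConsumeTag(b)
        if n < 0 {
            fmt.Println("malformed input")
            return
        }
        b = b[n:]
        m := protowire.ConsumeFieldValue(num, typ, b)
        if m < 0 {
            fmt.Println("malformed input")
            return
        }
        fmt.Printf("field %d (wire type %d), %d value byte(s)\n", num, typ, m)
        b = b[m:]
    }
}

func main() {
    rawDump([]byte{0x08, 0x96, 0x01}) // field 1 = varint 150
}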
Ok, I have found this page https://developers.google.com/protocol-buffers/docs/encoding
The encoded message discards all the names and is just a series of field-number/value pairs. The generated class offers some protection by reading that data safely, and it cannot interpret unknown data. (Which makes sense, because the generated class was generated from a known structure, the .proto file.)
But if I were an attacker, I could reference that encoding page and try to figure out which area in the binary corresponds to which data. For example, a varint might be easy to spot after changing some data. I could then proceed to write my own .proto file to read this unknown data, or even a custom binary reader that selectively reads parts of the binary.
Are there any rules for file extensions? For example, I wrote some code which reads and writes a byte pattern that is only understood by that specific program. I'm assuming my antivirus program won't be too happy if I give it the name "pleasetrustme.exe"... Is it generally allowed to use those extensions? And what about the lesser-known ones, like ".arw"?
You can use any file extension you want (or none at all). Using standard extensions that reflect the actual type of the file just makes things more convenient. On Windows, file extensions control stuff like how the files are displayed in Windows Explorer and what happens when you double click on it.
I wrote some code which reads and writes a byte pattern that is only understood by that specific program.
A file extension is only an indication of what type of data will be inside, never a guarantee that certain data formatted in a specific way will be inside the file.
For your own specific data structure it is of course always best to choose an extension that is not already in use for other file formats (or use a general extension like .dat or .bin). This also has the advantage of being able to use your own icon without it being overwritten by other software using the same extension, or the other way around.
But maybe even more important when creating a custom (binary?) file format is to provide a magic number as the first bytes of that file, maybe followed by a file header structure containing a version number etc. That way your own software can first check the header data to make sure it's the right type and version (for example: anyone could rename any file to your extension, so your program needs a way to do some checks inside the file before reading the remaining data).
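Here's a minimal sketch of that idea in Go; the magic bytes "MYF1", the version number, and the file name are all made-up values for a hypothetical format:

package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
    "os"
)

// "MYF1" is an invented magic number for this example.
var magic = []byte("MYF1")

func writeFile(path string, version uint16, payload []byte) error {
    var buf bytes.Buffer
    buf.Write(magic)
    _ = binary.Write(&buf, binary.LittleEndian, version) // never fails on a bytes.Buffer
    buf.Write(payload)
    return os.WriteFile(path, buf.Bytes(), 0o644)
}

func readFile(path string) ([]byte, error) {
    b, err := os.ReadFile(path)
    if err != nil {
        return nil, err
    }
    // Check the magic number before trusting anything else in the file.
    if len(b) < len(magic)+2 || !bytes.Equal(b[:len(magic)], magic) {
        return nil, fmt.Errorf("%s: bad magic number, not our format", path)
    }
    if v := binary.LittleEndian.Uint16(b[len(magic):]); v != 1 {
        return nil, fmt.Errorf("%s: unsupported version %d", path, v)
    }
    return b[len(magic)+2:], nil
}

func main() {
    if err := writeFile("example.dat", 1, []byte("hello")); err != nil {
        panic(err)
    }
    payload, err := readFile("example.dat")
    if err != nil {
        panic(err)
    }
    fmt.Printf("payload: %s\n", payload)
}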
I have a class whose interface is written in testclass.h and whose implementation is written in testclass.m in Xcode. I wish that when I update an entry in testclass.m, its counterpart in testclass.h could be updated automatically.
For example, I have the following method declared in both testclass.h and testclass.m:
-(void)testfunction
And for some reason I modified its name in testclass.m to:
-(void)another_test_function
If I want this code to run, I need to manually change the entry in the header. Although I'm very new to programming, I can imagine it could be really frustrating to modify something in a big program with a lot of different files invoking some modified method name. I wish Xcode could auto-detect this change and modify the entry in the header file to -(void)another_test_function automatically.
Is there any way I can do that? All I know from searching the internet is that you can use a shortcut to "Edit All in Scope", but this only affects the occurrences in the same file, not the header file.
Right-click the method name you would like to change (in either the header or the implementation file) and then select Refactor > Rename. You can then change the name of the method, and Xcode will show you what it will change.
If that looks good, you can accept the changes and you're done.