Can proto2 talk to proto3? - protocol-buffers

I have two applications that talk to each other via GPB messages. Both were using proto3, but it turns out one will have to use proto2. If the messages are the same, can one program be compiled with proto2 while the other uses proto3? Or do they need to be compiled with the same version of proto?

The wire format is very similar, so it will work to some extent. However, there are some caveats:
The distinction between required and optional fields does not exist in proto3. You should make all the fields optional on the proto2 side to avoid errors about missing required fields.
When proto3 encodes a message, any field holding its zero value is omitted from the wire, so it will appear unset when decoded on the proto2 side. If you specify zero as the default value on the proto2 side, it should work out OK.
Extensions and the Any type will be quite difficult to use in a way that is compatible with both.
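For illustration, here is a minimal Python sketch of that round trip. The message, file and module names are hypothetical; they assume the two definitions shown in the comment, each compiled with protoc for its own side:

    # Assumed schemas (hypothetical names), one per side:
    #
    #   proto3 side:  syntax = "proto3";
    #                 message Ping { int32 id = 1; string note = 2; }
    #
    #   proto2 side:  syntax = "proto2";
    #                 message Ping { optional int32 id = 1 [default = 0];
    #                                optional string note = 2; }
    import ping_v2_pb2
    import ping_v3_pb2

    # proto3 writes nothing at all for id == 0, so the payload only carries `note`.
    wire = ping_v3_pb2.Ping(id=0, note="hello").SerializeToString()

    msg = ping_v2_pb2.Ping.FromString(wire)   # parses fine: no required fields to miss
    print(msg.note)                           # "hello"
    print(msg.id)                             # 0 -- supplied by the proto2 default
    print(msg.HasField("id"))                 # False -- the field never hit the wire

Had id been declared required on the proto2 side, that same parse would fail with a missing-required-field error, which is exactly the first caveat above.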

Related

Can the google protobuf library be used to serialize an unknown binary protobuf into a ruby object, then back again?

I understand that the binary protobuf format has no context (without the proto def), but all the same, I have a requirement to be able to deserialize it into a ruby object that can be enumerated and changed, and then reserialize the object back into binary protobuf format.
The google protobuf docs for ruby are really light in comparison to the other supported languages, so it isn't clear if this is possible, or how to go about it.
If the google protobuf library isn't the best choice, is there a better one (that supports all the protobuf versions)?
No, the protocol buffer wire format is not self-describing. Without a schema file there's only a limited amount of information which can be extracted.
There is protoscope, a sort of decompiler. That will show you how much information you can get from the wire format, though it is not exact.
After a bit of experimentation, I found that I couldn't get what I wanted from the Google library. However, the wire protocol is documented here, and there are only six wire types used.
In the end I created a simple class that can deserialize all the TLVs and, for the LEN type, recognise strings (and permute them), recurse into embedded messages, and skip everything else.
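For reference, here is the same idea sketched in Python rather than Ruby, using only the standard library (the function names are mine). It walks the tag/length/value records defined by the wire format and yields (field number, wire type, raw value) tuples; a LEN value can be fed back into the same function if it turns out to be an embedded message:

    def read_varint(buf: bytes, pos: int):
        """Decode one base-128 varint starting at pos; return (value, new_pos)."""
        result = shift = 0
        while True:
            b = buf[pos]
            pos += 1
            result |= (b & 0x7F) << shift
            if not (b & 0x80):
                return result, pos
            shift += 7

    def decode_fields(buf: bytes):
        """Yield (field_number, wire_type, raw_value) for each top-level record."""
        pos = 0
        while pos < len(buf):
            tag, pos = read_varint(buf, pos)
            field_no, wire_type = tag >> 3, tag & 0x07
            if wire_type == 0:                        # VARINT
                value, pos = read_varint(buf, pos)
            elif wire_type == 1:                      # I64 (fixed 8 bytes)
                value, pos = buf[pos:pos + 8], pos + 8
            elif wire_type == 2:                      # LEN: string, bytes, nested
                length, pos = read_varint(buf, pos)   # message, or packed repeated
                value, pos = buf[pos:pos + length], pos + length
            elif wire_type == 5:                      # I32 (fixed 4 bytes)
                value, pos = buf[pos:pos + 4], pos + 4
            else:                                     # 3/4 are the old group markers
                raise ValueError(f"unsupported wire type {wire_type}")
            yield field_no, wire_type, value

    # Example: the bytes for { field 1 (varint) = 150, field 2 (string) = "hi" }
    print(list(decode_fields(bytes.fromhex("08960112026869"))))
    # -> [(1, 0, 150), (2, 2, b'hi')]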

Is it good practice in protobuf3 to use optional to check nullability?

I noticed that they brought optional back in protobuf 3.15. I'm trying to use optional to check field presence, but I'm still unclear about the philosophy behind it.
Here is my use case:
I'm providing some services that accept protobuf as input. But the client side is untrusted from my perspective, so I have to check nullability on the input protobuf.
What I expect is that:
for a required field, either it's set or it's null;
for an optional field, I don't care: I can just use a default value, and that won't cause any problem in my system.
So I end up adding optional to every field that should not be null so that I can use hasXXX to check presence. This looks weird to me because those fields are actually required from my perspective, but I have to add the optional keyword to them all... I'm not sure whether this is good practice. Proto experts, please give me some suggestions.
Also, the default value doesn't make sense to me at all for nullability checking, since zero or an empty string usually has its own meaning in many scenarios.
The entire point of optional in proto3 is to be able to distinguish between, for example:
no value was specified for field Foo
the field Foo was explicitly assigned the value that happens to be the proto3 default (zero/empty-string/false/etc)
In proto3 without optional, both of the above look identical (which is to say: the field is omitted from the wire).
If you don't need to distinguish between those two scenarios, you don't need optional; but using optional also isn't likely to hurt much. Worst case, a few extra zero/empty-string/false values get written to the wire, and they're small anyway.
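A short sketch of how that looks from Python. The message Foo, its fields and the module foo_pb2 are hypothetical, assumed to be generated by protoc from the proto3 definition shown in the comment:

    # Assumed schema (hypothetical), compiled with protoc --python_out:
    #
    #   syntax = "proto3";
    #   message Foo {
    #     optional int32 count = 1;   // explicit presence tracking
    #     int32 total = 2;            // plain proto3 scalar, no presence
    #   }
    import foo_pb2

    msg = foo_pb2.Foo()
    print(msg.HasField("count"))    # False: never assigned
    msg.count = 0
    print(msg.HasField("count"))    # True: explicitly set, even though 0 is the default

    # The plain field has nothing to ask: reading it just gives the default, and
    # HasField("total") raises ValueError because presence isn't tracked for it.
    print(msg.total)                # 0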
Google's API Design guide discourages use of the optional keyword. Better practice is to use the google.api.field_behavior annotation to describe a field's behaviour.
Note, however, that the OPTIONAL behaviour annotation itself is not recommended either: if the field behaviour annotations are applied consistently, OPTIONAL is redundant and can be omitted.
Check out AIP 203 for an overview of the various behaviour types along with guidelines around the usage of OPTIONAL fields.
In general, Google's API Improvement Proposals are a great reference for good practices in your API design.
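The annotation lives in the schema options rather than in generated accessors, but it can be read back at runtime and used for your own validation. A hedged Python sketch, assuming the googleapis-common-protos package is installed and a hypothetical book_pb2 module generated from the definition in the comment:

    # Assumed schema (hypothetical):
    #
    #   import "google/api/field_behavior.proto";
    #   message Book {
    #     string name = 1 [(google.api.field_behavior) = REQUIRED];
    #   }
    from google.api import field_behavior_pb2
    import book_pb2

    # Look up the custom option on the field descriptor; a server could turn this
    # into its own "must be set" check.
    field = book_pb2.Book.DESCRIPTOR.fields_by_name["name"]
    behaviours = field.GetOptions().Extensions[field_behavior_pb2.field_behavior]
    print([field_behavior_pb2.FieldBehavior.Name(b) for b in behaviours])  # ['REQUIRED']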

Decoding Protobuf encoded data using non-supported platform

I am new to Protobufs; I haven't had much exposure to them. One of the API endpoints we require data from uses Protobuf-encoded data. This generally wouldn't be an issue if I were using a 'supported' language such as JavaScript, Java, Python or even R to decode the data...
Unfortunately, I am trying to automate the process using Alteryx. Rather than making this an Alteryx-specific question, I have a few questions about Protobufs themselves so I can understand this situation better. I've read through the implementation of Protobufs in Java and Python, and have a basic understanding of how to use them.
To summarise (please correct me if I am wrong): a Protobuf is a method of serializing structured data where a .proto schema is used to encode / decode data to and from raw binary. My confusion lies with the compiler. Google's documentation and examples for Python / Java show how a Protobuf compiler (library) is required in order to run the encoding and decoding process. Reading the Google website, it advises that Protobufs are 'language neutral and platform neutral', but I can't see how that is possible if you need the compiler (and the .proto file!) to do the decoding. For example, how would anyone using a language for which Google has not created a compiler possibly decode Protobuf-encoded data? Am I missing something?
I figure I'm missing something, since it seems weird that a public API would force this constraint.
"language/platform neutral" here simply means that you can reliably get the same data back from any language/framework/platform. The serialization format is defined independently and does not rely on the nuances of any particular framework.
This might seem a low bar, but you'd be surprised how many serialization formats fail to clear it.
Because the format is specified, anyone can create a tool for some other platform. It is a little fiddly if you're not used to dealing in bits, but: totally doable. The protobuf landscape is not dependent on Google - here's a list of some of the known non-Google tools: https://github.com/protocolbuffers/protobuf/blob/master/docs/third_party.md
Also, note that technically you don't even need a .proto; you just need some mechanism for specifying which fields map to which field numbers (since protobuf doesn't include the names). Quite a few in that list can work either from a .proto, or from the field/number map being specified in some other way. The advantage of .proto is simply that it is easy to convey as the schema - and again: isn't tied to any particular language. You can write plugins for "protoc" to add your own tooling, so you don't need to write your own parser from scratch. Or you can write your own parser from scratch if you prefer.
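As a small illustration of the field/number map being all you need, here is a hand-rolled Python sketch (my own, not an official API) that encodes the two most common wire types from nothing but field numbers and values:

    def write_varint(value: int) -> bytes:
        """Encode a non-negative integer as a base-128 varint."""
        out = bytearray()
        while True:
            byte = value & 0x7F
            value >>= 7
            out.append(byte | (0x80 if value else 0))
            if not value:
                return bytes(out)

    def encode_field(field_no: int, value) -> bytes:
        """Encode one field as tag + payload (VARINT for ints, LEN for str/bytes)."""
        if isinstance(value, int):
            return write_varint((field_no << 3) | 0) + write_varint(value)
        data = value.encode() if isinstance(value, str) else value
        return write_varint((field_no << 3) | 2) + write_varint(len(data)) + data

    payload = encode_field(1, 150) + encode_field(2, "hi")
    print(payload.hex())   # 08960112026869 -- any decoder whose schema maps field 1
                           # to an int and field 2 to a string will read this back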
You can't really speak of a non-supported platform in this case: it is more about languages for which you can't find a protobuf implementation.
My 2 cents: if you can't find a protobuf implementation for your language, pick another language you're familiar with (and that is popular in the protobuf community) and handle the protobuf serialization/deserialization with it. Then call it via a REST API, an executable ... whatever

Is un-deprecating a protobuf field allowed?

We have a protobuf type where we added a field, and then never implemented the features using it (so it was marked [deprecated=true] to hint that it should not be used). Several years later, the time has come, and now we do want to use that field after all.
Is it safe to just remove the [deprecated=true] and start using the field, or is that likely to break anything?
A field with the same type and semantics already exists on another message, so it would be very nice to use the name we gave it initially, rather than adding a new field and bloating the definition with two similar fields.
Edit: The proto3 language guide section on options has this to say:
deprecated (field option): If set to true, indicates that the field is deprecated and should not be used by new code. In most languages this has no actual effect. In Java, this becomes a @Deprecated annotation. In the future, other language-specific code generators may generate deprecation annotations on the field's accessors, which will in turn cause a warning to be emitted when compiling code which attempts to use the field. If the field is not used by anyone and you want to prevent new users from using it, consider replacing the field declaration with a reserved statement.
The only thing your clients will "notice" is the missing deprecation warning in Java that they may have been used to, if they are still using the deprecated field. All fields are optional since proto3, so this should not break anything.

How is protobuf 3 singular different than optional

Looking at the proto3 reference:
https://developers.google.com/protocol-buffers/docs/proto3#simple
It says this about singular:
singular: a well-formed message can have zero or one of this field (but not more than one).
It's not clear to me how this is different than optional. Is singular just an explicit way of stating that something is optional (which is now implicit for proto3)? Or is there something else this does that I'm missing?
Thanks.
Optional is proto2 syntax. Singular is proto3 syntax.
In proto3, singular is the default rule. As of today the documentation needs to be improved, and there's an open issue: google/protobuf#3457.
See also google/protobuf#2497 ("why messge type remove 'required,optional'?"), and haberman's comment on GoogleCloudPlatform/google-cloud-python#1402:
I think the question is: what are you trying to do? Why is it relevant to you whether a field is set or not and what do you intend to do with this information?
In proto3, field presence for scalar fields simply doesn't exist. Your mental model for proto3 should be that it's a C++ or Go struct. For integers and strings, there is no such thing as being set or not, it always has a value. For submessages, it's a pointer to the submessage instance which can be NULL, that's why you can test presence for it.
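That model is easy to see from Python with one of the well-known types that ships with the protobuf package (used here only because it needs no code generation):

    # google.protobuf.Type is a plain proto3 message bundled with the package:
    # `name` is an ordinary string field, `source_context` is a submessage field.
    from google.protobuf import type_pb2

    t = type_pb2.Type()

    # Plain proto3 scalar: there is no notion of "set"; it always reads as a value.
    print(t.name == "")                    # True
    try:
        t.HasField("name")
    except ValueError as err:
        print("no presence for plain scalars:", err)

    # Submessage field: the "pointer that can be NULL" case, so presence is testable.
    print(t.HasField("source_context"))    # False
    t.source_context.file_name = "example.proto"
    print(t.HasField("source_context"))    # True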
