With Protocol Buffers, is it safe to move enum from inside message to outside message? - enums

I've run into a use case where I'd like to move an enum declared inside a protocol buffer message to outside the message so that other messages van use the same Enum.
ie, I'm wondering if there are any issues moving from this
message Message {
enum Enum {
VALUE1 = 1;
VALUE2 = 2;
}
optional Enum enum_value = 1;
}
to this
enum Enum {
VALUE1 = 1;
VALUE2 = 2;
}
message Message {
optional Enum enum_value = 1;
}
Would this cause any issues de-serializing data created with the first protocol buffer definition into the second?

It doesn't change the serialization data at all - the location / name of the enums are irrelevant for the actual data, since it just stores the integer value.
What might change is how some languages consume the enum, i.e. how they qualify it. Is it X.Y.Foo, X.Foo, or just Foo. Note that since enums follow C++ naming/scoping rules, some things (such as conflicts) aren't an issue: but it may impact some languages as consumers.
So: if you're the only consumer of the .proto, you're absolutely fine here. If you have shared the .proto with other people, it may be problematic to change it unless they are happy to update their code to match any new qualification requirements.

Related

Can I update a message field from enum to string, and keep its name?

The Code
Consider the following protobuf message declaration:
syntax = "proto3";
enum Airport {
TLV = 0;
JFK = 1;
...
}
message Flight {
...
Airport origin_airport = 11;
...
}
The Problem
Due to some business requirements, we have to set the Airport to be a free string rather than choosing from a closed enumerated list. I know that I can add and remove fields at will, as long as I don't reuse the same number. However, I'm not sure if I can use the same name, a-la:
message Flight {
...
reserved 11; // The old enum field number
string origin_airport = 18; // Unused so far
...
}
My Question
Upon updaing a protobuf3 field type, can the field name be preserved, as long as its number changes?
If you aren't using the JSON variant, then names aren't used at all in the payload, so yes technically it is perfectly legal to reuse names; however: this might lead to unnecessary problems with existing code - depending on existing code and language / framework specific rules, and could cause confusion. Since that is avoidable, I would advocate using a name like origin_airport_code, or similar.
(The point I'm making here: any code that used the old field probably needs attention; I can see some scenarios where the existing code might still compile after the change, but mean something different, and therefore introduce a bug that would have been avoided if you'd changed the name and forced every usage to be visited)

Best way to model gRPC messages

I would like to model messages for bidirectional streaming. In both directions I can expect different types of messages and I am unsure as to what the better practice would be. The two ideas as of now:
message MyMessage {
MessageType type = 1;
string payload = 2;
}
In this case I would have an enum that defines which type of message that is and a JSON payload that will be serialized and deserialized into models both client and sever side. The second approach is:
message MyMessage {
oneof type {
A typeA = 1;
B typeB = 2;
C typeC = 3;
}
}
In the second example a oneof is defined such that only one of the message types can be set. Both sides a switch must be made on each of the cases (A, B, C or None).
If you know all of the possible types ahead of time, using oneof would be the way to go here as you have described.
The major reason for using protocol buffers is the schema definition. With the schema definition, you get types, code generation, safe schema evolution, efficient encoding, etc. With the oneof approach, you will get these benefits for your nested payload.
I would not recommend using string payload since using a string for the actual payload removes many of the benefits of using protocol buffers. Also, even though you don't need a switch statement to deserialize the payload, you'll likely need a switch statement at some point to make use of the data (unless you're just forwarding the data on to some other system).
Alternative option
If you don't know the possible types ahead of time, you can use the Any type to embed an arbitrary protocol buffer message into your message.
import "google/protobuf/any.proto";
message MyMessage {
google.protobuf.Any payload = 1;
}
The Any message is basically your string payload approach, but using bytes instead of string.
message Any {
string type_url = 1;
bytes value = 2;
}
Using Any has the following advantages over string payload:
Encourages the use of protocol buffer to define the dynamic payload contents
Tooling in the protocol buffer library for each language for packing and unpacking protocol buffer messages into the Any type
For more information on Any, see:
https://developers.google.com/protocol-buffers/docs/proto3#any
https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/any.proto

Protobuf message / enum type rename and wire compatibility?

Is it (practically) possible to change the type name of a protobuf message type (or enum) without breaking communications?
Obviously the using code would need to be adpated to re-compile. The question is if old clients that use the same structure, but the old names, would continue to work?
Example, base on the real file:
test.proto:
syntax = "proto3";
package test;
// ...
message TestMsgA {
message TestMsgB { // should be called TestMsgZZZ going forward
// ...
enum TestMsgBEnum { // should be called TestMsgZZZEnum going forward
// ...
}
TestMsgBEnum foo = 1;
// ...
}
repeated TestMsgB bar = 1;
// ...
}
Does the on-the-wire format of the protobuf payload change in any way if type or enum names are changed?
If you're talking about the binary format, then no: names don't matter and will not impact your ability to load data; For enums, only the integer value is stored in the payload. For fields, only the field-number is stored.
Obviously if you swap two names, confusion could happen, but: it should load as long as the structure matches.
If you're talking about the JSON format, then it may matter.

Stopping omission of default values in Protocol Buffers

I have a proto schema defined as below,
message User {
int64 id = 1;
bool email_subscribed = 2;
bool sms_subscribed = 3;
}
Now as per official proto3 documentation, default values are not serialized to save space during wire transmission. But in my case I want to receive whether the client has explicitly set true/false for fields email_subscribed/sms_subscribed (because the values were true before but now the user wants to unsubscribe). Hence, when the client sends false for any of these fields, the generator code serializer just omits these fields.
How do I achieve this and avoid the omission of these fields for the above scenario?
PS: I am using Javascript as my GRPC client and Python and GRPC Server.
Update: this has changed recently with the re-introduction of presence tracking info proto3 via a new meaning of the optional keyword:
message User {
optional int64 id = 1;
optional bool email_subscribed = 2;
optional bool sms_subscribed = 3;
}
With this change (now available in protoc etc), explicit assignment is transmitted even if it is the implicit default value.
You cannot under proto3. Your best bet is probably to define a tri-bool enum with not-specified as the first item with value zero, and some true / false values after that.
This will require the same space as a protobuf bool, but will not be binary compatible - so you cannot simply change the declared member type on existing messages. Well, I guess if you make true === 1, then at least that still works - and for the transition you'd have to anticipate false / not specified being ambiguous until you've flushed any old data.
The other option is to add a bool fooSpecified member for every bool foo, but that takes more space and is error-prone due to being manual.
Another option will be to use wrappers with proto3. They basically wrap your value in a message so on the parent message it can be left null.
This way you can differentiate null / false / true on your bool field with a some extra work.

Protobuf: Nesting a message of arbitrary type

in short, is there a way to define a protobuf Message that contains another Message of arbitrary type? Something like:
message OuterMsg {
required int32 type = 1;
required Message nestedMsg = 2; //Any sort of message can go here
}
I suspect that there's a way to do this because in the various protobuf-implementations, compiled messages extend from a common Message base class.
Otherwise I guess I have to create a common base Message for all sorts of messages like this:
message BaseNestedMessage {
extensions 1 to max;
}
and then do
message OuterMessage {
required int32 type = 1;
required BaseNestedMessage nestedMsg = 2;
}
Is this the only way to achieve this?
The most popular way to do is to make optional fields for each message type:
message UnionMessage
{
optional MsgType1 msg1 = 1;
optional MsgType2 msg2 = 2;
optional MsgType3 msg3 = 3;
}
This technique is also described in the official Google documentation, and is well-supported across implementations:
https://developers.google.com/protocol-buffers/docs/techniques#union
Not directly, basically; protocol buffers very much wants to know the structure in advance, and the type of the message is not included on the wire. The common Message base-class is an implementation detail for providing common plumbing code - the protocol buffers specification does not include inheritance.
There are, therefore, limited options:
use different field-numbers per message-type
serialize the message separately, and include it as a bytes type, and convey the "what is this?" information separately (presumably a discriminator / enumeration)
I should also note that some implementations may provide more support for this; protobuf-net (C# / .NET) supports (separately) both inheritance and dynamic message-types (i.e. what you have above), but that is primarily intended for use only from that one library to that one library. Because this is all in addition to the specification (remaining 100% valid in terms of the wire format), it may be unnecessarily messy to interpret such data from other implementations.
Alternatively to multiple optional fields, the oneof keyword can be used since v2.6 of Protocol Buffers.
message UnionMessage {
oneof data {
string a = 1;
bytes b = 2;
int32 c = 3;
}
}

Resources