Protobuf: Nesting a message of arbitrary type

In short, is there a way to define a protobuf Message that contains another Message of arbitrary type? Something like:
message OuterMsg {
required int32 type = 1;
required Message nestedMsg = 2; // Any sort of message can go here
}
I suspect that there's a way to do this, because in the various protobuf implementations, compiled messages extend from a common Message base class.
Otherwise I guess I have to create a common base Message for all sorts of messages like this:
message BaseNestedMessage {
extensions 1 to max;
}
and then do
message OuterMessage {
required int32 type = 1;
required BaseNestedMessage nestedMsg = 2;
}
Is this the only way to achieve this?

The most popular way to do this is to declare an optional field for each message type:
message UnionMessage
{
optional MsgType1 msg1 = 1;
optional MsgType2 msg2 = 2;
optional MsgType3 msg3 = 3;
}
This technique is also described in the official Google documentation, and is well-supported across implementations:
https://developers.google.com/protocol-buffers/docs/techniques#union
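For illustration, here is roughly how the receiving side consumes such a union in C++; a minimal sketch, assuming the generated classes for the UnionMessage above (the header name and the HandleUnion helper are hypothetical):
#include "union_message.pb.h"  // hypothetical generated header

void HandleUnion(const UnionMessage& msg) {
  // Exactly one of the optional fields is expected to be set;
  // probe each generated "has" accessor in turn to find out which.
  if (msg.has_msg1()) {
    // ... handle msg.msg1() as a MsgType1 ...
  } else if (msg.has_msg2()) {
    // ... handle msg.msg2() as a MsgType2 ...
  } else if (msg.has_msg3()) {
    // ... handle msg.msg3() as a MsgType3 ...
  }
}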

Not directly, basically; protocol buffers very much wants to know the structure in advance, and the type of the message is not included on the wire. The common Message base-class is an implementation detail for providing common plumbing code - the protocol buffers specification does not include inheritance.
There are, therefore, limited options:
use different field-numbers per message-type
serialize the message separately, include it as a bytes field, and convey the "what is this?" information separately (presumably a discriminator / enumeration); see the sketch after this list
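Sketching the second option in C++, assuming a hypothetical Envelope schema (required int32 type = 1; required bytes payload = 2;) and an inner type MsgType1; SerializeAsString and ParseFromString are the standard C++ protobuf calls:
#include "envelope.pb.h"   // hypothetical generated header for Envelope
#include "msg_type1.pb.h"  // hypothetical generated header for MsgType1

// Writer: serialize the inner message and tag it with a discriminator.
Envelope Wrap(const MsgType1& inner) {
  Envelope env;
  env.set_type(1);  // application-defined code meaning "this is a MsgType1"
  env.set_payload(inner.SerializeAsString());
  return env;
}

// Reader: check the discriminator before parsing the payload.
bool Unwrap(const Envelope& env, MsgType1* out) {
  if (env.type() != 1) return false;
  return out->ParseFromString(env.payload());
}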
I should also note that some implementations may provide more support for this; protobuf-net (C# / .NET) supports (separately) both inheritance and dynamic message-types (i.e. what you have above), but that is primarily intended for use between endpoints that both use that one library. Because this is all in addition to the specification (while remaining 100% valid in terms of the wire format), it may be unnecessarily messy to interpret such data from other implementations.

As an alternative to multiple optional fields, the oneof keyword can be used since v2.6 of Protocol Buffers.
message UnionMessage {
oneof data {
string a = 1;
bytes b = 2;
int32 c = 3;
}
}
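In the generated C++ for this UnionMessage, setting one member automatically clears the others, and a generated data_case() accessor reports which member is currently set; a sketch (the kA/kC case names follow protoc's generated naming for a oneof called data):
void Demo() {
  UnionMessage m;
  m.set_a("hello");  // oneof case is now kA
  m.set_c(42);       // setting c clears a: at most one member is ever set
  if (m.data_case() == UnionMessage::kC) {
    // Only c is set here; m.a() would return the default "".
  }
}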

Related

grpc and protobuf - How to handle a new field when the other side is not releasing in sync

I've got a situation where the other end of the gRPC communication is not in sync with their releases. My higher-ups would therefore like me to add a field that will work whether or not the other side fills it out, for a short time period (like two weeks).
I believe I can do this by adding it to the end of the proto message so that the indices for the other fields do not change. From what I've Googled, the optional field is not available prior to version 3.15, so I have to use a workaround.
The workaround that was described to me was to use oneof. However, I am not 100% sure what that looks like. All examples show the oneof field by itself. Are the indices that belong to the oneof values independent of the indices that belong to the rest of the message?
message TestMessage {
string somefield = 1;
int32 someotherfield = 2;
oneof mynewoptionalfield
{
string mynewfield = ????   // Does this have to be 3 or is it 1?
int32 ifihadanother = ???? // Does this need to be 4 or 2?
}
}
Questions:
What are the indices I use where the ???? marks are?
Is this the proper workaround to use when the other side isn't going to recompile and deploy with the changes to the proto file?
How do I then check if the field was filled in my C++ code?
Your use-case is exactly what protobufs were designed to handle. All you need to do is add a new field to the message. In the easiest case, the client application code simply doesn't look at the new field until the server roll-out is complete, and so doesn't notice that sometimes it is present and other times missing.
You are correct that you should not change the indices (field ids) of the pre-existing fields, although I'll note that you can add your new field anywhere within the message; the order the fields are written in does not matter for protobuf. (As for the oneof question: field numbers inside a oneof share the enclosing message's number space, so they would need to be 3 and 4, not 1 and 2.)
So you'd just add another field like:
message TestMessage {
string somefield = 1;
int32 someotherfield = 2;
string mynewfield = 3;
}
You don't have to use 3 as the id. You could use 4, or 10, or 10000. But small numbers are more efficient for protobuf (field numbers 1 through 15 fit the field tag into a single byte), and it is typical to just choose the "next" id. On the wire, protobuf uses the id to identify the field, so it is important you don't change the id later.
In protobuf 3, all fields are "optional" in the protobuf 2 sense; there are no "required" fields. However, protobuf 2 also provided "field presence" for all fields, whereas protobuf 3 only provided field presence for oneofs and messages... until the re-introduction of the "optional" keyword in 3.15.
In protobuf 3 if you call textMessage.getMynewfield() it will always return a non-null string. If the string was not sent, it will use the empty string (""). For integers 0 is returned and for messages the "default message" (all defaults) is returned. This is plenty for many use-cases, and may be enough for you.
But let's say you need to distinguish between "" and <notsent>. That's what field presence provides. Messages in protobuf 3 have "has" methods that return true if a value is present. But primitives don't have that presence information. One option is to "box" the primitive with standard wrappers that make the primitive a message. Another option available in newer versions of protobuf is the optional keyword. Both options will provide a method like textMessage.hasMynewfield().
message TestMessage {
string somefield = 1;
int32 someotherfield = 2;
google.protobuf.StringValue mynewfield = 3;
// -or-
optional string mynewfield = 3;
}
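To answer the C++ part of the question: either declaration gives the generated class a presence accessor; a sketch, with method names as protoc generates them for the field above:
void CheckPresence(const std::string& wire_bytes) {
  TestMessage msg;
  msg.ParseFromString(wire_bytes);  // wire_bytes: serialized data off the wire
  if (msg.has_mynewfield()) {
    // StringValue wrapper variant: the string lives inside the submessage:
    //   const std::string& v = msg.mynewfield().value();
    // `optional string` variant: the accessor is direct:
    //   const std::string& v = msg.mynewfield();
  } else {
    // The field was genuinely absent from the wire, not merely "".
  }
}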

Best way to model gRPC messages

I would like to model messages for bidirectional streaming. In both directions I can expect different types of messages, and I am unsure what the better practice would be. The two ideas as of now:
message MyMessage {
MessageType type = 1;
string payload = 2;
}
In this case I would have an enum that defines which type of message it is, and a JSON payload that will be serialized and deserialized into models on both the client and server side. The second approach is:
message MyMessage {
oneof type {
A typeA = 1;
B typeB = 2;
C typeC = 3;
}
}
In the second example a oneof is defined such that only one of the message types can be set. On both sides a switch must be made over each of the cases (A, B, C, or none).
If you know all of the possible types ahead of time, using oneof would be the way to go here as you have described.
The major reason for using protocol buffers is the schema definition. With the schema definition, you get types, code generation, safe schema evolution, efficient encoding, etc. With the oneof approach, you will get these benefits for your nested payload.
I would not recommend using string payload since using a string for the actual payload removes many of the benefits of using protocol buffers. Also, even though you don't need a switch statement to deserialize the payload, you'll likely need a switch statement at some point to make use of the data (unless you're just forwarding the data on to some other system).
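That switch, sketched in C++ against the oneof schema above (the HandleA/HandleB/HandleC helpers are hypothetical; case names follow protoc's generated naming for a oneof called type):
void Dispatch(const MyMessage& msg) {
  switch (msg.type_case()) {
    case MyMessage::kTypeA: HandleA(msg.typea()); break;
    case MyMessage::kTypeB: HandleB(msg.typeb()); break;
    case MyMessage::kTypeC: HandleC(msg.typec()); break;
    case MyMessage::TYPE_NOT_SET: /* nothing was set */ break;
  }
}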
Alternative option
If you don't know the possible types ahead of time, you can use the Any type to embed an arbitrary protocol buffer message into your message.
import "google/protobuf/any.proto";
message MyMessage {
google.protobuf.Any payload = 1;
}
The Any message is basically your string payload approach, but using bytes instead of string.
message Any {
string type_url = 1;
bytes value = 2;
}
Using Any has the following advantages over string payload:
Encourages the use of protocol buffers to define the dynamic payload contents
Tooling in the protocol buffer library for each language for packing and unpacking protocol buffer messages into and out of the Any type (see the sketch after this list)
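In C++ that tooling looks roughly like this; a sketch, assuming MyMessage from above and some application message type A (PackFrom, Is and UnpackTo are the standard google::protobuf::Any helpers):
#include <google/protobuf/any.pb.h>

void RoundTrip(const A& inner) {
  // Sender: embed an arbitrary message into the Any field.
  MyMessage out;
  out.mutable_payload()->PackFrom(inner);  // records type_url + serialized bytes

  // Receiver: check the embedded type before unpacking.
  A decoded;
  if (out.payload().Is<A>() && out.payload().UnpackTo(&decoded)) {
    // decoded now holds the payload as an A
  }
}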
For more information on Any, see:
https://developers.google.com/protocol-buffers/docs/proto3#any
https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/any.proto

Protobuf message / enum type rename and wire compatibility?

Is it (practically) possible to change the type name of a protobuf message type (or enum) without breaking communications?
Obviously the using code would need to be adapted and re-compiled. The question is whether old clients that use the same structure, but the old names, would continue to work.
Example, based on the real file:
test.proto:
syntax = "proto3";
package test;
// ...
message TestMsgA {
message TestMsgB { // should be called TestMsgZZZ going forward
// ...
enum TestMsgBEnum { // should be called TestMsgZZZEnum going forward
// ...
}
TestMsgBEnum foo = 1;
// ...
}
repeated TestMsgB bar = 1;
// ...
}
Does the on-the-wire format of the protobuf payload change in any way if type or enum names are changed?
If you're talking about the binary format, then no: names don't matter and will not impact your ability to load data. For enums, only the integer value is stored in the payload; for fields, only the field number is stored.
Obviously if you swap two names, confusion could happen, but it should load as long as the structure matches.
If you're talking about the JSON format, then it may matter.

Common proto3 fields with oneof or aggregation

I have to produce a proto class for an object which will have around 12 variations. All 12 variations share the same four fields, and then have specific fields. In most cases there will be many more type-specific fields than common fields.
I was wondering what would be the most performant way to achieve this.
First option: defining the common fields in a common proto class and then declaring a field of this type in all the specific types:
message CommonFields {
// common_field1
// ... common_fieldN
}
message SpecificType1 {
CommonFields common = 1;
// specific fields...
}
Or would it be better to define one top-level proto which contains the common fields, and then have a oneof field which can refer to another type containing the specific fields:
message BaseType {
// common_field_1
// ... common_field_N
oneof specific_fields {
SpecificTypeFields1 type1_fields = N;
SpecificTypeFields2 type2_fields = N+1;
}
}
message SpecificTypeFields1 {
// specific fields...
}
message SpecificTypeFields2 {
// specific fields...
}
I'm particularly interested in performance and also convention. Or are there more typical ways, such as just repeating the common fields? Bear in mind, though, that my protos will only have 4 common fields, and typically 3-8 specific ones.
Depending on the protobuf library, there is usually some performance penalty for encoding submessages. For most libraries, such as Google's own protobuf libraries, the difference is very small. With either of your options, you end up encoding one submessage per message, further reducing the impact.
I have seen both formats commonly used. If the decoder side already knows the message type (from e.g. the rpc method name), aggregation is usually easier to implement, as it doesn't require separately checking the oneof type.
However, if the message type is not known, the oneof method is better, as it allows easy detection of the type.

With Protocol Buffers, is it safe to move enum from inside message to outside message?

I've run into a use case where I'd like to move an enum declared inside a protocol buffer message to outside the message, so that other messages can use the same enum.
I.e., I'm wondering if there are any issues moving from this
message Message {
enum Enum {
VALUE1 = 1;
VALUE2 = 2;
}
optional Enum enum_value = 1;
}
to this
enum Enum {
VALUE1 = 1;
VALUE2 = 2;
}
message Message {
optional Enum enum_value = 1;
}
Would this cause any issues de-serializing data created with the first protocol buffer definition into the second?
It doesn't change the serialization data at all - the location / name of the enum is irrelevant for the actual data, since it just stores the integer value.
What might change is how some languages consume the enum, i.e. how they qualify it: is it X.Y.Foo, X.Foo, or just Foo? Note that since enums follow C++ naming/scoping rules, some things (such as conflicts) aren't an issue, but it may impact some languages as consumers.
So: if you're the only consumer of the .proto, you're absolutely fine here. If you have shared the .proto with other people, it may be problematic to change it unless they are happy to update their code to match any new qualification requirements.
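To make the qualification point concrete, this is roughly how the change surfaces at C++ call sites; a sketch, and exact generated names can vary by protoc version:
// Before the move (nested enum): values are reached via the message type.
void SetNested(Message& msg) {
  msg.set_enum_value(Message::VALUE1);
}
// After the move (top-level enum), the same call site becomes:
//   msg.set_enum_value(VALUE1);  // value now lives at file/package (namespace) scope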
