Common proto3 fields with oneof or aggregation - protocol-buffers

I have to produce a proto class for an object which will have around 12 variations. All 12 variations share four fields which are the same, and then have specific fields. In most cases there will be many more non specific fields than there are common fields.
I was wondering what would be the most performant way to achieve this.
First option: defining the common fields in a common proto class and then declaring a field of this type in all the specific types:
message CommonFields {
// common_field1
// ... common_fieldN
}
message SpecificType1 {
CommonFields common = 1;
// specific fields...
}
Or would it be better to define one top level proto which contains the fields, and then having a oneof field, which can refer to another type containing the specific fields:
message BaseType {
// common_field_1
// ... common_field_N
oneof specific_fields {
SpecificTypeFields1 type1_fields = N;
SpecificTypeFields2 type1_fields = N+1;
}
}
message SpecificTypeFields1 {
// specific fields...
}
message SpecificTypeFields2 {
// specific fields...
}
I'm particularly interested in performance and also convention. Or if there are any more typical ways, such as just repeating the common fields.. Bearing in mind though my protos will only have 4 common fields, and typically 3-8 specific ones.

Depending on the protobuf library, there is usually some performance penalty for encoding submessages. For most libraries, such as the Google's own protobuf libraries, the difference is very small. With either of your options, you end up encoding 1 submessage per message, further reducing the impact.
I have seen both formats commonly used. If the decoder side already knows the message type (from e.g. rpc method name), aggregation is usually easier to implement as it doesn't require separately checking the oneof type.
However, if the message type is not known, oneof method is better as it allows easy detection of the type.

Related

What is the relation of protobuf message field id and field order?

I want to understand if the messages bellow are compatible from the perspective of protobuf and serialization/deserialization.
message HelloReply {
string message = 1;
string personalized_message = 2;
}
message HelloReply {
string personalized_message = 2;
string message = 1;
}
Does the order matter for compatibility in any situation?
The textual order is largely irrelevant, although it may impact some code generation tooling - but most languages don't care about declaration order, so even that: won't matter. The fields are still defined semantically equivalent - the numbers match the existing meaning (name) and type. It is the number that is the determining feature in identifying a field.
At the protocol level:
parsers must allow fields in any order
serializers should (but not must) write fields in ascending numerical field order

Can I update a message field from enum to string, and keep its name?

The Code
Consider the following protobuf message declaration:
syntax = "proto3";
enum Airport {
TLV = 0;
JFK = 1;
...
}
message Flight {
...
Airport origin_airport = 11;
...
}
The Problem
Due to some business requirements, we have to set the Airport to be a free string rather than choosing from a closed enumerated list. I know that I can add and remove fields at will, as long as I don't reuse the same number. However, I'm not sure if I can use the same name, a-la:
message Flight {
...
reserved 11; // The old enum field number
string origin_airport = 18; // Unused so far
...
}
My Question
Upon updaing a protobuf3 field type, can the field name be preserved, as long as its number changes?
If you aren't using the JSON variant, then names aren't used at all in the payload, so yes technically it is perfectly legal to reuse names; however: this might lead to unnecessary problems with existing code - depending on existing code and language / framework specific rules, and could cause confusion. Since that is avoidable, I would advocate using a name like origin_airport_code, or similar.
(The point I'm making here: any code that used the old field probably needs attention; I can see some scenarios where the existing code might still compile after the change, but mean something different, and therefore introduce a bug that would have been avoided if you'd changed the name and forced every usage to be visited)

Protobuf message / enum type rename and wire compatibility?

Is it (practically) possible to change the type name of a protobuf message type (or enum) without breaking communications?
Obviously the using code would need to be adpated to re-compile. The question is if old clients that use the same structure, but the old names, would continue to work?
Example, base on the real file:
test.proto:
syntax = "proto3";
package test;
// ...
message TestMsgA {
message TestMsgB { // should be called TestMsgZZZ going forward
// ...
enum TestMsgBEnum { // should be called TestMsgZZZEnum going forward
// ...
}
TestMsgBEnum foo = 1;
// ...
}
repeated TestMsgB bar = 1;
// ...
}
Does the on-the-wire format of the protobuf payload change in any way if type or enum names are changed?
If you're talking about the binary format, then no: names don't matter and will not impact your ability to load data; For enums, only the integer value is stored in the payload. For fields, only the field-number is stored.
Obviously if you swap two names, confusion could happen, but: it should load as long as the structure matches.
If you're talking about the JSON format, then it may matter.

Copy common fields between structs of different types

I have two structs, whose types are as follows:
type UserStruct struct {
UserID string `bson:"user_id" json:"user_id"`
Address string `bson:"address" json:"address"`
Email string `bson:"email" json:"email"`
CreatedAt time.Time `bson:"created_at" json:"created_at"`
PhoneNumber string `bson:"phone_number" json:"phone_number"`
PanCard string `bson:"pancard" json:"pancard"`
Details map[string]string `json:"details"`
}
type SecretsStruct struct {
UserID string `r:"user_id" json:"user_id"`
Secrets []string `r:"secrets" json:secrets`
Address string `r:"address" json:"address"`
Email string `r:"email"json:"email"`
CreatedAt time.Time `r:"created_at"json:"created_at"`
PhoneNumber string `r:"phone_number" json:"phone_number"`
PanCard string `r:"pancard" json:"pancard"`
}
I already have an instance of UserStruct. I want to copy the fields common to both structs from UserStruct to a new instance of SecretStruct, without using reflection.
Go is a statically typed language (and is not Python). If you want to copy fields between the structs, you must either cause code to be supplied at compile time which knows how to do this, or use the reflect library to perform the operation at runtime.
Note that I said "cause code to be supplied at compile time" because you don't have to explicitly write that code. You could use code generation to produce the copy code from the struct definitions, or from a higher-level definition (e.g. XML) which generates both the struct definition and the copying code.
However, good Go programmers prefer clear code over clever solutions. If this is a single localized requirement, writing a code generator to avoid "boilerplate" code is almost certainly overkill; its implementation will take longer than the code to copy the structs, and the associated complexity will introduce a risk of more bugs. Similarly, reflect-based solutions are complicated, not clear, and only recommended in cases where you require a generic or extensible solution, and where this cannot be fulfilled at compile time.
I recommend simply write the copying code, and add appropriate comments to the struct definitions and copy methods to ensure future maintainers are aware of their obligation to maintain the copy methods.
Example
// Define your types - bodies elided for brevity
// NOTE TO MAINTAINERS: if editing the fields in these structs, ensure
// the methods defined in source file <filename>.go are updated to
// ensure common fields are copied between structs on instantiation.
type UserStruct struct { ... }
type SecretStruct struct { ... }
// NewSecretStructFromUserStruct populates and returns a SecretStruct
// from the elements common to the two types. This method must be
// updated if the set of fields common to both structs is changed in
// future.
func NewSecretStructFromUserStruct(us *UserStruct) *SecretStruct {
// You should take care to deep copy where necessary,
// e.g. for any maps shared between the structs (not
// currently the case).
ss := new(SecretStruct)
ss.UserID = us.UserID
ss.Address = us.Address
ss.Email = us.Email
ss.CreatedAt = us.CreatedAt
ss.PhoneNumber = us.PhoneNumber
ss.PanCard = us.PanCard
return ss
}
// You may also consider this function to be better suited as
// a receiver method on UserStruct.

Protobuf: Nesting a message of arbitrary type

in short, is there a way to define a protobuf Message that contains another Message of arbitrary type? Something like:
message OuterMsg {
required int32 type = 1;
required Message nestedMsg = 2; //Any sort of message can go here
}
I suspect that there's a way to do this because in the various protobuf-implementations, compiled messages extend from a common Message base class.
Otherwise I guess I have to create a common base Message for all sorts of messages like this:
message BaseNestedMessage {
extensions 1 to max;
}
and then do
message OuterMessage {
required int32 type = 1;
required BaseNestedMessage nestedMsg = 2;
}
Is this the only way to achieve this?
The most popular way to do is to make optional fields for each message type:
message UnionMessage
{
optional MsgType1 msg1 = 1;
optional MsgType2 msg2 = 2;
optional MsgType3 msg3 = 3;
}
This technique is also described in the official Google documentation, and is well-supported across implementations:
https://developers.google.com/protocol-buffers/docs/techniques#union
Not directly, basically; protocol buffers very much wants to know the structure in advance, and the type of the message is not included on the wire. The common Message base-class is an implementation detail for providing common plumbing code - the protocol buffers specification does not include inheritance.
There are, therefore, limited options:
use different field-numbers per message-type
serialize the message separately, and include it as a bytes type, and convey the "what is this?" information separately (presumably a discriminator / enumeration)
I should also note that some implementations may provide more support for this; protobuf-net (C# / .NET) supports (separately) both inheritance and dynamic message-types (i.e. what you have above), but that is primarily intended for use only from that one library to that one library. Because this is all in addition to the specification (remaining 100% valid in terms of the wire format), it may be unnecessarily messy to interpret such data from other implementations.
Alternatively to multiple optional fields, the oneof keyword can be used since v2.6 of Protocol Buffers.
message UnionMessage {
oneof data {
string a = 1;
bytes b = 2;
int32 c = 3;
}
}

Resources