Support multiple schemas in Proto3 - protocol-buffers

I am creating a proto3 schema that allows multiple/arbitrary data under a key in the incoming JSON object. I would like to convert the incoming JSON object in one shot. For example:
{
  "key1": "value",
  "key2": { ... } // schema A
}
I also want to support schema B for key2 in a different request:
{
  "key1": "value",
  "key2": { ... } // schema B
}
I tried a couple of different approaches, such as oneof, but oneof requires distinct field names; since I use the same key2 in both cases, it does not work for me here. Here is the schema:
message IncomingRequest {
  string key1 = 1;
  // google.protobuf.Any key2 = 2; --> not working
  oneof message {
    A payload = 2;
    B payload = 3; // --> duplicate field name
  }
}
Does anyone know how to achieve this?

Two ways I can think of:
Message types for each request
If you know, based on the request (e.g. the HTTP URL called), whether it should be schema A or B, I recommend creating separate message types for each request. This might mean more proto types to define, but it will be simple to use in the actual code you write to consume the payloads.
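For illustration, the per-request variant could look like this (the message names are placeholders; A and B are the schemas from the question):

```proto
// One dedicated request type per schema; names are illustrative.
message RequestA {
  string key1 = 1;
  A key2 = 2; // schema A
}

message RequestB {
  string key1 = 1;
  B key2 = 2; // schema B
}
```

Each handler then decodes the type that matches its endpoint, with no runtime type checks needed.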
Struct type
If you really want/have to reuse the same message type, you can use the Struct proto type to encode/decode any JSON structure.
message IncomingRequest {
  string key1 = 1;
  google.protobuf.Struct key2 = 2;
}
Although from the proto type definition it doesn't look like it does what you want, Protobuf encoders/decoders handle this type specially to give you the wanted behavior. The issue with this option is that what you gain in flexibility in the proto, you lose in expressiveness in the generated code: you have to do a lot of edge-case checking to see whether a specific value/type is set.
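To illustrate that edge-case checking, here is a stdlib-only Rust sketch with a hand-rolled enum mirroring the kinds a `google.protobuf.Struct` value can hold (the real generated type, e.g. `prost_types::Value`, has the same shape of variants but a different API):

```rust
use std::collections::BTreeMap;

// A stand-in for google.protobuf.Value: one variant per JSON kind.
#[derive(Debug, Clone)]
enum Value {
    Null,
    Number(f64),
    Str(String),
    Bool(bool),
    List(Vec<Value>),
    Struct(BTreeMap<String, Value>),
}

// Every read out of a Struct forces per-variant checking like this.
fn get_string(fields: &BTreeMap<String, Value>, key: &str) -> Option<String> {
    match fields.get(key) {
        Some(Value::Str(s)) => Some(s.clone()),
        _ => None, // missing, null, or wrong type all collapse here
    }
}

fn main() {
    let mut key2 = BTreeMap::new();
    key2.insert("name".to_string(), Value::Str("schema A".to_string()));
    key2.insert("count".to_string(), Value::Number(3.0));

    assert_eq!(get_string(&key2, "name").as_deref(), Some("schema A"));
    assert_eq!(get_string(&key2, "count"), None); // present, but wrong type
}
```

Every read out of the Struct has to handle the missing, null, and wrong-type cases explicitly; that is the expressiveness being traded away.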

Related

Deriving a HashMap/BTreeMap from a struct of values in Rust

I am trying to do the following:
Deserialize a struct of fields into another struct of a different format.
I am trying to do the transformation via a HashMap, as this seems to be the best-suited way.
I am required to do this as part of the transformation of a legacy system, this being one of the intermediary phases.
I have raised another question which caters to a subset of the same use case, but I do not think it gives enough overview, hence raising this question with more details and code.
(Rust: converting a struct to key-value pairs where the field is not None)
I will be merging both questions once I figure out how.
Scenario:
I am getting input via IPC through a huge proto.
I am using prost to deserialize this data into a struct.
My deserialized struct has n fields, and n can increase as well.
I need to transform this deserialized struct into a key/value struct (shown ahead).
The incoming data will mostly have a majority of null keys, i.e. out of n fields, most likely only 1 or 2 have values at a given time.
Input struct after prost deserialization:
Using proto3, so I am unable to define optional fields.
Ideally I would prefer a prost struct with an Option on every field, i.e. Option<String> instead of String.
struct Data {
    field1: i32,
    field2: i64,
    field3: String,
    // ...
}
This needs to be transformed into a generic struct as below for further use:
struct TransformedData {
    key: String,
    value: String,
}
So,
Data {
    field1: 20,
    field2: null,
    field3: null,
    field4: null,
    // ...
}
becomes
TransformedData {
    key: "field1",
    value: "20",
}
Methods I have tried:
Method 1:
Add serde to the prost struct definition and deserialize it into a map.
Loop over each item in the map to get the values which are non-null.
Use these to create the struct.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2015&gist=9645b1158de31fd54976926c9665d6b4
This has its challenges:
1. Each primitive data type has its own default value which needs to be checked for.
2. Nested structs will result in an object data type which needs to be handled.
3. Need to iterate over every field of the struct to determine the non-null values.
1 & 2 can be mitigated by making the fields optional (yet to figure out how in proto3 and prost), but I am not sure how to get over 3: iterating over a large struct to find the non-null fields is not scalable and will be expensive.
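If the fields can be made `Option`s, challenge 3 still means touching every field, but it can at least be written as one explicit check per field instead of reflection. A stdlib-only sketch with illustrative field names:

```rust
#[derive(Debug, Default)]
struct Data {
    field1: Option<i32>,
    field2: Option<i64>,
    field3: Option<String>,
}

#[derive(Debug, PartialEq)]
struct TransformedData {
    key: String,
    value: String,
}

// Collect only the fields that are actually set.
fn transform(data: &Data) -> Vec<TransformedData> {
    let mut out = Vec::new();
    if let Some(v) = data.field1 {
        out.push(TransformedData { key: "field1".into(), value: v.to_string() });
    }
    if let Some(v) = data.field2 {
        out.push(TransformedData { key: "field2".into(), value: v.to_string() });
    }
    if let Some(ref v) = data.field3 {
        out.push(TransformedData { key: "field3".into(), value: v.clone() });
    }
    out
}

fn main() {
    // Only field1 is set; the other fields are skipped entirely.
    let data = Data { field1: Some(20), ..Default::default() };
    let t = transform(&data);
    assert_eq!(t, vec![TransformedData { key: "field1".into(), value: "20".into() }]);
}
```

This is still O(n) in the number of fields per message, but each check is a cheap branch, and a macro could generate the repetitive `if let` arms.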
Method 2:
I am using prost-reflect's dynamic reflection to deserialize and get only the specified value, by ensuring each proto message has two extra fields:
proto -> signifying the proto struct being used when serializing.
key -> signifying the field which has a value when serializing.
let fd = package::FILE_DESCRIPTOR_SET;
let pool = DescriptorPool::decode(fd).unwrap();
let proto_message: package::GenericProto = prost::Message::decode(message.as_ref()).expect("error when de-serialising protobuf message to struct");
let proto = proto_message.proto.as_str();
let key = proto_message.key.as_str();
Then, using key, I retrieve the value from what looks to be a map implemented by prost:
let message_descriptor = pool.get_message_by_name(proto).unwrap();
let dynamic_message = DynamicMessage::decode(message_descriptor, message.as_ref()).unwrap();
let data = dynamic_message.get_field_by_name(key).unwrap().as_ref().clone();
Here:
1. This fails when someone sends a struct with multiple fields filled.
2. This does not work for nested objects, as nested objects in turn need to be converted to a map.
I am looking for the least expensive way to do the above. I understand the solution I am looking for is pretty out there and that I am taking a one-size-fits-all approach. Any pointers appreciated.

Best way to model gRPC messages

I would like to model messages for bidirectional streaming. In both directions I can expect different types of messages, and I am unsure as to which would be the better practice. The two ideas as of now:
message MyMessage {
  MessageType type = 1;
  string payload = 2;
}
In this case I would have an enum that defines which type of message it is, and a JSON payload that will be serialized and deserialized into models on both the client and server side. The second approach is:
message MyMessage {
  oneof type {
    A typeA = 1;
    B typeB = 2;
    C typeC = 3;
  }
}
In the second example, a oneof is defined such that only one of the message types can be set. On both sides, a switch must be made over each of the cases (A, B, C, or none).
If you know all of the possible types ahead of time, using oneof would be the way to go here as you have described.
The major reason for using protocol buffers is the schema definition. With the schema definition, you get types, code generation, safe schema evolution, efficient encoding, etc. With the oneof approach, you will get these benefits for your nested payload.
I would not recommend using string payload since using a string for the actual payload removes many of the benefits of using protocol buffers. Also, even though you don't need a switch statement to deserialize the payload, you'll likely need a switch statement at some point to make use of the data (unless you're just forwarding the data on to some other system).
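In Rust terms, the switch over a oneof is a match on the enum that prost generates for it; the following is a stdlib-only sketch with illustrative names, not the actual generated code:

```rust
// Stand-ins for the generated message types A, B, C.
#[derive(Debug)] struct A { text: String }
#[derive(Debug)] struct B { number: i32 }
#[derive(Debug)] struct C;

// prost generates an enum like this for `oneof type`.
#[derive(Debug)]
enum MessageType {
    TypeA(A),
    TypeB(B),
    TypeC(C),
}

#[derive(Debug)]
struct MyMessage {
    // None corresponds to "no variant set" on the wire.
    r#type: Option<MessageType>,
}

// The switch both sides must make: one arm per case, plus unset.
fn describe(msg: &MyMessage) -> String {
    match &msg.r#type {
        Some(MessageType::TypeA(a)) => format!("A: {}", a.text),
        Some(MessageType::TypeB(b)) => format!("B: {}", b.number),
        Some(MessageType::TypeC(_)) => "C".to_string(),
        None => "unset".to_string(),
    }
}

fn main() {
    let msg = MyMessage { r#type: Some(MessageType::TypeB(B { number: 7 })) };
    assert_eq!(describe(&msg), "B: 7");
}
```

The compiler forces the match to be exhaustive, which is exactly the safe-schema-evolution benefit described above: adding a variant makes every forgotten call site a compile error.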
Alternative option
If you don't know the possible types ahead of time, you can use the Any type to embed an arbitrary protocol buffer message into your message.
import "google/protobuf/any.proto";

message MyMessage {
  google.protobuf.Any payload = 1;
}
The Any message is basically your string payload approach, but using bytes instead of a string:
message Any {
  string type_url = 1;
  bytes value = 2;
}
Using Any has the following advantages over string payload:
It encourages the use of protocol buffers to define the dynamic payload contents.
The protocol buffer library for each language provides tooling for packing and unpacking protocol buffer messages into and out of the Any type.
For more information on Any, see:
https://developers.google.com/protocol-buffers/docs/proto3#any
https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/any.proto
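Conceptually, packing a message into Any stores its serialized bytes next to a type URL, and unpacking checks the URL before decoding. A stdlib-only sketch (the real libraries provide pack/unpack helpers and do the serialization for you):

```rust
// A minimal stand-in for google.protobuf.Any.
#[derive(Debug)]
struct Any {
    type_url: String,
    value: Vec<u8>, // the serialized inner message
}

fn pack(type_url: &str, serialized: Vec<u8>) -> Any {
    Any { type_url: type_url.to_string(), value: serialized }
}

// Unpacking must verify the type URL before handing back the bytes
// for decoding; a mismatch means the payload is some other type.
fn unpack<'a>(any: &'a Any, expected_url: &str) -> Option<&'a [u8]> {
    if any.type_url == expected_url {
        Some(&any.value)
    } else {
        None
    }
}

fn main() {
    let any = pack("type.googleapis.com/example.Foo", vec![0x08, 0x2a]);
    assert!(unpack(&any, "type.googleapis.com/example.Foo").is_some());
    assert!(unpack(&any, "type.googleapis.com/example.Bar").is_none());
}
```

This also shows the cost: the receiver still needs out-of-band knowledge of which type URLs to expect, so a dispatch step (like the switch above) reappears at unpack time.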

Protobuf message / enum type rename and wire compatibility?

Is it (practically) possible to change the type name of a protobuf message type (or enum) without breaking communication? Obviously the using code would need to be adapted and re-compiled. The question is whether old clients that use the same structure, but the old names, would continue to work.
Example, based on the real file:
test.proto:
syntax = "proto3";
package test;
// ...
message TestMsgA {
  message TestMsgB { // should be called TestMsgZZZ going forward
    // ...
    enum TestMsgBEnum { // should be called TestMsgZZZEnum going forward
      // ...
    }
    TestMsgBEnum foo = 1;
    // ...
  }
  repeated TestMsgB bar = 1;
  // ...
}
Does the on-the-wire format of the protobuf payload change in any way if type or enum names are changed?
If you're talking about the binary format, then no: names don't matter and will not impact your ability to load data. For enums, only the integer value is stored in the payload; for fields, only the field number is stored. Obviously if you swap two names, confusion could happen, but it should load as long as the structure matches.
If you're talking about the JSON format, then it may matter.
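You can see why names don't matter in the binary format by hand-encoding a field: the wire bytes carry only the field number, wire type, and value. A stdlib-only Rust sketch (illustrative, not a full protobuf encoder):

```rust
// Encode a varint, protobuf's base integer encoding:
// 7 bits per byte, high bit set on all but the last byte.
fn encode_varint(mut v: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80);
    }
}

// Encode a varint-typed field (wire type 0), e.g. an enum field.
// Note: no message name, field name, or enum name appears anywhere.
fn encode_enum_field(field_number: u32, enum_value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    encode_varint((field_number as u64) << 3, &mut out); // tag: (number << 3) | wire type 0
    encode_varint(enum_value, &mut out);                 // value
    out
}

fn main() {
    // `TestMsgBEnum foo = 1;` with value 2 encodes to the same two bytes
    // whether the enum is called TestMsgBEnum or TestMsgZZZEnum.
    assert_eq!(encode_enum_field(1, 2), vec![0x08, 0x02]);
}
```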

With Protocol Buffers, is it safe to move enum from inside message to outside message?

I've run into a use case where I'd like to move an enum declared inside a protocol buffer message to outside the message, so that other messages can use the same enum. I.e., I'm wondering if there are any issues moving from this
message Message {
  enum Enum {
    VALUE1 = 1;
    VALUE2 = 2;
  }
  optional Enum enum_value = 1;
}
to this
enum Enum {
  VALUE1 = 1;
  VALUE2 = 2;
}

message Message {
  optional Enum enum_value = 1;
}
Would this cause any issues de-serializing data created with the first protocol buffer definition into the second?
It doesn't change the serialization data at all - the location / name of the enums are irrelevant for the actual data, since it just stores the integer value.
What might change is how some languages consume the enum, i.e. how they qualify it. Is it X.Y.Foo, X.Foo, or just Foo. Note that since enums follow C++ naming/scoping rules, some things (such as conflicts) aren't an issue: but it may impact some languages as consumers.
So: if you're the only consumer of the .proto, you're absolutely fine here. If you have shared the .proto with other people, it may be problematic to change it unless they are happy to update their code to match any new qualification requirements.
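As an illustration of the qualification change in generated code: prost, for example, nests a message-local enum in a module named after the message, so hoisting the enum changes the path at call sites but not the integer that goes on the wire. A stdlib-only sketch:

```rust
// Before: enum nested under the message (as generated code would nest it).
mod before {
    pub mod message {
        #[derive(Debug, Clone, Copy, PartialEq)]
        pub enum Enum { Value1 = 1, Value2 = 2 }
    }
}

// After: enum hoisted to the top level of the package.
mod after {
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum Enum { Value1 = 1, Value2 = 2 }
}

fn main() {
    // Call sites must change how they qualify the enum...
    let old = before::message::Enum::Value2;
    let new = after::Enum::Value2;
    // ...but the integer serialized on the wire is identical.
    assert_eq!(old as i32, new as i32);
}
```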

Protobuf: Nesting a message of arbitrary type

In short, is there a way to define a protobuf Message that contains another Message of arbitrary type? Something like:
message OuterMsg {
  required int32 type = 1;
  required Message nestedMsg = 2; // Any sort of message can go here
}
I suspect that there's a way to do this because in the various protobuf-implementations, compiled messages extend from a common Message base class.
Otherwise I guess I have to create a common base Message for all sorts of messages, like this:
message BaseNestedMessage {
  extensions 1 to max;
}
and then do
message OuterMessage {
  required int32 type = 1;
  required BaseNestedMessage nestedMsg = 2;
}
Is this the only way to achieve this?
The most popular way to do this is to make optional fields for each message type:
message UnionMessage {
  optional MsgType1 msg1 = 1;
  optional MsgType2 msg2 = 2;
  optional MsgType3 msg3 = 3;
}
This technique is also described in the official Google documentation, and is well-supported across implementations:
https://developers.google.com/protocol-buffers/docs/techniques#union
Not directly, basically; protocol buffers very much wants to know the structure in advance, and the type of the message is not included on the wire. The common Message base-class is an implementation detail for providing common plumbing code - the protocol buffers specification does not include inheritance.
There are, therefore, limited options:
use different field-numbers per message-type
serialize the message separately, include it as a bytes field, and convey the "what is this?" information separately (presumably a discriminator / enumeration)
I should also note that some implementations may provide more support for this; protobuf-net (C# / .NET) supports (separately) both inheritance and dynamic message-types (i.e. what you have above), but that is primarily intended for use only from that one library to that one library. Because this is all in addition to the specification (remaining 100% valid in terms of the wire format), it may be unnecessarily messy to interpret such data from other implementations.
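The second option (a bytes field plus a discriminator) can be sketched as follows; the struct stands in for the generated message, and all names are illustrative:

```rust
// A stand-in for:
//   message OuterMsg {
//     int32 type = 1;      // discriminator
//     bytes payload = 2;   // separately-serialized inner message
//   }
#[derive(Debug)]
struct OuterMsg {
    msg_type: i32,
    payload: Vec<u8>,
}

// Receivers dispatch on the discriminator to pick the right decoder
// for the payload bytes.
fn dispatch(msg: &OuterMsg) -> &'static str {
    match msg.msg_type {
        1 => "decode payload as MsgType1",
        2 => "decode payload as MsgType2",
        _ => "unknown type",
    }
}

fn main() {
    let msg = OuterMsg { msg_type: 2, payload: vec![0x08, 0x01] };
    assert_eq!(dispatch(&msg), "decode payload as MsgType2");
}
```

Unlike the union-message approach, nothing ties the discriminator to the payload's actual type, so sender and receiver must keep the mapping in sync by convention.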
Alternatively to multiple optional fields, the oneof keyword can be used since v2.6 of Protocol Buffers.
message UnionMessage {
  oneof data {
    string a = 1;
    bytes b = 2;
    int32 c = 3;
  }
}
