What is the difference between any and bytes in protobuf 3.0?

As we know, we can use the serialization and deserialization APIs to convert between bytes and a message; likewise, we can use the pack and unpack APIs to convert between Any and a message. My question is:
what is the difference between any and bytes in protobuf 3.0?
For example: storage size, speed, and so on.

The only major difference I can see is that Any adds an extra type_url field, a string holding a URL-like name of the message type it was packed from. Example of the type URL it adds:
type_url = "type.googleapis.com/packagename.messagename"
This adds a non-negligible number of bytes to your message.
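The overhead is easy to measure: serialize the same message both ways and compare sizes. A minimal sketch, assuming some generated message Foo (the header name is hypothetical):

#include <iostream>
#include <string>
#include <google/protobuf/any.pb.h>
#include "foo.pb.h"  // hypothetical: defines some message Foo

int main() {
    Foo foo;
    // ... set some fields on foo ...

    // bytes approach: just the serialized payload.
    std::string raw;
    foo.SerializeToString(&raw);

    // Any approach: the same payload plus the type_url string.
    google::protobuf::Any any;
    any.PackFrom(foo);

    std::cout << "bytes: " << raw.size() << "\n"
              << "Any:   " << any.ByteSizeLong() << "\n";
    // The difference is roughly the length of
    // "type.googleapis.com/packagename.Foo" plus a few framing bytes.
    return 0;
}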

I was also stuck on this issue, but I could not find any answer to it on the net - the Any type in proto3 is not well documented online, especially for C++. So I tried out both, and the difference lies in what Any and bytes serialize. While Any serializes any arbitrary message, bytes serializes any string (in C++).
Here is a code snippet for Any:
#include <google/protobuf/any.pb.h>
// Proto file containing message description for Foo
#include "foo_proto.grpc.pb.h"
// Proto file containing message description for AnyMessage,
// which uses google.protobuf.Any
#include "any_proto.grpc.pb.h"

using google::protobuf::Any;

Foo* foo = new Foo(); // Foo is the message defined in "foo_proto.proto"
// ... Set the fields of message Foo
// Pack Foo into an 'Any' message
Any* any = new Any();
any->PackFrom(*foo);
// Use the above 'Any' message to create an AnyMessage object;
// set_allocated_object transfers ownership of 'any' to 'am'
AnyMessage am;
am.set_allocated_object(any);
However, for bytes you pack a string rather than an Any object, so the code snippet for bytes may look like:
// Proto file containing message description for Foo
#include "foo_proto.grpc.pb.h"
// Proto file containing message description for BytesMessage,
// which uses the bytes type
#include "bytes_proto.grpc.pb.h"

Foo* foo = new Foo(); // Foo is the message defined in "foo_proto.proto"
// ... Set the fields of message Foo
// Encode the object Foo into a string; bytes fields map to std::string in C++
std::string bytes_string;
foo->SerializeToString(&bytes_string);
// Now, create the BytesMessage object
BytesMessage bm;
bm.set_object(bytes_string);
I hope this resolves the query raised in the question. Thanks!

There is no protocol difference between 2.0 and 3.0; in theory your data should be identical.
There may be some small differences relating to how defaults and zeros are handled at the library level. In 3.0, "required" and "optional" don't exist; instead, zeros aren't transmitted (everything is effectively optional with a zero default). This means that where you previously assigned an explicit value of zero, it might have been transmitted; now it will not be. Of course, this also means that non-zero defaults are simply not possible in 3.0.
Emphasis: everything in the second paragraph is at the serializer level, not the protocol level. The protocol is entirely unchanged.
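To see the zero-handling behaviour concretely, here is a minimal C++ sketch, assuming a proto3 message Demo with a single int32 amount field (all names here are hypothetical):

#include <iostream>
#include <string>
#include "demo.pb.h"  // hypothetical proto3 file: message Demo { int32 amount = 1; }

int main() {
    Demo demo;
    demo.set_amount(0);  // an explicit zero
    std::string out;
    demo.SerializeToString(&out);
    std::cout << out.size() << "\n";  // prints 0: the zero was not transmitted

    demo.set_amount(5);
    demo.SerializeToString(&out);
    std::cout << out.size() << "\n";  // prints 2: one tag byte plus one value byte
    return 0;
}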

Related

How to handle a change in the interpretation of a field in a protobuf message?

If a field stores a specific value and is interpreted in a specific manner, is it possible to change this interpretation in a backwards compatible way?
Let's say I have a field that stores values of different data types.
The most generic case is to store it as a byte array and let the apps encode and decode it to the correct data type.
Common cases for data types are integers and strings, so support for those types is present.
Using a oneof structure this looks as follows:
message Foo
{
    ...
    oneof value
    {
        uint32 integer = 1;
        string text = 2;
        bytes data = 3;
    }
}
Applications that want to store an IP prefix in the value field have to use the generic data field and handle the encoding and decoding themselves.
If I now want to add support for IP prefixes to the Foo message itself, so the apps don't have to deal with the encoding and decoding anymore, I could add a new field with an IpPrefix type to the oneof structure:
message Foo
{
    ...
    oneof value
    {
        uint32 integer = 1;
        string text = 2;
        bytes data = 3;
        IpPrefix ip_prefix = 4;
    }
}
Even though this makes life easier for the apps, I believe it breaks backwards compatibility.
If a sending app has support for the new field, it will put its ip prefix value in the ip_prefix field.
But if a receiving app does not have support for this new field yet, it will ignore the field.
It will look for the ip prefix value in the data field, as it always did, but it won't find it there.
So the receiving app can no longer read the IP prefix value correctly.
Is there a way to make this scenario somehow backwards compatible?
PS: I realize this is a somewhat vague and perhaps unrealistic example. The exact case I need it for is the representation of RADIUS attributes in a protobuf message. These attributes are in essence a byte array that is sent over the network, but the bytes in the array have meaning and could be stored as different fields in the protobuf message. A basic attribute consists of a Type field and a Value field, where the value can be a string, integer, IP address... From time to time new datatypes (even complex ones) are added, and I would like to be able to add new datatypes in a backwards-compatible way.
There are two ways to go about this:
1. Enforce an update schedule, readers before writers
Add the new type of field to the .proto definition, but document that it should not be used except for testing and reception. Document that all readers of the message must support both the old and the new field by a specific milestone/date, after which the writers can start using it. Eventually you can deprecate the old field and new readers don't need to support it anymore.
2. Have both fields during the transition period
message Foo
{
    ...
    oneof value
    {
        uint32 integer = 1;
        string text = 2;
        bytes data = 3;
    }
    IpPrefix ip_prefix = 4;
}
Document that writers should set both data and ip_prefix during the transition period. The readers can start using ip_prefix as soon as writers have added support, and can optionally fall back to data.
Later, you can deprecate data and move ip_prefix to inside the oneof without breaking compatibility.
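A transition-period reader in C++ might look like this (a sketch; the generated header name and the DecodeIpPrefixFromBytes helper are hypothetical):

#include <string>
#include "foo.pb.h"  // hypothetical: generated from the Foo definition above

// Hypothetical app-level decoder for the old hand-rolled byte encoding.
void DecodeIpPrefixFromBytes(const std::string& data, IpPrefix* prefix);

IpPrefix ReadIpPrefix(const Foo& foo) {
    // Prefer the new typed field; singular message fields have
    // explicit presence even in proto3.
    if (foo.has_ip_prefix()) {
        return foo.ip_prefix();
    }
    // Fall back to the legacy bytes field for old writers.
    IpPrefix prefix;
    if (foo.value_case() == Foo::kData) {
        DecodeIpPrefixFromBytes(foo.data(), &prefix);
    }
    return prefix;
}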

Safeness of changing proto field number

For example if I changed from:
message Request {
    int32 foo = 1;
}
to
message Request {
    int32 bar = 1;
    int32 foo = 2;
}
Is it safe to change foo from 1 to 2? The docs say not to: "These numbers are used to identify your fields in the message binary format, and should not be changed once your message type is in use." But I'd like to know why.
If you have a message serialized with the first version, you will not be able to deserialize it correctly with the second version of the message.
If you use protobuf to generate a model that is stored in a DB or published to a broker like Apache Kafka, you need to follow that convention. If you only use protocol buffers to generate models and services for online usage, changing the number should not break anything, provided you regenerate all the models.
See also the reserved keyword, which ensures an old number is not reused.
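To see concretely what goes wrong, here is a minimal sketch: a message serialized with the first definition and parsed with the second. OldRequest and NewRequest are hypothetical generated classes standing in for the two versions of Request:

#include <iostream>
#include <string>
#include "request_v1.pb.h"  // hypothetical: message OldRequest { int32 foo = 1; }
#include "request_v2.pb.h"  // hypothetical: message NewRequest { int32 bar = 1; int32 foo = 2; }

int main() {
    OldRequest old_req;
    old_req.set_foo(42);
    std::string wire;
    old_req.SerializeToString(&wire);

    // Only field numbers are on the wire, so field 1 is now read as bar.
    NewRequest new_req;
    new_req.ParseFromString(wire);
    std::cout << "bar: " << new_req.bar() << "\n";  // prints 42: foo's old value
    std::cout << "foo: " << new_req.foo() << "\n";  // prints 0: silently lost
    return 0;
}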

Best way to model gRPC messages

I would like to model messages for bidirectional streaming. In both directions I can expect different types of messages and I am unsure as to what the better practice would be. The two ideas as of now:
message MyMessage {
    MessageType type = 1;
    string payload = 2;
}
In this case I would have an enum that defines which type of message it is, and a JSON payload that will be serialized and deserialized into models on both the client and server side. The second approach is:
message MyMessage {
    oneof type {
        A typeA = 1;
        B typeB = 2;
        C typeC = 3;
    }
}
In the second example, a oneof is defined so that only one of the message types can be set. On both sides, a switch must be made over each of the cases (A, B, C, or none).
If you know all of the possible types ahead of time, using oneof would be the way to go here as you have described.
The major reason for using protocol buffers is the schema definition. With the schema definition, you get types, code generation, safe schema evolution, efficient encoding, etc. With the oneof approach, you will get these benefits for your nested payload.
I would not recommend using string payload since using a string for the actual payload removes many of the benefits of using protocol buffers. Also, even though you don't need a switch statement to deserialize the payload, you'll likely need a switch statement at some point to make use of the data (unless you're just forwarding the data on to some other system).
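For illustration, such a receiver-side switch in C++ might look like this (a sketch, assuming classes generated from the MyMessage definition above; the HandleA/HandleB/HandleC handlers are hypothetical):

#include "my_message.pb.h"  // hypothetical: generated from the oneof MyMessage above

// Hypothetical application handlers for each payload type.
void HandleA(const A& a);
void HandleB(const B& b);
void HandleC(const C& c);

void Dispatch(const MyMessage& msg) {
    // protoc generates a TypeCase enum with one value per oneof member.
    switch (msg.type_case()) {
        case MyMessage::kTypeA: HandleA(msg.typea()); break;
        case MyMessage::kTypeB: HandleB(msg.typeb()); break;
        case MyMessage::kTypeC: HandleC(msg.typec()); break;
        case MyMessage::TYPE_NOT_SET: break;  // no payload was set
    }
}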
Alternative option
If you don't know the possible types ahead of time, you can use the Any type to embed an arbitrary protocol buffer message into your message.
import "google/protobuf/any.proto";
message MyMessage {
google.protobuf.Any payload = 1;
}
The Any message is basically your string payload approach, but using bytes instead of string.
message Any {
    string type_url = 1;
    bytes value = 2;
}
Using Any has the following advantages over string payload:
It encourages the use of protocol buffers to define the dynamic payload contents
The protocol buffer library for each language provides tooling for packing and unpacking protocol buffer messages into and out of the Any type
For more information on Any, see:
https://developers.google.com/protocol-buffers/docs/proto3#any
https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/any.proto
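That pack/unpack tooling in C++ looks roughly like this (a sketch; the payload message A and the header names are hypothetical):

#include <google/protobuf/any.pb.h>
#include "my_message.pb.h"  // hypothetical: MyMessage with the Any payload above
#include "a.pb.h"           // hypothetical: some payload message A

void Send(const A& a, MyMessage* msg) {
    // PackFrom fills in type_url and serializes 'a' into value.
    msg->mutable_payload()->PackFrom(a);
}

bool Receive(const MyMessage& msg, A* a) {
    // Is<A>() checks the type_url before UnpackTo deserializes the bytes.
    return msg.payload().Is<A>() && msg.payload().UnpackTo(a);
}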

Protobuf message / enum type rename and wire compatibility?

Is it (practically) possible to change the type name of a protobuf message type (or enum) without breaking communications?
Obviously the using code would need to be adapted and re-compiled. The question is whether old clients that use the same structure, but with the old names, would continue to work.
Example, based on the real file:
test.proto:
syntax = "proto3";
package test;
// ...
message TestMsgA {
message TestMsgB { // should be called TestMsgZZZ going forward
// ...
enum TestMsgBEnum { // should be called TestMsgZZZEnum going forward
// ...
}
TestMsgBEnum foo = 1;
// ...
}
repeated TestMsgB bar = 1;
// ...
}
Does the on-the-wire format of the protobuf payload change in any way if type or enum names are changed?
If you're talking about the binary format, then no: names don't matter and will not impact your ability to load data. For enums, only the integer value is stored in the payload; for fields, only the field number is stored.
Obviously, if you swap two names, confusion could happen, but it should load as long as the structure matches.
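A minimal sketch to make that concrete: two structurally identical messages whose type (and even package) names differ can parse each other's bytes, because neither name appears in the binary format (all file and type names here are hypothetical):

#include <string>
#include "old_names.pb.h"  // hypothetical: package oldpkg; message TestMsgB { int32 x = 1; }
#include "new_names.pb.h"  // hypothetical: package newpkg; message TestMsgZZZ { int32 x = 1; }

int main() {
    oldpkg::TestMsgB before;
    before.set_x(7);
    std::string wire = before.SerializeAsString();

    newpkg::TestMsgZZZ after;
    // Parsing succeeds: only field numbers and values are on the wire,
    // never message, enum, or package names.
    after.ParseFromString(wire);
    return after.x() == 7 ? 0 : 1;
}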
If you're talking about the JSON format, then it may matter.

Protocol buffer zero value for integer

I have a Go struct that we are currently using in our RESTful API, which looks like this:
type Req struct {
    Amount *int
}
I'm using a pointer here because if Amount is nil, it means the field was not filled; if Amount isn't nil but is zero, it means the field was filled and the value is zero.
When we started to switch to proto files (the main API receives the request over HTTP and forwards it to the next service through gRPC, using the same proto file), I ran into the issue that proto3 can't generate a pointer for Amount. That's fine, because protocol buffers are designed for sending data between separate systems, but how can I handle the issue above? When I get the request, I can't tell whether Amount is nil or just zero.
proto3 doesn't distinguish between zero and absent; the concepts of defaults and implicit vs explicit values disappeared:
the default value is always zero (or false, etc)
if the value is zero, it isn't sent; otherwise, it is
What you're after is possible with proto2. Alternatively, just add a separate field to indicate that you have a value for something:
message Req {
    int32 amount = 1;
    bool amountHasValue = 2;
}
Or use a nested sub-message, i.e.
message Foo {
    Bar bar = 1;
}
message Bar {
    int32 amount = 1;
}
(So: without a value you just send a bare Foo; with a value, you send a Foo containing a Bar, and the amount is whatever it is.)
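With the sub-message approach, the generated code can distinguish "absent" from "zero", because singular message fields keep explicit presence in proto3. A C++ sketch, assuming the Foo/Bar definition above (the header name is hypothetical):

#include <cstdint>
#include "foo_bar.pb.h"  // hypothetical: generated from the Foo/Bar messages above

bool TryGetAmount(const Foo& foo, int32_t* amount) {
    // has_bar() is generated because bar is a message-typed field.
    if (!foo.has_bar()) {
        return false;  // the amount was never filled in
    }
    *amount = foo.bar().amount();  // may legitimately be zero
    return true;
}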
