I want to serialize int/int64/double/float/uint32/uint64 values into protobuf. Which one should I use? Which one is more efficient?
For example:
message Test {
google.protobuf.Any any = 1; // solution 1
google.protobuf.Value value = 2; // solution 2
};
message Test { // solution 3
oneof Data {
uint32 int_value = 1;
double double_value = 2;
bytes string_value = 3;
...
};
};
In your case, you'd better use oneof.
You cannot pack a built-in type (e.g. double, int32, int64) into google.protobuf.Any, or unpack one from it. You can only pack or unpack a message, i.e. a class derived from google::protobuf::Message.
google.protobuf.Value, in fact, is a wrapper on oneof:
message Value {
// The kind of value.
oneof kind {
// Represents a null value.
NullValue null_value = 1;
// Represents a double value.
double number_value = 2;
// Represents a string value.
string string_value = 3;
// Represents a boolean value.
bool bool_value = 4;
// Represents a structured value.
Struct struct_value = 5;
// Represents a repeated `Value`.
ListValue list_value = 6;
}
}
Also, from the definition of google.protobuf.Value you can see that there are no int32, int64, or uint64 fields, only a double field. IMHO (correct me if I'm wrong), you might lose precision if the integer is very large. Normally, google.protobuf.Value is used with google.protobuf.Struct. Check google/protobuf/struct.proto for details.
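To make the precision concern concrete: a double has a 53-bit significand, so not every integer above 2^53 survives being stored in number_value. Here is a minimal sketch using only the Python standard library. No protobuf is involved, and the helper name through_double is made up for illustration.

```python
import struct

def through_double(n: int) -> int:
    """Round-trip an integer through an 8-byte IEEE 754 double,
    which is how a double field such as number_value stores it."""
    packed = struct.pack("<d", float(n))
    return int(struct.unpack("<d", packed)[0])

exact = 2**53        # every integer up to and including this is representable
inexact = 2**53 + 1  # the first integer a double cannot represent

print(through_double(exact) == exact)      # the value survives
print(through_double(inexact) == inexact)  # the value is rounded
```

A uint64 or int64 member in your own oneof avoids this, because it is encoded as an integer rather than converted to a double.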
I am facing a problem: I defined an enum in protobuf, but its value comes out as an int32.
Now I want some way to change all the protobuf-defined enum values to strings,
or any code hack for doing it in the gateway without changing the protobuf.
Enum defined
enum TimeUnit {
seconds = 0;
minutes = 1;
hours = 2;
days = 3;
months = 4;
}
message CacheDuration {
uint32 Value = 1;
TimeUnit Units = 2;
}
What I get from the generated code now is
This is the value returned to the front end, so they see Units as an int32 like this:
The services communicate using the generated protobuf structs.
I want to change it to
"Units":"days"
Thanks
You can use the String method in your Go code:
generatedTimeUnitEnum.String() // output: days
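The idea behind String() is just a lookup from the enum's number to its declared name. As a language-neutral illustration, here is a small Python stand-in. It is not the generated Go code; the TimeUnit class below is hand-written to mirror the enum in the question.

```python
import enum
import json

class TimeUnit(enum.IntEnum):
    # Hand-written mirror of the TimeUnit enum from the question.
    seconds = 0
    minutes = 1
    hours = 2
    days = 3
    months = 4

# What travels between services is the number; the name is looked up on demand.
units_on_the_wire = 3
payload = {"Value": 5, "Units": TimeUnit(units_on_the_wire).name}
print(json.dumps(payload))
```

In Go you would do the equivalent conversion at the point where the gateway builds the response for the front end.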
I'm designing a protobuf to represent an event, where each event can hold extra fields.
There are a lot of possible extra fields (~100), but only a small portion of them will be used in each message (~3)
Each extra field will be used only once, but multiple of them can exist, therefore I would like to have a concept of an anyof message type, but unfortunately, there is no such thing in protobuf.
So to try and mock this behavior, and as mentioned in this discussion I thought I can put all my extra fields in a oneof, wrap it with a message, and use this message as repeated in my event:
message ExtraField {
oneof extra_field_value {
string extraData1 = 1;
uint64 extraData2 = 2;
....
SomeOtherMessage extraData100 = 100;
}
}
message MyEvent {
uint64 timestamp = 1;
string event_name = 2;
string some_other_data = 3;
...
repeated ExtraField extra_fields = 8;
}
Even though this solution is more explicit to my understanding, it isn't the most memory-efficient, and a repeated message wrapping a oneof allows the same extra field to be added more than once (unwanted behavior).
I can also just write all the extra fields as-is in an inner message, but most of them will be empty all the time:
message ExtraFields {
string extraData1 = 1;
uint64 extraData2 = 2;
....
SomeOtherMessage extraData100 = 100;
}
message MyEvent {
uint64 timestamp = 1;
string event_name = 2;
string some_other_data = 3;
...
ExtraFields extra_fields = 8;
}
If I understand correctly, leaving fields empty in a message isn't going to make my serialized data larger, and therefore the second protobuf design is the preferred practice.
Am I correct?
Is there another protobuf design for my needs?
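On the size question, the wire format only carries fields that are actually set, so declared-but-empty fields cost nothing in the serialized bytes. The sketch below hand-rolls a tiny subset of the protobuf encoding (varints and length-delimited strings) to show this. It does not use the protobuf library, and the field contents are hypothetical.

```python
def varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode(fields: dict) -> bytes:
    """Encode {field_number: value}. ints become varint fields, str becomes
    length-delimited fields. None means unset and emits nothing."""
    out = bytearray()
    for number, value in sorted(fields.items()):
        if value is None:
            continue
        if isinstance(value, int):
            out += varint(number << 3 | 0) + varint(value)  # wire type 0
        else:
            data = value.encode("utf-8")
            out += varint(number << 3 | 2) + varint(len(data)) + data  # wire type 2
    return bytes(out)

# Hypothetical ExtraFields: 100 declared fields, only two of them set.
extra = {number: None for number in range(1, 101)}
extra[1] = "abc"
extra[2] = 7
print(len(encode(extra)))  # only the two set fields contribute bytes
```

One thing worth knowing when numbering ~100 fields: field numbers 1 to 15 need a one-byte tag, while 16 and above need at least two, so the most frequently used extras are best given the low numbers.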
What's the difference between using an Enum and a oneof kind in protobuf3? As far as I can tell, an Enum restricts the field to be one of a predefined set of values, but so does the oneof kind.
Enums are named numbers. You define the names in the enum definition and assign each one a value. An enum should always include the value zero in its set.
enum State {
A = 0;
B = 1;
C = 2;
}
Next, you can use this enum in any of your messages:
message Update {
State currentState = 1;
State previousState = 2;
}
A oneof is something very different. It allows you to send different types but only allocate limited memory for them, because you can only set one of those types at a time. This is similar to a union in C/C++ or a std::variant as of C++17.
Take this example, in which we have a message, an integer, and a double defined in our oneof.
// The message in our oneof
message someMsg {
// Multiple fields
}
// The message holding our oneof
message msgWithOneof {
oneof theOneof {
someMsg msg = 1;
int32 counter = 2;
double value = 3;
}
// Feel free to add more fields here or before the oneof
}
You can only have one of msg, counter, or value set at a time. Setting another one clears the field that was set before.
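The clearing behaviour can be sketched with a small hand-written class. This is only an illustration of the semantics, not what protoc generates. The class and method names are invented, and unlike real generated code (which returns the type's default for an unset member) it returns None.

```python
class OneofSketch:
    """Holds at most one of the named members, like a protobuf oneof."""

    MEMBERS = ("msg", "counter", "value")

    def __init__(self):
        self._which = None    # name of the member currently set, if any
        self._payload = None  # the single stored value

    def set(self, name, payload):
        if name not in self.MEMBERS:
            raise KeyError(name)
        # Storing a new member replaces whatever was stored before.
        self._which, self._payload = name, payload

    def which(self):
        return self._which

    def get(self, name):
        return self._payload if self._which == name else None

u = OneofSketch()
u.set("counter", 42)
u.set("value", 1.5)  # counter is no longer set after this
print(u.which(), u.get("counter"))
```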
Assuming a C/C++ implementation, the largest field determines the amount of memory allocated. Say someMsg is the largest: setting an integer or a double is not a problem, as they fit in the same space. If you do not use a oneof, the total memory allocated would be sizeof(someMsg) + sizeof(int32) + sizeof(double).
There is some overhead to keep track of which field has been set. In the Google C++ implementation this is a bit in the presence variable. This is similar to fields which are marked optional.
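The memory argument can be checked with Python's ctypes, which lays out unions and structs the way a C compiler would. This is a sketch of the C-level idea only, not of protobuf's generated classes. The 32-byte payload standing in for someMsg is an arbitrary assumption.

```python
import ctypes

class SomeMsg(ctypes.Structure):
    # Stand-in for someMsg: 32 bytes of payload (size chosen arbitrarily).
    _fields_ = [("payload", ctypes.c_char * 32)]

class AsUnion(ctypes.Union):
    # All members share one storage area, as in a C union.
    _fields_ = [
        ("msg", SomeMsg),
        ("counter", ctypes.c_int32),
        ("value", ctypes.c_double),
    ]

class AsStruct(ctypes.Structure):
    # Every member gets its own storage (plus any alignment padding).
    _fields_ = [
        ("msg", SomeMsg),
        ("counter", ctypes.c_int32),
        ("value", ctypes.c_double),
    ]

print(ctypes.sizeof(AsUnion), ctypes.sizeof(AsStruct))
```

The union is as large as its largest member, while the struct is at least the sum of all three.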
In the following example:
try (ParquetWriter<Example> writer =
new ProtoParquetWriter<>(
new Path("file:/tmp/foo.parquet"),
Example.class,
SNAPPY,
DEFAULT_BLOCK_SIZE,
DEFAULT_PAGE_SIZE)) {
writer.write(
Example.newBuilder()
.setTs(System.currentTimeMillis())
.setTenantId("tenant")
.setSomeFlag(false)
.setSomeInt(1)
.setOtherInt(0)
.build());
}
}
And example .proto file:
syntax = "proto3";
package com.example;
message Example {
uint64 ts = 1;
string tenantId = 2;
bool someFlag = 3;
int32 someInt = 4;
int32 otherInt = 2;
}
The resulting parquet file won't have the fields someFlag and otherInt because they are false and 0 respectively.
Is there a way to make it write it anyway or should I handle this on the reader side?
In proto3, presence tracking was historically not enabled, and the only presence rule was based on zero defaults. Fortunately this changed in recent versions of protoc: the optional keyword can now be used in front of fields in proto3 to enable presence tracking. So: add optional, and any compliant implementation should do what you want. The defaults are still zero/false/etc., but if fields are explicitly set, they are serialized.
syntax = "proto3";
package com.example;
message Example {
optional uint64 ts = 1;
optional string tenantId = 2;
optional bool someFlag = 3;
optional int32 someInt = 4;
optional int32 otherInt = 2; // [sic]
}
Also, the second field number 2 (on otherInt) should be a 5.
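The difference between the two presence rules can be sketched by hand. The function below is a simplified model of how an encoder decides whether to emit an integer field. It is not protobuf library code, and the names are invented for illustration.

```python
def varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_int_field(number: int, value: int, explicit_presence: bool,
                     is_set: bool = True) -> bytes:
    """Implicit presence (plain proto3 field): emit only when value != 0.
    Explicit presence (optional field): emit whenever the field was set."""
    emit = is_set if explicit_presence else value != 0
    if not emit:
        return b""
    return varint(number << 3) + varint(value)  # wire type 0 (varint)

# otherInt = 0, using field number 5 as corrected above.
print(encode_int_field(5, 0, explicit_presence=False))  # nothing is emitted
print(encode_int_field(5, 0, explicit_presence=True))   # tag byte plus a zero
```

Whether the Parquet writer then turns an explicitly set zero into a column value depends on that writer, so it is worth checking the output file after adding optional.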
Because there is no extend in proto3, I combine the base message with a google.protobuf.Any message, but its binary length is too long.
.proto file
message TradeMessage {
google.protobuf.Any message = 1;
string code = 2;
}
message Connect {
int32 seq = 1;
string appid = 2;
string clientid = 3;
string ver = 4;
}
...
.java file
TradeProtocol.Connect inner = TradeProtocol.Connect.newBuilder()
.setSeq(1)
.setAppid("test")
.build();
TradeProtocol.TradeMessage packet = TradeProtocol.TradeMessage.newBuilder()
.setMessage(Any.pack(inner))
.setCode("2") // code is declared as a string in the .proto
.build();
The service sends packet to the client, and the client can decode every message as the base TradeMessage. The problem is that inner is 8 bytes long, while packet is 56 bytes long. The same functionality implemented with proto2's extend takes only ten or so bytes. Is there any way to implement extend-like functionality in proto3, or to reduce the packet's length? Thanks.
One alternative is to use oneof:
message Connect {
int32 seq = 1;
string appid = 2;
string clientid = 3;
string ver = 4;
}
message TradeMessage {
string code = 1;
oneof inner {
Connect inner_connect = 2;
SomeOtherMessage inner_other = 3;
...
}
}
The encoded size will still be larger than with extend, but only by 1-2 bytes.
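To see where the bytes go, here is a hand-rolled sketch of the wire format for both layouts. Any stores a type URL string next to the serialized payload, and that string is most of the overhead. The full message name TradeProtocol.Connect in the URL is an assumption (the real package name isn't shown in the question), and the code field is left out of both variants to keep the comparison simple.

```python
def varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def varint_field(number: int, value: int) -> bytes:
    return varint(number << 3 | 0) + varint(value)

def bytes_field(number: int, payload: bytes) -> bytes:
    # Length-delimited: used for strings, bytes, and embedded messages.
    return varint(number << 3 | 2) + varint(len(payload)) + payload

# Connect{seq = 1, appid = "test"}
inner = varint_field(1, 1) + bytes_field(2, b"test")

# Any{type_url = 1, value = 2}; the message name in the URL is hypothetical.
type_url = b"type.googleapis.com/TradeProtocol.Connect"
any_msg = bytes_field(1, type_url) + bytes_field(2, inner)

via_any = bytes_field(1, any_msg)  # TradeMessage.message = 1 (Any layout)
via_oneof = bytes_field(2, inner)  # TradeMessage.inner_connect = 2 (oneof layout)

print(len(inner), len(via_any), len(via_oneof))  # 8 55 10
```

With the oneof, the wrapper costs only a tag byte and a length byte on top of the 8-byte inner message, which is where the "1-2 bytes" figure comes from.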