Avoiding code duplication in protobuf design

message Foo3D {
  optional string sha = 1;
  optional Type type = 2;
  optional Vector3D field_1 = 4;
  optional Vector3D field_2 = 5;
}
message Foo2D {
  optional string sha = 1;
  optional Type type = 2;
  optional Vector2D field_1 = 4;
  optional Vector2D field_2 = 5;
}
message FooData {
  optional Foo2D foo_2d = 1;
  optional Foo3D foo_3d = 2;
}
message AllFooData {
  repeated FooData foo_data = 1;
}
I'm serializing a templated data type that can have dimension 2 or 3 but otherwise has identical functionality. Is there a more concise way to express its protobuf definition and avoid the duplication? Perhaps by creating a wrapper for field_1 and field_2? The dimensionality of field_1 and field_2 can be deduced from the type field.

I think this is the best way to reuse as much as possible and avoid duplication:
message FooData {
  optional string sha = 1;
  optional Type type = 2;
  message Data2D {
    optional Vector2D field_1 = 1;
    optional Vector2D field_2 = 2;
  }
  message Data3D {
    optional Vector3D field_1 = 1;
    optional Vector3D field_2 = 2;
  }
  optional Data2D foo2d = 3;
  optional Data3D foo3d = 4;
}
message AllFooData {
  repeated FooData foo_data = 1;
}
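On the C++ side, the same layout (a shared sha/type header declared once, plus a dimension-specific payload) maps naturally onto a template and std::variant. This is a minimal plain-C++ sketch under stated assumptions: Data<N>, the int stand-in for the Type enum, and the dimension() helper are hypothetical, and these are not the classes protoc generates.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>

// Hypothetical in-memory mirror of Data2D / Data3D (not protoc output).
template <std::size_t N>
struct Data {
    std::array<double, N> field_1{};
    std::array<double, N> field_2{};
};

// Hypothetical mirror of FooData: the header fields appear exactly once.
struct FooData {
    std::string sha;
    int type = 0;                            // stand-in for the Type enum
    std::variant<Data<2>, Data<3>> payload;  // either 2D or 3D, never both
};

// The variant's active alternative already records the dimensionality.
std::size_t dimension(const FooData& foo) {
    return std::holds_alternative<Data<2>>(foo.payload) ? 2 : 3;
}
```

Because std::variant holds exactly one alternative, it also guarantees that only one payload exists; if the schema should enforce the same rule, foo2d and foo3d could be placed in a oneof.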

Related

Set multiple fields as one element of oneof

I want to define a message that can have two fields (field a AND field b) XOR one other field (field c alone). I saw that the oneof keyword expresses an XOR, but only between individual fields.
How can I express this?
Ideally I want something like this (which is not valid syntax):
syntax = "proto3";
message M {
  oneof name {
    {
      string a = 1;
      string b = 2;
    }
    string c = 3;
  }
}
The only way I know of is to put the two fields a and b into a separate submessage, which you can then put inside the oneof:
syntax = "proto3";
message A {
  string a = 1;
  string b = 2;
}
message M {
  oneof name {
    A a = 1;
    string c = 3;
  }
}
Alternatively, you can put all three fields directly into M without a oneof, document the constraint in comments, and enforce it manually in application code.
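The manual check for the flat alternative could look like the following sketch. It assumes a hypothetical struct FlatM in which std::optional stands in for field presence (for example, fields declared with proto3 optional); FlatM and isValid() are illustrative names, not generated code.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical flat version of M; std::optional models "field is set".
struct FlatM {
    std::optional<std::string> a;
    std::optional<std::string> b;
    std::optional<std::string> c;
};

// Enforces "(a AND b) XOR c" by hand, since the schema no longer does it.
bool isValid(const FlatM& m) {
    const bool pair_set = m.a.has_value() && m.b.has_value();
    const bool pair_empty = !m.a.has_value() && !m.b.has_value();
    const bool c_set = m.c.has_value();
    return (pair_set && !c_set) || (pair_empty && c_set);
}
```

A half-filled pair (only a or only b) is rejected as well, which the oneof-of-submessage version cannot express either, since A's own fields are independent.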

ProtoParquetWriter doesn't write falses, 0s and empty strings

In the following example:
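import static org.apache.parquet.hadoop.ParquetWriter.DEFAULT_BLOCK_SIZE;
import static org.apache.parquet.hadoop.ParquetWriter.DEFAULT_PAGE_SIZE;
import static org.apache.parquet.hadoop.metadata.CompressionCodecName.SNAPPY;

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.proto.ProtoParquetWriter;

// Example is the class protoc generates from the .proto file below.
public class WriteExample {
  public static void main(String[] args) throws IOException {
    try (ParquetWriter<Example> writer =
        new ProtoParquetWriter<>(
            new Path("file:/tmp/foo.parquet"),
            Example.class,
            SNAPPY,
            DEFAULT_BLOCK_SIZE,
            DEFAULT_PAGE_SIZE)) {
      writer.write(
          Example.newBuilder()
              .setTs(System.currentTimeMillis())
              .setTenantId("tenant")
              .setSomeFlag(false)
              .setSomeInt(1)
              .setOtherInt(0)
              .build());
    }
  }
}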
And the example .proto file:
syntax = "proto3";
package com.example;
message Example {
  uint64 ts = 1;
  string tenantId = 2;
  bool someFlag = 3;
  int32 someInt = 4;
  int32 otherInt = 2;
}
The resulting parquet file won't have the fields someFlag and otherInt because they are false and 0 respectively.
Is there a way to make it write them anyway, or should I handle this on the reader side?
In proto3, field presence was historically not tracked for scalar fields: a field counted as set only if it held a non-default value, so an explicit false, 0 or empty string was indistinguishable from an unset field and was not serialized. Fortunately this changed in newer versions of protoc (experimental in 3.12, enabled by default since 3.15): the optional keyword can now be used in front of proto3 fields to enable presence tracking. So add optional, and any compliant implementation should do what you want. The defaults are still zero/false/etc., but if a field is explicitly set, it is serialized.
syntax = "proto3";
package com.example;
message Example {
  optional uint64 ts = 1;
  optional string tenantId = 2;
  optional bool someFlag = 3;
  optional int32 someInt = 4;
  optional int32 otherInt = 2; // [sic]
}
Also, the second field number 2 (on otherInt) should be 5: field numbers must be unique within a message.
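The difference between the two presence rules can be modeled in plain C++. This is a hypothetical sketch, not parquet-protobuf or protobuf code: ImplicitExample mimics the old proto3 rule by comparing each field against its zero default, while ExplicitExample uses std::optional as a stand-in for the has-bit that proto3 optional adds. The emitted() helpers are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Implicit presence: a field only counts as set if it differs from zero/false.
struct ImplicitExample {
    bool someFlag = false;
    std::int32_t otherInt = 0;
};

// Explicit presence (proto3 optional): "set" is tracked separately from the value.
struct ExplicitExample {
    std::optional<bool> someFlag;
    std::optional<std::int32_t> otherInt;
};

// Names of the fields a writer would emit under the implicit rule.
std::vector<std::string> emitted(const ImplicitExample& m) {
    std::vector<std::string> out;
    if (m.someFlag) out.push_back("someFlag");
    if (m.otherInt != 0) out.push_back("otherInt");
    return out;
}

// Names of the fields a writer would emit under the explicit rule.
std::vector<std::string> emitted(const ExplicitExample& m) {
    std::vector<std::string> out;
    if (m.someFlag.has_value()) out.push_back("someFlag");
    if (m.otherInt.has_value()) out.push_back("otherInt");
    return out;
}
```

Setting someFlag = false and otherInt = 0 yields nothing under the implicit rule but both fields under the explicit one, which is exactly the behavior the question is after.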

What's the difference between google.protobuf.Any and google.protobuf.Value?

I want to serialize int/int64/double/float/uint32/uint64 values into protobuf. Which one should I use, and which one is more efficient?
For example:
import "google/protobuf/any.proto";
import "google/protobuf/struct.proto";

message Test {
  google.protobuf.Any any = 1;      // solution 1
  google.protobuf.Value value = 2;  // solution 2
}
message Test {  // solution 3
  oneof Data {
    uint32 int_value = 1;
    double double_value = 2;
    bytes string_value = 3;
    ...
  }
}
In your case, you are better off using a oneof.
You cannot pack a built-in type such as double, int32 or int64 into google.protobuf.Any, or unpack one from it. Any can only hold a message, i.e. a class derived from google::protobuf::Message.
google.protobuf.Value is, in fact, a wrapper around a oneof:
message Value {
  // The kind of value.
  oneof kind {
    // Represents a null value.
    NullValue null_value = 1;
    // Represents a double value.
    double number_value = 2;
    // Represents a string value.
    string string_value = 3;
    // Represents a boolean value.
    bool bool_value = 4;
    // Represents a structured value.
    Struct struct_value = 5;
    // Represents a repeated `Value`.
    ListValue list_value = 6;
  }
}
Also, the definition of google.protobuf.Value shows that there are no int32, int64 or uint64 fields, only a double field. A double has a 53-bit significand, so 64-bit integers whose magnitude exceeds 2^53 can lose precision. Normally, google.protobuf.Value is used together with google.protobuf.Struct; see google/protobuf/struct.proto for details.
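A quick way to see the precision loss is to round-trip a uint64 through a double, which is effectively what storing it in number_value does. roundTripThroughDouble() is a hypothetical helper for illustration; it assumes inputs well below UINT64_MAX, because values that round up to 2^64 cannot be converted back into a uint64_t.

```cpp
#include <cassert>
#include <cstdint>

// Stores v the way Value.number_value would (as a double) and converts it back.
// Assumes v is well below UINT64_MAX so the result still fits in uint64_t.
std::uint64_t roundTripThroughDouble(std::uint64_t v) {
    const double number_value = static_cast<double>(v);
    return static_cast<std::uint64_t>(number_value);
}
```

With IEEE-754 doubles, small values and 2^53 itself survive the round trip, but 2^53 + 1 does not (it comes back as 2^53 on typical platforms). A oneof with a uint64 member keeps the exact value.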

How to replace proto2 extensions with proto3 Any when extensions add different numbers of fields?

I'm trying to learn proto3, and I have some questions about Any.
I use extensions a lot. Suppose my proto looks like this:
message base {
  extensions 1 to 100;
}
// a.proto
extend base {
  optional int32 a = 1;
  optional int32 b = 2;
}
// b.proto
extend base {
  optional string c = 3;
  optional string d = 4;
  optional string e = 5;
  optional string f = 6;
}
Then how do I replace these extensions with Any? Must I write something like this?
import "google/protobuf/any.proto";
message base {
  google.protobuf.Any a = 1;
  google.protobuf.Any b = 2;
  google.protobuf.Any c = 3;
  google.protobuf.Any d = 4;
}
Many protos may extend base.proto, and I cannot know in advance the largest extension number any of them uses. How can I replace these extensions with Any?
If I have to declare Any fields 1 through 100 in message base, that would be terrible!
You would typically structure it like this:
import "google/protobuf/any.proto";

message base {
  google.protobuf.Any submsg = 1;
}
// a.proto
message submsg_a {
  optional int32 a = 1;
  optional int32 b = 2;
}
// b.proto
message submsg_b {
  optional string c = 1;
  optional string d = 2;
  optional string e = 3;
  optional string f = 4;
}
Then pack either a submsg_a or a submsg_b into the Any field.
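Under the hood, an Any is a type URL plus the serialized bytes of the packed message, and the receiver decides which submessage to unpack by looking at the type URL (its last path segment is the fully qualified message name). The sketch below models that dispatch in plain C++; AnyModel, packedTypeName() and dispatch() are hypothetical, and with the real C++ library you would use Any::PackFrom(), Any::Is<T>() and Any::UnpackTo() instead of parsing the URL by hand.

```cpp
#include <cassert>
#include <string>

// Hypothetical plain-C++ model of google.protobuf.Any.
struct AnyModel {
    std::string type_url;  // e.g. "type.googleapis.com/submsg_a" (no package assumed)
    std::string value;     // serialized payload, left opaque here
};

// The fully qualified type name is the part after the last '/'.
std::string packedTypeName(const AnyModel& any) {
    const auto slash = any.type_url.rfind('/');
    return slash == std::string::npos ? any.type_url : any.type_url.substr(slash + 1);
}

// Hypothetical dispatcher over the two submessages from the answer.
int dispatch(const AnyModel& any) {
    const std::string name = packedTypeName(any);
    if (name == "submsg_a") return 1;  // would unpack into a submsg_a
    if (name == "submsg_b") return 2;  // would unpack into a submsg_b
    return 0;                          // unknown type: ignore or report
}
```

Because new submessage types only add new names, base never needs to reserve field numbers for them, which is what replaces the extensions 1 to 100 range.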

How to write an inline array of protobuf

I use Google's Protocol Buffers, and my message has lots of fields like the following:
optional string force_sampling = 1;
optional string status = 2;
optional string host = 3;
optional string server_addr = 4;
optional string server_port = 5;
optional string client_addr = 6;
optional string request = 7;
optional string msec = 8;
optional string request_time = 9;
optional string logid = 10;
optional string request_body = 11;
optional string response_body = 12;
optional string other = 100;
So when I fill in a message, I write many calls like the following:
set_logid(...); set_request_body(...); set_other(...); etc.
Is there an easier way to do that?
For example, something like:
array way = {"set_logid", "set_other"};
for (int i = 0; i < len; ++i)
{
  sample.way[i]();
}
By the way, set_logid is an inline function.
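You can call Message::GetReflection() and use the returned Reflection object to access fields by a name given as a string: look up the FieldDescriptor with GetDescriptor()->FindFieldByName(name), then pass it to a setter such as Reflection::SetString().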
The documentation is here:
https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.message#Reflection
However, this turns out to be slower and more complex than calling the generated setters directly, so it might not be worth it.
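If the goal is just to drive several setters from an array, a reflection-free alternative (a different technique from the one suggested above) is a table of lambdas wrapping the setters. In this sketch Sample is a hypothetical stand-in for the generated class, and setters() and fillAll() are illustrative names; with a real message you would call its actual set_* methods inside the lambdas.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the generated class; only the setter names mirror the question.
struct Sample {
    std::string logid, other, request_body;
    void set_logid(const std::string& v) { logid = v; }
    void set_other(const std::string& v) { other = v; }
    void set_request_body(const std::string& v) { request_body = v; }
};

using Setter = std::function<void(Sample&, const std::string&)>;

// Table of (field name, setter) pairs, built once.
const std::vector<std::pair<std::string, Setter>>& setters() {
    static const std::vector<std::pair<std::string, Setter>> table = {
        {"logid", [](Sample& s, const std::string& v) { s.set_logid(v); }},
        {"other", [](Sample& s, const std::string& v) { s.set_other(v); }},
        {"request_body", [](Sample& s, const std::string& v) { s.set_request_body(v); }},
    };
    return table;
}

// Applies the setters in table order, taking values from a parallel vector.
void fillAll(Sample& s, const std::vector<std::string>& values) {
    const auto& table = setters();
    for (std::size_t i = 0; i < table.size() && i < values.size(); ++i) {
        table[i].second(s, values[i]);
    }
}
```

Wrapping each setter in a lambda avoids taking member-function pointers to real generated setters, which are typically overloaded or templated, so a plain &Message::set_logid would be ambiguous. Unlike reflection, every call still goes through the generated, inlinable setter.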
