Many subclasses for one base class - protobuf performance - protocol-buffers

Right now I have an application where my iPhone app sends a request that is handled in .NET/C#, serialized to XML, and parsed on the app in Objective-C. The current response class structure has one base class (BaseResponse) and many (over 25) subclasses, one for each type of request, corresponding to the different things that need to be returned. I'm looking to see whether protobuf would be faster and easier than XML. From what I understand, a .proto file for this class structure is:
message BaseResponse {
required int32 field1 = 1;
optional SubResponse1 sub1 = 2;
optional SubResponse2 sub2 = 3;
// etc.
}
message SubResponse1 {
// ...
}
message SubResponse2 {
// ...
}
Etc for each sub response.
My question is: if I have over 25 of these optional elements (of which only 1 will be non null), does that completely wipe away the size and performance benefit of using protobuf? Does protobuf make sense for this application?

No, it does not impact the performance benefit - you'll just need to check which one is non-null in the Objective-C code. Since protobuf only serializes the non-null values, it will still be very efficient over the wire. The protobuf specification itself doesn't actually include inheritance, so you are right to say that you need to spoof it via encapsulation - but since you mention C#, note that what you have described (including how the data appears on the wire, i.e. it will be 100% compatible) can be done directly via inheritance if you use protobuf-net as the C# implementation - which should be possible with your existing model. For example:
[ProtoContract]
[ProtoInclude(2, typeof(SubResponse1))]
[ProtoInclude(3, typeof(SubResponse2))]
public class BaseResponse
{
// note Name and IsRequired here are optional - only
// included to match your example
[ProtoMember(1, IsRequired = true, Name="field1")]
public int Field1 { get; set; }
/*...*/
}
[ProtoContract]
public class SubResponse1 : BaseResponse
{/*...*/}
[ProtoContract]
public class SubResponse2 : BaseResponse
{/*...*/}
You can get the .proto via:
var proto = Serializer.GetProto<BaseResponse>();
Which gives:
message BaseResponse {
required int32 field1 = 1 [default = 0];
// the following represent sub-types; at most 1 should have a value
optional SubResponse1 SubResponse1 = 2;
optional SubResponse2 SubResponse2 = 3;
}
message SubResponse1 {
}
message SubResponse2 {
}
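To illustrate the point that only the populated field reaches the wire, here is a minimal Python sketch of the protobuf wire encoding (varint keys and length-delimited sub-messages), written by hand rather than with any protobuf library. The helper names and the use of field 26 for a hypothetical 25th sub-response are assumptions made for illustration; they are not part of the model above.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """A field key is (field_number << 3) | wire_type, varint-encoded."""
    return encode_varint((field_number << 3) | wire_type)


def encode_base_response(field1: int, sub_field_number: int, sub_payload: bytes) -> bytes:
    """Encode a BaseResponse in which exactly one sub-response is set.

    Unset optional fields are simply absent from the output, so the number
    of declared sub-response fields does not affect the encoded size.
    """
    out = encode_key(1, 0) + encode_varint(field1)        # field1, wire type 0 (varint)
    out += encode_key(sub_field_number, 2)                # sub-message, wire type 2
    out += encode_varint(len(sub_payload)) + sub_payload  # length prefix + body
    return out


# sub1 (field 2) set, with an empty SubResponse1 body.
small = encode_base_response(field1=5, sub_field_number=2, sub_payload=b"")
# Hypothetical 25th sub-response at field 26, also with an empty body.
late = encode_base_response(field1=5, sub_field_number=26, sub_payload=b"")

print(small.hex(), len(small))  # 08051200 4
print(late.hex(), len(late))    # 0805d20100 5
```

The only per-field cost visible here is that field numbers 16 and above need a two-byte key, so the sub-response at field 26 is one byte larger than the one at field 2.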

Related

How to organize proto file to re-use message if possible?

I recently started working with protobuf in my golang project. I created the simple protobuf file below. I have three different endpoints.
GetLink takes CustomerRequest as an input parameter and returns back CustomerResponse
GetBulkLinks takes BulkCustomerRequest as an input parameter and returns back BulkCustomerResponse
StreaLinks takes StreamRequest as an input parameter and returns back CustomerResponse
I am wondering if there is any way to improve the proto file below, because CustomerRequest and BulkCustomerRequest have almost everything in common except the resources field, so there is duplication. The same goes for the StreamRequest input parameter, as it only takes clientId. Is there anything in protocol buffer which can reuse stuff from another message type?
Is there a better or more efficient way to organize the proto file below so that it reuses messages accordingly?
syntax = "proto3";
option go_package = "github.com/david/customerclient/gen/go/data/v1";
package data.v1;
service CustomerService {
rpc GetLink(CustomerRequest) returns (CustomerResponse) {};
rpc GetBulkLinks(BulkCustomerRequest) returns (BulkCustomerResponse) {};
rpc StreaLinks(StreamRequest) returns (CustomerResponse) {};
}
message CustomerRequest {
int32 clientId = 1;
string resources = 2;
bool isProcess = 3;
}
message BulkCustomerRequest {
int32 clientId = 1;
repeated string resources = 2;
bool isProcess = 3;
}
message StreamRequest {
int32 clientId = 1;
}
message CustomerResponse {
string value = 1;
string info = 2;
string baseInfo = 3;
string link = 4;
}
message BulkCustomerResponse {
map<string, CustomerResponse> customerResponse = 1;
}
Is there anything in protocol buffer which can reuse stuff from another message type?
Composition.
However, keep in mind that request and response payloads may change over time. Even if it looks like they have something in common today, they may diverge tomorrow. After all, they are used in different RPCs. Then excessive coupling achieves the opposite effect and becomes technical debt.
Since your schema has no more than three fields in each request message, I would leave everything as is. Anyway, if you really must, you may consider the following:
Extract the GetLink and GetBulkLinks common fields in a separate message and compose with that:
message CustomerRequestParams {
int32 clientId = 1;
bool isProcess = 2;
}
message CustomerRequest {
CustomerRequestParams params = 1;
string resources = 2;
}
message BulkCustomerRequest {
CustomerRequestParams params = 1;
repeated string resources = 2;
}
StreamRequest looks just fine with repeating clientId. Arguably a stream is conceptually different from a unary RPC, so just keep them separated. And it's just one field.
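One thing worth checking before applying the composition above to a deployed service: moving clientId and isProcess into a nested message changes the wire layout, so the composed CustomerRequest is not byte-compatible with the flat one. A rough hand-rolled sketch of the encoding (no protobuf library involved; the helper names and sample values are made up for illustration):

```python
def varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def varint_field(number: int, value: int) -> bytes:
    """Key with wire type 0, followed by the varint value."""
    return varint((number << 3) | 0) + varint(value)


def bytes_field(number: int, payload: bytes) -> bytes:
    """Key with wire type 2, followed by a length prefix and the payload.

    Strings and nested messages are both encoded this way.
    """
    return varint((number << 3) | 2) + varint(len(payload)) + payload


# Flat CustomerRequest: clientId = 1, resources = 2, isProcess = 3
flat = varint_field(1, 42) + bytes_field(2, b"abc") + varint_field(3, 1)

# Composed CustomerRequest: params = 1 { clientId = 1, isProcess = 2 }, resources = 2
params = varint_field(1, 42) + varint_field(2, 1)
composed = bytes_field(1, params) + bytes_field(2, b"abc")

print(flat.hex(), len(flat))          # 082a12036162631801 9
print(composed.hex(), len(composed))  # 0a04082a10011203616263 11
```

The nesting costs two extra bytes here (a key and a length prefix), which is negligible; the more important consequence is that old and new clients cannot read each other's requests.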

Is there a way to use a proto oneof field as a type in another message?

Suppose I have a proto message like this:
message WorkflowParameters {
oneof parameters {
WorkflowAParams a = 1;
WorkflowBParams b = 2;
}
}
And I want to have another message where the type of workflow can be specified. Something like this:
message ListWorkflowsRequest {
// The type of workflows to fetch
WorkflowParameters.parameters workflow_type = 1;
}
The above doesn't work (it throws "WorkflowParameters.parameters" is not a type.) What's the recommended way of doing this?
It's not possible. oneof is only thin syntactic sugar/a behavior change, and has no effect on the actual schema. It affects the generated code's behavior, but not the serialized format. In the following example, these two messages are interchangeable and wire-compatible:
message WorkflowParameters {
oneof parameters {
WorkflowAParams a = 1;
WorkflowBParams b = 2;
}
}
message WorkflowParameters2 {
WorkflowAParams a = 1;
WorkflowBParams b = 2;
}
Now, if you just want to specify which part of a oneof will be set, you could theoretically use the generated code constants, and a simple int field:
message ListWorkflowsRequest {
// The field number of WorkflowParameters that should be filled.
int32 workflow_type = 1;
}
All language generators should have convenient enough constants created, like WorkflowParameters::A_FIELD_NUMBER for C++.
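As a rough sketch of the int-field approach on the receiving side, assuming hand-written Python stand-ins rather than real generated code (the enum and dataclass below are hypothetical and only mirror the field numbers from the .proto above):

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class WorkflowParametersField(IntEnum):
    """Hand-written stand-in for generated constants such as A_FIELD_NUMBER."""
    A = 1  # WorkflowAParams a = 1;
    B = 2  # WorkflowBParams b = 2;


@dataclass
class ListWorkflowsRequest:
    """Stand-in for the message declaring `int32 workflow_type = 1;`."""
    workflow_type: int = 0  # 0 is the proto3 default when the field is unset


def requested_field(request: ListWorkflowsRequest) -> Optional[WorkflowParametersField]:
    """Map the raw int back to a known oneof member, or None if unset/unknown."""
    try:
        return WorkflowParametersField(request.workflow_type)
    except ValueError:
        return None


print(requested_field(ListWorkflowsRequest(workflow_type=2)))
print(requested_field(ListWorkflowsRequest()))
```

Because a plain int32 accepts any value, the receiver has to decide what 0 (unset) and unrecognized numbers mean; the schema itself will not reject them.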

In a proto3 message is there a way to mark a field as not required for requests and required for response?

I have the following proto3 message structure:
message BaseBuildContent {
string locale = 1;
string buildVersion = 2;
string buildLabel = 3;
google.protobuf.Timestamp createTime = 4;
}
I am using the "same" structure for some requests and responses in my app. What I want to achieve is to mark somehow (if possible) the createTime field as not required when we are talking about a request object, and required when we are talking about a response object.
Is it possible to do this without creating a separate message ?
Thanks
To my knowledge, it's not possible and I'd discourage pursuing solutions other than defining distinct message types: one which includes the optional field and one which does not.
One way to solve this is to define a message that includes the mandatory fields and another that extends it:
message BaseBuildContent {
string locale = 1;
string buildVersion = 2;
string buildLabel = 3;
}
message SomeRequest {
BaseBuildContent content = 1;
}
message SomeResponse {
BaseBuildContent content = 1;
google.protobuf.Timestamp createTime = 2;
}
NOTE: The Protobuf style guide recommends that message names be PascalCase and field names be snake_case.
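With the split above, the request type simply has no createTime to set. Since proto3 has no required keyword, "required on the response" still has to be checked by the application. A minimal sketch, using plain Python dataclasses as stand-ins for the generated classes (the validate_response helper is hypothetical, not something protobuf provides):

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class BaseBuildContent:
    locale: str = ""
    build_version: str = ""
    build_label: str = ""


@dataclass
class SomeRequest:
    content: BaseBuildContent  # no create_time field exists on requests


@dataclass
class SomeResponse:
    content: BaseBuildContent
    create_time: Optional[datetime] = None  # message-typed fields may be absent


def validate_response(response: SomeResponse) -> None:
    """Application-level check standing in for a 'required' constraint."""
    if response.create_time is None:
        raise ValueError("SomeResponse.create_time must be set")


content = BaseBuildContent(locale="en-US", build_version="1.2.3", build_label="nightly")
request = SomeRequest(content=content)
response = SomeResponse(content=content, create_time=datetime.now(timezone.utc))
validate_response(response)  # passes; an unset create_time would raise ValueError
```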

How are the ints of ProtoMember/ProtoInclude shared/placed in ProtoBuf?

I've several questions on how/where the ID of a [ProtoContract] should be declared.
Imagine the following code:
[ProtoContract]
[ProtoInclude(100, typeof(SomeClassA))]//1) CAN I USE 1 here?
public abstract class RootClass{
[ProtoMember(1)]
public int NodeId {get;set;}
}
[ProtoContract]
[ProtoInclude(200, typeof(SomeClassC))]//2) Should I declare this here or directly on the RootClass?
//3) Can I use the id 100 here?
//4) Can I use the id 1 here? or member + include share the id?
public class SomeClassA : RootClass{
[ProtoMember(1)]//5) CAN I USE 1 here? Since the parent already use it but it's a different class
public String Name{get;set;}
}
[ProtoContract]
public class SomeClassC : SomeClassA {
[ProtoMember(2)]
public int Count{get;set;}
}
[ProtoContract]
public class SomeClassD : SomeClassA {
[ProtoMember(2)] //6) Can I use 2 here? Since SomeClassC already use it and is a sibling?
public int Count{get;set;}
}
I've numbered several questions in the code:
1) Can I use 1 here?
2) Should I declare this here or directly on the RootClass?
3) Can I use the id 100 here?
4) Can I use the id 1 here? Or do members and includes share the ids?
5) Can I use 1 here, since the parent already uses it but it's a different class?
6) Can I use 2 here, since SomeClassC already uses it and is a sibling?
The thing is that we have a huge model with a lot of classes, which all inherit from the same object, so I'm trying to figure out which IDs I need to take care with.
Short version:
the set of field numbers for a type is the union of the numbers defined against members (fields and properties), and the numbers defined for immediate subtypes (includes)
the set of field numbers must be unique within that single type - it is not required to consider base types or derived types
Longer version:
The reason for this is that subtypes are essentially mapped as optional fields:
[ProtoContract]
[ProtoInclude(100, typeof(SomeClassA))]
public abstract class RootClass{
[ProtoMember(1)]
public int NodeId {get;set;}
}
[ProtoContract]
[ProtoInclude(200, typeof(SomeClassC))]
public class SomeClassA : RootClass{
[ProtoMember(1)]
public String Name{get;set;}
}
[ProtoContract]
public class SomeClassC : SomeClassA {
[ProtoMember(2)]
public int Count{get;set;}
}
is, in terms of proto2 syntax:
message RootClass {
optional int32 NodeId = 1;
optional SomeClassA _notNamed = 100;
}
message SomeClassA {
optional string Name = 1;
optional SomeClassC _notNamed = 200;
}
message SomeClassC {
optional int32 Count = 2;
}
Note that at most 1 sub-type field will be used, so it can be considered oneof for the purposes of .proto. Any fields relating to the sub-type will be included in message SomeClassA, so there is no conflict with RootClass and they do not need to be unique. The numbers only need to be unique per message in the .proto sense.
To take the specific questions, then:
1) no, because that would conflict with NodeId
2) it should be declared on SomeClassA; protobuf-net is only expecting immediate descendants, and it keeps the numbering consistent and conveniently readable, since the field number is only required to not conflict with the members of SomeClassA
3) yes you can; there is no conflict
4) no, because that would conflict with Name
5) yes you can; there is no conflict
6) yes you can; there is no conflict - although actually protobuf-net won't even think of SomeClassD as a sibling anyway (it isn't advertised anywhere as an include) - but if there was a [ProtoInclude(201, typeof(SomeClassD))] on SomeClassA, then it would be fine. This would change our .proto to add:
optional SomeClassD _alsoNotNamed = 201;
to message SomeClassA, and add:
message SomeClassD {
optional int32 Count = 2;
}
Note that protobuf-net doesn't actually generate the .proto syntax unless you explicitly ask for it (via GetSchema<T> etc) - I'm including it purely for illustrative purposes in terms of the underlying protobuf concepts.
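The uniqueness rule from the short version can be expressed as a small check. This is only a Python sketch of the rule as described here, not how protobuf-net validates a model; the find_conflicts helper and the dictionary layout are assumptions made for illustration.

```python
from typing import Dict, List


def find_conflicts(members: Dict[str, int], includes: Dict[str, int]) -> List[int]:
    """Return field numbers used more than once within a single type.

    `members` maps member name -> ProtoMember number; `includes` maps
    immediate sub-type name -> ProtoInclude number. Base types and
    derived types are deliberately not consulted.
    """
    seen = set()
    conflicts = set()
    for number in list(members.values()) + list(includes.values()):
        if number in seen:
            conflicts.add(number)
        seen.add(number)
    return sorted(conflicts)


# The model from the question: (members, includes) per type.
model = {
    "RootClass": ({"NodeId": 1}, {"SomeClassA": 100}),
    "SomeClassA": ({"Name": 1}, {"SomeClassC": 200}),
    "SomeClassC": ({"Count": 2}, {}),
    "SomeClassD": ({"Count": 2}, {}),
}

for type_name, (members, includes) in model.items():
    print(type_name, find_conflicts(members, includes))  # each prints an empty list

# Question 1: using 1 for the include on RootClass collides with NodeId.
print(find_conflicts({"NodeId": 1}, {"SomeClassA": 1}))  # [1]
```

Reusing 1 in RootClass and SomeClassA, or 2 in SomeClassC and SomeClassD, never shows up as a conflict because each type is checked on its own.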

Incorrect naming convention for fields in .proto files generated by protobuf-net?

I'm just getting started with Google Protocol Buffers and Marc Gravell's awesome protobuf-net program, and one thing I don't understand is the naming convention for the field declarations in a generated .proto file.
Here's what Google is recommending:
"Use underscore_separated_names for field names – for example, song_name."
https://developers.google.com/protocol-buffers/docs/style
"Note that method names always use camel-case naming, even if the field name in the .proto file uses lower-case with underscores (as it should)."
https://developers.google.com/protocol-buffers/docs/reference/java-generated
"Notice how these accessor methods use camel-case naming, even though the .proto file uses lowercase-with-underscores."
https://developers.google.com/protocol-buffers/docs/javatutorial
But when I use the Serializer.GetProto() method in protobuf-net on this:
[ProtoContract]
public partial class AuthEntry
{
private string _windowsAccount = "";
private string _machineNames = "*";
[ProtoMember(1)]
public string WindowsAccount
{
get { return _windowsAccount; }
set { _windowsAccount = value; }
}
[ProtoMember(2)]
public string MachineNames
{
get { return _machineNames; }
set { _machineNames = value; }
}
}
I get this:
message AuthEntry {
optional string WindowsAccount = 1;
optional string MachineNames = 2;
}
Instead of this, as I'd expected:
message AuthEntry {
optional string windows_account = 1;
optional string machine_names = 2;
}
I'm guessing it's no big deal, but just in case ...
The proto generation doesn't attempt to apply those conventions, because then it gets into the arms race of disambiguation, collisions, etc - not to mention the fun of finding word breaks in arbitrary names like CustomerIDReference (ok, that's an unlikely example, but you get the point). If you want to control that yourself - specify the Name property on either ProtoContractAttribute or ProtoMemberAttribute.
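To make the word-break problem concrete, here is a small Python sketch with two plausible conversion rules. Both functions are illustrative assumptions, not anything protobuf-net does; they agree on simple names and disagree on CustomerIDReference.

```python
import re


def snake_split_every_capital(name: str) -> str:
    """Insert '_' before every capital letter except the first character."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def snake_keep_acronyms(name: str) -> str:
    """Treat a run of capitals as one word.

    The run is split before its last letter when a lowercase letter
    follows, so 'IDReference' becomes 'ID' + 'Reference'.
    """
    step = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    step = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step)
    return step.lower()


for name in ("WindowsAccount", "MachineNames", "CustomerIDReference"):
    print(name, snake_split_every_capital(name), snake_keep_acronyms(name))
# WindowsAccount windows_account windows_account
# MachineNames machine_names machine_names
# CustomerIDReference customer_i_d_reference customer_id_reference
```

Neither rule is obviously right for every name, which is why setting Name explicitly is the predictable option.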
