Should I use Get methods to get values or should I use fields directly?

I'm using protobuf (and protoc) for the first time with Go.
message MyProtoStruct {
  string description = 1;
}
I'm a little bit confused:
should I use methods to get values (like MyProtoStruct.GetDescription()) or
should I use fields directly (like MyProtoStruct.Description)?

You can use either. Note that in proto2 generated code, as opposed to proto3 (proto2 is the default when no syntax is declared), fields in protocol buffer messages are always pointers. In that case, the getters return a zero value if the field is nil. That's very convenient, since it's quite difficult to write code that uses the fields directly without causing nil pointer dereferences when a field is missing.
For proto3 generated code (which I'd suggest you use, for more than one reason), accessing fields directly is fine. For proto2 generated code, prefer the Get methods.
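To see why the getters matter for proto2-style code, here is a minimal hand-written Go sketch of the pattern protoc-gen-go produces; the struct and method are illustrative, not actual generated code:

```go
package main

import "fmt"

// MyProtoStruct mimics proto2-style generated code: scalar fields
// are pointers, so "unset" can be distinguished from the zero value.
type MyProtoStruct struct {
	Description *string
}

// GetDescription is nil-safe: it returns the zero value when the
// message or the field is unset, like the generated getters do.
func (m *MyProtoStruct) GetDescription() string {
	if m == nil || m.Description == nil {
		return ""
	}
	return *m.Description
}

func main() {
	var m *MyProtoStruct                  // nil message
	fmt.Println(m.GetDescription() == "") // safe: the getter handles nil

	d := "hello"
	m = &MyProtoStruct{Description: &d}
	fmt.Println(m.GetDescription())
	// fmt.Println(*m.Description) // direct access panics if the field is nil
}
```

Calling a method on a nil pointer receiver is legal in Go, which is exactly what makes the generated getters safer than direct field access here.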

Related

Is it good practice in protobuf3 to use optional to check nullability?

I noticed that they brought optional back in protobuf 3.15. I'm trying to use optional to check field presence, but I'm still unclear about the philosophy behind this.
Here is my usecase:
I'm providing some services that accept protobuf as input. The client side is untrusted from my perspective, so I have to check the nullability of the input protobuf.
The way I expect it to work is:
for a required field, it is either set or null (and I can reject the null case),
for an optional field, I don't care; I can just use a default value and that won't cause any problem in my system.
So I end up adding optional to every field that should not be null, so that I can use hasXXX to check presence. This looks weird to me because those fields are actually required from my perspective, yet I have to add the optional keyword to them all. I'm not sure whether this is good practice. Proto experts, please give me some suggestions.
Also, the default value doesn't make sense to me at all for nullability checking, since zero or an empty string usually has its own meaning in many scenarios.
The entire point of optional in proto3 is to be able to distinguish between, for example:
no value was specified for field Foo
the field Foo was explicitly assigned the value that happens to be the proto3 default (zero/empty-string/false/etc)
In proto3 without optional: the above both look identical (which is to say: the field is omitted)
If you don't need to distinguish between those two scenarios: you don't need optional, but using optional also isn't likely to hurt much - worst case, a few extra zero/empty-string/false values get written to the wire, but they're small anyway.
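In the Go bindings, a proto3 optional scalar is, to my knowledge, represented as a pointer field, so presence can back a validation rule like the asker's. A hand-written sketch (Input, Count, and validate are my own illustrative names, not generated code):

```go
package main

import "fmt"

// Input mimics a message with a proto3 optional scalar: the Go
// generated code represents it as a pointer, so nil means "not set".
type Input struct {
	Count *int32 // optional int32 count = 1;
}

// validate treats a missing Count as an error, distinguishing
// "unset" from an explicit zero.
func validate(in *Input) error {
	if in == nil || in.Count == nil {
		return fmt.Errorf("count is required")
	}
	return nil // an explicit 0 passes
}

func main() {
	fmt.Println(validate(&Input{})) // rejected: field never set

	zero := int32(0)
	fmt.Println(validate(&Input{Count: &zero})) // accepted: explicitly set to 0
}
```

This is exactly the distinction the answer above describes: without optional, the two cases in main would be indistinguishable.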
Google's API Design guide discourages the usage of the optional keyword. Better practice is to make use of the google.api.field_behavior annotation for describing the field's behaviour.
It is, however, not recommended to use the OPTIONAL annotation at all [1]: if one consistently implements the field-behaviour annotations, OPTIONAL is redundant and can be omitted.
Check out AIP 203 for an overview of the various behaviour types along with guidelines around the usage of OPTIONAL fields.
In general, Google's API Improvement Proposals are a great reference for good practices in your API design.

Is un-deprecating a protobuf field allowed?

We have a protobuf type where we added a field and then never implemented the features using it (so it was marked [deprecated=true] to hint that it should not be used). Several years later, the time has come, and now we do want to use that field after all.
Is it safe to just remove the [deprecated=true] and start using the field, or is that likely to break anything?
A field with the same type and semantics already exists on another message, so it would be very nice to use the name we gave it initially, rather than adding a new field and bloating the definition with two similar fields.
Edit: The proto3 language guide section on options has this to say:
deprecated (field option): If set to true, indicates that the field is deprecated and should not be used by new code. In most languages this has no actual effect. In Java, this becomes a @Deprecated annotation. In the future, other language-specific code generators may generate deprecation annotations on the field's accessors, which will in turn cause a warning to be emitted when compiling code which attempts to use the field. If the field is not used by anyone and you want to prevent new users from using it, consider replacing the field declaration with a reserved statement.
The only thing your clients will "notice" is the missing deprecation warning in Java that they may have been used to, if they are still using the deprecated field. All fields have been optional since proto3, so this should not break anything.

How is protobuf 3 singular different from optional?

Looking at the proto3 reference:
https://developers.google.com/protocol-buffers/docs/proto3#simple
It says this about singular:
singular: a well-formed message can have zero or one of this field (but not more than one).
It's not clear to me how this is different than optional. Is singular just an explicit way of stating that something is optional (which is now implicit for proto3)? Or is there something else this does that I'm missing?
Thanks.
Optional is proto2 syntax; singular is proto3 syntax.
In proto3, singular is the default rule. As of today, the documentation needs to be improved, and there's an open issue: google/protobuf#3457.
See also google/protobuf#2497 ("why messge type remove 'required,optional'?"), and haberman's comment on GoogleCloudPlatform/google-cloud-python#1402:
I think the question is: what are you trying to do? Why is it relevant
to you whether a field is set or not and what do you intend to do with
this information?
In proto3, field presence for scalar fields simply doesn't exist. Your
mental model for proto3 should be that it's a C++ or Go struct. For
integers and strings, there is no such thing as being set or not, it
always has a value. For submessages, it's a pointer to the submessage
instance which can be NULL, that's why you can test presence for it.
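That struct mental model can be sketched in plain Go (the types are illustrative, not generated code):

```go
package main

import "fmt"

// Inner plays the role of a submessage.
type Inner struct{ Name string }

// Outer mimics a proto3 message: scalar fields always have a value,
// while a submessage is a pointer that can be nil.
type Outer struct {
	Count int32  // scalar: no notion of "unset"; zero is a value
	Inner *Inner // submessage: nil means "not present"
}

func main() {
	var o Outer
	fmt.Println(o.Count)        // 0, indistinguishable from "explicitly set to 0"
	fmt.Println(o.Inner == nil) // true: presence IS testable for submessages
}
```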

How to design for a future additional enum value in protocol buffers?

One of the attractive features of protocol buffers is that it allows you to extend message definitions without breaking code that uses the older definition. In the case of an enum, according to the documentation:
a field with an enum type can only have one of a specified set of constants as its value (if you try to provide a different value, the parser will treat it like an unknown field)
therefore if you extend the enum and use the new value then a field with that type in old code will be undefined or have its default value, if there is one.
What is a good strategy to deal with this, knowing that in future the enum may have additional values added?
One way that comes to mind is to define an "undefined" member of the enum and make that the default, then old code will know it has been sent something that it can't interpret. Is that sensible, are there better ways to deal with this situation?
Yes, the best approach is to make the first value in the enum something like UNKNOWN = 0. Then old programs reading a protobuf with an enum value they don't recognize will see it as UNKNOWN, and hopefully they can handle that reasonably, e.g. by skipping that element.
If you want to do this, you'll also want to make the enum optional, not required.
required, generally, means "I'd rather the program just abort than handle something it doesn't understand."
Note that it must be the first value declared in the proto source - just being the zero value doesn't make it the default.
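As an illustration of why UNKNOWN = 0 works well: in Go, for instance, a proto enum is just a named int32, so defensive code can funnel unrecognized values through a default case. A hand-written sketch with made-up names:

```go
package main

import "fmt"

// Status mimics a proto enum; generated Go enums are named int32s,
// so any wire value fits, including ones this binary doesn't know.
type Status int32

const (
	StatusUnknown Status = 0 // first value: the safe default
	StatusActive  Status = 1
	StatusClosed  Status = 2
)

// describe handles values added by newer senders via the default case.
func describe(s Status) string {
	switch s {
	case StatusActive:
		return "active"
	case StatusClosed:
		return "closed"
	default:
		return "unknown" // covers StatusUnknown and any future value
	}
}

func main() {
	fmt.Println(describe(StatusActive))
	fmt.Println(describe(Status(42))) // a value from a newer schema version
}
```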
At least in the Java implementation of proto3, it creates a default value. The value's name will start with "UNKNOWN_ENUM_VALUE_".
Code references:
https://github.com/protocolbuffers/protobuf/blob/0707f2e7f556c8396d6027d0533ec3a56d1061db/java/core/src/main/java/com/google/protobuf/Descriptors.java#L640-L642
https://github.com/protocolbuffers/protobuf/blob/0707f2e7f556c8396d6027d0533ec3a56d1061db/java/core/src/main/java/com/google/protobuf/GeneratedMessageV3.java#L2750-L2751
https://github.com/protocolbuffers/protobuf/blob/0707f2e7f556c8396d6027d0533ec3a56d1061db/java/core/src/main/java/com/google/protobuf/Descriptors.java#L1832
https://github.com/protocolbuffers/protobuf/blob/0707f2e7f556c8396d6027d0533ec3a56d1061db/java/core/src/main/java/com/google/protobuf/Descriptors.java#L2034-L2045

Return concrete or abstract datatypes?

I'm in the middle of reading Code Complete, and towards the end of the book, in the chapter about refactoring, the author lists a bunch of things you should do to improve the quality of your code while refactoring.
One of his points was to always return as specific types of data as possible, especially when returning collections, iterators etc. So, as I've understood it, instead of returning, say, Collection<String>, you should return HashSet<String>, if you use that data type inside the method.
This confuses me, because it sounds like he's encouraging people to break the rule of information hiding. Now, I understand this when talking about accessors, that's a clear cut case. But, when calculating and mangling data, and the level of abstraction of the method implies no direct data structure, I find it best to return as abstract a datatype as possible, as long as the data doesn't fall apart (I wouldn't return Object instead of Iterable<String>, for example).
So, my question is: is there a deeper philosophy behind Code Complete's advice of always returning as specific a data type as possible, and allow downcasting, instead of maintaining a need-to-know-basis, that I've just not understood?
I think it is simply wrong for most cases. It has to be:
be as lenient as possible, be as specific as needed
In my opinion, you should always return List rather than LinkedList or ArrayList, because the difference is more an implementation detail than a semantic one. The authors of the Google collections API for Java take this one step further: they return (and expect) iterators where that's enough. They also recommend returning ImmutableList, -Set, -Map etc. where possible, to show the caller that he doesn't have to make a defensive copy.
Besides that, I think the performance of the different list implementations isn't the bottleneck for most applications.
Most of the time, one should return an interface or perhaps an abstract type that represents the value being returned. If you are returning a list of X, use List. This ultimately provides maximum flexibility if the need arises to change the list type.
Maybe later you realise that you want to return a linked list or a read-only list, etc. If you commit to a concrete type, you're stuck, and it's a pain to change. Using the interface solves this problem.
@Gishu
If your API requires that clients cast straight away most of the time, your design is flawed. Why bother returning X if clients need to cast to Y?
Can't find any evidence to substantiate my claim, but the idea/guideline seems to be:
Be as lenient as possible when accepting input. Choose a generalized type over a specialized type. This means clients can use your method with different specialized types, so an IEnumerable or an IList as an input parameter means the method can run off an ArrayList or a ListItemCollection alike. It maximizes the chance that your method is useful.
Be as strict as possible when returning values. Prefer a specialized type if possible. This means clients do not have to second-guess or jump through hoops to process the return value. Specialized types also have greater functionality: if you choose to return an IList or an IEnumerable, the number of things the caller can do with your return value drops drastically. For example, if you return an IEnumerable rather than an ArrayList, the client must downcast to get the number of elements via the Count property. But such downcasting defeats the purpose: it works today but won't tomorrow (if you change the type of the returned object). So for all practical purposes the client can't get a count of elements easily, leading him to write mundane boilerplate code (in multiple places or as a helper method).
The summary here is that it depends on the context (there are exceptions to most rules). E.g. if the most probable use of your return value is that clients will search the returned list for some element, it makes sense to return a list implementation (type) that supports some kind of search method. Make it as easy as possible for the client to consume the return value.
I could see how, in some cases, having a more specific return type could be useful. For example, knowing that the return value is a LinkedList rather than just a List would allow you to delete from the list knowing that the operation will be efficient.
I think that, while designing interfaces, you should design a method to return as abstract a data type as possible; at the same time, returning a specific type makes the method clearer about what it returns.
Also, I would understand it in this way:
"return as abstract a data type as possible" need not contradict "return as specific a data type as possible";
i.e. when your method is supposed to return some collection data type, return a Collection rather than an Object.
Tell me if I'm wrong.
A specific return type is much more valuable because it:
reduces possible performance issues with discovering functionality through casting or reflection
increases code readability
does NOT, in fact, expose more than is necessary.
The return type of a function is specifically chosen to cater to ALL of its callers. It is the calling function that should USE the return value as abstractly as possible, since the calling function knows how the data will be used.
Is it only necessary to traverse the structure? Is it necessary to sort it? Transform it? Clone it? These are questions only the caller can answer, and thus the caller can use an abstracted type. The called function MUST provide for all of these cases.
If, in fact, the most specific use case you have right now is Iterable<String>, then that's fine. But more often than not your callers will eventually need more details, so start with a specific return type - it doesn't cost anything.