How to set default value to a customed google protobuf type? - protocol-buffers

I have a google protobuf structure:
message ResourceProto{
optional int32 memory = 0;
optional int32 core = 1;
}
And I have another structure:
message AnotherProto{
optional ResourceProto resource = 0 [default to ResourceProto(100,1)];
....
}
I know how to set default value to normal type like int, String, Bool, but how to assign default value to customed structure, what is the syntax? Say, set the default value of resource in AnotherProto to memory = 100 and core = 1?

Protocol Buffers doesn't support default values for fields of non-primitive types.
Not sure why exactly, but I would assume this is because it is rarely needed in practice and tricky to implement:
Arbitrary default values are rather difficult to self-describe in a consistent and portable way. Essentially you need to have a notion of a dynamically-typed any type, which is not supported by Protobuf2. Instead, they represent defaults as optional string default_value with some implementation-dependent syntax for the values.
When you allow this in the definition language, you need to introduce syntax for structured default values. This is slightly more complicated than supporting syntax for primitive values alone.
Depending on a target language, it may not be quite clear how to handle such default values at runtime with regard to dynamic object allocation and ownership. The safest option would involve copying which may result in unexpected performance hit.
That said, fundamentally, it can be done. For instance, I've implemented support for arbitrary defaults in piqi and it works well in OCaml and Erlang.

Related

Using tostring() to ensure functional conditionals in Terraform?

tl;dr: Is it an acceptable practice to use tostring() to cast values used for conditionals in Terraform >= 0.13 for handling a strictly defined set of input types?
Yesterday I asked a question that led me to a new question today:
Terraform count using bool?
What I learned is that there is some automatic type-casting applied to certain primitives in Terraform (going to and from strings to other data types mainly), but that these primitives cannot be used to infer a different data type (e.g. a bool cannot be passed as an input to the count argument because count only accepts a number type.
One comment on that question had a very simple way to use a bool as a condition:
count = var.my_var ? 1 : 0
The only potential issue with this, is if my_var can have different input types. In my use case, it'll be added to a Terraform module in which the user will decide what to supply for this argument; previously we've only been passing in string or number, but I find that to be a little less specific than I'd prefer, because Terraform can interpret count to be > 1 copy of a resource (I want a discrete 0 or 1 [specifically for something like var.create_this_resource whose value can be either true or false]); this also just doesn't look as nice to see "1" vs true IMO. So I'd like to start using bool instead, but also be able to handle when a user inputs a number. What I found is that I can use the following to accomplish this:
count = tostring(var.my_var) ? 1 : 0
Here, tostring() will take whatever is in the input and, presumably, cast it to a string. It only works for string, number, and boolean, and really, I'm only using it to get a number to a string because that's the only case where passing into a ternary operator is currently failing.
So my question is whether or not it's acceptable to do this? I've tested it with string, bool, and number, as well as unsupported types (i.e. an empty list or null); it seems to work well in code but the following made me think I shouldn't use it:
From the docs:
Explicit type conversions are rarely necessary in Terraform because it will convert types automatically where required. Use the explicit type conversion functions only to normalize types returned in module outputs.
In most cases I would suggest avoiding designs where a particular variable could have different types in different situations, unless your module is treating the value as entirely opaque and just passing it through to something else which has broader validation rules.
Since your module is working directly with this value, it would typically be best to specify an exact type constraint for the variable and make the caller of the module write expressions to convert the value if the automatic conversions are insufficient. That way the caller can get better feedback about what sort of value your module is expecting, and can decide for themselves how to convert their value of a different type.
Converting to string can only produce a value that can automatically convert to bool in the following situations:
The value was already a string, and was either "true" or "false".
The value was a bool value, in which case tostring will convert it to a string and then the conditional operator will immediately convert it back to bool again, which would be redundant.
If you declare the variable as being bool itself then the same rules will apply, but the conversion will happen inside the calling module block rather than in the count expression:
variable "my_var" {
type = bool
}
module "example" {
# ...
# This will automatically convert to bool true,
# just as it would've in the conditional operator.
my_var = "true"
}
If you really cannot avoid supporting various unusual ways of writing boolean values then you can potentially write your own conversion table which would be based on strings, and would specify the boolean value for each possible string after conversion:
locals {
sloppy_bool = tomap({
"1" = true
"true" = true
"0" = false
"false" = false
})
my_var = local.sloppy_bool[var.my_var]
}
Because mapping types (map types and object types) only support strings as keys, local.sloppy_bool[var.my_var] will automatically convert var.my_var to string, just as if you'd written tostring(var.my_var). It'll then look up the result in the table and return the corresponding boolean value, which means you can then use local.my_var instead of var.my_var elsewhere in your module and rely on it always being a true boolean value.
I would suggest doing this only if you had a previous version of the module which tolerated this sort of typing weirdness and you need to remain compatible with it. For an entirely new module, I would consider this to be non-idiomatic and probably confusing for anyone already familiar with Terraform who is trying to use the module, because they will need to become familiar with your unusual definition of the type conversion rather than relying on their knowledge of the built-in conversion rules.

Safety of using reflect.StringHeader in Go?

I have a small function which passes the pointer of Go string data to C (Lua library):
func (L *C.lua_State) pushLString(s string) {
gostr := (*reflect.StringHeader)(unsafe.Pointer(&s))
C.lua_pushlstring(L, (*C.char)(unsafe.Pointer(gostr.Data)), C.ulong(gostr.Len))
// lua_pushlstring copies the given string, not keeping the original pointer.
}
It works in simple tests, but from the documentations it's unclear whether this is safe at all.
According to Go document, the memory of reflect.StringHeader should be pinned for gostr, but the Stringheader.Data is already a uintptr, "an integer value with no pointer semantics" - which is itself odd because if it has no pointer semantics, wouldn't the field be completely useless as the memory may be moved right after the value is read? Or is the field treated specially like reflect.Value.Pointer? Or perhaps there is a different way of getting C pointer from string?
it's unclear whether this is safe at all.
Tapir Liui (https://twitter.com/TapirLiu/) dans Go101 (https://github.com/go101/go101) gives a clue as to the "safety" of reflect.StringHeader in this tweet:
Since Go 1.20, the reflect.StringHeader and reflect.SliceHeader types will be depreciated and not recommended to be used.
Accordingly, two functions, unsafe.StringData and unsafe.SliceData, will be introduced in Go 1.20 to take over the use cases of two old reflect types.
That was initially discussed in CL 401434, then in issue 53003.
The reason for deprecation is that reflect.SliceHeader and reflect.StringHeader are commonly misused.
As well, the types have always been documented as unstable and not to be relied upon.
We can see in Github code search that usage of these types is ubiquitous.
The most common use cases I've seen are:
converting []byte to string:
Equivalent to *(*string)(unsafe.Pointer(&mySlice)), which is never actually officially documented anywhere as something that can be relied upon.
Under the hood, the shape of a string is less than a slice, so this seems valid per unsafe rule.
converting string to []byte:
commonly seen as *(*[]byte)(unsafe.Pointer(&string)), which is by-default broken because the Cap field can be past the end of a page boundary (example here, in widely used code) -- this violates unsafe rule.
grabbing the Data pointer field for ffi or some other niche use converting a slice of one type to a slice of another type
Ian Lance Taylor adds:
One of the main use cases of unsafe.Slice is to create a slice whose backing array is a memory buffer returned from C code or from a call such as syscall.MMap.
I agree that it can be used to (unsafely) convert from a slice of one type to a slice of a different type.

How does a loosely typed language know how to handle different data types?

I was working on a simple task yesterday, just needed to sum the values in a handful of dropdown menus to display in a textbox via Javascript. Unexpectedly, it was just building a string so instead of giving me the value 4 it gave me "1111". I understand what was happening; but I don't understand how.
With a loosely typed language like Javascript or PHP, how does the computer "know" what type to treat something as? If I just type everything as a var, how does it differentiate a string from an int from an object?
What the + operator will do in Javascript is determined at runtime, when both actual arguments (and their types) are known.
If the runtime sees that one of the arguments is a string, it will do string concatenation. Otherwise it will do numeric addition (if necessary coercing the arguments into numbers).
This logic is coded into the implementation of the + operator (or any other function like it). If you looked at it, you would see if typeof(a) === 'string' statements (or something very similar) in there.
If I just type everything as a var
Well, you don't type it at all. The variable has no type, but any actual value that ends up in that variable has a type, and code can inspect that.

In what way does this struct-field-aliasing code invoke Undefined Behavior

Given the code:
#include <stdlib.h>
#include <stdint.h>
typedef struct { int32_t x, y; } INTPAIR;
typedef struct { int32_t w; INTPAIR xy; } INTANDPAIR;
void foo(INTPAIR * s1, INTPAIR * s2)
{
s2->y++;
s1->x^=1;
s2->y--;
s1->x^=1;
}
int hey(int x)
{
static INTPAIR dummy;
void *p = calloc(sizeof (INTANDPAIR),1);
INTANDPAIR *p1 = p;
INTPAIR *p2a = p;
INTPAIR *p2b = &p1->xy;
p2b->x = x;
foo(p2b,p2a);
int result= p2b->x;
free(p);
return result;
}
#include <stdio.h>
int main(void)
{
for (int i=0; i<10; i++)
printf("%d.",hey(i));
}
Behavior depends upon gcc optimization level, which implies that gcc thinks
this code invokes Undefined Behavior (the definition of "foo" collapses to nothing, but interestingly the definition of "hey" increments the value passed in). I'm not quite sure what if anything it does that runs afoul of the Standard's rules, though.
The code very deliberately and evilly constructs two pointers such that
s2a->y and s2b->x will alias, but the pointers are deliberately constructed in such a way that both identify legitimate potential objects of type INTPAIR. Because code used calloc to get the memory, all field members have legitimate initial defined values of zero. All accesses to the allocated memory are done via an int32_t member of an INTPAIR*.
I can understand why it would make sense for the Standard to forbid aliasing structure fields in this fashion, but I couldn't find anything in the Standard which actually does so. Is gcc operating in Standard-compliant fashion here, or is it violating some clause in the Standard which isn't referenced by Annex J.2 and doesn't use any of the terms I searched for?
UPDATE:
I felt this answer was OK, but not still a little imprecise, and not cut and dry as to what the UB was. After a lot of very interesting discussion and comments I have tried again with a new answer
The right part of the C99 standard is quoted in this answer. I'm copying it here for convenience. The question and several of the answers are quite thorough.
(C99; ISO/IEC 9899:1999 6.5/7:
An object shall have its stored value accessed only by an lvalue
expression that has one of the following types 73) or 88):
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of
the object,
a type that is the signed or unsigned type corresponding to the
effective type of the object,
a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union), or
a character type.
73) or 88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.
What is an effective type then? (C99; ISO/IEC 9899:1999 6.5/6:
The effective type of an object for an access to its stored value is the declared type of the object, if any. 87) If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.
87) Allocated objects have no declared type.
So at the line p2b->x = x the object at p+4 becomes of effective type INTPAIR. Is it aligned correctly? If it isn't then Undefined Behavior (UB). But to keep it interesting, assume it is as it must be in this case because of the layout of INTANDPAIR.
By the same analysis there are two 8 byte objects, p2a (s2) at #(p+4) and p2b #p. As your example is demonstrating the 2nd element of p2a and the first of p2b end up being aliased.
In the foo(), the object p2b #p+4 is accessed by the normal method via s1->x. But then the "stored value" of object p2b is also accessed by a side effect of modifying a different object p2a #p. Since this falls under none of the bullets of 6.5/7, it is UB. Note that 6.5/7 says only, so objects shall not be accessed in any other ways.
I think the main distinction is that the "object" in question is the whole structure p2a/s2 and p2b/s1, not the integer members. If you change the argument of the function to take the integers and alias them it works "fine" because the function can't know s1 and s2 alias. For example:
void foo2(int *s1, int *s2)
{
(*s2)++;
(*s1)^=1;
(*s2)--;
(*s1)^=1;
}
...
/*foo(p2b,p2a);*/
foo2((int*)p, (int*)p); /* or p+4 or whatever you want */
This more or less confirms that this is the way GCC chose to interpret things: modifying a member is modifying the whole struct object and that since side effects of modifying one object are not on the listed legal ways to indirectly modify a different object, whee! we can do whatever silly thing we feel like doing.
So whether GCC interprets the ambiguities in standard to decide that by deriving s1 and s2 pointers through different typed pointers and then accessing them constitutes indirectly accessing the memory via different original types via p1 and p or whether it interprets the standard in the way I'm suggesting that "object" s2->y modifies is not just the integer but the s2 object, it is UB either way. Or is GCC just being especially snarky and pointing out that if the standard doesn't very clearly specify the semantics of dynamically allocated yet overlapping objects, it is free to do whatever it wants because by definition it is "undefined".
I don't think at this microscopic level anyone other than the standards body can definitively answer whether this should be UB or not because at this level it requires some "interpretation". The GCC's implementers opinion's seem to favor very aggressive interpretations.
I like Linus's reaction to this whole thing. And it is true, why not just be conservative and let the programmer tell the compiler when it is safe? Very Excellent Linus Rant
My previous answer was lacking, maybe not completely wrong, but the sample program is deliberately designed to sidestep each of the more obvious explicit Undefined Behaviors (UB) dictated by the C99 standard, like 6.5/7. But with both GCC (and Clang) this example demonstrates strict aliasing failure like symptoms under optimization. They appear to be assuming s1->y and s2-x can't alias. So, is the compiler wrong? Is this a loophole in the strict aliasing legalese?
Short answer: No. I wouldn't be surprised if there was a loophole of some kind in the standard, given its complexity. But in this example, creating overlapping objects on the heap is explicitly undefined behavior, and there are several other things happening that the standard does not define.
I think the point of the example is not that it fails - it is obvious that "playing fast and loose" with pointers is a bad idea and relying on corner cases and legalese to prove the compile "wrong" is of little help if the code doesn't work. The key questions are: is GCC wrong? and what in the standard says so.
First, lets look at the obvious strict aliasing rules and how this example is trying to avoid them.
C99 6.5/7:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types: 76)
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.
This is the main strict aliasing section. It means that accessing the same memory via two different type pointers is UB. This example sidesteps it by accessing both using INTPAIR pointers in foo().
The key problem with this is that it is talking about accessing the stored value via two different effective types (e.g. pointers). It doesn't talk about accessing via two different objects.
What is being accessed? is it the integer member or the entire object s1 / s2? Is accessing s2->x via s1->y access via "a type compatible with the effective type of the object". I believe an argument can be made that a) the access as a side effect of modifying a different object does not fall under the permissible methods in 6.5/7 and that b) modifying one member of the aggregate transitively modifies the aggregate (*s1 or *s2) also.
Since this is not specified, it is UB, but it is a bit hand-wavy.
How did we get pointers to two overlapping objects? Are the pointer casts leading to them OK? Section 6.3.2.3 contains the rules for casting pointers and the example carefully does not violate any of them. In particular, because p2b is a pointer to INTANDPAIR member xy the alignment is guaranteed to be right, otherwise it would definitely run afoul of 6.3.2.3/7.
Furthermore, &p1->xy is not a problem - it can't be - it is a perfectly legitimate pointer to an INTPAIR. Simply casting pointers and/or taking addresses is safely outside the definition of "access" (3.1/1).
It is clear that the problem comes about by accessing two integer members that overlay each other as different parts of overlapping objects. Any attempt to do this via pointers of different types would clearly run afoul of 6.5/7. If accessed by the same type pointer at the same address, there would be no problem whatsoever. So the only way left that they could alias this way is that if two objects at different addresses overlapped in some fashion.
Obviously this could occur as part of a union, but that is not the case for this example. Type punning through unions may not be UB in C99, but it would be a different question whether a variant of this example could be made misbehave via unions.
The example uses dynamic allocation and casts the resultant void pointer to two different types. Going from from a pointer to an object to void * and back again is valid (6.3.2.3/1). Several other ways of obtaining pointers to objects that would overlap are explicitly UB by the pointer conversion rules of 6.3.2.3, the aliasing rules of 6.5/7, and/or the compatible type rules 6.2.7.
So what else is wrong?
6.2.4 Storage durations of objects
1 An object has a storage duration that determines its lifetime. There are three storage durations: static, automatic, and allocated. Allocated storage is described in 7.20.3
The storage for each of the objects is allocated by calloc() so the duration we want is "allocated". So we check 7.20.3: (emphasis added)
7.20.3 Memory management functions
1 The order and contiguity of storage allocated by successive calls to the calloc, malloc, and realloc functions is unspecified. The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object.
...
2 The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address, 25) and retains its last-stored value throughout its lifetime. 26) If an object is referred to outside of its lifetime, the behavior is undefined.
To avoid UB, the accesses to the two different objects must be to a valid object within its lifetime. You can get a single valid object (or an array) with malloc()/calloc(), but these guarantee that you will receive a pointer disjoint from all other objects. So is the object returned from calloc() p or is it p1? It can't be both.
The UB is triggered by attempting to reuse the same dynamically allocated object to hold two objects that are not disjoint. While calloc() guarantees it will return a pointer to a disjoint object, there is nothing that says it will still work if you then start using parts of the buffer for a 2nd overlapping one. In fact, it even explicitly says it is UB if you access an object outside its lifetime and there is only a single allocation ergo a single lifetime.
Also note:
4. Conformance
In this International Standard, ‘‘shall’’ is to be interpreted as a requirement on an implementation or on a program; conversely, ‘‘shall not’’ is to be interpreted as a prohibition.
If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside of a constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition
of behavior. There is no difference in emphasis among these three; they all describe ‘‘behavior that is undefined’’.
For this to be a compiler error it must fail on a program that only uses constructs explicitly defined. Anything else is outside the safe-harbor and is still undefined, even if it the standard doesn't explicitly state that it is Undefined Behavior.

Is there a builtin func named "int32"?

The below snippet works fine.
In this case, what "int32" is? A func?
I know there is a type named "int32"
This could be a stupid question. I've just finished A Tour of Go but I could not find the answer.(it's possible I'm missing something.)
package main
import "fmt"
func main() {
var number = int32(5)
fmt.Println(number) //5
}
It is a type conversion, which is required for numeric types.
Conversions are required when different numeric types are mixed in an expression or assignment. For instance, int32 and int are not the same type even though they may have the same size on a particular architecture.
Since you do a variable declaration, you need to specify the type of '5'.
Another option, as mentioned by rightfold in the comments is: var number int32 = 5
(as opposed to a short variable declaration like number := 5)
See also Go FAQ:
The convenience of automatic conversion between numeric types in C is outweighed by the confusion it causes.
When is an expression unsigned? How big is the value? Does it overflow? Is the result portable, independent of the machine on which it executes?
It also complicates the compiler; “the usual arithmetic conversions” are not easy to implement and inconsistent across architectures.
For reasons of portability, we decided to make things clear and straightforward at the cost of some explicit conversions in the code. The definition of constants in Go—arbitrary precision values free of signedness and size annotations—ameliorates matters considerably, though.
A related detail is that, unlike in C, int and int64 are distinct types even if int is a 64-bit type.
The int type is generic; if you care about how many bits an integer holds, Go encourages you to be explicit.

Resources