golang protobuf marshal empty struct with fixed size - go

I have a protobuf message, Data.
In the .proto file:
message Data {
uint64 ID = 1;
uint32 GUID = 2;
}
In Go:
b, err := proto.Marshal(&pb.Data{})
if err != nil {
panic(err)
}
fmt.Println(len(b))
I got 0 length!
How can I make proto.Marshal always return fixed size no matter what pb.Data is?
P.S. pb.Data only contains uint64 and uint32 fields.

There are two issues here:
1) protobuf uses varint encoding for integers, so the encoded size depends on the value (see this link).
2) Zero-valued fields are not transmitted by default, so because both integers are zero, not even their field tags are sent. Looking at the docs, I'm not sure there is even an option to send zero values.
If you set them both to 1, you will get more than zero bytes, but the length still won't be fixed: it depends on the magnitude of the values.
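To make the size dependence concrete, here is a minimal sketch that measures varint lengths with the standard library's encoding/binary package, which uses the same base-128 varint scheme as protobuf. The helper name uvarintLen is made up for this illustration. In an actual message, each non-zero field also costs one tag byte for small field numbers like 1 and 2.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// uvarintLen (hypothetical helper) reports how many bytes the base-128
// varint encoding of v occupies.
func uvarintLen(v uint64) int {
	var buf [binary.MaxVarintLen64]byte
	return binary.PutUvarint(buf[:], v)
}

func main() {
	// Larger values need more 7-bit groups, so the encoded length grows.
	for _, v := range []uint64{1, 127, 128, 1 << 32, 1<<64 - 1} {
		fmt.Printf("value %d -> %d varint byte(s)\n", v, uvarintLen(v))
	}
}
```

A uint64 can therefore take anywhere from 1 to 10 bytes on the wire, before counting the tag.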
So there's no real way to enforce a fixed size for protobuf messages in general.
If you want fixed-length messages, you are probably better off with a direct struct-on-the-wire encoding, but that is harder for language interop, since every language has to define the same layout, and you lose easy message migration and the other conveniences that protobuf gives you.
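As a rough sketch of what such a struct-on-the-wire encoding could look like for the two fields in the question, the following uses encoding/binary with an explicit byte order. The Go type Data, the constant dataSize, and the functions MarshalFixed and UnmarshalFixed are all hypothetical names for this example, not part of any protobuf API, and the little-endian layout is an arbitrary choice.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Data is a hand-written stand-in for the message in the question.
type Data struct {
	ID   uint64
	GUID uint32
}

// dataSize is the fixed wire size: 8 bytes for ID plus 4 bytes for GUID.
const dataSize = 12

// MarshalFixed always returns exactly dataSize bytes, even for zero values.
func (d Data) MarshalFixed() []byte {
	b := make([]byte, dataSize)
	binary.LittleEndian.PutUint64(b[0:8], d.ID)
	binary.LittleEndian.PutUint32(b[8:12], d.GUID)
	return b
}

// UnmarshalFixed decodes a buffer produced by MarshalFixed.
func UnmarshalFixed(b []byte) (Data, error) {
	if len(b) != dataSize {
		return Data{}, fmt.Errorf("want %d bytes, got %d", dataSize, len(b))
	}
	return Data{
		ID:   binary.LittleEndian.Uint64(b[0:8]),
		GUID: binary.LittleEndian.Uint32(b[8:12]),
	}, nil
}

func main() {
	b := Data{}.MarshalFixed()
	fmt.Println(len(b)) // prints 12
}
```

The trade-off is the one described above: the layout is fixed, but every reader and writer has to agree on it by hand, and adding a field changes the size for everyone.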
Cap'n Proto might have an option for fixed size structs, but they also generally compress which will, once again, produce variable length messages.
If you describe the problem you are trying to ultimately solve, we may be able to suggest other alternatives.

You are calling len() on a byte slice. It counts the number of elements in that slice and returns it.
If you have just instantiated a new, empty protobuf object with nothing set, the marshaled byte slice will not hold any data, which is why you get 0.
I'm not sure what you want it to return instead. Could you clarify what output you expect? Then I may be able to answer your question better.


Go Ints and Strings are immutable OR mutable?

What I have read on the internet about ints and strings is that they are immutable by nature.
But the following code shows that after changing the values of variables of these types, they still point to the same address. This contradicts what I know about how these types behave in Python.
Can anyone please explain this to me?
Thanks in advance.
package main
import (
"fmt"
)
func main() {
num := 2
fmt.Println(&num)
num = 3
fmt.Println(&num) // address value of the num does not change
str := "2"
fmt.Println(&str)
str = "34"
fmt.Println(&str) // address value of the str does not change
}
A number is immutable by nature. 7 is 7, and it won't be 8 tomorrow. That doesn't mean that which number is stored in a variable cannot change. Variables are variable. They're mutable containers for values which may be mutable or immutable.
A Go string is immutable by language design; the string type doesn't support any mutating operators (like appending or replacing a character in the middle of the string). But, again, assignment can change which string a variable contains.
In Python (CPython at least), a number is implemented as a kind of object, with an address and fields like any other object. When you do tricks with id(), you're looking at the address of the object "behind" the variable, which may or may not change depending on what you do to it, and whether or not it was originally an interned small integer or something like that.
In Go, an integer is an integer. It's stored as an integer. The address of the variable is the address of the variable. The address of the variable might change if the garbage collector decides to move it (making the numeric value of the address more or less useless), but it doesn't reveal to you any tricks about the implementation of arithmetic operators, because there aren't any.
Strings are more complicated than integers; they are kind of object-ish internally, being a structure containing a pointer and a size. But taking the address of a string variable with &str doesn't tell you anything about that internal structure, and it doesn't tell you whether the Go compiler decided to use a de novo string value for an assignment, or to modify the old one in place (which it could, without breaking any rules, if it could prove that the old one would never be seen again by anything else). All it tells you is the address of str. If you wanted to find out whether that internal pointer changed you would have to use reflection... but there's hardly ever any practical reason to do so.
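If you are curious, the distinction between the variable's address and the string's internal data pointer can be observed without reflection. This is only a sketch, assuming Go 1.20 or later, where unsafe.StringData is available. The function name observe is made up for the example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// observe (hypothetical helper) reassigns a string variable and reports
// whether the variable's own address and the string's internal data
// pointer stayed the same across the assignment.
func observe() (sameVarAddr, sameData bool) {
	str := "2"
	varBefore := &str
	dataBefore := unsafe.StringData(str) // pointer to the bytes of "2"

	str = "34" // the variable now holds a different string value

	return varBefore == &str, dataBefore == unsafe.StringData(str)
}

func main() {
	sameVarAddr, sameData := observe()
	fmt.Println(sameVarAddr, sameData) // prints: true false
}
```

The variable keeps its address, while the header stored in it now points at different bytes. The old bytes were never modified.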
When you read that a string is immutable, it means you cannot modify it through an index, e.g.:
x := "hello"
x[2] = 'r'
// compile error: cannot assign to x[2]
As a comment says, assigning a new value to the whole variable (rather than changing part of it through an index) has nothing to do with mutability, and you are free to do it.
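If you do need a string with one byte changed, the usual route is to copy it into a []byte, change the copy, and build a new string. A minimal sketch follows. The helper name replaceByte is made up for this example.

```go
package main

import "fmt"

// replaceByte (hypothetical helper) returns a new string equal to s with
// the byte at index i replaced by c. The original string is not modified.
func replaceByte(s string, i int, c byte) string {
	b := []byte(s) // the conversion copies the string's bytes
	b[i] = c
	return string(b) // a new string built from the modified copy
}

func main() {
	x := "hello"
	y := replaceByte(x, 2, 'r')
	fmt.Println(x, y) // prints: hello herlo
}
```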

How can I clone a strings.Builder in Go?

The Go programming language's standard library exposes a struct called strings.Builder which allows for easy building of strings through repeated concatenation in an efficient way, similar to C# or Java's StringBuilder.
In Java I would use StringBuilder's constructor to "clone" the object, like this:
StringBuilder newBuffer = new StringBuilder(oldBuffer.toString());
in Go, I can only see the following two-line way:
newBuffer := strings.Builder{}
newBuffer.WriteString(oldBuffer.String())
and no .Clone() or initializer method (which I might just not have found yet).
Is there another way that would be more brief/concise than the one I have presented?
Going into unnecessary detail for curiosity's sake...
After considering the documentation, here are your main issues:
The only exported way to read data out of the Builder is the Builder.String method.
It is not safe to copy a Builder value once you have manipulated it.
Let's look at this version:
newBuffer := strings.Builder{}
newBuffer.WriteString(oldBuffer.String())
My first thought about why this isn't desirable is because the Builder internally uses a byte slice (mutable data type), and returns a string (immutable data type). Even though a string's underlying representation is the same as a byte slice, due to this mutability rule it would require a copy to convert to string. This means that by the time you write the string to the new buffer, you're already on your second copy when your task intuitively only requires a single copy.
Actually taking a look at the source code, however, we'll see that this assumption is wrong:
func (b *Builder) String() string {
return *(*string)(unsafe.Pointer(&b.buf))
}
Using the unsafe package, the strings package basically "hacks" the buffer ([]byte) directly into a string. Again, these data types look alike at the memory level: a pointer to the start of the data and a length saying how many bytes long the string or slice is (a slice header additionally carries a capacity, which the string view ignores). These data types are just headers, so no copying of the buffer has occurred here.
This creates the uncomfortable situation where you have a string which is supposed to be immutable, but you still have a byte slice somewhere that could mutate those underlying bytes. The package is called unsafe after all, and this is a good example of why that is.
Because the strings.Builder is purely a "builder", i.e. it can only create new parts of the string and never modify data that's already written, we still get the immutability of our string that the language "guarantees". The only way we can break that rule is by gaining access to the internal buf of the Builder, but as that field is un-exported, you would again need to employ unsafe yourself to access it.
Summary:
The straightforward method you came up with, while perhaps a line (or two) longer than one might hope for, is the definitive and correct way to do it. It's already as efficient as you're going to get, even if you bring out the grittier features of Go like unsafe and reflect.
I hope that this has been informative. Here are the only suggested changes to your code:
// clone the builder contents. this is fast.
newBuffer := strings.Builder{}
newBuffer.WriteString(oldBuffer.String())
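If you do this often, the two lines can be wrapped in a small helper. This is only a sketch: cloneBuilder and demo are made-up names, not standard library functions. The helper returns a pointer because, as noted above, a Builder must not be copied by value once it has been written to.

```go
package main

import (
	"fmt"
	"strings"
)

// cloneBuilder (hypothetical helper) returns a new Builder holding a copy
// of src's current contents.
func cloneBuilder(src *strings.Builder) *strings.Builder {
	dst := &strings.Builder{}
	dst.Grow(src.Len()) // reserve space up front to avoid regrowth
	dst.WriteString(src.String())
	return dst
}

// demo shows that the clone and the original evolve independently.
func demo() (original, clone string) {
	var oldBuffer strings.Builder
	oldBuffer.WriteString("abc")

	newBuffer := cloneBuilder(&oldBuffer)
	newBuffer.WriteString("def")

	return oldBuffer.String(), newBuffer.String()
}

func main() {
	original, clone := demo()
	fmt.Println(original, clone) // prints: abc abcdef
}
```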

Safety of using reflect.StringHeader in Go?

I have a small function which passes the pointer of Go string data to C (Lua library):
func (L *C.lua_State) pushLString(s string) {
gostr := (*reflect.StringHeader)(unsafe.Pointer(&s))
C.lua_pushlstring(L, (*C.char)(unsafe.Pointer(gostr.Data)), C.ulong(gostr.Len))
// lua_pushlstring copies the given string, not keeping the original pointer.
}
It works in simple tests, but from the documentations it's unclear whether this is safe at all.
According to Go document, the memory of reflect.StringHeader should be pinned for gostr, but the Stringheader.Data is already a uintptr, "an integer value with no pointer semantics" - which is itself odd because if it has no pointer semantics, wouldn't the field be completely useless as the memory may be moved right after the value is read? Or is the field treated specially like reflect.Value.Pointer? Or perhaps there is a different way of getting C pointer from string?
it's unclear whether this is safe at all.
Tapir Liu (https://twitter.com/TapirLiu/) of Go101 (https://github.com/go101/go101) gives a clue as to the "safety" of reflect.StringHeader in this tweet:
Since Go 1.20, the reflect.StringHeader and reflect.SliceHeader types will be deprecated and not recommended to be used.
Accordingly, two functions, unsafe.StringData and unsafe.SliceData, will be introduced in Go 1.20 to take over the use cases of two old reflect types.
That was initially discussed in CL 401434, then in issue 53003.
The reason for deprecation is that reflect.SliceHeader and reflect.StringHeader are commonly misused.
As well, the types have always been documented as unstable and not to be relied upon.
We can see in Github code search that usage of these types is ubiquitous.
The most common use cases I've seen are:
converting []byte to string:
Equivalent to *(*string)(unsafe.Pointer(&mySlice)), which is never actually officially documented anywhere as something that can be relied upon.
Under the hood, a string header is smaller than a slice header (it has no Cap field), so this seems valid per the unsafe rules.
converting string to []byte:
commonly seen as *(*[]byte)(unsafe.Pointer(&string)), which is by-default broken because the Cap field can be past the end of a page boundary (example here, in widely used code) -- this violates unsafe rule.
grabbing the Data pointer field for ffi or some other niche use
converting a slice of one type to a slice of another type
Ian Lance Taylor adds:
One of the main use cases of unsafe.Slice is to create a slice whose backing array is a memory buffer returned from C code or from a call such as syscall.MMap.
I agree that it can be used to (unsafely) convert from a slice of one type to a slice of a different type.
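For reference, here is a sketch of how the conversions listed above might be written with the Go 1.20 functions instead of the header types. It assumes Go 1.20 or later. The helper names stringBytes and bytesString are made up for this example. For the cgo case in the question, the pointer returned by unsafe.StringData(s) would presumably be cast to *C.char in place of gostr.Data, but that part is not exercised here.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringBytes (hypothetical helper) returns a view of s's bytes without
// copying. The result must never be written to, since strings are immutable.
func stringBytes(s string) []byte {
	if s == "" {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// bytesString (hypothetical helper) returns a string sharing b's memory.
// The caller must not modify b for as long as the string is in use.
func bytesString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	view := stringBytes("hello")
	fmt.Println(len(view), cap(view), bytesString([]byte("lua")))
}
```

Unlike the *(*[]byte)(unsafe.Pointer(&s)) cast, unsafe.Slice sets both length and capacity explicitly, so the Cap field is never read from memory past the string header.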

Create repeatable byte array of Go struct which contains a pointer

I want to be able to create repeatable byte arrays of structs in Go so I can hash them and then verify that hash at some point.
I am currently following this simple approach to create a byte array from a struct with:
[]byte(fmt.Sprintf("%v", struct))...)
This works perfectly until my struct holds an embedded struct with a pointer, for example:
type testEmbeddedPointerStruct struct {
T *testSimpleStruct
}
In my tests this creates a different byte array each time, I think it may be because with the pointer the address in memory changes each time?
Is there a way of creating a repeatable byte array digest even if the struct holds a pointer?
Thanks
... I think it may be because with the pointer the address in memory changes ...
That's the obvious candidate, yes. You have chosen a very simple encoding, in which pointer fields are encoded as a hexadecimal representation of the pointer, rather than any value found at the target of the pointer.
Is there a way of creating a repeatable byte array digest even if the struct holds a pointer?
You may need to define more precisely what "repeat of same value" means to you (see footnote 1), but in general, this is really an encoding problem. The encoding/gob package could perhaps give you an encoding you would like here, though note that unlike %v formatting, it encodes only exported struct fields and keeps the various names. It has the effect of "flattening" any pointer data, but won't work for cyclic data structures.
(You can write your own simpler encoder that simply follows pointers when it encounters them, and otherwise works like %v.)
Footnote 1: For example, suppose you have:
type T struct {
I int
P *Sub
}
type Sub struct {
J int
}
// ...
s2 := Sub{2}
s3 := Sub{3}
t1 := T{1, &s2}
t2 := T{1, &s3}
Obviously printing t1 and t2 (while flattening away pointers) produces an encoded version of {1 2} and {1 3} respectively, so these are not the same value. However, if we change s3 itself to:
s3 := Sub{2}
we now have two different entities, t1 and t2, that both "contain as a value" {1 2}. In Go, t1 and t2 are different because their pointers differ. Their values, in other words, are different. In the proposed encoding, t1 and t2 both encode the same, so they are the same value.
This is the kind of thing that occurs with pointers: the underlying data may be the same—the "same value" in one sense—but the objects holding those values may differ in location, so that if one object is modified, the other is not. If you run such objects through an encode-then-decode process that makes them share the pointed-to value, you may give up the ability to modify one object without modifying the other, or to distinguish between them.
Since you get to choose how to do the encoding, you get to decide exactly what you want to have happen here. But you must make that choice on purpose, not just accidentally.
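As one possible sketch of the "flattening" choice, the following hashes an encoding that follows pointers, so the digest depends on the pointed-to values rather than on addresses. It swaps in encoding/json instead of gob purely to keep the example short. Like gob, it only sees exported fields. The function name digest is made up, and T and Sub are the types from the footnote.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

type Sub struct {
	J int
}

type T struct {
	I int
	P *Sub
}

// digest (hypothetical helper) encodes v with encoding/json, which follows
// pointers and writes the values they point to, then hashes the bytes.
func digest(v interface{}) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	t1 := T{1, &Sub{2}}
	t2 := T{1, &Sub{2}} // different pointer, same pointed-to value

	d1, _ := digest(t1)
	d2, _ := digest(t2)
	fmt.Println(d1 == d2) // prints: true
}
```

Under this encoding t1 and t2 are "the same value", which is exactly the decision discussed above. It also fails on cyclic structures, just like the gob approach.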

SIGSEGV when writing to, but not reading from a memory location in golang

I was under the impression that using the unsafe package allows you to read/write arbitrary data. I'm trying to change the value the interface{} points to without changing the pointer itself.
Assuming that interface{} is implemented as
type _interface struct {
type_info *typ
value unsafe.Pointer
}
setting fails with a SIGSEGV, although reading is successful.
func data(i interface{}) unsafe.Pointer {
return unsafe.Pointer((*((*[2]uintptr)(unsafe.Pointer(&i))))[1])
}
func main() {
var i interface{}
i = 2
fmt.Printf("%v, %v\n", (*int)(data(i)), *(*int)(data(i)))
*((*int)(data(i))) = 3
}
Am I doing something wrong, or is this not possible in golang?
Hm... Here's how I understand your second code example currently, in case I've made an error (if you notice anything amiss in what I'm describing, my answer is probably irredeemably wrong and you should ignore the rest of what I have to say).
1. Allocate memory for interface i in main.
2. Set the value of i to an integer type with the value 2.
3. Allocate memory for interface i in data.
4. Copy the value of main's i to data's i; that is, set the value of the new interface to an integer type with the value 2.
5. Cast the address of the new variable into a pointer to length-2 array of uintptr (with unsafe.Pointer serving as the intermediary that forces the compiler to accept this cast).
6. Cast the second element of the array (whose value is the address of the value-part of i in data) back into an unsafe.Pointer and return it.
I've made an attempt at doing the same thing in more steps, but unfortunately I encountered all the same problems: the program recognizes that I have a non-nil pointer and it's able to dereference the pointer for reading, but using the same pointer for writing produces a runtime error.
It's step 6 that go vet complains about, and I think it's because, according to the package docs,
A uintptr is an integer, not a reference. Converting a Pointer to a uintptr creates an integer value with no pointer semantics. Even if a uintptr holds the address of some object, the garbage collector will not update that uintptr's value if the object moves, nor will that uintptr keep the object from being reclaimed.
More to the point, from what I can tell (though I'll admit I'm having trouble digging up explicit confirmation without scanning the compiler and runtime source), the runtime doesn't appear to track the value-part of an interface{} type as a discrete pointer with its own reference count; you can, of course, trample over both the interface{}'s words by writing another interface value into the whole thing, but that doesn't appear to be what you wanted to do at all (write to the memory address of a pointer that is inside an interface type, all without moving the pointer).
What's interesting is that we seem to be able to approximate this behavior by just defining our own structured type that isn't given special treatment by the compiler (interfaces are clearly somewhat special, with type-assertion syntax and all). That is, we can use unsafe.Pointer to maintain a reference that points to a particular point in memory, and no matter what we cast it to, the memory address never moves even if the value changes (and the value can be reinterpreted by casting it to something else). The part that surprises me a bit is that, at least in my own example, and at least within the Playground environment, the value that is pointed to does not appear to have a fixed size; we can establish an address to write to once, and repeated writes to that address succeed even with huge (or tiny) amounts of data.
Of course, with at least this implementation, we lose a bunch of the other nice-to-have things we associate with interface types, especially non-empty interface types (i.e. with methods). So, there's no way to use this to (for example) make a super-sneaky "generic" type. It seems that an interface is its own value, and part of that value's definition is an address in memory, but it's not entirely the same thing as a pointer.
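For comparison, if the goal is just to change what an interface{} "points to" without replacing the interface value, one supported route (not what the question attempted, so take it as an alternative sketch) is to store a pointer in the interface and write through it after a type assertion. The helper name setInt is made up for this example.

```go
package main

import "fmt"

// setInt (hypothetical helper) writes v through the *int held in i.
// It reports false if i does not hold a *int.
func setInt(i interface{}, v int) bool {
	p, ok := i.(*int)
	if !ok {
		return false
	}
	*p = v // writes to memory we own, no unsafe needed
	return true
}

func main() {
	n := 2
	var i interface{} = &n // the interface holds a pointer to n

	setInt(i, 3)
	fmt.Println(n, *(i.(*int))) // prints: 3 3
}
```

The interface value itself (its type word and its pointer) is unchanged. Only the int it points at is modified.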
