I want to be able to create repeatable byte arrays of structs in Go so I can hash them and then verify that hash at some point.
I am currently following this simple approach to create a byte array from a struct with:
[]byte(fmt.Sprintf("%v", struct))...)
This works perfectly until my struct holds an embedded struct with a pointer, for example:
type testEmbeddedPointerStruct struct {
T *testSimpleStruct
}
In my tests this creates a different byte array each time. I think that may be because the address held by the pointer changes on each run?
Is there a way of creating a repeatable byte array digest even if the struct holds a pointer?
Thanks
... I think it may be because with the pointer the address in memory changes ...
That's the obvious candidate, yes. You have chosen a very simple encoding, in which pointer fields are encoded as a hexadecimal representation of the pointer, rather than any value found at the target of the pointer.
Is there a way of creating a repeatable byte array digest even if the struct holds a pointer?
You may need to define more precisely what "the same value" means to you [1], but in general, this is really an encoding problem. The encoding/gob package could perhaps give you an encoding you would like here, though note that unlike %v formatting, it encodes only exported struct fields and includes the field names. It has the effect of "flattening" any pointer data, but won't work for cyclic data structures.
(You can write your own simpler encoder that simply follows pointers when it encounters them, and otherwise works like %v.)
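As a rough sketch of the "follow the pointers" idea, the example below swaps in encoding/json (not gob, and not a hand-written encoder) because it also encodes the pointed-to data rather than the address. The fields of testSimpleStruct and the digest helper are assumptions made up for illustration; the question does not show them.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// testSimpleStruct's fields are assumed; the question does not show them.
type testSimpleStruct struct {
	S string
	I int
}

type testEmbeddedPointerStruct struct {
	T *testSimpleStruct
}

// digest is a hypothetical helper: it encodes v with encoding/json,
// which follows pointers, and hashes the resulting bytes.
func digest(v any) ([sha256.Size]byte, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return [sha256.Size]byte{}, err
	}
	return sha256.Sum256(b), nil
}

func main() {
	a := testEmbeddedPointerStruct{T: &testSimpleStruct{S: "x", I: 1}}
	b := testEmbeddedPointerStruct{T: &testSimpleStruct{S: "x", I: 1}}

	da, errA := digest(a)
	db, errB := digest(b)
	if errA != nil || errB != nil {
		panic("encoding failed")
	}
	fmt.Println(da == db) // true: same pointed-to data, same digest

	// %v prints the nested pointer as an address, so the text differs.
	sa, sb := fmt.Sprintf("%v", a), fmt.Sprintf("%v", b)
	fmt.Println(sa == sb) // false
}
```

Like gob, this only sees exported fields, so it is a different trade-off from %v rather than a drop-in replacement.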
[1] For example, suppose you have:
type T struct {
I int
P *Sub
}
type Sub struct {
J int
}
// ...
s2 := Sub{2}
s3 := Sub{3}
t1 := T{1, &s2}
t2 := T{1, &s3}
Obviously printing t1 and t2 (while flattening away pointers) produces an encoded version of {1 2} and {1 3} respectively, so these are not the same value. However, if we change s3 itself to:
s3 := Sub{2}
we now have two different entities, t1 and t2, that both "contain as a value" {1 2}. In Go, t1 and t2 are different because their pointers differ. Their values, in other words, are different. In the proposed encoding, t1 and t2 both encode the same, so they are the same value.
This is the kind of thing that occurs with pointers: the underlying data may be the same—the "same value" in one sense—but the objects holding those values may differ in location, so that if one object is modified, the other is not. If you run such objects through an encode-then-decode process that makes them share the pointed-to value, you may give up the ability to modify one object without modifying the other, or to distinguish between them.
Since you get to choose how to do the encoding, you get to decide exactly what you want to have happen here. But you must make that choice on purpose, not just accidentally.
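To make the two notions of "same value" concrete, here is a small sketch using the T and Sub types from the footnote. The compare helper is hypothetical; it contrasts Go's == (which compares the pointers) with reflect.DeepEqual (which compares what they point to).

```go
package main

import (
	"fmt"
	"reflect"
)

type Sub struct {
	J int
}

type T struct {
	I int
	P *Sub
}

// compare is a hypothetical helper returning both notions of equality.
func compare(a, b T) (identical, deep bool) {
	return a == b, reflect.DeepEqual(a, b)
}

func main() {
	s2 := Sub{2}
	s3 := Sub{2}
	t1 := T{1, &s2}
	t2 := T{1, &s3}

	identical, deep := compare(t1, t2)
	fmt.Println(identical) // false: the pointers differ
	fmt.Println(deep)      // true: the pointed-to data matches

	t1.P.J = 9
	fmt.Println(t2.P.J) // 2: t2 still has its own Sub
}
```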
From what I have read online, ints and strings are immutable by nature.
But the following code shows that after changing the values of variables of these types, they still point to the same address. This seems to contradict what I understand about the immutability of these types from Python.
Can anyone please explain this to me?
Thanks in advance.
package main
import (
"fmt"
)
func main() {
num := 2
fmt.Println(&num)
num = 3
fmt.Println(&num) // address value of the num does not change
str := "2"
fmt.Println(&str)
str = "34"
fmt.Println(&str) // address value of the str does not change
}
A number is immutable by nature. 7 is 7, and it won't be 8 tomorrow. That doesn't mean that which number is stored in a variable cannot change. Variables are variable. They're mutable containers for values which may be mutable or immutable.
A Go string is immutable by language design; the string type doesn't support any mutating operators (like appending or replacing a character in the middle of the string). But, again, assignment can change which string a variable contains.
In Python (CPython at least), a number is implemented as a kind of object, with an address and fields like any other object. When you do tricks with id(), you're looking at the address of the object "behind" the variable, which may or may not change depending on what you do to it, and whether or not it was originally an interned small integer or something like that.
In Go, an integer is an integer. It's stored as an integer. The address of the variable is the address of the variable. The address of the variable might change if the garbage collector decides to move it (making the numeric value of the address more or less useless), but it doesn't reveal to you any tricks about the implementation of arithmetic operators, because there aren't any.
Strings are more complicated than integers; they are kind of object-ish internally, being a structure containing a pointer and a size. But taking the address of a string variable with &str doesn't tell you anything about that internal structure, and it doesn't tell you whether the Go compiler decided to use a de novo string value for an assignment, or to modify the old one in place (which it could, without breaking any rules, if it could prove that the old one would never be seen again by anything else). All it tells you is the address of str. If you wanted to find out whether that internal pointer changed you would have to use reflection... but there's hardly ever any practical reason to do so.
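If you are curious anyway, the sketch below peeks at the internal data pointer with unsafe.StringData, which assumes Go 1.20 or later (the answer mentions reflection; this is just another way to look). The observe helper is hypothetical. Whether the data pointer changes is up to the compiler, so only the first result is something you can rely on.

```go
package main

import (
	"fmt"
	"unsafe"
)

// observe is a hypothetical helper. It reports whether the address of the
// variable and the string's internal data pointer stay the same across an
// assignment.
func observe() (sameVar, sameData bool) {
	str := "2"
	varAddr := &str
	dataBefore := unsafe.StringData(str)

	str = "34"

	return varAddr == &str, dataBefore == unsafe.StringData(str)
}

func main() {
	sameVar, sameData := observe()
	fmt.Println("variable address unchanged:", sameVar)
	fmt.Println("internal data pointer unchanged:", sameData)
}
```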
When you read about a string being immutable, it means you cannot modify it by index, for example:
x := "hello"
x[2] = 'r'
// compile-time error: cannot assign to x[2]
As a comment says, assigning a whole new value to the variable (rather than changing part of it through an index) has nothing to do with mutability, and you are free to do it.
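If you do need a string with one byte changed, a common approach is to copy it into a []byte, edit the copy, and convert back. A minimal sketch, with replaceByte as a made-up helper name:

```go
package main

import "fmt"

// replaceByte is a hypothetical helper: it returns a copy of s with the
// byte at index i replaced by c. The original string is left untouched.
func replaceByte(s string, i int, c byte) string {
	b := []byte(s) // the conversion copies the string's bytes
	b[i] = c
	return string(b)
}

func main() {
	x := "hello"
	y := replaceByte(x, 2, 'r')
	fmt.Println(x) // hello
	fmt.Println(y) // herlo
}
```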
I have a protobuf message Data
in .proto:
message Data {
uint64 ID = 1;
uint32 GUID = 2;
}
in Go:
b, err := proto.Marshal(&pb.Data{})
if err != nil {
panic(err)
}
fmt.Println(len(b))
I got 0 length!
How can I make proto.Marshal always return fixed size no matter what pb.Data is?
P.S. pb.Data only contains uint64 and uint32 fields.
There are two issues here:
1) protobuf uses varint encoding for integers, so the size depends on the value, see this link
2) zero-value fields are not transmitted by default, so because the two integers are zero, even their field identifiers are not sent. Looking at the docs, I'm actually not sure there's even an option to send zero values.
If you set them both to 1, you will get more than zero bytes, but the length still won't be fixed; it depends on the range of the values.
So there's no real way to enforce a fixed size for protobuf messages in general.
If you want fixed-length messages, you are probably better off using a direct structs-on-the-wire encoding, but that is harder for language interop, as every language would have to define the same message, and you'd lose easy message migrations and all the cool stuff that protobuf gives you.
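To see how the size depends on the value, here is a small sketch using the standard library's encoding/binary, which implements the same base-128 varint scheme. It does not call the protobuf library, and varintLen is a made-up helper; a real protobuf field also carries a tag in front of the value.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// varintLen is a hypothetical helper that returns how many bytes v
// occupies when encoded as an unsigned base-128 varint.
func varintLen(v uint64) int {
	buf := make([]byte, binary.MaxVarintLen64)
	return binary.PutUvarint(buf, v)
}

func main() {
	values := []uint64{1, 127, 128, 300, math.MaxUint32, math.MaxUint64}
	for _, v := range values {
		// Prints 1, 1, 2, 2, 5 and 10 bytes respectively.
		fmt.Printf("%d -> %d byte(s)\n", v, varintLen(v))
	}
}
```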
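A minimal sketch of the structs-on-the-wire idea, assuming a plain Go struct that mirrors the two fields of the message (it is not the generated pb.Data) and a made-up encodeFixed helper. encoding/binary writes each fixed-size field at its full width, so the length does not depend on the values.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Data mirrors the fields of the .proto message for illustration only;
// it is not the generated pb.Data type.
type Data struct {
	ID   uint64
	GUID uint32
}

// encodeFixed is a hypothetical helper that writes d as 8 + 4 bytes,
// little-endian, with no padding between fields.
func encodeFixed(d Data) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, d); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	samples := []Data{{}, {ID: 1, GUID: 1}, {ID: 1 << 60, GUID: 1 << 30}}
	for _, d := range samples {
		b, err := encodeFixed(d)
		if err != nil {
			panic(err)
		}
		fmt.Println(len(b)) // 12 every time
	}
}
```

The price is the one described above: both sides must agree on field order, widths and byte order by hand.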
Cap'n Proto might have an option for fixed size structs, but they also generally compress which will, once again, produce variable length messages.
If you describe the problem you are trying to ultimately solve, we may be able to suggest other alternatives.
You are calling len() on a byte array. It is going to count the number of elements in that array, and return it.
If you have just instantiated a new, empty protobuf object with nothing inside, the marshaled byte array will not hold any data, which is why you're getting 0.
I'm not sure what you want it to return instead. Could you clarify your question with the output you expect? Then I may be able to give a better answer.
I have a question about arrays of structs: should we prefer struct pointers or not?
Let's say we have Item and Cart which contains an array of Items.
type Item struct {
Id string
Name string
Price string
}
type Cart1 struct {
Id string
Items []Item
}
or
type Cart2 struct {
Id string
Items []*Item
}
I heard that when we append a struct to a list of structs, Go makes a copy and adds that to the list. Since this copy is unnecessary, we should use a list of struct pointers instead. Is that true?
Could anyone clarify?
You are right in your assumption: every function call in Go (not only append()) copies its arguments by value. But how would a slice of pointers reduce memory consumption? You would have to store the actual struct plus a reference to it in memory. Using pointers is more about controlling who can modify the data.
foo := cart1.Items[0]
foo.Name = "foo" // will not change cart1
// but in the pointer case
bar := cart2.Items[0]
bar.Name = "bar" // will change cart2.Items[0].Name to "bar"
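A runnable version of that fragment, using the Item, Cart1 and Cart2 types from the question; the item names are made up for the demonstration.

```go
package main

import "fmt"

type Item struct {
	Id    string
	Name  string
	Price string
}

type Cart1 struct {
	Id    string
	Items []Item
}

type Cart2 struct {
	Id    string
	Items []*Item
}

func main() {
	cart1 := Cart1{Items: []Item{{Name: "original"}}}
	cart2 := Cart2{Items: []*Item{{Name: "original"}}}

	foo := cart1.Items[0] // foo is a copy of the element
	foo.Name = "foo"
	fmt.Println(foo.Name, cart1.Items[0].Name) // foo original

	bar := cart2.Items[0] // bar points at the same Item as the slice
	bar.Name = "bar"
	fmt.Println(bar.Name, cart2.Items[0].Name) // bar bar
}
```

Note that with []Item you can still change an element in place by indexing, as in cart1.Items[0].Name = "x"; it is only the copy in foo that is detached.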
Go arrays are passed by value. Slices are passed by value too, but a slice value is a small header that includes a pointer to its backing array, so passing a slice behaves much like passing a pointer. Since your cart will have a variable number of items, just use []Item.
See this effective go reference
BTW, if the slice has capacity 4 and you append a 5th element to it, Go grows the capacity (roughly doubling it for small slices), so not every single append allocates memory.
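A small sketch of both points, with made-up helper names. It shows that a callee can change elements of the caller's slice but not its length, and it counts how often the capacity changes over many appends. The exact growth steps are an implementation detail, so no specific capacities are assumed.

```go
package main

import "fmt"

// setFirst writes through the slice's pointer to the shared backing array.
func setFirst(s []int, v int) {
	s[0] = v
}

// appendOne appends to its own copy of the slice header and returns the
// new length; the caller's slice keeps its old length.
func appendOne(s []int, v int) int {
	s = append(s, v)
	return len(s)
}

// countGrowths reports how many times the capacity changed while
// appending n elements one at a time to an empty slice.
func countGrowths(n int) int {
	var s []int
	growths := 0
	last := cap(s)
	for i := 0; i < n; i++ {
		s = append(s, i)
		if cap(s) != last {
			growths++
			last = cap(s)
		}
	}
	return growths
}

func main() {
	s := []int{1, 2, 3}
	setFirst(s, 10)
	fmt.Println(s[0]) // 10: the element change is visible to the caller

	fmt.Println(appendOne(s, 4), len(s)) // 4 3: the caller's length is unchanged

	fmt.Println(countGrowths(1000)) // far fewer than 1000
}
```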
As I understand your question, your problem is not memory consumption but unnecessary copying of structs.
Everything in Go is passed by value. If you have a slice of structs and you append a new struct, Go will make a copy. Depending on the size of the struct, that may be too much. Instead you may choose to use a slice of pointers to structs. That way, when you append, Go will only copy a pointer.
That might be cheaper, but it may also complicate the code that accesses the slice, because now you have shared mutable state. That is a problem especially in Go, where you can't have a const pointer and anyone could modify the struct. Pointers are also prone to nil dereference errors.
Which one you choose is entirely up to you. There's no single "Go way" here.
We have:
type A struct {
Name string
Value string
}
type B struct {
//
First *A
Second A
}
First off: Which is more efficient in B, using *A or A?
And second: When instantiating B I would use b := &B{ ... }, and thus have a pointer to B. All functions which have B as receiver use func (*B) ... as signature, therefore operating only on the pointer. Now that I always have a pointer to B, does it really matter what B is composed of? If I always use the pointer, no matter what fields B has, I always pass around a pointer to B and the value of Second A is never copied when passing *B around. Or am I missing something?
There is no single right answer. It always depends on your use case.
Some guidance:
Prioritize semantic reasons over efficiency considerations
Use a pointer when A is "large"
Avoid a pointer when B should not be allowed to edit A
Your second statement is correct. But when you have lots of instances of B, using a pointer can be significantly more efficient (if the struct A is significantly bigger than a pointer).
If you are in doubt, measure it for your use case and then decide what the best solution is.
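One quick thing to measure is the in-memory size of B with each kind of field. The sketch below splits B into two hypothetical types, BPointer and BValue, so the two layouts can be compared; it is not a benchmark and says nothing about allocation or access cost. The numbers depend on the platform, so none are assumed here.

```go
package main

import (
	"fmt"
	"unsafe"
)

type A struct {
	Name  string
	Value string
}

// BPointer and BValue are hypothetical variants of B, each holding A in
// only one of the two ways, so their sizes can be compared.
type BPointer struct {
	First *A
}

type BValue struct {
	Second A
}

// sizes returns the size in bytes of each variant.
func sizes() (pointerField, valueField uintptr) {
	return unsafe.Sizeof(BPointer{}), unsafe.Sizeof(BValue{})
}

func main() {
	p, v := sizes()
	fmt.Println("B with *A:", p, "bytes")
	fmt.Println("B with A: ", v, "bytes")
}
```

Keep in mind that the pointer variant still needs an A somewhere once First is set, so the total memory is not necessarily lower.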
Just want to add to Sebastian's answer: use *A if you ever want it to be nil. This has benefits sometimes when marshaling JSON or using the struct with databases.
A non-pointer A will always have at least the zero value for that type, so it will always serialize into JSON even if you don't have anything useful there. To include it in the JSON serialization only when a populated struct is present, make it a pointer, which can be nil.
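A minimal sketch of that difference with encoding/json. The json tags and the encode helper are assumptions added for the example; the original B has no tags.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type A struct {
	Name  string
	Value string
}

// The json tags are assumptions added for this example.
type B struct {
	First  *A `json:"first,omitempty"`
	Second A  `json:"second,omitempty"`
}

// encode is a hypothetical helper returning the JSON text for b.
func encode(b B) (string, error) {
	out, err := json.Marshal(b)
	return string(out), err
}

func main() {
	s, err := encode(B{})
	if err != nil {
		panic(err)
	}
	// The nil pointer is omitted; the zero-value struct is still written.
	fmt.Println(s) // {"second":{"Name":"","Value":""}}
}
```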
So I know that in Go you can initialize a struct in two different ways. One of them is the built-in new function, which returns a pointer to the struct in memory. Or you can use { } to make a struct. My question is: when is it appropriate to use each?
Thanks
I prefer {} when the full value of the type is known and new() when the value is going to be populated incrementally.
In the former case, adding a new parameter may involve adding a new field initializer. In the latter it should probably be added to whatever code is composing the value.
Note that the &T{} syntax is only allowed when T is a struct, array, slice or map type.
Going off of what @Volker said, it's generally preferable to use &A{} for pointers (and this doesn't necessarily have to be a zero value: if I have a struct with a single integer in it, I could do &A{1} to initialize the field). Besides being a stylistic concern, the big reason people normally prefer this syntax is that it lets you set fields in the same expression. As for allocation, the two forms behave the same: neither new(A) nor &A{} always allocates on the heap. If the Go compiler can be sure that the pointer will never be used outside of the function, it can allocate the struct as a local variable, which is much more efficient.
Most people use A{} to create a zero value of type A, and &A{} to create a pointer to a zero value of type A. Using new is only really needed for types such as int, since int{} is not valid.
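The forms side by side, as a small sketch. The struct A with a single int field N is made up for the example, following the "struct with a single integer" mentioned above.

```go
package main

import "fmt"

// A is a hypothetical struct with a single integer field.
type A struct {
	N int
}

func main() {
	a := A{}    // zero value of A
	p := &A{1}  // pointer to an A with N set to 1
	q := new(A) // pointer to a zero A, equivalent to &A{}

	i := new(int) // int{} is not valid, so new (or &someVar) is used here
	*i = 42

	fmt.Println(a, *p, *q, *i) // {0} {1} {0} 42
}
```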