SIGSEGV when writing to, but not reading from a memory location in golang - go

I was under the impression that using the unsafe package allows you to read/write arbitrary data. I'm trying to change the value the interface{} points to without changing the pointer itself.
Assuming that interface{} is implemented as
type _interface struct {
    type_info *typ
    value     unsafe.Pointer
}
setting fails with a SIGSEGV, although reading is successful.
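package main
import (
    "fmt"
    "unsafe"
)
// data returns the second word of the interface value, assumed to be its data pointer.
func data(i interface{}) unsafe.Pointer {
    return unsafe.Pointer((*((*[2]uintptr)(unsafe.Pointer(&i))))[1])
}
func main() {
    var i interface{}
    i = 2
    fmt.Printf("%v, %v\n", (*int)(data(i)), *(*int)(data(i))) // reading succeeds
    *((*int)(data(i))) = 3 // writing fails with SIGSEGV
}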
Am I doing something wrong, or is this not possible in golang?

Hm... Here's how I understand your second code example currently, in case I've made an error (if you notice anything amiss in what I'm describing, my answer is probably irredeemably wrong and you should ignore the rest of what I have to say).
1. Allocate memory for interface i in main.
2. Set the value of i to an integer type with the value 2.
3. Allocate memory for interface i in data.
4. Copy the value of main's i to data's i; that is, set the value of the new interface to an integer type with the value 2.
5. Cast the address of the new variable into a pointer to a length-2 array of uintptr (with unsafe.Pointer serving as the intermediary that forces the compiler to accept this cast).
6. Cast the second element of the array (whose value is the address of the value-part of i in data) back into an unsafe.Pointer and return it.
I've made an attempt at doing the same thing in more steps, but unfortunately I encountered all the same problems: the program recognizes that I have a non-nil pointer and it's able to dereference the pointer for reading, but using the same pointer for writing produces a runtime error.
It's step 6 that go vet complains about, and I think it's because, according to the package docs,
A uintptr is an integer, not a reference. Converting a Pointer to a uintptr creates an integer value with no pointer semantics. Even if a uintptr holds the address of some object, the garbage collector will not update that uintptr's value if the object moves, nor will that uintptr keep the object from being reclaimed.
More to the point, from what I can tell (though I'll admit I'm having trouble digging up explicit confirmation without scanning the compiler and runtime source), the runtime doesn't appear to treat the value-part of an interface{} as a discrete pointer that is tracked on its own. You can, of course, trample over both of the interface{}'s words by writing another interface value into the whole thing, but that doesn't appear to be what you wanted to do at all (write to the memory that the pointer inside the interface refers to, without moving the pointer).
What's interesting is that we seem to be able to approximate this behavior by just defining our own structured type that isn't given special treatment by the compiler (interfaces are clearly somewhat special, with type-assertion syntax and all). That is, we can use unsafe.Pointer to maintain a reference that points to a particular point in memory, and no matter what we cast it to, the memory address never moves even if the value changes (and the value can be reinterpreted by casting it to something else). The part that surprises me a bit is that, at least in my own example, and at least within the Playground environment, the value that is pointed to does not appear to have a fixed size; we can establish an address to write to once, and repeated writes to that address succeed even with huge (or tiny) amounts of data.
Of course, with at least this implementation, we lose a bunch of the other nice-to-have things we associate with interface types, especially non-empty interface types (i.e. with methods). So, there's no way to use this to (for example) make a super-sneaky "generic" type. It seems that an interface is its own value, and part of that value's definition is an address in memory, but it's not entirely the same thing as a pointer.
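If the goal is only to change the value behind an interface{} without replacing the interface value itself, one supported route is to store a pointer in the interface and write through it. The following is a minimal sketch under that assumption; setInt is a hypothetical helper, not something from the question.

```go
package main

import "fmt"

// setInt overwrites the int that the interface value points to.
// It assumes the interface holds a *int; otherwise it reports false.
func setInt(i interface{}, v int) bool {
	p, ok := i.(*int)
	if !ok {
		return false
	}
	*p = v
	return true
}

func main() {
	x := 2
	var i interface{} = &x // the interface holds a pointer to x, not a copy of 2

	setInt(i, 3)
	fmt.Println(x, *i.(*int)) // prints "3 3"
}
```

The interface value (type word plus pointer) never changes here; only the int it points to does, and no unsafe conversion is involved.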

Related

Safety of using reflect.StringHeader in Go?

I have a small function which passes the pointer of Go string data to C (Lua library):
func (L *C.lua_State) pushLString(s string) {
    gostr := (*reflect.StringHeader)(unsafe.Pointer(&s))
    C.lua_pushlstring(L, (*C.char)(unsafe.Pointer(gostr.Data)), C.ulong(gostr.Len))
    // lua_pushlstring copies the given string, not keeping the original pointer.
}
It works in simple tests, but from the documentation it's unclear whether this is safe at all.
According to the Go documentation, the memory of reflect.StringHeader should be pinned for gostr, but StringHeader.Data is already a uintptr, "an integer value with no pointer semantics". That is itself odd: if it has no pointer semantics, wouldn't the field be completely useless, since the memory may be moved right after the value is read? Or is the field treated specially, like reflect.Value.Pointer? Or perhaps there is a different way of getting a C pointer from a string?
it's unclear whether this is safe at all.
Tapir Liu (https://twitter.com/TapirLiu/), in Go101 (https://github.com/go101/go101), gives a clue as to the "safety" of reflect.StringHeader in this tweet:
Since Go 1.20, the reflect.StringHeader and reflect.SliceHeader types will be deprecated and not recommended to be used.
Accordingly, two functions, unsafe.StringData and unsafe.SliceData, will be introduced in Go 1.20 to take over the use cases of two old reflect types.
That was initially discussed in CL 401434, then in issue 53003.
The reason for deprecation is that reflect.SliceHeader and reflect.StringHeader are commonly misused.
As well, the types have always been documented as unstable and not to be relied upon.
We can see in Github code search that usage of these types is ubiquitous.
The most common use cases I've seen are:
converting []byte to string:
Equivalent to *(*string)(unsafe.Pointer(&mySlice)), which is never actually officially documented anywhere as something that can be relied upon.
Under the hood, a string header is a prefix of a slice header (it has no Cap field), so this seems valid per the unsafe rules.
converting string to []byte:
commonly seen as *(*[]byte)(unsafe.Pointer(&string)), which is by-default broken because the Cap field can be past the end of a page boundary (example here, in widely used code) -- this violates unsafe rule.
grabbing the Data pointer field for ffi or some other niche use
converting a slice of one type to a slice of another type
Ian Lance Taylor adds:
One of the main use cases of unsafe.Slice is to create a slice whose backing array is a memory buffer returned from C code or from a call such as syscall.MMap.
I agree that it can be used to (unsafely) convert from a slice of one type to a slice of a different type.
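As a rough illustration of the Go 1.20 replacements mentioned above, the two conversions can be sketched with unsafe.String, unsafe.SliceData, unsafe.Slice and unsafe.StringData. The helper names bytesToString and stringToBytes are hypothetical, and the sketch assumes Go 1.20 or later and that the caller never mutates the shared memory.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString reinterprets b as a string without copying (Go 1.20+).
// The caller must not modify b afterwards.
func bytesToString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// stringToBytes exposes the bytes of s without copying (Go 1.20+).
// The returned slice must be treated as read-only.
func stringToBytes(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	s := bytesToString([]byte("hello"))
	b := stringToBytes("world")
	fmt.Println(s, len(b), cap(b)) // prints "hello 5 5"
}
```

Because unsafe.Slice sets both length and capacity explicitly, the capacity problem described for the *(*[]byte)(unsafe.Pointer(&s)) cast does not arise in this form.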

How large is the struct underlying a map in golang?

I know that map is a reference type in Go (it has a pointer to the map entries memory region in its underlying struct). However, I would like to know what is the size of the underlying struct of the map because I want to know if using a pointer to a map as a function argument would be faster than not using a pointer.
Looking at this blog post it seems that the maptype struct has a lot of fields and that it would take a long time to copy (relative to a pointer).
Looking through the golang standard libraries I have found almost no use of *map[x]x so I guess using just map[x]x should be efficient as a function argument. So this leads me to think that maybe the compiler actually replaces map[x]x by a pointer to the maptype struct. Is that the case? If not what actually is happening that may avoid the copying of the maptype struct with its many fields?
The zero value for a Go map variable is a nil pointer.
var m map[string]int
make initializes a map and sets the map variable to point to a package runtime hmap struct.
m = make(map[string]int)
In Go, all arguments are passed by value. A map is a reference type: the map value itself is a pointer. Therefore, passing a map value as a function or method argument is fast, because you are only passing a pointer.
The Go map runtime structs are currently located in the src/runtime/map.go Go source file. Since you only see a hmap pointer, their size is unlikely to be relevant.
See GopherCon 2016: Keith Randall - Inside the Map Implementation.
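A small sketch can make this observable, assuming the standard gc toolchain: a map variable occupies one machine word, and a function that receives a map by value still modifies the caller's entries. The names addEntry and mapValueIsPointerSized are hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// addEntry receives the map by value; only the pointer-sized map value is copied.
func addEntry(m map[string]int, key string, value int) {
	m[key] = value
}

// mapValueIsPointerSized reports whether a map variable occupies one machine word.
func mapValueIsPointerSized() bool {
	var m map[string]int
	return unsafe.Sizeof(m) == unsafe.Sizeof(uintptr(0))
}

func main() {
	m := make(map[string]int)
	addEntry(m, "a", 1)
	fmt.Println(m["a"], mapValueIsPointerSized()) // prints "1 true"
}
```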

what should be used New() or var in go?

How should an object be created for a struct?
object := new(struct)
or
var object struct
I could not understand when to use which. And if both are the same, which one should be preferred?
The new syntax you're showing returns a pointer, while the other one is a value. See Effective Go: https://golang.org/doc/effective_go.html#allocation_new
There's actually one other option, which I prefer. It's called a composite literal and looks like this:
object := &struct{}
The example above is equivalent to your use of new. The cool thing about it is that you can specify values for any field of the struct within the braces.
When to use what is a decision you need to make on a case-by-case basis. In Go there are several reasons I would want one or the other: perhaps only the pointer *myType implements some interface while myType does not, or an instance of myType could contain about 1 GB of data and you want to ensure you're passing a pointer and not the value to other methods, etc. The choice of which to use depends on the use case. Although I will say, pointers are rarely worse, and because that's the case I almost always use them.
When you need a pointer object use new or composite literal else use var.
Use var whenever possible, as the value is more likely to be allocated on the stack, where its memory is freed as soon as the scope ends. With new, the memory is more likely to be allocated on the heap and needs to be garbage collected. In either case, the compiler's escape analysis makes the final decision.
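The three forms discussed in these answers can be put side by side. This is only a sketch; point, viaNew and viaLiteral are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// point is a hypothetical struct used only for illustration.
type point struct {
	X, Y int
}

// viaNew returns a pointer to a zero-valued point.
func viaNew() *point { return new(point) }

// viaLiteral returns a pointer to a point whose fields are set up front.
func viaLiteral(x, y int) *point { return &point{X: x, Y: y} }

func main() {
	var v point           // a value, zero-initialized
	p := viaNew()         // *point, pointing at a zero value
	q := viaLiteral(1, 2) // *point, fields set in the literal

	fmt.Println(v == *p, *q) // prints "true {1 2}"
}
```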

using new vs. { } when initializing a struct in Go

So I know that in Go you can initialize a struct in two different ways. One of them is using the new keyword, which returns a pointer to the struct in memory. Or you can use { } to make a struct. My question is: when is it appropriate to use each?
Thanks
I prefer {} when the full value of the type is known and new() when the value is going to be populated incrementally.
In the former case, adding a new parameter may involve adding a new field initializer. In the latter it should probably be added to whatever code is composing the value.
Note that the &T{} syntax is only allowed when T is a struct, array, slice or map type.
Going off of what @Volker said, it's generally preferable to use &A{} for pointers (and this doesn't necessarily have to be zero values: if I have a struct with a single integer in it, I could do &A{1} to initialize the field). Besides being a stylistic concern, allocation is often cited as a reason: if the Go compiler can be sure that the pointer will never be used outside of the function, it will simply allocate the struct as a local variable, which is much more efficient than a heap allocation. Note that this escape analysis also applies to new, so the two forms are equivalent in that respect.
Most people use A{} to create a zero value of type A and &A{} to create a pointer to a zero value of type A. Using new is only necessary for types such as int, since int{} is not valid.
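To illustrate the last point, here is a sketch of where new is still the natural choice. counter and newInt are hypothetical names used only for this example.

```go
package main

import "fmt"

// counter is a hypothetical one-field struct used only for illustration.
type counter struct {
	n int
}

// newInt returns a pointer to an int holding v. &int{} is not valid Go,
// so new (or taking the address of a variable) is needed here.
func newInt(v int) *int {
	p := new(int)
	*p = v
	return p
}

func main() {
	c := &counter{1} // composite literal with the field set
	z := &counter{}  // pointer to a zero value
	p := newInt(42)
	fmt.Println(c.n, z.n, *p) // prints "1 0 42"
}
```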

Go receiver methods calling syntax confusion

I was just reading through Effective Go and in the Pointers vs. Values section, near the end it says:
The rule about pointers vs. values for receivers is that value methods can be invoked on pointers and values, but pointer methods can only be invoked on pointers. This is because pointer methods can modify the receiver; invoking them on a copy of the value would cause those modifications to be discarded.
To test it, I wrote this:
package main
import (
    "fmt"
    "reflect"
)
type age int
func (a age) String() string {
    return fmt.Sprintf("%d year(s) old", int(a))
}
func (a *age) Set(newAge int) {
    if newAge >= 0 {
        *a = age(newAge)
    }
}
func main() {
    var vAge age = 5
    pAge := new(age)
    fmt.Printf("TypeOf =>\n\tvAge: %v\n\tpAge: %v\n", reflect.TypeOf(vAge),
        reflect.TypeOf(pAge))
    fmt.Printf("vAge.String(): %v\n", vAge.String())
    fmt.Printf("vAge.Set(10)\n")
    vAge.Set(10)
    fmt.Printf("vAge.String(): %v\n", vAge.String())
    fmt.Printf("pAge.String(): %v\n", pAge.String())
    fmt.Printf("pAge.Set(10)\n")
    pAge.Set(10)
    fmt.Printf("pAge.String(): %v\n", pAge.String())
}
And it compiles, even though the document says it shouldn't since the pointer method Set() should not be invocable through the value var vAge. Am I doing something wrong here?
That's valid because vAge is addressable. See the last paragraph in Calls under the language spec:
A method call x.m() is valid if the method set of (the type of) x contains m and the argument list can be assigned to the parameter list of m. If x is addressable and &x's method set contains m, x.m() is shorthand for (&x).m().
vAge is not considered as only a "value variable", because it's a known location in memory that stores a value of type age. Looking at vAge only as its value, vAge.Set(10) is not valid as an expression on its own, but because vAge is addressable, the spec declares that it's okay to treat the expression as shorthand for "get the address of vAge, and call Set on that" at compile-time, when we will be able to verify that Set is part of the method set for either age or *age. You're basically allowing the compiler to do a textual expansion on the original expression if it determines that it's necessary and possible.
Meanwhile, the compiler will allow you to call age(23).String() but not age(23).Set(10). In this case, we're working with a non-addressable value of type age. Since it's not valid to say &age(23), it can't be valid to say (&age(23)).Set(10); the compiler won't do that expansion.
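The addressability rule can be sketched as follows, reusing the age type from the question; setViaVariable is a hypothetical helper added for illustration.

```go
package main

import "fmt"

type age int

func (a *age) Set(newAge int) {
	if newAge >= 0 {
		*a = age(newAge)
	}
}

// setViaVariable calls the pointer method on an addressable local variable.
func setViaVariable(start, next int) age {
	v := age(start)
	v.Set(next) // shorthand for (&v).Set(next)
	return v
}

func main() {
	fmt.Println(setViaVariable(5, 10)) // prints "10"
	// age(23).Set(10) // does not compile: age(23) is not addressable
}
```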
Looking at the Effective Go example, you're not directly calling b.Write() at the scope where we know b's full type. You're instead making a temporary copy of b and trying to pass it off as a value of the interface type io.Writer. The problem is that the implementation of Fprintf doesn't know anything about the object being passed in except that it has promised it knows how to receive Write(), so it doesn't know to take a ByteSlice and turn it into a *ByteSlice before calling the function. The decision of whether to address b has to happen at compile time, and Fprintf was compiled with the precondition that its first argument would know how to receive Write() without being referenced.
You may think that if the system knows how to take an age pointer and convert it to an age value, it should be able to do the reverse; it doesn't really make sense to be able to, though. In the Effective Go example, if you were to pass b instead of &b, you'd modify a slice that would no longer exist after Fprintf returns, which is hardly useful. In my age example above, it literally makes no sense to take the value 23 and overwrite it with the value 10. In the first case, it makes sense for the compiler to stop and ask the programmer what she really meant to do when handing b off. In the latter case, it of course makes sense for the compiler to refuse to modify a constant value.
Furthermore, I don't think the system is dynamically extending age's method set to *age; my wild guess is that pointer types are statically given a method for each of the base type's methods, which just dereferences the pointer and calls the base's method. It's safe to do this automatically, as nothing in a receive-by-value method can change the pointer anyway. In the other direction, it doesn't always make sense to extend a set of methods that are asking to modify data by wrapping them in a way that the data they modify disappears shortly thereafter. There are definitely cases where it makes sense to do this, but this needs to be decided explicitly by the programmer, and it makes sense for the compiler to stop and ask for such.
tl;dr I think that the paragraph in Effective Go could use a bit of rewording (although I'm probably too long-winded to take the job), but it's correct. A pointer of type *X effectively has access to all of X's methods, but 'X' does not have access to *X's. Therefore, when determining whether an object can fulfill a given interface, *X is allowed to fulfill any interface X can, but the converse is not true. Furthermore, even though a variable of type X in scope is known to be addressable at compile-time--so the compiler can convert it to a *X--it will refuse to do so for the purposes of interface fulfillment because doing so may not make sense.
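The interface side of this summary can be sketched like so. The setter interface and the apply function are hypothetical names; age is the type from the question.

```go
package main

import "fmt"

type age int

func (a age) String() string { return fmt.Sprintf("%d year(s) old", int(a)) }

func (a *age) Set(newAge int) {
	if newAge >= 0 {
		*a = age(newAge)
	}
}

// setter is a hypothetical interface that only *age satisfies.
type setter interface {
	Set(int)
}

// apply sets the age through the interface.
func apply(s setter, n int) { s.Set(n) }

func main() {
	var v age = 5
	var _ fmt.Stringer = v  // age has String
	var _ fmt.Stringer = &v // *age also has String
	apply(&v, 10)           // *age has Set
	// apply(v, 10)         // does not compile: age's method set lacks Set
	fmt.Println(v) // prints "10 year(s) old"
}
```

Even though v is addressable in main, the compiler will not take its address implicitly to satisfy setter; the & has to be written out.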

Resources