Byte slice converted with unsafe from string changes its address - go

I have this function to convert a string to a byte slice without copying:
func StringToByteUnsafe(s string) []byte {
	strh := (*reflect.StringHeader)(unsafe.Pointer(&s))
	var sh reflect.SliceHeader
	sh.Data = strh.Data
	sh.Len = strh.Len
	sh.Cap = strh.Len
	return *(*[]byte)(unsafe.Pointer(&sh))
}
That works fine, but a very specific setup produces very strange behavior:
The setup is here: https://github.com/leviska/go-unsafe-gc/blob/main/pkg/pkg_test.go
What happens:
Create a byte slice.
Convert it into a temporary (rvalue) string, then use unsafe to convert that string back into a byte slice.
Then copy this slice (by reference).
Then reslice the second slice inside a goroutine.
Print the pointers before and after.
I get this output on my Linux Mint laptop with Go 1.16:
go test ./pkg -v -count=1
=== RUN TestSomething
0xc000046720 123 0xc000046720 123
0xc000076f20 123 0xc000046721 z
--- PASS: TestSomething (0.84s)
PASS
ok github.com/leviska/go-unsafe-gc/pkg 0.847s
So the first slice magically changes its address, while the second one doesn't.
If we replace the goroutine with a call to runtime.GC() (and maybe play with the code a little), we can get both pointers to change (to the same new value).
If we change the unsafe cast to a plain []byte() conversion, everything works without the addresses changing. Also, if we change it to the unsafe cast from https://stackoverflow.com/a/66218124/5516391, everything works fine as well.
func StringToByteUnsafe(str string) []byte { // this works fine
	var buf = *(*[]byte)(unsafe.Pointer(&str))
	(*reflect.SliceHeader)(unsafe.Pointer(&buf)).Cap = len(str)
	return buf
}
I ran it with GOGC=off and got the same result. I ran it with -race and got no errors.
If you run this as a main package with a main function, it seems to work correctly. The same happens if you remove the Convert function. My guess is that the compiler optimizes something in these cases.
So, I have several questions about this:
What the hell is happening? It looks like weird undefined behavior.
Why and how does the Go runtime magically change the address of the variable?
Why can it change both addresses in the non-concurrent case, but not in the concurrent one?
What's the difference between this unsafe cast and the cast from the Stack Overflow answer? Why does that one work?
Or is this just a compiler bug?
Here is a copy of the full code from GitHub; put it in some package and run it as a test:
package pkg // any package name works; save the file as *_test.go
import (
	"fmt"
	"reflect"
	"sync"
	"testing"
	"unsafe"
)
func StringToByteUnsafe(s string) []byte {
	strh := (*reflect.StringHeader)(unsafe.Pointer(&s))
	var sh reflect.SliceHeader
	sh.Data = strh.Data
	sh.Len = strh.Len
	sh.Cap = strh.Len
	return *(*[]byte)(unsafe.Pointer(&sh))
}
func Convert(s []byte) []byte {
	return StringToByteUnsafe(string(s))
}
type T struct {
	S []byte
}
func Copy(s []byte) T {
	return T{S: s}
}
func Mid(a []byte, b []byte) []byte {
	fmt.Printf("%p %s %p %s\n", a, a, b, b)
	wg := sync.WaitGroup{}
	wg.Add(1)
	go func() {
		b = b[1:2]
		wg.Done()
	}()
	wg.Wait()
	fmt.Printf("%p %s %p %s\n", a, a, b, b)
	return b
}
func TestSomething(t *testing.T) {
	str := "123"
	a := Convert([]byte(str))
	b := Copy(a)
	Mid(a, b.S)
}

Answer from the GitHub issue https://github.com/golang/go/issues/47247:
The backing store of a is allocated on the stack, because it does not
escape. And goroutine stacks can move dynamically. b, on the other
hand, escapes to the heap, because it is passed to another goroutine. In
general, we don't assume the address of an object doesn't change.
This works as intended.
And my version is incorrect because
it uses reflect.SliceHeader as a plain struct. If you run go vet on it,
go vet will warn you.
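Since Go 1.20, the unsafe package has dedicated helpers for this conversion: unsafe.StringData and unsafe.Slice, plus the reverse pair unsafe.SliceData and unsafe.String. With them, reflect.StringHeader and reflect.SliceHeader are not needed at all. The sketch below assumes Go 1.20 or newer, and the helper names StringToBytes and BytesToString are hypothetical. The data pointer keeps a real pointer type throughout, so there should be nothing for go vet's unsafeptr check to flag. The usual caveat still applies: never write to the returned slice, because string data is assumed to be immutable.

```go
package main

import (
	"fmt"
	"unsafe"
)

// StringToBytes returns a []byte sharing memory with s (no copy).
// Assumes Go 1.20+ (unsafe.StringData). Never modify the result.
func StringToBytes(s string) []byte {
	if s == "" {
		return nil // StringData of an empty string is unspecified
	}
	// len and cap of the result are both len(s).
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// BytesToString is the reverse direction; b must not be modified afterwards.
func BytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	b := StringToBytes("123")
	fmt.Println(len(b), cap(b), string(b))    // 3 3 123
	fmt.Println(BytesToString([]byte("abc"))) // abc
}
```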

Related

golang get char* as return value from dll

I'm using Go to call a DLL function like char* fn(). I didn't write the DLL and I cannot change it. Here's my code:
package main
import (
	"fmt"
	"syscall"
	"unsafe"
)
func main() {
	dll := syscall.MustLoadDLL("my.dll")
	fn := dll.MustFindProc("fn")
	r, _, _ := fn.Call()
	p := (*byte)(unsafe.Pointer(r))
	// define a slice to fill with the p string
	data := make([]byte, 0)
	// loop until find '\0'
	for *p != 0 {
		data = append(data, *p)        // append 1 byte
		r += unsafe.Sizeof(byte(0))    // move r to next byte
		p = (*byte)(unsafe.Pointer(r)) // get the byte value
	}
	name := string(data) // convert to Golang string
	fmt.Println(name)
}
I have some questions:
Is there a better way of doing this? There are hundreds of DLL functions like this, and I'd have to write the loop for each of them.
For very long strings, like 100k+ bytes, will append() cause a performance issue?
Solved. The unsafe.Pointer(r) conversion makes the govet linter show the warning "possible misuse of unsafe.Pointer", but the code runs fine. How can I avoid this warning? Solution: add -unsafeptr=false to the govet command line; for vim-ale, add let g:ale_go_govet_options = '-unsafeptr=false'.
Converting a uintptr to an unsafe.Pointer is haram (forbidden), apart from the specific patterns the rules allow.
You must read the rules:
https://golang.org/pkg/unsafe/#Pointer
But there's a hacky way that shouldn't produce a warning:
//go:linkname gostringn runtime.gostringn
func gostringn(p uintptr, l int) string
//go:linkname findnull runtime.findnull
//go:nosplit
func findnull(s uintptr) int
// ....
name := gostringn(r, findnull(r))
These runtime functions take pointers, but we link them here with uintptr parameters because both types have the same size.
It might work in theory, but it is also frowned upon.
Getting back to your code, as JimB said, you could do it in one line with cgo (import "C"):
name := C.GoString((*C.char)(unsafe.Pointer(r)))
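I arrived at the following solution by tracing how os.Args is built in the Go source code. Note that I'm on go1.17; if you are on another version, read the corresponding source code to adapt it. Because it reads uint16 code units and decodes them as UTF-16, it suits wide (wchar_t*) strings rather than the char* in the question.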
import (
	"unicode/utf16"
	"unsafe"
)
func UintPtrToString(r uintptr) string {
	p := (*uint16)(unsafe.Pointer(r))
	if p == nil {
		return ""
	}
	// count UTF-16 code units up to the terminating 0
	n, end, add := 0, unsafe.Pointer(p), unsafe.Sizeof(*p)
	for *(*uint16)(end) != 0 {
		end = unsafe.Add(end, add)
		n++
	}
	return string(utf16.Decode(unsafe.Slice(p, n)))
}
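On the append question: appending one byte at a time is amortized, but you can skip the repeated growth entirely. Find the terminating NUL first, then copy once. Below is a minimal sketch assuming Go 1.17+ (for unsafe.Add and unsafe.Slice). CStringToGo is a hypothetical helper name, and a Go byte buffer stands in for the DLL's memory so the example runs anywhere. In the DLL case you would still call it as CStringToGo((*byte)(unsafe.Pointer(r))), so the uintptr conversion (and the vet warning) stays at that single call site.

```go
package main

import (
	"fmt"
	"unsafe"
)

// CStringToGo copies a NUL-terminated C string (char*) into a Go string.
// It scans once for the terminating 0, then copies all bytes in one step,
// so there is a single allocation instead of repeated append growth.
func CStringToGo(p *byte) string {
	if p == nil {
		return ""
	}
	n := 0
	for *(*byte)(unsafe.Add(unsafe.Pointer(p), n)) != 0 {
		n++
	}
	// string(...) copies the bytes, so the result does not alias C memory.
	return string(unsafe.Slice(p, n))
}

func main() {
	// Simulated C memory: "hello" followed by a NUL and trailing garbage.
	buf := []byte("hello\x00ignored")
	fmt.Println(CStringToGo(&buf[0])) // hello
}
```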

golang proxy io.Writer behaves differently when used in a log.Logger

I am trying to implement a proxy that satisfies io.Writer, so I can plug it into a logger. The idea is that it prints the output as usual but also keeps a copy of the data to be read later.
The ProxyIO struct in the following code should do this, and indeed it does it as long as I directly call its Write() method. However, when I plug it into a log.Logger instance the output is unexpected.
(This is stripped down code, the original implementation I want to use is with a map and a circular pointer instead of the [][]byte buf used in the example code. Also I removed all the locking.)
package main
import (
	"fmt"
	"io"
	"io/ioutil"
	"log"
)
type ProxyIO struct {
	out io.Writer // the io we are proxying
	buf [][]byte
}
func newProxyIO(out io.Writer) *ProxyIO {
	return &ProxyIO{
		out: out,
		buf: [][]byte{},
	}
}
func (r *ProxyIO) Write(s []byte) (int, error) {
	r.out.Write(s)
	r.buf = append(r.buf, s)
	return len(s), nil
}
func main() {
	p := newProxyIO(ioutil.Discard)
	p.Write([]byte("test1\n"))
	p.Write([]byte("test2\n"))
	p.Write([]byte("test3\n"))
	l := log.New(p, "", 0)
	l.Print("test4")
	l.Print("test5")
	l.Print("test6")
	for i, e := range p.buf {
		fmt.Printf("%d: %s", i, e)
	}
}
(Here is the code on the playground https://play.golang.org/p/UoOq4Nd-rmI)
I would expect the following output from this code:
0: test1
1: test2
2: test3
3: test4
4: test5
5: test6
However, it will always print this:
0: test1
1: test2
2: test3
3: test6
4: test6
5: test6
The behaviour with my map implementation is the same. I also tried using a doubly linked list from container/list as storage, it's always the same. So I must be missing something substantial here.
Why am I seeing the last log output three times in the buffer instead of the last three lines of log output?
If you look at the source code for Logger.Print, you'll see that it calls l.Output. There you'll notice that it formats the message into l.buf, a byte slice that is reused for every call, and then calls Write with it.
If you read this answer, you'll see that even though everything is passed by value,
when you pass a slice to a function, a copy will be made from this
header, including the pointer, which will point to the same backing
array.
So when you do:
l.Print("test4")
l.Print("test5")
l.Print("test6")
The Logger is effectively reusing the same slice, and you're appending a reference to that same slice three times. So, naturally, when you print the buffer you see the most recently written value three times.
To fix this, you can copy the []byte before storing it, like this:
func (r *ProxyIO) Write(s []byte) (int, error) {
	c := make([]byte, len(s))
	copy(c, s)
	r.out.Write(c)
	r.buf = append(r.buf, c)
	return len(c), nil
}
Updated playground: https://play.golang.org/p/DIWC1Xa6w0R
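The aliasing can be reproduced without log.Logger. In this sketch, a hypothetical recorder type stands in for ProxyIO and stores every slice it receives. The function writeReusing mimics a logger that reuses one buffer for every message. Without a copy, every stored slice shares the same backing array, so all of them show the last message.

```go
package main

import "fmt"

// recorder keeps every slice passed to Write, like ProxyIO in the question.
type recorder struct {
	kept       [][]byte
	keepCopies bool // when true, store a private copy of each slice
}

func (r *recorder) Write(s []byte) (int, error) {
	if r.keepCopies {
		c := make([]byte, len(s))
		copy(c, s)
		s = c
	}
	r.kept = append(r.kept, s)
	return len(s), nil
}

// writeReusing mimics log.Logger: one buffer is reused for every message.
func writeReusing(r *recorder, msgs ...string) {
	buf := make([]byte, 0, 64)
	for _, m := range msgs {
		buf = append(buf[:0], m...) // overwrites the same backing array
		r.Write(buf)
	}
}

func main() {
	aliased := &recorder{}
	writeReusing(aliased, "test4", "test5", "test6")
	copied := &recorder{keepCopies: true}
	writeReusing(copied, "test4", "test5", "test6")
	for i := range aliased.kept {
		fmt.Printf("%d: %s | %s\n", i, aliased.kept[i], copied.kept[i])
	}
	// 0: test6 | test4
	// 1: test6 | test5
	// 2: test6 | test6
}
```

Since Go 1.20, bytes.Clone(s) is a shorter way to make the same copy.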

Using reflect to update value by reference when argument is not a pointer in go

I've had difficulty learning the basics of reflect, pointers and interfaces in Go, so here's another entry-level question I can't seem to figure out.
This code does what I want it to do - I'm using reflect to add another record to a slice that's typed as an interface.
package main
import (
	"log"
	"reflect"
)
type Person struct {
	Name string
}
func Add(slice interface{}) {
	s := reflect.ValueOf(slice).Elem()
	// in my actual code, p is declared via the use of reflect.New([Type])
	p := Person{Name: "Sam"}
	s.Set(reflect.Append(s, reflect.ValueOf(p)))
}
func main() {
	p := []Person{}
	Add(&p)
	log.Println(p)
}
If I change the Add and main functions to this, things don't work the way I want them to:
func Add(slice interface{}) {
	s := reflect.ValueOf(&slice).Elem()
	p := Person{Name: "Sam"}
	s.Set(reflect.Append(reflect.ValueOf(slice), reflect.ValueOf(p)))
	log.Println(s)
}
func main() {
	p := []Person{}
	Add(p)
	log.Println(p)
}
That is, the log.Println(p) at the end doesn't show a slice with the record Sam in it like the way I had hoped. So my question is whether it's possible for me to have Add() receive a slice that is not a pointer, and for me to still write some code in Add() that will produce the outcome shown in my first scenario?
A lot of my recent questions dance around this kind of subject, so it's still taking me a while to figure out how to use the reflect package effectively.
No, it's not possible to append to a slice in a function without passing in a pointer to the slice. This isn't related to reflection, but to how variables are passed in to functions. Here's the same code, modified to not use reflection:
package main
import (
	"log"
)
type Person struct {
	Name string
}
func AddWithPtr(slicep interface{}) {
	sp := slicep.(*[]Person)
	// This modifies p1 itself, since *sp IS p1
	*sp = append(*sp, Person{"Sam"})
}
func Add(slice interface{}) {
	// s is now a copy of p2
	s := slice.([]Person)
	sp := &s
	// This modifies a copy of p2 (i.e. s), not p2 itself
	*sp = append(*sp, Person{"Sam"})
}
func main() {
	p1 := []Person{}
	// This passes a reference to p1
	AddWithPtr(&p1)
	log.Println("Add with pointer: ", p1)
	p2 := []Person{}
	// This passes a copy of p2
	Add(p2)
	log.Println("Add without pointer:", p2)
}
(Above, when it says 'copy' of the slice, it doesn't mean the copy of the underlying data - just the slice)
When you pass in a slice, the function effectively gets a new slice that refers to the same data as the original. Appending to the slice in the function increases the length of the new slice, but doesn't change the length of the original slice that was passed in. That's why the original slice remains unchanged.
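If the goal is to avoid passing a pointer, one alternative (not from the answer above) is to mirror the built-in append: return the new slice and let the caller assign it. Below is a minimal reflect-based sketch; AddReturn is a hypothetical name. It also shows the point above: the caller's original slice stays unchanged.

```go
package main

import (
	"fmt"
	"reflect"
)

type Person struct {
	Name string
}

// AddReturn appends a Person to a []Person passed as interface{} and
// returns the resulting slice, like the built-in append does.
// The caller must use the return value; the argument is not modified.
func AddReturn(slice interface{}) interface{} {
	s := reflect.ValueOf(slice)
	p := Person{Name: "Sam"}
	return reflect.Append(s, reflect.ValueOf(p)).Interface()
}

func main() {
	p := []Person{}
	q := AddReturn(p).([]Person)
	fmt.Println(len(p), len(q), q[0].Name) // 0 1 Sam
}
```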

How to atomic store & load an interface in golang?

I want to write some code like this:
var myValue interface{}
func GetMyValue() interface{} {
	return atomic.Load(myValue)
}
func StoreMyValue(newValue interface{}) {
	atomic.Store(myValue, newValue)
}
It seems that I can use LoadUintptr(addr *uintptr) (val uintptr) and StoreUintptr(addr *uintptr, val uintptr) in the atomic package to achieve this, but I do not know how to convert between uintptr, unsafe.Pointer and interface{}.
If I do it like this:
var V interface{}
func F(v interface{}) {
	p := unsafe.Pointer(&V)
	atomic.StorePointer(&p, unsafe.Pointer(&v))
}
func main() {
	V = 1
	F(2)
	fmt.Println(V)
}
V is always 1.
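If I'm not mistaken, you want atomic.Value. You can store and fetch values atomically with it. The signatures use interface{}, but every Store must use the same concrete type; storing a value of a different type panics. Under the hood it does some unsafe pointer stuff, like what you wanted to do.
Sample from the docs (loadConfig and requests are placeholder functions defined elsewhere in that example):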
var config atomic.Value // holds current server configuration
// Create initial config value and store into config.
config.Store(loadConfig())
go func() {
	// Reload config every 10 seconds
	// and update config value with the new version.
	for {
		time.Sleep(10 * time.Second)
		config.Store(loadConfig())
	}
}()
// Create worker goroutines that handle incoming requests
// using the latest config value.
for i := 0; i < 10; i++ {
	go func() {
		for r := range requests() {
			c := config.Load()
			// Handle request r using config c.
			_, _ = r, c
		}
	}()
}
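Here is a self-contained, runnable variant of the same idea. A hypothetical Config type replaces the docs' placeholder functions. The typed wrapper functions keep interface{} out of the call sites, and the type assertion in GetConfig relies on every Store using the same concrete type.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Config is a hypothetical configuration type used only for illustration.
type Config struct {
	Version int
}

var current atomic.Value // always holds a Config

// GetConfig loads the current value; the assertion is safe because
// StoreConfig only ever stores Config values.
func GetConfig() Config {
	return current.Load().(Config)
}

// StoreConfig replaces the current value atomically.
func StoreConfig(c Config) {
	current.Store(c)
}

func main() {
	StoreConfig(Config{Version: 1})

	var wg sync.WaitGroup
	for i := 2; i <= 5; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			StoreConfig(Config{Version: v})
		}(i)
	}
	wg.Wait()

	// Whichever Store ran last wins; no data race either way.
	fmt.Println(GetConfig().Version >= 2) // true
}
```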
Here's a way to use atomic.StorePointer and atomic.LoadPointer (based on your example):
package main
import (
	"fmt"
	"sync/atomic"
	"unsafe"
)
var addr unsafe.Pointer
func GetMyValue() *interface{} {
	return (*interface{})(atomic.LoadPointer(&addr))
}
func StoreMyValue(newValue *interface{}) {
	atomic.StorePointer(&addr, unsafe.Pointer(newValue))
}
func main() {
	var i interface{}
	i = 1
	StoreMyValue(&i)
	fmt.Println("before:", *GetMyValue())
	i = 2
	StoreMyValue(&i)
	fmt.Println("after", *GetMyValue())
}
Note that this will not make your object thread-safe. Only the pointer is stored/loaded atomically. Also, I would avoid using interface{} and prefer concrete types whenever possible.
As an alternative to using 'any' (interface{}), Go 1.19 (Q3 2022) comes with new types in the sync/atomic package that make it easier to use atomic values, such as atomic.Int64 and atomic.Pointer[T].
That would be easier than using atomic.StorePointer.
This comes from issue 50860 "sync/atomic: add typed atomic values" and CL 381317.
Pointer[T] also avoids conversions using unsafe.Pointer at call sites.
You cannot do this.
You will have to protect the store/load with a mutex.
The internal representation of an interface is not specified by the language and might be (and in practice is) too large to be handled by package atomic.
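Below is a minimal sketch of the mutex approach, reusing the GetMyValue/StoreMyValue names from the question. sync.RWMutex is one reasonable choice because it lets concurrent readers proceed; a plain sync.Mutex works too. Unlike atomic.Value, this version also accepts values of different concrete types.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	mu      sync.RWMutex // guards myValue
	myValue interface{}
)

// GetMyValue reads myValue under the read lock.
func GetMyValue() interface{} {
	mu.RLock()
	defer mu.RUnlock()
	return myValue
}

// StoreMyValue replaces myValue under the write lock.
func StoreMyValue(newValue interface{}) {
	mu.Lock()
	defer mu.Unlock()
	myValue = newValue
}

func main() {
	StoreMyValue(1)
	StoreMyValue("can even change type")
	fmt.Println(GetMyValue()) // can even change type
}
```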

Golang interface benefits

I've read a lot about interfaces and I think I understand how they work. I read about the interface{} type and use it as a function argument type. That much is clear. My question (and what I don't understand) is: what do I gain by using it? It's possible I haven't understood it entirely, but for example I have this:
package main
import (
	"fmt"
)
func PrintAll(vals []interface{}) {
	for _, val := range vals {
		fmt.Println(val)
	}
}
func main() {
	names := []string{"stanley", "david", "oscar"}
	vals := make([]interface{}, len(names))
	for i, v := range names {
		vals[i] = v
	}
	PrintAll(vals)
}
Why is it better than this:
package main
import (
	"fmt"
)
func PrintAll(vals []string) {
	for _, val := range vals {
		fmt.Println(val)
	}
}
func main() {
	names := []string{"stanley", "david", "oscar"}
	PrintAll(names)
}
If you always want to print string values, then the first version using []interface{} is not better at all. It's worse, as you lose some compile-time checking: it won't warn you if you pass a slice that contains values other than strings.
If you want to print values other than strings, then the second with []string wouldn't even compile.
For example the first also handles this:
PrintAll([]interface{}{"one", 2, 3.3})
While the 2nd would give you a compile-time error:
cannot use []interface {} literal (type []interface {}) as type []string in argument to PrintAll
The 2nd gives you a compile-time guarantee that only a slice of type []string is passed; attempting to pass anything else results in a compile-time error.
Also see related question: Why are interfaces needed in Golang?
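Since Go 1.18, generics offer a third option that keeps both properties. A single function accepts slices of any element type, each call is still type-checked, and the []interface{} conversion loop is not needed. This sketch goes beyond the original answer. FormatAll is a hypothetical name, and it returns strings rather than printing so the result is easy to check.

```go
package main

import "fmt"

// FormatAll accepts a slice of any element type T. Within one call every
// element has the same type, so mixed slices are rejected at compile time.
func FormatAll[T any](vals []T) []string {
	out := make([]string, 0, len(vals))
	for _, v := range vals {
		out = append(out, fmt.Sprint(v))
	}
	return out
}

func main() {
	names := []string{"stanley", "david", "oscar"}
	fmt.Println(FormatAll(names))          // [stanley david oscar]
	fmt.Println(FormatAll([]int{1, 2, 3})) // [1 2 3]
}
```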
