Safety of using reflect.StringHeader in Go? - go

I have a small function which passes the pointer of Go string data to C (Lua library):
func (L *C.lua_State) pushLString(s string) {
gostr := (*reflect.StringHeader)(unsafe.Pointer(&s))
C.lua_pushlstring(L, (*C.char)(unsafe.Pointer(gostr.Data)), C.ulong(gostr.Len))
// lua_pushlstring copies the given string, not keeping the original pointer.
}
It works in simple tests, but from the documentations it's unclear whether this is safe at all.
According to Go document, the memory of reflect.StringHeader should be pinned for gostr, but the Stringheader.Data is already a uintptr, "an integer value with no pointer semantics" - which is itself odd because if it has no pointer semantics, wouldn't the field be completely useless as the memory may be moved right after the value is read? Or is the field treated specially like reflect.Value.Pointer? Or perhaps there is a different way of getting C pointer from string?

it's unclear whether this is safe at all.
Tapir Liui (https://twitter.com/TapirLiu/) dans Go101 (https://github.com/go101/go101) gives a clue as to the "safety" of reflect.StringHeader in this tweet:
Since Go 1.20, the reflect.StringHeader and reflect.SliceHeader types will be depreciated and not recommended to be used.
Accordingly, two functions, unsafe.StringData and unsafe.SliceData, will be introduced in Go 1.20 to take over the use cases of two old reflect types.
That was initially discussed in CL 401434, then in issue 53003.
The reason for deprecation is that reflect.SliceHeader and reflect.StringHeader are commonly misused.
As well, the types have always been documented as unstable and not to be relied upon.
We can see in Github code search that usage of these types is ubiquitous.
The most common use cases I've seen are:
converting []byte to string:
Equivalent to *(*string)(unsafe.Pointer(&mySlice)), which is never actually officially documented anywhere as something that can be relied upon.
Under the hood, the shape of a string is less than a slice, so this seems valid per unsafe rule.
converting string to []byte:
commonly seen as *(*[]byte)(unsafe.Pointer(&string)), which is by-default broken because the Cap field can be past the end of a page boundary (example here, in widely used code) -- this violates unsafe rule.
grabbing the Data pointer field for ffi or some other niche use converting a slice of one type to a slice of another type
Ian Lance Taylor adds:
One of the main use cases of unsafe.Slice is to create a slice whose backing array is a memory buffer returned from C code or from a call such as syscall.MMap.
I agree that it can be used to (unsafely) convert from a slice of one type to a slice of a different type.

Related

How can I clone a strings.Builder in Go?

The Go programming language's standard library exposes a struct called strings.Builder which allows for easy building of strings through repeated concatenation in an efficient way, similar to C# or Java's StringBuilder.
In Java I would use StringBuilder's constructor to "clone" the object, like this:
StringBuilder newBuffer = new StringBuilder(oldBuffer.toString());
in Go, I can only see the following two-line way:
newBuffer := strings.Builder{}
newBuffer.WriteString(oldBuffer.String())
and no other .Clone() an initializer method (which I might have just not found yet).
Is there another way that would be more brief/concise than the one I have presented?
Going into unnecessary detail for curiosity's sake...
After considering the documentation, here are your main issues:
The only exported way to read data out of the Builder is the Builder.String method.
It is not safe to copy a Builder value once you have manipulated it.
Let's look at this version:
newBuffer := strings.Builder{}
newBuffer.WriteString(oldBuffer.String())
My first thought about why this isn't desirable is because the Builder internally uses a byte slice (mutable data type), and returns a string (immutable data type). Even though a string's underlying representation is the same as a byte slice, due to this mutability rule it would require a copy to convert to string. This means that by the time you write the string to the new buffer, you're already on your second copy when your task intuitively only requires a single copy.
Actually taking a look at the source code, however, we'll see that this assumption is wrong:
func (b *Builder) String() string {
return *(*string)(unsafe.Pointer(&b.buf))
}
Using the unsafe package, the strings package basically "hacks" the buffer ([]byte) directly into a string. Again, these data types are the same on a memory level: A pointer to the start of the string or slice, and a pointer offset describing how many bytes long the string or slice is. These data types are just headers, so no copying of the buffer has occurred here.
This creates the uncomfortable situation where you have a string which is supposed to be immutable, but you still have a byte slice somewhere that could mutate those underlying bytes. The package is called unsafe after all, and this is a good example of why that is.
Because the strings.Builder is purely a "builder", i.e. it can only create new parts of the string and never modify data that's already written, we still get the immutability of our string that the language "guarantees". The only way we can break that rule is by gaining access to the internal buf of the Builder, but as that field is un-exported, you would again need to employ unsafe yourself to access it.
Summary:
The straightforward method you came up with, while perhaps a line (or two) longer than one might hope for, it is the definitive and correct way to do it. It's already as efficient as you're going to get, even if you bring out the more gritty features of Go like unsafe and reflect.
I hope that this has been informative. Here are the only suggested changes to your code:
// clone the builder contents. this is fast.
newBuffer := strings.Builder{}
newBuffer.WriteString(oldBuffer.String())

SIGSEGV when writing to, but not reading from a memory location in golang

I was under the impression that using the unsafe package allows you to read/write arbitrary data. I'm trying to change the value the interface{} points to without changing the pointer itself.
Assuming that interface{} is implemented as
type _interface struct {
type_info *typ
value unsafe.Pointer
}
setting fails with a SIGSEGV, although reading is successful.
func data(i interface{}) unsafe.Pointer {
return unsafe.Pointer((*((*[2]uintptr)(unsafe.Pointer(&i))))[1])
}
func main() {
var i interface{}
i = 2
fmt.Printf("%v, %v\n", (*int)(data(i)), *(*int)(data(i)))
*((*int)(data(i))) = 3
}
Am I doing something wrong, or is this not possible in golang?
Hm... Here's how I understand your second code example currently, in case I've made an error (if you notice anything amiss in what I'm describing, my answer is probably irredeemably wrong and you should ignore the rest of what I have to say).
Allocate memory for interface i in main.
Set the value of i to an integer type with the value 2.
Allocate memory for interface i in data.
Copy the value of main's i to data's i; that is, set the value of the new interface to an integer type with the value 2.
Cast the address of the new variable into a pointer to length-2 array of uintptr (with unsafe.Pointer serving as the intermediary that forces the compiler to accept this cast).
Cast the second element of the array (whose value is the address of the value-part of i in data) back into an unsafe.Pointer and return it.
I've made an attempt at doing the same thing in more steps, but unfortunately I encountered all the same problems: the program recognizes that I have a non-nil pointer and it's able to dereference the pointer for reading, but using the same pointer for writing produces a runtime error.
It's step 6 that go vet complains about, and I think it's because, according to the package docs,
A uintptr is an integer, not a reference. Converting a Pointer to a uintptr creates an integer value with no pointer
semantics. Even if a uintptr holds the address of some object, the garbage collector will not update that uintptr's value if the object moves, nor will that uintptr keep the object from being reclaimed.
More to the point, from what I can tell (though I'll admit I'm having trouble digging up explicit confirmation without scanning the compiler and runtime source), the runtime doesn't appear to track the value-part of an interface{} type as a discrete pointer with its own reference count; you can, of course, trample over both the interface{}'s words by writing another interface value into the whole thing, but that doesn't appear to be what you wanted to do at all (write to the memory address of a pointer that is inside an interface type, all without moving the pointer).
What's interesting is that we seem to be able to approximate this behavior by just defining our own structured type that isn't given special treatment by the compiler (interfaces are clearly somewhat special, with type-assertion syntax and all). That is, we can use unsafe.Pointer to maintain a reference that points to a particular point in memory, and no matter what we cast it to, the memory address never moves even if the value changes (and the value can be reinterpreted by casting it to something else). The part that surprises me a bit is that, at least in my own example, and at least within the Playground environment, the value that is pointed to does not appear to have a fixed size; we can establish an address to write to once, and repeated writes to that address succeed even with huge (or tiny) amounts of data.
Of course, with at least this implementation, we lose a bunch of the other nice-to-have things we associate with interface types, especially non-empty interface types (i.e. with methods). So, there's no way to use this to (for example) make a super-sneaky "generic" type. It seems that an interface is its own value, and part of that value's definition is an address in memory, but it's not entirely the same thing as a pointer.

Copying reference to pointer or by value

I think I understand the answer from here but just in case, I want to explicitly ask about the following (my apologies if you think it is the same question, but to me, it feels different on the concerns):
func f() *int {
d := 6
pD := new(int)
pD = &d // option 1
*pD = d // option 2
return pD
}
The first option where I just copy the reference as a pointer is performance-wise, more optimal (this is educational guess, but it seems obvious). I would prefer this method/pattern.
The second option would (shallow) copy (?) instead. What I presume is that this method, because it copies, I have no concerns about GC sweeping the instance of 'd'. I often use this method due to my insecurity (or ignorance as a beginner).
What I am concerned about (or more so, insecure about) is that in the first method (where address of 'd' is transfered), will GC recognize that it (the 'd' variable) is referenced by a pointer container, thus it will not be swept? Thus it will be safe to use this method instead? I.e. can I safely pass around pointer 'pD' returned from func 'f()' for the lifetime of the application?
Reference: https://play.golang.org/p/JWNf5yRd_B
There is no better place to look than the official documentation:
func NewFile(fd int, name string) *File {
if fd < 0 {
return nil
}
f := File{fd, name, nil, 0}
return &f
}
Note that, unlike in C, it's perfectly OK to return the address
of a local variable; the storage associated with the variable survives
after the function returns. In fact, taking the address of a composite
literal allocates a fresh instance each time it is evaluated, so we
can combine these last two lines.
(source: "Effective Go")
So the first option (returning a pointer to a local variable) is absolutely safe and even encouraged. By performing escape analysis the compiler can tell that a variable escapes its local scope and allocates it on the heap instead.
In short: No.
First: There are no "references" in Go. Forget about this idea now, otherwise you'll hurt yourself. Really. Thinking about "by reference" is plain wrong.
Second: Performance is totally the same. Forget about this type of nano optimisations now. Especially when dealing with int. If and only if you have a performance problem: Measure, then optimize. It might be intuitively appealing to think "Handing around a tiny pointer of 8 bytes must be much faster than copying structs with 30 or even 100 bytes." It is not, at least it is not that simple.
Third: Just write it a func f() *int { d := 6; return &d; }. There is no need to do any fancy dances here.
Fourth: Option 2 makes a "deep copy" of the int. But this might be misleading as there are no "shallow copies" of an int so I'm unsure if I understand what you are asking here. Go has no notion of deep vs. shallow copy. If you copy a pointer value the pointer value is copied. You remember the first point? There are no references in Go. A pointer value is a value if copied you have a copy of the pointer value. Such a copy does absolutely nothing to the value pointed to, especially it doesn't do a copy. This would hint that copies in Go are not "deep". Forget about deep/shallow copy when talking about Go. (Of course you can implement functions which perform a "deep copy" of your custom objects)
Fifth: Go has a properly working garbage collector. It makes absolutely no difference what you do: While an object is live it won't be collected and once it can be collected it will be. You can pass, return, copy, hand over, take address, dereference pointers or whatever you like, it just does not matter. The GC works properly. (Unless you are deliberately looking for pain and errors by using package unsafe.)

Is there a builtin func named "int32"?

The below snippet works fine.
In this case, what "int32" is? A func?
I know there is a type named "int32"
This could be a stupid question. I've just finished A Tour of Go but I could not find the answer.(it's possible I'm missing something.)
package main
import "fmt"
func main() {
var number = int32(5)
fmt.Println(number) //5
}
It is a type conversion, which is required for numeric types.
Conversions are required when different numeric types are mixed in an expression or assignment. For instance, int32 and int are not the same type even though they may have the same size on a particular architecture.
Since you do a variable declaration, you need to specify the type of '5'.
Another option, as mentioned by rightfold in the comments is: var number int32 = 5
(as opposed to a short variable declaration like number := 5)
See also Go FAQ:
The convenience of automatic conversion between numeric types in C is outweighed by the confusion it causes.
When is an expression unsigned? How big is the value? Does it overflow? Is the result portable, independent of the machine on which it executes?
It also complicates the compiler; “the usual arithmetic conversions” are not easy to implement and inconsistent across architectures.
For reasons of portability, we decided to make things clear and straightforward at the cost of some explicit conversions in the code. The definition of constants in Go—arbitrary precision values free of signedness and size annotations—ameliorates matters considerably, though.
A related detail is that, unlike in C, int and int64 are distinct types even if int is a 64-bit type.
The int type is generic; if you care about how many bits an integer holds, Go encourages you to be explicit.

using new vs. { } when initializing a struct in Go

So i know in go you can initialize a struct two different ways in GO. One of them is using the new keyword which returns a pointer to the struct in memory. Or you can use the { } to make a struct. My question is when is appropriate to use each?
Thanks
I prefer {} when the full value of the type is known and new() when the value is going to be populated incrementally.
In the former case, adding a new parameter may involve adding a new field initializer. In the latter it should probably be added to whatever code is composing the value.
Note that the &T{} syntax is only allowed when T is a struct, array, slice or map type.
Going off of what #Volker said, it's generally preferable to use &A{} for pointers (and this doesn't necessarily have to be zero values: if I have a struct with a single integer in it, I could do &A{1} to initialize the field). Besides being a stylistic concern, the big reason that people normally prefer this syntax is that, unlike new, it doesn't always actually allocate memory in the heap. If the go compiler can be sure that the pointer will never be used outside of the function, it will simply allocate the struct as a local variable, which is much more efficient than calling new.
Most people use A{} to create a zero value of type A, &A{} to create a pointer to a zero value of type A. Using newis only necessary for int and that like as int{} is a no go.

Resources