Access the raw bytes of a string - go

I'm trying to call a C function that expects a C string (char*) from Go. I know about the C.CString function documented in the cgo documentation, but since the function I'm calling already makes a copy, I'm trying to avoid the one C.CString makes.
Right now I'm doing this, where s is a Go string:
var cs *C.char = (*C.char)( unsafe.Pointer(& []byte(s) [0]))
But I get the feeling that []byte(s) is making its own copy. Is it possible to just get the char*?

If you're doing this enough times that performance is a concern, it would really be advisable to keep the data in a slice to begin with.
If you really want to access the address of the string's data, you can use the unsafe package to convert the string into a struct matching the string header, for example the reflect.StringHeader type:
p := unsafe.Pointer((*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data)
Or use a slice as a proxy, since both types put the data pointer and the length in the same field positions:
p := unsafe.Pointer(&(*(*[]byte)(unsafe.Pointer(&s)))[0])
Or, because the data pointer is the first field, you could read a uintptr alone:
p := unsafe.Pointer(*(*uintptr)(unsafe.Pointer(&s)))
https://play.golang.org/p/ps1Py7Ax6QK
None of these ways are guaranteed to work in all cases, or in future versions of Go, and none of the options are going to guarantee a null terminated string.
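On Go 1.20 and later there is a documented alternative to the header tricks: unsafe.StringData returns a pointer to a string's underlying bytes without relying on the header layout. The sketch below (rawBytes is a hypothetical helper name) gets that pointer and builds a []byte view over it with unsafe.Slice to show that no copy is made. The bytes are still not NUL-terminated and must never be modified; at a cgo call site the pointer would be converted with (*C.char)(unsafe.Pointer(p)).

```go
package main

import (
	"fmt"
	"unsafe"
)

// rawBytes is a hypothetical helper: it returns a pointer to the first byte
// of s and a []byte view that aliases the same memory (no copy is made).
// The view must be treated as read-only, because Go strings are immutable.
func rawBytes(s string) (*byte, []byte) {
	p := unsafe.StringData(s) // unspecified (possibly nil) for an empty string
	return p, unsafe.Slice(p, len(s))
}

func main() {
	s := "hello, cgo"
	p, view := rawBytes(s)
	// The view starts at the same address as the string data.
	fmt.Println(p == &view[0], string(view)) // prints: true hello, cgo
}
```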
The best supported option is to write a shim in the cgo preamble that accepts the Go string and converts it to a char *. cgo provides the following function for this:
const char *_GoStringPtr(_GoString_ s);
See the "Go references to C" section of the cgo documentation.

Related

Get the underlying type from a string in go? [duplicate]

Is there a way to use the reflection libraries in Go to go from the name of a type to its Type representation?
I've got a library where the user needs to provide Type representations for some code generation. I know it must be possible (in a sense) because they can just create a variable of that type and call the TypeOf function, but is there a way to circumvent this and get the representation from the name alone?
The question is not quite explicit; it can be interpreted in two ways. Under one interpretation the answer is no, it's not possible; under the other, the answer is yes, it's possible.
At runtime
If the type name is provided as a string value, then at runtime it's not possible as types that are not referred to explicitly may not get compiled into the final executable binary (and thus obviously become unreachable, "unknown" at runtime). For details see Splitting client/server code. For possible workarounds see Call all functions with special prefix or suffix in Golang.
At "coding" time
If we're talking about "coding" time (writing or generating source code), then it's possible to obtain the reflect.Type without creating/allocating a variable of the given type and passing it to reflect.TypeOf().
Start from a pointer to the type: a typed nil pointer value requires no allocation, and from its reflect.Type descriptor you can navigate to the descriptor of the pointer's base (element) type using Type.Elem().
This is what it looks like:
t := reflect.TypeOf((*YourType)(nil)).Elem()
The type descriptor t above will be identical to t2 below:
var x YourType
t2 := reflect.TypeOf(x)
fmt.Println(t, t2)
fmt.Println(t == t2)
Output of the above application (try it on the Go Playground):
main.YourType main.YourType
true

How can I clone a strings.Builder in Go?

The Go programming language's standard library exposes a struct called strings.Builder which allows for easy building of strings through repeated concatenation in an efficient way, similar to C# or Java's StringBuilder.
In Java I would use StringBuilder's constructor to "clone" the object, like this:
StringBuilder newBuffer = new StringBuilder(oldBuffer.toString());
In Go, I can only see the following two-line way:
newBuffer := strings.Builder{}
newBuffer.WriteString(oldBuffer.String())
and no .Clone() or initializer method (which I might just not have found yet).
Is there another way that would be more brief/concise than the one I have presented?
Going into unnecessary detail for curiosity's sake...
After considering the documentation, here are your main issues:
The only exported way to read data out of the Builder is the Builder.String method.
It is not safe to copy a Builder value once you have manipulated it.
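The second point is enforced at runtime: a Builder remembers its own address on first use, and writing to a by-value copy of a non-empty Builder panics. A small sketch (writeToCopy is a hypothetical name) that recovers the panic to show it:

```go
package main

import (
	"fmt"
	"strings"
)

// writeToCopy copies a non-empty Builder by value, writes to the copy,
// and reports whether strings.Builder's copy check panicked.
func writeToCopy() (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	var old strings.Builder
	old.WriteString("hello")
	// Copying a Builder that has been written to also copies its
	// internal self-pointer, which no longer matches the copy's address.
	cp := old
	// This write panics with "strings: illegal use of non-zero Builder
	// copied by value", which the deferred recover turns into true.
	cp.WriteString(" world")
	return false
}

func main() {
	fmt.Println(writeToCopy()) // prints: true
}
```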
Let's look at this version:
newBuffer := strings.Builder{}
newBuffer.WriteString(oldBuffer.String())
My first thought about why this isn't desirable was that the Builder internally uses a byte slice (a mutable data type) and returns a string (an immutable data type). Even though a string's underlying representation is very similar to a byte slice's, this mutability rule would seem to require a copy when converting to string. That would mean that by the time you write the string to the new buffer, you're already on your second copy, when intuitively the task only requires one.
Actually taking a look at the source code, however, we'll see that this assumption is wrong:
func (b *Builder) String() string {
	return *(*string)(unsafe.Pointer(&b.buf))
}
Using the unsafe package, the strings package basically "hacks" the buffer ([]byte) directly into a string. At the memory level the two headers line up: both start with a pointer to the first byte followed by a length giving how many bytes long the string or slice is (a slice additionally carries a capacity after those two fields). These are just headers, so no copying of the buffer happens here.
This creates the uncomfortable situation where you have a string which is supposed to be immutable, but you still have a byte slice somewhere that could mutate those underlying bytes. The package is called unsafe after all, and this is a good example of why that is.
Because the strings.Builder is purely a "builder", i.e. it can only append new parts to the string and never modifies data that's already written, we still get the immutability of our string that the language "guarantees". The only way to break that rule is to gain access to the internal buf of the Builder, but as that field is unexported, you would again need to employ unsafe yourself to access it.
Summary:
The straightforward method you came up with may be a line (or two) longer than one might hope for, but it is the definitive and correct way to do it. It's already as efficient as you're going to get, even if you bring out grittier features of Go like unsafe and reflect.
I hope that this has been informative. Here are the only suggested changes to your code:
// clone the builder contents. this is fast.
newBuffer := strings.Builder{}
newBuffer.WriteString(oldBuffer.String())

Safety of using reflect.StringHeader in Go?

I have a small function which passes the pointer of Go string data to C (Lua library):
func (L *C.lua_State) pushLString(s string) {
	gostr := (*reflect.StringHeader)(unsafe.Pointer(&s))
	C.lua_pushlstring(L, (*C.char)(unsafe.Pointer(gostr.Data)), C.ulong(gostr.Len))
	// lua_pushlstring copies the given string, not keeping the original pointer.
}
It works in simple tests, but from the documentations it's unclear whether this is safe at all.
According to the Go documentation, the memory referenced through reflect.StringHeader should stay pinned via gostr, but StringHeader.Data is already a uintptr, "an integer value with no pointer semantics". That seems odd: if it has no pointer semantics, wouldn't the field be completely useless, since the memory may be moved right after the value is read? Or is the field treated specially, like reflect.Value.Pointer? Or is there a different way of getting a C pointer from a string?
it's unclear whether this is safe at all.
Tapir Liu (https://twitter.com/TapirLiu/) of Go101 (https://github.com/go101/go101) gives a clue about the "safety" of reflect.StringHeader in this tweet:
Since Go 1.20, the reflect.StringHeader and reflect.SliceHeader types will be deprecated and are not recommended for use.
Accordingly, two functions, unsafe.StringData and unsafe.SliceData, will be introduced in Go 1.20 to take over the use cases of the two old reflect types.
That was initially discussed in CL 401434, then in issue 53003.
The reason for deprecation is that reflect.SliceHeader and reflect.StringHeader are commonly misused.
As well, the types have always been documented as unstable and not to be relied upon.
We can see in GitHub code search that usage of these types is ubiquitous.
The most common use cases I've seen are:
converting []byte to string:
Equivalent to *(*string)(unsafe.Pointer(&mySlice)), which is never actually officially documented anywhere as something that can be relied upon.
Under the hood, a string header is a prefix of a slice header (data pointer and length, without the capacity), so this seems valid per the unsafe rules.
converting string to []byte:
commonly seen as *(*[]byte)(unsafe.Pointer(&string)), which is broken by default because the Cap field can lie past the end of a page boundary (example here, in widely used code); this violates the unsafe rules.
grabbing the Data pointer field for FFI or some other niche use
converting a slice of one type to a slice of another type
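A sketch of how the Go 1.20+ unsafe functions cover the first three use cases (bytesToString, stringToBytes and stringPtrLen are hypothetical helper names). Unlike StringHeader.Data, which is a uintptr, unsafe.StringData returns a *byte, so the garbage collector still treats it as a pointer; that is the piece the pushLString function needs, converted with (*C.char)(unsafe.Pointer(p)) at the call site. The string-to-[]byte view aliases immutable memory and must never be written to.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString views b as a string without copying.
// b must not be modified while the string is in use.
func bytesToString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// stringToBytes views s as a []byte without copying.
// The result must be treated as read-only: strings are immutable.
func stringToBytes(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// stringPtrLen returns the data pointer and length of s, e.g. for a C call
// that takes (const char *, size_t) and copies the bytes itself.
func stringPtrLen(s string) (*byte, int) {
	return unsafe.StringData(s), len(s)
}

func main() {
	b := []byte("gopher")
	s := bytesToString(b)
	fmt.Println(s, len(stringToBytes(s))) // prints: gopher 6

	p, n := stringPtrLen(s)
	fmt.Println(p == &b[0], n) // prints: true 6
}
```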
Ian Lance Taylor adds:
One of the main use cases of unsafe.Slice is to create a slice whose backing array is a memory buffer returned from C code or from a call such as syscall.MMap.
I agree that it can be used to (unsafely) convert from a slice of one type to a slice of a different type.
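As an illustration of that slice-to-slice conversion, a sketch that views a []uint32 as its raw bytes with unsafe.Slice (Go 1.17+; asBytes is a hypothetical name). The byte order of the view depends on the machine's endianness, so the example only writes values that read the same either way.

```go
package main

import (
	"fmt"
	"unsafe"
)

// asBytes reinterprets the backing array of u as a []byte of length 4*len(u).
// The two slices alias the same memory; no copy is made.
func asBytes(u []uint32) []byte {
	if len(u) == 0 {
		return nil
	}
	return unsafe.Slice((*byte)(unsafe.Pointer(&u[0])), len(u)*4)
}

func main() {
	words := []uint32{0, 0}
	raw := asBytes(words)
	for i := range raw[:4] {
		raw[i] = 0xFF // write through the byte view into words[0]
	}
	fmt.Println(len(raw), words[0] == 0xFFFFFFFF, words[1]) // prints: 8 true 0
}
```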

What is the best (safest + most performant) way of getting a specific slice of bytes from an unsafe.Pointer

I'm trying to convert this C++ to Go.
This is, in short, what the C code is doing:
static const char *pSharedMem = NULL;
int sessionInfoOffset;
return pSharedMem + pHeader->sessionInfoOffset;
This is my (pseudo) go code:
var pSharedMem unsafe.Pointer
sessionInfoLen C.int
byteSlice := C.GoBytes(pSharedMem, pHeader.sessionInfoLen)
return byteSlice[pHeader.sessionInfoOffset:]
I've never really written any C code and I have no idea if this is a good way of retrieving a byte slice from an unsafe.Pointer. Could anything go wrong with this (copying the wrong piece of memory or something), and is this performant, or am I doing something really stupid?
GoBytes is going to be the safest method. You still have to ensure that the array exists for the entire copy operation, but once the bytes are copied, you know they can't be changed from cgo. As long as the memory being copied isn't too large and the copy doesn't present a performance problem, this is the way to go.
If you want direct access to the bytes via a slice, you can slice it as an array like so:
size := pHeader.sessionInfoLen
byteSlice := (*[1<<30]byte)(pSharedMem)[:size:size]
This saves you the copy, but you then have to manage concurrent access from cgo, and ensure the memory isn't released while you're accessing it from Go.

Using empty struct properly with CGO

Working with gssapi.h
struct gss_name_struct;
typedef struct gss_name_struct * gss_name_t;
I am trying to figure out how to properly initialize a variable of this type with
var output_name C.gss_name_t = &C.struct_gss_name_struct{}
But functions like gss_import_name behave as if I were passing them a null pointer. What is the correct way to initialize and use these empty structs with cgo?
Go's strict typing makes typedefs a pain to work with. The best way to keep your Go clear is to write a small wrapper function in C that builds the struct exactly how you want it. In this case, though, Go uses a zero-length byte array for an empty C struct, which you can verify below. You can declare it directly in Go and convert it when necessary.
Since C isn't strict with types, relying on type inference is often the easiest way to get the type Go expects. There's also a trick with the cgo tool to show the type declarations you need: go tool cgo -godefs filename.go outputs the cgo definitions for your types. As you can see, though, the Go-equivalent types can get a little messy.
// statement in the original .go file
//var output_name C.gss_name_t = &C.struct_gss_name_struct{}
// output from cgo -godefs
// var output_name *[0]byte = &[0]byte{}
// or more succinctly
output_name := &[0]byte{}
// output_name can be converted directly to a C.gss_name_t
fmt.Printf("%+v\n", output_name)
fmt.Printf("%+v\n", C.gss_name_t(output_name))
