Fastest way to allocate a large string in Go? - go

I need to create a string in Go that is 1048577 characters (1MB + 1 byte). The content of the string is totally unimportant. Is there a way to allocate this directly without concatenating or using buffers?
Also, it's worth noting that the value of the string will not change. It's for a unit test to verify that strings that are too long return an error.

Use strings.Builder to allocate a string without using extra buffers.
var b strings.Builder
b.Grow(1048577)
for i := 0; i < 1048577; i++ {
	b.WriteByte(0)
}
s := b.String()
The call to the Grow method allocates a slice with capacity 1048577. The WriteByte calls fill the slice to capacity. The String() method uses unsafe to convert that slice to a string.
The cost of the loop can be reduced by writing chunks of N bytes at a time and filling single bytes at the end.
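A sketch of that chunked variant, assuming an arbitrary 4096-byte chunk of zero bytes (the chunk size and the makeLongString name are illustrative choices, not from the answer). Grow still reserves the full capacity once, so the builder never reallocates:

```go
package main

import (
	"fmt"
	"strings"
)

// makeLongString returns a string of exactly n zero bytes. It reserves the
// full capacity once with Grow, writes whole chunks while a chunk still fits,
// and then fills the remainder one byte at a time.
// The 4096-byte chunk size is an arbitrary choice for this sketch.
func makeLongString(n int) string {
	chunk := make([]byte, 4096) // small scratch chunk of zero bytes
	var b strings.Builder
	b.Grow(n) // single allocation with capacity n
	for b.Len()+len(chunk) <= n {
		b.Write(chunk) // strings.Builder.Write always returns a nil error
	}
	for b.Len() < n {
		b.WriteByte(0)
	}
	return b.String() // returns the builder's buffer as a string, no copy
}

func main() {
	s := makeLongString(1048577)
	fmt.Println(len(s)) // 1048577
}
```

For 1048577 bytes this writes 256 full chunks and then a single trailing byte. The 4 KiB chunk is a small extra buffer, which is the trade-off for fewer loop iterations.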
If you are not opposed to using the unsafe package, then use this:
p := make([]byte, 1048577)
s := *(*string)(unsafe.Pointer(&p))
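Since Go 1.20, the unsafe package also provides unsafe.String, which performs the same zero-copy conversion without relying on the slice and string headers sharing a layout. A minimal sketch (bytesToString is a hypothetical helper name); as with the cast above, the byte slice must never be modified afterwards:

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString reinterprets p as a string without copying (Go 1.20+).
// The caller must not modify p afterwards, because Go assumes strings
// are immutable.
func bytesToString(p []byte) string {
	if len(p) == 0 {
		return ""
	}
	return unsafe.String(&p[0], len(p))
}

func main() {
	p := make([]byte, 1048577)
	s := bytesToString(p)
	fmt.Println(len(s)) // 1048577
}
```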
If you are asking about how to do this with the simplest code, then use the following:
s := string(make([]byte, 1048577))
This approach does not meet the requirements set forth in the question. It uses an extra buffer instead of allocating the string directly.

I ended up using this:
string(make([]byte, 1048577))
https://play.golang.org/p/afPukPc1Esr

Related

How do I do a FAST conversion of an int array to an array of bytes?

I have a process that needs to pack a large array of int16s to a protobuf every few milliseconds. Understanding the protobuf side of it isn't critical, since all I really need is a way to convert a bunch of int16s (160-16k of them) to []byte. It's a CPU-critical operation, so I don't want to do something like this:
for _, sample := range listOfIntegers {
	protobufObject.ByteStream = append(protobufObject.ByteStream, byte(sample>>8))
	protobufObject.ByteStream = append(protobufObject.ByteStream, byte(sample&0xff))
}
(If you're interested, this is the protobuf)
message ProtobufObject {
bytes byte_stream = 1;
... = 2;
etc.
}
There has to be a faster way to supply that list of ints as a block of memory to the protobuf. I've fiddled with the cgo library to get access to memcpy, but suspect I've been destroying an underlying go data structure because I get crashes in totally unrelated sections of code.
A faster version of the above code is:
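// Allocate the whole output once instead of growing it with append.
protobufObject.ByteStream = make([]byte, len(listOfIntegers)*2)
for i, n := range listOfIntegers {
	j := i * 2
	protobufObject.ByteStream[j+1] = byte(n)   // low byte second
	protobufObject.ByteStream[j] = byte(n >> 8) // high byte first (big-endian)
}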
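The same big-endian packing can be written with encoding/binary from the standard library. This is a sketch for readability; it is not a measured claim that it is faster than the hand-written loop, so benchmark both before choosing (int16sToBigEndian is a hypothetical helper name):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// int16sToBigEndian packs each int16 as two big-endian bytes,
// producing the same byte order as the loop above.
func int16sToBigEndian(samples []int16) []byte {
	out := make([]byte, len(samples)*2) // single allocation, sized up front
	for i, s := range samples {
		binary.BigEndian.PutUint16(out[i*2:], uint16(s))
	}
	return out
}

func main() {
	fmt.Println(int16sToBigEndian([]int16{0x0102, -1})) // [1 2 255 255]
}
```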
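You can avoid copying the data entirely, but the reinterpreted bytes are in the machine's native byte order, so the result matches the big-endian loop above only when running on a big-endian architecture.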
Use the unsafe package to copy the []int16 header to the []byte header. Use the unsafe package again to get a pointer to the []byte header and adjust the length and capacity for the conversion.
b := *(*[]byte)(unsafe.Pointer(&listOfIntegers))
hdr := (*reflect.SliceHeader)(unsafe.Pointer(&b))
hdr.Len *= 2
hdr.Cap *= 2
protobufObject.ByteStream = b
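On Go 1.17 and later, unsafe.Slice expresses the same zero-copy reinterpretation without editing a slice header by hand (reflect.SliceHeader is now deprecated in favor of unsafe.Slice). A minimal sketch with a hypothetical helper name; the byte-order caveat above still applies:

```go
package main

import (
	"fmt"
	"unsafe"
)

// int16sAsBytes returns a []byte that shares memory with samples; nothing
// is copied. The bytes are in the machine's native byte order, so they match
// the big-endian loop only on big-endian hardware. Requires Go 1.17+.
func int16sAsBytes(samples []int16) []byte {
	if len(samples) == 0 {
		return nil
	}
	return unsafe.Slice((*byte)(unsafe.Pointer(&samples[0])), len(samples)*2)
}

func main() {
	samples := []int16{0x0102, 0x0304}
	b := int16sAsBytes(samples)
	fmt.Println(len(b)) // 4

	samples[0] = 0          // writes through the shared memory...
	fmt.Println(b[0], b[1]) // ...so this prints "0 0" on any byte order
}
```

Because the result aliases the []int16, it must not outlive any reuse of that array, and changes to either slice show up in the other.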

encode object to bytes by golang unsafe?

func Encode(i interface{}) ([]byte, error) {
	buffer := bytes.NewBuffer(make([]byte, 0, 1024))
	// size := unsafe.Sizeof(i)
	size := reflect.TypeOf(i).Size()
	fmt.Println(size)
	ptr := unsafe.Pointer(&i)
	startAddr := uintptr(ptr)
	endAddr := startAddr + size
	for i := startAddr; i < endAddr; i++ {
		bytePtr := unsafe.Pointer(i)
		b := *(*byte)(bytePtr)
		buffer.WriteByte(b)
	}
	return buffer.Bytes(), nil
}
func TestEncode(t *testing.T) {
	test := Test{10, "hello world"}
	b, _ := Encode(test)
	ptr := unsafe.Pointer(&b)
	newTest := *(*Test)(ptr)
	fmt.Println(newTest.X)
}
I am learning how to use Go's unsafe package and wrote this function to encode any object. I ran into two problems. First, does unsafe.Sizeof(obj) always return the size of obj's pointer? Why is it different from reflect.TypeOf(obj).Size()? Second, I want to iterate over the underlying bytes of obj and convert them back to obj in the TestEncode function using unsafe.Pointer(), but the object's values are all corrupted. Why?
First, unsafe.Sizeof returns the number of bytes needed to store a value of the type itself. This is a little tricky: it does not include the bytes needed to store any data the value points to.
For example, a slice, as is well known, consists of three 4-byte words on a 32-bit machine: one uintptr for the memory address of the underlying array, and two int32s for len and cap. So no matter how long a slice is or what its element type is, a slice always takes 12 bytes on a 32-bit machine. Likewise, a string uses 8 bytes: one uintptr for the address and one int32 for the length.
As for the difference from reflect.TypeOf().Size(), it comes down to interfaces. reflect.TypeOf looks inside the interface, gets the concrete type, and reports the bytes needed for that concrete type, while unsafe.Sizeof just returns 8 for an interface type on a 32-bit machine: two uintptrs, a pointer to the data and a pointer to the type and method information.
The second part is quite clear now. First, in Encode, unsafe.Pointer(&i) takes the address of the interface value, not of the concrete value stored in it. Second, in TestEncode, unsafe.Pointer(&b) takes the address of the 12-byte slice "header", not of the underlying bytes. There might be other errors, but with these two in place, looking for them is pointless.
Note: I avoid discussing the order of the uintptr and int32 fields, not only because I don't know it, but also because it is undocumented, unsafe, and implementation-dependent.
Note 2: In conclusion, don't try to dump the memory of Go data.
Note 3: I describe everything in 32-bit terms because the playground used a 32-bit platform at the time, which made the numbers easier to check.
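As for what to do instead: even with both addresses fixed, the string field of Test holds a pointer and a length, so a raw memory dump contains an address rather than the text. Those bytes are meaningless outside the current process, and the garbage collector does not treat the contents of a []byte as pointers, so it will not keep the string data alive. A serialization package avoids all of this. Here is a sketch using encoding/gob from the standard library, which is a different technique from the question's memory dump; the Test definition is assumed:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Test mirrors the struct in the question; the field names are assumed.
// gob only encodes exported fields, so both start with a capital letter.
type Test struct {
	X int
	S string
}

// encode serializes v with encoding/gob instead of copying raw memory.
func encode(v Test) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode reverses encode.
func decode(data []byte) (Test, error) {
	var t Test
	err := gob.NewDecoder(bytes.NewReader(data)).Decode(&t)
	return t, err
}

func main() {
	data, err := encode(Test{10, "hello world"})
	if err != nil {
		panic(err)
	}
	t, err := decode(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(t.X, t.S) // 10 hello world
}
```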

How do you initialize an empty bytes.Buffer of size N in Go?

What is the easiest method to create an empty buffer of size n in Go using bytes.NewBuffer()?
Adding some additional info here. The quick way to create a new buffer is briefly mentioned at the end of the doc string:
b := new(bytes.Buffer)
or
b := &bytes.Buffer{}
At the time of writing, the Buffer struct definition included a 64-byte internal bootstrap array that was initially used for small allocations. Once that size was exceeded, a byte slice, Buffer.buf, was created and maintained internally. (Newer Go releases have removed the bootstrap array and allocate buf directly on first use.)
As @leafbebop suggested, we can pre-initialize the buf field of the Buffer struct with a new slice:
b := bytes.NewBuffer(make([]byte,0,N))
Another option I found is the Grow() method:
b := new(bytes.Buffer)
b.Grow(n)
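Also, it's interesting to point out that, at the time of writing, the internal buf slice grew to cap(buf)*2 + n when it ran out of room. This means that if a buffer holding 1 MB (1048576 bytes) was full and you added 1 byte, cap() would increase to 2097153 bytes. This growth policy is an implementation detail that may differ between Go versions.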

Technical things about conversion from []byte and string in Golang

Is it true that converting from string to []byte allocates new memory? Also, does converting from []byte to string allocate new memory?
s := "a very long string"
b := []byte(s) // does this double the memory requirement?
b := []byte{1,2,3,4,5, ...very long bytes..}
s := string(b) // does this double the memory requirement?
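Yes, in both cases. (The compiler can skip the copy in a few special cases where no mutation can be observed, such as a map lookup m[string(b)].)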
String types are immutable. Therefore converting them to a mutable slice type will allocate a new slice. See also http://blog.golang.org/go-slices-usage-and-internals
The same applies to the inverse conversion. Otherwise, mutating the slice would change the string, which would contradict the spec's guarantee that strings are immutable.

Skipping ahead n codepoints while iterating through a unicode string in Go

In Go, iterating over a string using
for i := 0; i < len(myString); i++ {
	doSomething(myString[i])
}
only accesses individual bytes in the string, whereas iterating over a string via
for _, c := range myString {
	doSomething(c)
}
iterates over individual Unicode code points (called runes in Go), which may span multiple bytes.
My question is: how does one go about jumping ahead while iterating over a string with range myString? continue can jump ahead by one Unicode code point, but you can't just do i += 3 to jump ahead three code points, because range reassigns i on every iteration. So what would be the most idiomatic way to advance by n code points?
I asked this question on the golang-nuts mailing list, and some of the helpful folks there answered it. However, someone messaged me suggesting I create a self-answered question on Stack Overflow for this, to save the next person with the same issue some trouble. That's what this is.
I'd consider avoiding the conversion to []rune, and code this directly.
skip := 0
for _, c := range myString {
	if skip > 0 {
		skip--
		continue
	}
	skip = doSomething(c)
}
It looks inefficient to skip runes one by one like this, but it's the same amount of work as the conversion to []rune would be. The advantage of this code is that it avoids allocating the rune slice, which will be approximately 4 times larger than the original string (depending on the number of larger code points you have). Of course converting to []rune is a bit simpler so you may prefer that.
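A runnable version of the skip-counter loop. doSomething here is a hypothetical handler that records each rune and asks to skip the next two runes after a '#'; substitute your own logic:

```go
package main

import (
	"fmt"
	"strings"
)

// doSomething is a hypothetical handler: it records c and returns how many
// of the following runes should be skipped (2 after a '#', otherwise 0).
func doSomething(c rune, out *strings.Builder) int {
	out.WriteRune(c)
	if c == '#' {
		return 2
	}
	return 0
}

// process applies doSomething to each rune, honoring its skip requests.
func process(myString string) string {
	var out strings.Builder
	skip := 0
	for _, c := range myString {
		if skip > 0 {
			skip-- // drop this rune without handling it
			continue
		}
		skip = doSomething(c, &out)
	}
	return out.String()
}

func main() {
	fmt.Println(process("a#éxb#世界c")) // a#b#c
}
```

The multi-byte runes é, 世 and 界 are each skipped as a single unit, which is exactly what counting bytes would get wrong.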
It turns out this can be done quite easily by converting the string to a slice of runes.
runes := []rune(myString)
for i := 0; i < len(runes); i++ {
	jumpHowFarAhead := doSomething(runes[i])
	i += jumpHowFarAhead
}
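If you would rather compute where to resume than count skipped runes, unicode/utf8 can step through the string's byte offsets directly, again without allocating a []rune. This is a different technique from the two answers above, and advance is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// advance returns the byte offset reached by moving n code points forward
// from byte offset i in s, stopping at len(s) if the string ends first.
// Invalid UTF-8 bytes count as one code point each, as they do with range.
func advance(s string, i, n int) int {
	for ; n > 0 && i < len(s); n-- {
		_, size := utf8.DecodeRuneInString(s[i:])
		i += size
	}
	return i
}

func main() {
	s := "héllo, 世界"
	i := advance(s, 0, 3) // skip "h", "é", "l"
	fmt.Println(s[i:])    // lo, 世界
}
```

Inside a manual loop (for i := 0; i < len(s); { ... }), you would decode the current rune with utf8.DecodeRuneInString, handle it, and then set i = advance(s, i+size, jump).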
