How do I do a FAST conversion of an int array to an array of bytes? - go

I have a process that needs to pack a large array of int16s to a protobuf every few milliseconds. Understanding the protobuf side of it isn't critical, since all I really need is a way to convert a bunch of int16s (160-16k of them) to []byte. It's a CPU-critical operation, so I don't want to do something like this:
for _, sample := range listOfIntegers {
    protobufObject.ByteStream = append(protobufObject.ByteStream, byte(sample>>8))
    protobufObject.ByteStream = append(protobufObject.ByteStream, byte(sample&0xff))
}
(If you're interested, this is the protobuf)
message ProtobufObject {
    bytes byte_stream = 1;
    ... = 2;
}
There has to be a faster way to supply that list of ints as a block of memory to the protobuf. I've fiddled with the cgo library to get access to memcpy, but suspect I've been destroying an underlying go data structure because I get crashes in totally unrelated sections of code.

A faster version of the above code is:
protobufObject.ByteStream = make([]byte, len(listOfIntegers)*2)
for i, n := range listOfIntegers {
    j := i * 2
    protobufObject.ByteStream[j+1] = byte(n)    // low byte
    protobufObject.ByteStream[j] = byte(n >> 8) // high byte comes first in the output
}
You can avoid copying the data when running on a big-endian architecture, where the in-memory layout of an int16 already matches the high-byte-first order produced above.
Use the unsafe package to copy the []int16 header to the []byte header. Use the unsafe package again to get a pointer to the []byte header and adjust the length and capacity for the conversion.
b := *(*[]byte)(unsafe.Pointer(&listOfIntegers))
hdr := (*reflect.SliceHeader)(unsafe.Pointer(&b))
hdr.Len *= 2
hdr.Cap *= 2
protobufObject.ByteStream = b
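As a hedged illustration of the point about byte order, the sketch below picks the zero-copy path only when the host is big-endian and falls back to the copying loop otherwise. It assumes Go 1.17 or later for unsafe.Slice; the helper names hostIsBigEndian and int16sToBigEndian are hypothetical and not part of the answer above.

```go
package main

import (
	"fmt"
	"unsafe"
)

// hostIsBigEndian reports whether the most significant byte of a
// multi-byte integer is stored first in memory on this machine.
func hostIsBigEndian() bool {
	x := uint16(1)
	return *(*byte)(unsafe.Pointer(&x)) == 0
}

// int16sToBigEndian returns the samples as big-endian bytes.
// On a big-endian host the result aliases s (no copy); on a
// little-endian host a new slice is filled byte by byte.
func int16sToBigEndian(s []int16) []byte {
	if len(s) == 0 {
		return nil
	}
	if hostIsBigEndian() {
		return unsafe.Slice((*byte)(unsafe.Pointer(&s[0])), len(s)*2)
	}
	out := make([]byte, len(s)*2)
	for i, n := range s {
		out[2*i] = byte(n >> 8)
		out[2*i+1] = byte(n)
	}
	return out
}

func main() {
	b := int16sToBigEndian([]int16{0x0102, -2})
	fmt.Println(b) // [1 2 255 254]
}
```

On the zero-copy path the returned bytes share memory with the input, so the int16 slice must not be modified while the protobuf still refers to it.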


Go - Failing escape analysis on different slice headers with shared data

I'm working on a project where I frequently convert []int32 to []byte. I created a function intsToBytes to perform an in-place conversion to minimize copying. I noticed that Go's escape analysis doesn't realize that ints and bytes reference the same underlying data. As a result, ints is overwritten by the next function's stack data while bytes lives on and references the overwritten data.
The only solution I can think of involves copying the data into a new byte slice. Is there a way to avoid copying the data?
func pack() []byte {
    ints := []int32{1, 2, 3, 4, 5} // This does not escape so it is allocated on the stack
    bytes := intsToBytes(ints)     // 'ints' and 'bytes' are different slice headers
    return bytes
    // After the return, the []int32{...} is deallocated and can be overwritten
    // by the next function's stack data
}

func intsToBytes(i []int32) []byte {
    const SizeOfInt32 = 4
    // Get the slice header
    header := *(*reflect.SliceHeader)(unsafe.Pointer(&i))
    header.Len *= SizeOfInt32
    header.Cap *= SizeOfInt32
    // Convert slice header to an []byte
    data := *(*[]byte)(unsafe.Pointer(&header))
    /* Potential Solution
    outData := make([]byte, len(data))
    copy(outData, data)
    return outData
    */
    return data
}
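The unsafe package documentation warns against declaring variables of type reflect.SliceHeader: its Data field is a plain uintptr, so a copied header no longer counts as a reference to the backing array. That is one plausible reason the compiler loses track of ints here. A minimal sketch, assuming Go 1.17 or later, builds the byte slice with unsafe.Slice so that a real pointer flows into the result. The function names mirror the question, and the bytes appear in the host's native order.

```go
package main

import (
	"fmt"
	"unsafe"
)

const sizeOfInt32 = 4

// intsToBytes returns a []byte view over the memory of i without copying.
// The view aliases i and exposes the host's native byte order.
func intsToBytes(i []int32) []byte {
	if len(i) == 0 {
		return nil
	}
	// unsafe.Slice takes a real pointer, so the backing array stays
	// reachable through the returned slice.
	return unsafe.Slice((*byte)(unsafe.Pointer(&i[0])), len(i)*sizeOfInt32)
}

func pack() []byte {
	ints := []int32{1, 2, 3, 4, 5}
	return intsToBytes(ints)
}

func main() {
	b := pack()
	fmt.Println(len(b)) // 20
}
```

Running `go build -gcflags=-m` on code like this is one way to check whether the compiler now reports the literal as escaping to the heap.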

How to write a vector

I am using the Go flatbuffers interface for the first time. I find the instructions sparse.
I would like to write a vector of uint64s into a table. Ideally, I would like to store numbers directly in a vector without knowing how many there are up front (I'm reading them from sql.Rows iterator). I see the generated code for the table has functions:
func DatasetGridAddDates(builder *flatbuffers.Builder, dates flatbuffers.UOffsetT) {
    builder.PrependUOffsetTSlot(2, flatbuffers.UOffsetT(dates), 0)
}
func DatasetGridStartDatesVector(builder *flatbuffers.Builder, numElems int) flatbuffers.UOffsetT {
    return builder.StartVector(8, numElems, 8)
}
Can I first write the vector using (??), then use DatasetGridAddDates to record the resulting vector in the containing "DatasetGrid" table?
(caveat: I have not heard of FlatBuffers prior to reading your question)
If you do know the length in advance, storing a vector is done as explained in the tutorial:
name := builder.CreateString("hello")
q55310927.DatasetGridStartDatesVector(builder, len(myDates))
for i := len(myDates) - 1; i >= 0; i-- {
    builder.PrependUint64(myDates[i]) // vectors are built back to front
}
dates := builder.EndVector(len(myDates))
q55310927.DatasetGridStart(builder)
q55310927.DatasetGridAddName(builder, name)
q55310927.DatasetGridAddDates(builder, dates)
grid := q55310927.DatasetGridEnd(builder)
Now what if you don’t have len(myDates)? On a toy example I get exactly the same output if I replace StartDatesVector(builder, len(myDates)) with StartDatesVector(builder, 0). Looking at the source code, it seems like the numElems may be necessary for alignment and for growing the buffer. I imagine alignment might be moot when you’re dealing with uint64, and growing seems to happen automatically on PrependUint64, too.
So, try doing it without numElems:
q55310927.DatasetGridStartDatesVector(builder, 0)
var n int
for rows.Next() { // use ORDER BY to make them go in reverse order
    var date uint64
    if err := rows.Scan(&date); err != nil {
        // ...
    }
    builder.PrependUint64(date)
    n++
}
dates := builder.EndVector(n)
and see if it works on your data.

How to get size of struct containing data structures in Go?

I'm currently trying to get the size of a complex struct in Go.
I've read solutions that use reflect and unsafe, but neither of these help with structs that contain arrays or maps (or any other field that's a pointer to an underlying data structure).
type testStruct struct {
    A int
    B string
    C struct{}
    items map[string]string
}
How would I find out the correct byte size of the above if items contains a few values in it?
You can get very close to the amount of memory required by the structure and its content by using the package reflect. You need to iterate over the fields and obtain the size of each field. For example:
func getSize(v interface{}) int {
    size := int(reflect.TypeOf(v).Size())
    switch reflect.TypeOf(v).Kind() {
    case reflect.Slice:
        s := reflect.ValueOf(v)
        for i := 0; i < s.Len(); i++ {
            size += getSize(s.Index(i).Interface())
        }
    case reflect.Map:
        s := reflect.ValueOf(v)
        keys := s.MapKeys()
        size += int(float64(len(keys)) * 10.79) // approximate per-entry map overhead
        for i := range keys {
            size += getSize(keys[i].Interface()) + getSize(s.MapIndex(keys[i]).Interface())
        }
    case reflect.String:
        size += reflect.ValueOf(v).Len()
    case reflect.Struct:
        s := reflect.ValueOf(v)
        for i := 0; i < s.NumField(); i++ {
            if s.Field(i).CanInterface() {
                size += getSize(s.Field(i).Interface())
            }
        }
    }
    return size
}
This obtains the size of v using reflect and then, for the supported types in this example (slices, maps, strings, and structs), it computes the memory required by the content stored in them. You would need to add here other types that you need to support.
There are a few details to work out:
1. Private fields are not counted.
2. For structs we are double-counting the basic types, because the struct's own Size() already includes its fields.
For number two, you can filter them out before making the recursive call when handling structs; the available kinds are listed in the documentation for the reflect package.
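One way to address both details is sketched below, as an illustration rather than a definitive measurement. It counts the inline size once at the top level and then adds only the memory each value refers to indirectly. It walks reflect.Value directly instead of calling Interface(), so unexported fields such as items are included. The names indirectSize and deepSize are hypothetical. Map bucket overhead, unused slice capacity, pointers, and arrays are deliberately ignored here.

```go
package main

import (
	"fmt"
	"reflect"
)

// indirectSize returns the bytes v refers to outside its own inline storage.
func indirectSize(v reflect.Value) int {
	n := 0
	switch v.Kind() {
	case reflect.String:
		n = v.Len()
	case reflect.Slice:
		n = v.Len() * int(v.Type().Elem().Size())
		for i := 0; i < v.Len(); i++ {
			n += indirectSize(v.Index(i))
		}
	case reflect.Map:
		// Assumption: the map's header and bucket overhead is ignored.
		entry := int(v.Type().Key().Size() + v.Type().Elem().Size())
		for it := v.MapRange(); it.Next(); {
			n += entry + indirectSize(it.Key()) + indirectSize(it.Value())
		}
	case reflect.Struct:
		// Fields are already part of the struct's inline size, so only
		// their indirect content is added.
		for i := 0; i < v.NumField(); i++ {
			n += indirectSize(v.Field(i))
		}
	}
	return n
}

// deepSize is the inline size of x plus everything it refers to.
func deepSize(x interface{}) int {
	v := reflect.ValueOf(x)
	return int(v.Type().Size()) + indirectSize(v)
}

type testStruct struct {
	A     int
	B     string
	C     struct{}
	items map[string]string
}

func main() {
	t := testStruct{B: "hello", items: map[string]string{"k": "vv"}}
	fmt.Println(deepSize(t))
}
```

The printed number depends on the platform's word size, so treat it as an estimate of payload bytes rather than of what the allocator actually reserves.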

Fastest way to allocate a large string in Go?

I need to create a string in Go that is 1048577 characters (1MB + 1 byte). The content of the string is totally unimportant. Is there a way to allocate this directly without concatenating or using buffers?
Also, it's worth noting that the value of string will not change. It's for a unit test to verify that strings that are too long will return an error.
Use strings.Builder to allocate a string without using extra buffers.
var b strings.Builder
b.Grow(1048577)
for i := 0; i < 1048577; i++ {
    b.WriteByte(0)
}
s := b.String()
The call to the Grow method allocates a slice with capacity 1048577. The WriteByte calls fill the slice to capacity. The String() method uses unsafe to convert that slice to a string.
The cost of the loop can be reduced by writing chunks of N bytes at a time and filling single bytes at the end.
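The chunked variant could look roughly like the sketch below. The helper name filler and the chunk size of 4096 are assumptions chosen for illustration, not values from the answer.

```go
package main

import (
	"fmt"
	"strings"
)

// filler returns a string of exactly n bytes, written mostly in chunks.
func filler(n int) string {
	const chunkSize = 4096 // hypothetical chunk size
	chunk := strings.Repeat("x", chunkSize)

	var b strings.Builder
	b.Grow(n) // one allocation up front
	for b.Len()+chunkSize <= n {
		b.WriteString(chunk)
	}
	for b.Len() < n { // fill the remainder one byte at a time
		b.WriteByte('x')
	}
	return b.String()
}

func main() {
	s := filler(1048577)
	fmt.Println(len(s)) // 1048577
}
```

When the content really does not matter, strings.Repeat("x", 1048577) produces the same string in a single call.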
If you are not opposed to using the unsafe package, then use this:
p := make([]byte, 1048577)
s := *(*string)(unsafe.Pointer(&p))
If you are asking about how to do this with the simplest code, then use the following:
s := string(make([]byte, 1048577))
This approach does not meet the requirements set forth in the question. It uses an extra buffer instead of allocating the string directly.
I ended up using this:
string(make([]byte, 1048577))

How to efficiently hash (SHA 256) in golang data where only the last few bytes changes

Assuming you had 80 bytes of data and only the last 4 bytes was constantly changing, how would you efficiently hash the total 80 bytes using Go. In essence, the first 76 bytes are the same, while the last 4 bytes keeps changing. Ideally, you want to keep a copy of the hash digest for the first 76 bytes and just keep changing the last 4.
You can try the following examples on the Go Playground. Benchmark results are at the end.
Note: the implementations below are not safe for concurrent use; I intentionally made them like this to be simpler and faster.
Fastest when using only public API (always hashes all input)
The general concept and interface of Go's hash algorithms is the hash.Hash interface. This does not allow you to save the state of the hasher and to return or rewind to the saved state. So using the public hash APIs of the Go standard lib, you always have to calculate the hash from start.
What the public API offers is to reuse an already constructed hasher to calculate the hash of a new input, using the Hash.Reset() method. This is nice so that no (memory) allocations will be needed to calculate multiple hash values. Also you may take advantage of the optional slice that may be passed to Hash.Sum() which is used to append the current hash to. This is nice so that no allocations will be needed to receive the hash results either.
Here's an example that takes advantage of these:
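type Cached1 struct {
    hasher hash.Hash
    result [sha256.Size]byte
}

func NewCached1() *Cached1 {
    return &Cached1{hasher: sha256.New()}
}

func (c *Cached1) Sum(data []byte) []byte {
    c.hasher.Reset()     // reuse the hasher instead of allocating a new one
    c.hasher.Write(data) // hash the whole input
    return c.hasher.Sum(c.result[:0])
}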
Test data
We'll use the following test data:
var fixed = bytes.Repeat([]byte{1}, 76)
var variantA = []byte{1, 1, 1, 1}
var variantB = []byte{2, 2, 2, 2}
var data = append(append([]byte{}, fixed...), variantA...)
var data2 = append(append([]byte{}, fixed...), variantB...)
var c1 = NewCached1()
First let's get authentic results (to verify if our hasher works correctly):
fmt.Printf("%x\n", sha256.Sum256(data))
fmt.Printf("%x\n", sha256.Sum256(data2))
Now let's check our Cached1 hasher:
fmt.Printf("%x\n", c1.Sum(data))
fmt.Printf("%x\n", c1.Sum(data2))
The output is the same in both cases.
Even faster but may break (in future Go releases): hashes only the last 4 bytes
Now let's see a less flexible solution which truly calculates the hash of the fixed first 76 bytes only once.
The hasher of the crypto/sha256 package is the unexported sha256.digest type (more precisely a pointer to this type):
// digest represents the partial evaluation of a checksum.
type digest struct {
    h     [8]uint32
    x     [chunk]byte
    nx    int
    len   uint64
    is224 bool // mark if this digest is SHA-224
}
A value of the digest struct type basically holds the current state of the hasher.
What we may do is feed the hasher the fixed first 76 bytes, and then save this struct value. When we need to calculate the hash of some 80 bytes of data where the first 76 are the same, we use this saved value as a starting point, and then feed the varying last 4 bytes.
Note that it's enough to simply save this struct value as it contains no pointers and no descriptor types like slices and maps. Else we would also have to make a copy of those, but we're "lucky". So this solution would need adjustment if a future implementation of crypto/sha256 would add a pointer or slice field for example.
Since sha256.digest is unexported, we can only use reflection (reflect package) to achieve our goals, which inherently will add some delays to computation.
Example implementation that does this:
type Cached2 struct {
    origv   reflect.Value
    hasherv reflect.Value
    hasher  hash.Hash
    result  [sha256.Size]byte
}

func NewCached2(fixed []byte) *Cached2 {
    h := sha256.New()
    h.Write(fixed) // hash the fixed part once
    c := &Cached2{origv: reflect.ValueOf(h).Elem()}
    hasherv := reflect.New(c.origv.Type())
    c.hasher = hasherv.Interface().(hash.Hash)
    c.hasherv = hasherv.Elem()
    return c
}

func (c *Cached2) Sum(data []byte) []byte {
    // Set state of the fixed hash:
    c.hasherv.Set(c.origv)
    c.hasher.Write(data)
    return c.hasher.Sum(c.result[:0])
}
Testing it:
var c2 = NewCached2(fixed)
fmt.Printf("%x\n", c2.Sum(variantA))
fmt.Printf("%x\n", c2.Sum(variantB))
The output is again the same.
So it works.
The "ultimate", fastest solution
Cached2 could be faster if reflection were not involved. If we want an even faster solution, we can simply copy the sha256.digest type and its methods into our own package, so we can use it directly without having to resort to reflection.
If we do this, we will have access to the digest struct value, and we can simply make a copy of it like:
var d digest
// init d
saved := d
And restoring it is like:
d = saved
I simply "cloned" the crypto/sha256 package to my workspace, and changed / exported the digest type as Digest just for demonstration purposes. Then using this mysha256.Digest type I implemented Cached3 like this:
type Cached3 struct {
    orig   mysha256.Digest
    result [sha256.Size]byte
}

func NewCached3(fixed []byte) *Cached3 {
    var d mysha256.Digest
    d.Reset() // initialize the SHA-256 state
    d.Write(fixed)
    return &Cached3{orig: d}
}

func (c *Cached3) Sum(data []byte) []byte {
    // Make a copy of the fixed hash:
    d := c.orig
    d.Write(data)
    return d.Sum(c.result[:0])
}
Testing it:
var c3 = NewCached3(fixed)
fmt.Printf("%x\n", c3.Sum(variantA))
fmt.Printf("%x\n", c3.Sum(variantB))
Output again is the same. So this works too.
We can benchmark performance with this code:
func BenchmarkCached1(b *testing.B) {
    for i := 0; i < b.N; i++ {
        c1.Sum(data)
    }
}
func BenchmarkCached2(b *testing.B) {
    for i := 0; i < b.N; i++ {
        c2.Sum(variantA)
    }
}
func BenchmarkCached3(b *testing.B) {
    for i := 0; i < b.N; i++ {
        c3.Sum(variantA)
    }
}
Benchmark results (go test -bench . -benchmem):
BenchmarkCached1-4    1000000    1569 ns/op    0 B/op    0 allocs/op
BenchmarkCached2-4    2000000     926 ns/op    0 B/op    0 allocs/op
BenchmarkCached3-4    2000000     872 ns/op    0 B/op    0 allocs/op
Cached2 is approximately 41% faster than Cached1, which is quite noticeable and nice. Cached3 gives only a "little" performance boost compared to Cached2, another 6%. Cached3 is 44% faster than Cached1.
Also note that none of the solutions use any allocations which is also nice.
For that extra 41% or 44%, I would probably not go for the Cached2 or Cached3 solutions. Of course it really depends on how important the performance is to you. If it is important, I think the Cached2 solution presents a fine compromise between minimum added complexity and a noticeable performance gain. It does carry a risk, as future Go implementations may break it; if that is a problem, Cached3 solves it by copying the current implementation (and also improves performance a little).
