Efficient appending to a variable-length container of strings (Golang)

The problem:
I need to apply multiple regexes to each line of a big log file (several GB long), gather the non-empty matches, and put them all in an array (for serialization and sending over the network).
Slices are not much help if the answer to this question holds:
If the slice does not have sufficient capacity, append will need to allocate new memory and copy the old one over. For slices with <1024 elements, it will double the capacity, for slices with >1024 elements it will increase it by factor 1.25.
Since there can be literally hundreds of thousands of regex matches, I can't really predict the length / capacity of a slice. I can't make it too big "just in case" either, because this will waste memory (or will it? If the memory allocator is smart enough not to allocate memory that is never written into, maybe I could use a huge slice capacity without much harm?).
So I'm thinking about the following alternative:
have a doubly-linked list of matches (http://golang.org/pkg/container/list/)
calc its length (will len() work?)
preallocate a slice of this capacity
copy string pointers to this slice
Is there a less laborious way of achieving this goal in Go (a container with ~O(1) append complexity)?
(golang newbie here of course)

append()'s average (amortized) cost is already O(1) because it grows the array by a percentage each time. As the array gets larger, growing it gets costlier but proportionately rarer. A 10M-item slice will be 10x more expensive to grow than a 1M-item slice, but since the extra capacity we're allocating is proportional to the size, it'll also be 10x as many append(slice, item) calls until the next time it grows. The increasing cost and decreasing frequency of reallocations cancel out, leaving the average cost constant, i.e., O(1).
The same idea applies to other languages' dynamically-sized arrays, too: Microsoft's std::vector implementation apparently grows the array by 50% each time, for example. Amortized O(1) doesn't mean you pay nothing for allocations, only that you continue paying at the same average rate as your array gets bigger.
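You can watch this geometric growth happen directly (the exact capacities vary between Go versions, but the pattern is the same):

```go
package main

import "fmt"

func main() {
	var s []int
	prevCap := cap(s)
	for i := 0; i < 2000; i++ {
		s = append(s, i)
		if c := cap(s); c != prevCap {
			// Capacity jumps by a factor each time it's exhausted,
			// so reallocations become progressively rarer.
			fmt.Printf("len=%4d cap %4d -> %4d\n", len(s), prevCap, c)
			prevCap = c
		}
	}
}
```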
On my laptop, I could run a million slice = append(slice, someStaticString)s in 77ms. One reason it's quick, which siritinga noted below, is that "copying" the string to enlarge the array is really just copying the string header (a pointer/length pair), not copying the contents. 100,000 string headers is still under 2MB to copy, not a big deal compared to the other quantities of data you're working with.
container/list turned out ~3x slower for me in a microbenchmark; linked-list append is also constant time, of course, but I imagine append has a lower constant because it can usually just write to a couple words of memory and not allocate a list item, etc. The timing code won't work in the Playground but you can copy this locally and run it to see yourself: http://play.golang.org/p/uYyMScmOjX
Sometimes, you can usefully pre-allocate space to avoid reallocations/copies (in this example, using make([]string, 0, 1000000) takes the runtime from ~77ms to ~10ms), but, of course, often you just don't have enough info about the expected data size to eke out worthwhile gains, and you're better off leaving it to the built-in algorithm.
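For illustration, a minimal version of that pre-allocation (the ~77ms/~10ms figures above are from the answerer's laptop; your numbers will differ):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const n = 1000000
	start := time.Now()
	// The capacity hint means append never has to reallocate.
	s := make([]string, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, "some static string")
	}
	fmt.Println(len(s), cap(s), time.Since(start))
}
```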
But you're asking a more specific question here about a grep-like application (and thanks for asking a detailed question with context). For that, the bottom-line recommendation is that if you're searching gigs of logs, it's probably best to avoid buffering the whole output in RAM at all.
You could write something to stream the results as a single function: logparser.Grep(in io.Reader, out io.Writer, patterns []regexp.Regexp); you could alternatively make out a chan []byte or func(match []byte) (err error) if you don't want the code that sends results to be too enmeshed with the grep code.
(On []byte vs. string: a []byte seems to do the job here and avoids []byte<=>string conversions when you do I/O, so I'd prefer that. I don't know what all you're doing, though, and if you need string it's fine.)
If you do keep the whole match list in RAM, be aware that keeping around a reference to part of a big string or byte slice keeps the whole source string/slice from being garbage collected. So if you go that route, then counterintuitively you may actually want to copy matches to avoid keeping all of the source log data in RAM.
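A sketch of that copy (the sample line is hypothetical; the point is that a subslice pins its entire backing array, while a copy does not):

```go
package main

import "fmt"

func main() {
	// Hypothetical log line standing in for a large buffer.
	line := []byte("2024-01-01 ERROR disk full")

	sub := line[11:16]                        // shares line's backing array: pins it
	cp := append([]byte(nil), line[11:16]...) // independent copy: line can be freed

	// Mutating the source shows which one is shared.
	line[11] = 'X'
	fmt.Println(string(sub), string(cp)) // XRROR ERROR
}
```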

I tried to distill your question into a very simple example.
Since there can be "hundreds of thousands of regex matches", I did a large initial allocation of 1 M (1024 * 1024) entries for the matches slice capacity. A slice is a reference type. On a 64-bit OS, a slice header struct has a pointer, a length, and a capacity, for a total of 24 (3 * 8) bytes. The initial allocation for a slice of 1 M entries is therefore only 24 MB. If there are more than 1 M entries, a new slice with a capacity of 1.25 (1 + 1/4) M entries will be allocated, and the existing 1 M slice headers (24 MB) will be copied over to it.
In summary, you can avoid much of the overhead of many appends by initially over-allocating the slice capacity. The bigger memory problem is all the data that is saved and referenced for each match. The far bigger CPU time problem is the time taken to perform the regexp.FindAll's.
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

var searches = []*regexp.Regexp{
	regexp.MustCompile("configure"),
	regexp.MustCompile("unknown"),
	regexp.MustCompile("PATH"),
}

var matches = make([][]byte, 0, 1024*1024)

func main() {
	logName := "config.log"
	log, err := os.Open(logName)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer log.Close()
	scanner := bufio.NewScanner(log)
	for scanner.Scan() {
		line := scanner.Bytes()
		for _, s := range searches {
			for _, m := range s.FindAll(line, -1) {
				// Copy each match: scanner.Bytes() reuses its buffer on the next Scan.
				matches = append(matches, append([]byte(nil), m...))
			}
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
	// Output the number of matches and the first few of them.
	fmt.Println(len(matches))
	for i, m := range matches {
		fmt.Println(string(m))
		if i >= 16 {
			break
		}
	}
}

Related

What is the point in setting a slice's capacity?

In Golang, we can use the builtin make() function to create a slice with a given initial length and capacity.
Consider the following lines, the slice's length is set to 1, and its capacity 3:
func main() {
	var slice = make([]int, 1, 3)
	slice[0] = 1
	slice = append(slice, 6, 0, 2, 4, 3, 1)
	fmt.Println(slice)
}
I was surprised to see that this program prints:
[1 6 0 2 4 3 1]
This got me wondering: what is the point of initially defining a slice's capacity if append() can simply blow past it? Are there performance gains from setting a sufficiently large capacity?
A slice is really just a fancy way to manage an underlying array. It automatically tracks size, and re-allocates new space as needed.
As you append to a slice, the runtime grows its capacity (roughly doubling it, for small slices) every time the length exceeds the current capacity. It has to copy all of the elements to do that. If you know how big the slice will be before you start, you can avoid those copy operations and memory allocations by grabbing the space all up front.
When you make a slice providing capacity, you set the initial capacity, not any kind of limit.
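A quick demonstration that the capacity is only a starting point (the capacity after the final append is implementation-dependent, so it's only hedged here, not asserted):

```go
package main

import "fmt"

func main() {
	s := make([]int, 1, 3)
	fmt.Println(len(s), cap(s)) // 1 3
	s = append(s, 2, 3)         // fits in the existing capacity
	fmt.Println(len(s), cap(s)) // 3 3
	s = append(s, 4)            // exceeds it: a larger array is allocated
	fmt.Println(len(s), cap(s)) // 4 and some larger capacity (e.g. 6)
}
```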
See this blog post on slices for some interesting internal details of slices.
A slice is a wonderful abstraction over a simple array. You get all sorts of nice features, but deep down at its core lies an array. When you specify a capacity of 3, an array of length 3 is allocated in memory, and you can append up to that capacity without any reallocation. The capacity argument is optional in make, but every slice has a capacity whether or not you specify one. If you specify a length (which every slice also has), the slice will be indexable up to that length. The rest of the capacity is hidden away behind the scenes, so that append does not have to allocate an entirely new array each time it is used.
Here is an example to better explain the mechanics.
s := make([]int, 1, 3)
The underlying array will be allocated with 3 of the zero value of int (which is 0):
[0,0,0]
However, the length is set to 1, so the slice itself will only print [0], and if you try to index the second or third value, it will panic, as the slice's mechanics do not allow it. If you s = append(s, 1) to it, you will find that it has actually been created to contain zero values up to the length, and you will end up with [0,1]. At this point, you can append once more before the entire underlying array is filled, and another append will force it to allocate a new one and copy all the values over with a doubled capacity. This is actually a rather expensive operation.
Therefore the short answer to your question is that preallocating the capacity can be used to vastly improve the efficiency of your code. Especially so if the slice is either going to end up very large, or contains complex structs (or both), as the zero value of a struct is effectively the zero values of every single one of its fields. This is not because it would avoid allocating those values, as it has to anyway, but because append would have to reallocate new arrays full of these zero values each time it would need to resize the underlying array.
Short playground example: https://play.golang.org/p/LGAYVlw-jr
As others have already said, using the cap parameter can avoid unnecessary allocations. To give a sense of the performance difference, imagine you have a []float64 of random values and want a new slice that filters out values that are not above, say, 0.5.
Naive approach - no len or cap param
func filter(input []float64) []float64 {
	ret := make([]float64, 0)
	for _, el := range input {
		if el > .5 {
			ret = append(ret, el)
		}
	}
	return ret
}
Better approach - using cap param
func filterCap(input []float64) []float64 {
	ret := make([]float64, 0, len(input))
	for _, el := range input {
		if el > .5 {
			ret = append(ret, el)
		}
	}
	return ret
}
Benchmarks (n=10)
filter 131 ns/op 56 B/op 3 allocs/op
filterCap 56 ns/op 80 B/op 1 allocs/op
Using cap made the program 2x+ faster and reduced the number of allocations from 3 to 1. Now what happens at scale?
Benchmarks (n=1,000,000)
filter 9630341 ns/op 23004421 B/op 37 allocs/op
filterCap 6906778 ns/op 8003584 B/op 1 allocs/op
The speed difference is still significant (~1.4x) thanks to 36 fewer calls to runtime.makeslice. However, the bigger difference is the memory allocation (~4x less).
Even better - calibrating the cap
You may have noticed in the first benchmark that cap makes the overall memory allocation worse (80B vs 56B). This is because you allocate 10 slots but only need, on average, 5 of them. This is why you don't want to set cap unnecessarily high. Given what you know about your program, you may be able to calibrate the capacity. In this case, we can estimate that our filtered slice will need 50% as many slots as the original slice.
func filterCalibratedCap(input []float64) []float64 {
	ret := make([]float64, 0, len(input)/2)
	for _, el := range input {
		if el > .5 {
			ret = append(ret, el)
		}
	}
	return ret
}
Unsurprisingly, this calibrated cap allocates 50% as much memory as its predecessor, so that's ~8x improvement on the naive implementation at 1m elements.
Another option - using direct access instead of append
If you are looking to shave even more time off a program like this, initialize with the len parameter (and ignore the cap parameter), access the new slice directly instead of using append, then throw away all the slots you don't need.
func filterLen(input []float64) []float64 {
	ret := make([]float64, len(input))
	var counter int
	for _, el := range input {
		if el > .5 {
			ret[counter] = el
			counter++
		}
	}
	return ret[:counter]
}
This is ~10% faster than filterCap at scale. However, in addition to being more complicated, this pattern does not provide the same safety as cap if you try and calibrate the memory requirement.
With cap calibration, if you underestimate the total capacity required, then the program will automatically allocate more when it needs it.
With this approach, if you underestimate the total len required, the program will fail. In this example, if you initialize as ret := make([]float64, len(input)/2), and it turns out that len(output) > len(input)/2, then at some point the program will try to access a non-existent slot and panic.
Each time you add an item to a slice that has len(mySlice) == cap(mySlice), the underlying data structure is replaced with a larger structure.
mySlice := make([]int, 8) // len == cap == 8
fmt.Printf("Original Capacity: %v\n", cap(mySlice)) // Output: 8
mySlice = append(mySlice, myNewItem)
fmt.Printf("New Capacity: %v\n", cap(mySlice)) // Output: 16
Here, mySlice is replaced (through the assignment operator) with a new slice containing all the elements of the original mySlice, plus myNewItem, plus some room (capacity) to grow without triggering this resize.
As you can imagine, this resizing operation is computationally non-trivial.
Quite often, all the resize operations can be avoided if you know how many items you will need to store in mySlice. If you have this foreknowledge, you can set the capacity of the original slice upfront and avoid all the resize operations.
(In practice, it's quite often possible to know how many items will be added to a collection; especially when transforming data from one format to another.)
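For example, when flattening a map into a slice the exact target size is known up front, so a single allocation suffices (the names here are illustrative):

```go
package main

import "fmt"

func main() {
	scores := map[string]int{"alice": 3, "bob": 1, "carol": 2}

	// len(scores) tells us exactly how many slots we'll need.
	names := make([]string, 0, len(scores))
	for name := range scores {
		names = append(names, name)
	}
	fmt.Println(len(names), cap(names)) // 3 3: no resize ever happened
}
```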

Best practices constructing an empty array

I'm wondering about best practices when initializing empty arrays.
i.e. Is there any difference here between arr1, arr2, and arr3?
myArr1 := []int{}
myArr2 := make([]int,0)
var myArr3 []int
I know that they make an empty []int, but I wonder: is one syntax preferable to the others? Personally I find the first most readable, but that's beside the point here. One key point of contention may be the array capacity; presumably the default capacity is the same between the three, as it is unspecified. Is declaring arrays of unspecified capacity "bad"? I can assume it comes with some performance cost, but how "bad" is it really?
/tldr:
Is there any difference between the 3 ways to make an empty
array?
What is the default capacity of an array when unspecified?
What is the performance cost of using arrays with unspecified capacity?
First, it's a slice not an array. Arrays and slices in Go are very different, arrays have a fixed size that is part of the type. I had trouble with this at first too :)
Not really. Any of the three is correct, and any difference should be too small to worry about. In my own code I generally use whatever is easiest in a particular case.
0
Nothing, until you need to add an item, then whatever it costs to allocate the storage needed.
What is the performance cost of using arrays with unspecified capacity?
There is certainly a cost when you start populating the slice. If you know how big the slice should grow, you can allocate the capacity of the underlying array from the very beginning, as opposed to reallocating every time the underlying array fills up.
Here is a simple example with timing:
package main

import "fmt"

func main() {
	limit := 500 * 1000 * 1000
	mySlice := make([]int, 0, limit) // vs mySlice := make([]int, 0)
	for i := 0; i < limit; i++ {
		mySlice = append(mySlice, i)
	}
	fmt.Println(len(mySlice))
}
On my machine:
time go run my_file.go
With preallocation:
real 0m2.129s
user 0m2.073s
sys 0m1.357s
Without preallocation
real 0m7.673s
user 0m9.095s
sys 0m3.462s
Is there any difference between the 3 ways to make an empty array?
If "empty array" means len(arr) == 0, the answer is no, there's no difference; note, however, that only myArr3 == nil is true.
What is the default capacity of an array when unspecified?
The default capacity will be the same as the len you specify.
What is the performance cost of using arrays with unspecified capacity?
none

In go, is it worth avoiding creating slices of size 0?

I have a program in which I'm going to make lots and lots of slices, some of which might be empty:
nb := something() // something might return 0
slices = append(slices, make([]int, nb))
Does make([]int, 0) allocate some memory, and is it thus less memory-efficient than a nil slice, although they share the same behavior? By how much?
If so, is it worth doing a test to avoid useless allocations, or is the CPU time cost of the test not worth the saving in memory (or any other reason not to do so) ?
var sl []int
nb := something()
if nb > 0 {
	sl = make([]int, nb)
}
slices = append(slices, sl)
There is no difference in the allocated memory between
var a []T // nil slice
and
a := make([]T, 0) // zero-length, zero-capacity, non-nil slice
The difference is in the slice header content. In the second case the slice pointer contains some fixed address, same for all 0-sized allocations.
If this code is in a performance-critical part of the program, the difference makes ... quite a difference. In the first case you merely zero the slice header; in the second case you go through 3-4 function calls, some range checks for cap and length, etc., before malloc returns a pointer to the zero base.
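The two forms are observably different only via the nil check; length, capacity, and append behavior are all identical:

```go
package main

import "fmt"

func main() {
	var a []int         // nil slice: just a zeroed header
	b := make([]int, 0) // non-nil, zero-length slice

	fmt.Println(a == nil, b == nil)             // true false
	fmt.Println(len(a), cap(a), len(b), cap(b)) // 0 0 0 0

	// Both grow the same way under append.
	a = append(a, 1)
	b = append(b, 1)
	fmt.Println(a, b) // [1] [1]
}
```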
Does make([]int, 0) allocates some memory
Yes, it allocates a slice header but no backing array. If the slice header doesn't escape the current scope it may be allocated on the stack.
less memory efficient than a nil slice
In terms of memory used, they're the same.
is it worth doing a test to avoid useless allocations
In general yes, the 3 or 4 instructions it takes to compare an int are nothing compared to the cycles you'd need to do a memory allocation and initialization.

Why is iterating over a map so much slower than iterating over a slice in Golang?

I was implementing a sparse matrix using a map in Golang and I noticed that my code started taking much longer to complete after this change, after dismissing other possible causes, seems that the culprit is the iteration on the map itself. Go Playground link (doesn't work for some reason).
package main

import (
	"fmt"
	"math"
	"time"
)

func main() {
	z := 50000000
	a := make(map[int]int, z)
	b := make([]int, z)
	for i := 0; i < z; i++ {
		a[i] = i
		b[i] = i
	}
	t0 := time.Now()
	for key, value := range a {
		if key != value { // never happens
			fmt.Println("a", key, value)
		}
	}
	d0 := time.Now().Sub(t0)
	t1 := time.Now()
	for key, value := range b {
		if key != value { // never happens
			fmt.Println("b", key, value)
		}
	}
	d1 := time.Now().Sub(t1)
	fmt.Println(
		"a:", d0,
		"b:", d1,
		"diff:", math.Max(float64(d0), float64(d1))/math.Min(float64(d0), float64(d1)),
	)
}
Iterating over 50M items returns the following timings:
alix#local:~/Go/src$ go version
go version go1.3.3 linux/amd64
alix#local:~/Go/src$ go run b.go
a: 1.195424429s b: 68.588488ms diff: 17.777154632611037
I wonder, why is iterating over a map almost 20x as slow when compared to a slice?
This comes down to the representation in memory. How familiar are you with the representation of different data structures and the concept of algorithmic complexity? Iterating over an array or slice is simple: values are contiguous in memory. Iterating over a map, however, requires traversing the key space and doing lookups into the hash-table structure.
The dynamic ability of maps to insert keys of any value without using up tons of space allocating a sparse array, and the fact that look-ups can be done efficiently over the key space despite being not as fast as an array, are why hash tables are sometimes preferred over an array, although arrays (and slices) have a faster "constant" (O(1)) lookup time given an index.
It all comes down to whether you need the features of this or that data structure and whether you're willing to deal with the side-effects or gotchas involved.
Seems reasonable to put my comment as an answer. The underlying structures whose iteration performance you're comparing are a hash table and an array (https://en.wikipedia.org/wiki/Hash_table vs https://en.wikipedia.org/wiki/Array_data_structure). The range abstraction is actually (speculation, can't find the code) iterating over all the keys, accessing each value, and assigning the pair to k, v :=. If you're not familiar with array access: it is constant time because you just add sizeof(type)*i to the starting pointer to get the item. I don't know what the internals of map are in Go, but I know enough to know that its memory representation, and therefore its access, is nowhere near that efficient.
The specs statement on the topic isn't much; http://golang.org/ref/spec#For_statements
If I find the time to look up the implementation of range for map and slice/array I will and put some more technical details.

Truncate buffer from front

The bytes.Buffer object has a Truncate(n int) method to discard all but the first n bytes.
I'd need the exact inverse of that - keeping the last n bytes.
I could do the following
b := buf.Bytes()
buf.Reset()
buf.Write(b[offset:])
but I'm not sure if this will re-use the slice efficiently.
Are there better options?
There are two alternatives:
The solution you give, which allows the first 'offset' bytes to be reused.
Create a bytes.NewBuffer(b[offset:]) and use that. This will not allow the first 'offset' bytes to be collected until you're done with the new buffer, but it avoids the cost of copying.
Let bytes.Buffer handle the buffer management. The internal grow method slides the data down. Use the Next method. For example,
package main

import (
	"bytes"
	"fmt"
)

func main() {
	var buf bytes.Buffer
	for i := 0; i < 8; i++ {
		buf.WriteByte(byte(i))
	}
	fmt.Println(buf.Len(), buf.Bytes())
	// Keep the last n bytes.
	n := buf.Len() / 2
	if n > buf.Len() {
		n = buf.Len()
	}
	buf.Next(buf.Len() - n)
	fmt.Println(buf.Len(), buf.Bytes())
}
Output:
8 [0 1 2 3 4 5 6 7]
4 [4 5 6 7]
I reckon the problem with your idea is that "truncating the buffer from its start" is impossible simply because the memory allocator allocates memory in full chunks and there's no machinery in it to split an already allocated chunk into a set of "sub chunks" — essentially what you're asking for. So to support "trimming from the beginning" the implementation of bytes.Buffer would have to allocate a smaller buffer, move the "tail" there and then mark the original buffer for reuse.
This naturally leads us to another idea: use two (or more) buffers. They might either be allocated separately and treated as adjacent by your algorithms, or you might use custom allocation: allocate one big slice and then reslice it twice or more to produce several physically adjacent buffers, or slide one or more "window" slices over it. This means implementing a custom data structure of course…
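A minimal sketch of the sliding-window variant (assuming you manage the offset yourself; this is not what bytes.Buffer does internally):

```go
package main

import "fmt"

func main() {
	// One large backing slice...
	backing := []byte("0123456789abcdef")

	// ...over which a "window" slice slides; discarding bytes from the
	// front is just a reslice: no copying and no new allocation.
	window := backing[:]
	window = window[10:] // drop the first 10 bytes

	fmt.Println(string(window)) // abcdef
	// Caveat: backing's full array stays alive as long as window does.
}
```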