performance of for range in go

When ranging over an array, two values are returned for each iteration. The first is the index, and the second is a copy of the element at that index.
Here's my code:
var myArray = [5]int{1, 2, 3, 4, 5}
sum := 0
// first with copy
for _, value := range myArray {
    sum += value
}
// second without copy
for i := range myArray {
    sum += myArray[i]
}
Which one should I use for better performance?
Is there any difference for built-in types in these two pieces of code?

We can test this using Go's benchmarking tool (read more at https://dave.cheney.net/2013/06/30/how-to-write-benchmarks-in-go).
sum_test.go
package sum

import "testing"

func BenchmarkSumIterator(b *testing.B) {
    var ints = [5]int{1, 2, 3, 4, 5}
    sum := 0
    for i := 0; i < b.N; i++ {
        for j := range ints {
            sum += ints[j]
        }
    }
}

func BenchmarkSumRange(b *testing.B) {
    var ints = [5]int{1, 2, 3, 4, 5}
    sum := 0
    for i := 0; i < b.N; i++ {
        for _, value := range ints {
            sum += value
        }
    }
}
Run it with:
$ go test -bench=. sum_test.go
goos: linux
goarch: amd64
BenchmarkSumIterator-4 412796047 2.97 ns/op
BenchmarkSumRange-4 413581974 2.89 ns/op
PASS
ok command-line-arguments 3.010s
Range appears to be slightly more efficient. Running this benchmark a few more times confirms this. It's worth noting that this may only hold for this specific case, where you have a small fixed-size array. You should try to make decisions like these based on what you'd encounter in production, and also reconcile them with code readability.

The second one is faster, but the difference is small enough here that you can ignore it.
The main difference shows up when the elements are big: in that case the first loop copies each element into value on every iteration, doing more work than the second one, which only copies an index.
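To make that concrete, here is a hedged benchmark sketch (the big type and benchmark names are my own, not from the answers above). With a large element type the two-value form must copy every element, so the gap becomes measurable; for plain ints, as in the question, the timings above suggest the two forms compile to essentially the same loop.
package sum

import "testing"

// big is a deliberately large element type (512 bytes).
type big struct{ payload [64]int64 }

var bigs [64]big

func BenchmarkRangeValueCopy(b *testing.B) {
    var sum int64
    for i := 0; i < b.N; i++ {
        for _, v := range bigs { // copies all 512 bytes of each element
            sum += v.payload[0]
        }
    }
    _ = sum // keep the result live
}

func BenchmarkRangeIndexOnly(b *testing.B) {
    var sum int64
    for i := 0; i < b.N; i++ {
        for j := range bigs { // copies only an index
            sum += bigs[j].payload[0]
        }
    }
    _ = sum // keep the result live
}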

Why is accessing a variable so much slower than accessing len()?

I wrote this function uniq that takes in a sorted slice of ints
and returns the slice with duplicates removed:
func uniq(x []int) []int {
    i := 0
    for i < len(x)-1 {
        if x[i] == x[i+1] {
            copy(x[i:], x[i+1:])
            x = x[:len(x)-1]
        } else {
            i++
        }
    }
    return x
}
and uniq2, a rewrite of uniq with the same results:
func uniq2(x []int) []int {
    i := 0
    l := len(x)
    for i < l-1 {
        if x[i] == x[i+1] {
            copy(x[i:], x[i+1:])
            l--
        } else {
            i++
        }
    }
    return x[:l]
}
The only difference between the two functions
is that in uniq2, instead of slicing x
and directly accessing len(x) each time,
I save len(x) to a variable l
and decrement it whenever I shift the slice.
I thought that uniq2 would be slightly faster than uniq
because len(x) would no longer be called on every iteration,
but in reality it is inexplicably much slower.
With this test that generates a random sorted slice
and calls uniq/uniq2 on it 1000 times,
which I run on Linux:
func main() {
    rand.Seed(time.Now().Unix())
    for i := 0; i < 1000; i++ {
        _ = uniq(genSlice())
        //_ = uniq2(genSlice())
    }
}

func genSlice() []int {
    x := make([]int, 0, 1000)
    for num := 1; num <= 10; num++ {
        amount := rand.Intn(1000)
        for i := 0; i < amount; i++ {
            x = append(x, num)
        }
    }
    return x
}
$ go build uniq.go
$ time ./uniq
uniq usually takes 5–6 seconds to finish,
while uniq2 is more than two times slower,
taking 12–15 seconds.
Why is uniq2, where I save the slice length to a variable,
so much slower than uniq, where I directly call len?
Shouldn't it be slightly faster?
You expect roughly the same execution time because you think they do roughly the same thing.
The only difference between the two functions is that in uniq2, instead of slicing x and directly accessing len(x) each time, I save len(x) to a variable l and decrement it whenever I shift the slice.
This is wrong.
The first version does:
copy(x[i:], x[i+1:])
x = x[:len(x)-1]
And the second does:
copy(x[i:], x[i+1:])
l--
The first difference is that the first version assigns (copies) a slice header, which is a reflect.SliceHeader value of 3 integers (24 bytes on 64-bit architectures), while l-- is a simple decrement, which is much faster.
But the main difference does not stem from this. The main difference is that since the first version changes the x slice (the header, length included), you end up copying fewer and fewer elements, while the second version does not change x and always copies to the end of the slice: x[i+1:] is equivalent to x[i+1:len(x)].
To demonstrate, imagine you pass in a slice with length 10 whose elements are all equal. The first version will copy 9 elements first, then 8, then 7, etc. The second version will copy 9 elements first, then 9 again, then 9 again, etc.
Let's modify your functions to count the number of copied elements:
func uniq(x []int) []int {
    count := 0
    i := 0
    for i < len(x)-1 {
        if x[i] == x[i+1] {
            count += copy(x[i:], x[i+1:])
            x = x[:len(x)-1]
        } else {
            i++
        }
    }
    fmt.Println("uniq copied", count, "elements")
    return x
}
func uniq2(x []int) []int {
    count := 0
    i := 0
    l := len(x)
    for i < l-1 {
        if x[i] == x[i+1] {
            count += copy(x[i:], x[i+1:])
            l--
        } else {
            i++
        }
    }
    fmt.Println("uniq2 copied", count, "elements")
    return x[:l]
}
Testing it:
uniq(make([]int, 1000))
uniq2(make([]int, 1000))
Output is:
uniq copied 499500 elements
uniq2 copied 998001 elements
uniq2() copies twice as many elements!
If we test it with a random slice:
uniq(genSlice())
uniq2(genSlice())
Output is:
uniq copied 7956671 elements
uniq2 copied 11900262 elements
Again, uniq2() copies roughly 1.5 times more elements! (But this greatly depends on the random numbers.)
Try the examples on the Go Playground.
The "fix" is to modify uniq2() to copy until l:
copy(x[i:], x[i+1:l])
l--
With this "appropriate" change, performance is roughly the same.

Efficient allocation of slices (cap vs length)

Assume I am creating a slice which I know in advance I want to populate with 1e5 elements via successive calls to append in a for loop:
// Append 1e5 strings to the slice
for i := 0; i < 1e5; i++ {
    value := fmt.Sprintf("Entry: %d", i)
    myslice = append(myslice, value)
}
which is the more efficient way of initialising the slice and why:
a. declaring a nil slice of strings?
var myslice []string
b. setting its length in advance to 1e5?
myslice = make([]string, 1e5)
c. setting both its length and capacity to 1e5?
myslice = make([]string, 1e5, 1e5)
Your b and c solutions are identical: when you create a slice with make() and don't specify the capacity, the "missing" capacity defaults to the given length.
Also note that if you create the slice with a length in advance, you can't use append() to populate the slice, because it adds new elements to the slice, and it doesn't "reuse" the allocated elements. So in that case you have to assign values to the elements using an index expression, e.g. myslice[i] = value.
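A minimal sketch of that pitfall (variable names are my own):
package main

import "fmt"

func main() {
    s := make([]string, 3) // length 3: three "" elements already exist
    s = append(s, "a")     // append adds a 4th element; it does not fill s[0]
    fmt.Printf("%d %q\n", len(s), s) // 4 ["" "" "" "a"]

    s2 := make([]string, 0, 3) // length 0, capacity 3
    s2 = append(s2, "a")       // append fills the preallocated space
    fmt.Printf("%d %q\n", len(s2), s2) // 1 ["a"]
}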
If you start with a slice with 0 capacity, a new backing array has to be allocated and the "old" content has to be copied over whenever you append an element that does not fit into the capacity, so that solution is inherently slower.
I would define and consider the following different solutions (I use an []int slice to keep fmt.Sprintf() from interfering with our benchmarks):
var s []int

func BenchmarkA(b *testing.B) {
    for i := 0; i < b.N; i++ {
        s = nil
        for j := 0; j < 1e5; j++ {
            s = append(s, j)
        }
    }
}

func BenchmarkB(b *testing.B) {
    for i := 0; i < b.N; i++ {
        s = make([]int, 0, 1e5)
        for j := 0; j < 1e5; j++ {
            s = append(s, j)
        }
    }
}

func BenchmarkBLocal(b *testing.B) {
    for i := 0; i < b.N; i++ {
        s := make([]int, 0, 1e5)
        for j := 0; j < 1e5; j++ {
            s = append(s, j)
        }
    }
}

func BenchmarkD(b *testing.B) {
    for i := 0; i < b.N; i++ {
        s = make([]int, 1e5)
        for j := range s {
            s[j] = j
        }
    }
}
Note: I use package-level variables in the benchmarks (except in BLocal), because some optimizations may (and actually do) happen when using a local slice variable.
And the benchmark results:
BenchmarkA-4 1000 1081599 ns/op 4654332 B/op 30 allocs/op
BenchmarkB-4 3000 371096 ns/op 802816 B/op 1 allocs/op
BenchmarkBLocal-4 10000 172427 ns/op 802816 B/op 1 allocs/op
BenchmarkD-4 10000 167305 ns/op 802816 B/op 1 allocs/op
A: As you can see, starting with a nil slice is the slowest, uses the most memory and allocations.
B: Pre-allocating the slice with capacity (but still 0 length) and using append: it requires only a single allocation and is much faster, almost thrice as fast.
BLocal: Do note that when using a local slice instead of a package variable, (compiler) optimizations happen and it gets a lot faster: twice as fast, almost as fast as D.
D: Not using append() but assigning elements to a preallocated slice wins in every aspect, even when using a non-local variable.
For this use case, since you already know the number of string elements that you want to assign to the slice,
I would prefer approach b or c, because they prevent resizing of the slice.
If you choose approach a, the backing array is reallocated (roughly doubling in size) every time an append exceeds the current capacity (see the sketch below).
https://play.golang.org/p/kSuX7cE176j
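Here's a small sketch that makes those reallocation steps visible (the exact growth factor is a runtime implementation detail, so treat the printed capacities as illustrative):
package main

import "fmt"

func main() {
    var s []int
    prevCap := -1
    for i := 0; i < 1e5; i++ {
        s = append(s, i)
        if cap(s) != prevCap { // a new, larger backing array was allocated
            prevCap = cap(s)
            fmt.Println("len:", len(s), "cap:", cap(s))
        }
    }
}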

for loop speed comparison

I was wondering how fast the len operator is in Go, so I wrote a simple benchmark. My expectation was that by avoiding a call to len during each loop iteration, the code would run faster, but in fact the opposite is true.
Here's the benchmark:
func sumArrayNumber(input []int) int {
    var res int
    for i, length := 0, len(input); i < length; i += 1 {
        res += input[i]
    }
    return res
}

func sumArrayNumber2(input []int) int {
    var res int
    for i := 0; i < len(input); i += 1 {
        res += input[i]
    }
    return res
}

var result int
var input = []int{3, 6, 22, 68, 11, -7, 22, 5, 0, 0, 1}

func BenchmarkSumArrayNumber(b *testing.B) {
    var r int
    for n := 0; n < b.N; n++ {
        r = sumArrayNumber(input)
    }
    result = r
}

func BenchmarkSumArrayNumber2(b *testing.B) {
    var r int
    for n := 0; n < b.N; n++ {
        r = sumArrayNumber2(input)
    }
    result = r
}
And here are the results:
goos: windows
goarch: amd64
BenchmarkSumArrayNumber-8 300000000 4.75 ns/op
BenchmarkSumArrayNumber2-8 300000000 4.67 ns/op
PASS
ok command-line-arguments 4.000s
I confirmed the results are consistent by doing the following:
doubling the input array size roughly doubles the execution time per op; the speed difference scales with the length of the input array.
exchanging the test order does not impact the results.
Why is the code that checks len() at every loop iteration faster?
One may argue that a difference of 0.08 ns is not statistically significant enough to say that one for-loop is faster than the other. You probably need to run the same test many times (at least 20), at which point you can derive a mean value and a standard deviation.
Moreover, there are many factors that can speed up the len() operator, such as CPU caching and compiler optimizations. I think the most relevant factor in this specific example is that len() for a slice or array just reads the length field of the slice header. Thus, it is O(1).
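To illustrate why len() is a constant-time field read, here is a sketch that peeks at the slice header via unsafe. The sliceHeader type below mirrors the runtime's current slice layout, but that layout is an implementation detail, not a guaranteed API:
package main

import (
    "fmt"
    "unsafe"
)

// sliceHeader mirrors how the runtime currently lays out a slice value.
type sliceHeader struct {
    data unsafe.Pointer
    len  int
    cap  int
}

func main() {
    s := []int{3, 6, 22, 68, 11}
    h := (*sliceHeader)(unsafe.Pointer(&s))
    // len(s) simply reads the stored field; no traversal happens.
    fmt.Println(h.len, h.cap, len(s), cap(s)) // 5 5 5 5
}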

Golang: Find two number index where the sum of these two numbers equals to target number

The problem is: find the indices of two numbers such that nums[index1] + nums[index2] == target. Here is my attempt in Go (indices start from 1):
package main

import (
    "fmt"
)

var nums = []int{0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 25182, 25184, 25186, 25188, 25190, 25192, 25194, 25196} // The number list is too long, I put the whole numbers in a gist: https://gist.github.com/nickleeh/8eedb39e008da8b47864
var target int = 16021

func twoSum(nums []int, target int) (int, int) {
    if len(nums) <= 1 {
        return 0, 0
    }
    hdict := make(map[int]int)
    for i := 1; i < len(nums); i++ {
        if val, ok := hdict[nums[i+1]]; ok {
            return val, i + 1
        } else {
            hdict[target-nums[i+1]] = i + 1
        }
    }
    return 0, 0
}

func main() {
    fmt.Println(twoSum(nums, target))
}
The nums list is too long, I put it into a gist:
https://gist.github.com/nickleeh/8eedb39e008da8b47864
This code works fine, but I find the return 0, 0 part ugly, and it runs ten times slower than the Julia translation. I would like to know whether any part is written badly and affects the performance.
Edit:
Julia's translation:
function two_sum(nums, target)
    if length(nums) <= 1
        return false
    end
    hdict = Dict()
    for i in 1:length(nums)
        if haskey(hdict, nums[i])
            return [hdict[nums[i]], i]
        else
            hdict[target - nums[i]] = i
        end
    end
end
In my opinion, if no two elements add up to target, it's best to return values that are invalid indices, e.g. -1. Returning 0, 0 would also work, since a valid index pair can't be two equal indices, but -1 is more convenient: if you forget to check the return values and attempt to use the invalid indices, you immediately get a run-time panic, alerting you to validate the return values. Accordingly, in my solutions I will get rid of the i + 1 shifts, as they serve no purpose.
Benchmarking of different solutions can be found at the end of the answer.
If sorting allowed:
If the slice is big and not changing, and you have to call this twoSum() function many times, the most efficient solution would be to sort the numbers simply using sort.Ints() in advance:
sort.Ints(nums)
And then you don't have to build a map, you can use binary search implemented in sort.SearchInts():
func twoSumSorted(nums []int, target int) (int, int) {
    for i, v := range nums {
        v2 := target - v
        // SearchInts returns len(nums) if v2 is not found, hence the bounds check.
        if j := sort.SearchInts(nums, v2); j < len(nums) && nums[j] == v2 {
            return i, j
        }
    }
    return -1, -1
}
Note that after sorting, the indices returned will be indices into the sorted slice. These may differ from indices into the original (unsorted) slice, which may or may not be a problem. If you do need indices into the original, unsorted slice, you can store a mapping between sorted and unsorted indices so you can recover the original index. For details see this question:
Get the indices of the array after sorting in golang
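Here is a sketch of one way to do that (the pair type and the twoSumOrig name are my own, not from the linked question):
package main

import (
    "fmt"
    "sort"
)

// pair keeps each value together with its index in the original slice.
type pair struct{ v, orig int }

func twoSumOrig(nums []int, target int) (int, int) {
    a := make([]pair, len(nums))
    for i, v := range nums {
        a[i] = pair{v, i}
    }
    sort.Slice(a, func(i, j int) bool { return a[i].v < a[j].v })
    for i, p := range a {
        want := target - p.v
        // Binary-search the sorted view, then map back to original indices.
        j := sort.Search(len(a), func(k int) bool { return a[k].v >= want })
        if j < len(a) && a[j].v == want && j != i {
            return p.orig, a[j].orig
        }
    }
    return -1, -1
}

func main() {
    fmt.Println(twoSumOrig([]int{8, 2, 11, 7}, 9)) // 1 3 (nums[1]+nums[3] == 9)
}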
If sorting is not allowed:
Here is your solution with the i + 1 shifts removed (slice and array indices are zero-based in Go), also utilizing for ... range:
func twoSum(nums []int, target int) (int, int) {
    if len(nums) <= 1 {
        return -1, -1
    }
    m := make(map[int]int)
    for i, v := range nums {
        if j, ok := m[v]; ok {
            return j, i
        }
        m[target-v] = i
    }
    return -1, -1
}
If the nums slice is big and the solution is not found quickly (meaning the i index grows large), a lot of elements will be added to the map. Maps start with a small capacity and are grown internally when additional space is required to host more key-value pairs. An internal growing step requires rehashing and rebuilding with the elements already added, which is "very" expensive.
It does not seem significant but it really is. Since you know the max elements that will end up in the map (worst case is len(nums)), you can create a map with a big-enough capacity to hold all elements for the worst case. The gain will be that no internal growing and rehashing will be required. You can provide the initial capacity as the second argument to make() when creating the map. This speeds up twoSum2() big time if nums is big:
func twoSum2(nums []int, target int) (int, int) {
    if len(nums) <= 1 {
        return -1, -1
    }
    m := make(map[int]int, len(nums))
    for i, v := range nums {
        if j, ok := m[v]; ok {
            return j, i
        }
        m[target-v] = i
    }
    return -1, -1
}
Benchmarking
Here's a little benchmarking code to test execution speed of the 3 solutions with the input nums and target you provided. Note that in order to test twoSumSorted(), you first have to sort the nums slice.
Save this into a file named xx_test.go and run it with go test -bench .:
package main

import (
    "sort"
    "testing"
)

func BenchmarkTwoSum(b *testing.B) {
    for i := 0; i < b.N; i++ {
        twoSum(nums, target)
    }
}

func BenchmarkTwoSum2(b *testing.B) {
    for i := 0; i < b.N; i++ {
        twoSum2(nums, target)
    }
}

func BenchmarkTwoSumSorted(b *testing.B) {
    sort.Ints(nums)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        twoSumSorted(nums, target)
    }
}
Output:
BenchmarkTwoSum-4 1000 1405542 ns/op
BenchmarkTwoSum2-4 2000 722661 ns/op
BenchmarkTwoSumSorted-4 10000000 133 ns/op
As you can see, making the map with a big enough capacity speeds things up: twoSum2() runs twice as fast as twoSum().
And as mentioned, if nums can be sorted in advance, the sorted variant is ~10,000 times faster!
If nums is always sorted, you can do a binary search to see if the complement to whichever number you're on is also in the slice.
func binary(haystack []int, needle, startsAt int) int {
    if len(haystack) == 0 {
        return -1 // empty sub-slice: not found
    }
    pivot := len(haystack) / 2
    switch {
    case haystack[pivot] == needle:
        return pivot + startsAt
    case needle > haystack[pivot]:
        return binary(haystack[pivot+1:], needle, startsAt+pivot+1)
    default: // needle < haystack[pivot]
        return binary(haystack[:pivot], needle, startsAt)
    }
}
func twoSum(nums []int, target int) (int, int) {
    for i, num := range nums {
        adjusted := target - num
        if j := binary(nums, adjusted, 0); j != -1 {
            return i, j
        }
    }
    return 0, 0
}
playground example
Or you can use sort.SearchInts which implements binary searching.
func twoSum(nums []int, target int) (int, int) {
    for i, num := range nums {
        adjusted := target - num
        if j := sort.SearchInts(nums, adjusted); j < len(nums) && nums[j] == adjusted {
            // sort.SearchInts returns the index where the searched number would
            // be if it were there; that index is len(nums) when it lies beyond
            // the end, hence the bounds check before comparing nums[j].
            return i, j
        }
    }
    return 0, 0
}

How to remove items from a slice while ranging over it?

What is the best way to remove items from a slice while ranging over it?
For example:
type MultiDataPoint []*DataPoint

func (m MultiDataPoint) Json() ([]byte, error) {
    for i, d := range m {
        err := d.clean()
        if err != nil {
            // Remove the DP at index i from m
        }
    }
    return json.Marshal(m)
}
As you have mentioned elsewhere, you can allocate a new memory block and copy only the valid elements to it. However, if you want to avoid the allocation, you can rewrite your slice in place:
i := 0 // output index
for _, x := range s {
    if isValid(x) {
        // copy and increment index
        s[i] = x
        i++
    }
}
// Prevent memory leak by erasing truncated values
// (not needed if values don't contain pointers, directly or indirectly)
for j := i; j < len(s); j++ {
    s[j] = nil
}
s = s[:i]
Full example: http://play.golang.org/p/FNDFswPeDJ
Note this will leave old values after index i in the underlying array, so this will leak memory until the slice itself is garbage collected, if values are or contain pointers. You can solve this by setting all values to nil or the zero value from i until the end of the slice before truncating it.
I know this was answered a long time ago, but I use something like this in other languages, though I don't know whether it is the Go way.
Just iterate from back to front so you don't have to worry about indexes that have been deleted. I am using the same example as Adam.
m := []int{3, 7, 2, 9, 4, 5}
for i := len(m) - 1; i >= 0; i-- {
    if m[i] < 5 {
        m = append(m[:i], m[i+1:]...)
    }
}
There might be better ways, but here's an example that deletes the even values from a slice:
m := []int{1, 2, 3, 4, 5, 6}
deleted := 0
for i := range m {
    j := i - deleted
    if (m[j] & 1) == 0 {
        m = m[:j+copy(m[j:], m[j+1:])]
        deleted++
    }
}
Note that I don't get the element using the i, d := range m syntax, since d would end up getting set to the wrong elements once you start deleting from the slice.
Here is a more idiomatic Go way to remove elements from slices.
temp := s[:0]
for _, x := range s {
    if isValid(x) {
        temp = append(temp, x)
    }
}
s = temp
Playground link: https://play.golang.org/p/OH5Ymsat7s9
Note: The example and playground links are based upon #tomasz's answer https://stackoverflow.com/a/20551116/12003457
One other option is to use a normal for loop using the length of the slice and subtract 1 from the index each time a value is removed. See the following example:
m := []int{3, 7, 2, 9, 4, 5}
for i := 0; i < len(m); i++ {
    if m[i] < 5 {
        m = append(m[:i], m[i+1:]...)
        i-- // -1 as the slice just got shorter
    }
}
I don't know if len() uses enough resources to make any difference but you could also run it just once and subtract from the length value too:
m := []int{3, 7, 2, 9, 4, 5}
for i, s := 0, len(m); i < s; i++ {
    if m[i] < 5 {
        m = append(m[:i], m[i+1:]...)
        s--
        i--
    }
}
Something like:
m = append(m[:i], m[i+1:]...)
You don't even need to count backwards, though you do need to handle reaching the end of the slice (the example below special-cases it). Here's an example of removing duplicate positive integers from a sorted list:
// Remove repeating numbers
numbers := []int{1, 2, 3, 3, 4, 5, 5}
log.Println(numbers)
for i, numbersCount, prevNum := 0, len(numbers), -1; i < numbersCount; numbersCount = len(numbers) {
    if numbers[i] == prevNum {
        if i == numbersCount-1 {
            numbers = numbers[:i]
        } else {
            numbers = append(numbers[:i], numbers[i+1:]...)
        }
        continue
    }
    prevNum = numbers[i]
    i++
}
log.Println(numbers)
log.Println(numbers)
Playground: https://play.golang.org/p/v93MgtCQsaN
I just implemented a method that removes all nil elements from a slice.
I used it to solve a LeetCode problem, and it works perfectly.
/**
 * Definition for singly-linked list.
 * type ListNode struct {
 *     Val  int
 *     Next *ListNode
 * }
 */
func removeNil(lists *[]*ListNode) {
    for i := 0; i < len(*lists); i++ {
        if (*lists)[i] == nil {
            *lists = append((*lists)[:i], (*lists)[i+1:]...)
            i--
        }
    }
}
You can avoid memory leaks, as suggested in #tomasz's answer, by controlling the capacity of the underlying array with a full slice expression. Look at the following function, which removes duplicates from a slice of integers:
package main

import "fmt"

func removeDuplicates(a []int) []int {
    for i, j := 0, 1; i < len(a) && j < len(a); i, j = i+1, j+1 {
        if a[i] == a[j] {
            copy(a[j:], a[j+1:])
            // Reduce the capacity of the underlying array using the
            // "full slice expression": a[low : high : max]
            a = a[: len(a)-1 : len(a)-1]
            i--
            j--
        }
    }
    return a
}

func main() {
    a := []int{2, 3, 3, 3, 6, 9, 9}
    fmt.Println(a)
    a = removeDuplicates(a)
    fmt.Println(a)
}
// [2 3 3 3 6 9 9]
// [2 3 6 9]
For the reasons #tomasz has explained, there are issues with removing in place. That's why it is practice in Go not to do that, but to reconstruct the slice. Several answers here go beyond the answer of #tomasz.
If elements should be unique, it's practice to use the keys of a map for this. I would like to contribute an example of deletion by use of a map.
What's nice is that the boolean values are available for a second purpose. In this example I calculate Set a minus Set b. As Go doesn't have a real set type, I make sure the output is unique, and I use the boolean values for the algorithm as well.
Building and querying the map is close to O(n) overall (individual map operations are close to O(1)), and each append() is amortized O(1), so the runtime is similar to deletion in place. Real deletion in place would shift the upper end down to clean up; if not done in batch, the runtime would be worse.
In this special case, I also use the map as a register to avoid a nested loop over Set a and Set b, keeping the runtime close to O(n).
type Set []int

func differenceOfSets(a, b Set) (difference Set) {
    m := map[int]bool{}
    for _, element := range a {
        m[element] = true
    }
    for _, element := range b {
        if _, registered := m[element]; registered {
            m[element] = false
        }
    }
    for element, present := range m {
        if present {
            difference = append(difference, element)
        }
    }
    return difference
}
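A minimal usage sketch, assuming the Set type and differenceOfSets() from above; note that map iteration order is random, so the output order varies:
package main

import "fmt"

func main() {
    a := Set{1, 2, 2, 3, 4}
    b := Set{2, 4}
    fmt.Println(differenceOfSets(a, b)) // e.g. [1 3], in some order
}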
Try Sort and Binary search.
Example:
package main

import (
    "fmt"
    "sort"
)

func main() {
    // Our slice.
    s := []int{3, 7, 2, 9, 4, 5}
    // 1. Iterate over it.
    for i, v := range s {
        func(i, v int) {}(i, v)
    }
    // 2. Sort it (by whatever condition of yours).
    sort.Slice(s, func(i, j int) bool {
        return s[i] < s[j]
    })
    // 3. Cut it only once.
    i := sort.Search(len(s), func(i int) bool { return s[i] >= 5 })
    s = s[i:]
    // That's it!
    fmt.Println(s) // [5 7 9]
}
https://play.golang.org/p/LnF6o0yMJGT
