for loop speed comparison - performance

I was wondering how fast the len operator is in Go, so I wrote a simple benchmark. I expected that avoiding a call to len on each loop iteration would make the code run faster, but in fact the opposite is true.
Here's the benchmark:
func sumArrayNumber(input []int) int {
var res int
for i, length := 0, len(input); i < length; i += 1 {
res += input[i]
}
return res
}
func sumArrayNumber2(input []int) int {
var res int
for i := 0; i < len(input); i += 1 {
res += input[i]
}
return res
}
var result int
var input = []int{3, 6, 22, 68, 11, -7, 22, 5, 0, 0, 1}
func BenchmarkSumArrayNumber(b *testing.B) {
var r int
for n := 0; n < b.N; n++ {
r = sumArrayNumber(input)
}
result = r
}
func BenchmarkSumArrayNumber2(b *testing.B) {
var r int
for n := 0; n < b.N; n++ {
r = sumArrayNumber2(input)
}
result = r
}
And here are the results:
goos: windows
goarch: amd64
BenchmarkSumArrayNumber-8 300000000 4.75 ns/op
BenchmarkSumArrayNumber2-8 300000000 4.67 ns/op
PASS
ok command-line-arguments 4.000s
I confirmed that the results are consistent by doing the following:
Doubling the input array size roughly doubles the execution time per op. The speed difference scales with the length of the input array.
Exchanging the test order does not impact the results.
Why is the code that checks len() on every loop iteration faster?

One may argue that a difference of 0.08 ns is too small to say that one for-loop is faster than the other. You probably need to run the same test many times (at least 20), at which point you can derive the mean and the standard deviation.
Moreover, there are many factors that can speed up the len() operator, such as the CPU cache and compiler optimizations. I think the most relevant factor in your specific example is that len() on a slice just reads the len field of the slice's data structure (and for an array the length is a compile-time constant). Thus, it is O(1).

Related

Why is accessing a variable so much slower than accessing len()?

I wrote this function uniq that takes in a sorted slice of ints
and returns the slice with duplicates removed:
func uniq(x []int) []int {
i := 0
for i < len(x)-1 {
if x[i] == x[i+1] {
copy(x[i:], x[i+1:])
x = x[:len(x)-1]
} else {
i++
}
}
return x
}
and uniq2, a rewrite of uniq with the same results:
func uniq2(x []int) []int {
i := 0
l := len(x)
for i < l-1 {
if x[i] == x[i+1] {
copy(x[i:], x[i+1:])
l--
} else {
i++
}
}
return x[:l]
}
The only difference between the two functions
is that in uniq2, instead of slicing x
and directly accessing len(x) each time,
I save len(x) to a variable l
and decrement it whenever I shift the slice.
I thought that uniq2 would be slightly faster than uniq
because len(x) would no longer be called on every iteration,
but in reality, it is inexplicably much slower.
With this test that generates a random sorted slice
and calls uniq/uniq2 on it 1000 times,
which I run on Linux:
func main() {
rand.Seed(time.Now().Unix())
for i := 0; i < 1000; i++ {
_ = uniq(genSlice())
//_ = uniq2(genSlice())
}
}
func genSlice() []int {
x := make([]int, 0, 1000)
for num := 1; num <= 10; num++ {
amount := rand.Intn(1000)
for i := 0; i < amount; i++ {
x = append(x, num)
}
}
return x
}
$ go build uniq.go
$ time ./uniq
uniq usually takes 5--6 seconds to finish,
while uniq2 is more than two times slower,
taking between 12--15 seconds.
Why is uniq2, where I save the slice length to a variable,
so much slower than uniq, where I directly call len?
Shouldn't it be slightly faster?
You expect roughly the same execution time because you think they do roughly the same thing.
The only difference between the two functions is that in uniq2, instead of slicing x and directly accessing len(x) each time, I save len(x) to a variable l and decrement it whenever I shift the slice.
This is wrong.
The first version does:
copy(x[i:], x[i+1:])
x = x[:len(x)-1]
And the second does:
copy(x[i:], x[i+1:])
l--
The first difference is that the first version assigns (copies) a slice header, which is a reflect.SliceHeader value consisting of three word-sized fields (24 bytes on 64-bit architectures), while l-- does a simple decrement, which is much faster.
But the main difference does not stem from this. The main difference is that since the first version changes the x slice (the header, the length included), you end up copying fewer and fewer elements, while the second version does not change x and always copies to the end of the slice. x[i+1:] is equivalent to x[i+1:len(x)].
To demonstrate, imagine you pass a slice of length 10 whose elements are all equal. The first version will copy 9 elements first, then 8, then 7, and so on. The second version will copy 9 elements first, then 9 again, then 9 again, and so on.
Let's modify your functions to count the number of copied elements:
func uniq(x []int) []int {
count := 0
i := 0
for i < len(x)-1 {
if x[i] == x[i+1] {
count += copy(x[i:], x[i+1:])
x = x[:len(x)-1]
} else {
i++
}
}
fmt.Println("uniq copied", count, "elements")
return x
}
func uniq2(x []int) []int {
count := 0
i := 0
l := len(x)
for i < l-1 {
if x[i] == x[i+1] {
count += copy(x[i:], x[i+1:])
l--
} else {
i++
}
}
fmt.Println("uniq2 copied", count, "elements")
return x[:l]
}
Testing it:
uniq(make([]int, 1000))
uniq2(make([]int, 1000))
Output is:
uniq copied 499500 elements
uniq2 copied 998001 elements
uniq2() copies twice as many elements!
If we test it with a random slice:
uniq(genSlice())
uniq2(genSlice())
Output is:
uniq copied 7956671 elements
uniq2 copied 11900262 elements
Again, uniq2() copies roughly 1.5 times as many elements! (But this greatly depends on the random numbers.)
Try the examples on the Go Playground.
The "fix" is to modify uniq2() to copy until l:
copy(x[i:], x[i+1:l])
l--
With this "appropriate" change, performance is roughly the same.
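To make the fix concrete, here is a minimal sketch that puts the question's uniq next to a corrected uniq2 and counts the copied elements. The name uniq2Fixed and the extra count return value are assumptions added for this illustration.

```go
package main

import "fmt"

// uniq is the question's first version, returning the number of copied elements too.
func uniq(x []int) ([]int, int) {
	count, i := 0, 0
	for i < len(x)-1 {
		if x[i] == x[i+1] {
			count += copy(x[i:], x[i+1:])
			x = x[:len(x)-1]
		} else {
			i++
		}
	}
	return x, count
}

// uniq2Fixed tracks the logical length in l and bounds the copy source at l,
// so it no longer copies the stale tail of the slice.
func uniq2Fixed(x []int) ([]int, int) {
	count, i, l := 0, 0, len(x)
	for i < l-1 {
		if x[i] == x[i+1] {
			count += copy(x[i:], x[i+1:l])
			l--
		} else {
			i++
		}
	}
	return x[:l], count
}

func main() {
	_, c1 := uniq(make([]int, 1000))
	_, c2 := uniq2Fixed(make([]int, 1000))
	fmt.Println(c1, c2) // 499500 499500

	res, _ := uniq2Fixed([]int{1, 1, 2, 2, 2, 3})
	fmt.Println(res) // [1 2 3]
}
```

With the copy source bounded at l, both versions move the same number of elements for the same input.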

Efficient allocation of slices (cap vs length)

Assume I am creating a slice that I know in advance I want to populate with 1e5 elements in a for loop, via successive calls to append:
// Append 1e5 strings to the slice
for i := 0; i < 1e5; i++ {
value := fmt.Sprintf("Entry: %d", i)
myslice = append(myslice, value)
}
which is the more efficient way of initialising the slice and why:
a. declaring a nil slice of strings?
var myslice []string
b. setting its length in advance to 1e5?
myslice = make([]string, 1e5)
c. setting both its length and capacity to 1e5?
myslice = make([]string, 1e5, 1e5)
Your b and c solutions are identical: when you create a slice with make() and don't specify the capacity, the "missing" capacity defaults to the given length.
Also note that if you create the slice with a length in advance, you can't use append() to populate it, because append() adds new elements after the existing ones and doesn't "reuse" the already allocated elements. In that case you have to assign values to the elements using an index expression, e.g. myslice[i] = value.
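The difference between presetting the length and presetting only the capacity can be seen in a small sketch. The helper fillByAppend and the size 3 are assumptions chosen to keep the output short.

```go
package main

import "fmt"

// fillByAppend appends n formatted entries to whatever slice it is given.
func fillByAppend(s []string, n int) []string {
	for i := 0; i < n; i++ {
		s = append(s, fmt.Sprintf("Entry: %d", i))
	}
	return s
}

func main() {
	withLen := make([]string, 3)    // len 3, cap defaults to 3, same as make([]string, 3, 3)
	withCap := make([]string, 0, 3) // len 0, cap 3

	fmt.Println(len(withLen), cap(withLen)) // 3 3
	fmt.Println(len(withCap), cap(withCap)) // 0 3

	a := fillByAppend(withLen, 3)
	b := fillByAppend(withCap, 3)
	fmt.Printf("%d %q\n", len(a), a) // 6 ["" "" "" "Entry: 0" "Entry: 1" "Entry: 2"]
	fmt.Printf("%d %q\n", len(b), b) // 3 ["Entry: 0" "Entry: 1" "Entry: 2"]
}
```

Appending to the slice created with a length leaves the three zero values in front and has to grow the backing array, which is why options b and c only make sense together with index assignment.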
If you start with a slice with 0 capacity, a new backing array has to be allocated and the "old" content has to be copied over whenever you append an element that does not fit into the capacity, so that solution is inherently slower.
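To get a feel for how often the backing array is replaced, here is a minimal sketch that counts capacity changes while appending. The function name countGrowths is an assumption for this example, and the exact count depends on the Go version's growth policy, so no specific number is claimed.

```go
package main

import "fmt"

// countGrowths appends n ints to a slice that starts with the given capacity
// and reports how many times append had to switch to a larger backing array.
func countGrowths(n, initialCap int) int {
	s := make([]int, 0, initialCap)
	growths, lastCap := 0, cap(s)
	for i := 0; i < n; i++ {
		s = append(s, i)
		if cap(s) != lastCap {
			growths++
			lastCap = cap(s)
		}
	}
	return growths
}

func main() {
	fmt.Println("starting empty:", countGrowths(100000, 0))
	fmt.Println("preallocated:  ", countGrowths(100000, 100000))
}
```

Each growth implies a new allocation plus a copy of all existing elements, while the preallocated slice never grows.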
I would define and consider the following different solutions (I use an []int slice so that fmt.Sprintf() does not interfere with our benchmarks):
var s []int
func BenchmarkA(b *testing.B) {
for i := 0; i < b.N; i++ {
s = nil
for j := 0; j < 1e5; j++ {
s = append(s, j)
}
}
}
func BenchmarkB(b *testing.B) {
for i := 0; i < b.N; i++ {
s = make([]int, 0, 1e5)
for j := 0; j < 1e5; j++ {
s = append(s, j)
}
}
}
func BenchmarkBLocal(b *testing.B) {
for i := 0; i < b.N; i++ {
s := make([]int, 0, 1e5)
for j := 0; j < 1e5; j++ {
s = append(s, j)
}
}
}
func BenchmarkD(b *testing.B) {
for i := 0; i < b.N; i++ {
s = make([]int, 1e5)
for j := range s {
s[j] = j
}
}
}
Note: I use package-level variables in the benchmarks (except BLocal), because some optimizations may (and actually do) happen when using a local slice variable.
And the benchmark results:
BenchmarkA-4 1000 1081599 ns/op 4654332 B/op 30 allocs/op
BenchmarkB-4 3000 371096 ns/op 802816 B/op 1 allocs/op
BenchmarkBLocal-4 10000 172427 ns/op 802816 B/op 1 allocs/op
BenchmarkD-4 10000 167305 ns/op 802816 B/op 1 allocs/op
A: As you can see, starting with a nil slice is the slowest, uses the most memory and allocations.
B: Pre-allocating the slice with capacity (but still 0 length) and using append: it requires only a single allocation and is much faster, almost thrice as fast.
BLocal: Do note that when using a local slice instead of a package variable, (compiler) optimizations happen and it gets a lot faster: twice as fast, almost as fast as D.
D: Not using append() but assigning elements to a preallocated slice wins in every aspect, even when using a non-local variable.
For this use case, since you already know the number of string elements that you want to assign to the slice,
I would prefer approach b or c, filling the elements by index instead of appending,
since these two approaches prevent the slice from being resized.
If you choose approach a, the backing array is reallocated with a larger capacity (roughly doubled while the slice is small) every time an element is appended when len equals cap.
https://play.golang.org/p/kSuX7cE176j

performance of for range in go

When ranging over an array, two values are returned for each iteration. The first is the index, and the second is a copy of the element at that index.
Here's my code:
var myArray = [5]int {1,2,3,4,5}
sum := 0
// first with copy
for _, value := range myArray {
sum += value
}
// second without copy
for i := range myArray {
sum += myArray[i]
}
Which one should I use for better performance?
Is there any difference for built-in types in these two pieces of code?
We can test this using Go's benchmarking tool (read more at https://dave.cheney.net/2013/06/30/how-to-write-benchmarks-in-go).
sum_test.go
package sum
import "testing"
func BenchmarkSumIterator(b *testing.B) {
var ints = [5]int{1, 2, 3, 4, 5}
sum := 0
for i := 0; i < b.N; i++ {
for j := range ints {
sum += ints[j]
}
}
}
func BenchmarkSumRange(b *testing.B) {
var ints = [5]int{1, 2, 3, 4, 5}
sum := 0
for i := 0; i < b.N; i++ {
for _, value := range ints {
sum += value
}
}
}
Run it with:
$ go test -bench=. sum_test.go
goos: linux
goarch: amd64
BenchmarkSumIterator-4 412796047 2.97 ns/op
BenchmarkSumRange-4 413581974 2.89 ns/op
PASS
ok command-line-arguments 3.010s
Range appears to be slightly more efficient, and running this benchmark a few more times confirms this. Note that this may only hold for this specific case, where you have a small fixed-size array. You should base decisions like this on what you'd encounter in production, and weigh them against code readability.
The second one is faster, but the difference is small enough to ignore.
The main difference shows up with a large array: in that case the first loop takes more memory than the second one.
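One way to see why the value form can cost more for large arrays: ranging over an array with a value variable iterates over a copy of the array (the compiler may optimize the copy away when it can prove that is safe). The sketch below makes the copy observable by modifying the array during the loop. The function names are assumptions for this example.

```go
package main

import "fmt"

// sumRangeValue ranges with a value variable; the loop reads from a copy of arr,
// so the write to arr[2] is not seen by later iterations.
func sumRangeValue() int {
	arr := [3]int{1, 2, 3}
	sum := 0
	for i, v := range arr {
		if i == 0 {
			arr[2] = 100
		}
		sum += v
	}
	return sum
}

// sumRangeIndex ranges over indices only and reads arr directly,
// so it observes the modified element.
func sumRangeIndex() int {
	arr := [3]int{1, 2, 3}
	sum := 0
	for i := range arr {
		if i == 0 {
			arr[2] = 100
		}
		sum += arr[i]
	}
	return sum
}

func main() {
	fmt.Println(sumRangeValue()) // 6
	fmt.Println(sumRangeIndex()) // 103
}
```

Ranging over a slice of the array (arr[:]) or over a pointer to it (&arr) avoids the copy if that ever matters for a large array.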

How to generate a stream of *unique* random numbers in Go using the standard library

How can I generate a stream of unique random numbers in Go?
I want to guarantee there are no duplicate values in array a using math/rand and/or standard Go library utilities.
func RandomNumberGenerator() *rand.Rand {
s1 := rand.NewSource(time.Now().UnixNano())
r1 := rand.New(s1)
return r1
}
rng := RandomNumberGenerator()
N := 10000
for i := 0; i < N; i++ {
a[i] = rng.Int()
}
There are questions and solutions on how to generate a series of random numbers in Go, for example, here.
But I would like to generate a series of random numbers that does not duplicate previous values. Is there a standard/recommended way to achieve this in Go?
My guess is to (1) use a permutation or to (2) keep track of previously generated numbers and regenerate a value if it has been generated before.
But solution (1) sounds like overkill if I only want a few numbers, and (2) sounds very time-consuming if I end up generating a long series of random numbers, due to collisions, and I guess it's also very memory-consuming.
Use Case: To benchmark a Go program with 10K, 100K, 1M pseudo-random numbers that have no duplicates.
You should absolutely go with approach 2. Let's assume you're running on a 64-bit machine, and thus generating 63-bit integers (64 bits, but rand.Int never returns negative numbers). Even after you have generated 4 billion numbers, there's still only about a 1 in 2 billion chance (4 billion out of 2^63 possible values) that the next number will be a duplicate. Thus, you'll almost never have to regenerate, and almost never have to regenerate twice.
Try, for example:
type UniqueRand struct {
	generated map[int]bool
}
func (u *UniqueRand) Int() int {
	if u.generated == nil {
		u.generated = make(map[int]bool) // writing to a nil map would panic
	}
	for {
		i := rand.Int()
		if !u.generated[i] {
			u.generated[i] = true
			return i
		}
	}
}
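The collision estimate behind this recommendation can be checked with a few lines of arithmetic. This is only a sketch: collisionChance is a name made up for this example, and it assumes values are drawn uniformly from 2^63 possibilities.

```go
package main

import (
	"fmt"
	"math"
)

// collisionChance returns the probability that one fresh draw from a space of
// 2^bits values hits one of the alreadyGenerated distinct values.
func collisionChance(alreadyGenerated float64, bits int) float64 {
	return alreadyGenerated / math.Pow(2, float64(bits))
}

func main() {
	p := collisionChance(4e9, 63)
	fmt.Printf("about 1 in %.1f billion\n", 1/p/1e9) // about 1 in 2.3 billion
}
```

At that rate the retry loop in Int() almost never runs more than once per call.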
I had a similar task: pick elements from an initial slice by random unique index, i.e. get 1k random unique elements from a slice with 10k elements.
Here is a simple, head-on solution:
import (
	"math/rand"
	"time"
)
func getRandomElements(array []string) []string {
	randomElementsCount := 1000
	if randomElementsCount > len(array) {
		// There are not enough distinct indexes; without this the loop below would never end.
		randomElementsCount = len(array)
	}
	// Seed once here rather than on every call to randomIndex.
	rand.Seed(time.Now().UnixNano())
	result := make([]string, 0, randomElementsCount)
	existingIndexes := make(map[int]struct{}, randomElementsCount)
	for i := 0; i < randomElementsCount; i++ {
		idx := randomIndex(len(array), existingIndexes)
		result = append(result, array[idx])
	}
	return result
}
// randomIndex returns an index in [0, size) that has not been returned before.
func randomIndex(size int, existingIndexes map[int]struct{}) int {
	for {
		idx := rand.Intn(size)
		if _, exists := existingIndexes[idx]; !exists {
			existingIndexes[idx] = struct{}{}
			return idx
		}
	}
}
I see two reasons for wanting this. You want to test a random number generator, or you want unique random numbers.
You're Testing A Random Number Generator
My first question is why? There's plenty of solid random number generators available. Don't write your own, it's basically dabbling in cryptography and that's never a good idea. Maybe you're testing a system that uses a random number generator to generate random output?
There's a problem: there's no guarantee random numbers are unique. They're random. There's always a possibility of collision. Testing that random output is unique is incorrect.
Instead, you want to test the results are distributed evenly. To do this I'll reference another answer about how to test a random number generator.
You Want Unique Random Numbers
From a practical perspective you don't need guaranteed uniqueness, but to make collisions so unlikely that it's not a concern. This is what UUIDs are for. They're 128 bit Universally Unique IDentifiers. There's a number of ways to generate them for particular scenarios.
UUIDv4 is basically just a 122 bit random number which has some ungodly small chance of a collision. Let's approximate it.
n = how many random numbers you'll generate
M = size of the keyspace (2^122 for a 122 bit random number)
P = probability of collision
P = n^2 / (2M)
Solving for n...
n = sqrt(2MP)
Setting P to something absurd like 1e-12 (one in a trillion), we find you can generate about 3.2 trillion UUIDv4s with a 1 in a trillion chance of collision. You're 1000 times more likely to win the lottery than have a collision in 3.2 trillion UUIDv4s. I think that's acceptable.
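The figure above follows directly from n = sqrt(2MP). As a quick check, here is a minimal sketch of the computation, where maxDraws is a name made up for this example.

```go
package main

import (
	"fmt"
	"math"
)

// maxDraws returns n = sqrt(2*M*P) for a keyspace of M = 2^bits values
// and an acceptable collision probability p (birthday approximation).
func maxDraws(bits int, p float64) float64 {
	m := math.Pow(2, float64(bits))
	return math.Sqrt(2 * m * p)
}

func main() {
	fmt.Printf("%.2e\n", maxDraws(122, 1e-12)) // 3.26e+12
}
```

The approximation P = n^2 / (2M) only holds while P is small, which is the case for the values used here.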
Here's a UUIDv4 library in Go to use and a demonstration of generating 1 million unique random 128 bit values.
package main
import (
	"github.com/frankenbeanies/uuid4"
)
func main() {
	for i := 0; i < 1000000; i++ {
		uuid := uuid4.New().Bytes()
		_ = uuid // use the uuid
	}
}
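You can generate a unique 12-digit number using UnixNano from the time package:
uniqueNumber:=time.Now().UnixNano()/(1<<22)
println(uniqueNumber)
Note that the value comes from the clock, so it increases over time rather than being random, and two calls within the same 2^22 ns (about 4.2 ms) window return the same number.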
1- Fast positive and negative int32 unique pseudo random numbers in 296ms using std lib:
package main
import (
"fmt"
"math/rand"
"time"
)
func main() {
const n = 1000000
rand.Seed(time.Now().UTC().UnixNano())
duplicate := 0
mp := make(map[int32]struct{}, n)
var r int32
t := time.Now()
for i := 0; i < n; {
r = rand.Int31()
if i&1 == 0 {
r = -r
}
if _, ok := mp[r]; ok {
duplicate++
} else {
mp[r] = zero
i++
}
}
fmt.Println(time.Since(t))
fmt.Println("len: ", len(mp))
fmt.Println("duplicate: ", duplicate)
positive := 0
for k := range mp {
if k > 0 {
positive++
}
}
fmt.Println(`n=`, n, `positive=`, positive)
}
var zero = struct{}{}
output:
296.0169ms
len: 1000000
duplicate: 118
n= 1000000 positive= 500000
2- Just fill the map[int32]struct{}:
for i := int32(0); i < n; i++ {
m[i] = zero
}
Map iteration order is not specified in Go, so reading the keys back yields them in no particular order:
for k := range m {
fmt.Print(k, " ")
}
And this just takes 183ms for 1000000 unique numbers, no duplicate (The Go Playground):
package main
import (
"fmt"
"time"
)
func main() {
const n = 1000000
m := make(map[int32]struct{}, n)
t := time.Now()
for i := int32(0); i < n; i++ {
m[i] = zero
}
fmt.Println(time.Since(t))
fmt.Println("len: ", len(m))
// for k := range m {
// fmt.Print(k, " ")
// }
}
var zero = struct{}{}
3- Here is a simple but slow approach (it takes 22s for 200000 unique numbers), so you may want to generate the numbers once and save them to a file:
package main
import "time"
import "fmt"
import "math/rand"
func main() {
dup := 0
t := time.Now()
const n = 200000
rand.Seed(time.Now().UTC().UnixNano())
var a [n]int32
var exist bool
for i := 0; i < n; {
r := rand.Int31()
exist = false
for j := 0; j < i; j++ {
if a[j] == r {
dup++
fmt.Println(dup)
exist = true
break
}
}
if !exist {
a[i] = r
i++
}
}
fmt.Println(time.Since(t))
}
Temporary workaround based on @joshlf's answer
type UniqueRand struct {
generated map[int]bool //keeps track of
rng *rand.Rand //underlying random number generator
scope int //scope of number to be generated
}
// NewUniqueRand returns a generator of unique random numbers.
// If N is less than or equal to 0, the range is unlimited.
// If N is greater than 0, numbers are generated in [0, N).
// Once no more unique numbers can be generated, Int returns -1 from then on.
func NewUniqueRand(N int) *UniqueRand{
s1 := rand.NewSource(time.Now().UnixNano())
r1 := rand.New(s1)
return &UniqueRand{
generated: map[int]bool{},
rng: r1,
scope: N,
}
}
func (u *UniqueRand) Int() int {
if u.scope > 0 && len(u.generated) >= u.scope {
return -1
}
for {
var i int
if u.scope > 0 {
i = u.rng.Int() % u.scope
}else{
i = u.rng.Int()
}
if !u.generated[i] {
u.generated[i] = true
return i
}
}
}
Client side code
func TestSetGet2(t *testing.T) {
const N = 10000
for _, mask := range []int{0, -1, 0x555555, 0xaaaaaa, 0x333333, 0xcccccc, 0x314159} {
rng := NewUniqueRand(2*N)
a := make([]int, N)
for i := 0; i < N; i++ {
a[i] = (rng.Int() ^ mask) << 1
}
//Benchmark Code
}
}

Golang: Find two number index where the sum of these two numbers equals to target number

The problem is: find the indices of two numbers such that nums[index1] + nums[index2] == target. Here is my attempt in Go (indices start from 1):
package main
import (
"fmt"
)
var nums = []int{0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 25182, 25184, 25186, 25188, 25190, 25192, 25194, 25196} // The number list is too long, I put the whole numbers in a gist: https://gist.github.com/nickleeh/8eedb39e008da8b47864
var target int = 16021
func twoSum(nums []int, target int) (int, int) {
if len(nums) <= 1 {
return 0, 0
}
hdict := make(map[int]int)
for i := 0; i < len(nums); i++ {
// nums is zero-based; i + 1 is the 1-based index that gets stored and returned.
if val, ok := hdict[nums[i]]; ok {
return val, i + 1
} else {
hdict[target-nums[i]] = i + 1
}
}
return 0, 0
}
func main() {
fmt.Println(twoSum(nums, target))
}
The nums list is too long, I put it into a gist:
https://gist.github.com/nickleeh/8eedb39e008da8b47864
This code works fine, but I find the return 0, 0 part ugly, and it runs ten times slower than the Julia translation. Is there any part that is badly written and hurts performance?
Edit:
Julia's translation:
function two_sum(nums, target)
if length(nums) <= 1
return false
end
hdict = Dict()
for i in 1:length(nums)
if haskey(hdict, nums[i])
return [hdict[nums[i]], i]
else
hdict[target - nums[i]] = i
end
end
end
In my opinion, if no elements are found that add up to target, it is best to return values that are invalid indices, e.g. -1. Returning 0, 0 would be enough, since a valid index pair can't consist of 2 equal indices, but -1 is more convenient: if you forget to check the return values and attempt to use the invalid indices, you immediately get a run-time panic, alerting you to check the validity of the return values. Accordingly, in my solutions I will get rid of those i + 1 shifts, as they make no sense.
Benchmarking of different solutions can be found at the end of the answer.
If sorting is allowed:
If the slice is big and not changing, and you have to call this twoSum() function many times, the most efficient solution would be to sort the numbers simply using sort.Ints() in advance:
sort.Ints(nums)
And then you don't have to build a map, you can use binary search implemented in sort.SearchInts():
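func twoSumSorted(nums []int, target int) (int, int) {
	for i, v := range nums {
		v2 := target - v
		// SearchInts returns len(nums) if v2 is larger than every element,
		// so check the bounds before indexing; j != i avoids pairing an element with itself.
		if j := sort.SearchInts(nums, v2); j < len(nums) && j != i && nums[j] == v2 {
			return i, j
		}
	}
	return -1, -1
}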
Note: after sorting, the indices returned will be indices of values in the sorted slice. These may differ from the indices in the original (unsorted) slice, which may or may not be a problem. If you do need indices from the original order, you may store a mapping between sorted and unsorted indices so you can look up the original index. For details see this question:
Get the indices of the array after sorting in golang
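One possible way to keep that mapping is to sort a slice of indexes instead of the values themselves. The sketch below is an illustration under that assumption; sortWithOriginalIndex is a name made up for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// sortWithOriginalIndex returns a sorted copy of nums plus a slice orig where
// orig[k] is the index in nums of the value at sorted[k]. nums is not modified.
func sortWithOriginalIndex(nums []int) (sorted, orig []int) {
	orig = make([]int, len(nums))
	for i := range orig {
		orig[i] = i
	}
	// Sort the indexes by the values they point to.
	sort.SliceStable(orig, func(a, b int) bool {
		return nums[orig[a]] < nums[orig[b]]
	})
	sorted = make([]int, len(nums))
	for k, i := range orig {
		sorted[k] = nums[i]
	}
	return sorted, orig
}

func main() {
	sorted, orig := sortWithOriginalIndex([]int{30, 10, 20})
	fmt.Println(sorted) // [10 20 30]
	fmt.Println(orig)   // [1 2 0]
}
```

A pair of indices (i, j) found in the sorted slice then maps back to (orig[i], orig[j]) in the original slice.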
If sorting is not allowed:
Here is your solution without those i + 1 shifts, which make no sense here: slice and array indices are zero-based in Go. It also uses for ... range:
func twoSum(nums []int, target int) (int, int) {
if len(nums) <= 1 {
return -1, -1
}
m := make(map[int]int)
for i, v := range nums {
if j, ok := m[v]; ok {
return j, i
}
m[target-v] = i
}
return -1, -1
}
If the nums slice is big and the solution is not found quickly (meaning the i index grows large), a lot of elements will be added to the map. Maps start with a small capacity and are grown internally when additional space is required to hold more elements (key-value pairs). Growing requires rehashing and rebuilding the map with the elements already added, which is expensive.
This may not seem significant, but it really is. Since you know the maximum number of elements that will end up in the map (the worst case is len(nums)), you can create the map with enough capacity to hold all elements in the worst case. The gain is that no internal growing and rehashing is required. You can provide the initial capacity as the second argument to make() when creating the map. This speeds up twoSum2() considerably if nums is big:
func twoSum2(nums []int, target int) (int, int) {
if len(nums) <= 1 {
return -1, -1
}
m := make(map[int]int, len(nums))
for i, v := range nums {
if j, ok := m[v]; ok {
return j, i
}
m[target-v] = i
}
return -1, -1
}
Benchmarking
Here's a little benchmarking code to test execution speed of the 3 solutions with the input nums and target you provided. Note that in order to test twoSumSorted(), you first have to sort the nums slice.
Save this into a file named xx_test.go and run it with go test -bench .:
package main
import (
"sort"
"testing"
)
func BenchmarkTwoSum(b *testing.B) {
for i := 0; i < b.N; i++ {
twoSum(nums, target)
}
}
func BenchmarkTwoSum2(b *testing.B) {
for i := 0; i < b.N; i++ {
twoSum2(nums, target)
}
}
func BenchmarkTwoSumSorted(b *testing.B) {
sort.Ints(nums)
b.ResetTimer()
for i := 0; i < b.N; i++ {
twoSumSorted(nums, target)
}
}
Output:
BenchmarkTwoSum-4 1000 1405542 ns/op
BenchmarkTwoSum2-4 2000 722661 ns/op
BenchmarkTwoSumSorted-4 10000000 133 ns/op
As you can see, creating the map with a big enough capacity pays off: it runs about twice as fast.
And as mentioned, if nums can be sorted in advance, that is ~10,000 times faster!
If nums is always sorted, you can do a binary search to see if the complement to whichever number you're on is also in the slice.
func binary(haystack []int, needle, startsAt int) int {
	if len(haystack) == 0 {
		return -1 // empty sub-slice: indexing haystack[pivot] would panic
	}
	pivot := len(haystack) / 2
	switch {
	case haystack[pivot] == needle:
		return pivot + startsAt
	case len(haystack) <= 1:
		return -1
	case needle > haystack[pivot]:
		return binary(haystack[pivot+1:], needle, startsAt+pivot+1)
	case needle < haystack[pivot]:
		return binary(haystack[:pivot], needle, startsAt)
	}
	return -1 // code can never fall off here, but the compiler complains
	// if you don't have any returns out of conditionals.
}
func twoSum(nums []int, target int) (int, int) {
for i, num := range nums {
adjusted := target - num
if j := binary(nums, adjusted, 0); j != -1 {
return i, j
}
}
return 0, 0
}
playground example
Or you can use sort.SearchInts which implements binary searching.
func twoSum(nums []int, target int) (int, int) {
	for i, num := range nums {
		adjusted := target - num
		// sort.SearchInts returns the index where the searched number
		// would be inserted if it is not present, which can be len(nums),
		// so check the bounds before comparing nums[j] with adjusted.
		if j := sort.SearchInts(nums, adjusted); j < len(nums) && nums[j] == adjusted {
			return i, j
		}
	}
	return 0, 0
}
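One detail worth checking when using sort.SearchInts: for a value larger than every element it returns len(nums), so indexing the result directly needs a bounds check. The sketch below shows the return values for a found value, a missing value in the middle, and a missing value past the end. The helper name find is an assumption for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// find reports the insertion index of x in the sorted slice and whether
// x is actually present at that index.
func find(sorted []int, x int) (int, bool) {
	j := sort.SearchInts(sorted, x)
	return j, j < len(sorted) && sorted[j] == x
}

func main() {
	nums := []int{1, 3, 5}
	fmt.Println(find(nums, 3)) // 1 true
	fmt.Println(find(nums, 4)) // 2 false
	fmt.Println(find(nums, 7)) // 3 false
}
```

In the last case the returned index equals len(nums), so nums[j] would panic without the j < len(sorted) guard.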

Resources