Why does adding items to a map during iteration produce inconsistent results? - go

From Go Spec:
If map entries are created during iteration, that entry may be produced during the iteration or may be skipped.
So what I expect from that statement is that the following code should print at least the number 1, and that how many more numbers get printed is unpredictable and differs each time you run the program:
package main

import (
    "fmt"
)

func main() {
    test := make(map[int]int)
    test[1] = 1
    j := 2
    for i, v := range test {
        fmt.Println(i, v)
        test[j] = j
        j++
    }
    return
}
On my own laptop (Go version 1.8) it prints at most up to 8; on the playground (also version 1.8) it prints exactly up to 3!
I don't care much about the playground result, since its Go is not vanilla, but I wonder why on my local machine it never prints more than 8. I even tried adding more items in each iteration to increase the chance of going past 8, but it makes no difference.
EDIT: my own explanation based on Schwern's answer
When the map is created with the make function and without a size hint, only 1 bucket is assigned, and in Go each bucket holds 8 elements. So when the range starts it sees that the map has only 1 bucket and will iterate at most 8 times. If I use a size hint bigger than 7, like make(map[int]int, 8), two buckets are created and there is a possibility that I get more than 8 iterations over the added items.
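For illustration, here is a minimal sketch of that experiment (my own variation, not from the original post), differing only in the size hint passed to make; how many lines it prints still varies from run to run, since iteration order and map growth behaviour are unspecified.

package main

import "fmt"

func main() {
    // Same experiment as above, but with a size hint > 7, so the runtime
    // may allocate more than one bucket up front.
    test := make(map[int]int, 8)
    test[1] = 1
    j := 2
    for i, v := range test {
        fmt.Println(i, v)
        test[j] = j
        j++
    }
}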

This is an issue inherent in the design of most hash tables. Here's a simple explanation, hand-waving away a lot of unnecessary detail.
Under the hood, a hash table is an array. Each key is mapped onto an element in the array using a hash function. For example, "foo" might map to element 8, "bar" might map to element 4, and so on. Some elements are empty.
for k, v := range hash iterates through this array in whatever order the entries happen to appear. The ordering is made unpredictable, partly to keep programs from relying on it and to resist collision attacks.
When you add to a hash, it adds to the underlying array. It might even have to allocate a new, larger array. It's unpredictable where that new key will land in the hash's array.
So if you add more pairs while you're iterating through the hash, any pair that lands in the array before the current index won't be seen; the iteration has already passed that point. Anything that lands after it might be seen; the iteration has yet to reach that point, but the array might get reallocated and the pairs rehashed in the meantime.
but I wonder why on my local it never prints more than 8
Because the underlying array is probably of length 8. Go likely allocates the underlying array in powers of 2 and probably starts at 8. The range hash probably starts by checking the length of the underlying array and will not go further, even if it's grown.
Long story short: don't add keys to a hash while iterating through it.
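If you do need to derive new entries from a map you are scanning, one common workaround (a sketch of my own, not part of the answer above) is to collect the additions separately and apply them after the loop:

package main

import "fmt"

func main() {
    test := map[int]int{1: 1, 2: 2}

    // Collect the new entries while iterating over the existing ones...
    toAdd := make(map[int]int)
    for k, v := range test {
        toAdd[k*10] = v * 10
    }

    // ...and only modify the original map once the iteration is done.
    for k, v := range toAdd {
        test[k] = v
    }

    fmt.Println(len(test)) // 4
}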

Related

for loop value semantic in golang

My first question about Go on SO. The code below shows that n has the same address in each iteration. I am aware that such a for loop is said by some to have value semantics, and that what is actually ranged over is a copy of the slice, not the slice itself. Why does n have the same address in each iteration? Is it because each element of the slice is copied one at a time, rather than the whole slice being copied once beforehand, so that a single memory address can be reused in each iteration?
package main

import (
    "fmt"
)

func main() {
    numbers := []int{1, 2}
    for i, n := range numbers {
        fmt.Println(&n, &numbers[i])
    }
}
A sample result from go playground:
0xc000122030 0xc000122020
0xc000122030 0xc000122028
You are slightly wrong in your question: it is not a copy of the slice's underlying data that is being iterated over. In Go, when you pass a slice you really pass a pointer to memory together with the length and capacity of that memory; this is called a slice header. The header is copied, but the copy points to the same underlying memory, meaning that when you pass a []int to a function and change its values there, the values are changed in the original []int in the outer code as well.
This is in contrast to an array like [5]int, which is passed by value, meaning it really is copied when you pass it around. In Go, structs, strings, numbers, and arrays are passed by value. Slices are also passed by value, but as described above, the value in this case contains a pointer to memory. Passing a copy of a pointer still lets you change the memory it points to.
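A short sketch (function and variable names are mine, purely for illustration) of that difference: a write through a slice parameter is visible to the caller, while a write to an array parameter is not.

package main

import "fmt"

// setFirstSlice receives a copy of the slice header, but that header
// points at the caller's underlying array, so the write is visible outside.
func setFirstSlice(s []int) {
    s[0] = 99
}

// setFirstArray receives a full copy of the array, so the write stays local.
func setFirstArray(a [3]int) {
    a[0] = 99
}

func main() {
    s := []int{1, 2, 3}
    a := [3]int{1, 2, 3}

    setFirstSlice(s)
    setFirstArray(a)

    fmt.Println(s) // [99 2 3]
    fmt.Println(a) // [1 2 3]
}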
Now to your experiment:
for i, n := range numbers
will create two variables before the loop starts: integers i and n. In each loop iteration i will be incremented by 1 and n will be assigned the value (a copy of the integer value that is) of numbers[i].
This means there really are only two variables, i and n. Their addresses do not change between iterations, which is what you see in your output.
The addresses of numbers[i] are different, of course; they are the memory addresses of the items in the underlying array.
The Go Wiki has a Common Mistakes page talking about this exact issue. It also provides an explanation of how to avoid this issue in real code. The quick answer is that this is done for efficiency, and has little to do with the slice. n is a single variable / memory location that gets assigned a new value on each iteration.
If you want additional insight into why this happens under the hood, take a look at this post.
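If you do want a distinct address per element, the usual options are to address the slice element directly or to make an explicit copy inside the loop body; a minimal sketch:

package main

import "fmt"

func main() {
    numbers := []int{1, 2}

    for i := range numbers {
        // &numbers[i] is the address of the element in the backing array,
        // so it is different for each i.
        fmt.Println(&numbers[i])
    }

    for _, n := range numbers {
        n := n // explicit copy: a new variable is declared on each iteration
        fmt.Println(&n)
    }
}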

Ignoring values in Go's range

I have a question about Go's range and the way in which ignoring return values work.
As the documentation states, on each iteration range produces two values: an index and a copy of the current element. So if during an iteration we want to change a field in the current element, or the element itself, we have to reference it via elements[index], because element is just a copy (assuming the loop looks something like for index, element := range elements).
Go also allows us to ignore some of the returned values using _, so we can write for index, _ := range elements.
But it got me thinking: is Go smart enough to not make a copy when _ is used in range? If it isn't, and the elements in elements are quite big, and we have multiple processes/goroutines running our code, we are wasting memory, and it would be better to loop based on len(elements).
Am I right?
EDIT
We can also use
for index := range elements
which seems like a good alternative.
The for range construct may produce up to 2 values, but not necessarily 2 (it can also be a single index value, a map key, a value received from a channel, or even none at all).
Quoting from Spec: For statements:
If the range expression is a channel, at most one iteration variable is permitted, otherwise there may be up to two.
Continuing the quote:
If the last iteration variable is the blank identifier, the range clause is equivalent to the same clause without that identifier.
What this means is that:
for i, _ := range something {}
Is equivalent to:
for i := range something {}
And another quote from Spec: For statements, a little bit down the road:
For an array, pointer to array, or slice value a, the index iteration values are produced in increasing order, starting at element index 0. If at most one iteration variable is present, the range loop produces iteration values from 0 up to len(a)-1 and does not index into the array or slice itself.
So the extra blank identifier in the first case does not cause extra allocations or copying, and it's completely unnecessary and useless.
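As a practical consequence, when the element type is large and you only need to read or update it in place, you can range over the index alone and reach the element through the slice; a small sketch (the struct and field names are made up for illustration):

package main

import "fmt"

type record struct {
    payload [4096]byte
    count   int
}

func main() {
    records := make([]record, 3)

    // No element copy is made here: only the index is produced,
    // and we work with each element through records[i].
    for i := range records {
        records[i].count++
    }

    fmt.Println(records[0].count, records[1].count, records[2].count)
}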

Maps in Go - how to avoid double key lookup?

Suppose I want to update some existing value in a map, or do something else if the key is not found. How do I do this, without performing 2 lookups? What's the golang equivalent of the following C++ code:
auto it = m.find(key);
if (it != m.end()) {
    // update the value, without performing a second lookup
    it->second = calc_new_value(it->second);
} else {
    // do something else
    m.insert(make_pair(key, 42));
}
Go does not expose the map's internal (key,value) pair data structure like C++ does, so you can't replicate this exactly.
One possible workaround would be to make the values of your map pointers, so you can keep the same values in the map but update what they point to. For example, if m is a map[int]*int, you could change a value with:
v := m[10]
*v = 42
With that said, I wouldn't be surprised if the savings from reducing the number of hash lookups will be eaten by the additional memory management overhead. So it would be worth benchmarking whatever solution you settle on.
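A slightly fuller sketch of that idea (calcNewValue is a hypothetical stand-in for the real update logic), remembering that indexing a missing key in a map[int]*int yields nil, so the pointer must be checked before dereferencing:

package main

import "fmt"

func calcNewValue(old int) int { // hypothetical update logic
    return old + 1
}

func main() {
    m := map[int]*int{}

    key := 10
    if p := m[key]; p != nil {
        // One lookup: update in place through the stored pointer.
        *p = calcNewValue(*p)
    } else {
        // Key not present: insert a new value.
        v := 42
        m[key] = &v
    }

    fmt.Println(*m[key]) // 42
}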
You cannot. The situation is actually the same with Python dicts. However it shouldn't matter. Both lookup and assignment to a Go map are amortized O(1). Combining the two operations has the same time complexity.
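For completeness, the plain Go translation of the C++ snippet simply accepts the second lookup; a minimal sketch using the comma-ok form (calcNewValue is again a hypothetical stand-in):

package main

import "fmt"

func calcNewValue(old int) int { // hypothetical update logic
    return old + 1
}

func main() {
    m := map[string]int{"key": 7}

    if v, ok := m["key"]; ok {
        m["key"] = calcNewValue(v) // second lookup happens on assignment
    } else {
        m["key"] = 42
    }

    fmt.Println(m["key"]) // 8
}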

What are the differences between an array and a range in ruby?

Just wondering what the subtle difference between an array and a range is. I came across an example where x = *(1..10) outputs x as an array, but *(1..10) == (1..10).to_a throws an error. This suggests to me there is a subtle difference between the two, and I'm curious what it is.
Firstly, when you're not in the middle of an assignment or parameter-passing, *(1..10) is a syntax error because the splat operator doesn't parse that way. That's not really related to arrays or ranges per se, but I thought I'd clear up why that's an error.
Secondly, arrays and ranges are really apples and oranges. An array is an object that's a collection of arbitrary elements. A range is an object that has a "start" and an "end", and knows how to move from the start to the end without having to enumerate all the elements in between.
Finally, when you convert a range to an array with to_a, you're not really "converting" it so much as you're saying, "start at the beginning of this range and keep giving me elements until you reach the end". In the case of "(1..10)", the range is giving you 1, then 2, then 3, and so on, until you get to 10.
One difference is that a range, unlike an array, does not store every element separately.
r = (1..1000000) # very fast
r.to_a # sloooooow
You lose the ability to index to an arbitrary point, however.

Is this linear search implementation actually useful?

In Matters Computational I found this interesting linear search implementation (it's actually my Java implementation ;-)):
public static int linearSearch(int[] a, int key) {
    int high = a.length - 1;
    int tmp = a[high];

    // put a sentinel at the end of the array
    a[high] = key;

    int i = 0;
    while (a[i] != key) {
        i++;
    }

    // restore original value
    a[high] = tmp;

    if (i == high && key != tmp) {
        return NOT_CONTAINED;
    }
    return i;
}
It basically uses a sentinel, which is the searched-for value itself, so that you always find the value and never have to check the array boundary. The last element is stored in a temporary variable, and the sentinel is placed at the last position. When the value is found (remember, it is always found thanks to the sentinel), the original element is restored, and the index is checked: if it is the last index and the original last element does not equal the key, -1 (NOT_CONTAINED) is returned; otherwise the index is returned.
While I found this implementation really clever, I wonder if it is actually useful. For small arrays, it seems to be always slower, and for large arrays it only seems to be faster when the value is not found. Any ideas?
EDIT
The original implementation was written in C++, so that could make a difference.
It's not thread-safe. For example, you can lose your a[high] value: a second thread that starts after the first has changed a[high] to key will record key into tmp, and if it finishes after the first thread has restored a[high] to its original value, it will restore a[high] to what it first saw, which was the first thread's key.
It's also not useful in Java, since the JVM includes bounds checks on array accesses, so your while loop is effectively still checking that you're not going past the end of the array.
Will you ever notice any speed increase from this? No
Will you notice a lack of readability? Yes
Will you notice an unnecessary array mutation that might cause concurrency issues? Yes
Premature optimization is the root of all evil.
Doesn't seem particularly useful. The "innovation" here is just to get rid of the iteration test by combining it with the match test. Modern processors spend 0 time on iteration checks these days (all the computation and branching gets done in parallel with the match test code).
In any case, binary search kicks the ass of this code on large arrays, and is comparable on small arrays. Linear search is so 1960s.
See also the 'finding a tiger in Africa' joke.
Punchline: an experienced programmer places a tiger in Cairo so that the search always terminates.
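For reference, a sketch of the binary-search alternative mentioned above, here in Go using the standard library's sort.SearchInts; it requires the array to be kept sorted:

package main

import (
    "fmt"
    "sort"
)

func main() {
    a := []int{3, 7, 11, 19, 42} // must already be sorted

    key := 19
    i := sort.SearchInts(a, key) // binary search: O(log n) comparisons
    if i < len(a) && a[i] == key {
        fmt.Println("found at index", i)
    } else {
        fmt.Println("not contained")
    }
}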
A sentinel search goes back to Knuth. Its value is that it reduces the number of tests in the loop from two ("does the key match? am I at the end?") to just one.
Yes, it's useful, in the sense that it should significantly reduce search times for modest-size unordered arrays, by virtue of eliminating a conditional branch and its potential mispredictions. It also keeps insertion times low (code not shown by the OP) for such arrays, because you don't have to keep the items ordered.
If you have larger arrays of ordered items, a binary search will be faster, at the cost of larger insertion time to ensure the array is ordered.
For even larger sets, a hash table will be the fastest.
The real question is what is the distribution of sizes of your arrays?
Yes, it does, because the while loop performs one comparison instead of the two needed by a standard search.
It is twice as fast. It is given as an optimization in Knuth, Vol. 3.
