I have a question about Go's range and the way in which ignoring return values work.
As the documentations stands, each iteration, range produces two values - an index and a copy of the current element, so if during an iteration we want to change a field in the current elmenet, or the element itself we have to reference it via elements[index], because elment is just a copy (assuming that the loop looks something like this for index, element := range elements).
Go also allows to ignore some of the return values using _, so we can write for index, _ := elements.
But it got me thinking. Is Go smart enough to actually don't make a copy when using the _ in range? If it isn't, then if elements in elements are quite big and we have multiple processes/go-routines running our code, we are wasting memory, and it would be better to use looping based on len(elements).
Am I right?
EDIT
We can also use
for index := range elements
which seems like a good alternative.
The for range construct may produce up to 2 values, but not necessarily 2 (it can also be a single index value or map key or value sent on channel, or even none at all).
Quoting from Spec: For statements:
If the range expression is a channel, at most one iteration variable is permitted, otherwise there may be up to two.
Continuing the quote:
If the last iteration variable is the blank identifier, the range clause is equivalent to the same clause without that identifier.
What this means is that:
for i, _ := range something {}
Is equivalent to:
for i := range something {}
And another quote from Spec: For statements, a little bit down the road:
For an array, pointer to array, or slice value a, the index iteration values are produced in increasing order, starting at element index 0. If at most one iteration variable is present, the range loop produces iteration values from 0 up to len(a)-1 and does not index into the array or slice itself.
So the extra blank identifier in the first case does not cause extra allocations or copying, and it's completely unnecessary and useless.
Related
First question about Go in SO. The code below shows, n has the same address in each iteration. I am aware that such a for loop is called value semantic by some people and what's actually ranged over is a copy of the slice not the actual slice itself. Why does n in each iteration has the same address? Is it because each element in the slice is copied rather than the whole slice is copied once beforehand. If only each element from the original slice is copied, then a single memory address can be reused in each iteration?
package main
import (
"fmt"
)
func main() {
numbers := []int{1, 2}
for i, n := range numbers {
fmt.Println(&n, &numbers[i])
}
}
A sample result from go playground:
0xc000122030 0xc000122020
0xc000122030 0xc000122028
You are slightly wrong in your question, it is not a copy of the slice that is being iterated over. In Go when you pass a slice you really pass a pointer to memory and the size and capacity of that memory, this is called a slice header. The header is copied, but the copy points to the same underlying memory, meaning that when you pass a []int to a function, change the values in that function, the values will be changed in the original []int in the outer code as well.
This is in contrast to an array like [5]int which is passed by value, meaninig this would really be copied when you pass it around. In Go structs, strings, numbers and arrays are passed by value. Slices are really also passed by value but as described above, the value in this case contains a pointer to memory. Passing a copy of a pointer still lets you change the memory pointed to.
Now to your experiment:
for i, n := range numbers
will create two variables before the loop starts: integers i and n. In each loop iteration i will be incremented by 1 and n will be assigned the value (a copy of the integer value that is) of numbers[i].
This means there really are only two variables i and n. They are the same which is what you see in your output.
The addresses of numbers[i] are different of course, they are the memory addresses of the items in the array.
The Go Wiki has a Common Mistakes page talking about this exact issue. It also provides an explanation of how to avoid this issue in real code. The quick answer is that this is done for efficiency, and has little to do with the slice. n is a single variable / memory location that gets assigned a new value on each iteration.
If you want additional insight into why this happens under the hood, take a look at this post.
This is my situation: the input is a string that contains a normal mathematical operation like 5+3*4. Functions are also possible, i.e. min(5,A*2). This string is already tokenized, and now I want to parse it using stacks (so no AST). I first used the Shunting Yard Algorithm, but here my main problem arise:
Suppose you have this (tokenized) string: min(1,2,3,+) which is obviously invalid syntax. However, SYA turns this into the output stack 1 2 3 + min(, and hopefully you see the problem coming. When parsing from left to right, it sees the + first, calculating 2+3=5, and then calculating min(1,5), which results in 1. Thus, my algorithm says this expression is completely fine, while it should throw a syntax error (or something similar).
What is the best way to prevent things like this? Add a special delimiter (such as the comma), use a different algorithm, or what?
In order to prevent this issue, you might have to keep track of the stack depth. The way I would do this (and I'm not sure it is the "best" way) is with another stack.
The new stack follows these rules:
When an open parentheses, (, or function is parsed, push a 0.
Do this in case of nested functions
When a closing parentheses, ), is parsed, pop the last item off and add it to the new last value on the stack.
The number that just got popped off is how many values were returned by the function. You probably want this to always be 1.
When a comma or similar delimiter is parsed, pop from the stack, add that number to the new last element, then push a 0.
Reset so that we can begin verifying the next argument of a function
The value that just got popped off is how many values were returned by the statement. You probably want this to always be 1.
When a number is pushed to the output, increment the top element of this stack.
This is how many values are available in the output. Numbers increase the number of values. Binary operators need to have at least 2.
When a binary operator is pushed to the output, decrement the top element
A binary operator takes 2 values and outputs 1, thus reducing the overall number of values left on the output by 1.
In general, an n-ary operator that takes n values and returns m values should add (m-n) to the top element.
If this value ever becomes negative, throw an error!
This will find that the last argument in your example, which just contains a +, will decrement the top of the stack to -1, automatically throwing an error.
But then you might notice that a final argument in your example of, say, 3+ would return a zero, which is not negative. In this case, you would throw an error in one of the steps where "you probably want this to always be 1."
From Go Spec:
If map entries are created during iteration, that entry may be produced during the iteration or may be skipped.
So what I expect from that statement is that the following code should at least print number 1, and how many more numbers which are going to be printed is not predictable and is different each time you run the program:
package main
import (
"fmt"
)
func main() {
test := make(map[int]int)
test[1] = 1
j := 2
for i, v := range test {
fmt.Println(i, v)
test[j] = j
j++
}
return
}
Go playground link
On my own laptop (Go version 1.8) at maximum it prints till 8, in playground (still version 1.8) it prints exactly till 3!
I don't care much about the result from playground since its go is not vanilla but I wonder why on my local it never prints more than 8? even I tried to add more items in each iteration to make the possibility of going over 8 higher but there's no difference.
EDIT: my own explanation based on #Schwern 's answer
when the map is created with make function and without any size parameter only 1 bucket is assigned and in go each bucket has a size of 8 elements, so when the range starts it sees that the map has only 1 bucket and it will iterate at maximum 8 times. If I use a size parameter bigger than 7 like make(map[int]int, 8) two buckets is created and there would be possibility that I get more than 8 iterations over the added items.
This is an issue inherent in the design of most hash tables. Here's a simple explanation hand waving a lot of unnecessary detail.
Under the hood, a hash table is an array. Each key is mapped onto an element in the array using a hash function. For example, "foo" might map to element 8, "bar" might map to element 4, and so on. Some elements are empty.
for k,v := range hash iterates through this array in whatever order they happen to appear. The ordering is unpredictable to avoid a collision attack.
When you add to a hash, it adds to the underlying array. It might even have to allocate a new, larger array. It's unpredictable where that new key will land in the hash's array.
So if you add more pairs while you're iterating through the hash, any pair that gets put into the array before the current index won't be seen; the iteration has already past that point. Anything that gets put after might be seen; the iteration has yet to reach that point, but the array might get reallocated and the pairs possibly rehashed.
but I wonder why on my local it never prints more than 8
Because the underlying array is probably of length 8. Go likely allocates the underlying array in powers of 2 and probably starts at 8. The range hash probably starts by checking the length of the underlying array and will not go further, even if it's grown.
Long story short: don't add keys to a hash while iterating through it.
In my university notes the pseudocode of Build Heap is written almost like this (Only difference were parenthesis I have brackets):
And I searched on the internet and there are several like this:
But shouldn't be something like that?
BuildHeap(A) {
heapsize <- length[A]
for i <- floor(length[A]/2) downto 1
Heapify(A,i)
}
Why they writing heap_size[A] = length[A]?
If you have many heaps, A, B, C. And only one variable heap-size, How will you remember the sizes of all the heaps? You will have an attribute heap-size for all the heaps.
In many pseudocode the attributes of an object O are written as Attriubute[O] or Attribute(O) , (Sometimes they are also written as O.attribute ).
The first example assumes that you are storing the heap size of a particular heap as an attribute of the heap.
The second example might be storing the heap size in a local variable which gets its value from the length attribute (Length[A]) of the heap.
Here is a text about pseudocode from Introduction To Algorithms:
Compound data are typically organized into objects, which are comprised of attributes or fields . A particular field is accessed using the field name followed by the name of its object in square brackets. For example, we treat an array as an object with the attribute length indicating how many elements it contains. To specify the number of elements in an array A, we write length[A]. Although we use square brackets for both array indexing and object attributes, it will usually be clear from the context which interpretation is intended
just wondering what the subtle difference between an array and a range is. I came across an example where I have x = *(1..10) output x as an array and *(1..10) == (1..10).to_a throws an error. This means to me there is a subtle difference between the two and I'm just curious what it is.
Firstly, when you're not in the middle of an assignment or parameter-passing, *(1..10) is a syntax error because the splat operator doesn't parse that way. That's not really related to arrays or ranges per se, but I thought I'd clear up why that's an error.
Secondly, arrays and ranges are really apples and oranges. An array is an object that's a collection of arbitrary elements. A range is an object that has a "start" and an "end", and knows how to move from the start to the end without having to enumerate all the elements in between.
Finally, when you convert a range to an array with to_a, you're not really "converting" it so much as you're saying, "start at the beginning of this range and keep giving me elements until you reach the end". In the case of "(1..10)", the range is giving you 1, then 2, then 3, and so on, until you get to 10.
One difference is that ranges do not separately store every element in itself, unlike an array.
r = (1..1000000) # very fast
r.to_a # sloooooow
You lose the ability to index to an arbitrary point, however.