Making maps in Go before anything

I am following the Go tour and something bothered me:
Maps must be created with make (not new) before use
Fair enough:
m := make(map[int]Cats)
However, the very next slide shows something different:
var m = map[string]Vertex{
    "Bell Labs": Vertex{
        40.68433, -74.39967,
    },
    "Google": Vertex{
        37.42202, -122.08408,
    },
}
This slide seems to show that you can skip make entirely when creating a map.
Why did the tour say maps have to be created with make before they can be used? Am I missing something here?

Actually, the only reason to use make to create a map is to preallocate space for a specific number of values, just like with slices (except you can't set a capacity on a map):
m := map[int]Cats{}
s := []Cats{}
//is the same as
m := make(map[int]Cats)
s := make([]Cats, 0, 0)
However, if you know you will have at least X items in a map, you can do something like:
m := make(map[int]Cats, 100) // this will speed things up initially
Also check http://dave.cheney.net/2014/08/17/go-has-both-make-and-new-functions-what-gives

So they're actually right that you always need to use make before using a map. The reason it looks like they aren't in the example you gave is that the make call happens implicitly. So, for example, the following two are equivalent:
m := make(map[int]string)
m[0] = "zero"
m[1] = "one"
// Equivalent to:
m := map[int]string{
    0: "zero",
    1: "one",
}
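To see why make (explicit or implicit) matters: a map variable that is only declared and never initialized is nil; reading from a nil map just yields zero values, but writing to it panics at runtime. A small illustration (this is a general fact about nil maps, not from the tour itself):
var m map[int]string      // declared but never initialized: m is nil
fmt.Println(len(m), m[0]) // reading is fine: prints 0 and ""
// m[0] = "zero"          // would panic: assignment to entry in nil map
n := make(map[int]string) // initialized (explicitly here, implicitly with a literal)
n[0] = "zero"             // safe to write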
Make vs New
Now, the reason to use make vs new is slightly more subtle: new only allocates zeroed space for a variable of the given type, whereas make actually initializes it.
To give you a sense of this distinction, imagine we had a binary tree type like this:
type Tree struct {
    root *node
}

type node struct {
    val         int
    left, right *node
}
Now you can imagine that if we had a Tree which was allocated and initialized and had some values in it, and we made a copy of that Tree value, the two values would point to the same underlying data since they'd both have the same value for root.
So what would happen if we just created a new Tree without initializing it? Something like t := new(Tree) or var t Tree? Well, t.root would be nil, so if we made a copy of t, both variables would not point to the same underlying data, and so if we added some elements to the Tree, we'd end up with two totally separate Trees.
The same is true of maps and slices (and a few other types) in Go. When you make a copy of a slice or map variable, both the old and the new variables refer to the same underlying data, just like an array variable in Java or a pointer in C. Thus, if you just use new, and then make a copy and initialize the underlying data later, you'll have two totally separate data structures, which is usually not what you want.
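Concretely, for maps the difference looks something like this (a small sketch of the behaviour of new and make; new on a map is rarely useful in practice):
m := make(map[string]int) // initialized, ready to use
m["a"] = 1                // fine

p := new(map[string]int)  // p is a *map[string]int pointing at a nil map
// (*p)["a"] = 1          // would panic: the map itself was never initialized
*p = make(map[string]int) // only now is the map usable through p
(*p)["a"] = 1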

Related

Is there a bug in handling slices with references in Go?

I'm trying to build a new list of structs that contains references to items that exist in another slice. It's easier to understand if you see it, so I've prepared a snippet that you can run.
I have a list (dummylist) of two points (Cartesian coordinates) that I want to parse to build a new list (mylist) with items having some feature (in the example, X > 80). I've defined two points: {X:90.0, Y:50.0} and {X:20.0, Y:30.0}. I expect that mylist will contain {X:90.0, Y:50.0}; instead, at the end it contains {X:20.0, Y:30.0}. With some printing here and there I can verify that the algorithm is working fine (it enters the "if" condition in the right case), but, at the end, mylist contains the wrong element.
package main

import (
    "fmt"
)

func main() {
    type point struct {
        X float64
        Y float64
    }
    type pointsList []point

    type pointContainer struct {
        Point *point
    }
    type pointContainerList []pointContainer

    // Prepare a slice with two elements
    dummylist := new(pointsList)
    *dummylist = append(*dummylist, point{X: 90.0, Y: 50.0})
    *dummylist = append(*dummylist, point{X: 20.0, Y: 30.0})

    // My empty list
    mylist := new(pointContainerList)
    fmt.Println(fmt.Sprintf("---- At the beginning, mylist contains %d points", len(*mylist)))

    // Filter the initial list to take only elements with X > 80
    for _, pt := range *dummylist {
        fmt.Println("\n---- Evaluating point ", pt)
        if pt.X > 80 {
            fmt.Println("Appending", pt)
            *mylist = append(*mylist, pointContainer{Point: &pt})
            fmt.Println("Inserted point:", (*mylist)[0].Point, "len = ", len(*mylist))
        }
    }

    // mylist should contain {X:90.0, Y:50.0}, instead...
    fmt.Println(fmt.Sprintf("\n---- At the end, mylist contains %d points", len(*mylist)))
    fmt.Println("Content of mylist:", (*mylist)[0].Point)
}
Here you can run the code:
https://play.golang.org/p/AvrC3JJBLdT
Some helpful considerations:
I've seen through multiple tests that, at the end, mylist contains the last parsed item in the loop. I think there is a problem with references. It's as if the inserted item in the list (in the first iteration) depended on the "pt" of other iterations. Instead, if I use indexes (for i, pt := range *dummylist and (*dummylist)[i]), everything works fine.
Before talking about bugs in Golang... am I missing something?
Yes, you're missing something. On this line:
*mylist = append(*mylist, pointContainer{Point: &pt})
you're putting the address of the loop variable &pt into your structure. As the loop continues, the value of pt changes. (Or to put it another way, &pt will be the same pointer for each iteration of the loop).
From the Go language specification:
...
The iteration values are assigned to the respective iteration
variables as in an assignment statement.
The iteration variables may be declared by the "range" clause using a
form of short variable declaration (:=). In this case their types are
set to the types of the respective iteration values and their scope is
the block of the "for" statement; they are re-used in each iteration.
If the iteration variables are declared outside the "for" statement,
after execution their values will be those of the last iteration.
One solution would be to create a new value, but I'm not sure what you're gaining from so many pointers: []point would probably be more effective (and less error-prone) than a pointer to a slice of structs of pointers to points.
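A minimal sketch of the "create a new value" fix, keeping the question's types (only the loop body changes; everything else stays the same):
for _, pt := range *dummylist {
    if pt.X > 80 {
        pt := pt // make a fresh copy scoped to this iteration
        *mylist = append(*mylist, pointContainer{Point: &pt})
    }
}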

Is there a way to delete first element from map?

Can I delete the first element in a map? It is possible with slices (slice = append(slice[:0], slice[1:]...), or simply slice = slice[1:]), but can I do something like this with maps?
Maps, being hash tables, don't have a specified order, so there's no way to delete keys in a defined order unless you track the keys in a separate slice, in the order you're adding them, something like:
type orderedMap struct {
    data map[string]int
    keys []string
    mu   *sync.RWMutex
}

func (o *orderedMap) Shift() (int, error) {
    o.mu.Lock()
    defer o.mu.Unlock()
    if len(o.keys) == 0 {
        return 0, ErrMapEmpty
    }
    i := o.data[o.keys[0]]
    delete(o.data, o.keys[0])
    o.keys = o.keys[1:]
    return i, nil
}
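For that to work, insertion has to keep the keys slice in sync with the data map; a hypothetical Set method (not part of the original snippet) might look like:
func (o *orderedMap) Set(key string, val int) {
    o.mu.Lock()
    defer o.mu.Unlock()
    if _, exists := o.data[key]; !exists {
        o.keys = append(o.keys, key) // remember insertion order for new keys only
    }
    o.data[key] = val
}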
Just to be unequivocal about why you can't really delete the "first" element from a map, let me reference the spec:
A map is an unordered group of elements of one type, called the element type, indexed by a set of unique keys of another type, called the key type. The value of an uninitialized map is nil.
Note in particular that map elements are unordered.
Using a slice to preserve some notion of the order of keys is, fundamentally, flawed, though. Given operations like this:
foo := map[string]int{
    "foo": 1,
    "bar": 2,
}
// a bit later:
foo["foo"] = 3
Is the index/key foo now updated, or reassigned? Should it be treated as a new entry, appended to the slice of keys, or is it an in-place update? Things get muddled really quickly. The simple fact of the matter is that the map type doesn't contain an "order" of things; trying to make it have an order quickly devolves into a labour-intensive task where you'll end up writing your own type.
As I said earlier: it's a hashtable. Elements within get reshuffled behind the scenes if the hashing algorithm used for the keys produces collisions, for example. This question has the feel of an X-Y problem: why do you need the values in the map to be ordered? Maybe a map simply isn't the right approach for your particular problem.

Randomly selecting elements from slices produced by a map in a restricted key range in Go. Is there an O(1) shortcut?

In my program to simulate many-particle evolution, I have a map keyed by pop (the population size) whose value is a slice containing the sites that have this population, i.e. myMap map[int][]int, so myMap[pop] is a []int. These slices are generically quite large.
At each evolution step I choose a random population size RandomPop. I would then like to randomly choose a site that has a population of at least RandomPop. The chosen site (sitechosen) is used to update my population structures, and I use a second map to efficiently update the myMap keys. My current (slow) implementation looks like:
func Evolve( ..., myMap map[int][]int, ...) {
    RandomPop = rand.Intn(rangeofpopulation) + 1
    for i := RandomPop; i < rangeofpopulation; i++ {
        preallocatedslice = append(preallocatedslice, myMap[i]...)
    }
    randomindex := rand.Intn(len(preallocatedslice))
    sitechosen = preallocatedslice[randomindex]
    UpdateFunction(sitechosen)
    // reset preallocated slice
    preallocatedslice = preallocatedslice[0:0]
}
This code (obviously) hits a huge bottle-neck when copying values from the map to preallocatedslice, with runtime.memmove eating 87% of my CPU usage. I'm wondering if there is an O(1) way to randomly choose an entry contained in the union of slices indicated by myMap with key values between 0 and RandomPop ? I am open to packages that allow you to manipulate custom hashtables if anyone is aware of them. Suggestions don't need to be safe for concurrency
Other things tried: I previously had my maps record all sites with values of at least pop, but that took up >10GB of memory and was stupid. I tried stashing pointers to the relevant slices to make a look-up slice, but Go forbids this. I could sum up the lengths of each slice, generate a random number based on this, and then iterate through the slices in myMap by length, but this is going to be much slower than just keeping an updated cdf of my population and doing a binary search on it. The binary search is fast, but updating the cdf, even if done manually, is O(n). I was really hoping to abuse hashtables to speed up random selection and update if possible.
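For reference, the "sum up the lengths" idea mentioned above would look roughly like this (a rough sketch using the names from the question; it is still O(k) in the number of keys in the range, not O(1), but it avoids copying the elements):
total := 0
for i := RandomPop; i < rangeofpopulation; i++ {
    total += len(myMap[i])
}
// assumes total > 0
r := rand.Intn(total) // random offset into the conceptual concatenation
for i := RandomPop; i < rangeofpopulation; i++ {
    if r < len(myMap[i]) {
        sitechosen = myMap[i][r]
        break
    }
    r -= len(myMap[i])
}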
A vague thought I have is concocting some sort of nested structure of maps pointing to their contents and also to the map with a key one less than theirs or something.
I was looking at your code and I have a question.
Why do you have to copy values from the map to the slice? I mean, I think that I am following the logic behind it... but I wonder if there is a way to skip that step.
So we have:
func Evolve( ..., myMap map[int][]int, ...) {
    RandomPop = rand.Intn(rangeofpopulation) + 1
    for i := RandomPop; i < rangeofpopulation; i++ {
        // slice of preselected `sites`; one of these will be `sitechosen`
        // we expect to have `n` sites in `preallocatedslice`,
        // where `n` is the number of iterations,
        // i.e. n = rangeofpopulation - RandomPop
        preallocatedslice = append(preallocatedslice, myMap[i]...)
    }
    // Once we have a list of sites, we select one;
    // under a uniform distribution every site has a chance of 1/n to be selected.
    randomindex := rand.Intn(len(preallocatedslice))
    sitechosen = preallocatedslice[randomindex]
    UpdateFunction(sitechosen)
    ...
}
But what if we change that to:
func Evolve( ..., myMap map[int][]int, ...) {
    if len(myMap) == 0 {
        // Nothing to do, print a log!
        return
    }
    // This variable will hold our chosen site!
    var siteChosen []int
    // Our random population size is a value from 1 to rangeOfPopulation
    randPopSize := rand.Intn(rangeOfPopulation) + 1
    for i := randPopSize; i < rangeOfPopulation; i++ {
        // We are going to pretend that the current candidate is the siteChosen
        siteChosen = myMap[i]
        // Now, instead of copying `myMap[i]` to preallocatedslice,
        // we test whether the current candidate is actually the `siteChosen` here.
        // We know that the chance for a specific site to be the chosen one is 1/n,
        // where n = rangeOfPopulation - randPopSize
        n := float64(rangeOfPopulation - randPopSize)
        // we roll the dice...
        isTheChosenOne := rand.Float64() < 1/n
        if isTheChosenOne {
            // If the candidate is the chosen site,
            // then we don't need to iterate over all the other elements.
            break
        }
    }
    // Here we know that `siteChosen` is: a.- a selected candidate, or
    // b.- the last element assigned in the loop
    // (in the case that `isTheChosenOne` was always false, which is a probable scenario)
    UpdateFunction(siteChosen)
    ...
}
Also, if you want, you can calculate n (or 1/n) outside the loop.
So the idea is to test inside the loop whether the candidate is the siteChosen, and avoid copying the candidates into a preselection pool.

Why should I use the & sign on structs?

In the Go tour, there is a section on struct literals.
package main

import "fmt"

type Vertex struct {
    X, Y int
}

var (
    v1 = Vertex{1, 2}  // has type Vertex
    v2 = Vertex{X: 1}  // Y:0 is implicit
    v3 = Vertex{}      // X:0 and Y:0
    p  = &Vertex{1, 2} // has type *Vertex
)

func main() {
    fmt.Println(v1, p, v2, v3)
}
What's the difference between a struct with an ampersand and the one without? I know that the ones with ampersands point to the same reference, but why should I use them over the regular ones?
var p = &Vertex{} // why should I use this
var c = Vertex{} // over this
It's true that p (&Vertex{}) has type *Vertex and that c (Vertex{}) has type Vertex.
However, I don't believe that statement really answers the question of why one would choose one over the other. It'd be kind of like answering the question "why use planes over cars" with something like "planes are winged, cars aren't." (Obviously we all know what wings are, but you get the point).
But it also wouldn't be very helpful to simply say "go learn about pointers" (though I think it is a really good idea to do so nonetheless).
How you choose basically boils down to the following.
Realize that &Vertex{} and Vertex{} are fundamentally initialized in the same way.
There might be some low-level memory allocation differences (i.e. stack vs heap), but you really should just let the compiler worry about these details
What makes one more useful and performant than the other is determined by how they are used in the program.
"Do I want a pointer to the struct (p), or just the struct (c)?"
Note that you can get a pointer using the & operator (e.g. &c); you can dereference a pointer to get the value using * (e.g. *p)
So depending on what you choose, you may end up doing a lot of *p or &c
Bottom-line: create what you will use; if you don't need a pointer, don't make one (this will help more with "optimizations" in the long run).
Should I use Vertex{} or &Vertex{}?
For the Vertex given in your example, I'd choose Vertex{} to get a simple value object.
Some reasons:
Vertex in the example is pretty small in terms of size. Copying is cheap.
Garbage collection is simplified, garbage creation may be mitigated by the compiler
Pointers can get tricky and add unnecessary cognitive load to the programmer (i.e. gets harder to maintain)
Pointers aren't something you want if you ever get into concurrency (i.e. it's unsafe)
Vertex doesn't really contain anything worth mutating (just return/create a new Vertex when needed).
Some reasons why you'd want &Struct{}:
If Struct has a member caching some state that needs to be changed inside the original object itself (see the sketch after this list).
Struct is huge, and you've done enough profiling to determine that the cost of copying is significant.
As an aside: you should try to keep your structs small, it's just good practice.
You find yourself making copies by accident (this is a bit of a stretch):
v := Struct{}
v2 := v // Copy happens

v := &Struct{}
v2 := v // Only a pointer is copied
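A small sketch of the "state that needs to be changed inside the original object" case; the Counter type here is made up purely for illustration:
package main

import "fmt"

type Counter struct {
    n int
}

// Inc uses a pointer receiver, so it mutates the original Counter.
func (c *Counter) Inc() { c.n++ }

func main() {
    c := &Counter{}
    c.Inc()
    c.Inc()
    fmt.Println(c.n) // 2; with a value receiver, each call would increment a copy
}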
The comments pretty much spell it out:
v1 = Vertex{1, 2} // has type Vertex
p = &Vertex{1, 2} // has type *Vertex
As in many other languages, & takes the address of the value following it. This is useful when you want a pointer rather than a value.
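For instance (a tiny illustration of the two operators, using the Vertex type from the question):
v := Vertex{1, 2}
p := &v         // p has type *Vertex and points at v
p.X = 10        // shorthand for (*p).X = 10; modifies v itself
fmt.Println(v)  // {10 2}
fmt.Println(*p) // {10 2}, the value p points to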
If you need to understand more about pointers in programming, you could start with this for Go, or even the Wikipedia page.

Go error: non-constant array bound

I'm trying to calculate the necessary length for an array in a merge sort implementation I'm writing in Go. It looks like this:
func merge(array []int, start, middle, end int) {
    leftLength := middle - start + 1
    rightLength := end - middle

    var left [leftLength]int
    var right [rightLength]int
    //...
}
I then get this complaint when running go test:
./mergesort.go:6: non-constant array bound leftLength
./mergesort.go:7: non-constant array bound rightLength
I assume Go does not enjoy users instantiating an array's length with a calculated value; it only accepts constants. Should I just give up and use a slice instead? I expect a slice is a dynamic array, meaning it's either a linked list or copies into a larger array when it gets full.
You can't instantiate an array like that with a value calculated at runtime. Instead, use make to initialize a slice with the desired length. It would look like this:
left := make([]int, leftLength)
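For context, the rest of the merge can then stay essentially the same, just with slices; a sketch of one possible completion (not the asker's actual code):
func merge(array []int, start, middle, end int) {
    left := make([]int, middle-start+1)
    right := make([]int, end-middle)
    copy(left, array[start:middle+1])
    copy(right, array[middle+1:end+1])

    // Merge the two halves back into array[start:end+1].
    i, j, k := 0, 0, start
    for i < len(left) && j < len(right) {
        if left[i] <= right[j] {
            array[k] = left[i]
            i++
        } else {
            array[k] = right[j]
            j++
        }
        k++
    }
    // Copy any leftovers.
    for i < len(left) {
        array[k] = left[i]
        i++
        k++
    }
    for j < len(right) {
        array[k] = right[j]
        j++
        k++
    }
}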
