I am writing a Go program that has to perform multiple download requests at a time, using goroutines running in parallel (with GOMAXPROCS). In addition, there is a form of state kept: which components have been downloaded and which components are left to be downloaded. The mutex solution would be to lock the structure that keeps track of which components have been successfully downloaded. I have read that when attempting to keep state, mutexes are the best option.
However, I am wondering what would be a solution utilizing channels (passing ownership instead of providing exclusive access to state) instead of mutexes, or are mutexes the best option?
P.S.
So far I have thought of passing the global structure that keeps state between goroutines, which all share one channel (a read-write channel). A goroutine reads the structure from the channel and then writes it back when it's done. The problem I found with this is that when the last running goroutine [assume all others have finished and stopped running] gives up its possession of the structure by writing it to the channel, the program deadlocks, since there are no receivers left. In addition, this is still attempting to use channels as mutexes [attempting to provide exclusive access].
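Roughly, this is the shape I have so far (heavily simplified; the type and component names are made up, and the real code actually downloads things). Running it ends in the deadlock I described:

```go
package main

import (
	"fmt"
	"sync"
)

// state is passed around over a channel; whoever holds the value "owns" it.
type state struct {
	downloaded map[string]bool // which components have finished
}

func worker(component string, stateCh chan *state, wg *sync.WaitGroup) {
	defer wg.Done()
	// ... download the component ...

	st := <-stateCh // take ownership of the state
	st.downloaded[component] = true
	stateCh <- st // hand ownership back; the last worker blocks here forever,
	// because no goroutine is left to receive
}

func main() {
	stateCh := make(chan *state)
	var wg sync.WaitGroup
	for _, c := range []string{"a", "b", "c"} {
		wg.Add(1)
		go worker(c, stateCh, &wg)
	}
	stateCh <- &state{downloaded: make(map[string]bool)} // seed ownership
	wg.Wait()
	fmt.Println("done") // never reached: all goroutines end up asleep
}
```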
Related
I could not find anything about this question except this explanation by Wikipedia https://en.wikipedia.org/wiki/Channel_(programming). But I'm not satisfied with the explanation.
What problem do channels solve?
Why don't we just use normal variables to send and receive data instead?
If by "normal variables" you mean, for example, a slice that multiple goroutines write to and read from, then this is a guaranteed way to get data races (you don't want to get data races). You can avoid concurrent access by using some kind of synchronization (such as Mutex or RWLock).
At this point, you have:
- reinvented channels (which are basically that: a slice under a mutex), and
- spent more time than you needed to, and your solution is still inferior (there's no syntax support, you can't use your slices in select, etc.).
Channels solve the problem of concurrent reads and writes. Basically, they prevent the situation where one goroutine reads a variable while another one writes to the same variable.
Also, channels may have a buffer, so you can write several values before blocking.
Of course, you don't have to use channels. There are other ways to send data between goroutines. For example, you can use atomic operations when assigning or reading a value from a shared variable, or use a mutex whenever you access it.
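To make the two shapes concrete, here is a minimal sketch (the worker math is arbitrary): first collecting values through a buffered channel, then appending to a mutex-protected slice.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	// Shape 1: a buffered channel. The channel itself serializes access,
	// and senders don't block until the buffer is full.
	results := make(chan int, 3)
	var wg1 sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg1.Add(1)
		go func(n int) {
			defer wg1.Done()
			results <- n * n
		}(i)
	}
	wg1.Wait()
	close(results)
	for r := range results {
		fmt.Println("from channel:", r)
	}

	// Shape 2: a "normal variable" (a slice) shared between goroutines.
	// Every access has to be wrapped in a mutex to avoid data races.
	var (
		mu   sync.Mutex
		nums []int
		wg2  sync.WaitGroup
	)
	for i := 0; i < 3; i++ {
		wg2.Add(1)
		go func(n int) {
			defer wg2.Done()
			mu.Lock()
			nums = append(nums, n*n)
			mu.Unlock()
		}(i)
	}
	wg2.Wait()
	fmt.Println("from slice:", nums)
}
```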
I understand that with MSI, if we have a cache line in the Shared state, even if no one else uses it, we would have to broadcast that we are moving to Modified. This is a problem that MESI fixes.
However, with MESI, when moving from Invalid to Exclusive, we need to broadcast that we want to read this line and then wait to see whether any other cache responds with a HIT. How is this any better?
Either way I need to read it (from memory) and then do what I described. It seems like I don't save myself any broadcasting; I just do it when fetching the line instead of when moving from Shared to Modified.
Consider the case where you load first, then store. With MSI you'd read into Shared, then need to go off-core again to get exclusive ownership before committing a store.
With MESI you read into Exclusive state for the pure load, and then flipping to Modified is local; no off-core communication.
Turns out this is the example Wikipedia gives in https://en.wikipedia.org/wiki/MESI_protocol#Advantages_of_MESI_over_MSI
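Not hardware code, obviously, but a toy tally of off-core bus transactions for that load-then-store sequence may make the difference concrete. This is my own simplified model of the scenario above, where no other cache holds the line:

```go
package main

import "fmt"

// loadThenStore models one core doing a load followed by a store to a line
// that no other cache holds, and counts the bus transactions each protocol
// needs. Real coherence controllers are far more involved; this only
// captures the transition counting discussed above.
func loadThenStore(protocol string) (busTransactions int) {
	state := "Invalid"

	// Load: both protocols issue a bus read to fetch the line.
	busTransactions++
	if protocol == "MESI" {
		state = "Exclusive" // no other cache answered HIT, so we hold it cleanly
	} else {
		state = "Shared" // MSI cannot record "I am the only holder"
	}

	// Store:
	if state == "Exclusive" {
		state = "Modified" // silent, core-local upgrade
	} else {
		busTransactions++ // MSI must broadcast an invalidate/upgrade first
		state = "Modified"
	}
	return busTransactions
}

func main() {
	fmt.Println("MSI :", loadThenStore("MSI"), "bus transactions")  // prints 2
	fmt.Println("MESI:", loadThenStore("MESI"), "bus transactions") // prints 1
}
```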
As the title says, I am referring to the Go type sync.Map (from the sync package): can its functions be considered atomic? Mainly the Load, Store, LoadOrStore, and Delete functions.
I also built a simple example on the Go Playground: is it guaranteed that only one goroutine can enter the code in lines 15 - 17? My test seems to show that it is.
Please help to explain.
The godoc page for the sync package says: "Map is like a Go map[interface{}]interface{} but is safe for concurrent use by multiple goroutines without additional locking or coordination."
This statement guarantees that there's no need for additional mutexes or synchronization across goroutines. I wouldn't use the word "atomic" for that (it has a very precise meaning), but it does mean that you don't have to worry about multiple goroutines entering the block guarded by LoadOrStore (for the same key), as in your example.
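To illustrate what that buys you in the LoadOrStore case (this is not the asker's playground code, just a self-contained sketch of the same idea): for a given key, only one call observes loaded == false, so the block it guards runs exactly once.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var m sync.Map
	var entered int64
	var wg sync.WaitGroup

	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			// For a given key, exactly one LoadOrStore call actually stores
			// its value (loaded == false); every other call loads the
			// winner's value (loaded == true).
			if _, loaded := m.LoadOrStore("config", n); !loaded {
				atomic.AddInt64(&entered, 1) // the guarded block runs once
			}
		}(i)
	}
	wg.Wait()

	v, _ := m.Load("config")
	fmt.Println("stored value:", v, "goroutines that entered the block:", entered)
}
```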
To profile my app, I want to know how many goroutines are waiting to write to or read from a channel; I can't find anything relevant in the reflect package.
I can maintain an explicit counter of course, but I'd expect the Go runtime to know this already, so I'm trying to avoid reinventing the wheel.
So, is there a way to do that without maintaining the counter manually?
To track overall load you are probably looking for runtime.NumGoroutine()
https://golang.org/pkg/runtime/#NumGoroutine
Though it's not exactly the number of blocked goroutines, it should be very close to it; the number of blocked goroutines should not exceed runtime.NumGoroutine() - GOMAXPROCS.
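For instance, a crude gauge of how many goroutines are parked (the blocked-goroutine setup here is artificial, just to have something to count):

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	// Start a few goroutines that block on a channel nobody sends to.
	block := make(chan struct{})
	for i := 0; i < 5; i++ {
		go func() { <-block }()
	}

	time.Sleep(100 * time.Millisecond) // give them time to park

	// Total goroutines includes main; comparing against GOMAXPROCS gives the
	// rough estimate described above.
	fmt.Println("NumGoroutine:", runtime.NumGoroutine(),
		"GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```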
For tracking goroutines per channel, you can do the following:
- Use https://golang.org/pkg/runtime/pprof/#Do to label the goroutines that work on a specific channel (see the sketch after this list).
- Use net/http/pprof to get the current profile and parse its output; see this answer for details: https://stackoverflow.com/a/38414527/1975086. Or you can look into how net/http/pprof gets that information, so you can obtain it within your app in a typed way.
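A sketch of the labelling idea (the channel, function, and label names here are invented): tag the goroutines that serve a given channel with a pprof label, then inspect the goroutine profile exposed by net/http/pprof.

```go
package main

import (
	"context"
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/ handlers on the default mux
	"runtime/pprof"
)

func consume(ch <-chan int) {
	for range ch {
		// ... handle one downloaded item ...
	}
}

func main() {
	downloads := make(chan int)

	// Everything run inside pprof.Do carries the label channel=downloads,
	// so its goroutines can be grouped and counted in pprof output.
	go pprof.Do(context.Background(), pprof.Labels("channel", "downloads"),
		func(ctx context.Context) { consume(downloads) })

	// Inspect e.g. http://localhost:6060/debug/pprof/goroutine?debug=1
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```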
More specifically, in my case I have a web server and a globally accessible struct that the web server uses to generate a page. I have another goroutine that's always updating that struct with new values periodically. Will this cause issues? Do I need to implement a mechanism to ensure the struct is not being read while it's being updated?
No, that is the very definition of not safe, and would be caught by the race detector if you tested it. You will absolutely need to synchronize access, for example using sync.Mutex or sync.RWMutex.
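A minimal sketch of that, using sync.RWMutex (the struct and its fields are invented stand-ins for the one in the question): the updater goroutine takes the write lock and each request handler takes the read lock.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"sync"
	"time"
)

// pageData stands in for the globally accessible struct from the question.
type pageData struct {
	mu      sync.RWMutex
	visits  int
	updated time.Time
}

var data pageData

// updater periodically refreshes the struct while holding the write lock.
func updater() {
	for range time.Tick(time.Second) {
		data.mu.Lock()
		data.visits++
		data.updated = time.Now()
		data.mu.Unlock()
	}
}

// handler reads the struct under the read lock; many readers may hold it at once.
func handler(w http.ResponseWriter, r *http.Request) {
	data.mu.RLock()
	fmt.Fprintf(w, "visits=%d updated=%s\n", data.visits, data.updated)
	data.mu.RUnlock()
}

func main() {
	go updater()
	http.HandleFunc("/", handler)
	log.Fatal(http.ListenAndServe("localhost:8080", nil))
}
```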
If it is not critical to always have the latest values, you can also let each goroutine cache a copy of the struct and refresh that copy from the "master" copy every so often. If the struct is accessed frequently, this can help avoid some performance issues due to lock contention.