How to use channels to safely synchronise data in Go

Below is an example of how to use a mutex to safely access data. How would I go about doing the same with CSP (communicating sequential processes) instead of mutex locks and unlocks?
type Stack struct {
top *Element
size int
sync.Mutex
}
func (ss *Stack) Len() int {
ss.Lock()
size := ss.size
ss.Unlock()
return size
}
func (ss *Stack) Push(value interface{}) {
ss.Lock()
ss.top = &Element{value, ss.top}
ss.size++
ss.Unlock()
}
func (ss *Stack) Pop() (value interface{}) {
ss.Lock()
size := ss.size
ss.Unlock()
if size > 0 {
ss.Lock()
value, ss.top = ss.top.value, ss.top.next
ss.size--
ss.Unlock()
return
}
return nil
}

If you actually were to look at how Go implements channels, you'd essentially see a mutex around an array with some additional thread handling to block execution until the value is passed through. A channel's job is to move data from one spot in memory to another with ease. Therefore where you have locks and unlocks, you'd have things like this example:
func example() {
	resChan := make(chan int)
	go func() {
		resChan <- 1 // blocks until a receiver is ready
	}()
	go func() {
		res := <-resChan // blocks until a value is sent
		_ = res          // use res here
	}()
}
So in the example, the first goroutine blocks on the send until the second goroutine receives from the channel.
To do this in Go with the sync package instead, one would use a sync.WaitGroup: add one to the group before setting the value, mark it done once the value is set, and have the second goroutine wait on the group and then lock and unlock around its access to the value.
The oddity in your example is that there are no goroutines, so everything happens in the single main goroutine and the locks are used in the traditional way (as with C threads), so channels won't really accomplish anything. The example you have would be considered an anti-pattern; as the Go proverb says, "Don't communicate by sharing memory; share memory by communicating."
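As a rough sketch of what a channel-based version of the Stack from the question could look like: one goroutine owns the data, and Push, Pop and Len talk to it over channels instead of taking a lock. The names ChanStack, NewChanStack, popResult and Close are assumptions made for this illustration and do not come from the original code.

```go
package main

import "fmt"

// element is one node of the linked stack (mirrors Element in the question).
type element struct {
	value interface{}
	next  *element
}

// popResult carries a popped value and whether the stack was non-empty.
type popResult struct {
	value interface{}
	ok    bool
}

// ChanStack is a hypothetical stack whose state is owned by one goroutine.
type ChanStack struct {
	pushCh chan interface{}
	popCh  chan chan popResult
	lenCh  chan chan int
	done   chan struct{}
}

// NewChanStack starts the owner goroutine and returns a handle to it.
func NewChanStack() *ChanStack {
	s := &ChanStack{
		pushCh: make(chan interface{}),
		popCh:  make(chan chan popResult),
		lenCh:  make(chan chan int),
		done:   make(chan struct{}),
	}
	go s.run()
	return s
}

// run is the only code that touches top and size, so no mutex is needed.
func (s *ChanStack) run() {
	var top *element
	size := 0
	for {
		select {
		case v := <-s.pushCh:
			top = &element{v, top}
			size++
		case reply := <-s.popCh:
			if top == nil {
				reply <- popResult{nil, false}
				continue
			}
			reply <- popResult{top.value, true}
			top = top.next
			size--
		case reply := <-s.lenCh:
			reply <- size
		case <-s.done:
			return
		}
	}
}

// Push hands the value to the owner goroutine.
func (s *ChanStack) Push(v interface{}) { s.pushCh <- v }

// Pop asks the owner goroutine for the top value; ok is false when empty.
func (s *ChanStack) Pop() (interface{}, bool) {
	reply := make(chan popResult)
	s.popCh <- reply
	r := <-reply
	return r.value, r.ok
}

// Len asks the owner goroutine for the current size.
func (s *ChanStack) Len() int {
	reply := make(chan int)
	s.lenCh <- reply
	return <-reply
}

// Close stops the owner goroutine; the stack must not be used afterwards.
func (s *ChanStack) Close() { close(s.done) }

func main() {
	s := NewChanStack()
	defer s.Close()
	s.Push(1)
	s.Push(2)
	v, ok := s.Pop()
	fmt.Println(v, ok, s.Len()) // prints "2 true 1"
}
```

Because the emptiness check and the pop happen inside a single select case, there is no window between "check size" and "remove element" for another goroutine to slip into. Whether this is preferable to a plain mutex for a simple stack is a design judgment; the mutex version is shorter.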

Related

Lock slice before reading and modifying it

My experience working with Go is recent and in reviewing some code, I have seen that while it is write-protected, there is a problem with reading the data. Not with the reading itself, but with possible modifications that can occur between the reading and the modification of the slice.
type ConcurrentSlice struct {
sync.RWMutex
items []Item
}
type Item struct {
Index int
Value Info
}
type Info struct {
Name string
Labels map[string]string
Failure bool
}
As mentioned, the writing is protected in this way:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
found := false
i := 0
for inList := range cs.Iter() {
if item.Name == inList.Value.Name{
cs.items[i] = item
found = true
}
i++
}
if !found {
cs.Lock()
defer cs.Unlock()
cs.items = append(cs.items, item)
}
}
func (cs *ConcurrentSlice) Iter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
cs.Lock()
defer cs.Unlock()
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
But between collecting the content of the slice and modifying it, modifications can occur. It may be that another routine modifies the same slice and when it is time to assign a value, it no longer exists: slice[i] = item
What would be the right way to deal with this?
I have implemented this method:
func GetList() *ConcurrentSlice {
if denylist == nil {
denylist = NewConcurrentSlice()
return denylist
}
return denylist
}
And I use it like this:
concurrentSlice := GetList()
concurrentSlice.UpdateOrAppend(item)
But I understand that between the get and the modification, even if it is practically immediate, another routine may have modified the slice. What would be the correct way to perform the two operations atomically? That the slice I read is 100% the one I modify. Because if I try to assign an item to a index that no longer exists, it will break the execution.
Thank you in advance!
The way you are doing the locking is incorrect, because it does not ensure that the items you return have not been removed in the meantime. In the case of an update, the slice would still be at least the same length.
A simpler solution that works could be the following:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
	// Hold the lock for the whole search-then-write sequence.
	cs.Lock()
	defer cs.Unlock()
	found := false
	for i, it := range cs.items {
		if item.Name == it.Name {
			cs.items[i] = item // overwrite the existing entry with the new item
			found = true
		}
	}
	if !found {
		cs.items = append(cs.items, item)
	}
}
Use a sync.Map if the order of the values is not important.
type Items struct {
m sync.Map
}
func (items *Items) Update(item Info) {
items.m.Store(item.Name, item)
}
func (items *Items) Range(f func(Info) bool) {
items.m.Range(func(key, value any) bool {
return f(value.(Info))
})
}
Data structures 101: always pick the best data structure for your use case. If you’re going to be looking up objects by name, that’s EXACTLY what a map is for. If you still need to maintain the order of the items, use a treemap (Go's standard library does not include one, so that means a third-party ordered map).
Concurrency 101: like transactions, the critical section guarded by your mutex should be atomic, consistent, and isolated. You’re failing isolation here because the read of the data structure does not fall inside your mutex lock.
Your code should look something like this:
func {
mutex.lock
defer mutex.unlock
check map or treemap for name
if exists update
else add
}
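The pseudocode above could be fleshed out roughly as follows. This is only a sketch: InfoStore, NewInfoStore, UpdateOrAdd, Get and Len are hypothetical names, and Info is a trimmed copy of the struct from the question (the Labels map is left out for brevity).

```go
package main

import (
	"fmt"
	"sync"
)

// Info mirrors the struct from the question, minus the Labels map.
type Info struct {
	Name    string
	Failure bool
}

// InfoStore is a hypothetical name-indexed store guarded by one mutex.
type InfoStore struct {
	mu    sync.Mutex
	items map[string]Info
}

// NewInfoStore returns an empty store ready for use.
func NewInfoStore() *InfoStore {
	return &InfoStore{items: make(map[string]Info)}
}

// UpdateOrAdd performs the lookup and the write inside one critical section.
// It reports whether an existing entry was replaced.
func (s *InfoStore) UpdateOrAdd(item Info) (updated bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, updated = s.items[item.Name]
	s.items[item.Name] = item
	return updated
}

// Get returns a copy of the entry stored under name, if any.
func (s *InfoStore) Get(name string) (Info, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	item, ok := s.items[name]
	return item, ok
}

// Len returns the number of stored entries.
func (s *InfoStore) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.items)
}

func main() {
	store := NewInfoStore()
	var wg sync.WaitGroup
	// Ten goroutines race to write the same key; the map never sees
	// a concurrent access because every method takes the mutex.
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(fail bool) {
			defer wg.Done()
			store.UpdateOrAdd(Info{Name: "node-a", Failure: fail})
		}(i%2 == 0)
	}
	wg.Wait()
	fmt.Println(store.Len()) // prints "1"
}
```

With a map keyed by name, "update if present, otherwise add" collapses into a single assignment, so there is no index that could become stale between the read and the write.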
After some tests, I can say that the situation you fear can indeed happen with sync.RWMutex. I think it could happen with sync.Mutex too, but I can't reproduce that. Maybe I'm missing some information, or maybe the calls happen in order because they are all blocked and the order in which they acquire the lock is deterministic in some way.
One way to keep your two calls safe without other goroutines getting in 'conflict' would be to use another mutex for every task on that object. You would lock that mutex before your read and write, and release it when you're done. You would also have to use that mutex in any other call that writes to (or reads from) that object. You can find an implementation of what I'm talking about here in the main.go file. In order to reproduce the issue with RWMutex, you can simply comment out the startTask and the endTask calls and the issue is visible in the terminal output.
EDIT : my first answer was wrong as I misinterpreted a test result, and fell in the situation described by OP.
tl;dr;
If ConcurrentSlice is to be used from a single goroutine, the locks are unnecessary, because the way the algorithm is written, there are not going to be any concurrent reads or writes to the slice elements or to the slice itself.
If ConcurrentSlice is to be used from multiple goroutines, the existing locks are not sufficient. This is because UpdateOrAppend may modify slice elements concurrently.
A safe version would need two versions of Iter:
This can be called by users of ConcurrentSlice, but it cannot be called from UpdateOrAppend:
func (cs *ConcurrentSlice) Iter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
cs.RLock()
defer cs.RUnlock()
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
and this is only to be called from UpdateOrAppend:
func (cs *ConcurrentSlice) internalIter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
// No locking
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
And UpdateOrAppend should be synchronized at the top level:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
cs.Lock()
defer cs.Unlock()
....
}
Here's the long version:
This is an interesting piece of code. Based on my understanding of the go memory model, the mutex lock in Iter() is only necessary if there is another goroutine working on this code, and even with that, there is a possible race in the code. However, UpdateOrAppend only modifies elements of the slice with lower indexes than what Iter is working on, so that race never manifests itself.
The race can happen as follows:
1. The for-loop in Iter reads element 0 of the slice.
2. The element is sent through the channel. Thus, the channel receive happens after the first step.
3. The receiving end potentially updates element 0 of the slice. There is no problem up to here.
4. Then the sending goroutine reads element 1 of the slice. This is when a race can happen. If step 3 updated index 1 of the slice, the read at step 4 is a race. That is: if step 4 reads the update done by step 3, it is a race. You can see this if you start with i:=1 in UpdateOrAppend and run it with the -race flag.
But UpdateOrAppend always modifies slice elements that are already seen by Iter when i=0, so this code is safe, even without the lock.
If there will be other goroutines accessing and modifying the structure, you need the Mutex, but you need it to protect the complete UpdateOrAppend method, because only one goroutine should be allowed to run that. You need the mutex to protect the potential updates in the first for-loop, and that mutex has to also include the slice append case, because that may actually modify the slice of the underlying object.
If Iter is only called from UpdateOrAppend, then this single mutex should be sufficient. If however Iter can be called from multiple goroutines, then there is another race possibility. If one UpdateOrAppend is running concurrently with multiple Iter instances, then some of those Iter instances will read from the modified slice elements concurrently, causing a race. So, it should be such that multiple Iters can only run if there are no UpdateOrAppend calls. That is a RWMutex.
But Iter can be called from UpdateOrAppend with a lock, so it cannot really call RLock, otherwise it is a deadlock.
Thus, you need two versions of Iter: one that can be called outside UpdateOrAppend, and that issues RLock in the goroutine, and another that can only be called from UpdateOrAppend and does not call RLock.
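A different way to sidestep the two-Iter problem, swapped in here purely as an illustration and not what the answer above proposes, is to drop the channel-based iterator and hand readers a copy taken under the read lock. In this sketch Snapshot is a hypothetical method, the mutex is a named field rather than embedded, and Info is trimmed to two fields (a real Info with a Labels map would only be copied shallowly).

```go
package main

import (
	"fmt"
	"sync"
)

// Info is a trimmed version of the struct from the question.
type Info struct {
	Name    string
	Failure bool
}

// ConcurrentSlice guards items with a read/write mutex.
type ConcurrentSlice struct {
	mu    sync.RWMutex
	items []Info
}

// UpdateOrAppend holds the write lock for the whole search-and-write,
// so no other goroutine can change the slice in between.
func (cs *ConcurrentSlice) UpdateOrAppend(item Info) {
	cs.mu.Lock()
	defer cs.mu.Unlock()
	for i := range cs.items {
		if cs.items[i].Name == item.Name {
			cs.items[i] = item
			return
		}
	}
	cs.items = append(cs.items, item)
}

// Snapshot copies the items under the read lock. Callers iterate over
// the copy, so they never hold the lock while doing their own work.
func (cs *ConcurrentSlice) Snapshot() []Info {
	cs.mu.RLock()
	defer cs.mu.RUnlock()
	out := make([]Info, len(cs.items))
	copy(out, cs.items)
	return out
}

func main() {
	cs := &ConcurrentSlice{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			// Four writers, but only two distinct names.
			cs.UpdateOrAppend(Info{Name: fmt.Sprintf("item-%d", n%2)})
		}(i)
	}
	wg.Wait()
	fmt.Println(len(cs.Snapshot())) // prints "2"
}
```

The trade-off is an allocation per read; in exchange, UpdateOrAppend never has to call an iterator that might try to take the lock it already holds.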

Concurrent queue which returns channels, locking doubts

There is a queue of Message structs (their contents are not important), which has the classic push and pop methods:
type Queue struct {
messages list.List
}
//The implementation is not relevant for the sake of the question
func (q *Queue) Push(msg Message) { /*...*/ }
func (q *Queue) Pop() (Message, bool) { /*...*/ }
/*
* NewTimedChannel runs a goroutine which pops a message from the queue every
* given time duration and sends it over the returned channel
*/
func (q *Queue) NewTimedChannel(t time.Duration) (<-chan Message) {/*...*/}
The client of the Push function will be a web gui in which users will post their messages.
The client of the channel returned by NewTimedChannel will be a service which sends each message to a not relevant endpoint over the network.
I'm a newbie in concurrency and go and I have the following question:
I know that since Queue.messages is a shared state between the main goroutine which deals with pushing the message after the user submit a web form and the ones created for each NewTimedChannel invocation, I need to lock it.
Do I need to lock and unlock using the sync.Mutex in all the Push, Pop and NewTimedChannel methods?
And is there a more idiomatic way to handle this specific problem in the go environment?
As others have pointed out, it requires synchronization or there will be a data race.
There is a saying in Go: "Don't communicate by sharing memory; share memory by communicating." In this case, I think an idiomatic way is to have channels send to a separate goroutine that synchronizes all the operations using select. The code can easily be extended by adding more channels to support more kinds of operations (like the timed channel in your code, whose purpose I don't fully understand), and with select and other utilities it can handle more complex synchronization more easily than with locks. I wrote some sample code:
type SyncQueue struct {
Q AbsQueue
pushCh,popMsgCh chan Message
popOkCh chan bool
popCh chan struct{}
}
// An abstract of the Queue type. You can remove the abstract layer.
type AbsQueue interface {
Push(Message)
Pop() (Message,bool)
}
func (sq SyncQueue) Push(m Message) {
sq.pushCh <- m
}
func (sq SyncQueue) Pop() (Message,bool) {
sq.popCh <- struct{}{} // send a signal for pop. struct{}{} cost no memory at all.
return <-sq.popMsgCh,<-sq.popOkCh
}
// Every pop and push gets synchronized here.
func (sq SyncQueue) Run() {
for {
select {
case m := <-sq.pushCh:
sq.Q.Push(m)
case <-sq.popCh:
m, ok := sq.Q.Pop()
sq.popMsgCh <- m
sq.popOkCh <- ok
}
}
}
func NewSyncQueue(Q AbsQueue) *SyncQueue {
sq := SyncQueue{
Q: Q,
pushCh: make(chan Message), popMsgCh: make(chan Message),
popOkCh: make(chan bool), popCh: make(chan struct{}),
}
go sq.Run()
return &sq
}
Note that for simplicity, I did not use a quit channel or a context.Context, so the goroutine running sq.Run() has no way of exiting and would cause a memory leak (the goroutine is never released).
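For illustration, here is one way the quit channel mentioned above might be wired in. This is a sketch under assumptions, not the answer's own code: the queue is backed by a plain slice, Message is reduced to a string, each Pop passes its own reply channel, and the quit and exited channels plus the Stop method are names introduced here.

```go
package main

import "fmt"

// Message is reduced to a string for this sketch.
type Message string

// popReply carries a popped message and whether the queue was non-empty.
type popReply struct {
	msg Message
	ok  bool
}

// SyncQueue serializes all access to items through one goroutine.
type SyncQueue struct {
	items  []Message
	pushCh chan Message
	popCh  chan chan popReply
	quit   chan struct{} // closed by Stop to ask run to exit
	exited chan struct{} // closed by run when it has returned
}

// NewSyncQueue starts the goroutine that owns the queue.
func NewSyncQueue() *SyncQueue {
	sq := &SyncQueue{
		pushCh: make(chan Message),
		popCh:  make(chan chan popReply),
		quit:   make(chan struct{}),
		exited: make(chan struct{}),
	}
	go sq.run()
	return sq
}

// run handles pushes and pops until quit is closed.
func (sq *SyncQueue) run() {
	defer close(sq.exited)
	for {
		select {
		case m := <-sq.pushCh:
			sq.items = append(sq.items, m)
		case reply := <-sq.popCh:
			if len(sq.items) == 0 {
				reply <- popReply{}
				continue
			}
			reply <- popReply{sq.items[0], true}
			sq.items = sq.items[1:]
		case <-sq.quit:
			return
		}
	}
}

// Push reports false if the queue has been stopped.
func (sq *SyncQueue) Push(m Message) bool {
	select {
	case sq.pushCh <- m:
		return true
	case <-sq.quit:
		return false
	}
}

// Pop returns the oldest message; ok is false when empty or stopped.
func (sq *SyncQueue) Pop() (Message, bool) {
	reply := make(chan popReply, 1) // buffered so run never blocks on the reply
	select {
	case sq.popCh <- reply:
		r := <-reply
		return r.msg, r.ok
	case <-sq.quit:
		return "", false
	}
}

// Stop ends the goroutine and waits for it to exit. Call it only once.
func (sq *SyncQueue) Stop() {
	close(sq.quit)
	<-sq.exited
}

func main() {
	q := NewSyncQueue()
	q.Push("hello")
	m, ok := q.Pop()
	fmt.Println(m, ok) // prints "hello true"
	q.Stop()
	fmt.Println(q.Push("late")) // prints "false"
}
```

Selecting on quit in Push and Pop as well as in run means callers do not hang forever once the owning goroutine is gone. A context.Context could play the same role as the quit channel.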
Do I need to lock and unlock using the sync.Mutex in all the Push, Pop and NewTimedChannel methods?
Yes.
And is there a more idiomatic way to handle this specific problem in
the go environment?
For insight, have a look at the last answer for this question:
How do I (succinctly) remove the first element from a slice in Go?

Golang goroutine cannot use function return value(s)

I wish I could do something like the following code with golang:
package main
import (
"fmt"
"time"
)
func getA() (int) {
fmt.Println("getA: Calculating...")
time.Sleep(300 * time.Millisecond)
fmt.Println("getA: Done!")
return 100
}
func getB() (int) {
fmt.Println("getB: Calculating...")
time.Sleep(400 * time.Millisecond)
fmt.Println("getB: Done!")
return 200
}
func main() {
A := go getA()
B := go getB()
C := A + B // waits till getA() and getB() return
fmt.Println("Result:", C)
fmt.Println("All done!")
}
More specifically, I wish Go could handle concurrency behind the scene.
This might be a bit off topic, but I am curious what people think of having such implicit concurrency support. Is it worth putting some effort into it? And what are the potential difficulties and drawbacks?
Edit:
To be clear, the question is not about "what Go is doing right now?", and not even "how it is implemented?" though I appreciate @icza's post on what exactly we should expect from Go as it is right now. The question is why it does not or is not capable of returning values, and what are the potential complications of doing that?
Back to my simple example:
A := go getA()
B := go getB()
C := A + B // waits till getA() and getB() return
I do not see any issues concerning the scope of variables, at least from the syntax point of view. The scope of A, B, and C is clearly defined by the block they are living inside (in my example the scope is the main() function). However, a perhaps more legitimate question would be whether those variables (here A and B) are "ready" to be read. Of course they should not be ready and accessible till getA() and getB() are finished. In fact, this is all I am asking for: the compiler could implement all the bells and whistles behind the scenes to make sure execution is blocked till A and B are ready to be consumed (instead of forcing the programmer to explicitly implement those waits and whistles using channels).
This could make the concurrent programming much simpler, especially for the cases where computational tasks are independent of each other. The channels still could be used for explicit communication and synchronization, if really needed.
But this is very easily and idiomatically doable. The language provides the means: Channel types.
Simply pass a channel to the functions, and have the functions send the result on the channel instead of returning them. Channels are safe for concurrent use.
Only one goroutine has access to the value at any given time. Data races cannot occur, by design.
For more, check out the question: If I am using channels properly should I need to use mutexes?
Example solution:
func getA(ch chan int) {
fmt.Println("getA: Calculating...")
time.Sleep(300 * time.Millisecond)
fmt.Println("getA: Done!")
ch <- 100
}
func getB(ch chan int) {
fmt.Println("getB: Calculating...")
time.Sleep(400 * time.Millisecond)
fmt.Println("getB: Done!")
ch <- 200
}
func main() {
cha := make(chan int)
chb := make(chan int)
go getA(cha)
go getB(chb)
C := <-cha + <-chb // waits till getA() and getB() return
fmt.Println("Result:", C)
fmt.Println("All done!")
}
Output (try it on the Go Playground):
getB: Calculating...
getA: Calculating...
getA: Done!
getB: Done!
Result: 300
All done!
Note:
The above example can be implemented with a single channel too:
func main() {
ch := make(chan int)
go getA(ch)
go getB(ch)
C := <-ch + <-ch // waits till getA() and getB() return
fmt.Println("Result:", C)
fmt.Println("All done!")
}
Output is the same.
Edit:
The Go spec states that the return values of such functions are discarded. More on this: What happens to return value from goroutine.
What you propose bleeds from multiple wounds. Each variable in Go has a scope (in which it can be referred to). Accessing variables does not block. Execution of statements or operators may block (e.g. the receive operator or the send statement).
Your proposal:
go A := getA()
go B := getB()
C := A + B // waits till getA() and getB() return
What is the scope of A and B? Two reasonable answers would be a) from the go statement or b) after the go statement. Either way we should be able to access them after the go statement. After the go statement they would be in scope, so reading or writing their values should not block.
But then if C := A + B would not block (because it is just reading a variable), then either
a) at this line A and B should already be populated which means the go statement would need to wait for getA() to complete (but then it defeats the purpose of go statement)
b) or else we would need some external code to synchronize but then again we don't gain anything (just make it worse compared to the solution with channels).
Do not communicate by sharing memory; instead, share memory by communicating.
By using channels, it is crystal clear what may block and what may not. It is crystal clear that when a receive from the channel completes, the goroutine is done. And it gives us the means to execute the receive whenever we want to (at the point when its value is needed and we're willing to wait for it), and also the means to check if the value is ready without blocking (a select statement with a default case).
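A non-blocking check of this kind is normally written as a select with a default case; the comma-ok form of the receive additionally tells a real value apart from a closed channel. A minimal sketch, where tryReceive is a hypothetical helper name:

```go
package main

import "fmt"

// tryReceive returns a value if one is ready on ch, without blocking.
// ok is false when nothing is ready or when ch is closed and drained.
func tryReceive(ch <-chan int) (v int, ok bool) {
	select {
	case v, ok = <-ch:
		return v, ok
	default:
		return 0, false
	}
}

func main() {
	ch := make(chan int, 1)

	_, ready := tryReceive(ch)
	fmt.Println(ready) // prints "false": nothing has been sent yet

	ch <- 100
	v, ready := tryReceive(ch)
	fmt.Println(v, ready) // prints "100 true"
}
```

The blocking form, a plain `<-ch`, is what the channel-based getA/getB solution uses at the point where the result is actually needed.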

Design patterns for map channel?

I am writing a DNS protocol parser in golang, the idea is to use a map like this
var tidMap map[uint16] (chan []byte)
So for the tidMap map, key is the tid (transaction ID), value is a byte array channel.
The idea is that one goroutine will try to get a value from the channel, while another goroutine will read bytes by listening for every incoming packet and, once it finds the transaction ID, will send the response data to the channel in tidMap, so the former goroutine can continue handling the response.
One problem with the design is that I need to make sure the channel has a buffer length of 1, so extra values can be pushed into the channel without blocking.
So how can I specify channel buffer length in tidMap declaration?
var tidMap map[int] make(chan int, 1)
You can't use make() there.
The length of the channel buffer doesn't convey type, so you will have to add logic to test if the map entry exists, if it doesn't:
tidMap[0] = make(chan int, 1)
The short answer: you can't. When you make a map, you define the data types of its keys and values, and the capacity of a channel is not part of its type.
The longer answer is: create an abstract data type that hides this implementation detail. Something like this:
type ChannelMap struct {
tidMap map[int](chan []byte)
}
func NewChannelMap() *ChannelMap { ... }
func (c *ChannelMap) Put(tid int) chan []byte {
res := make(chan []byte, 1) // the capacity is chosen here, not in the map type
c.tidMap[tid] = res
return res
}
func (c *ChannelMap) Get(tid int) chan []byte {
return c.tidMap[tid]
}
And just to be sure: giving the channel a capacity of 1 does not ensure that senders will never block; if your channel consumers are too slow, producers can fill the channel up to its capacity and will then block until the channel has room again.
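Since the map would be shared between the packet-reading goroutine and the waiting goroutines, it also needs protection, because Go maps are not safe for concurrent writes. One possible shape is sketched below; the mutex, the Register and Deliver method names, and the policy of removing the entry on delivery are all assumptions layered on top of the ChannelMap idea above.

```go
package main

import (
	"fmt"
	"sync"
)

// ChannelMap maps DNS transaction IDs to one-shot response channels.
type ChannelMap struct {
	mu     sync.Mutex
	tidMap map[uint16]chan []byte
}

// NewChannelMap returns an empty, ready-to-use map.
func NewChannelMap() *ChannelMap {
	return &ChannelMap{tidMap: make(map[uint16]chan []byte)}
}

// Register creates a channel with capacity 1 for tid and returns the
// receive side. Registering the same tid again replaces the old entry.
func (c *ChannelMap) Register(tid uint16) <-chan []byte {
	ch := make(chan []byte, 1)
	c.mu.Lock()
	c.tidMap[tid] = ch
	c.mu.Unlock()
	return ch
}

// Deliver hands data to whoever registered tid and removes the entry.
// It returns false if nobody is waiting for that tid.
func (c *ChannelMap) Deliver(tid uint16, data []byte) bool {
	c.mu.Lock()
	ch, ok := c.tidMap[tid]
	delete(c.tidMap, tid)
	c.mu.Unlock()
	if !ok {
		return false
	}
	// The entry was removed under the lock, so this is the only send
	// this channel will ever see; with capacity 1 it cannot block.
	ch <- data
	return true
}

func main() {
	m := NewChannelMap()
	ch := m.Register(42)
	go m.Deliver(42, []byte("response"))
	fmt.Println(string(<-ch))       // prints "response"
	fmt.Println(m.Deliver(42, nil)) // prints "false"
}
```

Removing the entry on delivery is what turns "capacity 1" into a real no-blocking guarantee here: a duplicate response for the same transaction ID finds no entry and is dropped instead of filling the buffer.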

More idiomatic way of adding channel result to queue on completion

So, right now, I just pass a pointer to a Queue object (implementation doesn't really matter) and call queue.add(result) at the end of goroutines that should add things to the queue.
I need that same sort of functionality—and of course doing a loop checking completion with the comma ok syntax is unacceptable in terms of performance versus the simple queue add function call.
Is there a way to do this better, or not?
There are actually two parts to your question: how does one queue data in Go, and how does one use a channel without blocking.
For the first part, it sounds like what you need to do is instead of using the channel to add things to the queue, use the channel as a queue. For example:
var (
ch = make(chan int) // You can add an int parameter to this make call to create a buffered channel
// Do not buffer these channels!
gFinished = make(chan bool)
processFinished = make(chan bool)
)
func f() {
go g()
for {
// send values over ch here...
}
<-gFinished
close(ch)
}
func g() {
// create more expensive objects...
gFinished <- true
}
func processObjects() {
for val := range ch {
// Process each val here
}
processFinished <- true
}
func main() {
go processObjects()
f()
<-processFinished
}
As for how you can make this more asynchronous, you can (as cthom06 pointed out) pass a second argument, an integer buffer size, to the make call that creates ch, which will make send operations asynchronous until the channel's buffer is full.
EDIT: However (as cthom06 also pointed out), because you have two goroutines writing to the channel, one of them has to be responsible for closing the channel. Also, my previous revision would exit before processObjects could complete. The way I chose to synchronize the goroutines is by creating a couple more channels that pass around dummy values to ensure that the cleanup gets finished properly. Those channels are specifically unbuffered so that the sends happen in lock-step.
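As an alternative to the extra signalling channels, a sync.WaitGroup can decide when it is safe to close the data channel when there are several writers. The sketch below is only an illustration of that variation; produce and runPipeline are hypothetical names, and the values sent are arbitrary.

```go
package main

import (
	"fmt"
	"sync"
)

// produce sends n values on out and signals the WaitGroup when finished.
func produce(id, n int, out chan<- int, wg *sync.WaitGroup) {
	defer wg.Done()
	for i := 0; i < n; i++ {
		out <- id*100 + i
	}
}

// runPipeline starts the producers, uses the channel itself as the
// queue, and returns how many values were received and their sum.
func runPipeline(producers, perProducer, buffer int) (count, sum int) {
	ch := make(chan int, buffer) // buffer > 0 lets sends run ahead of the consumer
	var wg sync.WaitGroup
	for id := 1; id <= producers; id++ {
		wg.Add(1)
		go produce(id, perProducer, ch, &wg)
	}
	// Exactly one goroutine closes the channel, and only after every
	// producer has finished, so no send can hit a closed channel.
	go func() {
		wg.Wait()
		close(ch)
	}()
	// range ends when ch is closed and drained.
	for v := range ch {
		count++
		sum += v
	}
	return count, sum
}

func main() {
	count, sum := runPipeline(2, 3, 4)
	fmt.Println(count, sum) // prints "6 906"
}
```

The consumer needs no "comma ok" polling loop: ranging over the channel blocks until a value arrives and stops on its own once the channel is closed.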
