How to optimize thread safe queue in Go? - go

I have the following request queue:
type RequestQueue struct {
Requests []*http.Request
Mutex *sync.Mutex
}
func (rq *RequestQueue) Enqueue(req *http.Request) {
rq.Mutex.Lock()
defer rq.Mutex.Unlock()
rq.Requests = append(rq.Requests, req)
}
func (rq *queue) Dequeue() (*http.Request, error) {
rq.Mutex.Lock()
defer rq.Mutex.Unlock()
if len(rq.Requests) == 0 {
return nil, errors.New("dequeue: queue is empty")
}
req := rq.Requests[0]
rq.Requests = rq.Requests[1:]
return req, nil
}
Is it possible to do this with just the atomic package, without Mutex, type being simply type AtomicRequestQueue []*http.Request, and will that bring any performance benefit?

Use a channel, like chan *http.Request. A channel is literally a FIFO queue.
What you call Enqueue will just be a send operation c <- req, and what you call Dequeue will just be a receive operation req := <-c.
Is it possible to do this with just the atomic package
You didn't state what is the real purpose of this thread-safe queue, however the use case you presented above seems to need synchronization, i.e. mutual exclusive access to the shared resource. The types in the atomic package only guarantee that the result of the operation will be observed by other threads in a consistent fashion. There's no mutual exclusiveness involved.
If your queue needs more business logic than you are actually showing, a channel might be too primitive; in that case mutex locking might be your best bet. You may use sync.RWMutex to reduce lock contention if you expect to have a lot of reads.

Related

Go Concurrency Circular Logic

I'm just getting into concurrency in Go and trying to create a dispatch go routine that will send jobs to a worker pool listening on the jobchan channel. If a message comes into my dispatch function via the dispatchchan channel and my other go routines are busy, the message is appended onto the stack slice in the dispatcher and the dispatcher will try to send again later when a worker becomes available, and/or no more messages are received on the dispatchchan. This is because the dispatchchan and the jobchan are unbuffered, and the go routine the workers are running will append other messages to the dispatcher up to a certain point and I don't want the workers blocked waiting on the dispatcher and creating a deadlock. Here's the dispatcher code I've come up with so far:
func dispatch() {
var stack []string
acount := 0
for {
select {
case d := <-dispatchchan:
stack = append(stack, d)
case c := <-mw:
acount = acount + c
case jobchan <-stack[0]:
if len(stack) > 1 {
stack[0] = stack[len(stack)-1]
stack = stack[:len(stack)-1]
} else {
stack = nil
}
default:
if acount == 0 && len(stack) == 0 {
close(jobchan)
close(dispatchchan)
close(mw)
wg.Done()
return
}
}
}
Complete example at https://play.golang.wiki/p/X6kXVNUn5N7
The mw channel is a buffered channel the same length as the number of worker go routines. It acts as a semaphore for the worker pool. If the worker routine is doing [m]eaningful [w]ork it throws int 1 on the mw channel and when it finishes its work and goes back into the for loop listening to the jobchan it throws int -1 on the mw. This way the dispatcher knows if there's any work being done by the worker pool, or if the pool is idle. If the pool is idle and there are no more messages on the stack, then the dispatcher closes the channels and return control to the main func.
This is all good but the issue I have is that the stack itself could be zero length so the case where I attempt to send stack[0] to the jobchan, if the stack is empty, I get an out of bounds error. What I'm trying to figure out is how to ensure that when I hit that case, either stack[0] has a value in it or not. I don't want that case to send an empty string to the jobchan.
Any help is greatly appreciated. If there's a more idomatic concurrency pattern I should consider, I'd love to hear about it. I'm not 100% sold on this solution but this is the farthest I've gotten so far.
This is all good but the issue I have is that the stack itself could be zero length so the case where I attempt to send stack[0] to the jobchan, if the stack is empty, I get an out of bounds error.
I can't reproduce it with your playground link, but it's believable, because at lest one gofunc worker might have been ready to receive on that channel.
My output has been Msgcnt: 0, which is also easily explained, because gofunc might not have been ready to receive on jobschan when dispatch() runs its select. The order of these operations is not defined.
trying to create a dispatch go routine that will send jobs to a worker pool listening on the jobchan channel
A channel needs no dispatcher. A channel is the dispatcher.
If a message comes into my dispatch function via the dispatchchan channel and my other go routines are busy, the message is [...] will [...] send again later when a worker becomes available, [...] or no more messages are received on the dispatchchan.
With a few creative edits, it was easy to turn that into something close to the definition of a buffered channel. It can be read from immediately, or it can take up to some "limit" of messages that can't be immediately dispatched. You do define limit, though it's not used elsewhere within your code.
In any function, defining a variable you don't read will result in a compile time error like limit declared but not used. This stricture improves code quality and helps identify typeos. But at package scope, you've gotten away with defining the unused limit as a "global" and thus avoided a useful error - you haven't limited anything.
Don't use globals. Use passed parameters to define scope, because the definition of scope is tantamount to functional concurrency as expressed with the go keyword. Pass the relevant channels defined in local scope to functions defined at package scope so that you can easily track their relationships. And use directional channels to enforce the producer/consumer relationship between your functions. More on this later.
Going back to "limit", it makes sense to limit the quantity of jobs you're queueing because all resources are limited, and accepting more messages than you have any expectation of processing requires more durable storage than process memory provides. If you don't feel obligated to fulfill those requests no matter what, don't accept "too many" of them in the first place.
So then, what function has dispatchchan and dispatch()? To store a limited number of pending requests, if any, before they can be processed, and then to send them to the next available worker? That's exactly what a buffered channel is for.
Circular Logic
Who "knows" when your program is done? main() provides the initial input, but you close all 3 channels in `dispatch():
close(jobchan)
close(dispatchchan)
close(mw)
Your workers write to their own job queue so only when the workers are done writing to it can the incoming job queue be closed. However, individual workers also don't know when to close the jobs queue because other workers are writing to it. Nobody knows when your algorithm is done. There's your circular logic.
The mw channel is a buffered channel the same length as the number of worker go routines. It acts as a semaphore for the worker pool.
There's a race condition here. Consider the case where all n workers have just received the last n jobs. They've each read from jobschan and they're checking the value of ok. disptatcher proceeds to run its select. Nobody is writing to dispatchchan or reading from jobschan right now so the default case is immediately matched. len(stack) is 0 and there's no current job so dispatcher closes all channels including mw. At some point thereafter, a worker tries to write to a closed channel and panics.
So finally I'm ready to provide some code, but I have one more problem: I don't have a clear problem statement to write code around.
I'm just getting into concurrency in Go and trying to create a dispatch go routine that will send jobs to a worker pool listening on the jobchan channel.
Channels between goroutines are like the teeth that synchronize gears. But to what end do the gears turn? You're not trying to keep time, nor construct a wind-up toy. Your gears could be made to turn, but what would success look like? Their turning?
Let's try to define a more specific use case for channels: given an arbitrarily long set of durations coming in as strings on standard input*, sleep that many seconds in one of n workers. So that we actually have a result to return, we'll say each worker will return the start and end time the duration was run for.
So that it can run in the playground, I'll simulate standard input with a hard-coded byte buffer.
package main
import (
"bufio"
"bytes"
"fmt"
"os"
"strings"
"sync"
"time"
)
type SleepResult struct {
worker_id int
duration time.Duration
start time.Time
end time.Time
}
func main() {
var num_workers = 2
workchan := make(chan time.Duration)
resultschan := make(chan SleepResult)
var wg sync.WaitGroup
var resultswg sync.WaitGroup
resultswg.Add(1)
go results(&resultswg, resultschan)
for i := 0; i < num_workers; i++ {
wg.Add(1)
go worker(i, &wg, workchan, resultschan)
}
// playground doesn't have stdin
var input = bytes.NewBufferString(
strings.Join([]string{
"3ms",
"1 seconds",
"3600ms",
"300 ms",
"5s",
"0.05min"}, "\n") + "\n")
var scanner = bufio.NewScanner(input)
for scanner.Scan() {
text := scanner.Text()
if dur, err := time.ParseDuration(text); err != nil {
fmt.Fprintln(os.Stderr, "Invalid duration", text)
} else {
workchan <- dur
}
}
close(workchan) // we know when our inputs are done
wg.Wait() // and when our jobs are done
close(resultschan)
resultswg.Wait()
}
func results(wg *sync.WaitGroup, resultschan <-chan SleepResult) {
for res := range resultschan {
fmt.Printf("Worker %d: %s : %s => %s\n",
res.worker_id, res.duration,
res.start.Format(time.RFC3339Nano), res.end.Format(time.RFC3339Nano))
}
wg.Done()
}
func worker(id int, wg *sync.WaitGroup, jobchan <-chan time.Duration, resultschan chan<- SleepResult) {
var res = SleepResult{worker_id: id}
for dur := range jobchan {
res.duration = dur
res.start = time.Now()
time.Sleep(res.duration)
res.end = time.Now()
resultschan <- res
}
wg.Done()
}
Here I use 2 wait groups, one for the workers, one for the results. This makes sure Im done writing all the results before main() ends. I keep my functions simple by having each function do exactly one thing at a time: main reads inputs, parses durations from them, and sends them off to the next worker. The results function collects results and prints them to standard output. The worker does the sleeping, reading from jobchan and writing to resultschan.
workchan can be buffered (or not, as in this case); it doesn't matter because the input will be read at the rate it can be processed. We can buffer as much input as we want, but we can't buffer an infinite amount. I've set channel sizes as big as 1e6 - but a million is a lot less than infinite. For my use case, I don't need to do any buffering at all.
main knows when the input is done and can close the jobschan. main also knows when jobs are done (wg.Wait()) and can close the results channel. Closing these channels is an important signal to the worker and results goroutines - they can distinguish between a channel that is empty and a channel that is guaranteed not to have any new additions.
for job := range jobchan {...} is shorthand for your more verbose:
for {
job, ok := <- jobchan
if !ok {
wg.Done()
return
}
...
}
Note that this code creates 2 workers, but it could create 20 or 2000, or even 1. The program functions regardless of how many workers are in the pool. It can handle any volume of input (though interminable input of course leads to an interminable program). It does not create a cyclic loop of output to input. If your use case requires jobs to create more jobs, that's a more challenging scenario that can typically be avoided with careful planning.
I hope this gives you some good ideas about how you can better use concurrency in your Go applications.
https://play.golang.wiki/p/cZuI9YXypxI

context without channels in the same thread of execution

Can't figure out how I can cancel a task if it takes to much to time compute in the same thread of execution via context semantics?
I use this example as a reference point
https://golang.org/src/context/context_test.go
The goal here call a doWork, if doWork takes to much time to compute, GetValueWithDeadline should after a timeout return 0, or if caller called cancel that cancel a wait, (here it main is caller) or the value returned in in give a time window.
The same scenario can be done In a different way. ( separate goroutine sleep, wakeup check value etc, condition on a mutex, etc) but I really want to understand the correct way to use context.
The channel semantic I understand but here I can't achieve the desired effect, the default case
call to a doWork fault under default case and sleep.
package main
import (
"context"
"fmt"
"log"
"math/rand"
"sync"
"time"
)
type Server struct {
lock sync.Mutex
}
func NewServer() *Server {
s := new(Server)
return s
}
func (s *Server) doWork() int {
s.lock.Lock()
defer s.lock.Unlock()
r := rand.Intn(100)
log.Printf("Going to nap for %d", r)
time.Sleep(time.Duration(r) * time.Millisecond)
return r
}
// I take an example from here and it very unclear where is do work executed
// https://golang.org/src/context/context_test.go
func (s *Server) GetValueWithDeadline(ctx context.Context) int {
val := 0
select {
case <- time.After(150 * time.Millisecond):
fmt.Println("overslept")
return 0
case <- ctx.Done():
fmt.Println(ctx.Err())
return 0
default:
val = s.doWork()
}
return all
}
func main() {
rand.Seed(time.Now().UTC().UnixNano())
s := NewServer()
for i :=0; i < 10; i++ {
d := time.Now().Add(50 * time.Millisecond)
ctx, cancel := context.WithDeadline(context.Background(), d)
log.Print(s.GetValueWithDeadline(ctx))
cancel()
}
}
Thank you
There are multiple problems with your approach.
What problem contexts solve
First, the primary reason contexts were invented in Go is that they allow to unify an approach to cancellation of a set of tasks.
To explain this concept using a simple example, consider a client request to some sever; to simplify further let it be an HTTP request.
The client connects to the server, sends some data telling the server what to do to fulfill the request and then waits for the server to respond.
Let's now suppose the request requires elaborate and time-consuming processing on the server — for instance, suppose it needs to perform multiple complex queries to multiple remote database engines, do multiple HTTP requests to external services and then process the acquired results to actually produce the data the client wants.
So the client starts its request and the server goes on with all those requests.
To hide latency of individual tasks the server has to perform to fulfill the request, it runs them in separate goroutines.
Once each goroutine completes the assigned task, it communicates its result (and/or an error) back to the goroutine which handles the client's request, and so on.
Now suppose that the client fails to wait for the response to its request for whatever reason — a network outage, an explicit timeout in the client's software, the user kills the app which initiated the request etc, — there are lots of possibilities.
As you can see, there's little sense for the server to continue spending resources to finish the tasks which were logically bound to the now-dead request: there's no one to hear back the result anyway.
So it makes sense to reap those tasks once we know the request is not going to be completed, and that's where contexts come into play: you can associate each incoming request with a single context and then either pass it itself to any goroutine spawned to carry out a single task required to be done to fulfill the request, or derive another request from that and pass it instead.
Then, as soon as you cancel the "root" request, that signal is propagated through the whole tree of requests derived from the root one.
Now each goroutine which were given a context, might "listen" on it to be notified when that cancellation signal is sent, and once the goroutine notices that it might drop whatever it was busy doing and exit.
In terms of actual context.Context type that signal is called "done" — as in "we're done doing whatever that context is assotiated with", — and that's why the goroutine which wants to know it should stop doing its work listens on a special channel returned by the context's method called Done.
Back to your example
To make it work, you'd do something like:
func (s *Server) doWork(ctx context.Context) int {
s.lock.Lock()
defer s.lock.Unlock()
r := rand.Intn(100)
log.Printf("Going to nap for %d", r)
select {
case <- time.After(time.Duration(r) * time.Millisecond):
return r
case <- ctx.Done():
return -1
}
}
func (s *Server) GetValueWithTimeout(ctx context.Context, maxTime time.Duration) int {
d := time.Now().Add(maxTime)
ctx, cancel := context.WithDeadline(ctx, d)
defer cancel()
return s.doWork(ctx)
}
func main() {
const maxTime = 50 * time.Millisecond
rand.Seed(time.Now().UTC().UnixNano())
s := NewServer()
for i :=0; i < 10; i++ {
v := s.GetValueWithTimeout(context.Background(), maxTime)
log.Print(v)
}
}
(Playground).
So what happens here?
The GetValueWithTimeout method accepts the maximum time it should take the doWork method to produce a value, calculates the deadline, derives a context which cancels itself once the deadline passes from the context passed to the method and calls doWork with the new context object.
The doWork method arms its own timer to go off after a random time interval and then listens on both the context and the timer.
This one is the critical point: the code which performs some unit of work which is supposed to be cancellable must check the context to become "done" actively, by itself.
So, in our toy example, either the doWork's own timer fires first or the deadline of the generated context gets reached first; whatever happens first, makes the select statement unblock and proceed.
Note that if your "do the work" code wold be more involved — it would actually do something instead of sleeping, — you would most probably need to check on the context's status periodically, usually after performing invividual bits of that work.

How to do idiomatic synchronization with time.After?

I'm writing an application that queues incoming requests. If a request has been on the queue for more than a certain amount of time, I'd like to throw a timeout. I'm doing that with time.After:
timeoutCh := time.After(5 * time.Second)
select {
case <-timeoutCh:
//throw timeout 504
case <-processing:
//process request
}
The processing channel (along with the request) is put on the queue, and when a request is taken off to be processed, I send a signal to the channel to hit the case statement:
processing <- true
The problem with this is that if timeoutCh has already been selected, the processing channel will block, so I need some way to check whether the request has timed out.
I considered using a shared atomic boolean, but if I do something like this:
case <-timeoutCh:
requestTimedOut = true
and then check the boolean before sending to the processing channel, there's still a race condition, because the timeoutCh case may have been selected, but the bool not yet set to true!
Is there an idiomatic way of dealing with this sort of synchronization problem in Go?
Use a mutex coordinate processing of the data and timeout.
Define a type to hold the mutex, input, result, a channel to signal completion of the work and a flag indicating that the work, if any, is complete.
type work struct {
sync.Mutex
input InputType
result ResultType
signal chan struct {}
done bool
}
The request handler creates and enqueues a work item and waits for a timeout or a signal from the queue processor. Either way, the request handler checks to see if the queue processor did the work and responds as appropriate.
func handler(resp http.ResponseWriter, req *http.Request) {
w := &queueElement{
input: computeInputFromRequest(req)
signal: make(chan struct{})
}
enqueue(w)
// Wait for timeout or for queue processor to signal that the work is complete.
select {
case <-time.After(5 * time.Second):
case <-w.signal:
}
w.Lock()
done := w.done // Record state of the work item.
w.done = true // Mark the work item as complete.
w.Unlock()
if !done {
http.Error(w, "Timeout", http.StatusGatewayTimeout)
} else {
respondWithResult(resp, w.result)
}
}
The queue processor will look something like this:
for {
w := dequeue()
w.Lock()
if !w.done {
w.done = true
w.result = computeResultFromInput(w.input)
close(w.signal)
}
w.Unlock()
}
To ensure that the request handler waits on the result, the queue processor holds the lock while processing the work item.

Concurrent queue which returns channels, locking doubts

There is queue of not important structs Message, which has the classic push and pop methods:
type Queue struct {
messages list.List
}
//The implementation is not relevant for the sake of the question
func (q *Queue) Push(msg Message) { /*...*/ }
func (q *Queue) Pop() (Message, bool) { /*...*/ }
/*
* NewTimedChannel runs a goroutine which pops a message from the queue every
* given time duration and sends it over the returned channel
*/
func (q *Queue) NewTimedChannel(t time.Duration) (<-chan Message) {/*...*/}
The client of the Push function will be a web gui in which users will post their messages.
The client of the channel returned by NewTimedChannel will be a service which sends each message to a not relevant endpoint over the network.
I'm a newbie in concurrency and go and I have the following question:
I know that since Queue.messages is a shared state between the main goroutine which deals with pushing the message after the user submit a web form and the ones created for each NewTimedChannel invocation, I need to lock it.
Do I need to lock and unlock using the sync.Mutex in all the Push, Pop and NewTimedChannel methods?
And is there a more idiomatic way to handle this specific problem in the go environment?
As others have pointed out, it requires synchronization or there will be a data race.
There is a saying in Go, "Don't communicate by sharing memory, share memory by communicating." As in this case, I think an idomatic way is to make channels send to a seprate goroutine which synchronize all the operations together using select. The code can easily be extended by adding more channels to support more kinds of operations (like the timed channel in your code which I don't fully understand what does it do), and by using select and other utils, it can easily handle more complex synchronizing by using locks. I write some sample code:
type SyncQueue struct {
Q AbsQueue
pushCh,popMsgCh chan Message
popOkCh chan bool
popCh chan struct{}
}
// An abstract of the Queue type. You can remove the abstract layer.
type AbsQueue interface {
Push(Message)
Pop() (Message,bool)
}
func (sq SyncQueue) Push(m Message) {
sq.pushCh <- m
}
func (sq SyncQueue) Pop() (Message,bool) {
sq.popCh <- struct{}{} // send a signal for pop. struct{}{} cost no memory at all.
return <-sq.popMsgCh,<-sq.popOkCh
}
// Every pop and push get synchronized here.
func (sq SyncQueue) Run() {
for {
select {
case m:=<-pushCh:
Q.Push(m)
case <-popCh:
m,ok := Q.Pop()
sq.popMsgCh <- m
sq.popOkCh <- ok
}
}
}
func NewSyncQueue(Q AbsQueue) *SyncQueue {
sq:=SyncQueue {
Q:Q,
pushCh: make(chan Message),popMsgCh: make(chan Message),
pushOkCh: make(chan bool), popCh: make(chan struct{}),
}
go sq.Run()
return &sq
}
Note that for simpilicity, I did not use a quit channel or a context.Context, so the goroutine of sq.Run() has no way of exiting and would cause a memory leak.
Do I need to lock and unlock using the sync.Mutex in all the Push, Pop and NewTimedChannel methods?
Yes.
And is there a more idiomatic way to handle this specific problem in
the go environment?
For insight, have a look at the last answer for this question:
How do I (succinctly) remove the first element from a slice in Go?

How to use channels to safely synchronise data in Go

Below is an example of how to use mutex lock in order to safely access data. How would I go about doing the same with the use of CSP (communication sequential processes) instead of using mutex lock’s and unlock’s?
type Stack struct {
top *Element
size int
sync.Mutex
}
func (ss *Stack) Len() int {
ss.Lock()
size := ss.size
ss.Unlock()
return size
}
func (ss *Stack) Push(value interface{}) {
ss.Lock()
ss.top = &Element{value, ss.top}
ss.size++
ss.Unlock()
}
func (ss *SafeStack) Pop() (value interface{}) {
ss.Lock()
size := ss.size
ss.Unlock()
if size > 0 {
ss.Lock()
value, ss.top = ss.top.value, ss.top.next
ss.size--
ss.Unlock()
return
}
return nil
}
If you actually were to look at how Go implements channels, you'd essentially see a mutex around an array with some additional thread handling to block execution until the value is passed through. A channel's job is to move data from one spot in memory to another with ease. Therefore where you have locks and unlocks, you'd have things like this example:
func example() {
resChan := make(int chan)
go func(){
resChan <- 1
}()
go func(){
res := <-resChan
}
}
So in the example, the first goroutine is blocked after sending the value until the second goroutine reads from the channel.
To do this in Go with mutexes, one would use sync.WaitGroup which will add one to the group on setting the value, then release it from the group and the second goroutine will lock and then unlock the value.
The oddities in your example are 1 no goroutines, so it's all happening in a single main goroutine and the locks are being used more traditionally (as in c thread like) so channels won't really accomplish anything. The example you have would be considered an anti-pattern, like the golang proverb says "Don't communicate by sharing memory, share memory by communicating."

Resources