I have a Go RPC server that serves client requests. A client requests work (or task) from the server and the server assigns a task to the client. The server expects workers (or clients) to finish any task within a time limit. Therefore a timeout event callback mechanism is required on the server-side.
Here is what I tried so far.
func (l *Listener) RequestHandler(request string, reply string) error {
// some other work
// ....
_timer := time.NewTimer(time.Second * 5) // timer for 2 seconds
go func() {
// simulates a client not replying case, with timeout of 2 sec
y := <-_timer.C
fmt.Println("TimeOut for client")
// revert state changes becasue of client fail
}()
// set reply
// update some states
return nil
}
In the above snippet for each request from a worker (or a client) the handler in the server-side starts a timer and a goroutine. The goroutine reverts the changes done by the handler function before sending a reply to the client.
Is there any way of creating a "set of timers" and blocking wait on the "set of timers" ? Further, whenever a timer expires the blocking wait wakes up and provides us with the timer handles. Depending on the timer type we can perform different expiry handler functions in the runtime.
I am trying to implement a similar mechanism in Go that we can implement in C++ with timerfd with epoll.
Full code for the sample implementation of timers in Go. server.go and client.go.
I suggest you to explored the context package
it can be be done like this:
func main() {
c := context.Background()
wg := &sync.WaitGroup{}
f(c, wg)
wg.Wait()
}
func f(c context.Context, wg *sync.WaitGroup) {
c, _ = context.WithTimeout(c, 3*time.Second)
wg.Add(1)
go func(c context.Context) {
defer wg.Done()
select {
case <-c.Done():
fmt.Println("f() Done:", c.Err())
return
case r := <-time.After(5 * time.Second):
fmt.Println("f():", r)
}
}(c)
}
basically you initiate a base context and then derive other contexts from it, when a context is terminated, either by passing the time or a call to its close, it closes its Done channel and the Done channel of all the contexts that are derived from it.
Related
In a project the program receives data via websocket. This data needs to be processed by n algorithms. The amount of algorithms can change dynamically.
My attempt is to create some pub/sub pattern where subscriptions can be started and canceled on the fly. Turns out that this is a bit more challenging than expected.
Here's what I came up with (which is based on https://eli.thegreenplace.net/2020/pubsub-using-channels-in-go/):
package pubsub
import (
"context"
"sync"
"time"
)
type Pubsub struct {
sync.RWMutex
subs []*Subsciption
closed bool
}
func New() *Pubsub {
ps := &Pubsub{}
ps.subs = []*Subsciption{}
return ps
}
func (ps *Pubsub) Publish(msg interface{}) {
ps.RLock()
defer ps.RUnlock()
if ps.closed {
return
}
for _, sub := range ps.subs {
// ISSUE1: These goroutines apparently do not exit properly...
go func(ch chan interface{}) {
ch <- msg
}(sub.Data)
}
}
func (ps *Pubsub) Subscribe() (context.Context, *Subsciption, error) {
ps.Lock()
defer ps.Unlock()
// prep channel
ctx, cancel := context.WithCancel(context.Background())
sub := &Subsciption{
Data: make(chan interface{}, 1),
cancel: cancel,
ps: ps,
}
// prep subsciption
ps.subs = append(ps.subs, sub)
return ctx, sub, nil
}
func (ps *Pubsub) unsubscribe(s *Subsciption) bool {
ps.Lock()
defer ps.Unlock()
found := false
index := 0
for i, sub := range ps.subs {
if sub == s {
index = i
found = true
}
}
if found {
s.cancel()
ps.subs[index] = ps.subs[len(ps.subs)-1]
ps.subs = ps.subs[:len(ps.subs)-1]
// ISSUE2: close the channel async with a delay to ensure
// nothing will be written to the channel anymore
// via a pending goroutine from Publish()
go func(ch chan interface{}) {
time.Sleep(500 * time.Millisecond)
close(ch)
}(s.Data)
}
return found
}
func (ps *Pubsub) Close() {
ps.Lock()
defer ps.Unlock()
if !ps.closed {
ps.closed = true
for _, sub := range ps.subs {
sub.cancel()
// ISSUE2: close the channel async with a delay to ensure
// nothing will be written to the channel anymore
// via a pending goroutine from Publish()
go func(ch chan interface{}) {
time.Sleep(500 * time.Millisecond)
close(ch)
}(sub.Data)
}
}
}
type Subsciption struct {
Data chan interface{}
cancel func()
ps *Pubsub
}
func (s *Subsciption) Unsubscribe() {
s.ps.unsubscribe(s)
}
As mentioned in the comments there are (at least) two issues with this:
ISSUE1:
After a while of running in implementation of this I get a few errors of this kind:
goroutine 120624 [runnable]:
bm/internal/pubsub.(*Pubsub).Publish.func1(0x8586c0, 0xc00b44e880, 0xc008617740)
/home/X/Projects/bm/internal/pubsub/pubsub.go:30
created by bookmaker/internal/pubsub.(*Pubsub).Publish
/home/X/Projects/bm/internal/pubsub/pubsub.go:30 +0xbb
Without really understanding this it appears to me that the goroutines created in Publish() do accumulate/leak. Is this correct and what am I doing wrong here?
ISSUE2:
When I end a subscription via Unsubscribe() it occurs that Publish() tried to write to a closed channel and panics. To mitigate this I have created a goroutine to close the channel with a delay. This feel really off-best-practice but I was not able to find a proper solution to this. What would be a deterministic way to do this?
Heres a little playground for you to test with: https://play.golang.org/p/K-L8vLjt7_9
Before diving into your solution and its issues, let me recommend again another Broker approach presented in this answer: How to broadcast message using channel
Now on to your solution.
Whenever you launch a goroutine, always think of how it will end and make sure it does if the goroutine is not ought to run for the lifetime of your app.
// ISSUE1: These goroutines apparently do not exit properly...
go func(ch chan interface{}) {
ch <- msg
}(sub.Data)
This goroutine tries to send a value on ch. This may be a blocking operation: it will block if ch's buffer is full and there is no ready receiver on ch. This is out of the control of the launched goroutine, and also out of the control of the pubsub package. This may be fine in some cases, but this already places a burden on the users of the package. Try to avoid these. Try to create APIs that are easy to use and hard to misuse.
Also, launching a goroutine just to send a value on a channel is a waste of resources (goroutines are cheap and light, but you shouldn't spam them whenever you can).
You do it because you don't want to get blocked. To avoid blocking, you may use a buffered channel with a "reasonable" high buffer. Yes, this doesn't solve the blocking issue, in only helps with "slow" clients receiving from the channel.
To "truly" avoid blocking without launching a goroutine, you may use non-blocking send:
select {
case ch <- msg:
default:
// ch's buffer is full, we cannot deliver now
}
If send on ch can proceed, it will happen. If not, the default branch is chosen immediately. You have to decide what to do then. Is it acceptable to "lose" a message? Is it acceptable to wait for some time until "giving up"? Or is it acceptable to launch a goroutine to do this (but then you'll be back at what we're trying to fix here)? Or is it acceptable to get blocked until the client can receive from the channel...
Choosing a reasonable high buffer, if you encounter a situation when it still gets full, it may be acceptable to block until the client can advance and receive from the message. If it can't, then your whole app might be in an unacceptable state, and it might be acceptable to "hang" or "crash".
// ISSUE2: close the channel async with a delay to ensure
// nothing will be written to the channel anymore
// via a pending goroutine from Publish()
go func(ch chan interface{}) {
time.Sleep(500 * time.Millisecond)
close(ch)
}(s.Data)
Closing a channel is a signal to the receiver(s) that no more values will be sent on the channel. So always it should be the sender's job (and responsibility) to close the channel. Launching a goroutine to close the channel, you "hand" that job and responsibility to another "entity" (a goroutine) that will not be synchronized to the sender. This may easily end up in a panic (sending on a closed channel is a runtime panic, for other axioms see How does a non initialized channel behave?). Don't do that.
Yes, this was necessary because you launched goroutines to send. If you don't do that, then you may close "in-place", without launching a goroutine, because then the sender and closer will be the same entity: the Pubsub itself, whose sending and closing operations are protected by a mutex. So solving the first issue solves the second naturally.
In general if there are multiple senders for a channel, then closing the channel must be coordinated. There must be a single entity (often not any of the senders) that waits for all senders to finish, practically using a sync.WaitGroup, and then that single entity can close the channel, safely. See Closing channel of unknown length.
Timeout handler moves ServeHTTP execution on a new goroutine, but not able to kill that goroutine after the timer ends. On every request, it creates two goroutines, but ServeHTTP goroutines never kill with context.
Not able to find a way to kill goroutines.
Edit For-loop with time.Sleep function, represents huge computation which goes beyond our timer. Can replace it with any other function.
package main
import (
"fmt"
"io"
"net/http"
"runtime"
"time"
)
type api struct{}
func (a api) ServeHTTP(w http.ResponseWriter, req *http.Request) {
// For-loop block represents huge computation and usually takes more time
// Can replace with any code
i := 0
for {
if i == 500 {
break
}
fmt.Printf("#goroutines: %d\n", runtime.NumGoroutine())
time.Sleep(1 * time.Second)
i++
}
_, _ = io.WriteString(w, "Hello World!")
}
func main() {
var a api
s := http.NewServeMux()
s.Handle("/", a)
h := http.TimeoutHandler(s, 1*time.Second, `Timeout`)
fmt.Printf("#goroutines: %d\n", runtime.NumGoroutine())
_ = http.ListenAndServe(":8080", h)
}
ServeHTTP goroutine should kill along with request context, normally which does not happen.
Use context.Context to instruct go-routines to abort their function. The go-routines, of course, have to listen for such cancelation events.
So for your code, do something like:
ctx := req.Context() // this will be implicitly canceled by your TimeoutHandler after 1s
i := 0
for {
if i == 500 {
break
}
// for any long wait (1s etc.) always check the state of your context
select {
case <-time.After(1 * time.Second): // no cancelation, so keep going
case <-ctx.Done():
fmt.Println("request context has been canceled:", ctx.Err())
return // terminates go-routine
}
i++
}
Playground: https://play.golang.org/p/VEnW0vsItXm
Note: Context are designed to be chained - allowing for multiple levels of sub-tasks to be canceled in a cascading manner.
In a typical REST call one would initiate a database request. So, to ensure such a blocking and/or slow call completes in a timely manner, instead of using Query one should use QueryContext - passing in the http request's context as the first argument.
I found, if you do not have any way to reach to your channel then there is no way to kill or stop goroutine when it is running.
In the large computational task, you have to watch the channel on a specific interval or after specific task completion.
Got some help here already that made me move forward in this concept I'm trying, but it still isn't quite working and I hit a conflict that I can't seem to get around.
I have tried here to illustrate what I want in a flowchart - please note that the client can be many clients that will send in with printjobs therefore we cannot reply on the worker to be processing our job at that moment, but for most it will (peak times it won't as the processing job of printing can take time).
type QueueElement struct {
jobid string
rw http.ResponseWriter
doneChan chan struct{}
}
type GlobalVars struct {
db *sql.DB
wg sync.WaitGroup
jobs chan QueueElement
}
func (gv *GlobalVars) ServeHTTP(w http.ResponseWriter, r *http.Request) {
switch r.URL.Path {
case "/StartJob":
fmt.Printf ("incoming\r\n")
doneC := make(chan struct{}, 1) //Buffered channel in order not to block the worker routine
newPrintJob := QueueElement{
doneChan: doneC,
jobid: "jobid",
}
gv.jobs <- newPrintJob
func(doneChan chan struct{},w http.ResponseWriter) {
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
select {
//If this triggers first, then this waiting goroutine would exit
//and nobody would be listeding the 'doneChan'. This is why it has to be buffered.
case <-ctx.Done():
fmt.Fprintf(w, "job is taking more than 5 seconds to complete\r\n")
fmt.Printf ("took longer than 5 secs\r\n")
case <-doneChan:
fmt.Fprintf(w, "instant reply from serveHTTP\r\n")
fmt.Printf ("instant\r\n")
}
}(doneC,w)
default:
fmt.Fprintf(w, "No such Api")
}
}
func worker(jobs <-chan QueueElement) {
for {
job := <-jobs
processExec ("START /i /b try.cmd")
fmt.Printf ("job done")
// processExec("START /i /b processAndPrint.exe -" + job.jobid)
job.doneChan <- struct{}{}
}
}
func main() {
db, err := sql.Open("sqlite3", "jobs.sqlite")
if err := db.Ping(); err != nil {
log.Fatal(err)
}
db.SetMaxOpenConns(1) // prevents locked database error
_, err = db.Exec(setupSQL)
if err != nil {
log.Fatal(err)
}
// create a GlobalVars instance
gv := GlobalVars{
db : db,
jobs: make(chan QueueElement),
}
go worker (gv.jobs)
// create an http.Server instance and specify our job manager as
// the handler for requests.
server := http.Server{
Handler: &gv,
Addr : ":8888",
}
// start server and accept connections.
log.Fatal(server.ListenAndServe())
}
The above code is of the serveHTTP and the worker with help from here, initially the func inside the ServeHTTP was a go routine and it's here the whole conflict arises for me - the concept is that in the serveHTTP it spawns a process that will get reply from the worker if the worker were able to process the job in time, within 5 seconds.
If the job was able to finish in 1 second, I want to reply back instantly after that 1 second to the client, if it takes 3 I want to reply after 3, if it takes more than 5 I will send reply back after 5 seconds (if the job takes 13 seconds I still want to reply after 5 seconds). That the client has to poll on the job from now on - but the conflict is:
a) When ServeHTTP exits - then the ResponseWriter closes - and to be able to reply back to the client, we have to write the answer to the ResponseWriter.
b) if I block serveHTTP (like in the code example below where I don't call the func as a go routine) then it affects not only that single API call but it seems that all others after that will be affected, so the first call in will be served correctly in time but the calls that would be coming in at the same time after the first will be sequentially delayed by the blocking operation.
c) if I don't block it ex - and change it to a go routine :
gv.jobs <- newPrintJob
go func(doneChan chan struct{},w http.ResponseWriter) {
Then there's no delay - can call in many APIs - but problem is here serveHTTP exists right away and thereby kills ResponseWriter and then I'm not able to reply back to the client.
I am not sure how I can get around this conflict where I don't cause any blockage to serveHTTP so I can handle all requests parallel, but still able to reply back to the ResponseWriter in question.
Is there any way of preventing serveHTTP of shutting down a responsewriter even though the function exits?
Yes, you are right with your point "c) if i don't block it ex".
In order to save the response writer, you shouldn't call a go routine inside it. Rather you should call ServeHTTP as a go-routine, which most of the http server implementations do.
This way you won't be blocking any api calls, each api call will run in a different go-routine, blocked by their functionalities.
Since your "jobs chan QueueElement" is a single channel (not a buffered channel), so all your processes get blocked at "gv.jobs <- newPrintJob".
You should rather use a buffered channel so that all api calls can add it to the queue and get response depending on work completion or time out.
Having a buffered channel simulates the real world memory limit of printers too. (queue length 1 being a special case)
I've add some updates to your code. Now it works as you've described.
package main
import (
"database/sql"
"fmt"
"log"
"math/rand"
"net/http"
"sync"
"time"
)
type QueueElement struct {
jobid string
rw http.ResponseWriter
doneChan chan struct{}
}
type GlobalVars struct {
db *sql.DB
wg sync.WaitGroup
jobs chan QueueElement
}
func (gv *GlobalVars) ServeHTTP(w http.ResponseWriter, r *http.Request) {
switch r.URL.Path {
case "/StartJob":
fmt.Printf("incoming\r\n")
doneC := make(chan struct{}, 1) //Buffered channel in order not to block the worker routine
go func(doneChan chan struct{}, w http.ResponseWriter) {
gv.jobs <- QueueElement{
doneChan: doneC,
jobid: "jobid",
}
}(doneC, w)
select {
case <-time.Tick(time.Second * 5):
fmt.Fprintf(w, "job is taking more than 5 seconds to complete\r\n")
fmt.Printf("took longer than 5 secs\r\n")
case <-doneC:
fmt.Fprintf(w, "instant reply from serveHTTP\r\n")
fmt.Printf("instant\r\n")
}
default:
fmt.Fprintf(w, "No such Api")
}
}
func worker(jobs <-chan QueueElement) {
for {
job := <-jobs
fmt.Println("START /i /b try.cmd")
fmt.Printf("job done")
randTimeDuration := time.Second * time.Duration(rand.Intn(7))
time.Sleep(randTimeDuration)
// processExec("START /i /b processAndPrint.exe -" + job.jobid)
job.doneChan <- struct{}{}
}
}
func main() {
// create a GlobalVars instance
gv := GlobalVars{
//db: db,
jobs: make(chan QueueElement),
}
go worker(gv.jobs)
// create an http.Server instance and specify our job manager as
// the handler for requests.
server := http.Server{
Handler: &gv,
Addr: ":8888",
}
// start server and accept connections.
log.Fatal(server.ListenAndServe())
}
select statement should be outside the goroutine function and blocks request till the end of the job execution or reaching timeout.
I'm implementing a feature where I need to read files from a directory, parse and export them to a REST service at a regular interval. As part of the same I would like to gracefully handle the program termination (SIGKILL, SIGQUIT etc).
Towards the same I would like to know how to implement Context based cancellation of process.
For executing the flow in regular interval I'm using gocron.
cmd/scheduler.go
func scheduleTask(){
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
s := gocron.NewScheduler()
s.Every(10).Minutes().Do(processTask, ctx)
s.RunAll() // run immediate
<-s.Start() // schedule
for {
select {
case <-(ctx).Done():
fmt.Print("context done")
s.Remove(processTask)
s.Clear()
cancel()
default:
}
}
}
func processTask(ctx *context.Context){
task.Export(ctx)
}
task/export.go
func Export(ctx *context.Context){
pendingFiles, err := filepath.Glob("/tmp/pending/" + "*_task.json")
//error handling
//as there can be 100s of files, I would like to break the loop when context.Done() to return asap & clean up the resources here as well
for _, fileName := range pendingFiles {
exportItem(fileName string)
}
}
func exportItem(fileName string){
data, err := ReadFile(fileName) //not shown here for brevity
//err handling
err = postHTTPData(string(data)) //not shown for brevity
//err handling
}
For the process management, I think the other component is the actual handling of signals, and managing the context from those signals.
I'm not sure of the specifics of go-cron (they have an example showing some of these concepts on their github) but in general I think that the steps involved are:
Registration of os signals handler
Waiting to receive a signal
Canceling top level context in response to a signal
Example:
sigCh := make(chan os.Signal, 1)
defer close(sigCh)
signal.Notify(sigCh, syscall.SIGTERM, syscall.SIGQUIT, syscall.SIGINT)
<-sigCh
cancel()
I'm not sure how this will look in the context of go-cron, but the context that the signal handling code cancels should be a parent of the context that the task and job is given.
Worked this out myself just now. I've always felt the blog post on contexts was A LOT of material to try and understand so a simpler demonstration would be nice.
There are many scenarios you may encounter. Each one is different and will require adaptation. Here's one example:
Say you have a channel that could run for an indeterminate amount of time.
indeterminateChannel := make(chan string)
for s := range indeterminateChannel{
fmt.Println(s)
}
Your producer might look something like:
for {
indeterminateChannel <- "Terry"
}
We don't control the producer, so we need someway to cut out of your print loop if the producer exceeds your time limit.
indeterminateChannel := make(chan string)
// Close the channel when our code exits so OUR for loop no longer occupies
// resources and the goroutine exits.
// The producer will have a problem, but we don't care about their problems.
// In this instance.
defer close(indeterminateChannel)
// we wait for this context to time out below the goroutine.
ctx, cancel := context.WithTimeout(context.TODO(), time.Minute*1)
defer cancel()
go func() {
for s := range indeterminateChannel{
fmt.Println(s)
}
}()
<- ctx.Done // wait for the context to terminate based on a timeout.
You can also check ctx.Err to see if the context exited due to a timeout or because it was canceled.
You might also want to learn about how to properly check if the context failed due to a deadline: How to check if an error is "deadline exceeded" error?
Or if the context was canceled: How to check if a request was cancelled
is there a possible in go to defer a go routine, or a way to achieve the desired behaviour? The following background: I am pooling connections to a database in a channel. Basically in a handler I call
session, err := getSessionFromQueue()
// ...
// serving content to my client
// ...
go queueSession(session)
What I really would like to do is:
session, err := getSessionFromQueue()
defer go queueSession(session)
// ...
// serving content to my client
// ...
to avoid that my handler is hanging/crashing at some point and the session is not properly returned to the queue. The reason I want to run it as a go routine is that queueSession is potentially blocking for 1 second (in case the queue is full I wait for one second before I completely close the session).
Update
#abhink got me on the right track there. I solved the problem by putting the call to a goroutine in queueBackend.
func queueSession(mongoServer *Server) {
go func(mongoServer *Server) {
select {
case mongoQueue <- mongoServer:
// mongoServer stored in queue, done.
case <- time.After(1 * time.Second):
// cannot queue for whatever reason after 1 second
// abort
mongoServer.Close()
}
}(mongoServer)
}
Now I can simply call
defer queueSession(session)
and it is run as a goroutine.
There is no way to directly defer a goroutine. You can try something like this:
session, err := getSessionFromQueue()
defer func() {
go queueSession(session)
}()