I have a node that I want to keep running until I decide to stop it. My Start method currently blocks on the context's Done channel, and my Stop method calls the cancel function. In my tests, Start hangs forever and Stop is never called. I can't work out why the Done channel is never closed to stop my node.
var (
// Ctx is the node's main context.
Ctx, cancel = context.WithCancel(context.Background())
// Cancel is a function used to initiate graceful shutdown.
Cancel = cancel
)
type (
Node struct {
database *store.Store
}
)
// Start runs the Node.
func (n *Node) Start() error {
var nodeError error
defer func() {
err := n.database.Close()
if err != nil {
nodeError = err
}
}()
<-Ctx.Done()
return nodeError
}
// Stop stops the node.
func (n *Node) Stop() {
Cancel()
}
And my test is:
func TestNode_Start(t *testing.T) {
n, _ := node.NewNode("1.0")
err := n.Start()
n.Stop()
assert.NoError(t, err)
}
There are several problems with your code. Let's break it down.
var (
// Ctx is the node's main context.
Ctx, cancel = context.WithCancel(context.Background())
// Cancel is a function used to initiate graceful shutdown.
Cancel = cancel
)
These should not be package variables. They should be instance variables, that is to say, members of the Node struct. If they are package variables and you have multiple tests that use Node, the tests will all step on each other's toes, causing race conditions and crashes. So instead, do this:
type Node struct {
database *store.Store
ctx context.Context
cancel context.CancelFunc
}
Next, we see that you have a deferred function in your Start() method:
// Start runs the Node.
func (n *Node) Start() error {
var nodeError error
defer func() {
err := n.database.Close()
if err != nil {
nodeError = err
}
}()
/* snip */
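This does not do what you expect. It closes the database connection as soon as Start() returns, before anything has a chance to use it. Also, because nodeError is not a named return value, assigning to it inside the deferred function cannot change what Start() returns.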
Instead, you should close the database connection as part of your Stop() method:
// Stop stops the node.
func (n *Node) Stop() error {
n.cancel()
return n.database.Close()
}
And finally, your Start() method blocks because it waits for the context to be canceled, and the context cannot be canceled until Stop() is called, which in your test only happens after Start() returns:
func (n *Node) Start() error {
/* snip */
<-Ctx.Done()
return nodeError
}
I cannot think of any reason to have <-Ctx.Done() in Start() at all, so I would just remove it.
With all of my suggested changes, you should have something like this:
type Node struct {
database *store.Store
ctx context.Context
cancel context.CancelFunc
}
// Start runs the Node.
func (n *Node) Start() {
n.ctx, n.cancel = context.WithCancel(context.Background())
}
// Stop stops the node.
func (n *Node) Stop() error {
n.cancel()
return n.database.Close()
}
Of course, this still leaves open the question of if/where/how ctx is used. Since your original code didn't include that, I didn't either.
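To illustrate one way the per-instance context might be used, here is a minimal sketch. It assumes the node has some background loop to run; fakeStore, run, the sync.WaitGroup field, and startAndStop are hypothetical names that are not in your code, and this version keeps only the cancel function on the struct because the context is handed straight to the goroutine.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// fakeStore is a hypothetical stand-in for store.Store.
type fakeStore struct{ closed bool }

func (s *fakeStore) Close() error { s.closed = true; return nil }

type Node struct {
	database *fakeStore
	cancel   context.CancelFunc
	wg       sync.WaitGroup
}

// Start launches the background work and returns immediately.
func (n *Node) Start() {
	ctx, cancel := context.WithCancel(context.Background())
	n.cancel = cancel
	n.wg.Add(1)
	go func() {
		defer n.wg.Done()
		n.run(ctx)
	}()
}

// run is a placeholder loop that works until the context is cancelled.
func (n *Node) run(ctx context.Context) {
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// periodic work would go here
		}
	}
}

// Stop cancels the context, waits for the worker, then closes the database.
func (n *Node) Stop() error {
	n.cancel()
	n.wg.Wait()
	return n.database.Close()
}

// startAndStop exercises the lifecycle the way the test in the question intends.
func startAndStop() (closed bool, err error) {
	n := &Node{database: &fakeStore{}}
	n.Start()
	err = n.Stop()
	return n.database.closed, err
}

func main() {
	closed, err := startAndStop()
	fmt.Println(closed, err) // true <nil>
}
```

Because Start returns right away, a test can call Start and then Stop in sequence without hanging, and each Node owns its own context.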
I have the following code for a module I'm developing, and I'm not sure why the provider.Shutdown() function is never called when I call .Stop().
The main process does stop, but I'm confused about why this doesn't work.
package pluto
import (
"context"
"fmt"
"log"
"sync"
)
type Client struct {
name string
providers []Provider
cancelCtxFunc context.CancelFunc
}
func NewClient(name string) *Client {
return &Client{name: name}
}
func (c *Client) Start(blocking bool) {
log.Println(fmt.Sprintf("Starting the %s service", c.name))
ctx, cancel := context.WithCancel(context.Background())
c.cancelCtxFunc = cancel // assign for later use
var wg sync.WaitGroup
for _, p := range c.providers {
wg.Add(1)
provider := p
go func() {
provider.Setup()
select {
case <-ctx.Done():
// THIS IS NEVER CALLED?!??!
provider.Shutdown()
return
default:
provider.Run(ctx)
}
}()
}
if blocking {
wg.Wait()
}
}
func (c *Client) RegisterProvider(p Provider) {
c.providers = append(c.providers, p)
}
func (c *Client) Stop() {
log.Println("Attempting to stop service")
c.cancelCtxFunc()
}
Client code
package main
import (
"pluto/pkgs/pluto"
"time"
)
func main() {
client := pluto.NewClient("test-client")
testProvider := pluto.NewTestProvider()
client.RegisterProvider(testProvider)
client.Start(false)
time.Sleep(time.Second * 3)
client.Stop()
}
Because the select has already chosen the default case before the context is cancelled. Here is your code, annotated:
// Start a new goroutine
go func() {
provider.Setup()
// Select the first available case
select {
// Is the context cancelled right now?
case <-ctx.Done():
// THIS IS NEVER CALLED?!??!
provider.Shutdown()
return
// No? Then call provider.Run()
default:
provider.Run(ctx)
// Run returned, nothing more to do, we're not in a loop, so our goroutine returns
}
}()
Once provider.Run is called, cancelling the context isn't going to do anything in the code shown. provider.Run also gets the context though, so it is free to handle cancellation as it sees fit. If you want your routine to also see cancellation, you could wrap this in a loop:
go func() {
provider.Setup()
for {
select {
case <-ctx.Done():
// Reached on the next pass through the loop, once Run has returned and the context is cancelled
provider.Shutdown()
return
default:
provider.Run(ctx)
}
}
}()
This way, once provider.Run returns, it will go through the select again, and if the context has been cancelled, that case will be called. However, if the context hasn't been cancelled, it'll call provider.Run again, which may or may not be what you want.
EDIT:
More typically, you'd have one of a couple scenarios, depending on how provider.Run and provider.Shutdown work, which hasn't been made clear in the question, so here are your options:
Shutdown must be called when the context is cancelled, and Run must only be called once:
go func() {
provider.Setup()
go provider.Run(ctx)
go func() {
// Block until the context is cancelled, then shut down.
<-ctx.Done()
provider.Shutdown()
}()
}()
Or Run, which already receives the context, already does the same thing as Shutdown when the context is cancelled, and therefore calling Shutdown when the context is cancelled is wholly unnecessary:
go provider.Run(ctx)
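As a rough, runnable sketch of the first scenario (Run is called once, and Shutdown is called when the context is cancelled), something like the following could work. The testProvider type, its started channel, and the supervise and demo functions are made up for illustration; the sketch also calls wg.Done, which the code in the question never does, so a blocking Start would otherwise wait forever.

```go
package main

import (
	"context"
	"fmt"
	"strings"
	"sync"
)

// testProvider is a hypothetical provider that records what was called.
type testProvider struct {
	mu      sync.Mutex
	events  []string
	started chan struct{}
}

func (p *testProvider) record(e string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.events = append(p.events, e)
}

func (p *testProvider) Setup()    { p.record("setup") }
func (p *testProvider) Shutdown() { p.record("shutdown") }

// Run blocks until the context is cancelled, as a long-running provider would.
func (p *testProvider) Run(ctx context.Context) {
	p.record("run")
	close(p.started)
	<-ctx.Done()
}

// supervise runs the provider once and calls Shutdown on cancellation.
func supervise(ctx context.Context, p *testProvider, wg *sync.WaitGroup) {
	defer wg.Done()
	p.Setup()
	runDone := make(chan struct{})
	go func() {
		defer close(runDone)
		p.Run(ctx)
	}()
	<-ctx.Done() // wait for Stop to cancel the context
	p.Shutdown()
	<-runDone // let Run finish before reporting completion
}

// demo starts one provider, cancels it, and returns the recorded calls.
func demo() string {
	p := &testProvider{started: make(chan struct{})}
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	wg.Add(1)
	go supervise(ctx, p, &wg)
	<-p.started // only cancel once Run is underway
	cancel()
	wg.Wait()
	return strings.Join(p.events, ",")
}

func main() {
	fmt.Println(demo()) // setup,run,shutdown
}
```

Waiting on the WaitGroup after cancelling also gives main a way to know that Shutdown has actually run before the process exits.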
I have the following code in Go using the semaphore library just as an example:
package main
import (
"fmt"
"context"
"time"
"golang.org/x/sync/semaphore"
)
// This protects the lockedVar variable
var lock *semaphore.Weighted
// Only one go routine should be able to access this at once
var lockedVar string
func acquireLock() {
err := lock.Acquire(context.TODO(), 1)
if err != nil {
panic(err)
}
}
func releaseLock() {
lock.Release(1)
}
func useLockedVar() {
acquireLock()
fmt.Printf("lockedVar used: %s\n", lockedVar)
releaseLock()
}
func causeDeadLock() {
acquireLock()
// calling this from a function that's already
// locked the lockedVar should cause a deadlock.
useLockedVar()
releaseLock()
}
func main() {
lock = semaphore.NewWeighted(1)
lockedVar = "this is the locked var"
// this is only on a separate goroutine so that the standard
// go "deadlock" message doesn't print out.
go causeDeadLock()
// Keep the primary goroutine active.
for {
time.Sleep(time.Second)
}
}
Is there a way to get the acquireLock() function call to print a message after a timeout indicating that there is a potential deadlock but without unblocking the call? I would want the deadlock to persist, but a log message to be written in the event that a timeout is reached. So a TryAcquire isn't exactly what I want.
An example of what I want in pseudocode:
afterFiveSeconds := func() {
fmt.Printf("there is a potential deadlock\n")
}
lock.Acquire(context.TODO(), 1, afterFiveSeconds)
The lock.Acquire call in this example would call the afterFiveSeconds callback if the Acquire call blocked for more than 5 seconds, but it would not unblock the caller. It would continue to block.
I think I've found a solution to my problem.
func acquireLock() {
timeoutChan := make(chan bool)
go func() {
select {
case <-time.After(time.Second * time.Duration(5)):
fmt.Printf("potential deadlock while acquiring semaphore\n")
case <-timeoutChan:
break
}
}()
err := lock.Acquire(context.TODO(), 1)
close(timeoutChan)
if err != nil {
panic(err)
}
}
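A possible variation on this, sketched under some assumptions: time.AfterFunc can schedule the warning without a hand-written goroutine and extra channel, and stopping the timer once the lock is acquired suppresses the message. To keep the example standard-library only, a one-slot channel stands in for semaphore.NewWeighted(1); with the real semaphore, the channel send would be the lock.Acquire call. The names acquireWithWarning and demo are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// lock is a one-slot channel used here as a stand-in for a weighted semaphore.
var lock = make(chan struct{}, 1)

// acquireWithWarning blocks until the lock is acquired. If that takes longer
// than warnAfter, warn runs once in its own goroutine; the caller keeps blocking.
func acquireWithWarning(warnAfter time.Duration, warn func()) {
	timer := time.AfterFunc(warnAfter, warn)
	defer timer.Stop() // suppresses the warning if we acquired in time
	lock <- struct{}{}
}

func releaseLock() { <-lock }

// demo reports whether the warning fired for a contended acquire.
func demo() bool {
	var warned int32
	onSlow := func() {
		fmt.Println("potential deadlock while acquiring lock")
		atomic.StoreInt32(&warned, 1)
	}

	// Uncontended: returns at once and the timer is stopped before it fires.
	acquireWithWarning(time.Second, onSlow)

	// The lock is now held; release it from another goroutine after a delay.
	go func() {
		time.Sleep(100 * time.Millisecond)
		releaseLock()
	}()

	// Contended: the warning fires after 20ms, but the call keeps blocking
	// until the lock is released.
	acquireWithWarning(20*time.Millisecond, onSlow)
	releaseLock()
	return atomic.LoadInt32(&warned) == 1
}

func main() {
	fmt.Println(demo())
}
```

Since the callback runs on its own goroutine, it should only do things that are safe to do concurrently, such as logging.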
I'm trying to write a program in Go that is similar to cron, with the addition that jobs are given a max runtime, and if a function exceeds this duration, the job should exit. Here is my whole code:
package main
import (
"fmt"
"log"
"sync"
"time"
)
type Job struct {
ID string
MaxRuntime time.Duration
Frequency time.Duration
Function func()
}
func testFunc() {
log.Println("OPP11")
time.Sleep(7 * time.Second)
log.Println("OP222")
}
func New(ID, frequency, runtime string, implementation func()) Job {
r, err := time.ParseDuration(runtime)
if err != nil {
panic(err)
}
f, err := time.ParseDuration(frequency)
if err != nil {
panic(err)
}
j := Job{ID: ID, MaxRuntime: r, Frequency: f, Function: implementation}
log.Printf("Created job %#v with frequency %v and max runtime %v", ID, f, r)
return j
}
func (j Job) Run() {
for range time.Tick(j.Frequency) {
start := time.Now()
log.Printf("Job %#v executing...", j.ID)
done := make(chan int)
//quit := make(chan int)
//var wg sync.WaitGroup
//wg.Add(1)
go func() {
j.Function()
done <- 0
}()
select {
case <-done:
elapsed := time.Since(start)
log.Printf("Job %#v completed in %v \n", j.ID, elapsed)
case <-time.After(j.MaxRuntime):
log.Printf("Job %#v halted after %v", j.ID, j.MaxRuntime)
// here should exit the above goroutine
}
}
}
func main() {
// create a new job given its name, frequency, max runtime
// and the function it should run
testJob := New("my-first-job", "3s", "5s", func() {
testFunc()
})
testJob.Run()
}
What I'm trying to do is that in the second case in the select of the Run() function, it should exit the goroutine which is running the function. I tried to do this by wrapping the function in a for loop with a select statement which listens on a quit channel like this:
go func() {
for {
select {
case <-quit:
fmt.Println("quiting goroutine")
return
default:
j.Function()
done <- 0
}
}
}()
And then having quit <- 1 in the Run() function, but that doesn't seem to be doing anything. Is there a better way of doing this?
As explained in the comments, the whole problem is that you want to cancel the execution of a function (j.Function) that isn't cancellable.
There's no way to "kill a goroutine". Goroutines work in a cooperative fashion. If you want to be able to "kill it", you need to ensure that the function running in that Goroutine has a mechanism for you to signal that it should stop what it's doing and return, letting the Goroutine that was running it finally terminate.
The standard way of indicating that a function is cancellable is by having it take a context.Context as its first param:
type Job struct {
// ...
Function func(context.Context)
}
Then you create the context and pass it to the j.Function. Since your cancellation logic is simply based on a timeout, there's no need to write all that select ... case <-time.After(...), as that is provided as built-in functionality with a context.Context:
func (j Job) Run() {
for range time.Tick(j.Frequency) {
go j.ExecuteOnce()
}
}
func (j Job) ExecuteOnce() {
log.Printf("Job %#v executing...", j.ID)
ctx, cancel := context.WithTimeout(context.Background(), j.MaxRuntime)
defer cancel()
j.Function(ctx)
}
Now, to finish, you have to rewrite the functions that you're going to be passing to your job scheduler so that they take context.Context and, very importantly, that they use it properly and cancel whatever they're doing when the context is cancelled.
This means that if you're writing the code for those funcs and they will somehow block, you'll be responsible for writing stuff like:
select {
case <-ctx.Done():
return ctx.Err()
case ...your blocking case...:
}
If your funcs are invoking 3rd party code, then that code needs to be aware of context and cancellation, and you'll need to pass down the ctx your funcs receive.
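To make that concrete, here is a small sketch of a job function that honours its context, together with a runner that applies the max runtime. It assumes the job function returns an error (the Function field above does not), and slowJob, executeOnce, timedOut, and demo are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowJob pretends to work for workFor, but gives up as soon as ctx is done.
func slowJob(ctx context.Context, workFor time.Duration) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-time.After(workFor):
		return nil
	}
}

// executeOnce runs fn with a deadline of maxRuntime.
func executeOnce(maxRuntime time.Duration, fn func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), maxRuntime)
	defer cancel()
	return fn(ctx)
}

// timedOut reports whether a job needing workFor was halted by maxRuntime.
func timedOut(maxRuntime, workFor time.Duration) bool {
	err := executeOnce(maxRuntime, func(ctx context.Context) error {
		return slowJob(ctx, workFor)
	})
	return errors.Is(err, context.DeadlineExceeded)
}

// demo runs one job that exceeds its max runtime and one that does not.
func demo() (slowHalted, fastHalted bool) {
	slowHalted = timedOut(50*time.Millisecond, 2*time.Second)
	fastHalted = timedOut(2*time.Second, 10*time.Millisecond)
	return slowHalted, fastHalted
}

func main() {
	slowHalted, fastHalted := demo()
	fmt.Println(slowHalted, fastHalted)
}
```

The important part is that slowJob itself selects on ctx.Done(); the timeout only works because the function cooperates.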
I'm relatively new to Golang and am trying to incorporate Contexts into my code.
I see the benefits in terms of cancelling from the parent as well as sharing context-specific stuff (loggers, for example).
Beyond that, I might be missing something, but I can't see a way for a child to cancel the context. The example here would be if one of the child routines encounters an error that means the whole context is done.
Here's some sample code:
package main
import (
"context"
"fmt"
"math/rand"
"os"
"os/signal"
"sync"
"time"
)
func main() {
ctx, cancel := context.WithCancel(context.Background())
// handle SIGINT (control+c)
go func() {
c := make(chan os.Signal, 1)
signal.Notify(c, os.Interrupt)
<-c
fmt.Println("main: interrupt received. cancelling context.")
cancel()
}()
wg := sync.WaitGroup{}
wg.Add(1)
go func() {
child1DoWork(ctx)
wg.Done()
}()
wg.Add(1)
go func() {
child2DoWork(ctx)
wg.Done()
}()
fmt.Println("main: waiting for children to finish")
wg.Wait()
fmt.Println("main: children done. exiting.")
}
func child1DoWork(ctx context.Context) {
// pretend we're doing something useful
tck := time.NewTicker(5 * time.Second)
for {
select {
case <-tck.C:
fmt.Println("child1: still working")
case <-ctx.Done():
// context cancelled
fmt.Println("child1: context cancelled")
return
}
}
}
func child2DoWork(ctx context.Context) {
// pretend we're doing something useful
tck := time.NewTicker(2 * time.Second)
for {
select {
case <-tck.C:
if rand.Intn(5) < 4 {
fmt.Println("child2: did some work")
} else {
// pretend we encountered an error
fmt.Println("child2: error encountered. need to cancel but how do I do it?!?")
// PLACEHOLDER: HOW TO CANCEL FROM HERE?
return
}
case <-ctx.Done():
// context cancelled
fmt.Println("child2: context cancelled")
return
}
}
}
Here you have an example of cancelling from the parent (due to SIGINT) which works great. However, there's a placeholder in child2DoWork where an error is encountered and I want to then cancel the whole context, but I can't see a way to do that with the vanilla context capabilities.
Is this out-of-scope for contexts? Clearly I could communicate from child2 back to the parent which could then cancel, but I'm wondering if there isn't an easier way.
If communication back to the parent is the proper way, is there an idiomatic way of doing this? It does seem like a common problem.
Thanks!
A child can't and shouldn't cancel a context, it's the parent's call. What a child may do is return an error, and the parent should decide if the error requires cancelling the context.
Just because a "subtask" fails doesn't mean all other subtasks need to be cancelled. Often, one failing subtask makes the remaining subtasks more important. Think of a parallel search: you may use multiple subtasks to search for the same thing in multiple sources, take the fastest result, and cancel the slower ones. If one search fails, you do want the rest to continue.
Obviously if you pass the cancel function to the child, the child will have the power to cancel the context. But instead leave that power at the parent.
Is this out-of-scope for contexts? Clearly I could communicate from child2 back to the parent which could then cancel, but I'm wondering if there isn't an easier way.
Yes, this is exactly backwards for contexts. They are explicitly for a caller to cancel. The correct mechanism here is the simplest and most obvious: when child2DoWork encounters an error, it should return an error, and when the caller gets an error back, if the correct response is to cancel other tasks, it can then cancel the appropriate context(s).
Essentially, the child is a task, and it should be isolated from any other tasks. It shouldn't be trying to manage its siblings; the parent should be managing all of its children.
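A minimal sketch of that division of responsibility, assuming each child is written as a function that returns an error: the parent owns the cancel function and decides to stop the siblings on the first failure. The names runAll, waiter, failer, and demo are hypothetical. As far as I know, golang.org/x/sync/errgroup with WithContext packages up roughly this behaviour if you prefer a ready-made helper.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

var errChild2 = errors.New("error in child2")

// waiter blocks until the context is cancelled, then reports why.
func waiter(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// failer fails immediately and simply returns its error to the caller.
func failer(ctx context.Context) error { return errChild2 }

// runAll starts every task, cancels the shared context on the first error,
// and returns that first error once all tasks have finished.
func runAll(parent context.Context, tasks ...func(context.Context) error) error {
	ctx, cancel := context.WithCancel(parent)
	defer cancel()

	errs := make(chan error, len(tasks)) // buffered so no task blocks on send
	for _, task := range tasks {
		go func(task func(context.Context) error) {
			errs <- task(ctx)
		}(task)
	}

	var first error
	for range tasks { // one result per task, so this also waits for all of them
		if err := <-errs; err != nil && first == nil {
			first = err
			cancel() // the parent decides that the siblings must stop
		}
	}
	return first
}

// demo runs a blocking child next to a failing one.
func demo() string {
	return runAll(context.Background(), waiter, failer).Error()
}

func main() {
	fmt.Println(demo()) // error in child2
}
```

Neither child ever sees the cancel function; they only return errors and react to ctx.Done().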
If a parent spawns multiple child goroutines to achieve one goal, and it needs to stop the siblings when one child fails, you can use a channel to communicate: the parent listens on the channel and, once an error arrives, cancels all of the child tasks.
I have modified your code accordingly:
package main
import (
"context"
"fmt"
"math/rand"
"os"
"os/signal"
"sync"
"time"
)
func main() {
ctx, cancel := context.WithCancel(context.Background())
errChan := make(chan error, 1) // buffered so a child's send never blocks after the listener has exited
// handle SIGINT (control+c)
go func() {
c := make(chan os.Signal, 1)
signal.Notify(c, os.Interrupt)
select {
case <-c:
fmt.Println("main: interrupt received. cancelling context.")
case err := <-errChan:
fmt.Printf("main: child goroutine returns error. cancelling context. %s\n", err)
}
cancel()
}()
wg := sync.WaitGroup{}
wg.Add(1)
go func() {
child1DoWork(ctx)
wg.Done()
}()
wg.Add(1)
go func() {
child2DoWork(ctx, errChan)
wg.Done()
}()
fmt.Println("main: waiting for children to finish")
wg.Wait()
fmt.Println("main: children done. exiting.")
}
func child1DoWork(ctx context.Context) {
// pretend we're doing something useful
tck := time.NewTicker(5 * time.Second)
for {
select {
case <-tck.C:
fmt.Println("child1: still working")
case <-ctx.Done():
// context cancelled
fmt.Println("child1: context cancelled")
return
}
}
}
func child2DoWork(ctx context.Context, errChan chan error) {
// pretend we're doing something useful
tck := time.NewTicker(2 * time.Second)
for {
select {
case <-tck.C:
if rand.Intn(5) < 4 {
fmt.Println("child2: did some work")
} else {
// pretend we encountered an error
fmt.Println("child2: error encountered")
// report the error to the parent, which decides whether to cancel
errChan <- fmt.Errorf("error in child2")
return
}
case <-ctx.Done():
// context cancelled
fmt.Println("child2: context cancelled")
return
}
}
}
I am attempting to create a poller in Go that spins up and every 24 hours executes a function.
I also want to be able to stop the polling. I'm attempting to do this by having a done channel and sending an empty struct on it to stop the for loop.
In my tests, the for loop runs forever and I can't seem to stop it. Am I using the done channel incorrectly? The ticker case works as expected.
type Poller struct {
HandlerFunc HandlerFunc
interval *time.Ticker
done chan struct{}
}
func (p *Poller) Start() error {
for {
select {
case <-p.interval.C:
err := p.HandlerFunc()
if err != nil {
return err
}
case <-p.done:
return nil
}
}
}
func (p *Poller) Stop() {
p.done <- struct{}{}
}
Here is the test that's executing the code and causing the infinite loop.
poller := poller.NewPoller(
testHandlerFunc,
time.NewTicker(1*time.Millisecond),
)
err := poller.Start()
assert.Error(t, err)
poller.Stop()
The problem seems to be in how you use it: you call poller.Start() in a blocking manner, so poller.Stop() is never reached. It's common in Go projects to start a goroutine inside Start/Run methods, so in poller.Start() I would do something like this:
func (p *Poller) Start() <-chan error {
errc := make(chan error, 1)
go func() {
defer close(errc)
for {
select {
case <-p.interval.C:
err := p.HandlerFunc()
if err != nil {
errc <- err
return
}
case <-p.done:
return
}
}
}()
return errc
}
Also, there's no need to send an empty struct on the done channel. Closing the channel with close(p.done) is more idiomatic in Go, and it never blocks the caller.
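Putting both suggestions together, a self-contained sketch might look like the following. It departs from your code in a few assumed ways: the poller takes an interval and creates its own ticker, the handler is a plain func() error, and Stop is wrapped in a sync.Once because closing an already closed channel panics. The demo function is only there to exercise it.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

type Poller struct {
	handler  func() error
	interval time.Duration
	done     chan struct{}
	stopOnce sync.Once
}

func NewPoller(handler func() error, interval time.Duration) *Poller {
	return &Poller{handler: handler, interval: interval, done: make(chan struct{})}
}

// Start polls in a background goroutine. The returned channel yields the
// handler's error, if any, and is closed when polling ends.
func (p *Poller) Start() <-chan error {
	errc := make(chan error, 1)
	go func() {
		defer close(errc)
		ticker := time.NewTicker(p.interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				if err := p.handler(); err != nil {
					errc <- err
					return
				}
			case <-p.done:
				return
			}
		}
	}()
	return errc
}

// Stop is safe to call more than once and never blocks.
func (p *Poller) Stop() { p.stopOnce.Do(func() { close(p.done) }) }

// demo stops one healthy poller and lets a failing one report its error.
func demo() (stopErr, failErr error) {
	quiet := NewPoller(func() error { return nil }, time.Millisecond)
	errc := quiet.Start()
	quiet.Stop()
	quiet.Stop()     // second call is a no-op
	stopErr = <-errc // nil: the channel was closed without an error

	failing := NewPoller(func() error { return errors.New("handler failed") }, time.Millisecond)
	failErr = <-failing.Start()
	return stopErr, failErr
}

func main() {
	stopErr, failErr := demo()
	fmt.Println(stopErr, failErr) // <nil> handler failed
}
```

In a test, you would receive from the returned channel to assert on the handler's error, rather than expecting Start itself to return it.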
There is no explicit way in Go to broadcast an event to goroutines for something like cancellation. Instead, it's idiomatic to create a channel that, when closed, signals every receiver to cancel whatever work it is doing. Something like this is a viable pattern:
var done = make(chan struct{})
func cancelled() bool {
select {
case <-done:
return true
default:
return false
}
}
Goroutines can call cancelled to poll for a cancellation.
Your main loop can then respond to such an event, but make sure you drain any channels that might otherwise cause goroutines to block.
for {
select {
case <-done:
// Drain whatever channels you need to.
for range someChannel { }
return
//.. Other cases
}
}
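As a small illustration of why closing works as a broadcast, the sketch below starts several goroutines that all block on the same done channel; a single close releases every one of them. The function name stopAll is made up for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// stopAll starts n workers that block until done is closed and returns how
// many of them observed the close.
func stopAll(n int) int {
	done := make(chan struct{})
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		stopped int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-done // a receive on a closed channel returns immediately
			mu.Lock()
			stopped++
			mu.Unlock()
		}()
	}
	close(done) // one close is seen by every receiver
	wg.Wait()
	return stopped
}

func main() {
	fmt.Println(stopAll(5)) // 5
}
```

A send on the channel, by contrast, would wake only one receiver, which is why close is the usual choice for cancellation signals.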