how to cleanly stop goroutines internally on error - go

All,
I'm writing a program involving tcp traffic that has several points of failure, and
I'd like to be able to exit out of a goroutine smoothly in an error condition without incurring coding overhead.
Here's some pseudocode:
func main() {
    l, err := net.Listen(CONN_TYPE, CONN_HOST+":"+CONN_PORT)
    if err != nil {
        fmt.Println("Error listening: ", err.Error())
        os.Exit(1)
    }
    for {
        // Listen for an incoming connection.
        conn, err := l.Accept()
        if err != nil {
            fmt.Println("Error accepting: ", err.Error())
            os.Exit(1)
        }
        done_flag := make(chan bool, 1)
        // Handle the connection in a new goroutine.
        go func() {
            conn.Write([]byte("string1\n"))
            conn.Write([]byte("string2\n"))
            ...
        }()
    }
}
Now, what I'm trying to avoid is the following code with the connection statements, where I wrap the code in error handling inside the goroutine (something like the following):
go func() {
    if _err := _send_ack(conn, "string1\n"); _err != nil {
        done_flag <- true
    }
    if _err := _send_ack(conn, "string2\n"); _err != nil {
        done_flag <- true
    }
}()
Instead, if there was a connection issue, I'd rather short-circuit the whole thing and exit the goroutine with an error right then and there, without having to worry about how the code is structured. I could perhaps wrap _send_ack further and pass the channel in as a function parameter, but that gets iffy once the program becomes highly hierarchical. For example, I might have a goroutine composed of several funcs, each of which handles a different TCP conversation, and I don't want to litter my subroutines with an extra channel parameter just to propagate the channel up and down the call stack in case I have to set a done flag. Plus there is the question of what happens to the goroutine after the done flag is set and how to handle it in the caller.
If I were working in Python, Perl, or C++, I'd throw an exception that carries a stack trace of where the error occurred and then process that error in the caller. But since Go doesn't have exceptions, I was hoping for a way to just stop the goroutine cold without exiting the main program - i.e. set a channel to the relevant error and then stop execution at that point.
I see the panic function, but I'm not sure of its side effects. Can you panic() out of a goroutine without affecting the main program, or is there a way to intelligently short-circuit a goroutine without side effects, perhaps returning something akin to an exception, with a stack trace and error? Or what is the suggested way to cleanly handle errors in a hierarchical program like this?
Thanks much for any help - I'm new to golang and it probably shows.
Ed

Go encourages returning explicit errors rather than relying on implicit exceptions.
// for code simplicity
func doSendACKImpl(conn net.Conn) error {
    if err := _send_ack(conn, "string1\n"); err != nil {
        return err
    }
    if err := _send_ack(conn, "string2\n"); err != nil {
        return err
    }
    return nil
}
func main() {
    l, err := net.Listen(CONN_TYPE, CONN_HOST+":"+CONN_PORT)
    if err != nil {
        fmt.Println("Error listening: ", err.Error())
        os.Exit(1)
    }
    for {
        // Listen for an incoming connection.
        conn, err := l.Accept()
        if err != nil {
            fmt.Println("Error accepting: ", err.Error())
            os.Exit(1)
        }
        // can change to a self-defined ResponseType; error is used here for the demo
        workRes := make(chan error, 1)
        go func() {
            // write the result back to the channel
            workRes <- doSendACKImpl(conn)
        }()
        // read the result back (nil means the goroutine finished without error)
        if resError := <-workRes; resError != nil {
            fmt.Printf("meet error %s", resError)
        }
    }
}
For more concurrency, use a larger channel buffer and move the result handling into a separate goroutine:
func main() {
    l, err := net.Listen(CONN_TYPE, CONN_HOST+":"+CONN_PORT)
    if err != nil {
        fmt.Println("Error listening: ", err.Error())
        os.Exit(1)
    }
    // more result buffer size
    const workSize int = 100
    // can change to a self-defined ResponseType; error is used here for the demo
    workResBuffer := make(chan error, workSize)
    // goroutine that collects the results
    go func() {
        // drain all results from the worker responses
        for resError := range workResBuffer {
            if resError != nil {
                fmt.Printf("meet error %s", resError)
            }
        }
    }()
    for {
        // Listen for an incoming connection.
        conn, err := l.Accept()
        if err != nil {
            fmt.Println("Error accepting: ", err.Error())
            os.Exit(1)
        }
        // TODO: limit the goroutine number
        go func() {
            // write the result back to the channel
            workResBuffer <- doSendACKImpl(conn)
        }()
    }
}
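The TODO about limiting the number of goroutines can be handled with a buffered channel used as a counting semaphore. A minimal sketch under the same assumptions as above (doSendACKImpl and workResBuffer as already defined; maxWorkers and serveWithLimit are made up here for illustration):
// maxWorkers caps how many connection handlers may run at once.
const maxWorkers = 10

func serveWithLimit(l net.Listener, workResBuffer chan<- error) {
    // sem is a counting semaphore: every running handler holds one slot.
    sem := make(chan struct{}, maxWorkers)
    for {
        conn, err := l.Accept()
        if err != nil {
            fmt.Println("Error accepting: ", err.Error())
            return
        }
        sem <- struct{}{} // blocks once maxWorkers handlers are in flight
        go func(c net.Conn) {
            defer func() { <-sem }() // release the slot when done
            workResBuffer <- doSendACKImpl(c)
        }(conn)
    }
}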

Related

automatic gRPC unix reconnect after EOF

I have an application (let's call it client) connecting to another process (let's call it server) on the same machine via gRPC. The communication goes over a unix socket.
If the server is restarted, my client gets an EOF and does not re-establish the connection, although I expected the clientConn to handle the reconnection automatically.
Why isn't the dialer taking care of the reconnection?
I expect it to do so with the backoff params I passed.
Below is some pseudo-MWE.
Run establishes the initial connection, then spawns goroutineOne
goroutineOne waits for the connection to be ready and delegates the send to fooUpdater
fooUpdater streams the data, or returns in case of errors
for waitUntilReady I used the pseudo-code referenced by this answer to get a new stream.
func main() {
    ctx, ctxCancel := context.WithCancel(context.Background())
    go func() {
        if err := Run(ctx); err != nil {
            log.Errorf("connection error: %v", err)
        }
        ctxCancel()
    }()
    // some wait logic
}
func Run(ctx context.Context) error {
    backoffConfig := backoff.Config{
        BaseDelay:  time.Duration(1 * time.Second),
        Multiplier: backoff.DefaultConfig.Multiplier,
        Jitter:     backoff.DefaultConfig.Jitter,
        MaxDelay:   time.Duration(120 * time.Second),
    }
    myConn, err := grpc.DialContext(ctx,
        "/var/run/foo.bar",
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithConnectParams(grpc.ConnectParams{Backoff: backoffConfig, MinConnectTimeout: time.Duration(1 * time.Second)}),
        grpc.WithContextDialer(func(ctx context.Context, addr string) (net.Conn, error) {
            d := net.Dialer{}
            c, err := d.DialContext(ctx, "unix", addr)
            if err != nil {
                return nil, fmt.Errorf("connection to unix://%s failed: %w", addr, err)
            }
            return c, nil
        }),
    )
    if err != nil {
        return fmt.Errorf("could not establish socket for foo: %w", err)
    }
    defer myConn.Close()
    return goroutineOne(ctx, myConn)
}
func goroutineOne(ctx context.Context, myConn *grpc.ClientConn) error {
    reconnect := make(chan struct{})
    for {
        if ready := waitUntilReady(ctx, myConn, time.Duration(2*time.Minute)); !ready {
            return fmt.Errorf("myConn: %w, timeout: %s", ErrWaitReadyTimeout, "2m")
        }
        go func() {
            if err := fooUpdater(ctx, dataBuffer, myConn); err != nil {
                log.Errorf("foo updater: %v", err)
            }
            reconnect <- struct{}{}
        }()
        select {
        case <-ctx.Done():
            return nil
        case <-reconnect:
        }
    }
}
func fooUpdater(ctx context.Context, dataBuffer custom.CircularBuffer, myConn *grpc.ClientConn) error {
    clientStream, err := myConn.Stream(ctx) // custom pb code, returns grpc.ClientConn.NewStream(...)
    if err != nil {
        return fmt.Errorf("could not obtain stream: %w", err)
    }
    for {
        select {
        case <-ctx.Done():
            return nil
        case data := <-dataBuffer:
            if err := clientStream.Send(data); err != nil {
                return fmt.Errorf("could not send data: %w", err)
            }
        }
    }
}
func waitUntilReady(ctx context.Context, conn *grpc.ClientConn, maxTimeout time.Duration) bool {
    ctx, cancel := context.WithTimeout(ctx, maxTimeout)
    defer cancel()
    currentState := conn.GetState()
    timeoutValid := true
    for currentState != connectivity.Ready && timeoutValid {
        timeoutValid = conn.WaitForStateChange(ctx, currentState)
        currentState = conn.GetState()
        // debug print currentState -> prints IDLE
    }
    return currentState == connectivity.Ready
}
Debugging hints also welcome :)
Based on the provided code and information, there might be an issue with how ctx.Done() is being used.
ctx.Done() is used in both the fooUpdater and goroutineOne functions. When the connection breaks, I believe ctx.Done() fires in both, with the following execution order:
The connection breaks; the ctx.Done case in the fooUpdater function executes and the function returns. The select statement in goroutineOne then also executes its ctx.Done case, which exits that function, so the client never reconnects.
Try debugging it to check whether both select case blocks get executed, but I believe that is the issue here.
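A quick way to check is to log which case fires in goroutineOne's select; a debugging sketch (not a fix), reusing the same log.Errorf the code already uses:
select {
case <-ctx.Done():
    // If this prints, the context was cancelled and the loop will not redial.
    log.Errorf("goroutineOne exiting, ctx.Done fired: %v", ctx.Err())
    return nil
case <-reconnect:
    log.Errorf("stream ended, looping to reconnect")
}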
According to the gRPC documentation, the connection is re-established after a transient failure; otherwise it fails immediately. You can try to verify that the failure is transient by printing the connectivity state.
You should also print the error code to understand why the RPC failed.
Maybe what you hit is not considered a transient failure.
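A small sketch of printing both, assuming the google.golang.org/grpc/status package is imported and err is whatever the failing stream call returned:
if err != nil {
    // status.Code extracts the gRPC status code (e.g. Unavailable for transient failures).
    log.Errorf("rpc failed: code=%s, state=%s, err=%v", status.Code(err), myConn.GetState(), err)
}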
Also, according to the following entry retry logic does not work with streams: grpc-java: Proper handling of retry on client for service streaming call
Here are the links to the corresponding docs:
https://grpc.github.io/grpc/core/md_doc_connectivity-semantics-and-api.html
https://pkg.go.dev/google.golang.org/grpc#section-readme
Also, check the following entry:
Ways to wait if server is not available in gRPC from client side

Closing amqp.Channel after a failed consumer never returns

I use https://github.com/NeowayLabs/wabbit/
When the amqp.Channel is closed after a failed channel.Consume call, a channel is left with no listener and the function never returns.
My code:
package main

import (
    "fmt"

    "github.com/NeowayLabs/wabbit"
    "github.com/NeowayLabs/wabbit/amqptest"
    "github.com/NeowayLabs/wabbit/amqptest/server"
)

func someFunc(amqpURL string) error {
    conn, err := amqptest.Dial(amqpURL)
    defer conn.Close()
    channel, err := conn.Channel()
    defer channel.Close()
    consumer, err := channel.Consume(
        "queue",
        "consumer",
        wabbit.Option{},
    )
    if err != nil {
        return err // err = "Unknown queue 'queue'", but we never receive it
    }
    fmt.Println(<-consumer)
    return nil
}

func main() {
    amqpURL := "127.0.0.1:32773"
    fakeServer := server.NewServer(amqpURL)
    err := fakeServer.Start()
    defer fakeServer.Stop()
    err = someFunc(amqpURL)
    if err != nil {
        panic(err)
    }
    fmt.Println("Happy end")
}
someFunc never responds with an error, but I want to handle consumer errors.
someFunc never responds with an error because it gets hung up in the defer code.
When someFunc gets to the return err line, then it tries to run the defer statements that you set up in the beginning of the function. The first one that it tries is defer channel.Close().
The problem seems to be with this block in the wabbit library: https://github.com/NeowayLabs/wabbit/blob/d8bc549279ecd80204a8a83a868a14fdd81d1a1b/amqptest/server/channel.go#L315-L317
I think, although I am not sure, that writing to the consumer.done channel is a blocking operation because the channel is not buffered and does not have a receiver. See this: https://gobyexample.com/non-blocking-channel-operations for more information.
I commented that block of code out when running this locally and found that the rest of the code ran as you expected it would, finally resulting in a panic: Unknown queue 'queue'
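For reference, the difference the linked page describes: a send inside a select with a default case does not hang when nobody is receiving. A small standalone illustration:
package main

import "fmt"

func main() {
    done := make(chan bool) // unbuffered, and nobody receives from it

    // done <- true        // a plain send here would block forever

    select {
    case done <- true:
        fmt.Println("done signal delivered")
    default:
        fmt.Println("no receiver ready, skipping the send")
    }
}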

Terminating function execution if a context is cancelled

I have this current function that originally was not context aware.
func (s *Service) ChunkUpload(r *multipart.Reader) error {
    chunk, err := s.parseChunk(r)
    if err != nil {
        return fmt.Errorf("failed parsing chunk %w", err)
    }
    if err := os.MkdirAll(chunk.UploadDir, 02750); err != nil {
        return err
    }
    if err := s.saveChunk(chunk); err != nil {
        return fmt.Errorf("failed saving chunk %w", err)
    }
    return nil
}
I've updated its signature to take a context.Context as the first argument. My main goal is to terminate the function and return as soon as the context is cancelled.
My initial implementation was this.
func (s *Service) ChunkUpload(ctx context.Context, r *multipart.Reader) error {
    errCh := make(chan error)
    go func() {
        chunk, err := s.parseChunk(r)
        if err != nil {
            errCh <- fmt.Errorf("failed parsing chunk %w", err)
            return
        }
        if err := os.MkdirAll(chunk.UploadDir, 02750); err != nil {
            errCh <- err
            return
        }
        if err := s.saveChunk(chunk); err != nil {
            errCh <- fmt.Errorf("failed saving chunk %w", err)
            return
        }
        errCh <- nil // signal success so the select below doesn't block forever
    }()
    select {
    case err := <-errCh:
        return err
    case <-ctx.Done():
        return ctx.Err()
    }
}
However, as I thought about the execution of the code, I realized that this doesn't achieve my goal. Since all of the function's logic lives in a separate goroutine, even if the context gets cancelled and ChunkUpload returns early, the code inside the goroutine continues to execute, so this isn't really different from the original code.
The next thought was to just pass a context to all the inner functions like s.parseChunk and s.saveChunk, but that option also doesn't seem right, as I would need to implement cancellation in each function. What would be the proper way to refactor the original function to be context aware and terminate as soon as the context is cancelled?
Function calls and goroutines cannot be terminated from the caller; the functions and goroutines themselves have to support cancellation, often via a context.Context value or a done channel.
In either case, the functions are responsible for checking / monitoring the context, and if cancellation is requested (when the context's done channel is closed), they should return early. There isn't an easier / automatic way.
If the task executes code in a loop, a convenient solution is to check the done channel in each iteration, and return if it's closed. If the task is one "monolith", the implementor is responsible to use / insert "checkpoints" at which the task can be reasonably aborted early if such cancellation is requested.
An easy way to check if the done channel is closed is to use a non-blocking select, such as:
select {
case <-ctx.Done():
    // Abort / return early
    return
default:
}
Care must be taken when the task uses other channel operations, as they may block in a nondeterministic way. Those selects should include the ctx.Done() channel too:
select {
case v := <-someChannel:
    // Do something with v
case <-ctx.Done():
    // Abort / return early
    return
}
Also be careful: if the receive from someChannel above never blocks, there is no guarantee the cancellation is handled, because when multiple communications can proceed in a select, one is chosen randomly (there's no guarantee <-ctx.Done() is ever chosen). In such a case you can combine the two patterns: first do a non-blocking check for cancellation, then use a select with your channel operations and the cancellation monitoring.
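Applied to the function from the question, a minimal sketch that inserts checkpoints between the steps; ctx.Err() is non-nil once the context is cancelled, which is equivalent to the non-blocking select above, and parseChunk / saveChunk are assumed unchanged from the question:
func (s *Service) ChunkUpload(ctx context.Context, r *multipart.Reader) error {
    // Checkpoint before the first step.
    if err := ctx.Err(); err != nil {
        return err
    }
    chunk, err := s.parseChunk(r)
    if err != nil {
        return fmt.Errorf("failed parsing chunk %w", err)
    }
    // Checkpoint between steps: finished work is kept, but no new work starts.
    if err := ctx.Err(); err != nil {
        return err
    }
    if err := os.MkdirAll(chunk.UploadDir, 02750); err != nil {
        return err
    }
    if err := ctx.Err(); err != nil {
        return err
    }
    if err := s.saveChunk(chunk); err != nil {
        return fmt.Errorf("failed saving chunk %w", err)
    }
    return nil
}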
Cancellation mainly matters for a long-running function or for a block that repeats many times, such as http.Serve().
In your case, suppose saveChunk takes seconds to run and you want to be able to cancel while it is saving. Then you can split the chunk into pieces and save them one by one, checking for cancellation after each piece:
for i := 0; i < n; i++ {
    errCh := make(chan error, 1)
    go func(i int) { errCh <- s.saveChunk(chunk[i]) }(i)
    select {
    case err := <-errCh:
        if err != nil {
            return fmt.Errorf("failed saving chunk %w", err)
        }
    case <-ctx.Done():
        return ctx.Err()
    }
}

goroutine deadlock: In an app that reads from a blockchain and writes to rethinkdb, have

Okay, so
My situation is this: It's been three weeks and some-odd hours since I've become entranced by golang. I'm working on a blockchain dump tool for steem, and I'm probably going to give a touch of gjson to github.com/go-steem/rpc, the library I currently rely on. Now, with this said, this question is about the goroutines for my current blockchain reader. Here it is (sorry a tad on the beefy side, but you'll see the part that I want to pull back into the library, too):
// Keep processing incoming blocks forever.
fmt.Println("---> Entering the block processing loop")
for {
    // Get current properties.
    props, err := Client.Database.GetDynamicGlobalProperties()
    if err != nil {
        fmt.Println(err)
    }
    // Process blocks.
    for I := uint32(1); I <= props.LastIrreversibleBlockNum; I++ {
        go getblock(I, Client, Rsession)
    }
    if err != nil {
        fmt.Println(err)
    }
}
}

func getblock(I uint32, Client *rpc.Client, Rsession *r.Session) {
    block, err := Client.Database.GetBlock(I)
    fmt.Println(I)
    writeBlock(block, Rsession)
    if err != nil {
        fmt.Println(err)
    }
}

func writeBlock(block *d.Block, Rsession *r.Session) {
    // rethinkdb writes
    r.Table("transactions").
        Insert(block.Transactions).
        Exec(Rsession)
    r.Table("blocks").
        Insert(block).
        Exec(Rsession)
}
I just made a third edit to this, which was to call the function writeBlock from the goroutine getblock instead of the way I was doing things before.
Okay, so that is now resolved, but this is going to spawn another question, unfortunately.
I've got the application working with the goroutine, however it hasn't increased performance any.
The way that I got it to work was by not spawning a goroutine from a goroutine, and instead calling the plain function writeBlock from the goroutine getblock, as shown in the code above.
fmt.Println("---> Entering the block processing loop")
for {
// Get current properties.
props, err := Client.Database.GetDynamicGlobalProperties()
if err != nil {
fmt.Println(err)
}
// Process blocks.
for I := uint32(1); I <= props.LastIrreversibleBlockNum; I++ {
go getblock(I, Client, Rsession)
}
if err != nil {
fmt.Println(err)
}
}
}
func getblock(I uint32, Client *rpc.Client, Rsession *r.Session) {
block, err := Client.Database.GetBlock(I)
fmt.Println(I)
writeBlock(block, Rsession)
if err != nil {
fmt.Println(err)
}
}
func writeBlock(block *d.Block, Rsession *r.Session) {
//rethinkdb writes
r.Table("transactions").
Insert(block.Transactions).
Exec(Rsession)
r.Table("blocks").
Insert(block).
Exec(Rsession)
}

How can I interrupt a goroutine executing (*TCPListener) Accept?

I am playing with Go lately and trying to make a server that responds to clients over a TCP connection.
My question is: how do I cleanly shut down the server and interrupt the goroutine that is currently blocked in the following call,
func (*TCPListener) Accept?
According to the documentation of Accept
Accept implements the Accept method in the Listener interface; it waits for the next call and returns a generic Conn.
The errors are also very scarcely documented.
Simply Close() the net.Listener you get from the net.Listen(...) call and return from the executing goroutine.
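A minimal sketch of that approach; it assumes Go 1.16+ for net.ErrClosed (on older versions you would have to match the error message instead), and a timer stands in here for a real shutdown signal:
package main

import (
    "errors"
    "log"
    "net"
    "time"
)

func main() {
    ln, err := net.Listen("tcp", ":8080")
    if err != nil {
        log.Fatal(err)
    }

    // Close the listener when it is time to shut down; a timer stands in
    // for a real shutdown signal here.
    go func() {
        time.Sleep(5 * time.Second)
        ln.Close() // unblocks the Accept below
    }()

    for {
        conn, err := ln.Accept()
        if err != nil {
            if errors.Is(err, net.ErrClosed) {
                return // listener closed: clean shutdown
            }
            log.Println("accept error:", err)
            continue
        }
        go func(c net.Conn) {
            defer c.Close()
            // handle the connection
        }(conn)
    }
}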
TCPListener Deadline
You don't necessarily need an extra goroutine (that keeps accepting); simply specify a deadline.
For example:
for {
    // Check if someone wants to interrupt accepting
    select {
    case <-someoneWantsToEndMe:
        return // runs into "defer listener.Close()"
    default: // nothing to do
    }
    // Accept with a deadline (listener is a *net.TCPListener here)
    listener.SetDeadline(time.Now().Add(1 * time.Second))
    conn, err := listener.Accept()
    if err != nil {
        // TODO: could check that err really is a timeout, omitted for brevity
        continue
    }
    go handleConnection(conn)
}
Here is what I was looking for. Maybe it helps someone in the future.
Notice the use of select and the "c" channel to combine Accept with the exit channel:
ln, err := net.Listen("tcp", ":8080")
if err != nil {
    // handle error
}
defer ln.Close()
for {
    type accepted struct {
        conn net.Conn
        err  error
    }
    c := make(chan accepted, 1)
    go func() {
        conn, err := ln.Accept()
        c <- accepted{conn, err}
    }()
    select {
    case a := <-c:
        if a.err != nil {
            // handle error
            continue
        }
        go handleConnection(a.conn)
    case <-ev:
        // handle event
        return
    }
}

Resources