Multiple docker container logs - go

I'm trying to get the logs from multiple Docker containers at once (order doesn't matter). This works as expected if types.ContainerLogsOptions.Follow is set to false.
If types.ContainerLogsOptions.Follow is set to true, the log output sometimes gets stuck after a few lines and no further logs are printed to stdout.
If the output doesn't get stuck, it works as expected.
Additionally, if I restart one or all of the containers, the command doesn't exit the way docker logs -f containerName does.
func (w *Whatever) Logs(options LogOptions) {
	readers := []io.Reader{}
	for _, container := range options.Containers {
		responseBody, err := w.Docker.Client.ContainerLogs(context.Background(), container, types.ContainerLogsOptions{
			ShowStdout: true,
			ShowStderr: true,
			Follow:     options.Follow,
		})
		if err != nil {
			log.Fatal(err)
		}
		// Defer Close only after the error check: responseBody may be nil on error.
		defer responseBody.Close()
		readers = append(readers, responseBody)
	}
	// concatenate all readers to one
	multiReader := io.MultiReader(readers...)
	_, err := stdcopy.StdCopy(os.Stdout, os.Stderr, multiReader)
	if err != nil && err != io.EOF {
		log.Fatal(err)
	}
}
Basically, my implementation is not much different from that of docker logs (https://github.com/docker/docker/blob/master/cli/command/container/logs.go), so I'm wondering what causes these issues.

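As JimB commented, that approach won't work because of how io.MultiReader operates: it reads its readers sequentially and only moves on to the next one after the current one returns io.EOF. With Follow set to true, a container's log stream doesn't end while the container is running, so only the first stream is ever read, and the output looks stuck whenever that container is quiet. What you need to do is read from each response individually and combine the output. Since you're dealing with logs, it makes sense to break up the reads on newlines. bufio.Scanner does this for a single io.Reader, so one option is to create a new type that scans multiple readers concurrently.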
You could use it like this:
scanner := NewConcurrentScanner(readers...)
for scanner.Scan() {
	fmt.Println(scanner.Text())
}
if err := scanner.Err(); err != nil {
	log.Fatalln(err)
}
Example implementation of a concurrent scanner:
import (
	"bufio"
	"context"
	"io"
	"sync"
)

// ConcurrentScanner works like bufio.Scanner, but with multiple io.Readers.
type ConcurrentScanner struct {
	scans  chan []byte   // Scanned data from readers
	errors chan error    // Errors from readers
	done   chan struct{} // Signal that all readers have completed
	cancel func()        // Cancel all readers (stop on first error)
	data   []byte        // Last scanned value
	err    error
}

// NewConcurrentScanner starts scanning each reader in a separate goroutine
// and returns a *ConcurrentScanner.
func NewConcurrentScanner(readers ...io.Reader) *ConcurrentScanner {
	ctx, cancel := context.WithCancel(context.Background())
	s := &ConcurrentScanner{
		scans:  make(chan []byte),
		errors: make(chan error),
		done:   make(chan struct{}),
		cancel: cancel,
	}
	var wg sync.WaitGroup
	wg.Add(len(readers))
	for _, reader := range readers {
		// Start a scanner for each reader in its own goroutine.
		go func(reader io.Reader) {
			defer wg.Done()
			scanner := bufio.NewScanner(reader)
			for scanner.Scan() {
				// Copy the line: the slice returned by scanner.Bytes()
				// is overwritten by the next call to scanner.Scan().
				line := append([]byte(nil), scanner.Bytes()...)
				select {
				case s.scans <- line:
					// While there is data, send it to s.scans;
					// this blocks until Scan() is called.
				case <-ctx.Done():
					// The context was cancelled: exit now.
					return
				}
			}
			if err := scanner.Err(); err != nil {
				select {
				case s.errors <- err:
					// Report the error.
				case <-ctx.Done():
					// Exit if the context was cancelled; otherwise, with
					// nobody left to receive the error, this goroutine
					// would block forever.
					return
				}
			}
		}(reader)
	}
	go func() {
		// Signal that all scanners have completed.
		wg.Wait()
		close(s.done)
	}()
	return s
}

// Scan advances to the next line from any reader. It returns false once all
// readers are exhausted or one of them fails.
func (s *ConcurrentScanner) Scan() bool {
	select {
	case s.data = <-s.scans:
		// Got data from a scanner.
		return true
	case <-s.done:
		// All scanners are done; nothing to do.
	case s.err = <-s.errors:
		// One of the scanners errored; we're done.
	}
	s.cancel() // Cancel the context regardless of how we exited.
	return false
}

func (s *ConcurrentScanner) Bytes() []byte {
	return s.data
}

func (s *ConcurrentScanner) Text() string {
	return string(s.data)
}

func (s *ConcurrentScanner) Err() error {
	return s.err
}
Here's an example of it working in the Go Playground: https://play.golang.org/p/EUB0K2V7iT
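You can see that the concurrent scanner's output is interleaved, rather than all of one reader followed by all of the next, as with io.MultiReader. Note that for containers without a TTY, Docker multiplexes stdout and stderr in the log stream with 8-byte frame headers, which stdcopy.StdCopy strips; if you scan the raw response bodies directly, those header bytes end up in the scanned lines, so you may want to demultiplex each body with stdcopy.StdCopy (for example into an io.Pipe) before scanning it.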

Related

Problem synchronizing composable goroutines by reading data from file with scanner.Scan()

I'm building a filtering pipeline with channels by function composition, instead of 120+ lines of code all in a single method, since I might reuse parts of this pipeline later.
I haven't been able to make it work as intended. I suspect that the function readValuesFromFile is exiting before scanner.Scan() puts a value in the inputStream channel (i.e. that the function returns before goroutine (1) does its work).
If I replace scanner.Scan() with just putting some random string in the channel, the whole pipeline works as expected.
Is this the problem, or am I missing something?
How can this be fixed in an elegant way?
Thanks!
func readValuesFromFile(filename string) <-chan string {
	file, err := os.Open(filename)
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	inputStream := make(chan string)
	go func() { // (1)
		count := 0
		scanner := bufio.NewScanner(file)
		for scanner.Scan() { // (2)
			inputStream <- strings.TrimSpace(scanner.Text())
			count = count + 1
		}
		close(inputStream)
	}()
	return inputStream
}

func validateValues(inputStream <-chan string) <-chan string {
	// read from the input stream + validate & filter + create and put values in an output stream
}

func writeResults(validStream <-chan string) {
	// read from the validated stream and write data to file
}

func main() {
	valueStream := readValuesFromFile("myfile.txt")
	validatedStream := validateValues(valueStream)
	writeResults(validatedStream)
}
The function readValuesFromFile is guaranteed to return before the first value is sent to inputStream. Communication on the unbuffered channel inputStream does not succeed until a sender and receiver are ready. There is no receive on inputStream until after readValuesFromFile returns, therefore the send from the goroutine will not succeed until after readValuesFromFile returns.
When the function readValuesFromFile returns, the defer statement closes the file used by the scanner. It's possible that the scanner buffers some data before the file is closed underneath it, but it's also possible that the scanner does not read any data.
Fix this by closing the file from the goroutine.
The error returned from the scanner describes the problem. Always handle errors.
func readValuesFromFile(filename string) <-chan string {
	file, err := os.Open(filename)
	if err != nil {
		log.Fatal(err)
	}
	inputStream := make(chan string)
	go func() {
		// The goroutine now owns the file, so it closes it when done.
		defer file.Close()
		defer close(inputStream)
		count := 0
		scanner := bufio.NewScanner(file)
		for scanner.Scan() {
			inputStream <- strings.TrimSpace(scanner.Text())
			count++
		}
		if err := scanner.Err(); err != nil {
			// Handle error as appropriate for your application.
			log.Print("scan error: ", err)
		}
	}()
	return inputStream
}

Check if all goroutines have finished without using wg.Wait()

Let's say I have a function IsAPrimaryColour() which works by calling three other functions: IsRed(), IsGreen() and IsBlue(). Since the three functions are independent of one another, they can run concurrently. The return conditions are:
If any of the three functions returns true, IsAPrimaryColour() should also return true. There is no need to wait for the other functions to finish. That is: IsAPrimaryColour() is true if IsRed() is true OR IsGreen() is true OR IsBlue() is true.
If all functions return false, IsAPrimaryColour() should also return false. That is: IsAPrimaryColour() is false if IsRed() is false AND IsGreen() is false AND IsBlue() is false.
If any of the three functions returns an error, IsAPrimaryColour() should also return the error. There is no need to wait for the other functions to finish, or to collect any other errors.
The thing I'm struggling with is how to exit the function if any of the three functions returns true, but also wait for all three to finish if they all return false. If I use a sync.WaitGroup, I have to wait for all 3 goroutines to finish before I can return from the calling function.
Therefore, I'm using a loop counter to keep track of how many messages I have received on the channel, and exiting the function once I have received all 3.
https://play.golang.org/p/kNfqWVq4Wix
package main

import (
	"errors"
	"fmt"
	"time"
)

func main() {
	x := "something"
	result, err := IsAPrimaryColour(x)
	if err != nil {
		fmt.Printf("Error: %v\n", err)
	} else {
		fmt.Printf("Result: %v\n", result)
	}
}

func IsAPrimaryColour(value interface{}) (bool, error) {
	found := make(chan bool, 3)
	errors := make(chan error, 3)
	defer close(found)
	defer close(errors)
	var nsec int64 = time.Now().UnixNano()
	//call the first function, return the result on the 'found' channel and any errors on the 'errors' channel
	go func() {
		result, err := IsRed(value)
		if err != nil {
			errors <- err
		} else {
			found <- result
		}
		fmt.Printf("IsRed done in %f nanoseconds \n", float64(time.Now().UnixNano()-nsec))
	}()
	//call the second function, return the result on the 'found' channel and any errors on the 'errors' channel
	go func() {
		result, err := IsGreen(value)
		if err != nil {
			errors <- err
		} else {
			found <- result
		}
		fmt.Printf("IsGreen done in %f nanoseconds \n", float64(time.Now().UnixNano()-nsec))
	}()
	//call the third function, return the result on the 'found' channel and any errors on the 'errors' channel
	go func() {
		result, err := IsBlue(value)
		if err != nil {
			errors <- err
		} else {
			found <- result
		}
		fmt.Printf("IsBlue done in %f nanoseconds \n", float64(time.Now().UnixNano()-nsec))
	}()
	//loop counter which will be incremented every time we read a value from the 'found' channel
	var counter int
	for {
		select {
		case result := <-found:
			counter++
			fmt.Printf("received a value on the results channel after %f nanoseconds. Value of counter is %d\n", float64(time.Now().UnixNano()-nsec), counter)
			if result {
				fmt.Printf("some goroutine returned true\n")
				return true, nil
			}
		case err := <-errors:
			if err != nil {
				fmt.Printf("some goroutine returned an error\n")
				return false, err
			}
		default:
		}
		//check if we have received all 3 messages on the 'found' channel. If so, all 3 functions must have returned false and we can thus return false also
		if counter == 3 {
			fmt.Printf("all goroutines have finished and none of them returned true\n")
			return false, nil
		}
	}
}

func IsRed(value interface{}) (bool, error) {
	return false, nil
}

func IsGreen(value interface{}) (bool, error) {
	time.Sleep(time.Millisecond * 100) //change this to a value greater than 200 to make this function take longer than IsBlue()
	return true, nil
}

func IsBlue(value interface{}) (bool, error) {
	time.Sleep(time.Millisecond * 200)
	return false, errors.New("something went wrong")
}
Although this works well enough, I wonder whether I'm overlooking some language feature that would do this in a better way.
errgroup.WithContext can help simplify the concurrency here.
You want to stop all of the goroutines if an error occurs, or if a result is found. If you can express “a result is found” as a distinguished error (along the lines of io.EOF), then you can use errgroup's built-in “cancel on first error” behavior to shut down the whole group:
func IsAPrimaryColour(ctx context.Context, value interface{}) (bool, error) {
	var nsec int64 = time.Now().UnixNano()
	errFound := errors.New("result found")
	g, ctx := errgroup.WithContext(ctx)

	g.Go(func() error {
		result, err := IsRed(ctx, value)
		if result {
			err = errFound
		}
		fmt.Printf("IsRed done in %f nanoseconds \n", float64(time.Now().UnixNano()-nsec))
		return err
	})

	…

	err := g.Wait()
	if err == errFound {
		fmt.Printf("some goroutine returned errFound\n")
		return true, nil
	}
	if err != nil {
		fmt.Printf("some goroutine returned an error\n")
		return false, err
	}
	fmt.Printf("all goroutines have finished and none of them returned true\n")
	return false, nil
}
(https://play.golang.org/p/MVeeBpDv4Mn)
Some remarks:
You don't need to close the channels: you know beforehand how many values to read, which is sufficient as an exit condition. In fact, the deferred close calls are harmful: after an early return, a goroutine that is still running will send on a closed channel and panic.
You don't need to duplicate the function calls by hand; use a slice.
Since you use a slice, you don't even need a counter or the constant 3; just use the length of the function slice.
The default case in the select is useless, and it turns the loop into a busy loop that spins the CPU. Just block on the input you are waiting for.
Once you get rid of all the fat, the code looks like this:
func IsAPrimaryColour(value interface{}) (bool, error) {
	fns := []func(interface{}) (bool, error){IsRed, IsGreen, IsBlue}
	found := make(chan bool, len(fns))
	errors := make(chan error, len(fns))
	for i := 0; i < len(fns); i++ {
		fn := fns[i]
		go func() {
			result, err := fn(value)
			if err != nil {
				errors <- err
				return
			}
			found <- result
		}()
	}
	for i := 0; i < len(fns); i++ {
		select {
		case result := <-found:
			if result {
				return true, nil
			}
		case err := <-errors:
			if err != nil {
				return false, err
			}
		}
	}
	return false, nil
}
You don't need to measure the time of each and every asynchronous call; just measure how long the overall call took to return.
func main() {
	now := time.Now()
	x := "something"
	result, err := IsAPrimaryColour(x)
	if err != nil {
		fmt.Printf("Error: %v\n", err)
	} else {
		fmt.Printf("Result: %v\n", result)
	}
	fmt.Println("it took", time.Since(now))
}
https://play.golang.org/p/bARHS6c6m1c
The idiomatic way to handle multiple concurrent function calls, and cancel any outstanding after a condition, is with the use of a context value. Something like this:
func operation1(ctx context.Context) bool { ... }
func operation2(ctx context.Context) bool { ... }
func operation3(ctx context.Context) bool { ... }

func atLeastOneSuccess() bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // Ensure any functions still running get the signal to stop
	results := make(chan bool, 3) // A channel to send results
	go func() {
		results <- operation1(ctx)
	}()
	go func() {
		results <- operation2(ctx)
	}()
	go func() {
		results <- operation3(ctx)
	}()
	for i := 0; i < 3; i++ {
		result := <-results
		if result {
			// One of the operations returned success, so we'll return that
			// and let the deferred call to cancel() tell any outstanding
			// functions to abort.
			return true
		}
	}
	// We've looped through all return values, and they were all false
	return false
}
Of course this assumes that each of the operationN functions actually honors a canceled context. This answer discusses how to do that.
You don't have to block the main goroutine on the Wait; you can block another goroutine instead, for example:
doneCh := make(chan struct{})
go func() {
	wg.Wait()
	close(doneCh)
}()
Then you can wait on doneCh in your select to see if all the routines have finished.

Wait for multiple callbacks with timeout in go without busy waiting or polling

In Go I have two callbacks that may never fire.
registerCb(func() {...})
registerCb(func() {...})
/* Wait for both func to execute with timeout */
I want to wait for both of them but having a timeout if one is not executed.
sync.WaitGroup does not work, since it is blocking and not channel based. Also, you can't call WaitGroup.Done() outside the callbacks without risking a panic.
My current solution uses just two booleans and a busy-wait loop, but that's not satisfying.
Is there an idiomatic way that does not use polling or busy waiting?
Update:
Here is some code that demonstrates a busy-wait solution; it should instead return as soon as both callbacks have fired, or after the timeout, without polling:
package main

import (
	"fmt"
	"time"
)

var cbOne func()
var cbTwo func()

func registerCbOne(cb func()) {
	cbOne = cb
}

func registerCbTwo(cb func()) {
	cbTwo = cb
}

func executeCallbacks() {
	<-time.After(1 * time.Second)
	cbOne()
	// Might never happen
	//<-time.After(1 * time.Second)
	//cbTwo()
}

func main() {
	// Some process in background will execute our callbacks
	go func() {
		executeCallbacks()
	}()
	err := WaitAllOrTimeout(3 * time.Second)
	if err != nil {
		fmt.Println("Error: ", err.Error())
	}
	fmt.Println("Hello, playground")
}

func WaitAllOrTimeout(to time.Duration) error {
	cbOneDoneCh := make(chan bool, 1)
	cbTwoDoneCh := make(chan bool, 1)
	cbOneDone := false
	cbTwoDone := false
	registerCbOne(func() {
		fmt.Println("cb One")
		cbOneDoneCh <- true
	})
	registerCbTwo(func() {
		fmt.Println("cb Two")
		cbTwoDoneCh <- true
	})
	// Wait for cbOne and cbTwo to be executed or a timeout
	// Busywait solution
	for {
		select {
		case <-time.After(to):
			if cbOneDone && cbTwoDone {
				fmt.Println("Both CB executed (we could poll more often)")
				return nil
			}
			fmt.Println("Timeout!")
			return fmt.Errorf("Timeout")
		case <-cbOneDoneCh:
			cbOneDone = true
		case <-cbTwoDoneCh:
			cbTwoDone = true
		}
	}
}
This is a followup to my comment, added after you added your example solution. To be clearer than I can in comments, your example code is actually not that bad. Here is your original example:
// Busywait solution
for {
	select {
	case <-time.After(to):
		if cbOneDone && cbTwoDone {
			fmt.Println("Both CB executed (we could poll more often)")
			return nil
		}
		fmt.Println("Timeout!")
		return fmt.Errorf("Timeout")
	case <-cbOneDoneCh:
		cbOneDone = true
	case <-cbTwoDoneCh:
		cbTwoDone = true
	}
}
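This isn't a "busy wait", but it does have several bugs. For example, time.After(to) is evaluated on every pass through the loop, so the timeout restarts whenever a callback fires, and even after both callbacks have fired the loop keeps waiting until a full timeout elapses. You also need once-only send semantics for the done channels (or, maybe easier and at least as good, just close them once when done, perhaps using sync.Once). What we want to do is: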
Start a timer with to as the timeout.
Enter a select loop, using the timer's channel and the two "done" channels.
We want to exit the select loop when the first of the following events occurs:
the timer fires, or
both "done" channels have been signaled.
If we're going to close the two done channels, we'll also want to clear the Ch variables (set them to nil) so that the selects don't spin: a closed channel is always ready to receive, and that would turn this into a true busy-wait. For the moment, though, let's assume that each callback sends exactly once on its channel, and otherwise just leak the channels, so that each of those select cases fires at most once and we can use your code as written. Here's the updated code:
t := time.NewTimer(to)
for !cbOneDone || !cbTwoDone {
	select {
	case <-t.C:
		fmt.Println("Timeout!")
		return fmt.Errorf("timeout")
	case <-cbOneDoneCh:
		cbOneDone = true
	case <-cbTwoDoneCh:
		cbTwoDone = true
	}
}
// insert t.Stop() and receive here to drain t.C if desired
fmt.Println("Both CB executed")
return nil
Note that we will go through the loop at most two times:
If we receive from both Done channels, once each, the loop stops without a timeout. There's no spinning/busy-waiting: we never received anything from t.C. We return nil (no error).
If we receive from one Done channel, the loop resumes but blocks waiting for the timer or the other Done channel.
If we ever receive from t.C, it means we didn't get both callbacks yet. We may have had one, but there's been a timeout and we choose to give up, which was our goal. We return an error, without going back through the loop.
A real version needs a bit more work to clean up properly and avoid leaking the "done" channels (and the timer, which should be stopped; see comment), but this is the general idea. You're already turning the callbacks into channel operations, and you already have a timer with its channel.
Another option is to wait on the WaitGroup in its own goroutine and select on its completion or on a context:
func wait(ctx context.Context, wg *sync.WaitGroup) error {
	done := make(chan struct{}, 1)
	go func() {
		wg.Wait()
		done <- struct{}{}
	}()
	select {
	case <-done:
		// Counter is 0, so all callbacks completed.
		return nil
	case <-ctx.Done():
		// Context cancelled.
		return ctx.Err()
	}
}
Alternatively, you can pass a time.Duration and block on <-time.After(d) rather than on <-ctx.Done(), but I would argue that using context is more idiomatic.
The code below presents two variations:
The first is the regular pattern: nothing fancy, it does the job and does it well. You launch your callbacks in goroutines, make them push to a sink, and listen on that sink for a result or a timeout. Pay attention to the sink channel's capacity: to avoid leaking a goroutine, it must match the number of callbacks.
The second factors the synchronization mechanisms out into small functions to assemble; two wait functions are provided, waitAll and waitOne. Nice to write, but definitely less efficient: more allocations, more back and forth over more channels, and more complex and subtle to reason about.
package main

import (
	"fmt"
	"log"
	"sync"
	"time"
)

func main() {
	ExampleOne()
	ExampleTwo()
	ExampleThree()
	fmt.Println("Hello, playground")
}

func ExampleOne() {
	log.Println("start reg")
	errs := make(chan error, 2)
	go func() {
		fn := callbackWithOpts("reg: so slow", 2*time.Second, nil)
		errs <- fn()
	}()
	go func() {
		fn := callbackWithOpts("reg: too fast", time.Millisecond, fmt.Errorf("broke!"))
		errs <- fn()
	}()

	select {
	case err := <-errs: // capture only one result,
		// the fastest to finish.
		if err != nil {
			log.Println(err)
		}
	case <-time.After(time.Second): // or wait at most this long,
		// in case they are all too slow.
	}
	log.Println("done reg")
}

func ExampleTwo() {
	log.Println("start wait")
	errs := waitAll(
		withTimeout(time.Second,
			callbackWithOpts("waitAll: so slow", 2*time.Second, nil),
		),
		withTimeout(time.Second,
			callbackWithOpts("waitAll: too fast", time.Millisecond, nil),
		),
	)
	for err := range trim(errs) {
		if err != nil {
			log.Println(err)
		}
	}
	log.Println("done wait")
}

func ExampleThree() {
	log.Println("start waitOne")
	errs := waitOne(
		withTimeout(time.Second,
			callbackWithOpts("waitOne: so slow", 2*time.Second, nil),
		),
		withTimeout(time.Second,
			callbackWithOpts("waitOne: too fast", time.Millisecond, nil),
		),
	)
	for err := range trim(errs) {
		if err != nil {
			log.Println(err)
		}
	}
	log.Println("done waitOne")
}

// a configurable callback for playing
func callbackWithOpts(msg string, tout time.Duration, err error) func() error {
	return func() error {
		<-time.After(tout)
		fmt.Println(msg)
		return err
	}
}

// withTimeout returns a function that returns h's error, or nil if h does
// not finish within tout.
func withTimeout(tout time.Duration, h func() error) func() error {
	return func() error {
		d := make(chan error, 1)
		go func() {
			d <- h()
		}()
		select {
		case err := <-d:
			return err
		case <-time.After(tout):
		}
		return nil
	}
}

// waitAll launches all func() and returns their errors on the returned
// error channel (merge).
// It is the caller's responsibility to drain the output error channel.
func waitAll(h ...func() error) chan error {
	d := make(chan error, len(h))
	var wg sync.WaitGroup
	for i := 0; i < len(h); i++ {
		wg.Add(1)
		go func(h func() error) {
			defer wg.Done()
			d <- h()
		}(h[i])
	}
	go func() {
		wg.Wait()
		close(d)
	}()
	return d
}

// waitOne launches all func() and returns the first error on the returned
// error channel.
// It is the caller's responsibility to drain the output error channel.
func waitOne(h ...func() error) chan error {
	d := make(chan error, len(h))
	one := make(chan error, 1)
	var wg sync.WaitGroup
	for i := 0; i < len(h); i++ {
		wg.Add(1)
		go func(h func() error) {
			defer wg.Done()
			d <- h()
		}(h[i])
	}
	go func() {
		for err := range d {
			one <- err
			close(one)
			break
		}
	}()
	go func() {
		wg.Wait()
		close(d)
	}()
	return one
}

func trim(err chan error) chan error {
	out := make(chan error)
	go func() {
		for e := range err {
			out <- e
		}
		close(out)
	}()
	return out
}

Is there any possibility of a goroutine interrupt without panic?

I'm setting up a service that provides an HTTP server and runs goroutines to handle some jobs; see the code below.
Occasionally, in one cycle, a sub-job seems to be interrupted: there are no logs after a certain function call.
No panic is caught, and the deferred function does not seem to run, because the mutex is never unlocked.
The log is not truncated or lost.
The logs of the other jobs are complete.
There was no restart, exit, or OOM kill at those times.
This is on CentOS 7.5, with my service running in Docker:
go1.11
docker 18.09
This is an intermittent bug. I have added more logging and enabled pprof, and I'm trying to reproduce it.
main.go
func main() {
	....
	// this is a cycle job, with custom time intervals
	router.Cycle(r)
	....
	endless.ListenAndServe(":"+conf.Conf.Port, r)
}
router/cycle.go
// this is a loop job, when job end, sleep custom time intervals and run again
// implemented by encapsulating a goroutine, and create a context
func Cycle(g *gin.Engine) {
	cyclec := cli.InitCycle(g)
	cyclec.AddFunc(time.Second, schedule.RunSomeDeal)
	cyclec.Start()
}

///RunSomeDeal
func RunSomeDeal(c *gin.Context) error {
	...
	// deal some sub job
	for i := 0; i < missionLen; i++ {
		// this is once job, like cycle but only run once
		// a new context is generated by passing the exist context and a goroutine executes the callback function
		helpers.Job.Run(c, func(newCtx *gin.Context) error {
			return DealMission(newCtx, someparams...)
		})
	}
	return nil
}

// Job.Run
func (c *Job) Run(ctx *gin.Context, f func(ctx *gin.Context) error) {
	e := &Entry{
		Job: FuncJob(f),
	}
	if c.getJobContext != nil {
		e.span = c.getJobContext(ctx)
	}
	go c.runWithRecovery(e)
}

func (c *Job) runWithRecovery(e *Entry) {
	ctx := gin.CreateNewContext(c.gin)
	...
	defer func() {
		if r := recover(); r != nil {
			const size = 64 << 10
			buf := make([]byte, size)
			buf = buf[:runtime.Stack(buf, false)]
			requestId, _ := ctx.Get("requestId")
			handleName := ctx.CustomContext.HandlerName()
			info, _ := json.Marshal(map[string]interface{}{
				...some kv for log
			})
			log.Printf(...)
		}
		gin.RecycleContext(c.gin, ctx)
	}()
	if c.beforeRun != nil {
		ok := c.beforeRun(ctx, e.span)
		if !ok {
			return
		}
	}
	error := e.Job.Run(ctx)
	...
	if c.afterRun != nil {
		c.afterRun(ctx)
	}
}

// DealMission
func DealMission(c *gin.Context, params...) {
	// lock something use sync.mutex
	doSomeLock()
	defer func() {
		// ...not trigger
		unlockErr := unlockxxxxx(...)
		if unlockErr != nil {
			panic("some error info")
		}
	}()
	base.DebugLog(...)
	err := SomeOtherFunc(c, params...)
	base.DebugLog(...)
}

// some other func
func SomeOtherFunc(ctx *gin.Context, params...) error {
	err := CallOther()
	base.DebugLog(...)
	err := CallOther()
	base.DebugLog(...)
	// there is no logs after this call func, and Job.runWithRecovery not catch any panic error
	err := CallOther()
	// print log...
	base.DebugLog(...)
}
In this sub-job the logging stops at a certain line, with no panic and no error, and the deferred function does not seem to run, since the mutex is never unlocked.
Logging for the other jobs is fine, and so is logging for the next cycle of the job.

Go channel infinite loop

I am trying to catch errors from a group of goroutines using a channel, but reading from the channel ends up in an infinite loop that starts consuming CPU.
func UnzipFile(f *bytes.Buffer, location string) error {
	zipReader, err := zip.NewReader(bytes.NewReader(f.Bytes()), int64(f.Len()))
	if err != nil {
		return err
	}
	if err := os.MkdirAll(location, os.ModePerm); err != nil {
		return err
	}
	errorChannel := make(chan error)
	errorList := []error{}
	go errorChannelWatch(errorChannel, errorList)
	fileWaitGroup := &sync.WaitGroup{}
	for _, file := range zipReader.File {
		fileWaitGroup.Add(1)
		go writeZipFileToLocal(file, location, errorChannel, fileWaitGroup)
	}
	fileWaitGroup.Wait()
	close(errorChannel)
	log.Println(errorList)
	return nil
}

func errorChannelWatch(ch chan error, list []error) {
	for {
		select {
		case err := <-ch:
			list = append(list, err)
		}
	}
}

func writeZipFileToLocal(file *zip.File, location string, ch chan error, wg *sync.WaitGroup) {
	defer wg.Done()
	zipFilehandle, err := file.Open()
	if err != nil {
		ch <- err
		return
	}
	defer zipFilehandle.Close()
	if file.FileInfo().IsDir() {
		if err := os.MkdirAll(filepath.Join(location, file.Name), os.ModePerm); err != nil {
			ch <- err
		}
		return
	}
	localFileHandle, err := os.OpenFile(filepath.Join(location, file.Name), os.O_WRONLY|os.O_CREATE|os.O_TRUNC, file.Mode())
	if err != nil {
		ch <- err
		return
	}
	defer localFileHandle.Close()
	if _, err := io.Copy(localFileHandle, zipFilehandle); err != nil {
		ch <- err
		return
	}
	ch <- fmt.Errorf("Test error")
}
So I loop over a slice of files and write them to disk; when there is an error, I send it on errorChannel so that it gets saved into a slice.
I use a sync.WaitGroup to wait for all goroutines, and when they are done I want to print errorList to check whether there was any error during execution.
The list is always empty, even if I add ch <- fmt.Errorf("test") at the end of writeZipFileToLocal, and the channel always hangs.
I am not sure what I am missing here.
1. For the first point, the infinite loop:
Quoting the Go language specification:
A receive operation on a closed channel can always proceed immediately, yielding the element type's zero value after any previously sent values have been received.
So in this function
func errorChannelWatch(ch chan error, list []error) {
	for {
		select {
		case err := <-ch:
			list = append(list, err)
		}
	}
}
after ch gets closed this turns into an infinite loop adding nil values to list.
Try this instead:
func errorChannelWatch(ch chan error, list []error) {
	for err := range ch {
		list = append(list, err)
	}
}
2. For the second point, why you don't see anything in your error list:
The problem is this call:
errorChannel := make(chan error)
errorList := []error{}
go errorChannelWatch(errorChannel, errorList)
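Here you pass errorList to errorChannelWatch by value, so the function works on its own copy of the slice header, and the caller's errorList (its length in particular) is never changed by the function. What can change is the shared underlying array, but only as long as the append calls don't need to allocate a new one; since errorList starts with capacity 0, the very first append already allocates, so nothing ever becomes visible to the caller.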
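To see that value-versus-pointer behaviour in isolation (using []int instead of []error for brevity; appendByValue and appendByPointer are illustrative names):

```go
package main

import "fmt"

// appendByValue gets a copy of the slice header, so the caller's length
// never changes.
func appendByValue(list []int, v int) {
	list = append(list, v)
}

// appendByPointer updates the caller's slice header itself.
func appendByPointer(list *[]int, v int) {
	*list = append(*list, v)
}

func main() {
	list := []int{} // len 0, cap 0: any append must allocate a new array
	appendByValue(list, 1)
	fmt.Println(len(list)) // 0

	appendByPointer(&list, 1)
	fmt.Println(len(list)) // 1

	withCap := make([]int, 0, 4) // spare capacity: append reuses the array
	appendByValue(withCap, 7)
	fmt.Println(len(withCap), withCap[:1]) // 0 [7]
}
```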
To remedy the situation, either pass a pointer to the slice to errorChannelWatch, or rewrite it as a closure that captures errorList.
For the first proposed solution, change errorChannelWatch to
func errorChannelWatch(ch chan error, list *[]error) {
	for err := range ch {
		*list = append(*list, err)
	}
}
and the call to
errorChannel := make(chan error)
errorList := []error{}
go errorChannelWatch(errorChannel, &errorList)
For the second proposed solution, just change the call to
errorChannel := make(chan error)
errorList := []error{}
go func() {
	for err := range errorChannel {
		errorList = append(errorList, err)
	}
}()
3. A minor remark:
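One could think that there is a synchronisation problem here:
fileWaitGroup.Wait()
close(errorChannel)
log.Println(errorList)
How can you be sure that errorList isn't modified after the call to close? One could reason that you can't know how many values the goroutine errorChannelWatch still has to process.
Because you call wg.Done() after the send to the error channel, every error value has been sent, and therefore received by the watcher, when fileWaitGroup.Wait() returns. That does not mean the watcher has finished appending the last value, though: the receive completes before the append runs, so the final append can still race with log.Println(errorList), and the race detector may report it.
It gets worse if someone later adds buffering to the error channel or otherwise alters the code.
So I would advise having the watcher signal when it has finished (for example by closing a done channel after its loop) and waiting for that signal before reading errorList, or at the very least explaining the synchronisation in a comment.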

Resources