Streaming command output progress - Go

I'm writing a service that has to stream the output of an executed command both to the parent and to a log. When the command is long-running, the problem is that cmd.StdoutPipe seems to give me only the final (string) result.
Is it possible to get partial output of what is going on as it happens, like in a shell?
package main

import (
	"bufio"
	"fmt"
	"log"
	"os/exec"
)

func main() {
	cmd := exec.Command("sh", "-c", "some long running task")
	stdout, _ := cmd.StdoutPipe()
	cmd.Start()

	// Read the command's output line by line as it is produced.
	scanner := bufio.NewScanner(stdout)
	for scanner.Scan() {
		m := scanner.Text()
		fmt.Println(m)
		log.Print(m)
	}
	cmd.Wait()
}
P.S. If I just needed to forward the output, this would be enough:
cmd.Stdout = os.Stdout
But in my case it is not enough.

The code you posted works (with a reasonable command executed).
Here is a simple "some long running task" written in Go for you to call and test your code:
package main

import (
	"fmt"
	"time"
)

func main() {
	fmt.Println("Child started.")
	time.Sleep(time.Second * 2)
	fmt.Println("Tick...")
	time.Sleep(time.Second * 2)
	fmt.Println("Child ended.")
}
Compile it and call it as your command. You will see the different lines appear immediately as written by the child process, "streamed".
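For example, assuming you build it into a binary named ./child (a placeholder name for this sketch), the parent program from the question only needs its command changed:
// built beforehand with, e.g.: go build -o child child.go
cmd := exec.Command("./child") // hypothetical path to the compiled child binary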
Reasons why it may not work for you
The Scanner returned by bufio.NewScanner() reads whole lines and only returns something when a newline character is encountered (as defined by the bufio.ScanLines() split function).
If the command you execute doesn't print newline characters, its output won't be returned immediately (only when a newline character is printed, the internal buffer fills up, or the process ends).
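To see this effect, here is a hypothetical child program (my own sketch, not from the question) that reports progress with carriage returns instead of newlines; the line-based scanner above will print nothing until it exits:
package main

import (
	"fmt"
	"time"
)

func main() {
	for i := 0; i <= 100; i += 20 {
		// No newline: the "progress bar" keeps overwriting the same line.
		fmt.Printf("\rprogress: %d%%", i)
		time.Sleep(time.Second)
	}
	fmt.Println() // final newline when done
}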
Possible workarounds
If you have no guarantee that the child process prints newline characters but you still want to stream the output, you can't read whole lines. One solution is to read by words, or even read by characters (runes). You can achieve this by setting a different split function using the Scanner.Split() method:
scanner := bufio.NewScanner(stdout)
scanner.Split(bufio.ScanRunes)
The bufio.ScanRunes function reads the input by runes so Scanner.Scan() will return whenever a new rune is available.
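With that split function in place, the reading loop from the question stays the same, except that each iteration now yields a single rune rather than a whole line (a minimal sketch):
for scanner.Scan() {
	// Prints each rune as soon as the child writes it, newline or not.
	fmt.Print(scanner.Text())
}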
Or reading manually without a Scanner (in this example byte-by-byte):
oneByte := make([]byte, 1)
for {
	_, err := stdout.Read(oneByte)
	if err != nil {
		break
	}
	fmt.Printf("%c", oneByte[0])
}
Note that the above code would read runes that span multiple bytes in UTF-8 incorrectly. To read multi-byte UTF-8 runes, we need a bigger buffer:
oneRune := make([]byte, utf8.UTFMax)
for {
	count, err := stdout.Read(oneRune)
	if err != nil {
		break
	}
	fmt.Printf("%s", oneRune[:count])
}
Things to keep in mind
Processes have default buffers for standard output and standard error (usually a few KB in size). If a process writes to standard output or standard error, the data goes into the respective buffer. If this buffer fills up, further writes will block (in the child process). So if you don't read the standard output and standard error of a child process, your child process may hang once the buffer is full.
It is therefore recommended to always read both the standard output and the standard error of a child process. Even if you know that the command doesn't normally write to its standard error, if some error occurs, it will probably start dumping error messages there.
Edit: As Dave C mentions, by default the standard output and error streams of the child process are discarded and will not cause a block / hang if not read. But still, by not reading the error stream you might miss a thing or two from the process.
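A minimal sketch of reading both streams concurrently (my own illustration, assuming the same long-running shell command as above) could look like this:
cmd := exec.Command("sh", "-c", "some long running task")
stdout, _ := cmd.StdoutPipe()
stderr, _ := cmd.StderrPipe()
cmd.Start()

var wg sync.WaitGroup
wg.Add(2)
go func() {
	defer wg.Done()
	scanner := bufio.NewScanner(stdout)
	for scanner.Scan() {
		fmt.Println("out:", scanner.Text())
	}
}()
go func() {
	defer wg.Done()
	scanner := bufio.NewScanner(stderr)
	for scanner.Scan() {
		fmt.Println("err:", scanner.Text())
	}
}()

wg.Wait()  // finish reading both pipes first...
cmd.Wait() // ...then reap the child (Wait closes the pipes)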

I found good examples of how to implement progress output in this article by Krzysztof Kowalczyk.
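Since the original goal is to stream the output to both the console and a log, one stdlib option (my own sketch, not from the linked article) is io.MultiWriter, which duplicates every write to several destinations with no scanning loop at all:
logFile, err := os.Create("cmd.log") // hypothetical log destination
if err != nil {
	log.Fatal(err)
}
defer logFile.Close()

cmd := exec.Command("sh", "-c", "some long running task")
// Every write by the child goes to both the terminal and the log file as it happens.
cmd.Stdout = io.MultiWriter(os.Stdout, logFile)
cmd.Stderr = io.MultiWriter(os.Stderr, logFile)

if err := cmd.Run(); err != nil {
	log.Fatal(err)
}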

Related

Scan strings containing white spaces using fmt.Scan()/fmt.Scanf()/fmt.Scanln()?

Using the Go language, to read input strings with spaces, I have to use
s, err := bufio.NewReader(os.Stdin).ReadString('\n')
Is there any way to use fmt.Scan, fmt.Scanf, or fmt.Scanln()?
If you're building a CLI tool, I highly suggest you check out cobra. It's written in pure Go (see its dependencies) and used by multiple well-known projects.
Alternatively, I wrote a quick (gross) example to demonstrate how you could gain finer control with the Reader interface by linearly reading individual bytes from stdin.
// byteByByte reads stdin one byte at a time and splits the input on spaces.
func byteByByte() [][]byte {
	reader := bufio.NewReader(os.Stdin)
	buffer, result := []byte{}, [][]byte{}
	for {
		c, err := reader.ReadByte()
		if err != nil {
			break
		}
		if c == ' ' {
			// Space: dump the word collected so far and start a new one.
			result, buffer = append(result, buffer), []byte{}
			continue
		}
		buffer = append(buffer, c)
	}
	if len(buffer) > 0 {
		result = append(result, buffer) // don't drop the last word at EOF
	}
	return result
}
Here we temporarily buffer results until a space is reached, at which point the temporary buffer is dumped into the larger one.
This is meant as an example to show you how the reader interface can be used with more control/granularity, not as a piece of code to be used verbatim.
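For completeness, a more direct stdlib route (my own sketch, not part of the original answer) is a bufio.Scanner with the bufio.ScanWords split function, which handles the splitting for you:
scanner := bufio.NewScanner(os.Stdin)
scanner.Split(bufio.ScanWords) // split the input on whitespace
for scanner.Scan() {
	fmt.Println(scanner.Text()) // one whitespace-separated token per iteration
}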

Reliably capture the output of an external command

I need to call lots of short-lived (and occasionally some long-lived) external processes in rapid succession and process both stdout and stderr in realtime. I've found numerous solutions for this using StdoutPipe and StderrPipe with a bufio.Scanner for each, packaged into goroutines. This works most of the time, but it swallows the external command's output occasionally, and I can't figure out why.
Here's a minimal example displaying that behaviour on MacOS X (Mojave) and on Linux:
package main

import (
	"bufio"
	"log"
	"os/exec"
	"sync"
)

func main() {
	for i := 0; i < 50000; i++ {
		log.Println("Loop")
		var wg sync.WaitGroup
		cmd := exec.Command("echo", "1")
		stdout, err := cmd.StdoutPipe()
		if err != nil {
			panic(err)
		}
		cmd.Start()
		stdoutScanner := bufio.NewScanner(stdout)
		stdoutScanner.Split(bufio.ScanLines)
		wg.Add(1)
		go func() {
			for stdoutScanner.Scan() {
				line := stdoutScanner.Text()
				log.Printf("[stdout] %s\n", line)
			}
			wg.Done()
		}()
		cmd.Wait()
		wg.Wait()
	}
}
I've left out the stderr handling for this. When running this, I get only about 49,900 "[stdout] 1" lines (the actual number varies with each run), though there should be 50,000. I'm seeing all 50,000 "Loop" lines, so it doesn't seem to die prematurely. This smells like a race condition somewhere, but I can't figure out where.
It works just fine if I don't put the scanning loop in a goroutine, but then I lose the ability to simultaneously read stderr, which I need.
I've tried running this with -race, Go reports no data races.
I'm out of ideas, what am I getting wrong?
You're not checking for errors in several places.
In some cases this is not actually causing problems, but it's still a good idea to check:
cmd.Start()
may return an error, in which case the command was never run. (This is not the actual problem.)
When stdoutScanner.Scan() returns false, stdoutScanner.Err() may show an error. If you start checking this, you'll find some errors:
2020/02/19 15:38:17 [stdout err] read |0: file already closed
This isn't the actual problem, but—aha—this matches the symptoms you see: not all of the output got seen. Now, why would reading stdout claim that the file is closed? Well, where did stdout come from? It's from here:
stdout, err := cmd.StdoutPipe()
Take a look at the source code for this function, which ends with these lines:
c.closeAfterStart = append(c.closeAfterStart, pw)
c.closeAfterWait = append(c.closeAfterWait, pr)
return pr, nil
(and pr is the pipe-read return value). Hmm: what could closeAfterWait mean?
Now, here are the last two lines in your loop:
cmd.Wait()
wg.Wait()
That is, first we wait for cmd to finish. (When cmd finishes, what gets closed?) Then we wait for the goroutine that's reading cmd's stdout to finish. (Hm, what could still be reading from the pr pipe?)
The fix is now obvious: swap the wg.Wait(), which waits for the consumer of the stdout pipe to finish reading it, with the cmd.Wait(), which waits for echo ... to exit and then closes the read end of the pipe. If you close while the readers are still reading, they may never read what you expected.
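A minimal sketch of the corrected loop body (with the error checks added as well; variable names as in the question) could look like this:
if err := cmd.Start(); err != nil {
	panic(err)
}
wg.Add(1)
go func() {
	defer wg.Done()
	for stdoutScanner.Scan() {
		log.Printf("[stdout] %s", stdoutScanner.Text())
	}
	if err := stdoutScanner.Err(); err != nil {
		log.Printf("[stdout err] %v", err)
	}
}()
wg.Wait()  // drain the pipe first...
cmd.Wait() // ...then let Wait close it and reap the process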

Copy exec.Command output to file as the buffer receives data

I have a script that dumps quite a bit of text into STDOUT when run. I'm trying to execute this script and write the output to a file without holding the entire buffer in memory at one time. (We're talking many megabytes of text that this script outputs at one time.)
The following works, but because I'm doing this across several goroutines, my memory consumption shoots up to > 5GB which I would really like to avoid:
var out bytes.Buffer
cmd := exec.Command("/path/to/script/binary", "arg1", "arg2")
cmd.Stdout = &out
err := cmd.Run()
if err != nil {
	log.Fatal(err)
}
out.WriteTo(io) // io is the writer connected to the new file
Ideally as out fills up, I want to be emptying it into my target file to keep memory usage low. I've tried changing this to:
cmd := exec.Command("/path/to/script/binary", "arg1", "arg2")
cmd.Start()
stdout, _ := cmd.StdoutPipe()
r := *bufio.NewReader(stdout)
r.WriteTo(io)
cmd.Wait()
However when I print out these variables stdout is <nil>, r is {[0 0 0 0 0...]}, and r.WriteTo panics: invalid memory address or nil pointer dereference.
Is it possible to write the output of cmd as it is generated to keep memory usage down? Thanks!
Why don't you just write to a file directly?
file, _ := os.Create("/some/file")
cmd.Stdout = file
Or use your io thing (that's a terrible name for a variable, by the way, since it's (a) the name of a standard library package and (b) ambiguous: what does it mean?):
cmd.Stdout = io
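Either way, here is a minimal sketch (the file path is just a placeholder) that streams the script's output straight to a file without buffering it all in memory:
file, err := os.Create("/some/file")
if err != nil {
	log.Fatal(err)
}
defer file.Close()

cmd := exec.Command("/path/to/script/binary", "arg1", "arg2")
cmd.Stdout = file // the child's writes go directly to the file as they happen
if err := cmd.Run(); err != nil {
	log.Fatal(err)
}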

How to resume reading after EOF in named pipe

I'm writing a program which opens a named pipe for reading, and then processes any lines written to this pipe:
err = syscall.Mkfifo("/tmp/myfifo", 0666)
if err != nil {
panic(err)
}
pipe, err := os.OpenFile("/tmp/myfifo", os.O_RDONLY, os.ModeNamedPipe)
if err != nil {
panic(err)
}
reader := bufio.NewReader(pipe)
scanner := bufio.NewScanner(reader)
for scanner.Scan() {
line := scanner.Text()
process(line)
}
This works fine as long as the writing process does not restart or for other reasons send an EOF. When this happens, the loop terminates (as expected from the specifications of Scanner).
However, I want to keep the pipe open to accept further writes. I could just reinitialize the scanner of course, but I believe this would create a race condition where the scanner might not be ready while a new process has begun writing to the pipe.
Are there any other options? Do I need to work directly with the File type instead?
From the bufio GoDoc:
Scan ... returns false when the scan stops, either by reaching the end of the input or an error.
So you could possibly leave the file open and read until EOF, then trigger scanner.Scan() again when the file has changed or at a regular interval (e.g. in a goroutine), and make sure the pipe variable doesn't go out of scope so you can reference it again.
If I understand your concern about a race condition correctly, this wouldn't be an issue (unless write and read operations must be synchronized), but when the scanner is re-initialized it will end up back at the beginning of the file.
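A minimal sketch of that re-scanning loop (my own illustration, keeping the same pipe variable open and polling at a fixed interval) might look like this:
for {
	scanner := bufio.NewScanner(pipe)
	for scanner.Scan() {
		process(scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
	// The writer closed its end (EOF). Keep the pipe open and poll again
	// shortly; a new writer can attach to the FIFO at any time.
	time.Sleep(100 * time.Millisecond)
}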

Race conditions in io.Pipe?

I have a function which returns the Reader end of an io.Pipe and kicks off a go-routine which writes data to the Writer end of it, and then closes the pipe.
func GetPipeReader() io.ReadCloser {
	r, w := io.Pipe()
	go func() {
		_, err := io.CopyN(w, SomeReaderOfSize(N), N)
		w.CloseWithError(err)
	}()
	return r
}

func main() {
	var buf bytes.Buffer
	io.Copy(&buf, GetPipeReader())
	println("got", buf.Len(), "bytes")
}
https://play.golang.org/p/OAijIwmtRr
This seems to always work in my testing, in that I get all the data I wrote. But the API docs are a bit worrying to me:
func Pipe() (*PipeReader, *PipeWriter)
Pipe creates a synchronous in-memory pipe. [...] Reads on one end are
matched with writes on the other, [...] there is no internal
buffering.
func (w *PipeWriter) CloseWithError(err error) error
CloseWithError closes the writer; subsequent reads from the read half
of the pipe will return no bytes and the error err, or EOF if err is
nil.
What I want to know is, what are the possible race conditions here? Is it plausible that my goroutine will write a bunch of data and then close the pipe before I can read it all?
Do I need to use a channel for some signalling on when to close? What can go wrong, basically.
No, there are no race conditions. As the documentation mentions, reads on one end are matched with writes on the other. So, when CloseWithError() is reached, it means every Write has successfully completed and been matched with a corresponding Read - so the other end must have read everything there was to read.
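One thing worth doing in the consumer (a small addition of my own, not part of the original answer) is checking the error returned by io.Copy, since any non-nil error passed to CloseWithError is delivered to the reading side:
var buf bytes.Buffer
if _, err := io.Copy(&buf, GetPipeReader()); err != nil {
	// This is the err that the goroutine passed to w.CloseWithError.
	log.Fatal(err)
}
println("got", buf.Len(), "bytes")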
