Copy exec.Command output to file as the buffer receives data - go

I have a script that dumps quite a bit of text into STDOUT when run. I'm trying to execute this script and write the output to a file without holding the entire buffer in memory at one time. (We're talking many megabytes of text that this script outputs at one time.)
The following works, but because I'm doing this across several goroutines, my memory consumption shoots up to > 5GB which I would really like to avoid:
var out bytes.Buffer
cmd := exec.Command("/path/to/script/binary", "arg1", "arg2")
cmd.Stdout = &out
err := cmd.Run()
if err != nil {
	log.Fatal(err)
}
out.WriteTo(io) // io is the writer connected to the new file
Ideally as out fills up, I want to be emptying it into my target file to keep memory usage low. I've tried changing this to:
cmd := exec.Command("/path/to/script/binary", "arg1", "arg2")
cmd.Start()
stdout, _ := cmd.StdoutPipe()
r := *bufio.NewReader(stdout)
r.WriteTo(io)
cmd.Wait()
However, when I print out these variables, stdout is <nil>, r is {[0 0 0 0 0...]}, and r.WriteTo panics: invalid memory address or nil pointer dereference.
Is it possible to write the output of cmd as it is generated to keep memory usage down? Thanks!

Why don't you just write to a file directly?
file, _ := os.Create("/some/file")
cmd.Stdout = file
Or use your io thing (a terrible name for a variable, by the way, since a) it is the name of a standard library package and b) it is ambiguous: what does it mean?):
cmd.Stdout = io

Related

Why does Go io.MultiWriter fail to write all data equally?

I am trying to get a list of files from the fd (find) utility and pass them to fzf while simultaneously saving that list to a text file on the hard disk. In the following code, everything is passed from fd to fzf just fine, but not all of the results make it to the text file. Why is this? The number of results that make it to the text file varies each time I run the code (the total number of lines output by fd is about 1200, but the text file only ever receives between 200 and 900 of those lines).
fdFile, _ := os.OpenFile("./fd-output.txt", os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0600)
defer fdFile.Close()
find := exec.Command("fd", ".", "/etc")
fzf := exec.Command("fzf")
fzf.Stderr = os.Stderr
r, w := io.Pipe()
find.Stdout = w
fzf.Stdin = r
var fzfSelection bytes.Buffer
fzf.Stdout = &fzfSelection
find.Start()
go func() {
	outs := []io.Writer{fdFile, w}
	io.Copy(io.MultiWriter(outs...), r)
}()
fzf.Start()
find.Wait()
w.Close()
fzf.Wait()
fmt.Print(fzfSelection.String())
I read somewhere that my issue sounds like some of the data is not being flushed to the file. However, I have tried appending .Flush() to every reader/writer in the above code and always get an error that tells me the method does not exist.

What is the difference between os.Stdout and syscall.Stdout?

I have been trying to get ForkExec() to work and have not been able to. Is there a difference between syscall.Stdout and os.Stdout?
Here is a small example of the code I am trying to run.
command := "/usr/bin/echo"
args := []string{"Hello there."}
attr := new(syscall.ProcAttr)
attr.Env = os.Environ()
attr.Files = []uintptr{uintptr(syscall.Stdin), uintptr(syscall.Stdout), uintptr(syscall.Stderr)}
pid, err := syscall.ForkExec(command, args, attr)
if err != nil {
	log.Fatal(err)
}
fmt.Println(pid)
The output is not showing up on the screen.
Thanks a lot for your help in advance.
os.Stdout is a *os.File. It works with Go functions that want an io.Writer or similar interfaces. syscall.Stdout is an integer constant. It's the file descriptor number of stdout, which is useful for low-level syscalls.
syscall.ForkExec does indeed want file descriptor numbers... but it's unclear why you're using that instead of os/exec.Cmd which is much more straightforward.

How to resume reading after EOF in named pipe

I'm writing a program which opens a named pipe for reading, and then processes any lines written to this pipe:
err = syscall.Mkfifo("/tmp/myfifo", 0666)
if err != nil {
	panic(err)
}
pipe, err := os.OpenFile("/tmp/myfifo", os.O_RDONLY, os.ModeNamedPipe)
if err != nil {
	panic(err)
}
reader := bufio.NewReader(pipe)
scanner := bufio.NewScanner(reader)
for scanner.Scan() {
	line := scanner.Text()
	process(line)
}
This works fine as long as the writing process does not restart or for other reasons send an EOF. When this happens, the loop terminates (as expected from the specifications of Scanner).
However, I want to keep the pipe open to accept further writes. I could just reinitialize the scanner of course, but I believe this would create a race condition where the scanner might not be ready while a new process has begun writing to the pipe.
Are there any other options? Do I need to work directly with the File type instead?
From the bufio GoDoc:
Scan ... returns false when the scan stops, either by reaching the end of the input or an error.
So you could possibly leave the file open and read until EOF, then trigger scanner.Scan() again when the file has changed or at a regular interval (e.g. from a goroutine), and make sure the pipe variable doesn't go out of scope so you can reference it again.
If I understand your concern about a race condition correctly, this wouldn't be an issue (unless write and read operations must be synchronized) but when the scanner is re-initialized it will end up back at the beginning of the file.

How to capture/log everything after spawning an interactive program

I have a method that can spawn an interactive process. How do I log everything (including stdin and stdout) after spawning it?
e.g.,
func execute(cmd1 string, slice []string) {
	cmd := exec.Command(cmd1, slice...)
	// redirect the output to terminal
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	cmd.Stdin = os.Stdin
	cmd.Run()
}
..
The interactive program could be :
execute("ftp", nil)
I think I have to dup stdin and stdout, and read and write in a separate thread.
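Rather than redirecting its output to the terminal, read it; then you can log it, print it, or do whatever you want with it.
stdout, _ := cmd.StdoutPipe() // must be called before cmd.Start
cmd.Start()
b, _ := ioutil.ReadAll(stdout) // reads until the command closes its stdout
cmd.Wait()
fmt.Println(string(b))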
Something like the code above would work though there are many options. I think you'll want to remove all that code you have to redirect to the terminal.
You could store the output in a temporary buffer and write it to several places:
outBuf := bytes.Buffer{}
cmd := exec.Command(cmd1, slice...)
cmd.Stdout = &outBuf
cmd.Run()
if outBuf.Len() > 0 {
	log.Printf("%s", outBuf.String())
	fmt.Fprintf(os.Stdout, "%s", outBuf.String())
}

Streaming commands output progress

I'm writing a service that has to stream the output of an executed command both to the parent and to a log. When the process runs for a long time, the problem is that cmd.StdoutPipe only gives me the final (string) result.
Is it possible to get partial output as it is produced, like in a shell?
func main() {
	cmd := exec.Command("sh", "-c", "some long running task")
	stdout, _ := cmd.StdoutPipe()
	cmd.Start()
	scanner := bufio.NewScanner(stdout)
	for scanner.Scan() {
		m := scanner.Text()
		fmt.Println(m)
		log.Print(m) // Print, not Printf: m is data, not a format string
	}
	cmd.Wait()
}
P.S. Just printing the output would be:
cmd.Stdout = os.Stdout
But in my case it is not enough.
The code you posted works (with a reasonable command executed).
Here is a simple "some long running task" written in Go for you to call and test your code:
package main

import (
	"fmt"
	"time"
)

func main() {
	fmt.Println("Child started.")
	time.Sleep(2 * time.Second)
	fmt.Println("Tick...")
	time.Sleep(2 * time.Second)
	fmt.Println("Child ended.")
}
Compile it and call it as your command. You will see the different lines appear immediately as written by the child process, "streamed".
Reasons why it may not work for you
The Scanner returned by bufio.NewScanner() reads whole lines and only returns something if a newline character is encountered (as defined by the bufio.ScanLines() function).
If the command you execute doesn't print newline characters, its output won't be returned immediately (only when a newline character is printed, the internal buffer fills up, or the process ends).
Possible workarounds
If you have no guarantee that the child process prints newline characters but you still want to stream the output, you can't read whole lines. One solution is to read by words, or even read by characters (runes). You can achieve this by setting a different split function using the Scanner.Split() method:
scanner := bufio.NewScanner(stdout)
scanner.Split(bufio.ScanRunes)
The bufio.ScanRunes function reads the input by runes so Scanner.Scan() will return whenever a new rune is available.
Or read manually without a Scanner (in this example byte by byte):
oneByte := make([]byte, 1)
for {
	_, err := stdout.Read(oneByte)
	if err != nil {
		break
	}
	fmt.Printf("%c", oneByte[0])
}
Note that the above code prints runes that take multiple bytes in UTF-8 incorrectly, because each byte is printed as a character of its own. To handle multi-byte runes, we need a bigger buffer:
oneRune := make([]byte, utf8.UTFMax)
for {
	count, err := stdout.Read(oneRune)
	if err != nil {
		break
	}
	fmt.Printf("%s", oneRune[:count])
}
Things to keep in mind
Processes have default buffers for standard output and for standard error (usually the size of a few KB). If a process writes to the standard output or standard error, it goes into the respective buffer. If this buffer gets full, further writes will block (in the child process). If you don't read the standard output and standard error of a child process, your child process may hang if the buffer is full.
So it is recommended to always read both the standard output and the standard error of a child process. Even if you know that the command doesn't normally write to its standard error, it will probably start dumping error messages there if some error occurs.
Edit: As Dave C mentions, by default the standard output and error streams of the child process are discarded and will not cause a block / hang if not read. Still, by not reading the error stream you might miss a thing or two from the process.
I found good examples of how to implement progress output in this article by Krzysztof Kowalczyk.
