Why does Go io.MultiWriter fail to write all data equally?

I am trying to get a list of files from the fd (find) utility and pass them to fzf while simultaneously saving that list to a text file on the hard disk. In the following code, everything is passed from fd to fzf just fine, but not all of the results make it to the text file. Why is this? The number of results that make it to the text file varies each time I run the code (the total number of lines output by fd is about 1200, but the text file will only ever receive between 200 to 900 of those lines).
fdFile, _ := os.OpenFile("./fd-output.txt", os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0600)
defer fdFile.Close()
find := exec.Command("fd", ".", "/etc")
fzf := exec.Command("fzf")
fzf.Stderr = os.Stderr
r, w := io.Pipe()
find.Stdout = w
fzf.Stdin = r
var fzfSelection bytes.Buffer
fzf.Stdout = &fzfSelection
find.Start()
go func() {
    outs := []io.Writer{fdFile, w}
    io.Copy(io.MultiWriter(outs...), r)
}()
fzf.Start()
find.Wait()
w.Close()
fzf.Wait()
fmt.Print(fzfSelection.String())
I read somewhere that my issue sounds like some of the data is not being flushed to the file. However, I have tried appending .Flush() to every reader/writer in the above code and always get an error that tells me the method does not exist.

Related

Inconsistent results with concurrent function?

I am trying to process lines from a file concurrently, but for some reason I appear to be getting inconsistent results. A simplified version of my code is below:
var wg sync.WaitGroup
semaphore := make(chan struct{}, 2)
lengths := []int{}
for _, file := range args[1:] {
    // Open the file and start reading it
    reader, err := os.Open(file)
    if err != nil {
        fmt.Println("Problem reading input file:", file)
        fmt.Println("Error:", err)
        os.Exit(0)
    }
    scanner := bufio.NewScanner(reader)
    // Start streaming lines
    for scanner.Scan() {
        wg.Add(1)
        text := scanner.Text()
        semaphore <- struct{}{}
        go func(line string) {
            length := getInformation(line)
            lengths = append(lengths, length)
            <-semaphore
            wg.Done()
        }(text)
    }
}
wg.Wait()
sort.Ints(lengths)
fmt.Println("Lengths:", lengths)
The getInformation function is just returning the length of the line. I then take that line and add it to an array. The issue I'm having is that when I run this multiple times against the same file I get different number of items in my array. I had assumed that since I was using a waitGroup that all lines would be processed every time and therefore the contents of lengths would be the same, but this does not appear to be the case. Can anyone see what I am doing wrong here?
The lengths = append(lengths, length) call is being executed concurrently from multiple goroutines. This is not safe and will cause problems like missing entries in the slice. You can fix this by guarding the append calls with a mutex, or by having the goroutines publish their results to a channel and collecting them into the slice in a single place.

Copy exec.Command output to file as the buffer receives data

I have a script that dumps quite a bit of text into STDOUT when run. I'm trying to execute this script and write the output to a file without holding the entire buffer in memory at one time. (We're talking many megabytes of text that this script outputs at one time.)
The following works, but because I'm doing this across several goroutines, my memory consumption shoots up to > 5GB which I would really like to avoid:
var out bytes.Buffer
cmd := exec.Command("/path/to/script/binary", "arg1", "arg2")
cmd.Stdout = &out
err := cmd.Run()
if err != nil {
    log.Fatal(err)
}
out.WriteTo(io) // io is the writer connected to the new file
Ideally as out fills up, I want to be emptying it into my target file to keep memory usage low. I've tried changing this to:
cmd := exec.Command("/path/to/script/binary", "arg1", "arg2")
cmd.Start()
stdout, _ := cmd.StdoutPipe()
r := *bufio.NewReader(stdout)
r.WriteTo(io)
cmd.Wait()
However when I print out these variables stdout is <nil>, r is {[0 0 0 0 0...]}, and r.WriteTo panics: invalid memory address or nil pointer dereference.
Is it possible to write the output of cmd as it is generated to keep memory usage down? Thanks!
Why don't you just write to a file directly?
file, _ := os.Create("/some/file")
cmd.Stdout = file
Or use your io thing (that's a terrible name for a variable, by the way, since it's (a) the name of a standard library package and (b) ambiguous; what does it mean?):
cmd.Stdout = io

How to resume reading after EOF in named pipe

I'm writing a program which opens a named pipe for reading, and then processes any lines written to this pipe:
err = syscall.Mkfifo("/tmp/myfifo", 0666)
if err != nil {
    panic(err)
}
pipe, err := os.OpenFile("/tmp/myfifo", os.O_RDONLY, os.ModeNamedPipe)
if err != nil {
    panic(err)
}
reader := bufio.NewReader(pipe)
scanner := bufio.NewScanner(reader)
for scanner.Scan() {
    line := scanner.Text()
    process(line)
}
This works fine as long as the writing process does not restart or for other reasons send an EOF. When this happens, the loop terminates (as expected from the specifications of Scanner).
However, I want to keep the pipe open to accept further writes. I could just reinitialize the scanner of course, but I believe this would create a race condition where the scanner might not be ready while a new process has begun writing to the pipe.
Are there any other options? Do I need to work directly with the File type instead?
From the bufio GoDoc:
Scan ... returns false when the scan stops, either by reaching the end of the input or an error.
So you could possibly leave the file open and read until EOF, then trigger scanner.Scan() again when the file has changed or at a regular interval (i.e. make a goroutine), and make sure the pipe variable doesn't go out of scope so you can reference it again.
If I understand your concern about a race condition correctly, this wouldn't be an issue (unless write and read operations must be synchronized). Note that a named pipe cannot be seeked, so re-initializing the scanner on the same open pipe simply continues from where reading stopped rather than starting over at the beginning.

Streaming commands output progress

I'm writing a service that has to stream the output of an executed command both to the parent and to a log. When there is a long-running process, the problem is that cmd.StdoutPipe gives me a final (string) result.
Is it possible to give partial output of what is going on, like in shell
func main() {
    cmd := exec.Command("sh", "-c", "some long running task")
    stdout, _ := cmd.StdoutPipe()
    cmd.Start()
    scanner := bufio.NewScanner(stdout)
    for scanner.Scan() {
        m := scanner.Text()
        fmt.Println(m)
        log.Printf(m)
    }
    cmd.Wait()
}
P.S. Just writing the output to the terminal would be:
cmd.Stdout = os.Stdout
But in my case it is not enough.
The code you posted works (with a reasonable command executed).
Here is a simple "some long running task" written in Go for you to call and test your code:
func main() {
    fmt.Println("Child started.")
    time.Sleep(time.Second * 2)
    fmt.Println("Tick...")
    time.Sleep(time.Second * 2)
    fmt.Println("Child ended.")
}
Compile it and call it as your command. You will see the different lines appear immediately as written by the child process, "streamed".
Reasons why it may not work for you
The Scanner returned by bufio.NewScanner() reads whole lines and only returns something if a newline character is encountered (as defined by the bufio.ScanLines() function).
If the command you execute doesn't print newline characters, its output won't be returned immediately (only when newline character is printed, internal buffer is filled or the process ends).
Possible workarounds
If you have no guarantee that the child process prints newline characters but you still want to stream the output, you can't read whole lines. One solution is to read by words, or even read by characters (runes). You can achieve this by setting a different split function using the Scanner.Split() method:
scanner := bufio.NewScanner(stdout)
scanner.Split(bufio.ScanRunes)
The bufio.ScanRunes function reads the input by runes so Scanner.Scan() will return whenever a new rune is available.
Or reading manually without a Scanner (in this example byte-by-byte):
oneByte := make([]byte, 1)
for {
    _, err := stdout.Read(oneByte)
    if err != nil {
        break
    }
    fmt.Printf("%c", oneByte[0])
}
Note that the above code would incorrectly read runes that span multiple bytes in UTF-8 encoding. To read multi-byte UTF-8 runes, we need a bigger buffer:
oneRune := make([]byte, utf8.UTFMax)
for {
    count, err := stdout.Read(oneRune)
    if err != nil {
        break
    }
    fmt.Printf("%s", oneRune[:count])
}
Things to keep in mind
Processes have default buffers for standard output and for standard error (usually the size of a few KB). If a process writes to the standard output or standard error, it goes into the respective buffer. If this buffer gets full, further writes will block (in the child process). If you don't read the standard output and standard error of a child process, your child process may hang if the buffer is full.
So it is recommended to always read both the standard output and error of a child process. Even if you know that the command doesn't normally write to its standard error, if some error occurs, it will probably start dumping error messages there.
Edit: As Dave C mentions, by default the standard output and error streams of the child process are discarded and will not cause a block / hang if not read. But still, by not reading the error stream you might miss a thing or two from the process.
I found good examples of how to implement progress output in this article by Krzysztof Kowalczyk.

Golang is it safe to switch cmd.Stdout

I execute process with Go and write output to file (log file)
cmd := exec.Command(path)
cmd.Dir = dir
t := time.Now()
t1 := t.Format("20060102-150405")
fs, err := os.Create(dir + "/var/log/" + t1 + ".std")
if err == nil {
    cmd.Stdout = fs
}
I wish to rotate logs and change log file daily
http://golang.org/pkg/os/exec/
// Stdout and Stderr specify the process's standard output and error.
//
// If either is nil, Run connects the corresponding file descriptor
// to the null device (os.DevNull).
//
// If Stdout and Stderr are the same writer, at most one
// goroutine at a time will call Write.
Stdout io.Writer
Stderr io.Writer
Is it safe to change the cmd.Stdout variable daily from an arbitrary goroutine, or do I have to implement a goroutine that will copy from Stdout to another file and switch files?
It is safe to change those variables directly. However, if you change them once the command has actually been run then they will have no effect on the actual running child process. To rotate the output of the running process "live" you will have to implement that in the process itself, or pipe everything through the parent and use a goroutine as you suggest.
