I'm writing a program which opens a named pipe for reading, and then processes any lines written to this pipe:
err = syscall.Mkfifo("/tmp/myfifo", 0666)
if err != nil {
    panic(err)
}
pipe, err := os.OpenFile("/tmp/myfifo", os.O_RDONLY, os.ModeNamedPipe)
if err != nil {
    panic(err)
}
reader := bufio.NewReader(pipe)
scanner := bufio.NewScanner(reader)
for scanner.Scan() {
    line := scanner.Text()
    process(line)
}
This works fine as long as the writing process does not restart or otherwise send an EOF. When that happens, the loop terminates (as expected, given Scanner's documented behavior).
However, I want to keep the pipe open to accept further writes. I could of course just reinitialize the scanner, but I believe that would create a race condition in which the scanner might not be ready while a new process has already begun writing to the pipe.
Are there any other options? Do I need to work directly with the File type instead?
From the bufio GoDoc:
Scan ... returns false when the scan stops, either by reaching the end of the input or an error.
So you could leave the file open, read until EOF, and then trigger scanner.Scan() again when the file has changed or at a regular interval (i.e. in a goroutine), making sure the pipe variable doesn't go out of scope so you can reference it again.
If I understand your concern about a race condition correctly, it wouldn't be an issue (unless the write and read operations must be synchronized), but when the scanner is re-initialized it will end up back at the beginning of the file.
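A minimal sketch of that idea, assuming the process(line string) function from the question and keeping pipe open across writers (the time import and the poll interval are my additions):
for {
    scanner := bufio.NewScanner(pipe)
    for scanner.Scan() {
        process(scanner.Text())
    }
    if err := scanner.Err(); err != nil {
        panic(err) // a real read error, not EOF
    }
    // EOF: every writer has closed its end of the FIFO. Keep pipe
    // open and poll again shortly for the next writer.
    time.Sleep(100 * time.Millisecond)
}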
This is an example from the Go docs that just hangs waiting for stdin input:
scanner := bufio.NewScanner(os.Stdin)
for scanner.Scan() {
    fmt.Println(scanner.Text())
}
if err := scanner.Err(); err != nil {
    fmt.Fprintln(os.Stderr, "reading standard input:", err)
}
The same thing happens if you read an empty file.
What is the proper way to break out of a hanging scanner?
The only solution that comes to mind is a periodic check to see if the scanner has received new data.
I have a feeling that I'm missing something and that the solution is actually embarrassingly simple and obvious.
You can't end (terminate) a Scanner.Scan() call.
You loop as long as Scanner.Scan() keeps returning true. Scanner.Scan() will keep returning true as long as lines (be they empty or not) are successfully read; it returns false when the end of input is reached or reading fails.
So for you to exit the loop, you have to signal end of input on your terminal. This can be done by pressing CTRL+D on Unix systems, and CTRL+Z on Windows.
Of course you can modify the loop body, and add a condition (if) to terminate if a certain input is entered, e.g. exit:
for scanner.Scan() {
    line := scanner.Text()
    fmt.Println(line)
    if line == "exit" {
        break
    }
}
I need to call lots of short-lived (and occasionally some long-lived) external processes in rapid succession and process both stdout and stderr in real time. I've found numerous solutions for this using StdoutPipe and StderrPipe with a bufio.Scanner for each, packaged into goroutines. This works most of the time, but occasionally it swallows the external command's output, and I can't figure out why.
Here's a minimal example displaying that behaviour on macOS (Mojave) and on Linux:
package main

import (
    "bufio"
    "log"
    "os/exec"
    "sync"
)

func main() {
    for i := 0; i < 50000; i++ {
        log.Println("Loop")
        var wg sync.WaitGroup
        cmd := exec.Command("echo", "1")
        stdout, err := cmd.StdoutPipe()
        if err != nil {
            panic(err)
        }
        cmd.Start()
        stdoutScanner := bufio.NewScanner(stdout)
        stdoutScanner.Split(bufio.ScanLines)
        wg.Add(1)
        go func() {
            for stdoutScanner.Scan() {
                line := stdoutScanner.Text()
                log.Printf("[stdout] %s\n", line)
            }
            wg.Done()
        }()
        cmd.Wait()
        wg.Wait()
    }
}
I've left out the stderr handling in this example. When I run it, I get only about 49,900 [stdout] 1 lines (the actual number varies with each run), though there should be 50,000. I do see 50,000 Loop lines, so the loop doesn't seem to die prematurely. This smells like a race condition somewhere, but I can't figure out where.
It works just fine if I don't put the scanning loop in a goroutine, but then I lose the ability to read stderr simultaneously, which I need.
I've tried running this with -race; Go reports no data races.
I'm out of ideas. What am I getting wrong?
You're not checking for errors in several places.
In some, this is not actually causing problems, but it's still a good idea to check:
cmd.Start()
may return an error, in which case the command was never run. (This is not the actual problem.)
When stdoutScanner.Scan() returns false, stdoutScanner.Err() may show an error. If you start checking this, you'll find some errors:
2020/02/19 15:38:17 [stdout err] read |0: file already closed
This isn't the actual problem, but—aha—this matches the symptoms you see: not all of the output got seen. Now, why would reading stdout claim that the file is closed? Well, where did stdout come from? It's from here:
stdout, err := cmd.StdoutPipe()
Take a look at the source code for this function, which ends with these lines:
c.closeAfterStart = append(c.closeAfterStart, pw)
c.closeAfterWait = append(c.closeAfterWait, pr)
return pr, nil
(and pr is the pipe-read return value). Hmm: what could closeAfterWait mean?
Now, here are the last two lines in your loop:
cmd.Wait()
wg.Wait()
That is, first we wait for cmd to finish. (When cmd finishes, what gets closed?) Then we wait for the goroutine that's reading cmd's stdout to finish. (Hm, what could still be reading from the pr pipe?)
The fix is now obvious: swap the wg.Wait(), which waits for the consumer of the stdout pipe to finish reading it, with the cmd.Wait(), which waits for echo ... to exit and then closes the read end of the pipe. If you close while the readers are still reading, they may never read what you expected.
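In code, the last two lines of the loop body simply change order:
wg.Wait()  // first, wait for the goroutine to drain the pipe (it returns on EOF)
cmd.Wait() // then reap the child; this also closes the read end of the pipe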
I have a program which makes an SSH connection to a new (every time the program is executed) GCP instance to retrieve information. The problem is that sometimes I get this error and I don't know why:
2019/08/22 12:30:37 ssh: Stdout already set
My code (error handling omitted):
results := "/home/example.txt"
client, err := ssh.Dial("tcp", addrIP+":22", clientConfig)
session, err := client.NewSession()
defer session.Close()
data, err := session.Output(" cat " + results)
if err != nil {
    log.Print("Fails when new output")
    log.Fatal(err)
}
The error occurs during the Output call.
Calling session.Output sets the session's Stdout to a buffer, runs the provided command, and returns the buffer's contents.
If the session's Stdout is already set (for example, if you call session.Output multiple times), an error of "Stdout already set" is returned.
If you need to run multiple commands in one session, just set Stdout manually to a buffer you maintain yourself, and use the session.Run() method instead of session.Output.
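A minimal sketch of that approach (error handling elided; note that with golang.org/x/crypto/ssh each Session still executes only a single command, so create a fresh Session per command while reusing your own buffer):
var buf bytes.Buffer
session.Stdout = &buf // set Stdout once, to a buffer you own
if err := session.Run("cat " + results); err != nil {
    log.Fatal(err)
}
data := buf.Bytes() // what session.Output would have returned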
I was trying to read a CSV file in Go line by line, with a for loop that required an if statement with a break to check whether the error from reading the file was EOF. I find this syntax rather unnecessary when, in Java for example, I could read the line inside a while loop conditional and simultaneously check for the EOF error. I thought that declaring a variable inside a for loop header was possible, and I know for sure that you can do this with if statements in Go:
if v := 2; v > 1 {
    fmt.Println("2 is better than 1")
}
The first snippet of code I have here is what I know to work in my program.
reader := csv.NewReader(some_file)
for {
    line, err := reader.Read()
    if err == io.EOF {
        break
    }
    // do data parsing from your line here
}
I do not know whether or not this second snippet is conceptually possible or just syntactically incorrect.
reader := csv.NewReader(some_file)
for line, err := reader.Read(); err != io.EOF {
    // do data parsing from your line here
}
I would like some clarification on the benefits and conventions of doing it one way over the other. Thanks :)
It is more conventional to write simple statements rather than lengthy, complex ones, isn't it?
So I consider the 1st version more conventional than the 2nd. Moreover, the for loop in your 2nd version isn't correct. If you want to use that form, then fix it like the following, or however you wish:
for line, err := reader.Read(); err != io.EOF; line, err = reader.Read() {
    // do data parsing from your line here
}
I'm writing a service that has to stream the output of an executed command both to the parent and to a log. When there is a long-running process, the problem is that cmd.StdoutPipe gives me a final (string) result.
Is it possible to get partial output of what is going on as it happens, like in a shell?
func main() {
    cmd := exec.Command("sh", "-c", "some long running task")
    stdout, _ := cmd.StdoutPipe()
    cmd.Start()
    scanner := bufio.NewScanner(stdout)
    for scanner.Scan() {
        m := scanner.Text()
        fmt.Println(m)
        log.Printf(m)
    }
    cmd.Wait()
}
P.S. I know that just forwarding the output would be:
cmd.Stdout = os.Stdout
But in my case it is not enough.
The code you posted works (with a reasonable command executed).
Here is a simple "some long running task" written in Go for you to call and test your code:
package main

import (
    "fmt"
    "time"
)

func main() {
    fmt.Println("Child started.")
    time.Sleep(time.Second * 2)
    fmt.Println("Tick...")
    time.Sleep(time.Second * 2)
    fmt.Println("Child ended.")
}
Compile it and call it as your command. You will see the different lines appear immediately as written by the child process, "streamed".
Reasons why it may not work for you
The Scanner returned by bufio.NewScanner() reads whole lines and only returns something if a newline character is encountered (as defined by the bufio.ScanLines() function).
If the command you execute doesn't print newline characters, its output won't be returned immediately (only when newline character is printed, internal buffer is filled or the process ends).
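To see this, a child like the following hypothetical test program produces no scanner output until it exits, because it never prints a newline:
func main() {
    for i := 0; i < 5; i++ {
        fmt.Print("tick ") // no newline, so a line-based scanner stays silent
        time.Sleep(time.Second)
    }
}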
Possible workarounds
If you have no guarantee that the child process prints newline characters but you still want to stream the output, you can't read whole lines. One solution is to read by words, or even read by characters (runes). You can achieve this by setting a different split function using the Scanner.Split() method:
scanner := bufio.NewScanner(stdout)
scanner.Split(bufio.ScanRunes)
The bufio.ScanRunes function reads the input by runes so Scanner.Scan() will return whenever a new rune is available.
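The reading loop itself stays the same; each Scan() now delivers a single rune:
for scanner.Scan() {
    fmt.Print(scanner.Text()) // one rune per iteration, printed immediately
}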
Or reading manually without a Scanner (in this example byte-by-byte):
oneByte := make([]byte, 1)
for {
    _, err := stdout.Read(oneByte)
    if err != nil {
        break
    }
    fmt.Printf("%c", oneByte[0])
}
Note that the above code would incorrectly read runes that take multiple bytes in UTF-8 encoding. To read multi-byte UTF-8 runes, we need a bigger buffer:
oneRune := make([]byte, utf8.UTFMax)
for {
    count, err := stdout.Read(oneRune)
    if err != nil {
        break
    }
    fmt.Printf("%s", oneRune[:count])
}
Things to keep in mind
Processes have default buffers for standard output and for standard error (usually the size of a few KB). If a process writes to the standard output or standard error, it goes into the respective buffer. If this buffer gets full, further writes will block (in the child process). If you don't read the standard output and standard error of a child process, your child process may hang if the buffer is full.
So it is recommended to always read both the standard output and error of a child process. Even if you know that the command doesn't normally write to its standard error, if some error occurs, it will probably start dumping error messages there.
Edit: As Dave C mentions, by default the standard output and error streams of the child process are discarded and will not cause a block / hang if not read. But still, by not reading the error stream you might miss a thing or two from the process.
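A minimal, self-contained sketch of reading both streams concurrently (the command and the drain helper are illustrative; waiting on the WaitGroup before cmd.Wait() follows the pipe-closing order discussed earlier on this page):
package main

import (
    "bufio"
    "io"
    "log"
    "os/exec"
    "sync"
)

// drain logs each line read from r with the given prefix.
func drain(r io.Reader, prefix string) {
    scanner := bufio.NewScanner(r)
    for scanner.Scan() {
        log.Printf("%s %s", prefix, scanner.Text())
    }
}

func main() {
    cmd := exec.Command("sh", "-c", "echo out; echo err >&2")
    stdout, err := cmd.StdoutPipe()
    if err != nil {
        log.Fatal(err)
    }
    stderr, err := cmd.StderrPipe()
    if err != nil {
        log.Fatal(err)
    }
    if err := cmd.Start(); err != nil {
        log.Fatal(err)
    }
    var wg sync.WaitGroup
    wg.Add(2)
    go func() { defer wg.Done(); drain(stdout, "[stdout]") }()
    go func() { defer wg.Done(); drain(stderr, "[stderr]") }()
    wg.Wait() // both pipes have reached EOF
    if err := cmd.Wait(); err != nil {
        log.Fatal(err)
    }
}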
I found good examples of how to implement progress output in this article by Krzysztof Kowalczyk.