How to parse the output of a pipe in real time? - bash

When I pipe two commands, it seems the first command must finish before the second command can parse the output.
For example,
$ ping -c 5 10.11.12.13 | while read line; do echo $line; done
I expected it to generate output every second, but it does not. Is that really the case, or am I missing something (e.g., a buffering effect)?
The underlying problem is this: the first command runs over a long period of time and I want to parse its output in real time. How can I do that in the shell?
Thanks.

You can force the first command to become unbuffered by using a command like unbuffer (from the expect package) or stdbuf. That way it will flush its output after each line, rather than only after it has accumulated e.g. 4096 bytes.
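For example (a sketch; stdbuf is part of GNU coreutils, and unbuffer requires expect to be installed):
stdbuf -oL ping -c 5 10.11.12.13 | while read -r line; do echo "$line"; done
unbuffer ping -c 5 10.11.12.13 | while read -r line; do echo "$line"; done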

Related

Trim output of running program

I have a program that when it runs, outputs hundreds of lines starting with "Info:" and a few lines that have useful output. To make things easier, I have created a simple python and bash combination script to emulate the issue I'm having:
wait_2sec.py:
import time
print("Hello!")
time.sleep(2)
print("Goodbye!")
I am attempting to trim my output by running:
python wait_2sec.py | sed '/Goodbye/d'
However, sed does not output Hello! until after the python script has finished. I don't know whether the pipe waits until after the program is finished to begin running the sed command, or if the sed command is the hangup.
I am open to using another command to trim output if sed does not work for this use-case.
I don't know whether the pipe waits until after the program is finished to begin running the sed command, or if the sed command is the hangup.
It's actually neither: python normally buffers its output (when it is not writing to a terminal) until the buffer is full, so Moustapha's suggestion may work, provided that unbuffer is installed. But you can simply use python's built-in option -u (force the stdout and stderr streams to be unbuffered) instead:
python -u wait_2sec.py | sed '/Goodbye/d'
You can try to run your script using the following command:
unbuffer python wait_2sec.py | sed '/Goodbye/d'
Reference: https://unix.stackexchange.com/a/200413
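The same effect is also available without changing the command line flags: python honors the PYTHONUNBUFFERED environment variable, where any non-empty value behaves like -u:
PYTHONUNBUFFERED=1 python wait_2sec.py | sed '/Goodbye/d'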

How to make tee in Linux provide screen output line by line, not at the end of execution? [duplicate]

Usually, stdout is line-buffered. In other words, as long as your printf argument ends with a newline, you can expect the line to be printed instantly. This does not appear to hold when using a pipe to redirect to tee.
I have a C++ program, a, that outputs strings, always \n-terminated, to stdout.
When it is run by itself (./a), everything prints correctly and at the right time, as expected. However, if I pipe it to tee (./a | tee output.txt), it doesn't print anything until it quits, which defeats the purpose of using tee.
I know that I could fix it by adding a fflush(stdout) after each printing operation in the C++ program. But is there a cleaner, easier way? Is there a command I can run, for example, that would force stdout to be line-buffered, even when using a pipe?
You can try stdbuf:
$ stdbuf --output=L ./a | tee output.txt
The relevant part of the man page:
-i, --input=MODE adjust standard input stream buffering
-o, --output=MODE adjust standard output stream buffering
-e, --error=MODE adjust standard error stream buffering
If MODE is 'L' the corresponding stream will be line buffered.
This option is invalid with standard input.
If MODE is '0' the corresponding stream will be unbuffered.
Otherwise MODE is a number which may be followed by one of the following:
KB 1000, K 1024, MB 1000*1000, M 1024*1024, and so on for G, T, P, E, Z, Y.
In this case the corresponding stream will be fully buffered with the buffer
size set to MODE bytes.
Keep this in mind, though:
NOTE: If COMMAND adjusts the buffering of its standard streams ('tee' does,
for example) then that will override corresponding settings changed by 'stdbuf'.
Also some filters (like 'dd' and 'cat' etc.) don't use streams for I/O,
and are thus unaffected by 'stdbuf' settings.
You are not running stdbuf on tee, you're running it on a, so this shouldn't affect you unless you set the buffering of a's streams in a's source.
Also, stdbuf is not POSIX, but part of GNU coreutils.
Try unbuffer (man page) which is part of the expect package. You may already have it on your system.
In your case you would use it like this:
unbuffer ./a | tee output.txt
It also has a -p option for pipeline mode, in which unbuffer reads from stdin and passes it to the command given in the rest of the arguments.
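For example, when the buffering command sits in the middle of a pipeline (a sketch; the sed stage is just an illustration):
./a | unbuffer -p sed 's/^/got: /' | tee output.txt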
You can use setlinebuf from stdio.h.
setlinebuf(stdout);
This should change the buffering to "line buffered".
If you need more flexibility you can use setvbuf; for line buffering that would be setvbuf(stdout, NULL, _IOLBF, 0).
You may also try to execute your command in a pseudo-terminal using the script command (which should enforce line-buffered output to the pipe)!
script -q /dev/null ./a | tee output.txt # Mac OS X, FreeBSD
script -c "./a" /dev/null | tee output.txt # Linux
Be aware the script command does not propagate back the exit status of the wrapped command.
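On Linux, util-linux's script also has an -e (--return) flag that is documented to pass the wrapped command's exit status back, so a sketch of the Linux form becomes:
script -qec "./a" /dev/null | tee output.txt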
The unbuffer command from the expect package, from the answer by Paused until further notice above, did not work for me the way it was presented.
Instead of using:
./a | unbuffer -p tee output.txt
I had to use:
unbuffer -p ./a | tee output.txt
(-p is for pipeline mode where unbuffer reads from stdin and passes it to the command in the rest of the arguments)
The expect package can be installed on:
MSYS2 with pacman -S expect
macOS with brew install expect
Update
I recently had buffering problems with python inside a shell script (when trying to prepend a timestamp to its output). The fix was to pass the -u flag to python, so that run.sh invokes python -u script.py:
unbuffer -p /bin/bash run.sh 2>&1 | tee /dev/tty | ts '[%Y-%m-%d %H:%M:%S]' >> somefile.txt
This command will put a timestamp on the output and send it to a file and stdout at the same time.
The ts program (timestamp) can be installed with the moreutils package.
Update 2
Recently, I also had problems with grep buffering its output; passing it the --line-buffered argument made it stop buffering.
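For example (a hypothetical pipeline; the log path and pattern are just illustrations):
tail -f /var/log/syslog | grep --line-buffered error | tee errors.txt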
If you use the C++ stream classes instead, every std::endl is an implicit flush. Using C-style printing, I think the method you suggested (fflush()) is the only way.
The best answer IMO is grep's --line-buffered option, as stated here:
https://unix.stackexchange.com/a/53445/40003

Get bash subshell output immediately from named pipe

I have a few commands I run between braces, which I then redirect to a named pipe and tail. However, it looks like the redirection happens only after the block has finished executing: I don't see any output from the tail command for a while, and when I do, it only shows the last command's output. Any ideas how to view the output of the block in real time?
Example Script
#!/usr/bin/env bash
mkfifo /tmp/why_you_no_out
trap "rm /tmp/why_you_no_out" 0
{
    for ((i=1; i<=100; i++)); do
        printf "$i"
    done
    sleep 10s
    printf "\n12356"
} >> /tmp/why_you_no_out &
printf "here"
tail -n 1 -f /tmp/why_you_no_out
Sounds like the issue is buffering. Most shells don't want to write data one byte at a time because that is wasteful; instead, they wait until they have a sizable chunk of data before committing it, unless the output is connected to your terminal.
If you're looking to unbuffer the output of an arbitrary command, you may find the "unbuffer" utility helpful or any of the solutions mentioned in this question: How to make output of any shell command unbuffered?
If you're dealing with specific applications, they may have options to reduce buffering. For example, GNU's grep includes the --line-buffered option.
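One quick experiment (a sketch): replace the final tail line of the script with a plain cat. As the stdbuf man page note quoted earlier points out, cat does not use stdio streams, so it adds no buffering of its own and prints whatever arrives in the FIFO as soon as it arrives; it also sidesteps tail -n 1 having to wait for the end of the input before it can tell which line is last.
cat /tmp/why_you_no_out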

The stdin of chained commands and commands in command substitution

Let me present my findings first and put my questions at the end. (1) applies to zsh only and (2), (3) apply to both zsh and bash.
1. stdin of command substitution
ls | echo $(cat)
ls | { echo $(cat) }
The first one prints cat: -: Input/output error; while the second one produces the output of ls.
2. chained commands after pipe
ls | { head -n1; cat; }
ls | { read a; cat; }
The first command doesn't work properly: cat encounters EOF and exits immediately. But the second form works: the first line is read into a and cat gets the rest.
3. mixed stdin
ls | { python -c 'import sys; print(sys.argv)' $(head -n1); }
ls | { python -c 'import sys; print(sys.argv); print(input())' $(head -n1); }
Inside the {} on the first line, the command prints its command-line arguments; in the second form, it also reads a line from stdin.
The first command runs successfully, while the second fails because input() hits EOF.
My questions are:
(as in section 1) What is the difference between the form with {} and the form without?
(as in section 2) Is it possible for head and cat to read the same stdin sequentially? How can the second form succeed while the first fails?
(as in section 3) How is the stdin of the command inside a command substitution connected to the stdin of the original command (echo here)? Which one reads first? And how can stdin be kept open so that both commands (python and head) can read the same stdin sequentially?
You are not taking input buffering into account and it explains most of your observations.
head reads several kilobytes of input each time it needs data, which makes it much more efficient. So it is likely that it will read all of stdin before any other process has a chance to. That's obvious in case 2, where the execution order is perhaps clearer.
If input were coming from a regular file, head could seek back to the end of the lines it used before terminating. But since a pipe is not seekable, it cannot do that. If you use "here-strings" -- the <<< syntax, then stdin will turn out to be seekable because here-strings are implemented using a temporary file. I don't know if you can rely on that fact, though.
read does not buffer input, at least not beyond the current line (and even then, only if it has no other line end delimiter specified on the command line). It carefully only reads what it needs precisely because it is generally used in a context where its input comes from a pipe and seeking wouldn't be possible. That's extremely useful -- so much so that the fact that it works is almost invisible -- but it's also one of the reasons shell scripting can be painfully slow.
You can see this more clearly by sending enough data into the pipe to satisfy head's initial read. Try this, for example:
seq 1 10000 | { head -n1; head -n2; }
(I changed the second head to head -n2 because the first head happens to leave stdin positioned exactly at the end of a line, so that the second head sees a blank line as the first line.)
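By contrast, because read consumes its input one byte at a time, the read-then-cat form from section 2 hands the pipe over at exactly the right position. A minimal sketch of the same experiment:
seq 1 10000 | { IFS= read -r first; echo "read got: $first"; head -n1; }
This prints read got: 1 followed by 2: read took exactly the first line, so head sees the stream starting at line 2.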
The other thing you need to understand is what command substitution does, and when it does it. Command substitution reads the entire output of a command and inserts it into the command line. That happens even before the command has been identified, never mind started execution.
Consider the following little snippet:
$(printf %cc%co e h) hello, world
It should be clear from that that the command substitution is fully performed before the echo utility (or builtin) is started.
Your first scenario triggers an oddity of zsh which is explained by Stéphane Chazelas in this answer on Unix.SE. Effectively, zsh does the command substitution before the pipeline is set up, so cat is reading from the main zsh's standard input. (Stéphane explains why this is and how it leads to an EIO error. Although I think it is dependent on the precise zsh configuration and option settings, since on my default zsh install, it just locks up my terminal. At some point I'll have to figure out why.) If you use braces, then the redirection is set up before the command substitution is performed.
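As for keeping stdin open for sequential readers (question 3), one workaround (a sketch, not from the original answer) is to capture the first line with read instead of a command substitution, so nothing over-reads the pipe before python starts:
ls | { IFS= read -r first; python -c 'import sys; print(sys.argv); print(input())' "$first"; }
read consumes exactly one line, so python's input() then receives the second line of the ls output.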

How to use mpiexec in a Linux shell

I have a file a.txt in which each line contains a parameter. I want to use mpiexec to call my program, a.out, to run a calculation with each parameter, and I am using a Linux shell script to handle this. The code is simple:
cat a.txt | while read line
do
    mpiexec -v -hostfile hosts -np 16 ./a.out ${line}
done
Unexpectedly, the script ends after processing only one line of a.txt. Is this because of a wrong use of the pipe? How can I tackle this problem?
#!/bin/bash
for LINE in `cat a.txt | xargs -r`; do
    mpiexec -v -hostfile hosts -np 16 ./a.out $LINE
done
I had this issue too. Claudio's solution helped set me on the right path to understanding why the loop exits after the first iteration. First off, here is a solution which is pretty close to what you wrote:
cat a.txt | while read line; do
    </dev/null mpiexec -np 16 ./a.out ${line}
done
Note that I am just using mpiexec on a local computer (python's threading situation is bad enough to need this), so I can't test whether this works with separate hosts. You can try adding the -hostfile option back in yourself.
The reason your script didn't work is that mpiexec gobbles up whatever is attached to its standard input. I assume it does this so that, in case a.out needs input, mpiexec can collect it and forward it to the other servers along with the command that runs a.out. The result is that on the first iteration, read reads the first line of your file, and then mpiexec reads all the remaining lines, even though a.out probably doesn't use them in your case. On the second iteration, read tries to read more lines, but since mpiexec has already consumed the rest, read is told that the end of file has been reached, so the loop exits.
Since we want to prevent mpiexec from reading the standard input, we redirect mpiexec's standard input to come from /dev/null. Since /dev/null always contains nothing, mpiexec reads nothing and leaves the loop's standard input alone.
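Another common idiom (a sketch along the same lines) feeds the loop from a separate file descriptor with bash's read -u, so nothing that a command inside the loop does to stdin can steal the remaining lines:
while read -u3 line; do
    mpiexec -v -hostfile hosts -np 16 ./a.out ${line}
done 3< a.txt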
