How can I split and re-join STDOUT from multiple processes? - bash

I am working on a pipeline that has a few branch points that subsequently merge; they look something like this:
           command2
          /        \
command1            command4
          \        /
           command3
Each command writes to STDOUT and accepts input via STDIN. STDOUT from command1 needs to be passed to both command2 and command3, which are run sequentially, and their output needs to be effectively concatenated and passed to command4. I initially thought that something like this would work:
$ command1 | (command2; command3) | command4
That doesn't work, though: only STDOUT from command2 is passed to command4, and when I remove command4 it's apparent that command3 isn't receiving the stream from command1 at all -- in other words, it's as if command2 is exhausting or consuming the stream. I get the same result with { command2 ; command3 ; } in the middle as well. So I figured I should be using tee with process substitution, and tried this:
$ command1 | tee >(command2) | command3 | command4
But surprisingly that didn't work either -- the output of command1 and the output of command2 are both piped into command3, which results in errors, and only the output of command3 is piped into command4. I did find that the following gets the appropriate input and output to and from command2 and command3:
$ command1 | tee >(command2) >(command3) | command4
However, this streams the output of command1 to command4 as well, which leads to issues because command2 and command3 produce output in a different format than command1. The solution I've arrived at seems hacky, but it does work:
$ command1 | tee >(command2) >(command3) > /dev/null | command4
That suppresses command1 passing its output to command4, while collecting STDOUT from command2 and command3. It works, but I feel like I'm missing a more obvious solution. Am I? I've read dozens of threads and haven't found a solution to this problem that works in my use case, nor have I seen an elaboration of the exact problem of splitting and re-joining streams (though I can't be the first one to deal with this). Should I just be using named pipes? I tried but had difficulty getting that working as well, so maybe that's another story for another thread. I'm using bash in RHEL5.8.
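For reference, here is a minimal runnable version of that hack with stand-in commands: seq plays command1, the two sed calls play command2 and command3, and sort plays command4 (sort is chosen only to make the merged result deterministic, since the two branches race):

```shell
# seq's output is duplicated to both sed branches by tee; tee's own copy
# is discarded, so only the branches' output reaches the final command.
seq 3 | tee >(sed 's/^/a /') >(sed 's/^/b /') > /dev/null | sort
# → a 1, a 2, a 3, b 1, b 2, b 3 (one per line)
```

The process substitutions inherit the pipe to the final command as their stdout, which is why their output survives the > /dev/null that silences tee itself.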

You can play around with file descriptors like this:
((date | tee >( wc >&3) | wc) 3>&1) | wc
or
((command1 | tee >( command2 >&3) | command3) 3>&1) | command4
To explain: tee >( wc >&3) will output the original data on stdout, and the inner wc will output its result on FD 3. The outer 3>&1 will then merge the FD 3 output back into STDOUT, so the output from both wc commands is sent to the trailing command.
HOWEVER, there is nothing in this pipeline (or in the one in your own solution) which guarantees that the output will not be mangled -- that is, incomplete lines from command2 may end up mixed with lines from command3. If that is a concern, you will need to do one of two things:
Write your own tee-like program which internally uses popen and reads each line back, sending only complete lines to stdout for command4 to read
Write the output from command2 and command3 to files and use cat to merge the data as input to command4
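A sketch of that second option, with seq and two wc calls as stand-ins. Buffering command1's output in a file gives up streaming, but since the background jobs are real jobs, wait is reliable, and cat merges the two results whole, in a fixed order:

```shell
d=$(mktemp -d)
seq 5 > "$d/src"               # command1's output, captured once
wc -l < "$d/src" > "$d/f2" &   # command2 writes its result to its own file
wc -w < "$d/src" > "$d/f3" &   # command3 likewise
wait                           # both files are complete after this
cat "$d/f2" "$d/f3"            # unmangled merge -- this is command4's stdin
rm -rf "$d"
```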

Please see also https://unix.stackexchange.com/questions/28503/how-can-i-send-stdout-to-multiple-commands. Among all the answers, I found that one particularly fits my need.
To expand a little on @Soren's answer:
$ ((date | tee >( wc >&3) | wc) 3>&1) | cat -n
1 1 6 29
2 1 6 29
You can also do it without tee, using a shell variable instead:
$ (z=$(date); (echo "$z"| wc ); (echo "$z"| wc) ) | cat -n
1 1 6 29
2 1 6 29
In my case, I applied this technique and wrote a much more complex script that runs under busybox.
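For completeness, a runnable sketch of the variable technique feeding a final command, with stand-ins (seq, head, tail, cat -n). Note that $(...) buffers everything in memory and strips trailing newlines, so it only suits small, finite outputs:

```shell
# Capture command1's output once, replay it into two consumers, and
# concatenate their results into the final command -- no tee needed.
z=$(seq 3)                          # command1
{ printf '%s\n' "$z" | head -1      # command2: first line  -> 1
  printf '%s\n' "$z" | tail -1      # command3: last line   -> 3
} | cat -n                          # command4 numbers the merged lines
```

Unlike the tee-based pipelines, the ordering here is fully deterministic, because command2 and command3 run sequentially.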

I believe your solution is good, and it uses tee as documented.
If you read the man page of tee, it says:
Copy standard input to each FILE, and also to standard output
Your FILEs are process substitutions.
And the standard output is what you need to remove, because you don't want it; that is exactly what you did by redirecting it to /dev/null.


Tee to commands only, not stdout

I already know how to use tee with process substitution to send output to various commands and to stdout, e.g.:
command0 | tee >(command1) >(command2)
With the above line, stdout will be composed of interleaved lines from command0, command1, and command2.
Is there a way to prevent tee from writing to stdout, without removing the output of any commands it pipes to? So for the example above, for stdout to only have output from command1 and command2?
Most answers relating to teeing without stdout are only writing directly to files, and recommend using something like this:
command0 | tee file1 file2 >/dev/null
But with process substitution, that would consume all output from the other commands too.
command0 | tee >(command1) >(command2) >/dev/null
Is there some way to tell tee not to print to stdout, or to only consume the output directly from tee?
Try this:
( command0 | tee >(command1 1>&3 ) | command2 ) 3>&1
It redirects the stdout of command1 to file descriptor 3, so that command2 sees only the original source. At the end, FD 3 is redirected back to stdout.
Use this to test it:
( echo test | tee >( sed 's/^/1 /' >&3 ) | sed 's/^/2 /' ) 3>&1
The output is unordered; in my case:
2 test
1 test
I have seen a comment and an answer that use an extra >, but don't really explain why it does what it does. As far as I can tell, >(command1) expands to a filename like /dev/fd/63, so the extra > is an ordinary output redirection: it sends tee's own stdout into that process substitution, while the remaining process substitutions are treated as tee's FILE arguments. This works:
command0 | tee > >(command1) >(command2)
command0 | tee >(command1) > >(command2)
It appears not to matter where the extra > is, so long as it comes before at least one of the arguments to tee. This, however, will not work, since the final > is left with nothing to redirect to:
command0 | tee >(command1) >(command2) >
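A runnable illustration of the extra-> form with stand-in commands; sort is only there because the two branches race and their relative output order is not guaranteed:

```shell
# tee's own stdout is swallowed by the first process substitution, so the
# pipe carries only what the two sed commands print.
echo test | tee > >(sed 's/^/1 /') >(sed 's/^/2 /') | sort
# → 1 test
#   2 test
```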

Redirect both stdout and stderr to file, print stdout only [duplicate]

This question already has answers here:
Separately redirecting and recombining stderr/stdout without losing ordering
(2 answers)
Closed 4 years ago.
I have a large amount of text coming in stdout and stderr; I would like to log all of it in a file (in the same order), and print only what comes from stdout in the console for further processing (like grep).
Any combination of > file or &> file, even with | or |& will permanently redirect the stream and I cannot pipe it afterwards:
my_command > output.log | grep something # logs only stdout, prints only stderr
my_command &> output.log | grep something # logs everything in correct order, prints nothing
my_command > output.log |& grep something # logs everything in correct order, prints nothing
my_command &> output.log |& grep something # logs everything in correct order, prints nothing
Any use of tee will either
print what comes from stderr then log everything that comes from stdout and print it out, so I lose the order of the text that comes in
log both in the correct order if I use |& tee but I lose control over the streams since now everything is in stdout.
example:
my_command | tee output.log | grep something # logs only stdout, prints all of stderr then all of stdout
my_command |& tee output.log | grep something # logs everything, prints everything to stdout
my_command | tee output.log 3>&1 1>&2 2>&3 | tee -a output.log | grep something # logs only stdout, prints all of stderr then all of stdout
Now I'm all out of ideas.
This is what my test case looks like:
testFunction() {
    echo "output";
    1>&2 echo "error";
    echo "output-2";
    1>&2 echo "error-2";
    echo "output-3";
    1>&2 echo "error-3";
}
I would like my console output to look like:
output
output-2
output-3
And my output.log file to look like:
output
error
output-2
error-2
output-3
error-3
For more details, I'm filtering the output of mvn clean install with grep to only keep minimal information in the terminal, but I also would like to have a full log somewhere in case I need to investigate a stack trace or something. The java test logs are sent to stderr so I choose to discard it in my console output.
While not really a solution that uses redirects or anything of that sort, you might want to use annotate-output for this.
Assume that script.sh contains your function, then you can do:
$ annotate-output ./script.sh
13:17:15 I: Started ./script.sh
13:17:15 O: output
13:17:15 E: error
13:17:15 E: error-2
13:17:15 O: output-2
13:17:15 E: error-3
13:17:15 O: output-3
13:17:15 I: Finished with exitcode 0
So now it is easy to reprocess that information and send it to the files you want:
$ annotate-output ./script.sh \
| awk '{s=substr($0,13)}/ [OE]: /{print s> "logfile"}/ O: /{print s}'
output
output-2
output-3
$ cat logfile
output
error
error-2
output-2
error-3
output-3
Or any other combination of tee sed cut ...
As per this comment from @CharlesDuffy:
Since stdout and stderr are processed in parallel, it can happen that some lines received on
stdout will show up before later-printed stderr lines (and vice-versa).
This is unfortunately very hard to fix with the current annotation strategy. A fix would
involve switching to PTRACE'ing the process. Giving nice a (much) higher priority over the
executed program could, however, cause this behaviour to show up less frequently.
source: man annotate-output
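If annotate-output is not available, here is a rough tee-based sketch (my own assumption, not part of this answer): give each stream its own tee into the log, shown with an abridged testFunction. Within each stream the order is preserved, but as with the caveat above, interleaving between stdout and stderr in the log file is still not guaranteed:

```shell
testFunction() { echo "output"; echo "error" >&2; echo "output-2"; }

log=$(mktemp)
# stderr is teed into the log and passed on to the real stderr; stdout is
# teed into the same log, then filtered for the console.
{ testFunction 2> >(tee -a "$log" >&2); } | tee -a "$log" | grep output
# console (stdout) shows: output / output-2
```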

Can I take an output stream, duplicate it with tee, munge one of them, and pipe BOTH back as input into diff?

As an example: take a single program's stdout, obtain two copies of it with tee, and send them both (one, or preferably both, piped through other programs along the way) back into vimdiff.
Bonus points if it can be done without having to create a file on disk.
I know how to direct input into a program that takes two inputs, like this
vimdiff <(curl http://google.com) <(curl http://archives.com/last_night/google.com)
and with tee for making two output streams
echo "abc" | tee >(sed 's/a/zzz/') >(sed 's/c/zzz/')
but I do not know how to connect the pipes back together into a diamond shape.
It's not so hard if you can use a fifo:
test -e fifo || mkfifo fifo
echo abc | tee >(sed s/a/zzz/ > fifo) | sed s/c/zzz/ | diff - fifo
Just as a side note: to have this work under zsh, an extra ">" is needed after tee (the multios option must be set):
$ setopt multios
$ test -e fifo || mkfifo fifo
$ echo abc | tee > >(sed s/a/zzz/ > fifo) | sed s/c/zzz/ | diff - fifo

Broken pipe in tee with process substitution

I just found out about process substitution using >() and am super excited about it; however, when I tried it, it doesn't always work.
This works, for example:
cat /usr/share/dict/words |tee >(tail -1) > /dev/null
ZZZ
And this gives a broken pipe error:
cat /usr/share/dict/words |tee >(head -1) > /dev/null
1080
tee: /dev/fd/63: Broken pipe
Any idea why?
Thanks!
Update: This is on RHEL 4 and RHEL 6.2
Here's an explanation of why you get the error with head but not with tail:
head -1 only has to read one line of its input; then it exits, and tee gets a broken pipe the next time it tries to write to it.
tail -1, on the other hand, has to read the complete input in order to do its job, so it never closes the pipe before tee is finished.
You can safely ignore the broken-pipe message; many programs have stopped reporting such errors, and on my machine I don't see it at all.
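To see that the message is harmless, note that head's single line still comes through intact; tee's complaint on stderr can simply be discarded (seq stands in for the dictionary file):

```shell
# head prints the first line and exits; tee then takes EPIPE/SIGPIPE on
# the dead /dev/fd pipe, which affects nothing head already printed.
seq 100000 | tee >(head -1) > /dev/null 2> /dev/null
# → 1
```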

Pipe output to two different commands [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
osx/linux: pipes into two processes?
Is there a way to pipe the output from one command into the input of two other commands, running them simultaneously?
Something like this:
$ echo 'test' |(cat) |(cat)
test
test
The reason I want to do this is that I have a program which receives an FM radio signal from a USB SDR device, and outputs the audio as raw PCM data (like a .wav file but with no header.) Since the signal is not music but POCSAG pager data, I need to pipe it to a decoder program to recover the pager text. However I also want to listen to the signal so I know whether any data is coming in or not. (Otherwise I can't tell if the decoder is broken or there's just no data being broadcast.) So as well as piping the data to the pager decoder, I also need to pipe the same data to the play command.
Currently I only know how to do one - either pipe it to the decoder and read the data in silence, or pipe it to play and hear it without seeing any decoded text.
How can I pipe the same data to both commands, so I can read the text and hear the audio?
I can't use tee as it only writes the duplicated data to a file, but I need to process the data in real-time.
It should be OK if you use both tee and mkfifo:
mkfifo pipe
cat pipe | (command1) &
echo 'test' | tee pipe | (command2)
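A concrete, runnable version of the same idea, with wc -c as command1 and wc -l as command2 (command1's result goes to a file so the two outputs don't race on the terminal):

```shell
d=$(mktemp -d)
mkfifo "$d/pipe"
wc -c < "$d/pipe" > "$d/out1" &          # command1 reads from the fifo
printf 'test\n' | tee "$d/pipe" | wc -l  # command2 reads from the pipe -> 1
wait                                     # be sure command1 has finished
cat "$d/out1"                            # -> 5 (bytes in "test\n")
rm -rf "$d"
```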
Recent versions of bash support the >(command) syntax:
echo "Hello world." | tee >(sed 's/^/1st: /') >(sed 's/^/2nd cmd: /') >/dev/null
May return:
2nd cmd: Hello world.
1st: Hello world.
Download somefile.ext, save it, and compute its md5sum and sha1sum:
wget -O - http://somewhere.someland/somepath/somefile.ext |
tee somefile.ext >(md5sum >somefile.md5) | sha1sum >somefile.sha1
or
wget -O - http://somewhere.someland/somepath/somefile.ext |
tee >(md5sum >somefile.md5) >(sha1sum >somefile.sha1) >somefile.ext
Old answer
There is a way to do that via unnamed pipes (tested under Linux):
(( echo "hello" |
tee /dev/fd/5 |
sed 's/^/1st occure: /' >/dev/fd/4
) 5>&1 |
sed 's/^/2nd command: /'
) 4>&1
gives:
2nd command: hello
1st occure: hello
This sample will let you download somefile.ext, save it, and compute its md5sum and sha1sum:
(( wget -O - http://somewhere.someland/somepath/somefile.ext |
tee /dev/fd/5 |
md5sum >/dev/fd/4
) 5>&1 |
tee somefile.ext |
sha1sum
) 4>&1
Maybe take a look at the tee command. What it does is simply copy its input to a file, while also passing it through to standard output. So something like:
echo "Hello" | tee try.txt | <some_command>
will create a file with the content "Hello" AND also let "Hello" flow through the pipeline to end up on <some_command>'s STDIN.
