I want to execute a long running command in Bash, and both capture its exit status, and tee its output.
So I do this:
command | tee out.txt
ST=$?
The problem is that the variable ST captures the exit status of tee and not of command. How can I solve this?
Note that command is long running and redirecting the output to a file to view it later is not a good solution for me.
There is an internal Bash variable called $PIPESTATUS; it’s an array that holds the exit status of each command in your last foreground pipeline of commands.
<command> | tee out.txt ; test ${PIPESTATUS[0]} -eq 0
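Applied to the question's own pipeline, a minimal sketch (command and out.txt stand in for the real long-running command and log file) would be:
command | tee out.txt
ST=${PIPESTATUS[0]}   # copy it immediately; the next command you run overwrites PIPESTATUS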
Or another alternative which also works with other shells (like zsh) would be to enable pipefail:
set -o pipefail
...
The first option does not work with zsh, whose syntax differs slightly (zsh provides a lowercase $pipestatus array instead).
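For the question's case, a rough pipefail-based sketch (note that a failure of tee itself would also show up in ST) might look like:
set -o pipefail
command | tee out.txt
ST=$?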
Dumb solution: Connecting them through a named pipe (mkfifo). Then the command can be run second.
mkfifo pipe
tee out.txt < pipe &
command > pipe
echo $?
Using bash's set -o pipefail is helpful:
pipefail: the return value of a pipeline is the status of
the last command to exit with a non-zero status,
or zero if no command exited with a non-zero status
There's an array that gives you the exit status of each command in a pipe.
$ cat x| sed 's///'
cat: x: No such file or directory
$ echo $?
0
$ cat x| sed 's///'
cat: x: No such file or directory
$ echo ${PIPESTATUS[*]}
1 0
$ touch x
$ cat x| sed 's'
sed: 1: "s": substitute pattern can not be delimited by newline or backslash
$ echo ${PIPESTATUS[*]}
0 1
This solution works without using bash-specific features or temporary files. Bonus: in the end the exit status is actually an exit status and not some string in a file.
Situation:
someprog | filter
you want the exit status from someprog and the output from filter.
Here is my solution:
((((someprog; echo $? >&3) | filter >&4) 3>&1) | (read xs; exit $xs)) 4>&1
echo $?
See my answer for the same question on unix.stackexchange.com for a detailed explanation and an alternative without subshells and some caveats.
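As an illustration only, here is the same construct with the question's commands substituted in (command and out.txt are placeholders): fd 3 smuggles command's exit status past tee, fd 4 carries tee's output back to the real stdout, and the final subshell turns the captured status into $?:
( ( ( ( command; echo $? >&3 ) | tee out.txt >&4 ) 3>&1 ) | ( read xs; exit $xs ) ) 4>&1
ST=$?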
By combining PIPESTATUS[0] and the result of executing the exit command in a subshell, you can directly access the return value of your initial command:
command | tee ; ( exit ${PIPESTATUS[0]} )
Here's an example:
# the "false" shell built-in command returns 1
false | tee ; ( exit ${PIPESTATUS[0]} )
echo "return value: $?"
will give you:
return value: 1
So I wanted to contribute an answer like lesmana's, but I think mine is perhaps a slightly simpler and more advantageous pure-Bourne-shell solution:
# You want to pipe command1 through command2:
exec 4>&1
exitstatus=`{ { command1; printf $? 1>&3; } | command2 1>&4; } 3>&1`
# $exitstatus now has command1's exit status.
I think this is best explained from the inside out - command1 will execute and print its regular output on stdout (file descriptor 1), then once it's done, printf will execute and print command1's exit code on its stdout, but that stdout is redirected to file descriptor 3.
While command1 is running, its stdout is being piped to command2 (printf's output never makes it to command2, because we send it to file descriptor 3 instead of 1, which is what the pipe reads). Then we redirect command2's output to file descriptor 4 so that it, too, stays out of file descriptor 1 - because we want file descriptor 1 free a little later, when we bring the printf output on file descriptor 3 back down into file descriptor 1: that is what the command substitution (the backticks) will capture, and that is what gets placed into the variable.
The final bit of magic is that first exec 4>&1 we did as a separate command - it opens file descriptor 4 as a copy of the external shell's stdout. Command substitution will capture whatever is written on standard out from the perspective of the commands inside it - but since command2's output is going to file descriptor 4 as far as the command substitution is concerned, the command substitution doesn't capture it - however once it gets "out" of the command substitution it is effectively still going to the script's overall file descriptor 1.
(The exec 4>&1 has to be a separate command because many common shells don't like it when you try to write to a file descriptor inside a command substitution, that is opened in the "external" command that is using the substitution. So this is the simplest portable way to do it.)
You can look at it in a less technical and more playful way, as if the outputs of the commands are leapfrogging each other: command1 pipes to command2, then the printf's output jumps over command 2 so that command2 doesn't catch it, and then command 2's output jumps over and out of the command substitution just as printf lands just in time to get captured by the substitution so that it ends up in the variable, and command2's output goes on its merry way being written to the standard output, just as in a normal pipe.
Also, as I understand it, $? will still contain the return code of the second command in the pipe, because variable assignments, command substitutions, and compound commands are all effectively transparent to the return code of the command inside them, so the return status of command2 should get propagated out - this, and not having to define an additional function, is why I think this might be a somewhat better solution than the one proposed by lesmana.
Per the caveats lesmana mentions, it's possible that command1 will at some point end up using file descriptors 3 or 4, so to be more robust, you would do:
exec 4>&1
exitstatus=`{ { command1 3>&-; printf $? 1>&3; } 4>&- | command2 1>&4; } 3>&1`
exec 4>&-
Note that I use compound commands in my example, but subshells (using ( ) instead of { }) will also work, though they may be less efficient.
Commands inherit file descriptors from the process that launches them, so the entire second line will inherit file descriptor four, and the compound command followed by 3>&1 will inherit file descriptor three. So the 4>&- makes sure the inner compound command does not inherit file descriptor four, and the 3>&- makes sure command1 does not inherit file descriptor three, so command1 gets a 'cleaner', more standard environment. You could also move the inner 4>&- next to the 3>&-, but I figure why not just limit its scope as much as possible.
I'm not sure how often things use file descriptors three and four directly - I think most of the time programs use syscalls that return the next unused file descriptor, but sometimes code writes to file descriptor 3 directly (I could imagine a program checking whether a file descriptor is open, and using it if it is, or behaving differently if it isn't). So the latter, more robust form is probably best to keep in mind and use for general-purpose cases.
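For the question's pipeline specifically, a sketch of the robust form above (command and out.txt are placeholders) would look like:
exec 4>&1
exitstatus=`{ { command 3>&-; printf '%s' "$?" 1>&3; } 4>&- | tee out.txt 1>&4; } 3>&1`
exec 4>&-
echo "command exited with $exitstatus"   # tee's output still reached the terminal and out.txt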
(command | tee out.txt; exit ${PIPESTATUS[0]})
Unlike @cODAR's answer, this returns the original exit code of the first command and not only 0 for success and 127 for failure. But as @Chaoran pointed out, you can just call ${PIPESTATUS[0]}. It is important, however, that everything is put in brackets.
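Used in a script, the enclosing parentheses make the first command's status the exit status of the whole group, so the usual $? check works again:
( command | tee out.txt; exit ${PIPESTATUS[0]} )
ST=$?
echo "command exited with $ST"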
In Ubuntu and Debian, you can apt-get install moreutils. This contains a utility called mispipe that returns the exit status of the first command in the pipe.
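A hedged sketch of how mispipe would be used for the question's pipeline (each side of the pipe is passed as a single string):
mispipe "command" "tee out.txt"
ST=$?   # exit status of command, not of tee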
Outside of bash, you can do:
bash -o pipefail -c "command1 | tee output"
This is useful for example in ninja scripts where the shell is expected to be /bin/sh.
The simplest way to do this in plain bash is to use process substitution instead of a pipeline. There are several differences, but they probably don't matter very much for your use case:
When running a pipeline, bash waits until all processes complete.
Sending Ctrl-C to bash makes it kill all the processes of a pipeline, not just the main one.
The pipefail option and the PIPESTATUS variable are irrelevant to process substitution.
Possibly more
With process substitution, bash just starts the process and forgets about it, it's not even visible in jobs.
Mentioned differences aside, consumer < <(producer) and producer | consumer are essentially equivalent.
If you want to flip which one is the "main" process, you just flip the commands and the direction of the substitution to producer > >(consumer). In your case:
command > >(tee out.txt)
Example:
$ { echo "hello world"; false; } > >(tee out.txt)
hello world
$ echo $?
1
$ cat out.txt
hello world
$ echo "hello world" > >(tee out.txt)
hello world
$ echo $?
0
$ cat out.txt
hello world
As I said, there are differences from the pipe expression. The process may never stop running, unless it is sensitive to the pipe closing. In particular, it may keep writing things to your stdout, which may be confusing.
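If you need to be sure tee has finished writing out.txt before you read it, one hedged workaround (bash 4.4+, relying on the wait $! behaviour for process substitutions described further down) is:
command > >(tee out.txt)
ST=$?        # exit status of command itself
wait $!      # wait for the tee started by the process substitution
cat out.txt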
PIPESTATUS[@] must be copied to an array immediately after the pipe command returns.
Any reads of PIPESTATUS[@] will erase the contents.
Copy it to another array if you plan on checking the status of all pipe commands.
"$?" is the same value as the last element of "${PIPESTATUS[@]}",
and reading it seems to destroy "${PIPESTATUS[@]}", but I haven't absolutely verified this.
declare -a PSA
cmd1 | cmd2 | cmd3
PSA=( "${PIPESTATUS[@]}" )
This will not work if the pipe is in a sub-shell. For a solution to that problem,
see bash pipestatus in backticked command?
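A small sketch putting this answer's advice together (cmd1, cmd2, and cmd3 are placeholders):
cmd1 | cmd2 | cmd3
PSA=( "${PIPESTATUS[@]}" )   # copy immediately, before anything else runs
for i in "${!PSA[@]}"; do
    echo "command $((i + 1)) exited with status ${PSA[i]}"
done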
Based on @brian-s-wilson's answer, here is a bash helper function:
pipestatus() {
    local S=( "${PIPESTATUS[@]}" )
    if test -n "$*"
    then test "$*" = "${S[*]}"
    else ! [[ "${S[*]}" =~ [^0\ ] ]]
    fi
}
used thus:
1: get_bad_things must succeed, but it should produce no output; however, we want to see any output that it does produce
get_bad_things | grep '^'
pipestatus 0 1 || return
2: the whole pipeline must succeed
thing | something -q | thingy
pipestatus || return
Pure shell solution:
% rm -f error.flag; echo hello world \
| (cat || echo "First command failed: $?" >> error.flag) \
| (cat || echo "Second command failed: $?" >> error.flag) \
| (cat || echo "Third command failed: $?" >> error.flag) \
; test -s error.flag && (echo Some command failed: ; cat error.flag)
hello world
And now with the second cat replaced by false:
% rm -f error.flag; echo hello world \
| (cat || echo "First command failed: $?" >> error.flag) \
| (false || echo "Second command failed: $?" >> error.flag) \
| (cat || echo "Third command failed: $?" >> error.flag) \
; test -s error.flag && (echo Some command failed: ; cat error.flag)
Some command failed:
Second command failed: 1
First command failed: 141
Please note that the first cat fails as well, because its stdout gets closed on it. The order of the failed commands in the log is correct in this example, but don't rely on it.
This method allows for capturing stdout and stderr for the individual commands so you can then dump that as well into a log file if an error occurs, or just delete it if no error (like the output of dd).
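Applied to the question's pipeline, the same idea looks roughly like this (POSIX sh; command is a placeholder):
rm -f error.flag
( command || echo "command failed: $?" >> error.flag ) | tee out.txt
test -s error.flag && (echo "command failed:" ; cat error.flag)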
It may sometimes be simpler and clearer to use an external command, rather than digging into the details of bash. pipeline, from the minimal process scripting language execline, exits with the return code of the second command*, just like a sh pipeline does, but unlike sh, it allows reversing the direction of the pipe, so that we can capture the return code of the producer process (the below is all on the sh command line, but with execline installed):
$ # using the full execline grammar with the execlineb parser:
$ execlineb -c 'pipeline { echo "hello world" } tee out.txt'
hello world
$ cat out.txt
hello world
$ # for these simple examples, one can forego the parser and just use "" as a separator
$ # traditional order
$ pipeline echo "hello world" "" tee out.txt
hello world
$ # "write" order (second command writes rather than reads)
$ pipeline -w tee out.txt "" echo "hello world"
hello world
$ # pipeline execs into the second command, so that's the RC we get
$ pipeline -w tee out.txt "" false; echo $?
1
$ pipeline -w tee out.txt "" true; echo $?
0
$ # output and exit status
$ pipeline -w tee out.txt "" sh -c "echo 'hello world'; exit 42"; echo "RC: $?"
hello world
RC: 42
$ cat out.txt
hello world
Using pipeline has the same differences to native bash pipelines as the bash process substitution used in answer #43972501.
* Actually pipeline doesn't exit at all unless there is an error. It executes into the second command, so it's the second command that does the returning.
Why not use stderr? Like so:
(
# Our long-running process that exits abnormally
( for i in {1..100} ; do echo ploop ; sleep 0.5 ; done ; exit 5 )
echo $? 1>&2 # We pass the exit status of our long-running process to stderr (fd 2).
) | tee ploop.out
So ploop.out receives the stdout. stderr receives the exit status of the long running process. This has the benefit of being completely POSIX-compatible.
(Well, with the exception of the range expression in the example long-running process, but that's not really relevant.)
Here's what this looks like:
...
ploop
ploop
ploop
ploop
ploop
ploop
ploop
ploop
ploop
ploop
5
Note that the return code 5 does not get output to the file ploop.out.
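If you need that status in a variable rather than just printed to stderr, one hedged way to capture it - assuming the long-running command writes nothing to stderr of its own, since anything it sends to fd 2 would get mixed into ST - is:
exec 3>&1
ST=$( { ( command; echo $? >&2 ) | tee out.txt >&3; } 2>&1 )
exec 3>&-
echo "exit status: $ST"   # live output still went to the terminal via fd 3, and to out.txt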
I have this Bash code that runs Scala test code:
scripts=(
Hello.scala
)
for script in "${scripts[@]}"; do
echo scala "${script}"
scala -nocompdaemon "${script}" > >(tee -a _testoutput.txt) \
2> >(tee -a _testerrors.txt >&2)
done
How can I interpret >(tee -a _testoutput.txt)? I normally use | (pipe) for using tee. What's the difference when using this expression?
In this case I believe > >(tee -a _testoutput.txt) and | tee -a _testoutput.txt would behave identically.
The standard error version of that is obviously necessary since there is no standard error pipe.
The other main difference between the pipe version and the process substitution (>(...)) version is where the subshell happens.
If, for example, the >(...) were on the entire loop and you needed variables set in the loop to persist outside of the loop, you couldn't do that with the pipe version (see Bash FAQ 24 for more about this).
One additional difference, correctly pointed out by Charles Duffy, is that a pipe affects the exit status of the pipeline (by default you get the exit status of the final command in the pipeline though set -o pipefail changes that and the Bash PIPESTATUS array holds all of the exit statuses). Process substitution, on the other hand, doesn't affect the exit status.
>( list ) is called "process substitution". It's more powerful than a normal pipe: You can't use | to redirect standard output and standard error to different programs so easily.
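As a hedged illustration of that last point (somecmd is a placeholder), stdout and stderr can each be fed to their own filter via process substitution, which a plain | cannot do directly:
somecmd > >(grep -v DEBUG) 2> >(sed 's/^/ERR: /' >&2)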
UPDATE: While not actually solving the original problem presented with regard to my piping endeavours, I've solved my problem by simplifying it greatly and just ditching the pipes altogether. Here's a proof-of-concept script that generates, in parallel while reading just once from disk, CRC32, MD5, SHA1, SHA224, SHA256, SHA384, and SHA512 checksums, and returns them as a JSON object (I'll use the output in PHP and Ruby). It's crude and has no error checking, but it works:
#!/bin/bash
checksums="`tee <"$1" \
>( cfv -C -q -t sfv -f - - | tail -n 1 | sed -e 's/^.* \([a-fA-F0-9]\{8\}\)$/"crc32":"\1"/' ) \
>( md5sum - | sed -e 's/^\([a-fA-F0-9]\{32\}\) .*$/"md5":"\1"/' ) \
>( sha1sum - | sed -e 's/^\([a-fA-F0-9]\{40\}\) .*$/"sha1":"\1"/' ) \
>( sha224sum - | sed -e 's/^\([a-fA-F0-9]\{56\}\) .*$/"sha224":"\1"/' ) \
>( sha256sum - | sed -e 's/^\([a-fA-F0-9]\{64\}\) .*$/"sha256":"\1"/' ) \
>( sha384sum - | sed -e 's/^\([a-fA-F0-9]\{96\}\) .*$/"sha384":"\1"/' ) \
>( sha512sum - | sed -e 's/^\([a-fA-F0-9]\{128\}\) .*$/"sha512":"\1"/') \
>/dev/null`\
"
json="{"
for checksum in $checksums; do json="$json$checksum,"; done
echo "${json:0: -1}}"
THE ORIGINAL QUESTION:
I'm a bit afraid to ask this question, as I got so many hits on my search phrase that after applying the knowledge harvested from Using named pipes with bash - Problem with data loss, and reading through another 20 pages, I still am at a bit of a stand-still with this.
So, to continue nevertheless, I'm doing a simple script to enable me to concurrently create CRC32, MD5, and SHA1 checksums on a file while only reading it from disk once. I'm using cfv for that purpose.
Originally, I just hacked together a simple script that cat'ted the file to tee, with three cfv commands writing to three separate files under /tmp/, and then attempted to cat those out to stdout afterwards, but I ended up with empty output unless I made my script sleep for a second before trying to read the files. Thinking that was weird, I assumed I was a moron in my scripting, so I tried a different approach by having the cfv workers output to a named pipe instead. So far, this is my script, after having applied techniques from the aforementioned link:
#!/bin/bash
# Bail out if argument isn't a file:
[ ! -f "$1" ] && echo "'$1' is not a file!" && exit 1
# Choose a name for a pipe to stuff with CFV output:
pipe="/tmp/pipe.chksms"
# Don't leave an orphaned pipe on exiting or being terminated:
trap "rm -f $pipe; exit" EXIT TERM
# Create the pipe (except if it already exists (e.g. SIGKILL'ed b4)):
[ -p "$pipe" ] || mkfifo $pipe
# Start a background process that reads from the pipe and echoes what it
# receives to stdout (notice the pipe is attached last, at done):
while true; do
while read line; do
[ "$line" = "EOP" ] && echo "quitting now" && exit 0
echo "$line"
done
done <$pipe 3>$pipe & # This 3> business is to make sure there's always
# at least one producer attached to the pipe (the
# consumer loop itself) until we're done.
# This sort of works without "hacks", but tail errors out when the pipe is
# killed, naturally, and script seems to "hang" until I press enter after,
# which I believe is actually EOF to tail, so it's no solution anyway:
#tail -f $pipe &
tee <"$1" >( cfv -C -t sfv -f - - >$pipe ) >( cfv -C -t sha1 -f - - >$pipe ) >( cfv -C -t md5 -f - - >$pipe ) >/dev/null
#sleep 1s
echo "EOP" >$pipe
exit
So, executed as it stands, I get this output:
daniel@lnxsrv:~/tisso$ ./multisfv file
: : : quitting now
- : Broken pipe (CF)
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr
- : Broken pipe (CF)
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr
- : Broken pipe (CF)
daniel@lnxsrv:~/tisso$ close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr
But with the sleep 1s uncommented, I get the expected output:
daniel@lnxsrv:~/tisso$ ./multisfv file
3bc1b5ff125e03fb35491e7d67014a3e *
-: 1 files, 1 OK. 0.013 seconds, 79311.7K/s
5e3bb0e3ec410a8d8e14fef1a6daababfc48c7ce *
-: 1 files, 1 OK. 0.016 seconds, 62455.0K/s
; Generated by cfv v1.18.3 on 2012-03-09 at 23:45.23
;
2a0feb38
-: 1 files, 1 OK. 0.051 seconds, 20012.9K/s
quitting now
This puzzles me, as I'd assume that tee wouldn't exit until after each cfv recipient it forks data to has exited, and thus the echo "EOP" statement wouldn't execute until all cfv substreams have finished, which would mean they'd have written their output to my named pipe... and then the echo statement would execute.
As the behavior is the same without pipes, just using temporary output files, I'm thinking this must be some race condition having to do with the way tee pushes data onto its recipients? I tried a simple "wait" command, but it will of course wait for my bash child process - the while loop - to finish, so I just get a hanging process.
Any ideas?
TIA,
Daniel :)
tee will exit once it writes the last bit of input to the last output pipe and closes it (that is, the unnamed pipes created by bash, not your fifo, aka "named pipe"). It has no need to wait for the processes reading the pipes to finish; indeed, it has no idea that it is even writing to pipes. Since pipes have buffers, it's quite likely that tee finishes writing before the processes at the other end have finished reading. So the script will write 'EOP' into the fifo, causing the read loop to terminate. That will close the fifo's only reader, and all the cfv processes will get SIGPIPE when they next try to write to stdout.
The obvious question to ask here is why you don't just run three (or N) independent processes reading the file and computing different summaries. If "the file" were actually being generated on the fly or downloaded from some remote site, or some other slow process, it might make sense to do things the way you're trying to do them, but if the file is present on local disk, it's pretty likely that only one disk access will actually happen; the lagging summarizers will read the file from the buffer cache. If that's all you need, GNU parallel should work fine, or you can just start up the processes in bash (with &) and then wait for them. YMMV but I would think that either of these solutions will be less resource-intensive than setting up all those pipes and simulating the buffer cache in userland with tee.
By the way, if you want to serialize output from multiple processes, you can use the flock utility. Just using a fifo isn't enough; there is no guarantee that the processes writing to the fifo will write entire lines atomically and if you knew they did that, you wouldn't need the fifo.
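A rough sketch of the "independent readers" alternative suggested above (bigfile is a placeholder, and cksum merely stands in for a CRC tool): each summarizer opens the file itself, and the buffer cache normally means the disk is only read once:
f=bigfile
md5sum "$f" > md5.txt &
sha1sum "$f" > sha1.txt &
cksum "$f" > crc.txt &
wait   # block until all three background jobs have finished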
I'd like to redirect the stdout of process proc1 to two processes proc2 and proc3:
proc2 -> stdout
/
proc1
\
proc3 -> stdout
I tried
proc1 | (proc2 & proc3)
but it doesn't seem to work, i.e.
echo 123 | (tr 1 a & tr 1 b)
writes
b23
to stdout instead of
a23
b23
Editor's note:
- >(…) is a process substitution that is a nonstandard shell feature of some POSIX-compatible shells: bash, ksh, zsh.
- This answer accidentally sends the output process substitution's output through the pipeline too: echo 123 | tee >(tr 1 a) | tr 1 b.
- Output from the process substitutions will be unpredictably interleaved, and, except in zsh, the pipeline may terminate before the commands inside >(…) do.
In Unix (or on a Mac), use the tee command:
$ echo 123 | tee >(tr 1 a) >(tr 1 b) >/dev/null
b23
a23
Usually you would use tee to redirect output to multiple files, but using >(...) you can
redirect to another process. So, in general,
$ proc1 | tee >(proc2) ... >(procN-1) >(procN) >/dev/null
will do what you want.
Under Windows, I don't think the built-in shell has an equivalent. Microsoft's Windows PowerShell has a tee command though.
As dF said, bash allows you to use the >(…) construct, which runs a command in place of a filename. (There is also the <(…) construct, which substitutes the output of another command in place of a filename; that is irrelevant here, but I mention it for completeness.)
If you don't have bash, or are running on a system with an older version of bash, you can do manually what bash does, by making use of FIFO files.
The generic way to achieve what you want is:
decide how many processes should receive the output of your command, and create as many FIFOs, preferably in a global temporary folder:
subprocesses="a b c d"
mypid=$$
for i in $subprocesses # this way we are compatible with all sh-derived shells
do
mkfifo /tmp/pipe.$mypid.$i
done
start all your subprocesses, waiting for input from the FIFOs:
for i in $subprocesses
do
tr 1 $i </tmp/pipe.$mypid.$i & # background!
done
execute your command teeing to the FIFOs:
proc1 | tee $(for i in $subprocesses; do echo /tmp/pipe.$mypid.$i; done)
finally, remove the FIFOs:
for i in $subprocesses; do rm /tmp/pipe.$mypid.$i; done
NOTE: for compatibility reasons, I would do the $(…) with backquotes, but I couldn't do it writing this answer (the backquote is used in SO). Normally, the $(…) is old enough to work even in old versions of ksh, but if it doesn't, enclose the … part in backquotes.
Unix (bash, ksh, zsh)
dF.'s answer contains the seed of an answer based on tee and output process substitutions
(>(...)) that may or may not work, depending on your requirements:
Note that process substitutions are a nonstandard feature that (mostly)
POSIX-features-only shells such as dash (which acts as /bin/sh on Ubuntu,
for instance) do not support. Shell scripts targeting /bin/sh should not rely on them.
echo 123 | tee >(tr 1 a) >(tr 1 b) >/dev/null
The pitfalls of this approach are:
unpredictable, asynchronous output behavior: the output streams from the commands inside the output process substitutions >(...) interleave in unpredictable ways.
In bash and ksh (as opposed to zsh - but see exception below):
output may arrive after the command has finished.
subsequent commands may start executing before the commands in the process substitutions have finished - bash and ksh do not wait for the output process substitution-spawned processes to finish, at least by default.
jmb puts it well in a comment on dF.'s answer:
be aware that the commands started inside >(...) are dissociated from the original shell, and you can't easily determine when they finish; the tee will finish after writing everything, but the substituted processes will still be consuming the data from various buffers in the kernel and file I/O, plus whatever time is taken by their internal handling of data. You can encounter race conditions if your outer shell then goes on to rely on anything produced by the sub-processes.
zsh is the only shell that does by default wait for the processes run in the output process substitutions to finish, except if it is stderr that is redirected to one (2> >(...)).
ksh (at least as of version 93u+) allows use of argument-less wait to wait for the output process substitution-spawned processes to finish.
Note that in an interactive session that could result in waiting for any pending background jobs too, however.
bash v4.4+ can wait for the most recently launched output process substitution with wait $!, but argument-less wait does not work, making this unsuitable for a command with multiple output process substitutions.
However, bash and ksh can be forced to wait by piping the command to | cat, but note that this makes the command run in a subshell. Caveats:
ksh (as of ksh 93u+) doesn't support sending stderr to an output process substitution (2> >(...)); such an attempt is silently ignored.
While zsh is (commendably) synchronous by default with the (far more common) stdout output process substitutions, even the | cat technique cannot make them synchronous with stderr output process substitutions (2> >(...)).
However, even if you ensure synchronous execution, the problem of unpredictably interleaved output remains.
The following command, when run in bash or ksh, illustrates the problematic behaviors (you may have to run it several times to see both symptoms): The AFTER will typically print before output from the output substitutions, and the output from the latter can be interleaved unpredictably.
printf 'line %s\n' {1..30} | tee >(cat -n) >(cat -n) >/dev/null; echo AFTER
In short:
Guaranteeing a particular per-command output sequence:
Neither bash nor ksh nor zsh support that.
Synchronous execution:
Doable, except with stderr-sourced output process substitutions:
In zsh, they're invariably asynchronous.
In ksh, they don't work at all.
If you can live with these limitations, using output process substitutions is a viable option (e.g., if all of them write to separate output files).
Note that tzot's much more cumbersome, but potentially POSIX-compliant solution also exhibits unpredictable output behavior; however, by using wait you can ensure that subsequent commands do not start executing until all background processes have finished.
See bottom for a more robust, synchronous, serialized-output implementation.
The only straightforward bash solution with predictable output behavior is the following, which, however, is prohibitively slow with large input sets, because shell loops are inherently slow.
Also note that this alternates the output lines from the target commands.
while IFS= read -r line; do
tr 1 a <<<"$line"
tr 1 b <<<"$line"
done < <(echo '123')
Unix (using GNU Parallel)
Installing GNU parallel enables a robust solution with serialized (per-command) output that additionally allows parallel execution:
$ echo '123' | parallel --pipe --tee {} ::: 'tr 1 a' 'tr 1 b'
a23
b23
parallel by default ensures that output from the different commands doesn't interleave (this behavior can be modified - see man parallel).
Note: Some Linux distros come with a different parallel utility, which won't work with the command above; use parallel --version to determine which one, if any, you have.
Windows
Jay Bazuzi's helpful answer shows how to do it in PowerShell. That said: his answer is the analog of the looping bash answer above; it will be prohibitively slow with large input sets, and it also alternates the output lines from the target commands.
bash-based, but otherwise portable Unix solution with synchronous execution and output serialization
The following is a simple, but reasonably robust implementation of the approach presented in tzot's answer that additionally provides:
synchronous execution
serialized (grouped) output
While not strictly POSIX compliant, because it is a bash script, it should be portable to any Unix platform that has bash.
Note: You can find a more full-fledged implementation released under the MIT license in this Gist.
If you save the code below as script fanout, make it executable, and put it in your PATH, the command from the question would work as follows:
$ echo 123 | fanout 'tr 1 a' 'tr 1 b'
# tr 1 a
a23
# tr 1 b
b23
fanout script source code:
#!/usr/bin/env bash
# The commands to pipe to, passed as a single string each.
aCmds=( "$@" )
# Create a temp. directory to hold all FIFOs and captured output.
kTHIS_NAME=${BASH_SOURCE##*/}   # this script's own filename, used to name the temp. dir.
tmpDir="${TMPDIR:-/tmp}/$kTHIS_NAME-$$-$(date +%s)-$RANDOM"
mkdir "$tmpDir" || exit
# Set up a trap that automatically removes the temp dir. when this script
# exits.
trap 'rm -rf "$tmpDir"' EXIT
# Determine the number padding for the sequential FIFO / output-capture names,
# so that *alphabetic* sorting, as done by *globbing* is equivalent to
# *numerical* sorting.
maxNdx=$(( $# - 1 ))
fmtString="%0${#maxNdx}d"
# Create the FIFO and output-capture filename arrays
aFifos=() aOutFiles=()
for (( i = 0; i <= maxNdx; ++i )); do
printf -v suffix "$fmtString" $i
aFifos[i]="$tmpDir/fifo-$suffix"
aOutFiles[i]="$tmpDir/out-$suffix"
done
# Create the FIFOs.
mkfifo "${aFifos[@]}" || exit
# Start all commands in the background, each reading from a dedicated FIFO.
for (( i = 0; i <= maxNdx; ++i )); do
fifo=${aFifos[i]}
outFile=${aOutFiles[i]}
cmd=${aCmds[i]}
printf '# %s\n' "$cmd" > "$outFile"
eval "$cmd" < "$fifo" >> "$outFile" &
done
# Now tee stdin to all FIFOs.
tee "${aFifos[@]}" >/dev/null || exit
# Wait for all background processes to finish.
wait
# Print all captured stdout output, grouped by target command, in sequences.
cat "${aOutFiles[@]}"
Since @dF. mentioned that PowerShell has tee, I thought I'd show a way to do this in PowerShell.
PS > "123" | % {
$_.Replace( "1", "a"),
$_.Replace( "2", "b" )
}
a23
1b3
Note that each object coming out of the first command is processed before the next object is created. This can allow scaling to very large inputs.
You can also save the output in a variable and use that for the other processes:
out=$(proc1); echo "$out" | proc2; echo "$out" | proc3
However, that works only if
proc1 terminates at some point :-)
proc1 doesn't produce too much output (don't know what the limits are there but it's probably your RAM)
But it is easy to remember and leaves you with more options on the output you get from the processes you spawned there, e. g.:
out=$(proc1); echo $(echo "$out" | proc2) / $(echo "$out" | proc3) | bc
I had difficulties doing something like that with the | tee >(proc2) >(proc3) >/dev/null approach.
Another way to do it would be:
eval `echo '&& echo 123 |'{'tr 1 a','tr 1 b'} | sed -n 's/^&&//gp'`
output:
a23
b23
no need to create a subshell here