Incorrect results with bash process substitution and tail? - bash

Using bash process substitution, I want to run two different commands on a file simultaneously. In this example it is not necessary, but imagine that "cat /usr/share/dict/words" were a very expensive operation such as decompressing a 50 GB file.
cat /usr/share/dict/words | tee >(head -1 > h.txt) >(tail -1 > t.txt) > /dev/null
After this command I would expect h.txt to contain the first line of the words file "A", and t.txt to contain the last line of the file "Zyzzogeton".
However what actually happens is that h.txt contains "A" but t.txt contains "argillaceo" which is about 5% into the file.
Why does this happen? It seems like either the "tail" process is terminating early or the streams are getting mixed up.
Running another similar command like this behaves as expected:
cat /usr/share/dict/words | tee >(grep ^a > a.txt) >(grep ^z > z.txt) > /dev/null
After this command I'd expect a.txt to contain all the words that begin with "a", while z.txt contains all of the words that begin with "z", which is exactly what happened.
So why doesn't this work with "tail", and with what other commands will this not work?

OK, what seems to happen is this: once the head -1 command finishes, it exits, so the next time tee writes to the named pipe that the process substitution set up, the write fails with EPIPE and, as man 2 write notes, also raises SIGPIPE in the writing process. That causes tee to exit, which in turn forces the tail -1 to exit immediately, and the cat on the left gets a SIGPIPE as well.
We can see this a little better if we slow the loop down, make its output predictable, and also write each value to stderr so we can watch progress without relying on tee:
for i in {1..30}; do echo "$i"; echo "$i" >&2; sleep 1; done | tee >(head -1 > h.txt; echo "Head done") >(tail -1 > t.txt) >/dev/null
which when I run it gave me the output:
1
Head done
2
so it got just 1 more iteration of the loop before everything exited (though t.txt still only has 1 in it). If we then did
echo "${PIPESTATUS[#]}"
we see
141 141
Both statuses are 141, which is 128 + 13, i.e. termination by SIGPIPE, in line with the explanation above.
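You can confirm that 141 means SIGPIPE directly from bash, since exit statuses above 128 encode the fatal signal and the kill builtin can translate them back:
$ kill -l 141
PIPE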
The coreutils maintainers have since added this scenario as an example to the tee "gotchas" section of their documentation, for posterity.
For a discussion with the devs about how this fits into POSIX compliance you can see the (closed notabug) report at http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22195
If you have GNU coreutils 8.24 or later, tee has options (not in POSIX) that can help, such as -p or --output-error=warn. Without those you can take a bit of a risk, but still get the desired behaviour from the question, by trapping and ignoring SIGPIPE:
trap '' PIPE
for i in {1..30}; do echo "$i"; echo "$i" >&2; sleep 1; done | tee >(head -1 > h.txt; echo "Head done") >(tail -1 > t.txt) >/dev/null
trap - PIPE
This gives the expected results in both h.txt and t.txt, but if anything else in your script relies on SIGPIPE being handled normally, you'd be out of luck with this approach.
Another, hackier, option is to truncate t.txt before starting and then not let the head process substitution finish until t.txt is non-empty:
> t.txt; for i in {1..10}; do echo "$i"; echo "$i" >&2; sleep 1; done | tee >(head -1 > h.txt; echo "Head done"; while [ ! -s t.txt ]; do sleep 1; done) >(tail -1 > t.txt; date) >/dev/null
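For completeness, here is what the original pipeline looks like with the GNU coreutils 8.24+ option mentioned above (a sketch; -p tells tee not to die on write errors to pipes, so tail sees the whole stream even after head exits):
cat /usr/share/dict/words | tee -p >(head -1 > h.txt) >(tail -1 > t.txt) > /dev/null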

Related

Exit tail upon string detection

I'm writing a barrier to stall the execution of a script until a certain keyword is logged. The script is pretty simple:
tail -F -n0 logfile.log | while read LINE; do
[[ "$LINE" == *'STOP'* ]] && echo ${LINE} && break;
done
or
tail -F -n0 logfile.log | grep -m1 STOP
The problem is that it doesn't quit as soon as the keyword is detected, but only after the next line is written, i.e.:
printf "foo\n" >> logfile.log # keeps reading
printf "foo\n" >> logfile.log # keeps reading
printf "STOP\n" >> logfile.log # STOP printed
printf "foo\n" >> logfile.log # code exits at last
Unfortunately I can't rely on the fact that another line will be logged after the "STOP" (not within an interval useful for my purposes at least).
The workaround found so far is to tail also another file I know for sure gets updated quite frequently, but what is the "clean" solution so that the code will exit right after it logs STOP?
In bash, when executing a command of the form
command1 | command2
and command2 dies or terminates, the pipe that receives the standard output of command1 becomes broken. This, however, does not terminate command1 instantly; command1 only receives SIGPIPE the next time it tries to write to the pipe.
So to achieve what you want, use process substitution rather than a pipe:
awk '/STOP/{exit}1' < <(tail -f logfile)
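The same idea applies to the grep variant from the question; a sketch:
# grep -m1 exits on the first match; the shell waits only for grep, not for tail
grep -m1 STOP < <(tail -F -n0 logfile.log)
The leftover tail process exits the next time it tries to write to the broken pipe, exactly as described below.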
When you use awk, you can see the behaviour in a bit more detail:
$ touch logfile
$ tail -f logfile | awk '/STOP/{exit}1;END{print "end"}'
This awk program prints each line until it sees "STOP"; when "STOP" arrives it exits, and the END block prints "end".
When you do in another terminal
$ echo "a" >> logfile
$ echo "STOP" >> logfile
$ echo "b" >> logfile
You see that awk prints the following output:
a # result of print
end # awk test STOP, exits and executes END statement
Furthermore, if you look more closely, you can see that awk has already terminated at this point.
ps before sending "STOP":
13625 pts/39 SN 0:00 | \_ bash
32151 pts/39 SN+ 0:00 | \_ tail -f logfile
32152 pts/39 SN+ 0:00 | \_ awk /STOP/{exit}1;END{print "end"}
ps after sending "STOP":
13625 pts/39 SN 0:00 | \_ bash
32151 pts/39 SN+ 0:00 | \_ tail -f logfile
So the awk program has terminated, but tail has not, because it is not yet aware the pipe is broken: it has not attempted to write to it yet.
When you do the following in the terminal with the pipeline, you see the exit status of tail:
$ echo "${PIPESTATUS[0]} ${PIPESTATUS[1]}"
141 0
This shows that tail exited with code 141, i.e. it was killed by SIGPIPE, while awk terminated normally with 0.

What is the difference between using process substitution vs. a pipe?

I came across an example for using the tee utility in the tee info page:
wget -O - http://example.com/dvd.iso | tee >(sha1sum > dvd.sha1) > dvd.iso
I looked up the >(...) syntax and found something called "process substitution". From what I understand, it makes a process look like a file that another process could write/append its output to. (Please correct me if I'm wrong on that point.)
How is this different from a pipe (|)? I see a pipe being used in the above example as well. Is it just a precedence issue, or is there some other difference?
There's no benefit here, as the line could equally well have been written like this:
wget -O - http://example.com/dvd.iso | tee dvd.iso | sha1sum > dvd.sha1
The differences start to appear when you need to pipe to/from multiple programs, because these can't be expressed purely with |. Feel free to try:
# Calculate 2+ checksums while also writing the file
wget -O - http://example.com/dvd.iso | tee >(sha1sum > dvd.sha1) >(md5sum > dvd.md5) > dvd.iso
# Accept input from two 'sort' processes at the same time
comm -12 <(sort file1) <(sort file2)
They're also useful in cases where, for whatever reason, you can't or don't want to use pipelines:
# Start logging all error messages to a file as well as the terminal
# Pipes don't work because bash doesn't support it in this context
exec 2> >(tee log.txt)
ls doesntexist
# Sum a column of numbers
# Pipes don't work because they create a subshell
sum=0
while IFS= read -r num; do (( sum+=num )); done < <(curl http://example.com/list.txt)
echo "$sum"
# apt-get something with a generated config file
# Pipes don't work because we want stdin available for user input
apt-get install -c <(sed -e "s/%USER%/$USER/g" template.conf) mysql-server
Another major difference is the propagation of return values / exit codes (I'll use simpler commands to illustrate):
Pipe:
$ ls -l /notthere | tee listing.txt
ls: cannot access '/notthere': No such file or directory
$ echo $?
0
-> exit code of tee is propagated
Process substitution:
$ ls -l /notthere > >(tee listing.txt)
ls: cannot access '/notthere': No such file or directory
$ echo $?
2
-> exit code of ls is propagated
There are of course several methods to work around this (e.g. set -o pipefail, variable PIPESTATUS), but I think it's worth mentioning since this is the default behavior.
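For example, with pipefail the pipe variant reports the ls failure as well (an illustrative transcript; exact message wording depends on your ls):
$ set -o pipefail
$ ls -l /notthere | tee listing.txt
ls: cannot access '/notthere': No such file or directory
$ echo $?
2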
Another rather subtle, but potentially annoying, difference lies in subprocess termination (best illustrated using commands that produce lots of output):
Pipe:
#!/usr/bin/env bash
tar --create --file /tmp/etc-backup.tar --verbose --directory /etc . 2>&1 | tee /tmp/etc-backup.log
retval=${PIPESTATUS[0]}
(( ${retval} == 0 )) && echo -e "\n*** SUCCESS ***\n" || echo -e "\n*** FAILURE (EXIT CODE: ${retval}) ***\n"
-> after the line containing the pipe construct, all commands of the pipe have already terminated (otherwise PIPESTATUS could not contain their respective exit codes)
Process substitution:
#!/usr/bin/env bash
tar --create --file /tmp/etc-backup.tar --verbose --directory /etc . &> >(tee /tmp/etc-backup.log)
retval=$?
(( ${retval} == 0 )) && echo -e "\n*** SUCCESS ***\n" || echo -e "\n*** FAILURE (EXIT CODE: ${retval}) ***\n"
-> after the line containing the process substitution, the command within >(...), i.e. tee in this example, may still be running, potentially causing desynchronized console output (SUCCESS / FAILURE message gets mixed in with still flowing tar output) [*]
[*] Can be reproduced on the framebuffer console, but does not seem to affect GUI terminals like KDE's Konsole (likely due to different buffering strategies).
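If that matters, one mitigation (assuming bash 4.4 or newer, where the PID of the last process substitution is available in $! and can be waited for) is to wait for the tee before printing the status message:
tar --create --file /tmp/etc-backup.tar --verbose --directory /etc . &> >(tee /tmp/etc-backup.log)
retval=$?
wait $!   # let the >(tee ...) process drain and exit before continuing (bash >= 4.4)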

false | true; echo $? [duplicate]

I currently have a script that does something like
./a | ./b | ./c
I want to modify it so that if any of a, b, or c exit with an error code I print an error message and stop instead of piping bad output forward.
What would be the simplest/cleanest way to do so?
In bash you can use set -e and set -o pipefail at the beginning of your file. A subsequent command ./a | ./b | ./c will then fail when any of the three scripts fails, and the pipeline's return code will be that of the rightmost script that failed.
Note that pipefail isn't available in standard sh.
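A minimal sketch of that approach, using the script names from the question:
#!/usr/bin/env bash
set -o pipefail

if ./a | ./b | ./c; then
    echo "pipeline succeeded"
else
    # $? here is still the pipeline's status (the rightmost failure, thanks to pipefail)
    echo "pipeline failed with status $?" >&2
    exit 1
fi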
You can also check the ${PIPESTATUS[@]} array after the full execution, e.g. if you run:
./a | ./b | ./c
Then ${PIPESTATUS[@]} will be an array of exit codes from each command in the pipe, so if the middle command failed, echo "${PIPESTATUS[@]}" would output something like:
0 1 0
and something like this run after the command:
test ${PIPESTATUS[0]} -eq 0 -a ${PIPESTATUS[1]} -eq 0 -a ${PIPESTATUS[2]} -eq 0
will allow you to check that all commands in the pipe succeeded.
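One caveat: PIPESTATUS is replaced after every subsequent command (even the test above resets it), so if you need the values more than once, copy the array first, e.g.:
./a | ./b | ./c
status=("${PIPESTATUS[@]}")   # snapshot before another command clobbers it
for i in "${!status[@]}"; do
    [ "${status[$i]}" -ne 0 ] && echo "command $((i + 1)) in the pipe failed with ${status[$i]}" >&2
done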
If you really don't want the second command to proceed until the first is known to be successful, then you probably need to use temporary files. The simple version of that is:
tmp=${TMPDIR:-/tmp}/mine.$$
if ./a > $tmp.1
then
    if ./b <$tmp.1 >$tmp.2
    then
        if ./c <$tmp.2
        then : OK
        else echo "./c failed" 1>&2
        fi
    else echo "./b failed" 1>&2
    fi
else echo "./a failed" 1>&2
fi
rm -f $tmp.[12]
The '1>&2' redirection can also be abbreviated '>&2'; however, an old version of the MKS shell mishandled the error redirection without the preceding '1' so I've used that unambiguous notation for reliability for ages.
This leaks files if you interrupt something. Bomb-proof (more or less) shell programming uses:
tmp=${TMPDIR:-/tmp}/mine.$$
trap 'rm -f $tmp.[12]; exit 1' 0 1 2 3 13 15
...if statement as before...
rm -f $tmp.[12]
trap 0 1 2 3 13 15
The first trap line says: run the commands rm -f $tmp.[12]; exit 1 when any of the signals 1 SIGHUP, 2 SIGINT, 3 SIGQUIT, 13 SIGPIPE or 15 SIGTERM occurs, or on 0 (when the shell exits for any reason).
If you're writing a shell script, the final trap only needs to remove the trap on 0, which is the shell exit trap (you can leave the other signals in place since the process is about to terminate anyway).
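The same cleanup idea in a slightly more modern spelling (a sketch using mktemp and signal names; the EXIT trap also runs when a signal trap calls exit):
tmp=$(mktemp -d "${TMPDIR:-/tmp}/mine.XXXXXX") || exit 1
trap 'rm -rf "$tmp"' EXIT
trap 'exit 1' HUP INT QUIT PIPE TERM   # exit runs the EXIT trap, which cleans up
# ...the nested if statements as before, using $tmp/1 and $tmp/2 as the intermediate files...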
In the original pipeline, it is feasible for 'c' to be reading data from 'b' before 'a' has finished - this is usually desirable (it gives multiple cores work to do, for example). If 'b' is a 'sort' phase, then this won't apply - 'b' has to see all its input before it can generate any of its output.
If you want to detect which command(s) fail, you can use:
(./a || echo "./a exited with $?" 1>&2) |
(./b || echo "./b exited with $?" 1>&2) |
(./c || echo "./c exited with $?" 1>&2)
This is simple and symmetric - it is trivial to extend to a 4-part or N-part pipeline.
Simple experimentation with 'set -e' didn't help.
Unfortunately, the answer by Jonathan requires temporary files and the answers by Michel and Imron require bash (even though this question is tagged shell). As others have pointed out already, it is not possible to abort the pipe before the later processes are started: all processes are started at once, so they will all run before any error can be communicated. But the title of the question also asks about error codes, and these can be retrieved and inspected after the pipe has finished to figure out whether any of the involved processes failed.
Here is a solution that catches all errors in the pipe and not only errors of the last component. So this is like bash's pipefail, just more powerful in the sense that you can retrieve all the error codes.
res=$( (./a 2>&1 || echo "1st failed with $?" >&2) |
(./b 2>&1 || echo "2nd failed with $?" >&2) |
(./c 2>&1 || echo "3rd failed with $?" >&2) > /dev/null 2>&1)
if [ -n "$res" ]; then
echo pipe failed
fi
To detect whether anything failed, an echo command prints to standard error whenever one of the commands fails. Then the combined standard error output is saved in $res and examined afterwards. This is also why standard error of all processes is redirected to standard output. You can also send that output to /dev/null, or leave it in place as yet another indicator that something went wrong. You can replace the last redirect to /dev/null with a file if you need to store the output of the last command somewhere.
To play more with this construct and to convince yourself that this really does what it should, I replaced ./a, ./b and ./c by subshells which execute echo, cat and exit. You can use this to check that this construct really forwards all the output from one process to another and that the error codes get recorded correctly.
res=$( (sh -c "echo 1st out; exit 0" 2>&1 || echo "1st failed with $?" >&2) |
(sh -c "cat; echo 2nd out; exit 0" 2>&1 || echo "2nd failed with $?" >&2) |
(sh -c "echo start; cat; echo end; exit 0" 2>&1 || echo "3rd failed with $?" >&2) > /dev/null 2>&1)
if [ -n "$res" ]; then
echo pipe failed
fi
This answer is in the spirit of the accepted answer, but using shell variables instead of temporary files.
if TMP_A="$(./a)"
then
    if TMP_B="$(echo "$TMP_A" | ./b)"
    then
        if TMP_C="$(echo "$TMP_B" | ./c)"
        then
            echo "$TMP_C"
        else
            echo "./c failed"
        fi
    else
        echo "./b failed"
    fi
else
    echo "./a failed"
fi

nice way to kill piped process?

I want to process each line of a command's stdout the moment it is produced. I want to grab the output of test.sh (a long-running process). My current approach is this:
./test.sh >tmp.txt &
PID=$!
tail -f tmp.txt | while read line; do
    echo $line
    ps ${PID} > /dev/null
    if [ $? -ne 0 ]; then
        echo "exiting.."
    fi
done
But unfortunately, this will print "exiting" and then wait, as the tail -f is still running. I tried both break and exit
I run this on FreeBSD, so I cannot use the --pid= option of GNU (Linux) tail.
I can use ps and grep to get the pid of the tail and kill it, but that seems very ugly to me.
Any hints?
Why do you need the tail process?
Could you instead do something along the lines of
./test.sh | while read line; do
# process $line
done
or, if you want to keep the output in tmp.txt :
./test.sh | tee tmp.txt | while read line; do
# process $line
done
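Note that piping into while runs the loop in a subshell, so variables set inside it are lost afterwards; if that matters, a bash process substitution keeps the loop in the current shell (a sketch):
count=0
while read -r line; do
    # process "$line"
    count=$((count + 1))
done < <(./test.sh | tee tmp.txt)
echo "processed $count lines"   # count survives, unlike in the piped versions above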
If you still want to use an intermediate tail -f process, maybe you could use a named pipe (fifo) instead of a regular pipe, to allow detaching the tail process and getting its pid:
./test.sh >tmp.txt &
PID=$!
mkfifo tmp.fifo
tail -f tmp.txt >tmp.fifo &
PID_OF_TAIL=$!
while read line; do
    # process $line
    kill -0 ${PID} 2>/dev/null || kill ${PID_OF_TAIL}
done <tmp.fifo
rm tmp.fifo
I should mention, however, that this kind of solution is subject to some serious race conditions:
the PID of test.sh could be reused by another process;
if the test.sh process is still alive when you read the last line, you won't have any other occasion to detect its death afterwards and your loop will hang.

Pipe command output, but keep the error code [duplicate]

This question already has answers here:
Pipe output and capture exit status in Bash
(16 answers)
Closed 5 years ago.
How do I get the correct return code from a unix command line application after I've piped it through another command that succeeded?
In detail, here's the situation:
$ tar -cEvhf - -I ${sh_tar_inputlist} | gzip -5 -c > ${sh_tar_file}    # when only the tar command fails, $? is 0
$ echo $?
0
And, what I'd like to see is:
$ tar -cEvhf - -I ${sh_tar_inputlist} 2>${sh_tar_error_file} | gzip -5 -c > ${sh_tar_file}
$ echo $?
1
Does anyone know how to accomplish this?
Use ${PIPESTATUS[0]} to get the exit status of the first command in the pipe.
For details, see http://tldp.org/LDP/abs/html/internalvariables.html#PIPESTATUSREF
See also http://cfajohnson.com/shell/cus-faq-2.html for other approaches if your shell does not support $PIPESTATUS.
Look at $PIPESTATUS which is an array variable holding exit statuses. So ${PIPESTATUS[0]} holds the exit status of the first command in the pipe, ${PIPESTATUS[1]} the exit status of the second command, and so on.
For example:
$ tar -cEvhf - -I ${sh_tar_inputlist} | gzip -5 -c > ${sh_tar_file}
$ echo ${PIPESTATUS[0]}
To print out all statuses use:
$ echo "${PIPESTATUS[@]}"
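A minimal illustration with trivial commands:
$ false | true
$ echo "${PIPESTATUS[@]}"
1 0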
Here is a general solution using only POSIX shell and no temporary files:
Starting from the pipeline:
foo | bar | baz
exec 4>&1
error_statuses=`((foo || echo "0:$?" >&3) |
(bar || echo "1:$?" >&3) |
(baz || echo "2:$?" >&3)) 3>&1 >&4`
exec 4>&-
$error_statuses contains the status codes of any failed processes, in an arbitrary order, each prefixed with an index that tells you which command produced it.
# if "bar" failed, output its status:
echo $error_statuses | grep '1:' | cut -d: -f2
# test if all commands succeeded:
test -z "$error_statuses"
# test if the last command succeeded:
! echo $error_statuses | grep '2:' >/dev/null
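To convince yourself the redirections do what they should, you can run the construct with stand-in commands where only the middle stage fails (a hypothetical demo; true, false and cat take the places of foo, bar and baz):
exec 4>&1
error_statuses=`((true || echo "0:$?" >&3) |
(false || echo "1:$?" >&3) |
(cat || echo "2:$?" >&3)) 3>&1 >&4`
exec 4>&-
echo "$error_statuses"   # prints "1:1": the second stage failed with status 1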
As others have pointed out, some modern shells provide PIPESTATUS to get this info. In classic sh, it's a bit more difficult, and you need to use a fifo:
#!/bin/sh
trap 'rm -rf $TMPDIR' 0
TMPDIR=$( mktemp -d )
mkfifo ${FIFO=$TMPDIR/fifo}
cmd1 > $FIFO &
cmd2 < $FIFO
wait $!
echo The return value of cmd1 is $?
(Well, you don't need to use a fifo. You can have the commands early in the pipe echo a status variable and eval that in the main shell, redirecting file descriptors all over the place and basically bending over backwards to check things, but using a fifo is much, much easier.)
