Broken pipe in tee with process substitution - bash

I just found out about process substitution using >() and am super excited about it; however, when I tried it, it doesn't always work. For example:
This works:
cat /usr/share/dict/words |tee >(tail -1) > /dev/null
ZZZ
And this gives a broken pipe error:
cat /usr/share/dict/words |tee >(head -1) > /dev/null
1080
tee: /dev/fd/63: Broken pipe
Any idea why?
Thanks!
Update: This is on RHEL 4 and RHEL 6.2

Here's an explanation of why you get the error with head but not with tail:
head -1 only has to read one line of its input; then it exits, and tee is left trying to feed the rest of its output into a closed pipe.
tail -1, on the other hand, has to read the complete input in order to do its job, so it never closes the pipe before tee is finished.
You can safely ignore the broken pipe message; many programs have stopped reporting such errors, and on my machine I don't see it.
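If the message itself bothers you, one workaround (just a sketch, not part of the original answer) is to silence tee's stderr; the data sent to the other outputs is unaffected:
cat /usr/share/dict/words | tee >(head -1) 2> /dev/null > /dev/null
If your tee comes from a recent GNU coreutils, it should also have a -p option that stops it from diagnosing broken-pipe write errors.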

Related

How to continue a while loop in bash if a certain message is printed by a program?

I'm running a bash script using some software which follows the basic pattern below.
while read sample; do
software ${sample} > output.txt
done <samples.txt
For certain samples this message is printed: "The site Pf3D7_02_v3:274217 overlaps with another variant, skipping..."
This message does not stop the software running, but it makes the results incorrect. Therefore, if the message appears, I'd like to stop the software and continue the while loop, moving on to the next sample. There are lots of samples in samples.txt, which is why I can't do this manually. A way of noting which sample the message is for would also help; as it is, I just get many lines of that message without knowing which iteration produced it.
Is it possible to help with this?
FYI, the program I'm using is bcftools consensus. Do let me know if I need to give more information.
Edit: added "> output.txt" - realised I'd stripped it down too much
Edit 2: Here is the full piece of the script, using a suggestion by chepner below. Sorry it's a bit arduous:
mkfifo p
while IFS= read -r sample; do
bcftools consensus --fasta-ref $HOME/Pf/MSP2_3D7_I_region_ref_noprimer.fasta --sample ${sample} --missing N $EPHEMERAL/bam/Pf_eph/MSP2_I_PfC_Final/Pf_60_public_Pf3D7_02_v3.final.normalised_bcf.vcf.gz --output ${sample}_MSP2_I_consensus_seq.fasta | tee p &
grep -q -m 1 "The site Pf3D7_02_v3" p && kill $!
done <$HOME/Pf/Pf_git/BF_Mali_samples.txt
rm p
I would use a named pipe to grep the output as it is produced.
mkfifo p
while IFS= read -r sample; do
software "$sample" > p &
tee < p output.txt | grep -q -m 1 "The site Pf3D7_02_v3:274217" && kill $!
done < samples.txt
rm p
software will write its output to the named pipe in the background, but block until tee starts reading. tee will read from the pipe and write that data both to your output file and to grep. If grep finds a match, it will exit and cause kill to terminate software (if it has not already terminated).
If your version of grep doesn't support the -m option (it's common, but non-standard), you can use awk instead.
awk '/The site Pf3D7_02_v3:274217/ { found = 1; exit } END { exit !found }' p && kill $!
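Since the question also asks which sample triggered the message, one possible tweak (a sketch, not from the original answer) is to print the sample name before killing:
tee < p output.txt | grep -q -m 1 "The site Pf3D7_02_v3:274217" && { echo "match for ${sample}, skipping" >&2; kill $!; }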
while read -u3 sample; do
software ${sample} |
tee output.txt |
{ grep -q -m 1 "The site Pf3D7_02_v3:274217" && cat <&3; }
done 3< samples.txt
The input file is redirected on file descriptor 3. The idea is to consume everything from that file descriptor if the specified text is detected. Because we redirect output to a file, it's easy to tee output.txt and then check for the string with grep. If grep succeeds, cat <&3 consumes the rest of the input, so the next read -u3 will fail.
Or:
while read sample; do
if
software ${sample} |
tee output.txt |
grep -q -m 1 "The site Pf3D7_02_v3:274217"
then
break;
fi
done < samples.txt
Because the exit status of a pipeline is that of the last command executed, we can just check whether grep returns success and then break out of the loop.
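To illustrate that last point, the exit status of a pipeline (without pipefail) is simply that of its final command:
false | true; echo $?    # prints 0: only the last command's status counts
true | grep -q nomatch; echo $?    # prints 1: grep found nothing, so the pipeline fails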

Why does command-with-error | grep 'regex-not-matching-error' still print an error?

I am learning bash.
I ran following code and got some output,
$ cat non_exist_file | grep -e 'dummy'
cat: non_exist_file: No such file or directory
It was strange to me because I expected there to be no output.
I have read the following instruction in the bash manual:
Pipelines
[time [-p]] [ ! ] command [ [| or |&] command2 ... ]
...
If |& is used, command's standard error, in addition to its standard output,
is connected to command2's standard input through the pipe; it is shorthand
for 2>&1 |.
On the basis of the instruction above, I expected the pipeline to pass the error message,
cat: non_exist_file: No such file or directory
to grep as standard input, and the final output to be nothing, because no word in the error message matches. However, I still got the error message.
What is happening in the code above? I am afraid I have a basic misunderstanding. Please teach me.
| only connects standard output, but cat prints the error message (as expected) to standard error.
As the man page says, use |& to also connect standard error to grep's standard input:
cat non_exist_file |& grep -e 'dummy'
Another option with the same result as the last answer; this will redirect stderr to stdout:
cat non_exist_file 2>&1 | grep -e 'dummy'
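To confirm that the message really is on standard error rather than standard output, discard stderr instead; the pipeline then prints nothing, which is the output the question expected:
cat non_exist_file 2>/dev/null | grep -e 'dummy'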

bzip2 - Broken pipe

I have the following code in my shell script:
bzip2 -dc $filename | head -10 > $output
Sometimes I'm getting this error (debug output enabled):
+ head -10
+ bzip2 -dc mylog.bz2
bzip2: I/O or other error, bailing out. Possible reason follows.
bzip2: Broken pipe
Input file = mylog.bz2, output file = (stdout)
It looks like the head command is exiting abruptly and bzip2 receives SIGPIPE.
What can I do about this? I need to be sure that the first 10 lines will end up in the $output file no matter what. There is no guarantee that this is always the case if one of the processes fails miserably, I guess.
The bzip2 command will fail when the head command quits after having output its lines. There is no data loss; the head command has done its job.
If you are concerned about this, you may replace the call to head with a sed script that does the same thing:
bzip2 -dc "$filename" | sed -n '1,10p' > "$output"
This sed script will read all the data from the pipe but not quit when done with line 10.
It looks like the head command is exiting abruptly and bzip2 receives SIGPIPE.
What do you expect head to do? It reads as much of the input as it's configured to output, then shuts down. This is pretty much by design.
Also:
head -10
My version of head expects something that's more like
head -n10
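Putting that together, a slightly more portable version of the original command (with the variables quoted for safety, which is an addition of mine rather than something from the question) would look like:
bzip2 -dc "$filename" | head -n 10 > "$output"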
You can use xargs to avoid passing empty content from the pipe to head, which could be the cause of the SIGPIPE. This way, even if bzip2 doesn't produce any output, you won't see any errors.
bzip2 -dc $filename | xargs -r head -10 > $output
where the -r option is described as:
-r If the standard input does not contain any nonblanks, do not run the command. Normally, the command is run once even if there is no input. This option is a GNU extension.

Why can't I redirect the output from sed to a file?

I am trying to run the following command
./someprogram | tee /dev/tty | sed 's/^.\{2\}//' > output_file
But the file is always blank when I go to check it. If I remove > output_file from the end of the command, I am able to see the output from sed without any issues.
Is there any way that I can redirect the output from sed in this command to a file?
Remove output buffering from the sed command using the -u flag, and make sure what you want to log isn't going to stderr.
-u, --unbuffered
load minimal amounts of data from the input files and flush the output buffers more often
Final command:
./someprogram | tee /dev/tty | sed -u 's/^.\{2\}//' > output_file
This happens with streams (usually a program sending output to stdout during its whole lifetime).
sed, grep, and other commands do some buffering in those cases, and you have to explicitly disable it to get output while the program is still running.
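If your sed doesn't support -u (it isn't required by POSIX), a similar effect can often be had with GNU coreutils' stdbuf, which forces the command's output to be line-buffered; this is a sketch along the same lines, not part of the original answer:
./someprogram | tee /dev/tty | stdbuf -oL sed 's/^.\{2\}//' > output_file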
You've got a stderr & stdout problem. Check out "In the shell, what does 2>&1 mean?" on this topic. It should fix you right up.

Direct output to standard output and an output file simultaneously

I know that
./executable &>outputfile
will redirect the standard output and standard error to a file. This is what I want, but I would also like the output to continue to be printed in the terminal. What is the best way to do this?
OK, here is my exact command. I have tried
./damp2Plan 10 | tee log.txt
and
./damp2Plan 10 2>&1 | tee log.txt
where 10 is just an argument passed to main. Neither works correctly. The result is that the very first printf statement in the code does go to the terminal and log.txt just fine, but none of the rest do. I'm on Ubuntu 12.04 (Precise Pangolin).
Use tee:
./executable 2>&1 | tee outputfile
tee outputs in chunks and there may be some delay before you see any output. If you want closer to real-time output, you could redirect to a file as you are now, and monitor it with tail -f in a different shell:
./executable > outputfile 2>&1
tail -f outputfile
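Another common cause of the behaviour described in the question is that the program itself switches from line-buffered to block-buffered stdout once it is writing to a pipe instead of a terminal. If damp2Plan uses C stdio, running it under GNU coreutils' stdbuf (a sketch, assuming stdbuf is available on your system) keeps the output line-buffered:
stdbuf -oL ./damp2Plan 10 2>&1 | tee log.txt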
