How to continue a while loop in bash if certain message printed by program? - bash

I'm running a bash script using some software which follows the basic pattern below.
while read sample; do
software ${sample} > output.txt
done <samples.txt
For certain samples this message is printed: "The site Pf3D7_02_v3:274217 overlaps with another variant, skipping..."
This message does not stop the software running but makes the results false. Therefore, if the message is given I'd like to stop the software and continue the while loop, moving on to the next sample. There are lots of samples in samples.txt, which is why I can't do this manually. A way of denoting which sample the message is for would also help. As it is I just get many lines of that message without knowing which loop the message was given for.
Is it possible to help with this?
Fyi the program I'm using is called bcftools consensus. Do let me know if I need to give more information.
Edit: added "> output.txt" - realised I'd stripped it down too much
Edit 2: Here is the full piece of script using a suggestion by chepner below. Sorry it's a bit arduous:
mkfifo p
while IFS= read -r sample; do
bcftools consensus --fasta-ref $HOME/Pf/MSP2_3D7_I_region_ref_noprimer.fasta --sample ${sample} --missing N $EPHEMERAL/bam/Pf_eph/MSP2_I_PfC_Final/Pf_60_public_Pf3D7_02_v3.final.normalised_bcf.vcf.gz --output ${sample}_MSP2_I_consensus_seq.fasta | tee p &
grep -q -m 1 "The site Pf3D7_02_v3" p && kill $!
done <$HOME/Pf/Pf_git/BF_Mali_samples.txt
rm p

I would use a named pipe to grep the output as it is produced.
mkfifo p
while IFS= read -r sample; do
software "$sample" > p &
tee < p output.txt | grep -q -m 1 "The site Pf3D7_02_v3:274217" && kill $!
done < samples.txt
rm p
software will write its output to the named pipe in the background, but block until tee starts reading. tee will read from the pipe and write that data both to your output file and to grep. If grep finds a match, it will exit and cause kill to terminate software (if it has not already terminated).
If your version of grep doesn't support the -m option (it's common, but non-standard), you can use awk instead.
tee < p output.txt | awk '/The site Pf3D7_02_v3:274217/ { found = 1; exit } END { exit !found }' && kill $!
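The question also asks for a way to tell which sample the warning belongs to. A minimal variation of the loop above could log that; the per-sample output file name ${sample}_output.txt is only an assumption for illustration:
mkfifo p
while IFS= read -r sample; do
software "$sample" > p &
if tee < p "${sample}_output.txt" | grep -q -m 1 "The site Pf3D7_02_v3:274217"; then
echo "overlap warning for sample: $sample" >&2
kill $! 2>/dev/null   # stop the background run for this sample and move on
fi
done < samples.txt
rm p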

while read -u3 sample; do
software ${sample} |
tee output.txt |
{ grep -q -m 1 "The site Pf3D7_02_v3:274217" && cat <&3; }
done 3< samples.txt
The input file is redirected on file descriptor 3. The idea is to consume everything remaining on that descriptor once the specified text is detected. Because output is redirected to a file, it's easy to tee to output.txt and then let grep check for the string. If grep succeeds, cat <&3 consumes the rest of the input, so the next read -u3 fails and the loop ends.
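The draining trick can be demonstrated in isolation with a toy loop (the data here is made up purely for illustration):
while read -u3 line; do
echo "got: $line"
if [ "$line" = "stop" ]; then
cat <&3 > /dev/null   # drain fd 3; the next read -u3 hits end of input and the loop ends
fi
done 3< <(printf 'a\nstop\nb\nc\n')
This prints "got: a" and "got: stop" and then exits, because the remaining lines were consumed by cat.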
Or:
while read sample; do
if
software ${sample} |
tee output.txt |
grep -q -m 1 "The site Pf3D7_02_v3:274217"
then
break;
fi
done < samples.txt
Because the exit status of a pipeline is that of the last command executed in it, we can simply check whether grep succeeds and then break out of the loop.
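For reference, the exit-status rule is easy to check in an interactive shell, and bash also records every stage's status in the PIPESTATUS array:
$ true | false; echo $?
1
$ false | true; echo $?
0
$ false | true; echo "${PIPESTATUS[@]}"
1 0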

Related

Infinite loop when redirecting output to the input file [duplicate]

Basically I want to take as input text from a file, remove a line from that file, and send the output back to the same file. Something along these lines if that makes it any clearer.
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > file_name
however, when I do this I end up with a blank file.
Any thoughts?
Use sponge for this kind of task. It's part of moreutils.
Try this command:
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | sponge file_name
You cannot do that because bash processes the redirections first, then executes the command. So by the time grep looks at file_name, it is already empty. You can use a temporary file though.
#!/bin/sh
tmpfile=$(mktemp)
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > ${tmpfile}
cat ${tmpfile} > file_name
rm -f ${tmpfile}
Note that mktemp, used here to create the temporary file, is not specified by POSIX, but it is widely available.
Use sed instead:
sed -i '/seg[0-9]\{1,\}\.[0-9]\{1\}/d' file_name
try this simple one
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | tee file_name
Your file will not be blank this time :) and your output is also printed to your terminal.
You can't use a redirection operator (> or >>) to the same file, because the shell processes the redirection first and creates/truncates the file before the command is even invoked. To avoid that, you should use an appropriate tool such as tee, sponge, sed -i or any other tool that can write its results back to the file (e.g. sort file -o file).
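A quick way to see the truncation happen (demo.txt is a throwaway file used only for illustration):
$ printf 'keep\nseg1.2\n' > demo.txt
$ grep -v 'seg' demo.txt > demo.txt
$ wc -c demo.txt
0 demo.txt
The file is empty before grep ever reads it, so grep has nothing to output.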
Basically redirecting input to the same original file doesn't make sense and you should use appropriate in-place editors for that, for example Ex editor (part of Vim):
ex '+g/seg[0-9]\{1,\}\.[0-9]\{1\}/d' -scwq file_name
where:
'+cmd'/-c - run any Ex/Vim command
g/pattern/d - remove lines matching a pattern using global (help :g)
-s - silent mode (man ex)
-c wq - execute :write and :quit commands
You may use sed to achieve the same (as already shown in other answers); however, in-place editing (-i) is a non-standard extension (it may behave differently between BSD and GNU sed), and sed is basically a stream editor, not a file editor. See: Does Ex mode have any practical use?
One liner alternative - set the content of the file as variable:
VAR=`cat file_name`; echo "$VAR"|grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' > file_name
Since this question is the top result in search engines, here's a one-liner based on https://serverfault.com/a/547331 that uses a subshell instead of sponge (which often isn't part of a vanilla install like OS X):
echo "$(grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name)" > file_name
The general case is:
echo "$(cat file_name)" > file_name
Edit, the above solution has some caveats:
printf '%s' <string> should be used instead of echo <string> so that files containing -n don't cause undesired behavior.
Command substitution strips trailing newlines (this is a bug/feature of shells like bash) so we should append a postfix character like x to the output and remove it on the outside via parameter expansion of a temporary variable like ${v%x}.
Using a temporary variable $v stomps the value of any existing variable $v in the current shell environment, so we should nest the entire expression in parentheses to preserve the previous value.
Another bug/feature of shells like bash is that command substitution strips unprintable characters like null from the output. I verified this by calling dd if=/dev/zero bs=1 count=1 >> file_name and viewing it in hex with cat file_name | xxd -p. But echo $(cat file_name) | xxd -p is stripped. So this answer should not be used on binary files or anything using unprintable characters, as Lynch pointed out.
The general solution (albeit slightly slower, more memory intensive and still stripping unprintable characters) is:
(v=$(cat file_name; printf x); printf '%s' "${v%x}" > file_name)
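A quick check of the trailing-newline caveat and the x-suffix workaround described above, counting bytes with wc -c:
$ v=$(printf 'a\n\n\n'); printf '%s' "$v" | wc -c
1
$ v=$(printf 'a\n\n\n'; printf x); printf '%s' "${v%x}" | wc -c
4
Without the x suffix, the three trailing newlines are stripped by the command substitution; with it, they survive.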
Test from https://askubuntu.com/a/752451:
printf "hello\nworld\n" > file_uniquely_named.txt && for ((i=0; i<1000; i++)); do (v=$(cat file_uniquely_named.txt; printf x); printf '%s' ${v%x} > file_uniquely_named.txt); done; cat file_uniquely_named.txt; rm file_uniquely_named.txt
Should print:
hello
world
Whereas calling cat file_uniquely_named.txt > file_uniquely_named.txt in the current shell:
printf "hello\nworld\n" > file_uniquely_named.txt && for ((i=0; i<1000; i++)); do cat file_uniquely_named.txt > file_uniquely_named.txt; done; cat file_uniquely_named.txt; rm file_uniquely_named.txt
Prints an empty string.
I haven't tested this on large files (probably over 2 or 4 GB).
I have borrowed this answer from Hart Simha and kos.
This is very much possible, you just have to make sure that by the time you write the output, you're writing it to a different file. This can be done by removing the file after opening a file descriptor to it, but before writing to it:
exec 3<file ; rm file; COMMAND <&3 >file ; exec 3>&-
Or line by line, to understand it better :
exec 3<file # open a file descriptor reading 'file'
rm file # remove file (but fd3 will still point to the removed file)
COMMAND <&3 >file # run command, with the removed file as input
exec 3>&- # close the file descriptor
It's still a risky thing to do, because if COMMAND fails to run properly, you'll lose the file contents. That can be mitigated by restoring the file if COMMAND returns a non-zero exit code :
exec 3<file ; rm file; COMMAND <&3 >file || cat <&3 >file ; exec 3>&-
We can also define a shell function to make it easier to use :
# Usage: replace FILE COMMAND
replace() { exec 3<$1 ; rm $1; "${@:2}" <&3 >$1 || cat <&3 >$1 ; exec 3>&- ; }
Example :
$ echo aaa > test
$ replace test tr a b
$ cat test
bbb
Also, note that this will keep a full copy of the original file (until the third file descriptor is closed). If you're using Linux, and the file you're processing on is too big to fit twice on the disk, you can check out this script that will pipe the file to the specified command block-by-block while unallocating the already processed blocks. As always, read the warnings in the usage page.
The following will accomplish the same thing that sponge does, without requiring moreutils:
shuf --output=file --random-source=/dev/zero
The --random-source=/dev/zero part tricks shuf into doing its thing without doing any shuffling at all, so it will buffer your input without altering it.
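Applied to the original problem, and relying on the buffering behaviour described above, that would look like:
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | shuf --output=file_name --random-source=/dev/zero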
However, it is true that using a temporary file is best, for performance reasons. So, here is a function that I have written that will do that for you in a generalized way:
# Pipes a file into a command, and pipes the output of that command
# back into the same file, ensuring that the file is not truncated.
# Parameters:
# $1: the file.
# $2: the command. (With $3... being its arguments.)
# See https://stackoverflow.com/a/55655338/773113
siphon()
{
local tmp file rc=0
[ "$#" -ge 2 ] || { echo "Usage: siphon filename [command...]" >&2; return 1; }
file="$1"; shift
tmp=$(mktemp -- "$file.XXXXXX") || return
"$#" <"$file" >"$tmp" || rc=$?
mv -- "$tmp" "$file" || rc=$(( rc | $? ))
return "$rc"
}
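A hypothetical invocation for the original problem would then be:
siphon file_name grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}'
Here grep reads the file's contents from standard input and the filtered result is moved back over the original file.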
There's also ed (as an alternative to sed -i):
# cf. http://wiki.bash-hackers.org/howto/edit-ed
printf '%s\n' H 'g/seg[0-9]\{1,\}\.[0-9]\{1\}/d' wq | ed -s file_name
You can use slurp with POSIX Awk:
!/seg[0-9]+\.[0-9]/ {
q = q ? q RS $0 : $0
}
END {
print q > ARGV[1]
}
Example
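Assuming the script above is saved as slurp.awk (the file name is only for illustration), it can be run as:
awk -f slurp.awk file_name
All non-matching lines are accumulated in memory first, and the file is only rewritten in the END block, after the input has been fully read.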
This does the trick pretty nicely in most of the cases I faced:
cat <<< "$(do_stuff_with f)" > f
Note that while $(…) strips trailing newlines, <<< ensures a final newline, so generally the result is magically satisfying.
(Look for “Here Strings” in man bash if you want to learn more.)
Full example:
#! /usr/bin/env bash
get_new_content() {
sed 's/Initial/Final/g' "${1:?}"
}
echo 'Initial content.' > f
cat f
cat <<< "$(get_new_content f)" > f
cat f
This does not truncate the file and yields:
Initial content.
Final content.
Note that I used a function here for the sake of clarity and extensibility, but that’s not a requirement.
A common usecase is JSON edition:
echo '{ "a": 12 }' > f
cat f
cat <<< "$(jq '.a = 24' f)" > f
cat f
This yields:
{ "a": 12 }
{
"a": 24
}
Try this
echo -e "AAA\nBBB\nCCC" > testfile
cat testfile
AAA
BBB
CCC
echo "$(grep -v 'AAA' testfile)" > testfile
cat testfile
BBB
CCC
I usually use the tee program to do this:
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | tee file_name
It creates and removes a tempfile by itself.

How to continually process last lines of two files when the files change randomly?

I have the following simple snippet:
#!/bin/bash
tail -f "data/top.right.log" | while read val1
do
val2=$(tail -n 1 "data/top.left.log")
echo $(echo "$val1 - $val2" | bc)
done
top.left.log and top.right.log are files to which some other processes continually write. The bash script simply subtracts the last lines of both files and shows the result.
I would like to make the script more efficient. In pseudo-code I would like to do this:
#!/bin/bash
magiccommand "data/top.right.log" "data/top.left.log" | while read val1 val2
do
echo $(echo "$val1 - $val2" | bc)
done
so that whenever top.left.log OR top.right.log changes the echo command is called.
I have already tried various snippets from StackOverflow but often they rely on the fact that the files do not change or that both files contain the same amount of lines which is not my case.
If you have inotify-tools you can use the following command:
inotifywait -q -e modify file1 file2
Description:
inotifywait efficiently waits for changes to files using Linux's inotify(7) interface.
It is suitable for waiting for changes to files from shell scripts.
It can either exit once an event occurs, or continually execute and output events as they occur.
An example:
while : ;
do
inotifywait -q -e modify file1 file2
echo `tail -n1 file1`
echo `tail -n1 file2`
done
Create a temporary file that you touch each time the files are processed. If any of the files is newer than the temporary file, process the files again.
#!/bin/bash
log1=top.left.log
log2=top.right.log
tmp=last_change
last_change=0
touch "$tmp"
while : ; do
if [[ $log1 -nt $tmp || $log2 -nt $tmp ]] ; then
touch "$tmp"
x=$(tail -n1 "$log1")
y=$(tail -n1 "$log2")
echo $(( x - y ))
fi
done
You might need to remove the temporary file once the script is killed.
If the files are changing fast, you might miss some lines. Otherwise, adding sleep 1 somewhere would decrease the CPU usage.
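A sketch combining those two suggestions (a trap to clean up the temporary file on exit, and a sleep to avoid busy-waiting):
log1=top.left.log
log2=top.right.log
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT   # remove the marker file even if the script is killed
while : ; do
if [[ $log1 -nt $tmp || $log2 -nt $tmp ]] ; then
touch "$tmp"
echo $(( $(tail -n1 "$log1") - $(tail -n1 "$log2") ))
fi
sleep 1
done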
Instead of calling tail every time, you can open file descriptors once and read line after line. This makes use of the fact that the files are kept open, and read will always read from the next line of a file.
First, open the files in bash, assigning them file descriptors 3 and 4
exec 3<file1 4<file2
Now, you can read from these files using read -u <fd>. In combination with inotifywait of Dawid's answer, this gives you an efficient way to read files line by line:
while :; do
# TODO: add some break condition
# wait until one of the files has changed
inotifywait -q -e modify file1 file2
# read the next line of file1 into val1_new
# if file1 has not changed and there is no new line, read will return with failure
read -u 3 val1_new && val1="$val1_new"
# same for file2
read -u 4 val2_new && val2="$val2_new"
done
You may extend this by reading until you have reached the last line, or parsing inotifywait's output to detect which file has changed.
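A sketch of the second extension, using inotifywait's --format '%w' to report which watched file changed (the break condition is still left out, as above):
exec 3<file1 4<file2
while changed=$(inotifywait -q -e modify --format '%w' file1 file2); do
case "$changed" in
file1) read -u 3 val1_new && val1="$val1_new" ;;
file2) read -u 4 val2_new && val2="$val2_new" ;;
esac
echo "latest values: $val1 $val2"
done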
A possible way is to parse the output of tail -f and display the difference in value whenever the ==> <== pattern is found.
I came up with this script:
$ cat test.awk
$0 ~ /==>.*right.*<==/ {var=1}
$0 ~ /==>.*left.*<==/ {var=2}
$1~/[0-9]+/ && var==1 { val1=$1 }
$1~/[0-9]+/ && var==2 { val2=$1 }
val1 != "" && val2 != "" && $1~/[0-9]+/{
print val1-val2
}
The script assumes the values are integers ([0-9]+) in both files.
You can use it like this:
tail -f top.right.log top.left.log | awk -f test.awk
Whenever a value is appended to either file, the difference between the last values of the two files is displayed.

Processing the real-time last line in a currently being written text file

I have a text file which is in fact open and does logging activities performed by process P1 in the system. I was wondering how I can get the real time content of the last line of this file in a bash script and do "echo" a message, say "done was seen", if the line equals to "done".
You could use something like this :
tail -f log.txt | sed -n '/^done$/q' && echo done was seen
Explanation:
tail -f will output appended data as the file grows
sed -n '/^done$/q' will exit when a line containing only done is encountered, ending the command pipeline.
This should work for you:
tail -f log.txt | grep -q -m 1 done && echo done was seen
The -m flag to grep means "exit after N matches", and the && ensures that the echo statement will only be done on a successful exit from grep.

Grep without filtering

How do I grep without actually filtering, or highlighting?
The goal is to find out if a certain text is in the output, without affecting the output. I could tee to a file and then inspect the file offline, but, if the output is large, that is a waste of time, because it processes the output only after the process is finished:
file=`mktemp`
command | tee "$file"
if grep -q pattern "$file"; then
echo Pattern found.
fi
rm "$file"
I thought I could also use grep's before (-B) and after (-A) flags to achieve live processing, but that won't output anything if there are no matches.
# Won't even work - DON'T USE.
if command | grep -A 1000000 -B 1000000 pattern; then
echo Pattern found.
fi
Is there a better way to achieve this? Something like a "pretend you're grepping and set the exit code, but don't grep anything".
(Really, what I will be doing is to pipe stderr, since I'm looking for a certain error, so instead of command | ... I will use command 2> >(... >&2; result=${PIPESTATUS[*]}), which achieves the same, only it works on stderr.)
If all you want to do is set the exit code if a pattern is found, then this should do the trick:
awk -v rc=1 '/pattern/ { rc=0 } 1; END {exit rc}'
The -v rc=1 creates a variable inside the Awk program called rc (short for "return code") and initializes it to the value 1. The stanza /pattern/ { rc=0 } causes that variable to be set to 0 whenever a line is encountered that matches the regular expression pattern. The 1; is an always-true condition with no action attached, meaning the default action will be taken on every line; that default action is printing the line out, so this filter will copy its input to its output unchanged. Finally, the END {exit rc} runs when there is no more input left to process, and ensures that awk terminates with the value of the rc variable as its process exit status: 0 if a match was found, 1 otherwise.
The shell interprets exit code 0 as true and nonzero as false, so this command is suitable for use as the condition of a shell if or while statement, possibly at the end of a pipeline.
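Used as the condition of an if at the end of a pipeline, as described above, it looks like this:
if command | awk -v rc=1 '/pattern/ { rc=0 } 1; END {exit rc}'; then
echo Pattern found.
fi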
To allow output with search result you can use awk:
command | awk '/pattern/{print "Pattern found"} 1'
This will print "Pattern found" when the pattern is matched in any line. (The matching line itself is printed afterwards.)
If you want Line to print before then use:
command | awk '{print} /pattern/{print "Pattern found"}'
EDIT: To execute any command on match use:
command | awk '/pattern/{system("some_command")} 1'
EDIT 2: To take care of special characters in keyword use this:
command | awk -v search="abc*foo?bar" 'index($0, search) {system("some_command"); exit} 1'
Try this script. It will not modify the output of your-command, and sed exits with 0 when the pattern is found, 1 otherwise. I think it's what you want, from my understanding of your question and comment:
if your-command | sed -nr -e '/pattern/h;p' -e '${x;/^.+$/ q0;/^.+$/ !q1}'; then
echo Pattern found.
fi
Below are some test cases:
ubuntu-user:~$ if echo patt | sed -nr -e '/pattern/h;p' -e '${x;/^.+$/ q0;/^.+$/ !q1}'; then echo Pattern found.; fi
patt
ubuntu-user:~$ if echo pattern | sed -nr -e '/pattern/h;p' -e '${x;/^.+$/ q0;/^.+$/ !q1}'; then echo Pattern found.; fi
pattern
Pattern found.
Note that the previous script fails to work when there is no output from your-command at all, because then sed never runs the final-line expression and always exits with 0.
I take it you want to print out each line of your output, but at the same time, track whether or not a particular pattern is found. Simply passing the output to sed or grep would affect the output. You need to do something like this:
pattern='pattern'   # the text you are looking for
command | {
count=0
while read line
do
echo "$line"
if grep -q "$pattern" <<< "$line"
then
((count+=1))
fi
done
if [[ $count -gt 0 ]]
then
echo "Pattern was found $count times in the output"
else
echo "Didn't find the pattern at all"
fi
}
ADDENDUM
If the original command has both stdout and stderr output, which come in a specific order, with the two possibly interleaved, then will your solution ensure that the outputs are interleaved as they normally would?
Okay, I think I understand what you're talking about. You want both STDERR and STDOUT to be grepped for this pattern.
STDERR and STDOUT are two different things. They both appear on the terminal window because that's where you put them. The pipe (|) only takes STDOUT. STDERR is left alone. In the above, only the output of STDOUT would be used. If you want both STDOUT and STDERR, you have to redirect STDERR into STDOUT:
pattern='pattern'   # the text you are looking for
command 2>&1 | {
count=0
while read line
do
echo "$line"
if grep -q "$pattern" <<< "$line"
then
((count+=1))
fi
done
if [[ $count -gt 0 ]]
then
echo "Pattern was found $count times in the output"
else
echo "Didn't find the pattern at all"
fi
}
Note the 2>&1. This says to take STDERR (which is File Descriptor 2) and redirect it into STDOUT (File Descriptor 1). Now, both will be piped into that while read loop.
The grep -q will prevent grep from printing its matches to STDOUT. It can still print to STDERR, but that shouldn't be an issue in this case: grep only writes to STDERR if it cannot open a requested file or the pattern is missing.
You can do this:
echo "'search string' appeared $(command |& tee /dev/stderr | grep 'search string' | wc -l) times"
This will print the entire output of command followed by the line:
'search string' appeared xxx times
The trick is that the tee command is not used to push a copy into a file, but to copy everything on stdout to stderr. The stderr stream is immediately displayed on the screen as it is not connected to the pipe, while the copy on stdout is gobbled up by the grep/wc combination.
Since error messages are usually emitted to stderr, and you said that you want to grep for error messages, the |& operator is used for the first pipe to combine the stderr of command into its stdout, and push both into the tee command.

What is the simplest way to write a bash script to accept arguments and input from all possible direction (similar to sort -k1 -r)?

I want to write a bash script that can handle arguments and input similar to many built-in bash command. For example, like sort, it can handle
sort -k 1 -r input.txt
sort input.txt -k 1 -r
cat input.txt | sort -k 1 -r
sort -k 1 -r < input.txt
sort -k 1 -r <(cat input.txt)
I want my script to be able to handle arguments and input in the similar way
myscript.sh -i 3 -b 4 input.txt
myscript.sh input.txt -i 3 -b 4
cat input.txt | myscript.sh -i 3 -b 4
myscript.sh -i 3 -b 4 < input.txt
myscript.sh -i 3 -b 4 <(cat input.txt)
So far I have only used some features of read and getopts, and I think it may be buggy if I try to do this on my own.
To state my question more clearly, let the content of input.txt be
aaa
bbb
ccc
and I want to use value from argument i and b to do something but I'll just print it out in this example. The sample output that I want is
i : 3
b : 4
aaa
bbb
ccc
What is the best way to write a code to handle my above sample commands to give out this output?
Below is the code I got from the sandwich idea of @chepner, which is the best one so far.
#!/bin/bash -l
die () {
echo >&2 "[exception] $@"
exit 1
}
#parse param
while getopts "i:b:" OPTION; do
case "$OPTION" in
i)
i="$OPTARG"
;;
b)
b="$OPTARG"
;;
*)
die "unrecognized option"
;;
esac
done
if [ -e tmpfile ]
then
rm tmpfile
fi
shift $(($OPTIND - 1))
echo "i : "$i
echo "b : "$b
cat $1 > tmpfile
if read -t 0; then
cat >> tmpfile
fi
cat tmpfile
Executive summary: you can use read -t 0 to test if there is any input available on standard input. It will exit with status 0 if data is immediately available on standard input (via a pipe or a redirected file), and 1 if not (e.g., if it is still connected to the keyboard). You can then branch in your script based on whether or not you need to read from standard input.
if read -t 0; then
# Do one thing
else
# Do something else
fi
The tricky part for me was writing the script so that it does not block reading standard input if you don't pipe anything to it.
This seems to work; improvements welcome. First, consume everything on standard input; then process files given as arguments.
# The first call to read only succeeds when there is input
# available on standard input. It does not actually consume
# a line, though.
if read -t 0; then
# Read standard input normally
while read line; do
echo $line
done
fi
# I'll assume you populate an array called inputfiles
# while processing your arguments
for file in "${inputfiles[@]}"; do
cat $file
done
Here's a pointless wrapper around sort just to demonstrate another way of combining standard input with other input files:
if read -t 0; then
cat | sort fileA fileB
else
sort fileA fileB
fi
A slightly more useful command might be sandwich, which outputs its standard input (if any) between two files given on the command line.
#!/bin/bash
cat "$1" # Output the first file
read -t 0 && cat # Pass through the standard input, if there is any
cat "$2" # Output the second file
# cat "$1" - "$2" is almost the same, but requires standard input.
Some calls to sandwich could be
$ sandwich header.txt footer.txt
$ sandwich header.txt footer.txt < body.txt
$ cat chapter1.txt chapter2.txt | sandwich <(echo "My header") <(echo "My footer")
This doesn't quite work, presumably because read -t 0 only reports input that has already arrived, and the input typed through cat - isn't buffered yet, so there's room for improvement...
$ cat - | sandwich header.txt footer.txt
cat input.txt | command
command < input.txt
command <(cat input.txt)
The first two are identical from the application's viewpoint: the shell does the redirect, and the application only has to read stdin. With the third form (process substitution) the application is instead handed a filename such as /dev/fd/63 as an argument, which it can open and read like any other file.
command input.txt
cat input.txt | command
These differ only in the file you process: in the first one it's input.txt, in the second it's stdin, for which you can use -; most Unix commands support it.
# echo "X" | cat -
X
So you set the default: filename=-, and if you read the filename option, you overwrite this variable.
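Putting that together with getopts, a minimal sketch of the whole pattern might look like this (the option names -i and -b are taken from the question, the rest is illustrative; note that plain getopts stops at the first non-option argument, so the "file before options" call from the question is not covered):
#!/bin/bash
filename=-   # default: read standard input
while getopts "i:b:" opt; do
case "$opt" in
i) i="$OPTARG" ;;
b) b="$OPTARG" ;;
*) exit 1 ;;
esac
done
shift $((OPTIND - 1))
[ "$#" -ge 1 ] && filename=$1   # a filename argument overrides the default
echo "i : $i"
echo "b : $b"
cat "$filename"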
