Bash for loop stops after one iteration without error [duplicate]

This question already has answers here:
What does set -e mean in a bash script?
(10 answers)
Closed 2 years ago.
Let's say I have a file dates.json:
2015-11-01T12:01:52
2015-11-03T03:58:57
2015-11-09T02:43:59
2015-11-10T08:22:00
2015-11-11T05:14:51
2015-11-11T12:47:02
2015-11-13T08:33:40
I want to separate the rows to different files according to the date.
I made the following script:
#!/bin/bash
set -e
file="$1"
for i in $(seq 1 1 31); do
if [ $i -lt 10 ]; then
echo 'looking for 2015-11-0'$i
cat $file | grep "2015-11-0"$i > $i.json
else
echo 'looking for 2015-11-'$i
cat $file | grep "2015-11-"$i > $i.json
fi
done
When I execute I get the following:
$ bash example.sh dates.json
looking for 2015-11-01
looking for 2015-11-02
If I try it without the cat ... lines, the script prints all of the echo commands, and if I run just the cat | grep command on the command line, it works.
Would you know why it behaves like this?

If you need set -e in other parts of the script, you need to keep a failing grep from stopping your script:
# cat $file | grep "2015-11-0"$i > $i.json
grep "2015-11-0"$i "$file" > $i.json || :

set -e forces the script to exit if any command exits with a non-zero status.
+
grep returns 1 if it fails to find a match in the file.
+
dates.json has no 2015-11-02
=
error

It's because of your set -e, which causes the script to exit. Remove this line and it should work.
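For reference, a minimal sketch of the loop with both ideas applied (|| true is equivalent to the || : above; note the day is zero-padded here, so the output files become 01.json ... 31.json rather than 1.json ... 31.json):
#!/bin/bash
set -e
file="$1"
for i in $(seq 1 31); do
    # zero-pad the day (01..31) so a single pattern covers every case
    day=$(printf '%02d' "$i")
    echo "looking for 2015-11-$day"
    # grep exits 1 on no match; || true keeps set -e from killing the loop
    grep "2015-11-$day" "$file" > "$day.json" || true
done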

Related

snakemake rule calls a shell script but exits after first command

I have a shell script that works well if I just run it from the command line. When I call it from a rule within snakemake, it fails.
The script runs a for loop over a file of identifiers and uses those to grep the sequences from a fastq file followed by multiple sequence alignment and makes a consensus.
Here is the script. I placed some echo statements in there and for some reason it doesn't call the commands. It stops at the grep statement.
I have tried adding set +o pipefail; in the rule but that doesn't work either.
#!/bin/bash
function Usage(){
echo -e "\
Usage: $(basename $0) -r|--read2 -l|--umi-list -f|--outfile \n\
where: ... \n\
" >&2
exit 1
}
# Check argument count
[[ "$#" -lt 2 ]] && Usage
# parse arguments
while [[ "$#" -gt 1 ]];do
case "$1" in
-r|--read2)
READ2="$2"
shift
;;
-l|--umi-list)
UMI="$2"
shift
;;
-f|--outfile)
OUTFILE="$2"
shift
;;
*)
Usage
;;
esac
shift
done
# Set defaults
# Check arguments
[[ -f "${READ2}" ]] || (echo "Cannot find input file ${READ2}, exiting..." >&2; exit 1)
[[ -f "${UMI}" ]] || (echo "Cannot find input file ${UMI}, exiting..." >&2; exit 1)
#Create output directory
OUTDIR=$(dirname "${OUTFILE}")
[[ -d "${OUTDIR}" ]] || (set -x; mkdir -p "${OUTDIR}")
# Make temporary directories
TEMP_DIR="${OUTDIR}/temp"
[[ -d "${TEMP_DIR}" ]] || (set -x; mkdir -p "${TEMP_DIR}")
#RUN consensus script
for f in $( more "${UMI}" | cut -f1);do
NAME=$(echo $f)
grep "${NAME}" "${READ2}" | cut -f1 -d ' ' | sed 's/#M/M/' > "${TEMP_DIR}/${NAME}.name"
echo subsetting reads
seqtk subseq "${READ2}" "${TEMP_DIR}/${NAME}.name" | seqtk seq -A > "${TEMP_DIR}/${NAME}.fasta"
~/software/muscle3.8.31_i86linux64 -msf -in "${TEMP_DIR}/${NAME}.fasta" -out "${TEMP_DIR}/${NAME}.muscle.fasta"
echo make consensus
~/software/EMBOSS-6.6.0/emboss/cons -sequence "${TEMP_DIR}/${NAME}.muscle.fasta" -outseq "${TEMP_DIR}/${NAME}.cons.fasta"
sed -i 's/n//g' "${TEMP_DIR}/${NAME}.cons.fasta"
sed -i "s/EMBOSS_001/${NAME}.cons/" "${TEMP_DIR}/${NAME}.cons.fasta"
done
cat "${TEMP_DIR}/*.cons.fasta" > "${OUTFILE}"
Snakemake rule:
rule make_consensus:
input:
r2=get_extracted,
lst="{prefix}/{sample}/reads/cell_barcode_umi.count"
output:
fasta="{prefix}/{sample}/reads/fasta/{sample}.R2.consensus.fa"
shell:
"sh ./scripts/make_consensus.sh -r {input.r2} -l {input.lst} -f {output.fasta}"
Edit: Snakemake error messages (I changed some of the paths to a neutral filepath):
RuleException:
CalledProcessError in line 29 of ~/user/scripts/consensus.smk:
Command ' set -euo pipefail; sh ./scripts/make_consensus.sh -r ~/user/file.extracted.fastq -l ~/user/cell_barcode_umi
.count -f ~/user/file.consensus.fa ' returned non-zero exit status 1.
File "~/user/scripts/consensus.smk", line 29, in __rule
_make_consensus
File "~/user/miniconda3/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
If there are better ways to do this than using a shell for loop please let me know!
thanks!
Edit
Script ran as standalone: first grep
grep AGGCCGTTCT_TGTGGATG R_extracted/wgs_5_OL_debug.R2.extracted.fastq | cut -f1 -d ' ' | sed 's/#M/M/' > ./fasta/temp/AGGCCGTTCT_TGTGGATG.name
Script ran through snakemake: first 2 grep statements
grep :::::::::::::: R_extracted/wgs_5_OL_debug.R2.extracted.fastq | cut -f1 -d ' ' | sed 's/#M/M/' > ./fasta/temp/::::::::::::::.name
I'm now trying to figure out where those :::: in snakemake are coming from. All ideas welcome
It stops at the grep statement
My guess is that the grep command in make_consensus.sh doesn't find anything. grep returns exit code 1 in such cases, and the non-zero exit status propagates to snakemake (see also Handling SIGPIPE error in snakemake).
Loosely related... There is an inconsistency between the shebang of make_consensus.sh, which says the script should be executed with bash (#!/bin/bash), and the actual execution using sh (sh ./scripts/make_consensus.sh). (In practice it shouldn't make any difference, since sh is probably a link to bash anyway.)
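If that guess is right, a minimal sketch of the usual guard (assuming the grep inside the for loop of make_consensus.sh is the command that fails) is to absorb grep's exit status and skip UMIs that match nothing:
# grep exits 1 when a UMI matches no reads; || true absorbs that status
# so it cannot propagate to snakemake as a job failure
grep "${NAME}" "${READ2}" | cut -f1 -d ' ' | sed 's/#M/M/' > "${TEMP_DIR}/${NAME}.name" || true
# skip this UMI if nothing was found (-s tests for a non-empty file)
[[ -s "${TEMP_DIR}/${NAME}.name" ]] || continue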

displaying command output in stdout then save to file with transformation?

I have a long-running command which outputs periodically. To demonstrate, let's assume it is:
function my_cmd()
{
for i in {1..9}; do
echo -n $i
for j in {1..$i}
echo -n " "
echo $i
sleep 1
done
}
the output will be:
1 1
2  2
3   3
4    4
5     5
6      6
7       7
8        8
9         9
I want to display the command output and save it to a file at the same time.
This can be done with my_cmd | tee -a res.txt.
Now I want to display the output to the terminal as-is, but save it to the file in a transformed flavor, say with sed "s/ //g".
so the res.txt becomes:
11
22
33
44
55
66
77
88
99
How can I do this transformation on-the-fly, without waiting for the command to exit and then reading the file again?
Note that in your original code, {1..$i} is an error because sequences can't contain variables. I've replaced it with seq. Also, you're missing a do and a done for the inner for loop.
At any rate, I would use process substitution.
#!/usr/bin/env bash
function my_cmd {
for i in {1..9}; do
printf '%d' "$i"
for j in $(seq 1 $i); do
printf ' '
done
printf '%d\n' "$j"
sleep 1
done
}
my_cmd | tee >(tr -d ' ' >> res.txt)
Process substitution usually causes bash to create an entry in /dev/fd which is fed to the command in question. The contents of the substitution run asynchronously, so it doesn't block the process sending data to it.
Note that the process substitution isn't a REAL file, so the -a option for tee is meaningless. If you really want to append to your output file, >> within the substitution is the way to go.
If you don't like process substitution, another option would be to redirect to alternate file descriptors. For example, instead of the last line in the script above, you could use:
exec 5>&1
my_cmd | tee /dev/fd/5 | tr -d ' ' > res.txt
exec 5>&-
This creates a file descriptor, /dev/fd/5, which redirects to your real stdout, the terminal. It then tells tee to write to this, allowing the normal stdout from tee to be processed by additional pipe elements before final redirection to your log file.
The method you choose is up to you. I find process substitution clearer.
There are a few things you need to modify in your function. You can use tee inside the for loop to print and write the file at the same time. The following script may get the result you desire.
#!/bin/bash
filename="a.txt"
[ -f "$filename" ] && rm "$filename"
for i in {1..9}; do
echo -n $i | tee -a $filename
for((j=1;j<=$i;j++)); do
echo -n " "
done
echo $i | tee -a $filename
sleep 1
done
Instead of a double loop, I would use printf and its formatting capability %Xs to pad with blank characters.
Moreover, I would print twice (once to stdout and once to your file) rather than using a pipe and starting new processes.
So your function could look like this:
function my_cmd() {
for i in {1..9}; do
printf "%s %${i}s\n" $i $i
printf "%s%s\n" $i $i >> res.txt
done
}

Ignoring all but the (multi-line) results of the last query sent to a program

I have an executable that accepts queries from stdin and responds to them, reading until EOF. Additionally I have an input file and a special command, let's call those EXEC, FILE and CMD respectively.
What I need to do is:
Pass FILE to EXEC as input.
Disregard all the output corresponding to commands read from FILE (send it to /dev/null).
Pass CMD as the last command.
Fetch output for the last command and save it in a variable.
EXEC's output can be multiline for each query.
I know how to pass FILE + CMD into the EXEC:
echo ${CMD} | cat ${FILE} - | ${EXEC}
but I have no idea how to fetch only output resulting from CMD.
Is there a magical one-liner that does this?
After looking around I've found the following partial solution:
mkfifo mypipe
(tail -f mypipe) | ${EXEC} &
cat ${FILE} | while read line; do
echo ${line} > mypipe
done
echo ${CMD} > mypipe
This allows me to redirect my input, but now the output gets printed to screen. I want to ignore all the output produced by EXEC in the while loop and get only what it prints for the last line.
I tried what first came into my mind, which is:
(tail -f mypipe) | ${EXEC} > somefile &
But it didn't work, the file was empty.
This is race-prone -- I'd suggest putting in a delay after the kill, or using an explicit sigil to determine when it's been received. That said:
#!/usr/bin/env bash
# route FD 4 to your output routine
exec 4> >(
output=; trap 'output=1' USR1
while IFS= read -r line; do
[[ $output ]] && printf '%s\n' "$line"
done
); out_pid=$!
# Capture the PID for the process substitution above; note that this requires a very
# new version of bash (4.4?)
[[ $out_pid ]] || { echo "ERROR: Your bash version is too old" >&2; exit 1; }
# Run your program in another process substitution, and close the parent's handle on FD 4
exec 3> >("$EXEC" >&4) 4>&-
# cat your file to FD 3...
cat "$file" >&3
# UGLY HACK: Wait to let your program finish flushing output from those commands
sleep 0.1
# notify the subshell writing output to disk that the ignored input is done...
kill -USR1 "$out_pid"
# UGLY HACK: Wait to let the subprocess actually receive the signal and set output=1
sleep 0.1
# ...and then write the command for which you actually want content logged.
echo "command" >&3
In validating this answer, I'm doing the following:
EXEC=stub_function
stub_function() {
local count line
count=0
while IFS= read -r line; do
(( ++count ))
printf '%s: %s\n' "$count" "$line"
done
}
cat >file <<EOF
do-not-log-my-output-1
do-not-log-my-output-2
do-not-log-my-output-3
EOF
file=file
export -f stub_function
export file EXEC
Output is only:
4: command
You could pipe it into a sed:
var=$(YOUR COMMAND | sed '$!d')
This will put only the last line into the variable
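A quick demonstration of the idiom, with printf standing in for the real command:
# sed '$!d' deletes every line except the last
var=$(printf '%s\n' first second last | sed '$!d')
echo "$var"   # prints: last
Keep in mind that since EXEC's output can be multiline per query, keeping only the last line may not be enough; the marker-based approach below handles the multiline case.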
I think that your program EXEC does something special (opens a connection or remembers state). When that is not the case, you can use
${EXEC} < ${FILE} > /dev/null
myvar=$(echo ${CMD} | ${EXEC})
Or with normal commands:
# Do not use (printf "==%s==\n" 1 2 3 ; printf "oo%soo\n" 4 5 6) | cat
printf "==%s==\n" 1 2 3 | cat > /dev/null
myvar=$(printf "oo%soo\n" 4 5 6 | cat)
When you need to give all input to one process, perhaps you can think of a marker that you can filter on:
(printf "==%s==\n" 1 2 3 ; printf "%s\n" "marker"; printf "oo%soo\n" 4 5 6) | cat | sed '1,/marker/ d'
You should examine your EXEC to see what could be used as a marker. When it is running SQL, you might use something like
(cat ${FILE}; echo 'select "DamonMarker" from dual;' ; echo ${CMD} ) |
${EXEC} | sed '1,/DamonMarker/ d'
and write this in a var with
myvar=$( (cat ${FILE}; echo 'select "DamonMarker" from dual;' ; echo ${CMD} ) |
${EXEC} | sed '1,/DamonMarker/ d' )

Display output from last command in bash until loop

until commandThatProducesOutput | grep -m 1 "Done"
do
???
sleep 5
done
While this script is running, I'd like to pipe the output that commandThatProducesOutput produces to the screen but can't seem to get the correct syntax.
How about:
output=$(commandThatProducesOutput)
until echo "$output" | grep -m 1 "Done"
do
echo "$output"
output=$(commandThatProducesOutput)
done
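An alternative sketch, not from the answer above: run the command once per iteration and duplicate the stream with tee, so grep watches one copy while the other goes to the terminal via stderr:
# the /dev/stderr copy is displayed immediately; grep -m 1 consumes the
# stdout copy and ends the loop as soon as "Done" appears
until commandThatProducesOutput | tee /dev/stderr | grep -m 1 "Done"
do
    sleep 5
done
The only caveat is that the displayed copy arrives on stderr rather than stdout.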

ksh: shell script to search for a string in all files present in a directory at a regular interval

I have a directory (output) in Unix (SUN). There are two types of files, created with a timestamp prefix on the file name. These files are created at a regular interval of 10 minutes.
e.g.:
1. 20140129_170343_fail.csv (some lines are there)
2. 20140129_170343_success.csv (some lines are there)
Now I have to search for a particular string in all the files present in the output directory and if the string is found in fail and success files, I have to count the number of lines present in those files and save the output to the cnt_succ and cnt_fail variables. If the string is not found I will search again in the same directory after a sleep timer of 20 seconds.
here is my code
#!/usr/bin/ksh
for i in 1 2
do
grep -l 0140127_123933_part_hg_log_status.csv /osp/local/var/log/tool2/final_logs/* >log_t.txt; ### log_t.txt will contain all the matching file list
while read line ### reading the log_t.txt
do
echo "$line has following count"
CNT=`wc -l $line|tr -s " "|cut -d" " -f2`
CNT=`expr $CNT - 1`
echo $CNT
done <log_t.txt
if [ $CNT > 0 ]
then
exit
fi
echo "waiitng"
sleep 20
done
The problem I'm facing is that I'm not able to get the _success and _fail files in line and check their counts.
I'm not sure about ksh, but in bash a while ... do; ... done on the downstream side of a pipe is notorious for running off with whatever variables you set inside it, because the loop runs in a subshell (see the sketch below). ksh might be similar.
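A minimal bash sketch of that pitfall: a variable set inside a piped while loop lives in a subshell and is lost afterwards, while feeding the loop by redirection keeps it:
count=0
printf '%s\n' a b c | while read -r line; do ((count++)); done
echo "$count"   # prints 0: the piped loop ran in a subshell
count=0
while read -r line; do ((count++)); done < <(printf '%s\n' a b c)
echo "$count"   # prints 3: the loop ran in the current shell
(ksh93 runs the last part of a pipeline in the current shell, so it doesn't have this particular problem.)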
If I've understood your question right, SunOS has grep, uniq and sort AFAIK, so a possible alternative might be...
First of all:
$ cat fail.txt
W34523TERG
ADFLKJ
W34523TERG
WER
ASDTQ34T
DBVSER6
W34523TERG
ASDTQ34T
DBVSER6
$ cat success.txt
abcde
defgh
234523452
vxczvzxc
jkl
vxczvzxc
asdf
234523452
vxczvzxc
dlkjhgl
jkl
wer
234523452
vxczvzxc
And now:
egrep "W34523TERG|ASDTQ34T" fail.txt | sort | uniq -c
2 ASDTQ34T
3 W34523TERG
egrep "234523452|vxczvzxc|jkl" success.txt | sort | uniq -c
3 234523452
2 jkl
4 vxczvzxc
Depending on the input data, you may want to see what options sort has on your system. Examining uniq's options may prove useful too (it can do more than just count duplicates).
I think you want something like this (it will work in both bash and ksh):
#!/bin/ksh
while read -r file; do
lines=$(wc -l < "$file")
((sum+=$lines))
done < <(grep -Rl --include="[12]*_fail.csv" "somestring")
echo "$sum"
Note this will match files starting with 1 or 2 and ending in _fail.csv; it's not exactly clear whether that's what you want.
e.g. Let's say I have two files, one starting with 1 (containing 4 lines) and one starting with 2 (containing 3 lines), both ending in _fail.csv, somewhere under my current working directory:
> abovescript
7
It's important to understand the grep options used here:
-R, --dereference-recursive
Read all files under each directory, recursively. Follow all
symbolic links, unlike -r.
and
-l, --files-with-matches
Suppress normal output; instead print the name of each input
file from which output would normally have been printed. The
scanning will stop on the first match. (-l is specified by
POSIX.)
Finally I'm able to find the solution. Here is the complete code:
#!/usr/bin/ksh
file_name="0140127_123933.csv"
for i in 1 2
do
grep -l $file_name /osp/local/var/log/tool2/final_logs/* >log_t.txt;
while read line
do
if [ $(echo "$line" |awk '/success/') ] ## will check the success file
then
CNT_SUCC=`wc -l $line|tr -s " "|cut -d" " -f2`
CNT_SUCC=`expr $CNT_SUCC - 1`
fi
if [ $(echo "$line" |awk '/fail/') ] ## will check the fail file
then
CNT_FAIL=`wc -l $line|tr -s " "|cut -d" " -f2`
CNT_FAIL=`expr $CNT_FAIL - 1`
fi
done <log_t.txt
if [ "$CNT_SUCC" -gt 0 ] && [ "$CNT_FAIL" -gt 0 ]
then
echo " Fail count = $CNT_FAIL"
echo " Success count = $CNT_SUCC"
exit
fi
echo "waitng for next search..."
sleep 10
done
Thanks everyone for your help.
I don't think I'm getting it right, but can't you differentiate the files?
Maybe try:
#...
CNT=`expr $CNT - 1`
if echo "$line" | grep -q "fail"
then
#do something with fail count
else
#do something with success count
fi
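As an aside, a sketch of the same dispatch using the shell's own pattern matching instead of spawning grep per line (assuming, as in the answers above, that $line holds a filename whose first line is a header):
case "$line" in
    *fail*)    CNT_FAIL=$(( $(wc -l < "$line") - 1 )) ;;  # fail file: line count minus header
    *success*) CNT_SUCC=$(( $(wc -l < "$line") - 1 )) ;;  # success file: likewise
esac
case works identically in ksh and bash, and wc -l < file avoids having to cut the filename out of wc's output.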
