How to break from a while loop that reads a file in bash

I was trying to break out of a loop that reads a file, using Ctrl+C.
Instead, it only stopped the current iteration, not the whole loop.
How can I stop the loop completely instead of having to press Ctrl+C once per iteration?
The whole script can be found here
An example input file can be created like this:
echo -e "SRR7637893\nSRR7637894\nSRR7637895\nSRR7637896" > filenames.txt
The specific code chunk that probably causes the issue is the while loop below (set -xv; was added afterwards, as suggested by markp-fuso in the comments):
set -xv
while read -r line; do
    echo -e "Now downloading ${line}\n"
    docker run --rm -v "$OUTPUT_DIR":/data -w /data inutano/sra-toolkit:v2.9.2 fasterq-dump "${line}" -t /data/shm -e "$PROCESSORS"
    if [[ -s "$OUTPUT_DIR/${line}.fastq" ]]; then
        echo "Using pigz on ${line}.fastq"
        pigz --best "$OUTPUT_DIR/${line}"*.fastq    # the glob must stay outside the quotes to expand
    else
        echo "$OUTPUT_DIR/${line}.fastq not found"
    fi
done < "$INPUT_txt"
set +xv

Related

Speed up shell script/Performance enhancement of shell script

Is there a way to speed up the below shell script? It's taking me a good 40 mins to update about 150000 files every day. Sure, given the volume of files to create and update, this may be acceptable; I don't deny that. However, if there is a much more efficient way to write this, or to rewrite the logic entirely, I'm open to it, and I'm looking for some help.
#!/bin/bash
DATA_FILE_SOURCE="<path_to_source_data>/${1}"
DATA_FILE_DEST="<path_to_dest>"
for fname in $(ls -1 "${DATA_FILE_SOURCE}")
do
    for line in $(cat "${DATA_FILE_SOURCE}"/"${fname}")
    do
        FILE_TO_WRITE_TO=$(echo "${line}" | awk -F',' '{print $1"."$2".daily.csv"}')
        CONTENT_TO_WRITE=$(echo "${line}" | cut -d, -f3-)
        if [[ ! -f "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}" ]]
        then
            echo "${CONTENT_TO_WRITE}" >> "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
        else
            if ! grep -Fxq "${CONTENT_TO_WRITE}" "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
            then
                sed -i "/${1}/d" "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
                echo "${CONTENT_TO_WRITE}" >> "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
            fi
        fi
    done
done
There are still parts of your published script that are unclear, like the sed command. Nevertheless, I rewrote it with saner practices and far fewer external calls, which should really speed it up.
#!/usr/bin/env sh
DATA_FILE_SOURCE="<path_to_source_data>/$1"
DATA_FILE_DEST="<path_to_dest>"
for fname in "$DATA_FILE_SOURCE/"*; do
    # read splits on the first two commas only, so $content keeps any later commas intact
    while IFS=, read -r a b content || [ "$a" ]; do
        destfile="$DATA_FILE_DEST/$a.$b.daily.csv"
        if grep -Fxq "$content" "$destfile"; then
            sed -i "/$1/d" "$destfile"
        fi
        printf '%s\n' "$content" >>"$destfile"
    done < "$fname"
done
Make it parallel (as much as you can).
#!/bin/bash
set -e -o pipefail
declare -ir MAX_PARALLELISM=20 # pick a limit
declare -i pid
declare -a pids
# ...
for fname in "${DATA_FILE_SOURCE}/"*; do
    if ((${#pids[@]} >= MAX_PARALLELISM)); then
        wait -p pid -n || echo "${pids[pid]} failed with ${?}" 1>&2   # wait -p needs bash >= 5.1
        unset 'pids[pid]'
    fi
    while IFS= read -r line; do
        FILE_TO_WRITE_TO="..."
        # ...
    done < "${fname}" & # forking here
    pids[$!]="${fname}"
done
for pid in "${!pids[@]}"; do
    wait -n "$((pid))" || echo "${pids[pid]} failed with ${?}" 1>&2
done
Here’s a directly runnable skeleton showing how the harness above works (with 36 items to process and 20 parallel processes at most):
#!/bin/bash
set -e -o pipefail
declare -ir MAX_PARALLELISM=20 # pick a limit
declare -i pid
declare -a pids
do_something_and_maybe_fail() {
    sleep $((RANDOM % 10))
    return $((RANDOM % 2 * 5))
}
for fname in some_name_{a..f}{0..5}.txt; do # 36 items
    if ((${#pids[@]} >= MAX_PARALLELISM)); then
        wait -p pid -n || echo "${pids[pid]} failed with ${?}" 1>&2
        unset 'pids[pid]'
    fi
    do_something_and_maybe_fail & # forking here
    pids[$!]="${fname}"
    echo "${#pids[@]} running" 1>&2
done
for pid in "${!pids[@]}"; do
    wait -n "$((pid))" || echo "${pids[pid]} failed with ${?}" 1>&2
done
Strictly avoid external processes (such as awk, grep and cut) when processing individual lines. fork()ing is extremely inefficient in comparison to:
Running one single awk / grep / cut process on an entire input file (to preprocess all lines at once for easier processing in bash) and feeding the whole output into (e.g.) a bash loop.
Using Bash expansions instead, where feasible, e.g. "${line/,/.}" and other tricks from the EXPANSION section of the man bash page, without fork()ing any further processes (see the sketch just below).
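For instance, a hypothetical sketch of how the per-line awk and cut calls from the original script could be replaced with pure expansions (the variable names and sample line are mine):

line='host,metric,1,2,3'
a=${line%%,*}                 # first field (what awk printed as $1)
rest=${line#*,}
b=${rest%%,*}                 # second field (awk's $2)
content=${rest#*,}            # everything after the second comma (cut -d, -f3-)
FILE_TO_WRITE_TO="$a.$b.daily.csv"
printf '%s\n' "$content"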
Off-topic side notes:
ls -1 is unnecessary. First, ls won’t write multiple columns unless the output is a terminal, so a plain ls would do. Second, bash expansions are usually a cleaner and more efficient choice. (You can use nullglob to correctly handle empty directories / “no match” cases; see the sketch after these notes.)
Looping over the output from cat is a (less common) useless use of cat case. Feed the file into a loop in bash instead and read it line by line. (This also gives you more line format flexibility.)
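A minimal sketch of the nullglob-based replacement for ls -1 mentioned above (directory variable as in the question):

shopt -s nullglob                       # a pattern with no matches expands to nothing
for fname in "${DATA_FILE_SOURCE}"/*; do
    printf 'processing %s\n' "$fname"   # for an empty directory the body simply never runs
done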

Bash script works fine individually but NOT in a loop

The script below works fine with individual input from the command prompt:
#!/usr/local/bin/bash
set -eo pipefail
# This script determines namespaces with or without pods in them.
namespace=$1
kubectl get pods -n "$namespace" > pod_list.txt
if ! grep -q NAME pod_list.txt; then
    echo pods not found
else
    echo pods found
fi
But when it is put in a loop that reads inputs line by line from a text file, it doesn't work:
#!/usr/local/bin/bash
set -eo pipefail
input="namespace.txt"
while IFS= read -r line
do
    echo $line
    kubectl get pods -n "$line" > pod_list.txt
    if grep -q NAME pod_list.txt; then
        echo $line pods NOT found >> pod.txt
    else
        echo $line pods found >> pod.txt
    fi
done < "$input"
Could anybody please help? Thanks in advance!
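No accepted answer is preserved here, but two usual suspects fit the symptoms, and both appear elsewhere on this page: a command inside the loop consuming the loop's stdin, and set -e aborting the script on the first kubectl call that fails. A speculative sketch combining both fixes (</dev/null and || true are my additions, and I have also un-inverted the found/NOT found messages to match the first script):

#!/usr/local/bin/bash
set -eo pipefail
input="namespace.txt"
while IFS= read -r line
do
    echo "$line"
    # read from /dev/null so kubectl cannot swallow the rest of namespace.txt,
    # and tolerate a failing call so set -e does not abort the whole loop
    kubectl get pods -n "$line" </dev/null > pod_list.txt || true
    if grep -q NAME pod_list.txt; then
        echo "$line pods found" >> pod.txt
    else
        echo "$line pods NOT found" >> pod.txt
    fi
done < "$input"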

Display output of command in terminal while using command substitution

So I'm trying to check the output of a command, but I also want to be able to display the output directly in the terminal.
#!/bin/bash
while :
do
    OUT=$(streamlink -o "$NAME" "$STREAM" best)
    echo "$OUT"
    if [[ $OUT == *"No playable streams"* ]]; then
        echo "Delaying!"
        sleep 15s
    fi
done
This is what I tried.
The code checks whether the output of the command contains that error substring and, if so, adds a delay. That part works well.
But it doesn't work well when the command is actually downloading a file successfully, because the echo doesn't run until the download is finished (which can take hours). Until then I have no way of checking the command's output.
Plus, the output of this particular command displays and updates the speed and file size in real time, something echo wouldn't be able to replicate.
So is there a way to display the output of a command in real time, while also command-substituting it so the output can be checked for substrings after the command finishes?
Use a temporary file:
TEMP=$(mktemp) || exit 1
while true
do
streamlink -o "$NAME" "$STREAM" best |& tee "$TEMP"
OUT=$( cat "$TEMP" )
#echo "$OUT" # not longer needed
if [[ $OUT == *"No playable streams"* ]]; then
echo "Delaying!"
sleep 15s
fi
done
# not really needed here because of endless loop
rm -f "$TEMP"
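A variant without the temporary file, assuming the script runs on a terminal: tee a copy to /dev/tty inside the command substitution, so the output is displayed live while still being captured:

while true
do
    # tee /dev/tty shows streamlink's output in real time;
    # the command substitution still captures it for the substring check
    OUT=$(streamlink -o "$NAME" "$STREAM" best |& tee /dev/tty)
    if [[ $OUT == *"No playable streams"* ]]; then
        echo "Delaying!"
        sleep 15s
    fi
done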

Bash command runs only once in a while loop [duplicate]

This question already has answers here:
While loop stops reading after the first line in Bash
(5 answers)
Closed 6 years ago.
I am writing a Bash file to execute two PhantomJS tasks.
I have two tasks written in external JS files: task1.js & task2.js.
Here's my Bash script so far:
#!/bin/bash
url=$1
cd $(cd $(dirname ${BASH_SOURCE}); pwd -P)
dir=../temp
mkdir -p $dir
file=$dir/file.txt
phantomjs "task1.js" $url > $file
while IFS="" read -r line || [[ -n $line ]]; do
    dir=../build
    file=$dir/$line.html
    mkdir -p $(dirname $file)
    phantomjs "task2.js" $url $line > $file
done < $file
For some unknown reason, task2 is run only once, then the script stops.
If I remove the PhantomJS command, the while loop runs normally until all lines are read from the file.
Does anyone know why that is?
Cheers.
Your loop is reading contents from stdin. If any other program you run consumes stdin, the loop will terminate.
Either fix any program that may be consuming stdin to read from /dev/null, or use a different FD for the loop.
The first approach looks like this:
phantomjs "task2.js" "$url" "$line" >"$file" </dev/null
The second looks like this (note the 3< on establishing the redirection, and the <&3 to read from that file descriptor):
while IFS="" read -r line <&3 || [[ -n $line ]]; do
    dir=../build
    file=$dir/$line.html
    mkdir -p "$(dirname "$file")"
    phantomjs "task2.js" "$url" "$line" >"$file"
done 3< "$file"
By the way, consider taking file out of the loop altogether, by having the loop read directly from the first phantomjs program's output:
while IFS="" read -r line <&3 || [[ -n $line ]]; do
    dir=../build
    file=$dir/$line.html
    mkdir -p "$(dirname "$file")"
    phantomjs "task2.js" "$url" "$line" >"$file"
done 3< <(phantomjs "task1.js" "$url")

Lynx is stopping loop?

I'll just apologize beforehand; this is my first ever post, so I'm sorry if I'm not specific enough, if the question has already been answered and I just didn't look hard enough, and if I use incorrect formatting of some kind.
That said, here is my issue: in bash, I am trying to create a script that reads a file listing several dozen URLs. As it reads each line, it needs to run a set of actions on it, the first being to use lynx to navigate to the website. In practice, it runs perfectly once, on the first line: lynx goes, the download works, and the subsequent renaming and organizing of that file go through as well. But then it skips all the other lines and acts like it has finished the whole file.
I have tested to see if it was lynx causing the issue by eliminating all the other parts of the code, and then by just eliminating lynx. It works without Lynx, but, of course, I need lynx for the rest of the output to be of any use to me. Let me just post the code:
#!/bin/bash
while read line; do
    echo $line
    lynx -accept_all_cookies $line
    echo "lynx done"
    od -N 2 -h *.zip | grep "4b50"
    echo "od done, if 1 starting..."
    if [[ $? -eq 0 ]]
    then ls *.* >> logs/zips.log
    else
        od -N 2 -h *.exe | grep "5a4d"
        echo "if 2 starting..."
        if [[ $? -eq 0 ]]
        then ls *.* >> logs/exes.log
        else
            od -N 2 -h *.exe | grep "5a4d, 4b50"
            echo "if 3 starting..."
            if [[ $? -eq 1 ]]
            then
                ls *.* >> logs/failed.log
            fi
            echo "if 3 done"
        fi
        echo "if 2 done"
    fi
    echo "if 1 done..."
    FILE=`(ls -tr *.* | head -1)`
    NOW=$(date +"%m_%d_%Y")
    echo "vars set"
    mv $FILE "criticalfreepri/${FILE%%.*}(ZCH,$NOW).${FILE#*.}" -u
    echo "file moved"
    rm *.zip *.exe
    echo "file removed"
done < "lynx"
$SHELL
Just to be sure: I do have a file called "lynx" that contains the URLs, one per line. Also, I used all those echos to do my own sort of debugging, and I have tried it with and without them. When I execute the script, the echos all show up...
Any help is appreciated, and thank you all so much! Hope I didn't break any rules on this post!
PS: I'm on Linux Mint running things through the "terminal" program. I'm scripting with bash in Gedit, if any of that info is relevant. Thanks!
EDIT: Actually, the echo tests repeat for all three lines. So it would appear that lynx simply can't start again in the same loop?
Here is a simplified version of the script, as requested:
#!/bin/bash
while read -r line; do
    echo $line
    lynx $line
    echo "lynx done"
done < "ref/url"
read "lynx"
$SHELL
Note that I have changed the sites the "url" file goes to:
www.google.com
www.majorgeeks.com
http://www.sophos.com/en-us/products/free-tools/virus-removal-tool.aspx
Lynx is not designed for use in scripts; it is an interactive console browser and it locks the terminal.
If you want to access URLs in a script, use wget, for example:
wget http://www.google.com/
For exit codes see: http://www.gnu.org/software/wget/manual/html_node/Exit-Status.html
To parse the HTML content, use:
VAR=`wget -qO- http://www.google.com/`
echo $VAR
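To connect this back to the question, a hypothetical version of the simplified loop with lynx swapped out for wget (wget is non-interactive, so it neither locks the terminal nor blocks the loop):

#!/bin/bash
while read -r line; do
    echo "$line"
    if wget -q "$line"; then   # exit status 0 means the download succeeded
        echo "wget done"
    else
        echo "wget failed on $line" >&2
    fi
done < "ref/url"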
I found a way that may fulfill your requirement to run the lynx command in a loop with a different URL substituted each time.
Use
echo `lynx $line`
(that is, wrap lynx $line in backquotes and echo the result)
instead of lynx $line. See below:
Your code:
#!/bin/bash
while read -r line; do
    echo $line
    lynx $line
    echo "lynx done"
done < "ref/url"
read "lynx"
$SHELL
Try the version below:
#!/bin/bash
while read -r line; do
    echo $line
    echo `lynx $line`
    echo "lynx done"
done < "ref/url"
I should have answered this question a long time ago. I got the program working; it's now on GitHub!
Anyway, I simply had to wrap the loop inside a function. Something like this:
progdownload () {
    printlog "attempting download from ${URL}"
    if echo "${URL}" | grep -q "http://www.majorgeeks.com/" ; then
        # -cmd_script replays keystrokes from a file, so lynx runs non-interactively
        lynx -cmd_script="${WORKINGDIR}/support/mgcmd.txt" --accept-all-cookies ${URL}
    else
        wget ${URL}
    fi
}
URL="something.com"
progdownload
