Bash while loop: Preventing third-party commands from reading stdin - bash

Assume an input table (intable.csv) that contains ID numbers in its second column, and a fresh output table (outtable.csv) into which the input file, extended by one column, is to be written line by line.
echo -ne "foo,NC_045043\nbar,NC_045193\nbaz,n.a.\nqux,NC_045054\n" > intable.csv
echo -n "" > outtable.csv
Further assume that one or more third-party commands (here: esearch, efetch; both part of Entrez Direct) are employed to retrieve additional information for each ID number. This additional info is to form the third column of the output table.
while IFS="" read -r line || [[ -n "$line" ]]
do
    echo -n "$line" >> outtable.csv
    NCNUM=$(echo "$line" | awk -F"," '{print $2}')
    if [[ $NCNUM == NC_* ]]
    then
        echo "$NCNUM"
        RECORD=$(esearch -db nucleotide -query "$NCNUM" | efetch -format gb)
        echo "$RECORD" | grep "^LOCUS" | awk '{print ","$3}' | \
            tr -d "\n" >> outtable.csv
    else
        echo ",n.a." >> outtable.csv
    fi
done < intable.csv
Why does the while loop iterate only over the first input table entry under the above code, whereas it iterates over all input table entries if the code lines starting with RECORD and echo "$RECORD" are commented out? How can I correct this behavior?

This would happen if esearch reads from standard input. It inherits the input redirection of the while loop, so it consumes the rest of the input file.
The solution is to redirect its standard input elsewhere, e.g. from /dev/null.
while IFS="" read -r line || [[ -n "$line" ]]
do
    echo -n "$line" >> outtable.csv
    NCNUM=$(echo "$line" | awk -F"," '{print $2}')
    if [[ $NCNUM == NC_* ]]
    then
        echo "$NCNUM"
        RECORD=$(esearch -db nucleotide -query "$NCNUM" </dev/null | efetch -format gb)
        echo "$RECORD" | grep "^LOCUS" | awk '{print ","$3}' | \
            tr -d "\n" >> outtable.csv
    else
        echo ",n.a." >> outtable.csv
    fi
done < intable.csv
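The effect can be reproduced without Entrez Direct. The following is a minimal sketch in which cat stands in (as an assumption) for any command that reads standard input; the three-line temporary file and the counter names are made up for the demo. It also shows a second way out: feeding the loop through file descriptor 3, so that standard input stays free for the commands inside the loop.

```shell
#!/bin/bash
# Demo input: three lines in a temporary file (hypothetical data).
input=$(mktemp)
printf 'one\ntwo\nthree\n' > "$input"

# Broken: cat inherits the loop's stdin and swallows lines two and three.
broken=0
while IFS= read -r line; do
    broken=$((broken + 1))
    cat > /dev/null
done < "$input"

# Fix 1: give the inner command its own stdin.
fixed=0
while IFS= read -r line; do
    fixed=$((fixed + 1))
    cat < /dev/null > /dev/null
done < "$input"

# Fix 2: read the table through file descriptor 3 instead of stdin.
# stdin is tied to /dev/null here only so the demo cannot block on a terminal.
fd3=0
while IFS= read -r line <&3; do
    fd3=$((fd3 + 1))
    cat > /dev/null
done 3< "$input" < /dev/null

echo "broken=$broken fixed=$fixed fd3=$fd3"
rm -f "$input"
```

On a regular file this prints broken=1 fixed=3 fd3=3. The file-descriptor variant is handy when several commands in the loop body might read stdin and you do not want to patch each one.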

Related

Bash - disk utilization notification

This script should output a warning notification if the utilization of the main disk is over 50%, but it produces no output. My disk is currently at 60%, so in theory it should work.
I have added an else branch to check whether the loop is working, but the else branch isn't triggered either.
I get no error, so it's hard to identify where exactly I have gone wrong.
#!/bin/bash
df -H | grep /dev/sda2 | awk '{ printf "%d", $5}' > diskOutput.txt
input="diskOutput.txt"
while IFS= read -r line
do
if [ $line -gt 50 ]
then
up="`uptime | cut -b 1-9`"
output="WARNING UTILISATION $line - $up"
echo "$output"
else
echo "no-in"
fi
done < $input
#rm diskOutput.txt
echo "finished"
Try this.
#!/bin/bash
df -H | grep /dev/sda2 | awk '{ printf "%d", $5}' > diskOutput.txt
echo "" >>diskOutput.txt
input="diskOutput.txt"
while IFS= read -r line
do
    if [ $line -gt 50 ]
    then
        up="`uptime | cut -b 1-9`"
        output="WARNING UTILISATION $line - $up"
        echo "$output"
    else
        echo "no-in"
    fi
done < $input
#rm diskOutput.txt
echo "finished"
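Note that IFS= sets the internal field separator to the empty string here, so read keeps the line exactly as it was written:
while IFS= read -r line
But when creating the file, printf "%d" writes only the digits, without a trailing newline. read then hits end of file before it sees a newline and returns a non-zero status, so the loop body never runs. The added echo "" appends the missing newline.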
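A small sketch of that behaviour, using a temporary file and the value 60 as made-up stand-ins for diskOutput.txt. The first loop is the plain form from the question; the second adds the || [[ -n "$line" ]] guard, which still processes a final line that lacks a newline.

```shell
#!/bin/bash
# printf without "\n" leaves the only line in the file unterminated.
tmp=$(mktemp)
printf '%d' 60 > "$tmp"

# Plain loop: read fails at end of file, so the body is skipped.
plain=0
while IFS= read -r line; do
    plain=$((plain + 1))
done < "$tmp"

# Guarded loop: read still fails, but it has filled $line, so the body runs.
guarded=0
last=""
while IFS= read -r line || [[ -n "$line" ]]; do
    guarded=$((guarded + 1))
    last=$line
done < "$tmp"

echo "plain=$plain guarded=$guarded last=$last"
rm -f "$tmp"
```

This prints plain=0 guarded=1 last=60. Writing the newline in the first place, with printf "%d\n" in the awk command, would avoid the problem as well.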

Bash script to stdout stuck with redirect

My bash script is the following:
#!/bin/bash
if [ ! -f "$1" ]; then
exit
fi
while read line;do
str1="[GAC]*T"
num=$"(echo $line | tr -d -c 'T' | wc -m)"
for((i=0;i<$num;i++))do
echo $line | sed "s/$str1/&\n/" | head -n1 -q
str1="${str1}[GAC]*T"
done
str1="[GAC]*T"
done < "$1
While it works normally as it should (take the filename input and print it line by line until the letter T and next letter T and so on) it prints to the terminal.
Input:
GATTT
ATCGT
Output:
GAT
GATT
GATTT
AT
ATCGT
When I'm using the script with | tee outputfile the outputfile is correct but when using the script with > outputfile the terminal hangs / is stuck and does not finish. Moreover it works with bash -x scriptname inputfile > outputfile but is stuck with bash scriptname inputfile > outputfile.
I made modifications to your original script, please try:
if [ ! -f "$1" ]; then
    exit
fi
while IFS='' read -r line || [[ -n "$line" ]];do
    str1="[GAC]*T"
    num=$(echo $line | tr -d -c 'T' | wc -m)
    for((i=0;i<$num;i++));do
        echo $line | sed "s/$str1/&\n/" | head -n1 -q
        str1="${str1}[GAC]*T"
    done
    str1="[GAC]*T"
done < "$1"
For input:
GATTT
ATCGT
This script outputs:
GAT
GATT
GATTT
AT
ATCGT
Modifications made to your original script were:
Line while read line; do changed to while IFS='' read -r line || [[ -n "$line" ]]; do. Why I did this is explained here: Read a file line by line assigning the value to a variable
Line num=$"(echo $line | tr -d -c 'T' | wc -m)" changed to num=$(echo $line | tr -d -c 'T' | wc -m)
Line for((i=0;i<$num;i++))do changed to for((i=0;i<$num;i++));do
Line done < "$1 changed to done < "$1"
Now you can do: ./scriptname inputfile > outputfile
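To see why the second modification matters, here is a short sketch with a made-up input line. In bash, $"..." is a double-quoted string that is only subject to locale translation, so nothing inside it is executed; $(...) is command substitution.

```shell
#!/bin/bash
line="GATTT"

# $"..." is just a (translatable) double-quoted string: no command runs.
quoted=$"(echo $line | tr -d -c 'T' | wc -m)"

# $(...) runs the pipeline and captures its output.
count=$(echo "$line" | tr -d -c 'T' | wc -m)

echo "quoted: $quoted"
echo "count:  $((count))"   # $(( )) drops padding that some wc versions add
```

The first variable ends up holding the literal text of the pipeline, while the second holds the number of T characters (3 for this line).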
Try:
sed -r 's/([^T]*T+)/\1\n/g' gatc.txt > outputfile
instead of your script.
It takes some optional non-Ts, followed by at least one T and inserts a newline after the T.
cat gatc.txt
GATGATTGATTTATATCGT
sed -r 's/([^T]*T+)/\1\n/g' gatc.txt
GAT
GATT
GATTT
AT
AT
CGT
For multiple lines, to delete empty lines in the end:
echo "GATTT
ATCGT" | sed -r 's/([^T]*T+)/\1\n/g;' | sed '/^$/d'
GATTT
AT
CGT
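Note that the sed command splits each line into consecutive pieces, which is not the same as the cumulative output shown in the question (GAT, GATT, GATTT, ...). If the cumulative form is wanted, a pure-bash sketch along the following lines should do; the function name prefixes_ending_in_T is made up for this example.

```shell
#!/bin/bash
# Print every prefix of the given string that ends in "T".
prefixes_ending_in_T() {
    local line=$1 i
    for ((i = 0; i < ${#line}; i++)); do
        if [[ ${line:i:1} == T ]]; then
            printf '%s\n' "${line:0:i+1}"
        fi
    done
}

# Feed the two sample lines from the question through the function.
result=$(printf 'GATTT\nATCGT\n' |
    while IFS= read -r line || [[ -n "$line" ]]; do
        prefixes_ending_in_T "$line"
    done)

printf '%s\n' "$result"
```

For the sample input this prints GAT, GATT, GATTT, AT and ATCGT, one per line, without starting sed or head for every T.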

How to pass a variable string to the beginning of a text file?

I have a problem.
I have a general program, gene.sh.
For every file (e.g. geneX.csv) it creates a directory named after the gene (e.g. geneX/geneX.csv). It then runs another program from inside gene.sh, but that program needs a variable and I don't know how to pass it.
This is the program gene.sh:
#!/bin/bash
# Create a directory for each *.xlsx and *.csv file
for fname in *.xlsx *csv
do
dname=${fname%.*}
[[ -d $dname ]] || mkdir "$dname"
mv "$fname" "$dname"
done
# For each gene, go inside the directory and run getChromosomicPositions.sh to get the positions, and getHaplotypeStrings.sh to get the variants
for geni in */; do
cd $geni
z=$(tail -n 1 *.csv | tr ';' "\n" | wc -l)
cd ..
cp getChromosomicPositions.sh $geni --->
cp getHaplotypeStrings.sh $geni
cd $geni
export z
./getChromosomicPositions.sh *.csv
export z
./getHaplotypeStrings.sh *.csv
cd ..
done
This is the program getChromosomicPositions.sh:
rm chrPosRs.txt
grep '^Haplotype\ ID' $1 | cut -d ";" -f 4-61 | tr ";" "\n" | awk '{print "select chrom,chromStart,chromEnd,name from snp147 where name=\""$1"\";"}' > listOfQuery.txt
while read l; do
echo $l > query.txt
mysql -h genome-mysql.cse.ucsc.edu -u genome -A -D hg38 --skip-column-names < query.txt > queryResult.txt
if [[ "$(cat queryResult.txt)" == "" ]];
then
cat query.txt |
while read line; do
echo $line | awk '$6 ~/rs/ {print $6}' > temp.txt;
if [[ "$(cat temp.txt)" != "" ]];
then cat temp.txt | awk -F'name="' '{print $2}' | sed -e 's/";//g' > temp.txt;
./getHGSVposHG19.sh temp.txt # <--- here is the problem
else
echo $line | awk '{num=sub(/.*:g\./,"");num+=sub(/\".*/,"");if(num==2){print};num=""}' > temp2.txt
fi
done
cat query.txt >> varianti.txt
echo "Missing Data" >> chrPosRs.txt
else
cat queryResult.txt >> chrPosRs.txt
fi
done < listOfQuery.txt
rm query*
Here is the problem:
I need to automatically put the variable $geni from gene.sh at the beginning of the file temp.txt.
How can I do that?
Why not pass "$geni" as, say, the first argument when invoking your script, and treat the rest of the arguments as your expected .csv files (the .csv file then arrives as $2 instead of $1)?
./getChromosomicPositions.sh "$geni" *.csv
Alternatively, you can set it as an environment variable for the script, so that it can be used there (or just export it).
geni="$geni" ./getChromosomicPositions.sh *.csv
In any case, once you have it available in the second script, you can do the following.
If passed as the first argument:
echo "${1}:$(cat temp.txt | awk -F'name="' '{print $2}' | sed -e 's/";//g')"
or if passed as an environment variable:
echo "${geni}:$(cat temp.txt | awk -F'name="' '{print $2}' | sed -e 's/";//g')"
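Both ways of handing the value over can be tried in isolation. The sketch below is only an illustration: child.sh is a hypothetical stand-in for getChromosomicPositions.sh, GeneX and rs123 are placeholder values, and the prepend step (write a new file, then move it over the old one) is one possible way to put the name on the first line of temp.txt.

```shell
#!/bin/bash
workdir=$(mktemp -d)

# Hypothetical child script standing in for getChromosomicPositions.sh.
cat > "$workdir/child.sh" <<'EOF'
#!/bin/bash
gene=$1      # first argument: the gene name
shift        # "$@" now holds only the .csv file names
printf 'rs123\n' > temp.txt          # placeholder content for temp.txt
# Prepend the gene name: write a new file, then swap it in.
{ printf '%s\n' "$gene"; cat temp.txt; } > temp.new && mv temp.new temp.txt
printf 'files: %s\n' "$*"
EOF

# Variant 1: pass the value as the first positional argument.
files_line=$(cd "$workdir" && bash child.sh "GeneX" a.csv b.csv)
first_line=$(head -n 1 "$workdir/temp.txt")

# Variant 2: pass the value through the environment of a single command.
from_env=$(geni="GeneX" bash -c 'printf "%s" "$geni"')

echo "$files_line"
echo "first line of temp.txt: $first_line"
echo "from environment: $from_env"
rm -rf "$workdir"
```

The child reads the name from $1 and then calls shift, so the rest of its code can keep treating the remaining arguments as the .csv files.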

How to automatically remove inactive OSSEC agents (batch)

As part of some batch "bash" program, how can I automatically remove inactive ossec agents in cases of autoscaling groups where instances are created/deleted constantly?
Here is a quick script you can run to remove 'Disconnected' and 'Never connected' agents
for OUTPUT in $(/var/ossec/bin/agent_control -l | grep -E 'Disconnected|Never' | tr ':' ',' | cut -d "," -f 2 )
do
/var/ossec/bin/manage_agents -r $OUTPUT
done
#This is to be run on ossec server, path for ossec is /var/ossec/
file=agents.txt
/var/ossec/bin/agent_control -l > $file
#Wipe working tmp files
rm remove.txt
rm removed.txt
echo -n "" > remove.txt
echo -n "" > removed.txt
#Find Disconnected agents
while IFS= read -r line
do
ids=$(echo $line | awk '{print $2}')
status=$(echo $line | awk '{print $NF}')
if [ "$status" == "Disconnected" ]; then
echo $ids >> remove.txt
fi
done < "$file"
#Find Never connected agents
while IFS= read -r line
do
ids=$(echo $line | awk '{print $2}')
status=$(echo $line | awk '{ if (NF > 1) print $(NF-1),$NF ; else print $NF; }')
if [ "$status" == "Never connected" ]; then
echo $ids >> remove.txt
fi
done < "$file"
#Remove commas
sed 's/.$//' remove.txt > removed.txt
#Remove agents with IDs in removed.txt file
file2=removed.txt
while IFS= read -r line
do
/var/ossec/bin/manage_agents -r "$line"
done < $file2
#Restart OSSEC service
/var/ossec/bin/ossec-control restart
#End
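The two filtering loops and the comma-stripping step could probably be folded into one awk call. The sketch below runs on a made-up listing whose layout is an assumption inferred from the scripts above (ID in the second field with a trailing comma, status at the end of the line); check it against real agent_control -l output before relying on it. The removal itself is only echoed as a dry run.

```shell
#!/bin/bash
# Assumed sample of the agent listing; real output may differ.
sample_listing() {
    cat <<'EOF'
OSSEC HIDS agent_control. List of available agents:
   ID: 000, Name: server (server), IP: 127.0.0.1, Active/Local
   ID: 001, Name: web-1, IP: any, Disconnected
   ID: 002, Name: web-2, IP: any, Never connected
   ID: 003, Name: web-3, IP: any, Active
EOF
}

# Keep lines whose status is Disconnected or Never connected and print
# the ID (second field) without its trailing comma.
ids=$(sample_listing |
    awk '/(Disconnected|Never connected)$/ { sub(/,$/, "", $2); print $2 }')

for id in $ids; do
    # Dry run: swap echo for the real call once the IDs look right.
    echo "would run: /var/ossec/bin/manage_agents -r $id"
done
```

On the sample listing this selects the IDs 001 and 002 and leaves the active agents alone.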

Variable loss in redirected bash while loop

I have the following code
for ip in $(ifconfig | awk -F ":" '/inet addr/{split($2,a," ");print a[1]}')
do
bytesin=0; bytesout=0;
while read line
do
if [[ $(echo ${line} | awk '{print $1}') == ${ip} ]]
then
increment=$(echo ${line} | awk '{print $4}')
bytesout=$((${bytesout} + ${increment}))
else
increment=$(echo ${line} | awk '{print $4}')
bytesin=$((${bytesin} + ${increment}))
fi
done < <(pmacct -s | grep ${ip})
echo "${ip} ${bytesin} ${bytesout}" >> /tmp/bwacct.txt
done
I would like this to print the incremented values to bwacct.txt, but instead the file is full of zeroes:
91.227.223.66 0 0
91.227.221.126 0 0
127.0.0.1 0 0
My understanding of Bash is that a redirected for loop should preserve variables. What am I doing wrong?
First of all, simplify your script! There are usually better ways to do things in bash, and most of the time you can rely on pure bash solutions instead of running awk or other tools.
Then add some debugging!
Here is a slightly refactored script with debugging:
#!/bin/bash
# The command substitution is left unquoted on purpose: word splitting
# then yields one loop iteration per address instead of a single one.
for ip in $(ifconfig | grep -oP 'inet addr:\K[0-9.]+')
do
    bytesin=0
    bytesout=0
    while read -r line
    do
        read -r subIp _ _ increment _ <<< "$line"
        if [[ $subIp == "$ip" ]]
        then
            ((bytesout+=increment))
        else
            ((bytesin+=increment))
        fi
        # some debugging
        echo "line: $line"
        echo "subIp: $subIp"
        echo "bytesin: $bytesin"
        echo "bytesout: $bytesout"
    done <<< "$(pmacct -s | grep "$ip")"
    echo "$ip $bytesin $bytesout" >> /tmp/bwacct.txt
done
Much clearer now, huh? :)
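For reference, the classic cause of lost variables in a while loop is a pipeline, not a redirection. The question's code already uses process substitution, so this is not necessarily what went wrong there; the sketch below, with made-up numbers, only contrasts the two forms under bash's default settings (no lastpipe).

```shell
#!/bin/bash
# Pipeline: the loop runs in a subshell, so its changes to $piped are lost.
piped=0
printf '1\n2\n3\n' | while read -r n; do
    piped=$((piped + n))
done

# Process substitution: the loop runs in the current shell and $kept survives.
kept=0
while read -r n; do
    kept=$((kept + n))
done < <(printf '1\n2\n3\n')

echo "piped=$piped kept=$kept"
```

This prints piped=0 kept=6. If the redirected form still yields zeroes, the debugging output above should show whether the fields being summed are what you expect.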
