Bash: use the nth row of a file as part of the command line

I am trying to use a tool called fastqtl, though the tool itself is probably not relevant here. I want to pass each row of "loc_info.txt" to the --region option. I wrote the commands below, but they bounced back with "Error parsing command line: unrecognised option '-n+1'".
Is there a way to make fastQTL read and use one line from "loc_info.txt" each time it runs?
Thanks for any suggestions!!
#!/bin/bash
tool="/path/FastQTL-2.165.linux/bin/"
vcf="/path/vcf/"
out="/path/perm_out"
for i in {1..1061}
do
    ${tool}fastQTL.1.165.linux --vcf ${vcf}GT.vcf.gz --bed pheno_bed.gz --region tail -n+"$i" loc_info.txt --permute 1000 --out "$i"_perm.txt
done

Read the file in a loop:
i=1
while read -r line; do
    ${tool}fastQTL.1.165.linux --vcf ${vcf}GT.vcf.gz --bed pheno_bed.gz --region "$line" --permute 1000 --out "$i"_perm.txt
    ((i++))
done < loc_info.txt
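One caveat, in the spirit of the rg question further down: if the command inside the loop ever reads from standard input itself, it will drain the lines meant for read. Redirecting its stdin from /dev/null (a defensive sketch; fastQTL is not known to need it) guards against that:
i=1
while read -r line; do
    # < /dev/null keeps the tool from consuming the loop's stdin
    ${tool}fastQTL.1.165.linux --vcf ${vcf}GT.vcf.gz --bed pheno_bed.gz --region "$line" --permute 1000 --out "$i"_perm.txt < /dev/null
    ((i++))
done < loc_info.txt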

You can use command substitution for this (it runs the command in a subshell) whenever you want to use the output of one command inside another command, something like:
cmd1 -option $(cmd2)
Here the output of cmd2 becomes an argument to cmd1. The key pieces are the '$' and the parentheses '()'.
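For example, a quick demonstration of the substitution on its own (assuming loc_info.txt exists):
echo "line 3 is: $(head -n 3 loc_info.txt | tail -n 1)"
So the solution might be: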
#!/bin/bash
tool="/path/FastQTL-2.165.linux/bin/"
vcf="/path/vcf/"
out="/path/perm_out"
for i in {1..1061}
do
    ${tool}fastQTL.1.165.linux --vcf ${vcf}GT.vcf.gz --bed pheno_bed.gz --region "$(tail -n +"$i" loc_info.txt | head -n 1)" --permute 1000 --out "$i"_perm.txt
done
Note that tail -n +"$i" on its own prints from line $i to the end of the file, so pipe it through head -n 1 to keep just that one line, and quote the substitution in case the line contains spaces.

Try replacing tail -n+"$i" loc_info.txt with $(head -n $i loc_info.txt | tail -n 1), which prints only line $i.
Example
numOfLines=$(wc -l loc_info.txt | cut -d ' ' -f 1)
for i in $(seq 1 $numOfLines)
do
    ${tool}fastQTL.1.165.linux --vcf ${vcf}GT.vcf.gz --bed pheno_bed.gz --region "$(head -n $i loc_info.txt | tail -n 1)" --permute 1000 --out "$i"_perm.txt
done
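Another common idiom for grabbing exactly the i-th line is sed -n "${i}p", which avoids the head/tail pipe. A sketch under the same assumptions (tool paths and file names as in the question):
#!/bin/bash
tool="/path/FastQTL-2.165.linux/bin/"
vcf="/path/vcf/"
numOfLines=$(wc -l < loc_info.txt)   # < avoids the file name in wc's output
for i in $(seq 1 "$numOfLines")
do
    region=$(sed -n "${i}p" loc_info.txt)   # print only line $i
    ${tool}fastQTL.1.165.linux --vcf ${vcf}GT.vcf.gz --bed pheno_bed.gz --region "$region" --permute 1000 --out "$i"_perm.txt
done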

Related

Running `rg` in a while loop breaks after the first iteration

My script is simple:
while read -r key; do
    rg --glob='!some_dir' --fixed-strings --quiet "$key" || echo "$key"
done < <(grep 'some_pattern' some/file | cut -d'"' -f2)
I hoped to use this bash script to print the keys that aren't used. The loop, however, breaks after the first iteration on every run. Why, and how do I fix it? Thank you :D
This is the classic signature of a command inside a while..read loop that itself reads from standard input. You expected the output of grep to be consumed by the while loop line by line, but rg also reads from the same standard input and drains it.
Close rg's stdin:
rg --glob='!some_dir' --fixed-strings --quiet "$key" < /dev/null || echo "$key"
or use a different file descriptor
while read -r -u 3 key; do
    rg --glob='!some_dir' --fixed-strings --quiet "$key" || echo "$key"
done 3< <(grep 'some_pattern' some/file | cut -d'"' -f2)
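You can see the draining effect with any stdin-reading command, not just rg. A minimal demo (assuming a small multi-line file keys.txt):
while read -r key; do
    echo "got: $key"
    cat > /dev/null   # swallows the rest of stdin, so the loop ends after one pass
done < keys.txt
Only the first line is printed, because cat consumes everything that read had left.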

Bash, loop unexpected stop

I'm having problems with the last part of my bash script. It receives input of 500 web addresses and is supposed to fetch the server information from each. It works for a while but then just stops, at around the 45th element. Any thoughts on my loop at the end?
#initializing variables
timeout=5
headerFile="lab06.output"
dataFile="fortune500.tsv"
dataURL="http://www.tech.mtu.edu/~toarney/sat3310/lab09/"
dataPath="/home/pjvaglic/Documents/labs/lab06/data/"
curlOptions="--fail --connect-timeout $timeout"
#creating the array
declare -a myWebsitesarray
#obtaining the data file
wget $dataURL$dataFile -O $dataPath$dataFile
#getting rid of the crap from dos
sed -n "s/^m//" $dataPath$dataFile
readarray -t myWebsitesarray < <(cut -f3 -d$'\t' $dataPath$dataFile)
myWebsitesarray=("${myWebsitesarray[@]:1}")
websitesCount=${#myWebsitesarray[*]}
echo "There are $websitesCount websites in $dataPath$dataFile"
#echo -e ${myWebsitesarray[200]}
#printing each line in the array
for line in "${myWebsitesarray[@]}"
do
    echo "$line"
done
#run each website URL and gather header information
for line in "${myWebsitearray[#]}"
do
((count++))
echo -e "\\rPlease wait... $count of $websitesCount"
curl --head "$curlOptions" "$line" | awk '/Server: / {print $2 }' >> $dataPath$headerFile
done
#display results
echo "Results: "
sort $dataPath$headerFile | uniq -c | sort -n
It would certainly help if you actually passed the --connect-timeout option to curl. As written, you are passing the single argument --fail --connect-timeout $timeout rather than the three distinct arguments --fail, --connect-timeout, and $timeout. This is one instance where you should not quote the variable. In other words, use:
curl --head $curlOptions "$line"
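A more robust pattern for this is to keep the options in a bash array; the array expands to distinct words without relying on an unquoted variable. A sketch with the same options:
curlOptions=(--fail --connect-timeout "$timeout")   # one option per array element
curl --head "${curlOptions[@]}" "$line"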

Bash Parse Variable Values

I have a command that returns 108 week/enumeration pairs:
Command:
impala-shell -B -f query.sql
Results:
20180203 1
20180127 2
20180120 3
...
I parse the results and read the week and enumeration into two variables. However, I have to store the intermediate result in a variable wk first:
wk="$(impala-shell -B -f query.sql)"
echo "$wk" | while read -r a b; do echo $a--$b; done
I tried to avoid using the additional variable wk:
"$(impala-shell -B -f query.sql)" | while read -r a b; do echo $a--$b; done
But it returned:
...
20160213 104
20160206 105
20160130 106
20160123 107
20160116 108: command not found
I understand I could use wk="$(impala-shell -B -f query.sql)" && echo "$wk" | while read -r a b; do echo $a--$b; done, but that still goes through an intermediate variable. How can I compose a one-liner without the variable wk?
awk to the rescue!
$ impala-shell -B -f query.sql | awk '{print $1"--"$2}'
You can execute commands first (inline) using the old-style backticks ``.
Try this (untested, as I have neither your shell nor that script):
`impala-shell -B -f query.sql` | while read -r a b; do echo $a--$b; done
(Be warned that backticks are just the older form of $(...): the shell substitutes the command's output into the command line and then tries to run it, so this fails the same way as the quoted attempt in the question.)
Most elegant answer goes to choroba in the question comments! You just need to drop the quotes and the command substitution and pipe directly:
impala-shell -B -f query.sql | while read -r a b ; do echo $a--$b; done
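If the loop body ever needs to set variables that must survive after the loop, note that the pipe runs the while in a subshell; process substitution (the < <(...) form used in the rg question above) keeps the loop in the current shell:
while read -r a b; do
    echo "$a--$b"
done < <(impala-shell -B -f query.sql)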

Storing a line in a variable

Hi, I have the following batch script in which I submit each file for separate processing:
for file in ../Positive/*.txt_rn; do
bsub <<EOF
#BSUB -L /bin/bash
#BSUB -W 150:00
#BSUB -M 10000
#BSUB -n 3
#BSUB -e /somefolder/errors/%J.err
#BSUB -o /somefolder/errors/%J.out
while read line; do
name=`cat \$line | awk '{print $1":"$2"-"$3}'`
four=`cat \$line | awk '{print $4}' | cut -d\: -f4`
fasta=\$name".fa"
op=\$name".rs"
echo \$name | xargs samtools faidx /somefolder/rn4/Rattus_norvegicus/UCSC/rn4/Sequence/WholeGenomeFasta/genome.fa > \$fasta
Process -F \$fasta -M "list_"\$four".txt" -p 0.003 | awk '(\$5 >= 0.67)' > \$op
if [ -s "\$op" ]
then
cat "\$line" >> ../Positive_Strand/$file".cons"
fi
rm \$lne
rm \$op
rm \$fasta
done < $file
EOF
done
I am somehow unable to store the values of the columns from the line (which is in the $line variable) into the $name and $four variables, and hence unable to carry out the further processing. Also, any suggestions for a better version of the code would be welcome.
If you change EOF to 'EOF' then you disable shell interpretation properly. Your problem is that your backticks (`) are not escaped.
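The difference is easy to see in isolation: the shell expands variables and backticks inside an unquoted heredoc, but leaves a quoted one alone:
x=42
cat <<EOF
expanded: $x, today is `date +%F`
EOF
cat <<'EOF'
literal: $x, today is `date +%F`
EOF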
I've fixed your indentation and cleaned up some of your code. Note that the syntax highlighting here doesn't understand cat <<'EOF'. If you paste that into vim with highlighting enabled, you'll see that block is all the same color since it's just a string.
bsub_helper() {
    # Everything emitted through the quoted ('EOF') heredocs is literal:
    # nothing in those blocks is expanded at submit time.
    cat <<'EOF'
#BSUB -L /bin/bash
#BSUB -W 150:00
#BSUB -M 10000
#BSUB -n 3
#BSUB -e /somefolder/errors/%J.err
#BSUB -o /somefolder/errors/%J.out
EOF
    # Interpolate the file name exactly once, here; the job script below
    # refers to it as an ordinary run-time variable. printf %q keeps file
    # names with spaces intact.
    printf 'file=%q\n' "$1"
    cat <<'EOF'
while read line; do
    name=`cat $line | awk '{print $1":"$2"-"$3}'`
    four=`cat $line | awk '{print $4}' | cut -d: -f4`
    fasta="$name.fa"
    op="$name.rs"
    genome="/somefolder/rn4/Rattus_norvegicus/UCSC/rn4/Sequence/WholeGenomeFasta/genome.fa"
    echo $name | xargs samtools faidx "$genome" > "$fasta"
    Process -F "$fasta" -M "list_$four.txt" -p 0.003 | awk '($5 >= 0.67)' > "$op"
    if [ -s "$op" ]
    then
        cat "$line" >> "../Positive_Strand/$file.cons"
    fi
    rm "$line" "$op" "$fasta"
done < "$file"
EOF
}
for file in ../Positive/*.txt_rn; do
    bsub_helper "$file" | bsub
done
I created a helper function because the job script has to be assembled from literal and interpolated pieces. I am assuming that $file is the only variable in that block that you want interpreted; it is injected exactly once, between the two literal blocks, so the loop body can use it as an ordinary run-time variable. I also surrounded the other variables with quotes so that the code can support file names with spaces in them.
I left your echo $name | xargs … line alone because it's so odd. Without quotes around $name, xargs will take each whitespace-separated entry as its own file. With quotes, xargs will only supply one (likely invalid) file name to samtools.
If $name is a single file, try:
samtools faidx "$genome" "$name" > "$fasta"
If $name is multiple files and none of them have spaces, try:
samtools faidx "$genome" $name > "$fasta"
The only reason to use xargs here would be if you have too much content for one command line, but if you're running echo $name | xargs then you'll run into the same problem.

AWK: execute CURL on each line and parse result

Given an input stream with the following lines:
123
456
789
098
...
I would like to call
curl -s http://foo.bar/some.php?id=xxx
with xxx being the number from each line, and every time have an awk script fetch some information from the curl output, which is written to the output stream. I am wondering if this is possible without using the awk system() call in the following way:
cat lines | grep "^[0-9]*$" | awk '
{
    system("curl -s " $0 \
        " | awk \'{ #parsing; print }\'")
}'
You can use bash and avoid the awk system() call:
grep "^[0-9]*$" lines | while read -r line; do
    curl -s "http://foo.bar/some.php?id=$line" | awk '... do your parsing ...'
done
A shell loop would achieve a similar result, as follows:
#!/bin/bash
for f in $(grep "^[0-9]*$" lines); do
    curl -s "http://foo.bar/some.php?id=$f" | awk '{....}'
done
Alternative methods for doing similar tasks include using Perl or Python with an HTTP client.
If your file gets the ids appended dynamically, you can daemonize a small while loop that keeps checking the file for more data, like this:
while IFS= read -d $'\n' -r a || sleep 1; do [[ -n "$a" ]] && curl -s "http://foo.bar/some.php?id=${a}"; done < lines.txt
Otherwise, if the file is static, you can change the sleep 1 to break and the loop will read the file and quit when there is no data left, which is a handy pattern to know.
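For completeness, awk itself can read a command's output without system(), via the cmd | getline idiom, so the fetching and the parsing can live in one awk program. A sketch (the URL is the question's placeholder, and the print stands in for real parsing):
grep "^[0-9]*$" lines | awk '{
    cmd = "curl -s \"http://foo.bar/some.php?id=" $0 "\""
    while ((cmd | getline result) > 0) {
        print result            # parse the fetched page here
    }
    close(cmd)                  # close the pipe so the next id re-runs curl
}'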
