Bash variable caused syntax error inside awk [duplicate]

This question already has answers here:
How to cat <<EOF >> a file containing code?
(5 answers)
Closed 4 years ago.
I got the following error
awk: cmd. line:1: {if(NR%4==2) print length(.)}
awk: cmd. line:1: ^ syntax error
while running sh s.sh . with the following script:
#!/bin/bash
#usage: sh afterqc_pbs.sh /work/waterhouse_team/All_RawData/Banana_Each_Cell_Raw/Banana_Illumina/QUT_WT_GN366_4
for r1 in $(find $1 -name "*R1*.gz");
do
base=${r1%_R1*}
echo $base
output=$(basename $(echo $r1 | sed 's/R1//g'))
r2=$(echo $r1 | sed 's/R1/R2/g')
echo $r1
echo $r2
#cat <<EOF
qsub <<EOF
#!/bin/bash -l
#PBS -N $output
#PBS -l walltime=48:00:00
#PBS -j oe
#PBS -l mem=50G
#PBS -l ncpus=1
#PBS -M m.lorenc#qut.edu.au
##PBS -m bea
cd \$PBS_O_WORKDIR
echo "Average\n"
zcat $r1 | awk '{if(NR%4==2) {count++; bases += length} } END{print bases/count}'
echo "Distribution\n"
zcat $r1 | awk '{if(NR%4==2) print length($1)}' | sort -n | uniq -c
EOF
done
How is it possible that awk picks up . rather than $1 inside length()?
Thank you in advance.

The issue is your use of qsub <<EOF. When you use << (called a "here document" or "heredoc") with an unquoted delimiter, every $variable inside the document is expanded by the shell before qsub ever sees it. The single quotes around the awk program don't prevent this; quoting inside a heredoc has no special meaning. Since you ran the script as sh s.sh ., the shell's $1 was ., so length($1) became length(.). If you want to disable expansion entirely, change <<EOF to <<"EOF". https://www.gnu.org/software/bash/manual/bashref.html#Here-Documents has more details.
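Note that quoting the delimiter (<<"EOF") also stops $output, $r1 and $r2 from expanding, which this script relies on. A middle ground, as a sketch against the script above, is to keep <<EOF and escape only the dollar signs that awk itself should see:
zcat $r1 | awk '{if(NR%4==2) {count++; bases += length} } END{print bases/count}'
zcat $r1 | awk '{if(NR%4==2) print length(\$1)}' | sort -n | uniq -c
With \$1 escaped, the shell leaves it alone and the submitted job's awk receives length($1) intact, exactly like the already-escaped \$PBS_O_WORKDIR.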

Related

bash or zsh: how to pass multiple inputs to interactive piped parameters?

I have 3 different files that I want to compare
words_freq
words_freq_deduped
words_freq_alpha
For each file, I run a command like so, which I iterate on constantly to compare the results.
For example, I would do this:
$ cat words_freq | grep -v '[soe]'
$ cat words_freq_deduped | grep -v '[soe]'
$ cat words_freq_alpha | grep -v '[soe]'
and then review the results, and then do it again, with an additional filter
$ cat words_freq | grep -v '[soe]' | grep a | grep r | head -n20
a
$ cat words_freq_deduped | grep -v '[soe]' | grep a | grep r | head -n20
b
$ cat words_freq_alpha | grep -v '[soe]' | grep a | grep r | head -n20
c
This continues on until I've analyzed my data.
I would like to write a script that could take the piped portions, and pass it to each of these files, as I iterate on the grep/head portions of the command.
e.g. The following would dump the results of running the 3 commands above AND also compare the 3 results, and dump additional calculations on them
$ myScript | grep -v '[soe]' | grep a | grep r | head -n20
the letters were in all 3 runs, and it took 5 seconds
a
b
c
How can I do this using bash/python or zsh for the myScript part?
EDIT: After asking the question, it occurred to me that I could use eval to do it, like so, which I've added as an answer as well
The following approach allows me to process multiple files by using eval, which I know is frowned upon - any other suggestions are greatly appreciated!
$ myScript "grep -v '[soe]' | grep a | grep r | head -n20"
myScript
#!/usr/bin/env bash
function doIt(){
FILE=$1
CMD="cat $1 | $2"
echo processing file "$FILE"
eval "$CMD"
echo
}
doIt words_freq "$@"
doIt words_freq_deduped "$@"
doIt words_freq_alpha "$@"
You can't stop your shell from interpreting pipes itself, so passing a whole pipeline as arguments isn't very practical: you'd need to either quote the entire pipeline and then eval it, which makes it hard to pass arguments with spaces, or quote every pipe individually, which you can then eval. But yeah, these solutions are kinda hacky.
I'd suggest doing one of these two:
Keep your editor open, and put whatever you want to run inside the doIt function itself before you run it. Then run it in your shell without any arguments:
#!/usr/bin/env bash
doIt() {
  # grep -v '[soe]' < "$1"
  grep -v '[soe]' < "$1" | grep a | grep r | head -n20
}
doIt words_freq
doIt words_freq_deduped
doIt words_freq_alpha
Or you could always use a "for" loop directly in your shell, which you can then find with Ctrl+r in your history whenever you want to reuse it:
$ for f in words_freq*; do grep -v '[soe]' < "$f" | grep a | grep r | head -n20; done
But if you really want your approach, I tried to make it accept spaces, but it ended up being even hackier:
#!/usr/bin/env bash
doIt() {
  local FILE=$1
  shift
  echo processing file "$FILE"
  local args=()
  for n in $(seq 1 $#); do
    arg=$1
    shift
    if [[ $arg == '|' ]]; then
      args+=('|')
    else
      args+=("\"$arg\"")
    fi
  done
  eval "cat '$FILE' | ${args[@]}"
}
doIt words_freq "$@"
doIt words_freq_deduped "$@"
doIt words_freq_alpha "$@"
With this version you can use it like this:
$ ./myScript grep "a a" "|" head -n1
Notice that it needs you to quote the |, and that it now handles arguments with spaces.
I may not have understood the problem fully. My understanding is that you want to write a script without pipes, by including the filtering logic inside the script and feeding the filtering patterns in as arguments.
Here is a gawk script (standard Linux awk).
With one sweep on 3 input files, without piping.
script.awk
BEGIN {
  RS = "!#!#!#!#!#!#!#"; # set the record separator to something unlikely to be matched, so each file is read entirely as a single record
}
$0 !~ excludeRegEx &&  # if the file does not match excludeRegEx
$0 ~ includeRegEx1 &&  # and matches includeRegEx1
$0 ~ includeRegEx2 {   # and matches includeRegEx2
  system("head -n20 " FILENAME); # run the shell command "head -n20" on the current file
}
Running script.awk
awk -v excludeRegEx='[soe]' \
-v includeRegEx1='a' \
-v includeRegEx2='r' \
-f script.awk words_freq words_freq_deduped words_freq_alpha
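If you would rather not shell out to head, the same effect can be had inside gawk itself. A sketch under the same setup (each file read as one record): split the record on newlines and print the first 20.
$0 !~ excludeRegEx && $0 ~ includeRegEx1 && $0 ~ includeRegEx2 {
  n = split($0, lines, "\n");          # break the whole-file record back into lines
  for (i = 1; i <= n && i <= 20; i++)  # emulate head -n20
    print lines[i];
}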

Is there a way to only require one echo in this scenario?

I have the following line of code:
for h in "${Hosts[@]}" ; do echo "$MyLog" | grep -m 1 -B 3 -A 1 $h >> /LogOutput ; done
My hosts variable is a large array of hosts
Is there a better way to do this that doesn't require me to echo on each loop? Like grep on a variable instead?
No echo, no loop
#!/bin/bash
hosts=(host1 host2 host3)
MyLog="
asf host
sdflkj
sadkjf
sdlkjds
lkasf
sfal
asf host2
sdflkj
sadkjf
"
re="${hosts[@]}"
egrep -m 1 -B 3 -A 1 ${re// /|} <<< "$MyLog"
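The trick is the expansion ${re// /|}: assigning "${hosts[@]}" to a scalar joins the array with spaces, and the substitution then turns every space into an alternation bar, yielding an extended regex. A quick check:
$ hosts=(host1 host2 host3)
$ re="${hosts[@]}"
$ echo "${re// /|}"
host1|host2|host3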
Variant with one echo
echo "$MyLog" | egrep -m 1 -B 3 -A 1 ${re// /|}
Usage
$ ./test
sdlkjds
lkasf
sfal
asf host2
sdflkj
One echo, no loops, and all grepping done in parallel, with GNU Parallel:
echo "$MyLog" | parallel -k --tee --pipe 'grep -m 1 -B 3 -A 1 {}' ::: "${hosts[@]}"
The -k keeps the output in order.
The --tee and the --pipe ensure that the stdin is duplicated to all processes.
The processes that are run in parallel are enclosed in single quotes.
printf your array to multiple lines that you can then grep? Something like:
printf '%s\n' "${Hosts[@]}" | grep -m 1 -B 3 -A 1 $h >> /LogOutput
Assuming you're on a GNU system; otherwise see info grep.
From grep --help
grep --help | head -n1
Output
Usage: grep [OPTION]... PATTERN [FILE]...
So according to that you can pass the pattern directly, and since $MyLog holds text rather than a file name, feed it in with a here-string instead of an echo:
for h in "${Hosts[@]}" ; do grep -m 1 -B 3 -A 1 "$h" <<< "$MyLog" >> /LogOutput ; done

Use argument twice from standard output pipelining

I have a command line tool which receives two arguments:
TOOL arg1 -o arg2
I would like to invoke it with the same argument provided for both arg1 and arg2, and to make that easy for me, I thought I would do:
each <arg1_value> | TOOL $1 -o $1
but that doesn't work: $1 is not replaced, but is added once to the end of the command line.
An explicit example, performing:
cp fileA fileA
returns an error fileA and fileA are identical (not copied)
While performing:
echo fileA | cp $1 $1
returns the following error:
usage: cp [-R [-H | -L | -P]] [-fi | -n] [-apvX] source_file target_file
cp [-R [-H | -L | -P]] [-fi | -n] [-apvX] source_file ... target_directory
any ideas?
If you want to use xargs, the [-I] option may help:
-I replace-str
Replace occurrences of replace-str in the initial-arguments with names read from standard input. Also, unquoted blanks do not terminate input items; instead the separator is the newline character. Implies -x and -L 1.
Here is a simple example:
mkdir test && cd test && touch tmp
ls | xargs -I '{}' cp '{}' '{}'
This returns the error cp: tmp and tmp are the same file
The xargs utility will duplicate its input stream to replace all placeholders in its argument if you use the -I flag:
$ echo hello | xargs -I XXX echo XXX XXX XXX
hello hello hello
The placeholder XXX (may be any string) is replaced with the entire line of input from the input stream to xargs, so if we give it two lines:
$ printf "hello\nworld\n" | xargs -I XXX echo XXX XXX XXX
hello hello hello
world world world
You may use this with your tool:
$ generate_args | xargs -I XXX TOOL XXX -o XXX
Where generate_args is a script, command or shell function that generates arguments for your tool.
The reason
each <arg1_value> | TOOL $1 -o $1
did not work, apart from each not being a command that I recognise, is that $1 expands to the first positional parameter of the current shell or function.
The following would have worked:
set -- "arg1_value"
TOOL "$1" -o "$1"
because that sets the value of $1 before calling your tool.
You can re-run a shell to perform variable expansion, with sh -c. The -c option takes an argument which is the command to run in a shell, with expansion performed. The following arguments of sh are interpreted as $0, $1, and so on, for use inside the -c command. For example:
sh -c 'echo $1, i repeat: $1' foo bar baz will execute echo $1, i repeat: $1 with $1 set to bar ($0 is set to foo and $2 to baz), finally printing bar, i repeat: bar
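Applied to the question (a sketch; TOOL stands in for the real command), this combines with xargs so that each input line becomes $1 and can be used twice:
echo fileA | xargs -I {} sh -c 'TOOL "$1" -o "$1"' - {}
Here xargs substitutes the line for {}, the lone - fills $0, and TOOL sees the same value for both arguments.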
The $1, $2 ... $N parameters are only meaningful inside a bash script (or function), where they refer to that script's arguments, so they won't work the way you want them to here. Piping redirects stdout to stdin and is not what you are looking for either.
If you just want a one-liner, use something like
ARG1=hello && tool $ARG1 $ARG1
Using GNU parallel to use STDIN four times, to print a multiplication table:
seq 5 | parallel 'echo {} \* {} = $(( {} * {} ))'
Output:
1 * 1 = 1
2 * 2 = 4
3 * 3 = 9
4 * 4 = 16
5 * 5 = 25
One could encapsulate the tool using awk:
$ echo arg1 arg2 | awk '{ system("echo TOOL " $1 " -o " $2) }'
TOOL arg1 -o arg2
Remove the echo within the system() call and TOOL should be executed in accordance with requirements:
echo arg1 arg2 | awk '{ system("TOOL " $1 " -o " $2) }'
Double up the data from a pipe, and feed it to a command two at a time, using sed and xargs:
seq 5 | sed p | xargs -L 2 echo
Output:
1 1
2 2
3 3
4 4
5 5

Storing a line in a variable

Hi, I have the following batch script where I submit each file for separate processing, as follows:
for file in ../Positive/*.txt_rn; do
bsub <<EOF
#BSUB -L /bin/bash
#BSUB -W 150:00
#BSUB -M 10000
#BSUB -n 3
#BSUB -e /somefolder/errors/%J.err
#BSUB -o /somefolder/errors/%J.out
while read line; do
name=`cat \$line | awk '{print $1":"$2"-"$3}'`
four=`cat \$line | awk '{print $4}' | cut -d\: -f4`
fasta=\$name".fa"
op=\$name".rs"
echo \$name | xargs samtools faidx /somefolder/rn4/Rattus_norvegicus/UCSC/rn4/Sequence/WholeGenomeFasta/genome.fa > \$fasta
Process -F \$fasta -M "list_"\$four".txt" -p 0.003 | awk '(\$5 >= 0.67)' > \$op
if [ -s "\$op" ]
then
cat "\$line" >> ../Positive_Strand/$file".cons"
fi
rm \$line
rm \$op
rm \$fasta
done < $file
EOF
done
I am somehow unable to store the values of the columns from the line (which is in the $line variable) into the $name and $four variables, and hence unable to carry on with the further processing. Also, any suggestions to edit the code for a better version of it would be welcome.
If you change EOF to 'EOF' then you will more properly disable shell interpretation. Your problem is that your back-ticks (`) are not escaped.
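A quick way to see the difference between the two delimiters (a minimal demo, separate from the original script):
cat <<EOF
`echo runs at submission time`
EOF
cat <<'EOF'
`echo runs later, inside the job`
EOF
The first prints runs at submission time because the shell executes the backticks while building the heredoc; the second passes the backtick line through literally, leaving it for the job to execute.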
I've fixed your indentation and cleaned up some of your code. Note that the syntax highlighting here doesn't understand cat <<'EOF'. If you paste that into vim with highlighting enabled, you'll see that block is all the same color since it's just a string.
bsub_helper() {
  cat <<'EOF'
#BSUB -L /bin/bash
#BSUB -W 150:00
#BSUB -M 10000
#BSUB -n 3
#BSUB -e /somefolder/errors/%J.err
#BSUB -o /somefolder/errors/%J.out
while read line; do
  name=`cat $line | awk '{print $1":"$2"-"$3}'`
  four=`cat $line | awk '{print $4}' | cut -d: -f4`
  fasta="$name.fa"
  op="$name.rs"
  genome="/somefolder/rn4/Rattus_norvegicus/UCSC/rn4/Sequence/WholeGenomeFasta/genome.fa"
  echo $name | xargs samtools faidx "$genome" > "$fasta"
  Process -F "$fasta" -M "list_$four.txt" -p 0.003 | awk '($5 >= 0.67)' > "$op"
  if [ -s "$op" ]
  then
    cat "$line" >> "../Positive_Strand/$file.cons"
  fi
  rm "$line" "$op" "$fasta"
EOF
  echo " done < \"$1\""
}
for file in ../Positive/*.txt_rn; do
  bsub_helper "$file" | bsub
done
I created a helper function because I needed to get the input in two commands. I am assuming that $file is the only variable in that block that you want interpreted. I also surrounded that variable (among others) with quotes so that the code can support file names with spaces in them. The final line of the helper has nested double quotes for this reason.
I left your echo $name | xargs … line alone because it's so odd. Without quotes around $name, xargs will take each whitespace-separated entry as its own file. With quotes, xargs will only supply one (likely invalid) file name to samtools.
If $name is a single file, try:
samtools faidx "$genome" "$name" > "$fasta"
If $name is multiple files and none of them have spaces, try:
samtools faidx "$genome" $name > "$fasta"
The only reason to use xargs here would be if you have too much content for one command line, but if you're running echo $name | xargs then you'll run into the same problem.

Bash Xargs Sleep (Multiple Command Line Arguments)

OK, so I have the following script that updates Route 53 DNS entries. Unfortunately there is a limit to the number of calls per second you can make, so I need to make the final xargs command sleep for about a second between each iteration.
I've tried a couple of things like '{ ../cli53 blah; sleep 10; }' and I can't seem to get it to work. Does anyone have any suggestions please:
#!/bin/bash
set root='dirname $0'
ec2-describe-instances -O ******* -W ******* --region eu-west-1 |
perl -ne '/^INSTANCE\s+(i-\S+).*?(\S+\.amazonaws\.com)/
and do { $dns = $2; print "$1 $dns\n" }; /^TAG.+\sName\s+(\S+)/
and print "$1 $dns\n"' |
perl -ane 'print "$F[0] CNAME $F[1] --replace\n"' |
grep -v 'i-' | xargs --verbose -n 4 /usr/local/bin/cli53 rrcreate -x 5 contoso.com
Edit: Thanks Etan for the answer. Here is my solution for anyone else that needs it:
I had to include the -I %variable% switch in the xargs statement as well, to make sure the fed-in values were passed as parameters to cli53, but it all looks to be working nicely now.
#!/bin/bash
set root='dirname $0'
ec2-describe-instances -O ******* -W ******* --region eu-west-1 |
perl -ne '/^INSTANCE\s+(i-\S+).*?(\S+\.amazonaws\.com)/
and do { $dns = $2; print "$1 $dns\n" }; /^TAG.+\sName\s+(\S+)/
and print "$1 $dns\n"' |
perl -ane 'print "$F[0] CNAME $F[1] --replace\n"' |
grep -v '^i-' |
xargs --verbose -n 4 -I myvar /bin/sh -c '{ /usr/local/bin/cli53 rrcreate -x 5 contoso.com 'myvar'; sleep 1; printf "\n\n"; }'
The simplest solution would be to simply put the cli53 and sleep calls in a script and use xargs to execute the script.
If you don't want to do that you should be able to do what you were trying to do with this:
... | xargs ... /bin/sh -c '{ /usr/local/bin/cli53 ... "$@"; sleep 10; }' -
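For the wrapper-script route mentioned first, a sketch (the script name is made up; the cli53 arguments are the ones from the pipeline above):
#!/bin/sh
# cli53-throttled.sh: create one batch of record sets, then pause to respect the rate limit
/usr/local/bin/cli53 rrcreate -x 5 contoso.com "$@"
sleep 1
which would then be driven the same way:
... | grep -v '^i-' | xargs --verbose -n 4 ./cli53-throttled.sh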
