Escaping parentheses in perl backticks for bash command - bash

I have been using backticks for years, but this is the first time I have tried using a command with parentheses. I am getting an error that I cannot figure out.
I have tried putting in double quotes and escaping with the \ in multiple places, but nothing seems to work. Any help would be appreciated.
COMMAND
The $file5 and $file6 are Perl variables, not bash ones.
@array = `/usr/bin/join -j 1 -t, <(cat $file5 | awk -F, '{print \$3","\$1}' | sort) <( cat $file6 | awk -F, '{print \$3","\$1}' | sort) `
ERROR:
AH01215: sh: -c: line 0: syntax error near unexpected token `(', referer:

Backticks use /bin/sh, and while <( ... ) is recognized by bash, it's not recognized by the Bourne shell. If you use backticks, you will need to use
my $bash_cmd = ...;
my @lines = `bash -c $bash_cmd`;
Building sh and bash shell commands can be done using String::ShellQuote.
use String::ShellQuote qw( shell_quote );
my $file5_quoted = shell_quote($file5);
my $file6_quoted = shell_quote($file6);
my $awk_cmd = shell_quote("awk", "-F,", '{print $3","$1}');
my $bash_cmd = '/usr/bin/join -j 1 -t,'
. " <( $awk_cmd $file5_quoted | sort )"
. " <( $awk_cmd $file6_quoted | sort )";
my $sh_cmd = shell_quote("bash", "-c", $bash_cmd);
my @lines = `$sh_cmd`;
We can use IPC::System::Simple's capturex to avoid launching more shells than needed, as well as to provide error checking. To do this, replace the last two lines of the above with the following:
use IPC::System::Simple qw( capturex );
my @lines = capturex("bash", "-c", $bash_cmd);

One workaround is to create a shell script that accepts the two filenames from Perl, performs the join using those two input files, and returns the result to the Perl array.
#1. Create join.sh that contains these four lines:
cat $1 | awk -F, '{print $3","$1}' | sort > 1.out
cat $2 | awk -F, '{print $3","$1}' | sort > 2.out
/usr/bin/join -j 1 -t, 1.out 2.out
rm 1.out 2.out
#2. Modify your perl statement to call join.sh as follows:
@array = `join.sh $file5 $file6`;
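A slightly more defensive take on join.sh (just a sketch along the same lines; it assumes mktemp is available) avoids the fixed 1.out/2.out names, so concurrent runs or an unwritable working directory can't trip it up:
#!/bin/sh
# join.sh: same idea, but with unique temp files that get cleaned up on exit
set -eu
tmp1=$(mktemp)
tmp2=$(mktemp)
trap 'rm -f "$tmp1" "$tmp2"' EXIT
awk -F, '{print $3","$1}' "$1" | sort > "$tmp1"
awk -F, '{print $3","$1}' "$2" | sort > "$tmp2"
/usr/bin/join -j 1 -t, "$tmp1" "$tmp2"
The Perl call in step 2 stays the same.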

Related

I'm facing an error while converting my bash command to a shell script: syntax error in shell script

#!/bin/bash
set -o errexit
set -o nounset
#VAF_and_IGV_TAG
paste <(grep -v "^#" output/"$1"/"$1"_Variant_Filtering/"$1"_GATK_filtered.vcf | cut -f-5) \
<(grep -v "^#" output/"$1"/"$1"_Variant_Filtering/"$1"_GATK_filtered.vcf | cut -f10-| cut -d ":" -f2,3) |
sed 's/:/\t/g' |
sed '1i chr\tstart\tend\tref\talt\tNormal_DP_VCF\tTumor_DP_VCF\tDP'|
awk 'BEGIN{FS=OFS="\t"}{sub(/,/,"\t",$6);print}' \
> output/"$1"/"$1"_Variant_Annotation/"$1"_VAF.tsv
My above code ends up with a syntax error:
sh Test.sh S1
Test.sh: 6: Test.sh: Syntax error: "(" unexpected
If I run the following in the terminal without using the variable, it shows no syntax error:
paste <(grep -v "^#" output/S1/S1_Variant_Filtering/S1_GATK_filtered.vcf | cut -f-5) \
<(grep -v "^#" output/S1/S1_Variant_Filtering/S1_GATK_filtered.vcf | cut -f10-| cut -d ":" -f2,3) |
sed 's/:/\t/g' |
sed '1i chr\tstart\tend\tref\talt\tNormal_DP_VCF\tTumor_DP_VCF\tDP'|
awk 'BEGIN{FS=OFS="\t"}{sub(/,/,"\t",$6);print}' \
> output/S1/S1_Variant_Annotation/S1_VAF.ts
My vcf file looks like this: https://drive.google.com/file/d/1HaGx1-3o1VLCrL8fV0swqZTviWpBTGds/view?usp=sharing
You cannot use <(command) process substitution if you are trying to run this code under sh. Unfortunately, there is no elegant way to avoid a temporary file (or something even more horrid) but your paste command - and indeed the entire pipeline - seems to be reasonably easy to refactor into an Awk script instead.
#!/bin/sh
set -eu
awk -F '\t' 'BEGIN { OFS=FS;
print "chr\tstart\tend\tref\talt\tNormal_DP_VCF\tTumor_DP_VCF\tDP' }
!/#/ { p=$0; sub(/^([^\t]*\t){9}/, "", p);
sub(/^[^:]*:/, "", p); sub(/:.*/, "", p);
sub(/,/, "\t", p);
s = sprintf("%s\t%s\t%s\t%s\t%s\t%s", $1, $2, $3, $4, $5, p);
gsub(/:/, "\t", s);
print s
}' output/"$1"/"$1"_Variant_Filtering/"$1"_GATK_filtered.vcf \
> output/"$1"/"$1"_Variant_Annotation/"$1"_VAF.tsv
Without access to the VCF file, I have been unable to test this, but at the very least it should suggest a general direction for how to proceed.
sh does not support bash's process substitution <(). The easiest way to port it is to write out two temporary files, and remove them via a trap when done. The better option is to use a tool that is sufficiently powerful (i.e. sed) to do the filtering and manipulation required:
#!/bin/sh
header="chr\tstart\tend\tref\talt\tNormal_DP_VCF\tTumor_DP_VCF\tDP"
field_1_to_5='\(\([^\t]*\t\)\{5\}\)' # \1 to \2
field_6_to_8='\([^\t]*\t\)\{4\}[^:]*:\([^,]*\),\([^:]*\):\([^:]*\).*' # \3 to \6
src="output/${1}/${1}_Variant_Filtering/${1}_GATK_filtered.vcf"
dst="output/${1}/${1}_Variant_Variant_Annotation/${1}_VAF.tsv"
sed -n \
-e '1i '"$header" \
-e '/^#/!s/'"${field_1_to_5}${field_6_to_8}"'/\1\4\t\5\t\6/p' \
"$src" > "$dst"
If you are using awk (or perl, python etc) just port the script to that language instead.
As an aside, all those repeated $1 suggest you should rework your file naming standard.
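For completeness, the temporary-file route mentioned at the top of this answer might look something like the following sketch (untested; it keeps the original pipeline and only replaces the process substitutions):
#!/bin/sh
set -eu
src="output/${1}/${1}_Variant_Filtering/${1}_GATK_filtered.vcf"
dst="output/${1}/${1}_Variant_Annotation/${1}_VAF.tsv"
tmp1=$(mktemp)
tmp2=$(mktemp)
trap 'rm -f "$tmp1" "$tmp2"' EXIT
grep -v "^#" "$src" | cut -f-5 > "$tmp1"
grep -v "^#" "$src" | cut -f10- | cut -d ":" -f2,3 > "$tmp2"
paste "$tmp1" "$tmp2" |
sed 's/:/\t/g' |
sed '1i chr\tstart\tend\tref\talt\tNormal_DP_VCF\tTumor_DP_VCF\tDP' |
awk 'BEGIN{FS=OFS="\t"}{sub(/,/,"\t",$6);print}' > "$dst"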

shell script in a here-document used as input to ssh gives no result

I am piping the result of grep to awk and using that result as a pattern for another grep inside the here-document (the EOF part; I'm not sure what the terminology is there), but the awk gives me blank results. Below is the part of the bash script that gave me issues.
ssh "$USER"#logs << EOF
zgrep $wgr $loc$env/app*$date* | awk -F":" '{print $5 "::" $7}' | awk -F"," '{print $1}' | sort | uniq | while read -r rid ; do
zgrep $rid $loc$env/app*$date*;
done
EOF
I am really drawing a blank here because there is no error, and I'm out of ideas.
Samples:
I am grepping log files that look like this:
app-server.log.2020010416.gz:2020-01-04 16:00:00,441 INFO [redacted] (redacted) [rid:12345::12345-12345-12345-12345-12345,...
I am interested in rid and I can grep that in logs again:
zgrep $rid $loc$env/app*$date*
loc, env and date are working properly, but they are outside of EOF.
The script as a whole connects via ssh and logs out properly, but I am getting no result.
The immediate problem is that the dollar signs are evaluated by the local shell, because you don't (and presumably cannot) quote the here document (because then $wgr and $loc etc. would not be expanded by the shell either).
The quick fix is to backslash the dollar signs, but in addition, I see several opportunities to get rid of inelegant or wasteful constructs.
ssh "$USER"#logs << EOF
zgrep "$wgr" "$loc$env/app"*"$date"* |
awk -F":" '{v = \$5 "::" \$7; split(v, f, /,/); print f[1]}' |
sort -u | xargs -I {} zgrep {} "$loc$env"/app*"$date"*
EOF
If you want to add decorations around the final zgrep, probably revert to the while loop you had; but of course, you need to escape the dollar sign in that, too:
ssh "$USER"#logs << EOF
zgrep "$wgr" "$loc$env/app"*"$date"* |
awk -F":" '{v = \$5 "::" \$7; split(v, f, /,/); print f[1]}' |
sort -u |
while read -r rid; do
echo Dancing hampsters "\$rid" more dancing hampsters
zgrep "\$rid" "$loc$env"/app*"$date"*
done
EOF
Again, any unescaped dollar sign is evaluated by your local shell even before the ssh command starts executing.
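A minimal illustration of that point, run locally:
cat << EOF
local: $USER
remote: \$USER
EOF
The first line comes back with your local username already substituted; the second comes back as the literal string $USER, which is what you need if the remote shell is supposed to be the one expanding it.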
Could you please try the following. Fair warning: I couldn't test it for lack of samples. With this approach we don't need to escape things inside the ssh here-document.
##Configure/define your shell variables(wgr, loc, env, date, rid) here.
printf -v var_wgr %q "$wgr"
printf -v var_loc %q "$loc"
printf -v var_env %q "$env"
printf -v var_date %q "$date"
ssh -T -p your_port user@"$host" "bash -s $var_wgr $var_loc $var_env $var_date" <<'EOF'
# retrieve them off the shell command line
wgr=$1 loc=$2 env=$3 date=$4
zgrep "$wgr" "$loc$env"/app*"$date"* | awk -F":" '{print $5 "::" $7}' | awk -F"," '{print $1}' | sort | uniq | while read -r rid ; do
zgrep "$rid" "$loc$env"/app*"$date"*
done
EOF
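If you want to see what printf %q actually hands over, a quick local check (the value here is made up, chosen to be awkward on purpose):
wgr='some pattern; rm -rf /'
printf -v var_wgr %q "$wgr"
echo "$var_wgr"    # prints a shell-escaped form such as some\ pattern\;\ rm\ -rf\ /
That escaped form survives the remote shell's word splitting and arrives as a single argument to bash -s.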

Parse file to .aliasrc

I want to transform a string given in this form:
xyx some commands
into this form:
alias xyx="some commands"
I tried different combinations in the terminal. It seems (I'm not sure) that it worked once, but never when I run this from the script. I've read somewhere that this is a variable problem.
Aliases for readability:
alias first="sed 's/\s.*//'"
alias rest="sed 's/\S*\s*//'"
cat f_in | tee -a >(one=$(first)) >(two=$(rest)) | tee >(awk '{print "alias "$1"=\""$2"\""}' > f_out )
I used awk in this way to parse "cat f_in" into "print". It doesn't work. Then I used "awk -v", but that doesn't work either. How do I redirect the variables $one and $two into awk:
{one=$(first) === first | read -r one }?
Is this what you're trying to do:
$ echo 'xyx some commands' |
awk '{var=$1; sub(/^[^[:space:]]+[[:space:]]+/,""); printf "alias %s=\"%s\"\n", var, $0}'
alias xyx="some commands"
$ echo 'xyx some commands' |
sed 's/\([^[:space:]]*\)[[:space:]]*\(.*\)/alias \1="\2"/'
alias xyx="some commands"
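Applied to the whole file the question describes rather than a single echoed line (f_in is the question's own name; the destination is whatever you want, e.g. your .aliasrc), that would be something like:
awk '{var=$1; sub(/^[^[:space:]]+[[:space:]]+/,""); printf "alias %s=\"%s\"\n", var, $0}' f_in > f_out
or, with the sed version, appending straight to the alias file:
sed 's/\([^[:space:]]*\)[[:space:]]*\(.*\)/alias \1="\2"/' f_in >> ~/.aliasrc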

foreach: grep backtick in for-loop

How would one grep a backtick from files in a for-loop?
I would like to run grep for the pattern '`define'. The pattern works in a standalone grep command but fails in the for-loop.
foreach xxx ( `grep -r '`define' $idirectory --no-filename | sed -e 's ; //.* ; ; ' -e 's ; #.* ; ; ' -e 's ; ^\s* ; ; ' | grep -v ^$ | sort -n | awk '{print $2}' | uniq -d`)
echo $xxx
end
The backticks are conflicting in the for-loop.
regards
Srisurya
Simply don't use ' and escape the backtick with a backslash.
So, this doesn't work:
grep -r '`def' *
and prints
No matching command
But this:
grep -r \`def *
works and prints
ewdwedwe `define`
So, similarly for your script, the following works (file btick.tcsh):
#!/bin/tcsh
set greparg = \`def
foreach xxx ( `grep -l $greparg *` )
echo ===$xxx===
end
and produces the following result:
===btick.tcsh===
===btick1.txt===
===btick2.txt===
The content of the btick*.txt files:
btick1 `def`
This is an alternate solution.
Use the ASCII code for the grep argument:
grep -rP '\x60define' $idirectory
where \x60 is the ASCII code for a backtick (`).
You should not use old and outdated backticks; use parentheses like this: $(code).
Try this:
for xxx in $(some code $(some more code)); do
echo "$xxx"
done
Nesting with backticks makes it complicated; they need to be escaped. Compare this:
listing=`ls -l \`cat filenames.txt\``
vs
listing=$(ls -l $(cat filenames.txt))

Assigning deciles using bash

I'm learning bash, and here's a short script to assign deciles to the second column of file $1.
The complicating bit is the use of awk within the script, leading to ambiguous redirects when I run the script.
I would have gotten this done in SAS by now, but I like the idea of two lines of code doing the job.
How can I communicate the total number of rows (${N}) to awk within the script? Thanks.
N=$(wc -l < $1)
cat $1 | sort -t' ' -k2gr,2 | awk '{$3=int((((NR-1)*10.0)/"${N}")+1);print $0}'
You can set an awk variable from the command line using -v.
N=$(wc -l < "$1" | tr -d ' ')
sort -t' ' -k2gr,2 "$1" | awk -v n=$N '{$3=int((((NR-1)*10.0)/n)+1);print $0}'
I added tr -d to get rid of the leading spaces that wc -l puts in its result.
