I'm facing an error while converting my bash command to a shell script: syntax error in shell script - bash

#!/bin/bash
set -o errexit
set -o nounset
#VAF_and_IGV_TAG
paste <(grep -v "^#" output/"$1"/"$1"_Variant_Filtering/"$1"_GATK_filtered.vcf | cut -f-5) \
<(grep -v "^#" output/"$1"/"$1"_Variant_Filtering/"$1"_GATK_filtered.vcf | cut -f10-| cut -d ":" -f2,3) |
sed 's/:/\t/g' |
sed '1i chr\tstart\tend\tref\talt\tNormal_DP_VCF\tTumor_DP_VCF\tDP'|
awk 'BEGIN{FS=OFS="\t"}{sub(/,/,"\t",$6);print}' \
> output/"$1"/"$1"_Variant_Annotation/"$1"_VAF.tsv
My above code ends up with a syntax error:
sh Test.sh S1
Test.sh: 6: Test.sh: Syntax error: "(" unexpected
If I run the following in the terminal without using the variable, there is no syntax error:
paste <(grep -v "^#" output/S1/S1_Variant_Filtering/S1_GATK_filtered.vcf | cut -f-5) \
<(grep -v "^#" output/S1/S1_Variant_Filtering/S1_GATK_filtered.vcf | cut -f10-| cut -d ":" -f2,3) |
sed 's/:/\t/g' |
sed '1i chr\tstart\tend\tref\talt\tNormal_DP_VCF\tTumor_DP_VCF\tDP'|
awk 'BEGIN{FS=OFS="\t"}{sub(/,/,"\t",$6);print}' \
> output/S1/S1_Variant_Annotation/S1_VAF.tsv
My vcf file looks like this: https://drive.google.com/file/d/1HaGx1-3o1VLCrL8fV0swqZTviWpBTGds/view?usp=sharing

You cannot use <(command) process substitution if you are trying to run this code under sh. Unfortunately, there is no elegant way to avoid a temporary file (or something even more horrid) but your paste command - and indeed the entire pipeline - seems to be reasonably easy to refactor into an Awk script instead.
#!/bin/sh
set -eu
awk -F '\t' 'BEGIN { OFS = FS
  print "chr\tstart\tend\tref\talt\tNormal_DP_VCF\tTumor_DP_VCF\tDP" }
!/^#/ { split($10, f, ":")   # FORMAT is GT:AD:DP:..., so f[2] is AD and f[3] is DP
  sub(/,/, OFS, f[2])        # AD holds the "normal,tumor" depths; split them on the comma
  print $1, $2, $3, $4, $5, f[2], f[3]
}' output/"$1"/"$1"_Variant_Filtering/"$1"_GATK_filtered.vcf \
> output/"$1"/"$1"_Variant_Annotation/"$1"_VAF.tsv
Without access to the VCF file, I have been unable to test this, but at the very least it should suggest a general direction for how to proceed.
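As a quick sanity check: given a synthetic tab-separated data line (hypothetical values, with the FORMAT column starting with the usual GT:AD:DP), such as
chr1	12345	.	A	T	50	PASS	.	GT:AD:DP	0/1:10,5:15	0/0:20,0:20
the script should emit
chr1	12345	.	A	T	10	5	15
with the AD pair "10,5" split into the two depth columns and the DP value 15 as the last column.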

sh does not support bash process substitution <(). The easiest way to port it is to write out two temporary files, and remove them via a trap when done (see the sketch at the end of this answer). The better option is to use a tool that is sufficiently powerful (e.g. sed) to do the filtering and manipulation required:
#!/bin/sh
header="chr\tstart\tend\tref\talt\tNormal_DP_VCF\tTumor_DP_VCF\tDP"
field_1_to_5='\(\([^\t]*\t\)\{5\}\)' # \1 to \2
field_6_to_8='\([^\t]*\t\)\{4\}[^:]*:\([^,]*\),\([^:]*\):\([^:]*\).*' # \3 to \6
src="output/${1}/${1}_Variant_Filtering/${1}_GATK_filtered.vcf"
dst="output/${1}/${1}_Variant_Variant_Annotation/${1}_VAF.tsv"
sed -n \
-e '1i '"$header" \
-e '/^#/!s/'"${field_1_to_5}${field_6_to_8}"'/\1\4\t\5\t\6/p' \
"$src" > "$dst"
If you are using awk (or perl, python etc) just port the script to that language instead.
As an aside, all those repeated $1 suggest you should rework your file naming standard.
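For completeness, the temporary-file port mentioned above could look something like this (an untested sketch that keeps the question's pipeline unchanged):
#!/bin/sh
set -eu
src="output/${1}/${1}_Variant_Filtering/${1}_GATK_filtered.vcf"
dst="output/${1}/${1}_Variant_Annotation/${1}_VAF.tsv"
t1=$(mktemp) t2=$(mktemp)
trap 'rm -f "$t1" "$t2"' EXIT   # clean up the temporary files on exit
grep -v "^#" "$src" | cut -f-5 > "$t1"
grep -v "^#" "$src" | cut -f10- | cut -d ":" -f2,3 > "$t2"
paste "$t1" "$t2" |
sed 's/:/\t/g' |
sed '1i chr\tstart\tend\tref\talt\tNormal_DP_VCF\tTumor_DP_VCF\tDP' |
awk 'BEGIN{FS=OFS="\t"}{sub(/,/,"\t",$6);print}' > "$dst"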

Related

syntax error near unexpected token `('

The command below does not run from a script:
zcat *|cut -d"," -f1,2 | tr -d "\r" |
awk -F "," '{if (\$1 =="\"word\"" || \$1 =="\"word2\""){printf "\n%s",\$0}else{printf "%s",\$0}}' |
grep -i "resultCode>00000" | wc -l
Error:
./script.sh: command substitution: line 8: syntax error near unexpected token `('
./script.sh: command substitution: line 8: `ssh -t user@ip 'cd "$(ls -td path/* | tail -n1)" && zcat *|cut -d"," -f1,2 | tr -d "\r" | awk -F "," '{if ($1 =="\"word\"" || $1 =="\"word2\""){printf "\n\%s",$0}else{printf "\%s",$0}}'| grep -i "resultCode>00000" | wc -l''
How should I fix the syntax error near the unexpected token?
ssh -t user@ip 'cd "$(ls -td path/* | tail -n1)" &&
zcat *|cut -d"," -f1,2 | tr -d "\r" |
awk -F "," '{if ($1 =="\"word\"" || $1 =="\"word2\""){
printf "\n\%s",$0}else{printf "\%s",$0}}'|
grep -i "resultCode>00000" | wc -l''
There's a mountain of syntax errors here. First off, you can't nest single quotes like this: ''''. That's two single-quoted empty strings next to each other, not single quotes inside single quotes. In fact, there is no way to have single quotes inside single quotes. (It is possible to get them there by other means, e.g. by switching to double quotes.)
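For example, the usual idiom to get a literal single quote into a single-quoted string is to close the string, add an escaped quote, and reopen it:
printf '%s\n' 'it'\''s quoted'   # close ', then \', then reopen '
printf '%s\n' "it's quoted"      # or simply switch to double quotes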
If you don't have any particular reason to run all of these commands remotely, the simplest fix is probably to just run the zcat in SSH, and have the rest of the pipeline run locally. If the output from zcat is massive, there could be good reasons to avoid sending it all over the SSH connection, but let's just figure out a way to fix this first.
ssh -t user@ip 'cd "$(ls -td path/* | tail -n1)" && zcat *' |
cut -d"," -f1,2 | tr -d "\r" |
awk -F "," '{if ($1 =="\"word\"" || $1 =="\"word2\""){
printf "\n\%s",$0}else{printf "\%s",$0}}'|
grep -i "resultCode>00000" | wc -l
But of course, you can replace grep | wc -l with grep -c, and probably refactor all of the rest into your Awk script.
ssh -t user@ip 'cd "$(ls -td path/* | tail -n1)" && zcat *' |
awk -F "," '$1 ~ /^\"(word|word2)\"$/ { printf "\n%s,%s", $1, $2; next }
{ printf "%s,%s", $1, $2 }
END { printf "\n" }' |
grep -ic "resultCode>00000"
The final grep can probably also be refactored into the Awk script, but without more knowledge of what your expected input looks like, I would have to guess too many things. (This already rests on some possibly incorrect assumptions.)
If you want to run all of this remotely, the second simplest fix is probably to pass the script as a here document to SSH.
ssh -t user@ip <<\:
cd "$(ls -td path/* | tail -n1)" &&
zcat * |
awk -F "," '$1 ~ /^\"(word|word2)\"$/ { printf "\n%s,%s", $1, $2; next }
{ printf "%s,%s", $1, $2 } END { printf "\n" }' |
grep -ic "resultCode>00000"
:
where again my refactoring of your Awk script may or may not be an oversimplification which doesn't do exactly what your original code did. (In particular, removing DOS carriage returns from the end of the line seems superfluous if you are only examining the first two fields of the input; but perhaps there can be lines which only have two fields, which need to have the carriage returns trimmed. That's easy in Awk as such; sub(/\r/, "").)

Escaping parentheses in perl backticks for bash command

I have been using backticks for years, but this is the first time I have tried using a command with parentheses. I am getting an error that I cannot figure out.
I have tried putting in double quotes and escaping with the \ in multiple places, but nothing seems to work. Any help would be appreciated.
COMMAND
The $file5 and $file6 are Perl variables, not bash ones.
@array = `/usr/bin/join -j 1 -t, <(cat $file5 | awk -F, '{print \$3","\$1}' | sort) <( cat $file6 | awk -F, '{print \$3","\$1}' | sort) `
ERROR:
AH01215: sh: -c: line 0: syntax error near unexpected token `(', referer:
Backticks use /bin/sh, and while <( ... ) is something recognized by bash, it's not recognized by the Bourne shell. If you use backticks, you will need to use
my $bash_cmd = ...;
my @lines = `bash -c $bash_cmd`;
Building sh and bash shell commands can be done using String::ShellQuote.
use String::ShellQuote qw( shell_quote );
my $file5_quoted = shell_quote($file5);
my $file6_quoted = shell_quote($file6);
my $awk_cmd = shell_quote("awk", "-F,", '{print $3","$1}');
my $bash_cmd = '/usr/bin/join -j 1 -t,'
. " <( $awk_cmd $file5_quoted | sort )"
. " <( $awk_cmd $file6_quoted | sort )";
my $sh_cmd = shell_quote("bash", "-c", $bash_cmd);
my @lines = `$sh_cmd`;
We can use IPC::System::Simple's capturex to avoid launching more shells than needed, as well as to provide error checking. To do this, replace the last two lines of the above with the following:
use IPC::System::Simple qw( capturex );
my @lines = capturex("bash", "-c", $bash_cmd);
One workaround is to create a shell script that accepts the two filenames from perl, processes the join using those two input files, and return the result to the perl array.
#1. Create join.sh that contains these four lines:
cat "$1" | awk -F, '{print $3","$1}' | sort > 1.out
cat "$2" | awk -F, '{print $3","$1}' | sort > 2.out
/usr/bin/join -j 1 -t, 1.out 2.out
rm 1.out 2.out
#2. Modify your perl statement to call join.sh as follows:
@array=`join.sh $file5 $file6`;

shell script in a here-document used as input to ssh gives no result

I am piping the result of grep to AWK and using the result as a pattern for another grep inside the EOF (not sure what the terminology is there), but the AWK gives me blank results. Below is the part of the bash script that gave me issues.
ssh "$USER"#logs << EOF
zgrep $wgr $loc$env/app*$date* | awk -F":" '{print $5 "::" $7}' | awk -F"," '{print $1}' | sort | uniq | while read -r rid ; do
zgrep $rid $loc$env/app*$date*;
done
EOF
I am really drawing a blank here because there is no error, and I'm out of ideas.
Samples:
I am grepping log files that look like the one below:
app-server.log.2020010416.gz:2020-01-04 16:00:00,441 INFO [redacted] (redacted) [rid:12345::12345-12345-12345-12345-12345,...
I am interested in rid and I can grep that in logs again:
zgrep $rid $loc$env/app*$date*
loc, env and date are working properly, but they are set outside of the EOF.
The script as a whole connects over ssh and exits properly, but I am getting no result.
The immediate problem is that the dollar signs are evaluated by the local shell because you don't (and presumably cannot) quote the here document (because then $wgr and $loc etc. would not be expanded by the shell either).
The quick fix is to backslash the dollar signs, but in addition, I see several opportunities to get rid of inelegant or wasteful constructs.
ssh "$USER"#logs << EOF
zgrep "$wgr" "$loc$env/app"*"$date"* |
awk -F":" '{v = \$5 "::" \$7; split(v, f, /,/); print f[1]}' |
sort -u | xargs -I {} zgrep {} "$loc$env"/app*"$date"*
EOF
If you want to add decorations around the final zgrep, probably revert to the while loop you had; but of course, you need to escape the dollar sign in that, too:
ssh "$USER"#logs << EOF
zgrep "$wgr" "$loc$env/app"*"$date"* |
awk -F":" '{v = \$5 "::" \$7; split(v, f, /,/); print f[1]}' |
sort -u |
while read -r rid; do
echo Dancing hampsters "\$rid" more dancing hampsters
zgrep "\$rid" "$loc$env"/app*"$date"*
done
EOF
Again, any unescaped dollar sign is evaluated by your local shell even before the ssh command starts executing.
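A minimal way to see the difference (server1 is a hypothetical host):
remote=server1
x="set on the local side"
ssh "$remote" <<EOF
echo $x     # expanded by the local shell: prints "set on the local side"
echo \$x    # backslashed, so expanded remotely: prints an empty line (x is unset there)
EOF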
Could you please try the following? Fair warning: I couldn't test it for lack of samples. With this approach we don't need to escape things inside the ssh here-document.
## Configure/define your shell variables (wgr, loc, env, date) here.
printf -v var_wgr %q "$wgr"
printf -v var_loc %q "$loc"
printf -v var_env %q "$env"
printf -v var_date %q "$date"
ssh -T user@"$host" "bash -s $var_wgr $var_loc $var_env $var_date" <<'EOF'
# retrieve the quoted values off the shell command line
wgr=$1 loc=$2 env=$3 date=$4
zgrep "$wgr" "$loc$env"/app*"$date"* | awk -F":" '{print $5 "::" $7}' | awk -F"," '{print $1}' | sort | uniq | while read -r rid ; do
zgrep "$rid" "$loc$env"/app*"$date"*
done
EOF

awk exec command for every line and keep columns

I have large dataset files with two columns like
AS জীৱবিজ্ঞানবিভাগ
AS চেতনাদাস
AS বৈকল্পিক
and I want to run my command on the second column, store the result and get the output with the same column formatting:
AS jibvigyanvibhag
AS chetanadas
AS baikalpik
where my command is this pipe:
echo "$0" | indictrans -s asm -t eng --ml --build-lookup
So I'm doing like
awk -v OFS="\t" '{ print "echo "$2" | indictrans -s asm -t eng --ml --build-lookup" | "/bin/sh"}' in.txt > out.txt
but this does not preserve the columns; it just prints the translated second column, like this
jibvigyanvibhag
chetanadas
baikalpik
My solution was the following
awk -v OFS="\t" '{ "echo "$2" | indictrans -s asm -t eng --ml --build-lookup" | getline RES; print $1,$2,RES}' in.txt > out.txt
that will print out
AS জীৱবিজ্ঞানবিভাগ jibvigyanvibhag
AS চেতনাদাস chetanadas
AS বৈকল্পিক baikalpik
Now I want to parametrize the command, but the escaping looks odd here:
"echo "$0" | indictrans -s $SOURCE -t $TARGET --ml --build-lookup"
and it does not work. How do I correctly exec this command and escape the parameters?
[UPDATE]
This is a partial solution I came up with, inspired by the suggested one
#!/bin/bash
SOURCE=asm
TARGET=eng
IN=$2
OUT=$3
awk -v OFS="\t" '{
CMD = "echo "$2" | indictrans -s asm -t eng --ml --build-lookup"
CMD | getline RES
print $1,RES
close(CMD)
}' $IN > $OUT
I still cannot get rid of the hard-coded values; it seems that I cannot define them with -v as usual, like
awk -v OFS="\t" -v source=$SOURCE -v target=$TARGET '{
CMD = "echo "$2" | indictrans -s source -t target --ml --build-lookup"
...
NOTES.
The indictrans process handles the stdin and writes to stdout in this way:
for line in ifp:
tline = trn.convert(line)
ofp.write(tline)
# close files
ifp.close()
ofp.close()
where
ifp = codecs.getreader('utf8')(sys.stdin)
ofp = codecs.getwriter('utf8')(sys.stdout)
so it takes one line from stdin, processes the data with some library trn.convert and writes the results to stdout without any parallelism.
For this reason (lack of parallelism in terms of multiline input), the performance is bound by the size of the dataset (number of rows).
An example two-column input dataset (1K rows) is available here. An example sample is
KN ಐಕ್ಯತೆ ಕ್ಷೇಮಾಭಿವೃದ್ಧಿ ಸಂಸ್ಥೆ ವಿಜಯಪುರ
KN ಹೊರಗಿನ ಸಂಪರ್ಕಗಳು
KN ಮಕ್ಕಳ ಸಾಹಿತ್ಯ ಮತ್ತು ಸಾಂಸ್ಖ್ರುತಿಕ ಕ್ಷೇತ್ರದಲ್ಲಿ ಸೇವೆ ಸಲ್ಲಿಸುತ್ತಿರುವ ಸಂಸ್ಠೆ ಮಕ್ಕಳ ಲೋಕ
while the example script based on the last accepted answer is here
Don't invoke shells with awk. The shell itself avoids treating data as if it were code unless explicitly instructed to do otherwise -- but when you use system() or popen(), as the awk code is doing here, everything passed as an argument is parsed in a context where data is able to escape its quoting and be treated as code.
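A deliberately harmless demonstration of that risk (evil.txt and /tmp/pwned are made-up names):
printf 'AS $(date>/tmp/pwned)\n' > evil.txt
awk '{ print "echo "$2 | "/bin/sh" }' evil.txt
# afterwards /tmp/pwned exists: the "data" in column 2 ran as code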
Simple approach: One indictrans per line
If you need a separate copy of indictrans for each line to be executed, use:
while read -r col1 rest; do
printf '%s\t%s\n' "$col1" "$(indictrans -s asm -t eng --ml --build-lookup <<<"$rest")"
done <in.txt >out.txt
Fast approach: One indictrans processing all lines
If indictrans generates one line of output per line of input, you can do even better, by pasting together one stream with all the first columns and a second stream with the translations of the remainder of the lines, thus requiring only one copy of indictrans to be run:
#!/usr/bin/env bash
# ^^^^- not compatible with /bin/sh
paste <(<in.txt awk '{print $1}') \
<(<in.txt sed -E 's/^[^[:space:]]*[[:space:]]//' \
| indictrans -s asm -t eng --ml --build-lookup) \
>out.txt
You can pipe column 2 to your command and replace it with the command's output, like below in awk.
{
cmd = "echo "$2" | indictrans -s asm -t eng --ml --build-lookup"
cmd | getline $2
close(cmd)
} 1
If SOURCE and TARGET are awk variables, build the command string by concatenation; a variable name inside the quotes would be just literal text:
{
cmd = "echo " $2 " | indictrans -s " SOURCE " -t " TARGET " --ml --build-lookup"
cmd | getline $2
close(cmd)
} 1
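To set those awk variables from the shell (which is what the -v attempt in the question was after), something like the following should work; translate.awk is a hypothetical file holding the block above:
awk -v OFS="\t" -v SOURCE="$SOURCE" -v TARGET="$TARGET" \
    -f translate.awk in.txt > out.txt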

awk command has different behaviors when executing the exact same code. Why?

I have created a little shell script that is capable of receiving a list of values such as "MY_VAR_NAME=var_value MY_VAR_NAME2=value2 ...", separated by spaces only. There should also be the possibility to use values such as MY_VAR_NAME='' or MY_VAR_NAME= (nothing).
These values are then used to change the value inside a environment variables file, for example, MY_VAR_NAME=var_value would make the script change the MY_VAR_NAME value inside the .env file to var_value, without changing anything else about the file.
The env file has the following configuration:
NODE_ENV=development
APP_PATH=/media
BASE_URL=http://localhost:3000
ASSETS_PATH=http://localhost:3000
USE_CDN=false
APP_PORT=3000
WEBPACK_PORT=8080
IS_CONNECTED_TO_BACKEND=false
SHOULD_BUILD=false
USE_REDUX_TOOL=false
USE_LOG_OUTPUT_AS_JSON=false
ACCESS_KEY_ID=
SECRET_ACCESS_KEY=
BUCKET_NAME=
BASE_PATH=
MIX_PANEL_KEY=
RDSTATION_KEY=
RESOURCE_KEY=
SHOULD_ENABLE_INTERCOM=false
SHOULD_ENABLE_GTM=false
SHOULD_ENABLE_UTA=false
SHOULD_ENABLE_WOOTRIC=false
I have debugged my script, and found that this is the point where it sometimes has a problem:
cat .envtemp | awk -v var_value="$VAR_VALUE" \
-v var_name="$VAR_NAME" \
-F '=' '$0 !~ var_name {print $0} $0 ~ var_name {print $1"="var_value}' | tee .envtemp
This piece of code sometimes writes the proper result to .envtemp, and sometimes outputs nothing, leaving .envtemp empty.
The complete code i am using is the following:
function change_value(){
VAR_NAME=$1
VAR_VALUE=$2
cat .envtemp | awk -v var_value="$VAR_VALUE" \
-v var_name="$VAR_NAME" \
-F '=' '$0 !~ var_name {print $0} $0 ~ var_name {print $1"="var_value}' | tee .envtemp
ls -l -a .env*
}
function manage_env(){
for VAR in $@
do
var_name=`echo $VAR | awk -F '=' '{print $1}'`
var_value=`echo $VAR | awk -F '=' '{print $2}'`
change_value $var_name $var_value
done
}
function main(){
manage_env $@
cat .envtemp > .env
exit 0
}
main $@
Here is an example script for recreating the error. It does not happen every time, and when it happens, it is not always with the same input.
#!/bin/bash
ENV_MANAGER_INPUT="NODE_ENV=production BASE_URL=http://qa.arquivei.com.br ASSETS_PATH=https://d4m6agb781hapn.cloudfront.net USE_CDN=true WEBPACK_PORT= IS_CONNECTED_TO_BACKEND=true ACCESS_KEY_ID= SECRET_ACCESS_KEY= BUCKET_NAME=frontend-assets-dev BASE_PATH=qa"
cp .env.dist .env
#Removes comment lines. The script needs a .envtemp file.
cat .env.dist | grep -v '#' | grep -v '^$' > .envtemp
./jenkins_env_manager.sh ${ENV_MANAGER_INPUT}
Have you tried using two files? In the original function, cat is still reading .envtemp while tee truncates and rewrites it, so whether any of the input survives is a race:
mv .envtemp .envtemp.tmp
cat .envtemp.tmp | awk ... | tee .envtemp
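Alternatively, redirect to a new file and rename it over the original; this avoids the race (and the useless cat) entirely. A sketch of the same change_value body:
change_value(){
    VAR_NAME=$1
    VAR_VALUE=$2
    awk -v var_value="$VAR_VALUE" -v var_name="$VAR_NAME" -F '=' \
        '$0 !~ var_name {print $0} $0 ~ var_name {print $1"="var_value}' \
        .envtemp > .envtemp.new && mv .envtemp.new .envtemp
}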
