Using awk and printf in a for loop (bash)

How can I use awk and printf in for loop?
Here is my code
for fileName in /home/BamFiles/sample*
do
sampleIds=${fileName##*/}
for Bam in /home/BamFiles/sample*/*.bam
do
samtools idxstats $Bam | awk '{i+=$3+$4} END {printf("%s\t%d",$sampleIds Bam)}'
done
done
I get the following error:
awk: fatal: not enough arguments to satisfy format string
`%d %s'
^ ran out for this one
The expected output is:
sample1 52432
sample2 32909
sample3 54000
sample5 45890
Thanks

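Awk cannot expand a shell variable. Inside the awk program, $sampleIds is awk's field operator applied to an (unset) awk variable, and Bam is just another unset awk variable. You are also passing only a single argument to printf (the concatenation of those two), so nothing is left for %d.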
Perhaps you want it this way:
samtools idxstats "$Bam" | awk -v "file=$fileName" '{i+=$3+$4} END {printf("%s\t%d\n", file, i)}'
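Embedding a variable's value directly into awk's code with double quotes is also possible, but it is not recommended: every awk $, quote and backslash must be escaped, and the value is inserted as awk code rather than passed as data:
samtools idxstats "$Bam" | awk "{i+=\$3+\$4} END {printf(\"%s\\t%d\\n\", \"$fileName\", i)}"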
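A quick way to see why the question's printf fails: inside a single-quoted awk program, $sampleIds is awk's field operator applied to an awk variable that was never set (value 0), so it means $0, the whole line, while Bam is an empty awk variable. The two are concatenated into one argument, which leaves %d with nothing. The sketch below contrasts that with -v; the value sample1 and the input line "a b c" are made up for illustration.

```shell
#!/bin/bash
# In awk, $name means "the field whose number is the value of name".
# An awk variable that was never set is 0 (or "" as a string), so
# $sampleIds is $0, the whole input line.
sampleIds=sample1
echo "a b c" | awk '{ print $sampleIds }'              # prints: a b c

# -v copies the shell value into an awk variable before the program runs.
echo "a b c" | awk -v id="$sampleIds" '{ print id }'   # prints: sample1
```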
Suggestion:
shopt -s nullglob
for sample in /home/BamFiles/sample*; do
    for bam in "$sample"/*.bam; do
        samtools idxstats "$bam"
    done | awk -v sample="${sample##*/}" '{ i += $3 + $4 } END { printf("%s\t%d\n", sample, i) }'
done
Or
shopt -s nullglob
for sample in /home/BamFiles/sample*; do
    for bam in "$sample"/*.bam; do
        samtools idxstats "$bam" | \
            awk -v sample="${sample##*/}" -v bam="${bam##*/}" \
                '{ i += $3 + $4 } END { printf("%s\t%s\t%d\n", sample, bam, i) }'
    done
done
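If it helps to try the first loop before pointing it at real data, it can be wrapped in a function that takes the base directory as a parameter. This is a sketch: the name sum_reads_per_sample is hypothetical, it assumes samtools is on PATH, and it relies on samtools idxstats printing tab-separated reference name, sequence length, mapped and unmapped read counts, which is why the awk sums $3 + $4. Unlike the loops above, the sample*/ glob matches only directories.

```shell
#!/bin/bash
# Hypothetical helper: per-sample total of mapped + unmapped reads.
# Assumes samtools is on PATH and each sample directory holds indexed .bam files.
sum_reads_per_sample() {
    local base_dir=$1 sample bam
    shopt -s nullglob                          # an unmatched glob expands to nothing
    for sample in "$base_dir"/sample*/; do     # trailing / matches directories only
        sample=${sample%/}                     # drop the trailing slash
        for bam in "$sample"/*.bam; do
            samtools idxstats "$bam"           # one line per reference sequence
        done | awk -v sample="${sample##*/}" \
            '{ total += $3 + $4 } END { printf("%s\t%d\n", sample, total) }'
    done
}
```

Call it as sum_reads_per_sample /home/BamFiles. For a dry run without samtools, a shell function named samtools that prints a few fake idxstats lines takes precedence over the real command.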

Related

awk command has different behaviors when executing the exact same code. Why?

I have created a small shell script that receives a list of values such as "MY_VAR_NAME=var_value MY_VAR_NAME2=value2 ...", separated by spaces only. It should also accept empty values such as MY_VAR_NAME='' or MY_VAR_NAME= (nothing).
These values are then used to change the values in an environment variables file. For example, MY_VAR_NAME=var_value should make the script change the value of MY_VAR_NAME in the .env file to var_value, without changing anything else in the file.
The env file has the following configuration:
NODE_ENV=development
APP_PATH=/media
BASE_URL=http://localhost:3000
ASSETS_PATH=http://localhost:3000
USE_CDN=false
APP_PORT=3000
WEBPACK_PORT=8080
IS_CONNECTED_TO_BACKEND=false
SHOULD_BUILD=false
USE_REDUX_TOOL=false
USE_LOG_OUTPUT_AS_JSON=false
ACCESS_KEY_ID=
SECRET_ACCESS_KEY=
BUCKET_NAME=
BASE_PATH=
MIX_PANEL_KEY=
RDSTATION_KEY=
RESOURCE_KEY=
SHOULD_ENABLE_INTERCOM=false
SHOULD_ENABLE_GTM=false
SHOULD_ENABLE_UTA=false
SHOULD_ENABLE_WOOTRIC=false
While debugging the script, I found that this is the point where it sometimes fails:
cat .envtemp | awk -v var_value="$VAR_VALUE" \
-v var_name="$VAR_NAME" \
-F '=' '$0 !~ var_name {print $0} $0 ~ var_name {print $1"="var_value}' | tee .envtemp
This piece of code sometimes writes the correct result to .envtemp, but sometimes it writes nothing, leaving .envtemp empty.
The complete code I am using is the following:
function change_value(){
    VAR_NAME=$1
    VAR_VALUE=$2
    cat .envtemp | awk -v var_value="$VAR_VALUE" \
        -v var_name="$VAR_NAME" \
        -F '=' '$0 !~ var_name {print $0} $0 ~ var_name {print $1"="var_value}' | tee .envtemp
    ls -l -a .env*
}
function manage_env(){
    for VAR in $@
    do
        var_name=`echo $VAR | awk -F '=' '{print $1}'`
        var_value=`echo $VAR | awk -F '=' '{print $2}'`
        change_value $var_name $var_value
    done
}
function main(){
    manage_env $@
    cat .envtemp > .env
    exit 0
}
main $@
Here is an example script that reproduces the error. It does not happen every time, and when it does, it is not always with the same input.
#!/bin/bash
ENV_MANAGER_INPUT="NODE_ENV=production BASE_URL=http://qa.arquivei.com.br ASSETS_PATH=https://d4m6agb781hapn.cloudfront.net USE_CDN=true WEBPACK_PORT= IS_CONNECTED_TO_BACKEND=true ACCESS_KEY_ID= SECRET_ACCESS_KEY= BUCKET_NAME=frontend-assets-dev BASE_PATH=qa"
cp .env.dist .env
#Removes comment lines. The script needs a .envtemp file.
cat .env.dist | grep -v '#' | grep -v '^$' > .envtemp
./jenkins_env_manager.sh ${ENV_MANAGER_INPUT}
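Have you tried using two files? Your pipeline reads and writes .envtemp at the same time: cat and tee start together, and tee truncates .envtemp as soon as it opens it. If that happens before cat has read the file, cat sees an empty file and the result is empty. Which process gets there first depends on scheduling, which is why the failure is intermittent. For example: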
mv .envtemp .envtemp.tmp
cat .envtemp.tmp | awk ... | tee .envtemp
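The two-file idea can be folded back into the question's functions. The following is a sketch, not the asker's script: it writes awk's output to a temporary file created next to .envtemp and then renames it over .envtemp, compares the key exactly with $1 == name instead of matching var_name as a regex anywhere in the line, and splits each NAME=value argument with parameter expansion so that empty values and values containing = (such as URLs with query strings) survive. It assumes .envtemp already exists in the current directory.

```shell
#!/bin/bash
# Sketch: never read and write .envtemp in the same pipeline.
change_value() {
    local name=$1 value=$2 tmp
    tmp=$(mktemp .envtemp.XXXXXX) || return 1
    # Exact key comparison, so a key cannot also match a longer key that
    # contains it (e.g. a hypothetical DATABASE_PATH when setting BASE_PATH).
    awk -F '=' -v name="$name" -v value="$value" \
        '$1 == name { print name "=" value; next } { print }' .envtemp > "$tmp" &&
        mv "$tmp" .envtemp || { rm -f "$tmp"; return 1; }
}

manage_env() {
    local pair
    for pair in "$@"; do
        # Split at the first '=' only: the value may be empty or contain '='.
        change_value "${pair%%=*}" "${pair#*=}"
    done
}
```

After manage_env "$@", the script can copy .envtemp to .env as before. Note that awk -v still interprets backslash escapes in the value.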

Why am I not able to store bash output to shell?

I have the following script:
#!/bin/bash
…code setting array ids, etc…
for i in "${!ids[@]}" ; do
echo "#${ids[i]}_${pos[i]}_${wild[i]}_${sub[i]}"
curl -sS "http://www.uniprot.org/uniprot/"${ids[i]}".fasta";
done |
sed '/^>/ d' |
sed -r 's/[#]+/>/g' |
perl -npe 'chomp if ($.!=1 && !s/^>/\n>/)' > $id.pph.fasta
However, the results are not stored in the file. I can see the result on the terminal, and I can store it in a file by running:
./myscript > result.txt
However, I want to do this within the script, redirecting to the file outside the loop.
Add
exec 1>result.txt
to the top of the script, and all output will be redirected.
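A minimal sketch of the exec approach, assuming bash and a writable current directory: after exec 1>result.txt, every later command, including a loop piped into another command, writes to result.txt. The fd 3 lines are optional and only show how to get the terminal back afterwards.

```shell
#!/bin/bash
exec 3>&1              # optional: keep the original stdout on fd 3
exec 1>result.txt      # from here on, stdout goes to result.txt
for i in 1 2 3; do
    echo "line $i"
done | sed 's/^/> /'   # the whole pipeline's output lands in result.txt
exec 1>&3 3>&-         # optional: restore stdout and close fd 3
echo "done, see result.txt"
```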
Here is a variation of your script (run it with bash, not sh, since ${!ids[*]} is bash array syntax):
#!/bin/bash
for i in ${!ids[*]}
do
    echo ">${ids[i]}_${pos[i]}_${wild[i]}_${sub[i]}"
    curl -Ss www.uniprot.org/uniprot/${ids[i]}.fasta
done |
awk '
/>/ {if (z++) printf RS; print; printf RS; getline; next}
1
END {printf RS}
' ORS= > $id.pph.fasta

How can I specify a row in awk in for loop?

I'm using the following awk command:
my_command | awk -F "[[:space:]]{2,}+" 'NR>1 {print $2}' | egrep "^[[:alnum:]]"
which successfully returns my data like this:
fileName1
file Name 1
file Nameone
f i l e Name 1
As you can see, some file names contain spaces. That is fine, since I'm only echoing the file name (nothing special). The problem is selecting a specific row inside a loop. I'm trying to do it this way:
i=1
for num in $rows
do
fileName=$(my_command | awk -F "[[:space:]]{2,}+" 'NR==$i {print $2}' | egrep "^[[:alnum:]])"
echo "$num $fileName"
$((i++))
done
But my output is always empty.
I've also tried using awk -v record=$i and then printing $record, but I get the result below:
f i l e Name 1
EDIT
Sorry for the confusion: rows is a variable that lists IDs, like this: 11 12 13,
and each of those IDs maps to a file name. The raw output of my command, without any parsing, looks like this:
id  File Info  OS
11  File Name1  OS1
12  Fi leNa me2  OS2
13  FileName 3  OS3
I can only use the id field to run the command I need, but I want to use the File Info field to tell the user which file the command is being run against.
Your $i is not expanded because it is inside single quotes. Use double quotes instead, escaping awk's own $:
fileName=$(my_command | awk -F "[[:space:]]{2,}+" "NR==$i {print \$2}" | egrep "^[[:alnum:]]")
You also forgot to close the command substitution: the ) at the end of your line is inside the egrep pattern's quotes.
EDIT
Given your updated requirement, you could pass all the rows to a single awk command instead of running awk repeatedly inside a loop:
#!/bin/bash
ROWS=(11 12)
function my_command {
    # This function just emulates my_command and should be removed later.
    echo " id  File Info  OS
11  File Name1  OS1
12  Fi leNa me2  OS2
13  FileName 3  OS3"
}
awk -- '
BEGIN {
    input = ARGV[1]
    while (getline line < input) {
        sub(/^ +/, "", line)
        split(line, a, /  +/)
        for (i = 2; i < ARGC; ++i) {
            if (a[1] == ARGV[i]) {
                printf "%s %s\n", a[1], a[2]
                break
            }
        }
    }
    exit
}
' <(my_command) "${ROWS[@]}"
That awk command could be condensed to one line as:
awk -- 'BEGIN { input = ARGV[1]; while (getline line < input) { sub(/^ +/, "", line); split(line, a, /  +/); for (i = 2; i < ARGC; ++i) { if (a[1] == ARGV[i]) { printf "%s %s\n", a[1], a[2]; break; }; }; }; exit; }' <(my_command) "${ROWS[@]}"
Or, better yet, do the whole thing in Bash:
#!/bin/bash
shopt -s extglob   # required for the +( ) pattern below
ROWS=(11 12)
# my_command as defined above
while IFS=$' ' read -r LINE; do
    IFS='|' read -ra FIELDS <<< "${LINE// +( )/|}"
    for R in "${ROWS[@]}"; do
        if [[ ${FIELDS[0]} == "$R" ]]; then
            echo "${R} ${FIELDS[1]}"
            break
        fi
    done
done < <(my_command)
It should give an output like:
11 File Name1
12 Fi leNa me2
Shell variables aren't expanded inside single-quoted strings. Use the -v option to set an awk variable to the shell variable:
fileName=$(my_command | awk -v i="$i" -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]]")
This method avoids having to escape all the $ characters in the awk script, as required in konsolebox's answer.
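A self-contained sketch of the -v approach. my_command is replaced by a hypothetical function that prints the question's sample table with columns separated by two or more spaces. The separator is written as [[:space:]][[:space:]]+, which means the same as [[:space:]]{2,} but does not depend on the awk supporting interval expressions (some older mawk versions do not).

```shell
#!/bin/bash
# Hypothetical stand-in for my_command: header line, then 2+ spaces between columns.
my_command() {
    printf '%s\n' 'id  File Info  OS' '11  File Name1  OS1' \
        '12  Fi leNa me2  OS2' '13  FileName 3  OS3'
}

i=3
# The awk program stays single-quoted; -v i="$i" copies the shell value in.
fileName=$(my_command | awk -v i="$i" -F '[[:space:]][[:space:]]+' 'NR == i { print $2 }')
echo "$fileName"    # prints: Fi leNa me2
```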
As you've already heard, you need to populate an awk variable from your shell variable to be able to use its value within the awk script, so this:
awk -F "[[:space:]]{2,}+" 'NR==$i {print $2}' | egrep "^[[:alnum:]]"
should be this:
awk -v i="$i" -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]]"
Also, you don't need both awk and grep, since awk can do anything grep can do, so you can change this part of your script:
awk -v i="$i" -F "[[:space:]]{2,}+" 'NR==i {print $2}' | egrep "^[[:alnum:]]"
to this:
awk -v i="$i" -F "[[:space:]]{2,}+" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}'
and you don't need a + after an interval expression (in POSIX EREs, adjacent repetition operators give undefined results), so you can change {2,}+ to just {2,}:
awk -v i="$i" -F "[[:space:]]{2,}" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}'
Most importantly, though, instead of invoking awk once for every invocation of my_command, you can just invoke it once for all of them, i.e. instead of this (assuming this does what you want):
i=1
for num in $rows
do
    fileName=$(my_command | awk -v i="$i" -F "[[:space:]]{2,}" '(NR==i) && ($2~/^[[:alnum:]]/){print $2}')
    echo "$num $fileName"
    ((i++))
done
you can do something more like this:
for num in $rows
do
    my_command
done |
awk -F '[[:space:]]{2,}' '$2~/^[[:alnum:]]/{print NR, $2}'
I say "something like" because you don't tell us what "my_command", "rows" or "num" are so I can't be precise but hopefully you see the pattern. If you give us more info we can provide a better answer.
It's pretty inefficient to rerun my_command (and awk) every time through the loop just to extract one line from its output. Especially when all you're doing is printing out part of each line in order. (I'm assuming that my_command really is exactly the same command and produces the same output every time through your loop.)
If that's the case, this one-liner should do the trick:
paste -d' ' <(printf '%s\n' $rows) <(my_command |
awk -F '[[:space:]]{2,}' '($2 ~ /^[[:alnum:]]/) {print $2}')
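The paste idea can be tried with a stand-in for my_command. In this sketch my_command and label_rows are hypothetical, rows holds the IDs from the question, and NR > 1 is an added assumption that the first line of my_command's output is the header shown in the question; without it, the header's "File Info" would be paired with the first ID. paste joins line k of the first input with line k of the second, so this only works if the IDs in rows are in the same order as the data lines.

```shell
#!/bin/bash
# Hypothetical stand-in for my_command (2+ spaces between columns).
my_command() {
    printf '%s\n' 'id  File Info  OS' '11  File Name1  OS1' \
        '12  Fi leNa me2  OS2' '13  FileName 3  OS3'
}
rows='11 12 13'

label_rows() {
    # $rows is deliberately unquoted so that each ID becomes its own line.
    paste -d' ' <(printf '%s\n' $rows) \
        <(my_command | awk -F '[[:space:]][[:space:]]+' 'NR > 1 && $2 ~ /^[[:alnum:]]/ { print $2 }')
}
label_rows
```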

How can I print the duplicates in a file only once?

I have an input file that contains:
123,apple,orange
123,pineapple,strawberry
543,grapes,orange
790,strawberry,apple
870,peach,grape
543,almond,tomato
123,orange,apple
i want the output to be:
The following numbers are repeated:
123
543
Is there a way to get this output using awk? I'm writing the script in Bash on Solaris.
sed -e 's/,/ , /g' <filename> | awk '{print $1}' | sort | uniq -d
awk -vFS=',' \
'{KEY=$1;if (KEY in KEYS) { DUPS[KEY]; }; KEYS[KEY]; } \
END{print "Repeated Keys:"; for (i in DUPS){print i} }' \
< yourfile
There are solutions with sort/uniq/cut as well (see above).
If you can live without awk, you can use this to get the repeating numbers:
cut -d, -f 1 my_file.txt | sort | uniq -d
Prints
123
543
Edit: (in response to your comment)
You can buffer the output and decide if you want to continue. For example:
out=$(cut -d, -f 1 my_file.txt | sort | uniq -d | tr '\n' ' ')
if [[ -n $out ]] ; then
echo "The following numbers are repeated: $out"
exit
fi
# continue...
This script prints only the first-column values that are repeated more than once:
awk -F, '{a[$1]++}END{printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print ""}' file
Or in a bit shorter form:
awk -F, 'BEGIN{printf "Repeated "}(a[$1]++ == 1){printf "%s ", $1}END{print ""} ' file
If you want to exit your script when a duplicate is found, you can have awk exit with a non-zero status. For example:
awk -F, 'a[$1]++==1{dup=1}END{if (dup) {printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print "";exit(1)}}' file
In your main script you can do:
awk -F, 'a[$1]++==1{dup=1}END{if (dup) {printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print "";exit(1)}}' file || exit 1
Or in a more readable format:
awk -F, '
a[$1]++==1{
    dup=1
}
END{
    if (dup) {
        printf "The following numbers are repeated: ";
        for (i in a)
            if (a[i]>1)
                printf "%s ",i;
        print "";
        exit(1)
    }
}
' file || exit 1
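The awk answers print the repeated numbers on one line, and for (i in a) visits keys in an unspecified order. The sketch below (print_repeated is a hypothetical name) records each key the second time it is seen, so duplicates come out in the order they are first repeated, one per line under the requested header, and it exits with status 1 when any were found. On Solaris, /usr/bin/awk is the old awk; nawk or /usr/xpg4/bin/awk is the safer choice for scripts like this.

```shell
#!/bin/bash
# Hypothetical helper: $1 is the comma-separated input file.
print_repeated() {
    awk -F, '
        count[$1]++ == 1 { dups[++n] = $1 }    # second sighting of this key
        END {
            if (n) {
                print "The following numbers are repeated:"
                for (k = 1; k <= n; k++) print dups[k]
            }
            exit (n > 0)                        # status 1 if duplicates were found
        }
    ' "$1"
}
```

In the main script, print_repeated input.csv || exit 1 stops after printing the list.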

shell scripting: nested subshell ++

This is less a problem than a request for "another way to do this".
Currently, if I want to use the result of a previous command in another one, I use:
R1=$(cat somefile | awk '{ print $1 }')
myScript -c $R1 -h123
Alternatively, a "better way" is:
myScript -c $(cat somefile | awk '{ print $1 }') -h123
But what if I need to use the result several times, for example $R1 in several places? The two options are:
Option 1
R1=$(cat somefile | awk '{ print $1 }')
myScript -c $R1 -h123 -x$R1
Option 2
myScript -c $(cat somefile | awk '{ print $1 }') -h123 -x $(cat somefile | awk '{ print $1 }')
Is there another way to "store" the result of a previous command or script and use it as an argument to another command or script?
Sure, there are other ways. They're just not better ways.
First, you could store the answer in a file, and then cat the contents of the file multiple times.
Second, you could pass the results to a bash function like:
callMyScript() {
myScript -c "$1" -h123 -x "$1"
}
invoked thusly:
callMyScript "$(awk '{ print $1; }' somefile)"
which is almost precisely identical to just saving off into a local variable.
So you're interested in using awk? You could have awk generate the line for you, and have bash run it:
eval "$(awk '{ printf "myScript -c %s -h123 -x %s\n", $1, $1; }' somefile)"
but now we're just getting silly, and even that's no different conceptually from simply saving off into a variable.
My advice: Use the variable.
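To make "use the variable" concrete: the command runs once, and quoting "$R1" passes the stored value as a single argument each time it is used. In this sketch myScript is a hypothetical stand-in that prints each argument on its own line, and somefile is created only so the example runs.

```shell
#!/bin/bash
# Hypothetical stand-in for myScript: print each argument on its own line.
myScript() { printf '%s\n' "$@"; }
printf 'abc def\n' > somefile          # sample input, for illustration only

R1=$(awk '{ print $1 }' somefile)      # run the command once
myScript -c "$R1" -h123 -x "$R1"       # reuse the value; quotes keep it one argument
```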
Another imperfect way:
source script #1; then, when script #2 runs, the variables declared in #1 are available to #2 (provided #2 is also sourced, or the variables are exported).
#!/bin/ksh
. myScript -c $(cat somefile | awk '{ print $1 }') -h123
myScript2
Your best choice is a wrapper script that keeps all your variables for you, as everyone else already noted.
In bash, the source builtin is equivalent to the . (dot) command.
