Using Array With Awk [duplicate] - bash

This question already has answers here:
How do I use shell variables in an awk script?
(7 answers)
Closed 7 years ago.
I am using an array of values, and I want to look for those values using awk and output to file. In the awk line if I replace the first "$i" with the numbers themselves, the script works, but when I try to use the variable "$i" the script no longer works.
declare -a arr=("5073770" "7577539")
for i in "${arr[@]}"
do
echo "$i"
awk -F'[;\t]' '$2 ~ "$i"{sub(/DP=/,"",$15); print $15}' $INPUT >> "$i"
done
The file I'm looking at contains many lines like the following:
chr12 3356475 . C A 76.508 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=3;CIGAR=1X;DP=3;DPB=3;DPRA=0;EPP=9.52472;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=8.76405;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=111;QR=0;RO=0;RPP=9.52472;RPPR=0;RUN=1;SAF=3;SAP=9.52472;SAR=0;SRF=0;SRP=0;SRR=0;TYPE=snp GT:DP:RO:QR:AO:QA:GL 1/1:3:0:0:3:111:-10,-0.90309,0

Pass the value $i to awk using -v:
awk -F'[;\t]' -v var="$i" '$2 ~ var{sub(/DP=/,"",$15); print $15}' "$INPUT" >> "$i"

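awk has no idea what the value of the shell's $i is unless you explicitly pass it in as an awk variable:
awk -F'[;\t]' -v "VAR=${i}" '$2 ~ VAR {....
Inside the single-quoted script the shell never expands $i, so awk sees the literal string "$i" and uses it as a regular expression ($, the end-of-string anchor, followed by i), which cannot match any field.
Had it been written unquoted, as $2 ~ $i, awk would treat i as an uninitialized variable equal to zero, making the test '$2 ~ $0 {...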
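The -v approach from the answers above can be wrapped in a small function. This is only a sketch: the name print_depth is made up for illustration, and it assumes an exact comparison on the 2nd field is wanted (the question used a regex match with ~) and that DP= is always the 15th field, as in the sample line.

```bash
# print_depth POS FILE (hypothetical helper, not from the original posts)
# Prints the DP value of every row of FILE whose 2nd field equals POS.
print_depth() {
  awk -F'[;\t]' -v pos="$1" '
    $2 == pos { sub(/^DP=/, "", $15); print $15 }
  ' "$2"
}

# Possible use inside the loop from the question:
#   for i in "${arr[@]}"; do print_depth "$i" "$INPUT" >> "$i"; done
```

Because the value travels through -v, the awk program text stays fixed and the shell never has to splice $i into it.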

You can avoid awk and do this in Bash itself:
arr=("5073770" "7577539" "3356475")
for i in "${arr[@]}"; do
  # split each line on tabs and semicolons into its own array, leaving arr untouched
  while IFS=$'\t;' read -ra fields; do
    [[ ${fields[1]} == *"$i"* ]] && { s="${fields[14]}"; echo "${s#DP=}"; }
  done < "$INPUT"
done

Related

Search equality in a certain field with AWK [duplicate]

This question already has answers here:
How do I use shell variables in an awk script?
(7 answers)
Closed 1 year ago.
I am trying to get the name out of /etc/passwd using awk to search only in the 5th field of every row, and then to cut some part of that line and print it out.
This is what I wrote, but it doesn't seem to work:
for iter in "$@";
do cat /etc/passwd | awk -F ":" '$5==$iter' | cut -d":" -f6;
done;
concerning the delimiter syntax, everything should be fine I guess?
so my problem is in the $5==$iter, I assume.
How can I change that $5==$iter to - if the 5th field of that row contains my $iter var, then cut and so on..
Sorry for the ignorance, I am a beginner :)
Thanks in advance.
See How do I use shell variables in an awk script?
-v should be used to pass shell variables into awk. Also, there's no reason to use either cat or cut here:
for iter in "$@"; do
awk -F: -v iter="$iter" '$5==iter { print $6 }' </etc/passwd
done
As Charles Duffy commented, your code would be more efficient if it didn't need to read /etc/passwd every pass. And while this particular loop probably doesn't need to be optimized (after all, /etc/passwd is typically not that long and most OS's would cache the file anyway after the first read), it would be interesting to see an awk script read the file only once.
That said, here's another implementation where awk is only invoked once:
printf "%s\n" "$@" | awk -F: '
NR == FNR { etc_passwd[ $5 ] = $6; next }
{ print $0 , etc_passwd[ $0 ] }
' /etc/passwd /dev/stdin
The NR == FNR condition is an idiom that causes its associated command only to be executed for the first file in the list of files that follows the awk script (that is, for the reading of /etc/passwd).
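A self-contained sketch of the same NR == FNR idiom follows. The function name lookup_homes and the output format are assumptions made for illustration. Note that the idiom relies on the first file being non-empty: with an empty first file, NR == FNR would also hold for the second input.

```bash
# lookup_homes PASSWD_FILE (hypothetical helper)
# Reads names (5th-field values) on stdin and prints "name: home" for each match.
lookup_homes() {
  awk -F: '
    NR == FNR { home[$5] = $6; next }        # first file: build name -> home map
    $0 in home { print $0 ": " home[$0] }    # stdin: look each name up
  ' "$1" -
}

# Possible use: printf '%s\n' "$@" | lookup_homes /etc/passwd
```

Names that do not occur in the 5th field are silently skipped here, whereas the version above prints them with an empty second column.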
You can also do everything in bash, example:
#!/bin/bash
declare -A passwd # declare an associative array
# build the associative array "passwd" with the
# 5th field as the key and the 6th field as the value
while IFS=$':\n' read -r -a line; do # emulate awk to extract fields
  [[ -n "${line[4]}" ]] || continue # skip blank keys
  passwd["${line[4]}"]=${line[5]} # bash arrays are indexed from 0
done < /etc/passwd
for iter in "$@"; do
  if [[ -n $iter && -n ${passwd[$iter]+x} ]]; then # true only if the key exists
    echo "${passwd[$iter]}"
  fi
done
(This version doesn't take into account multiple entries that share the same 5th field.)
Here is a better version that can also handle blank values, e.g. ./script.sh '':
while IFS=$':\n' read -a line; do
for iter in "$@"; do
if [ "$iter" == "${line[4]}" ]; then
echo ${line[5]}
continue
fi
done
done < /etc/passwd
A pure awk solution could be:
#!/usr/bin/awk -f
BEGIN {
FS = ":"
for ( i = 1; i < ARGC; i++ ) {
args[ARGV[i]] = 1
delete ARGV[i]
}
ARGV[1] = "/etc/passwd"
}
($5 in args) { print $6 }
and you could call it as ./script.awk 'param1' 'param2'.

Adding similar lines in bash [duplicate]

This question already has answers here:
Sort keys and Sum their values in bash
(4 answers)
sum of column in text file using shell script
(4 answers)
How can I sum values in column based on the value in another column?
(5 answers)
Closed 4 years ago.
I have a file with below records:
$ cat sample.txt
ABC,100
XYZ,50
ABC,150
QWE,100
ABC,50
XYZ,100
Expecting the output to be:
$ cat output.txt
ABC,300
XYZ,150
QWE,100
I tried the below script:
PREVVAL1=0
SUM1=0
cat sam.txt | sort > /tmp/Pos.part
while read line
do
VAL1=$(echo $line | awk -F, '{print $1}')
VAL2=$(echo $line | awk -F, '{print $2}')
if [ $VAL1 == $PREVVAL1 ]
then
SUM1=` expr $SUM + $VAL2`
PREVVAL1=$VAL1
echo $VAL1 $SUM1
else
SUM1=$VAL2
PREVVAL1=$VAL1
fi
done < /tmp/Pos.part
I want to get some one liner command to get the required output. Wanted to avoid the while loop concept. I want to just add the numbers where the first column is same and show it in a single line.
awk -F, '{a[$1]+=$2} END{for (i in a) print i FS a[i]}' sample.txt
Output
QWE,100
XYZ,150
ABC,300
The first part is executed for each line and creates an associative array. The END part prints this array.
It's an awk one-liner:
awk -F, -v OFS=, '{sum[$1]+=$2} END {for (key in sum) print key, sum[key]}' sample.txt > output.txt
sum[$1] += $2 creates an associative array whose keys are the first field and values are the corresponding sums.
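The order in which for (key in sum) visits keys is unspecified in awk, which is why the output shown above does not match the order in the question. A minimal sketch that keeps keys in first-seen order is below; the function name sum_by_key and the helper array order are made up for illustration.

```bash
# sum_by_key [FILE...] (hypothetical helper; reads stdin when no file is given)
# Sums the 2nd comma-separated field per key, printing keys in first-seen order.
sum_by_key() {
  awk -F, -v OFS=, '
    !($1 in sum) { order[++n] = $1 }   # remember each key the first time it appears
    { sum[$1] += $2 }
    END { for (i = 1; i <= n; i++) print order[i], sum[order[i]] }
  ' "$@"
}

# Possible use: sum_by_key sample.txt > output.txt
```

The extra array costs one entry per distinct key, and the input does not need to be sorted first.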
This can also be done easily enough in native bash. The following uses no external tools, no subshells and no pipelines, and is thus far faster (I'd place money on 100x the throughput on a typical/reasonable system) than your original code:
declare -A sums=( )
while IFS=, read -r name val; do
sums[$name]=$(( ${sums[$name]:-0} + val ))
done
for key in "${!sums[@]}"; do
printf '%s,%s\n' "$key" "${sums[$key]}"
done
If you want to, you can make this a one-liner:
declare -A sums=( ); while IFS=, read -r name val; do sums[$name]=$(( ${sums[$name]:-0} + val )); done; for key in "${!sums[@]}"; do printf '%s,%s\n' "$key" "${sums[$key]}"; done

How to get the 'variable' line from file? [duplicate]

This question already has answers here:
Bash tool to get nth line from a file
(22 answers)
Closed 7 years ago.
This is my script. It prints every row in the file along with its row number.
Next, I want to read which row the user chose and save it to a variable.
I=1
for ROW in $(cat file.txt)
do
echo "$I $ROW"
I=`expr $I + 1`
done
read var
awk 'FNR = $var {print}' file.txt
Then I want to print the chosen row or save it to a file.
How can I do this?
When I echo $var it shows the number correctly, but when I try to use this variable in awk, it prints every line.
How do I read line 'var' from the file?
And moreover, how do I save this line in another variable?
Example file.txt
1 line1
2 line2
3 line3
4 line4
When I type 3, I want to read the third line from the file.
Try this:
cat -n file.txt; read var; line="$(sed -n "${var}p" file.txt)"; echo "$line"
With more focus on Dryingsoussage's version:
#!/bin/bash
file="file.txt"
declare -i counter=0 # set integer attribute
var=0
while read -r line; do
counter=counter+1
printf "%d %s\n" "$counter" "$line"
done < "$file"
# check for number and greater-than 0 and less-than-or-equal $counter
until [[ $var =~ ^[0-9]+$ ]] && [[ $var -gt 0 ]] && [[ $var -le $counter ]]; do
read -p "Enter line number:" var
done
awk -v var="$var" 'FNR==var {print}' "$file"
You cannot use $varname inside single quotes (' '); it will not be expanded.
Look at this other post, it should help you:
How to use shell variables in an awk script
cat -n file.txt
read var
row="$(awk -v tgt="$var" 'NR==tgt{print;exit}' file.txt)"
First: You cannot use $var inside single quotes, as echo '$var' would print plain $var, not its value.
Second: You used = (assignment) operator instead of == (equality) operator.
Third: You don't have to write { print } if you want the line to be printed. You can write nothing instead.
Fourth: As was explained in the deleted comment below - do not allow bash expanding the variables in the awk script code, as it can lead to code injection.
So conclusion is:
awk -v var="$var" 'FNR == var' file.txt
should do what you want.
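The four points above can be combined into one small function. This is a sketch under assumptions: the name nth_line, the error message, and the non-zero exit status for a missing line are all illustrative choices, not part of the original answers.

```bash
# nth_line N FILE (hypothetical helper)
# Prints line N of FILE; fails if N is not a positive integer or FILE is shorter.
nth_line() {
  case $1 in
    ''|*[!0-9]*|0*)
      echo "nth_line: not a positive line number: $1" >&2
      return 1 ;;
  esac
  # The value is passed with -v, so it is never parsed as awk code.
  awk -v n="$1" 'NR == n { print; found = 1; exit } END { exit !found }' "$2"
}

# Possible use: read -r var; row=$(nth_line "$var" file.txt)
```

Validating the input before calling awk and passing it with -v both keep user input out of the program text, which addresses the injection concern in the fourth point.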

How to cut and assign the string to a dynamic array inside the for loop

This is what i have done to perform this function but I am not getting what i want.
#!/bin/sh
DIRECTIONPART1=4-7-9
for (( i=1; i<=3; i++ ))
do
x=`echo $DIRECTIONPART1| awk -F'-' '{print $i}'`
myarray[$i]=$x
done
for (( c=1; c<=3; c++ ))
do
echo ${myarray[$c]}
done
Problem we realised at this step
x=`echo $DIRECTIONPART1| awk -F'-' '{print $i}'`
Please help me in getting the result
This is what i get :
4-7-9
4-7-9
4-7-9
But I want this:
4
7
9
You are right about the problem line. The problem is that you can't use the shell's $i inside the single-quoted awk script. I tried a small workaround which worked for me:
x=`echo $DIRECTIONPART1| awk -F '-' -v var=$i '{print $var }'`
in all it looks like:
#!/bin/bash
# arrays and the C-style for loop are not POSIX sh features, hence bash
DIRECTIONPART1=4-7-9
for (( i=1; i<=3; i++ ))
do
x=$(echo "$DIRECTIONPART1" | awk -F '-' -v var="$i" '{print $var }')
myarray[$i]=$x
done
for (( c=1; c<=3; c++ ))
do
echo "${myarray[$c]}"
done
with expected output:
# sh test.sh
4
7
9
#
The simplest portable way to get the desired output is to use $IFS (in a subshell):
#!/bin/sh
DIRECTIONPART1=4-7-9
(IFS=- && printf '%s\n' $DIRECTIONPART1)
The shell array would not work portably, as POSIX, ksh, and bash do not
agree on arrays. POSIX doesn't have any; ksh and bash use different syntax.
If you really want an array, I would suggest to do the entire thing in awk:
#!/bin/sh
DIRECTIONPART1=4-7-9
awk -v v=${DIRECTIONPART1} 'BEGIN {
n=split(v,a,"-")
for (i=1;i<=n;i++) {
print a[i]
}
}'
This will produce one line for each value in the string:
4
7
9
And if you want bash arrays, drop the #!/bin/sh, and do something like this:
#!/bin/bash
DIRECTIONPART1=4-7-9
A=( $(IFS=- && echo $DIRECTIONPART1) )
for ((i=0;i<${#A[@]};i++))
do
echo ${A[i]}
done
Calling awk multiple times, or even once, is not the right thing to do. Use the bash built-in read to populate the array.
# Note that the quotes here are only necessary to
# work around a bug that was fixed in bash 4.3. It
# doesn't hurt to use them in any version, though.
$ IFS=- read -a myarray <<< "$DIRECTIONPART1"
$ printf '%s\n' "${myarray[@]}"
4
7
9
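The read-based split can be packaged as a function. This is a bash-only sketch; the function name split_on_dash and the global array name parts are assumptions chosen for illustration.

```bash
#!/usr/bin/env bash
# split_on_dash STRING (hypothetical helper)
# Fills the global array "parts" with the dash-separated pieces of STRING.
split_on_dash() {
  # IFS is changed only for this one read command; -r keeps backslashes literal.
  IFS=- read -r -a parts <<< "$1"
}

split_on_dash "4-7-9"
printf '%s\n' "${parts[@]}"
```

Because the IFS assignment is a prefix to read, the shell's normal word splitting is unaffected afterwards and no subshell or external process is needed.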
[akshay@localhost tmp]$ cat test.sh
#!/usr/bin/env bash
DIRECTIONPART1=4-7-9
# Create array
IFS='-' read -a array <<< "$DIRECTIONPART1"
#To access an individual element:
echo "${array[0]}"
#To iterate over the elements:
for element in "${array[@]}"
do
echo "$element"
done
#To get both the index and the value:
for index in "${!array[@]}"
do
echo "$index ${array[index]}"
done
Output
[akshay@localhost tmp]$ bash test.sh
4
4
7
9
0 4
1 7
2 9
OR
[akshay@localhost tmp]$ cat test1.sh
#!/usr/bin/env bash
DIRECTIONPART1=4-7-9
array=(${DIRECTIONPART1//-/ })
for index in "${!array[@]}"
do
echo "$index ${array[index]}"
done
Output
[akshay@localhost tmp]$ bash test1.sh
0 4
1 7
2 9

pulling information out of a string in shell script

I am having trouble pulling out the information I need from a string in my shell script. I have read and tried to come up with the correct awk or sed command to do it, but I just can't figure it out. Hopefully you guys can help.
Let's say I have a string as follows:
["ids":2817262,"isvalid":true,"name":"somename","hasproperty":false,"ids":2262,"isvalid":false,"name":"somename","hasproperty":false,"ids":28182,"isvalid":true,"name":"somename","hasproperty":true]
Now what I want to do is pull out all of these properties into individual arrays of strings. For example:
I would like to have an array of ids 2817262 2262 28182
an array of name somename somename somename
an array of hasproperty false false true
Can anyone help me come up with the commands I need to pull this out. Also keep in mind the string will likely be much longer than this, so if we can not make it specific to 3 cases that would be helpful. Thanks so much in advance.
You could use grep.
grep -oP '"ids":\K\d+' file
Example:
$ echo '["ids":2817262,"isvalid":true,"name":"somename","hasproperty":false,"ids":2262,"isvalid":false,"name":"somename","hasproperty":false,"ids":28182,"isvalid":true,"name":"somename","hasproperty":true]' | grep -oP '"ids":\K\d+'
2817262
2262
28182
Since it is tagged with awk (this one needs GNU awk for the three-argument match):
awk '{while(x=match($0,/"ids":([^,]+)/,a)){print a[1];$0=substr($0,x+RLENGTH)}}' file
This just keeps matching any ids then changing the line to contain only what is after the id.
Output
2817262
2262
28182
You could also do this (inspired by Wintermute's comment on another answer):
awk -v RS=",|]" 'sub(/^.*"ids":/,"")' file
The grep solution is beautiful. Your question was tagged awk. The awk solution is ugly:
echo '["ids":2817262,"isvalid":true,"name":"somename","hasproperty":false,"ids":2262,"isvalid":false,"name":"somename","hasproperty":false,"ids":28182,"isvalid":true,"name":"somename","hasproperty":true]' \
| awk '{n=split(substr($0,2,length($0)-2),x,",");
for(i=1;i<=n;i++) {split(x[i],a,":");
if(a[1]=="\"ids\"") print a[1],a[2]}}'
Output:
"ids" 2817262
"ids" 2262
"ids" 28182
Please choose the grep solution as the correct answer.
Here is a pure bash solution (long-winded, isn't it? I tend to agree with @chepner):
str='["ids":2817262,"isvalid":true,"name":"somename","hasproperty":false,
"ids":2262,"isvalid":false,"name":"somename","hasproperty":false,"ids":28182,
"isvalid":true,"name":"somename","hasproperty":true]'
#Remove [ ]
str=${str/[/}
str=${str/]/}
declare -a ids
declare -a names
declare -a properties
oldIFS="$IFS"
IFS=$',\n' # split on commas; the newline covers the line breaks inside str
for record in $str
do
type=${record%%:*}
value=${record##*:}
if [[ $type == \"ids\" ]]
then
ids[ids_i++]="$value"
elif [[ $type == \"name\" ]]
then
names[names_i++]="$value"
elif [[ $type == \"hasproperty\" ]]
then
properties[properties_i++]="$value"
else
echo "Ignored type: '$type'" >&2
fi
done
IFS="$oldIFS"
echo "ids: ${ids[@]}"
echo "names: ${names[@]}"
echo "properties: ${properties[@]}"
The only thing going for it is that there are no child processes.
awk 'BEGIN {
FS = ","   # set before the first record is read so fields split on commas
Index = 0
}
{
Field = 1
gsub( /[][]/,"")
gsub( /"[a-z]*":/, "")
while ( Field < NF) {
ThisID[ Index]=$Field
ThisName[ Index]=$(Field + 2)
ThisProperty [ Index]=$(Field + 3)
Index+=1
Field+=4
}
}
END {
for ( Iter=0;Iter<Index;Iter+=1) printf( "%s ", ThisID[Iter])
printf "\n"
for ( Iter=0;Iter<Index;Iter++) printf( "%s ", ThisName[Iter])
printf "\n"
for ( Iter=0;Iter<Index;Iter++) printf( "%s ", ThisProperty[Iter])
printf "\n"
}' YourFile
You still need to assign the output to shell arrays of your choice.
unset n
string='["ids":2817262,"isvalid":true,"name":"somename","hasproperty":false,"ids":2262,"isvalid":false,"name":"somename","hasproperty":false,"ids":28182,"isvalid":true,"name":"somename","hasproperty":true]'
while IFS=',' read -ra line
do
((n++))
for i in "${line[@]//\"/}"
do
eval ${i%:*}[$n]=${i#*:}
done
done < <(sed 's/[][]//g;s/,"ids/\n"ids/g' <<<$string)
The above will produce 4 arrays (ids, isvalid, name, hasproperty). If you don't need isvalid, just add a test:
unset n
string='["ids":2817262,"isvalid":true,"name":"somename","hasproperty":false,"ids":2262,"isvalid":false,"name":"somename","hasproperty":false,"ids":28182,"isvalid":true,"name":"somename","hasproperty":true]'
while IFS=',' read -ra line
do
((n++))
for i in "${line[@]//\"/}"
do
[ "${i%:*}" != "isvalid" ] && eval ${i/:/[$n]=}
done
done < <(sed 's/[][]//g;s/,"ids/\n"ids/g' <<<$string)
Given your posted input, if all you wanted was the list of each type of item then this is all you'd need:
$ awk -v RS=, -F: '{gsub(/[[\]"\n]/,"")} /^ids/{print $2}' file
2817262
2262
28182
$ awk -v RS=, -F: '{gsub(/[[\]"\n]/,"")} /^name/{print $2}' file
somename
somename
somename
$ awk -v RS=, -F: '{gsub(/[[\]"\n]/,"")} /^hasproperty/{print $2}' file
false
false
true
$ awk -v RS=, -F: '{gsub(/[[\]"\n]/,"")} /^isvalid/{print $2}' file
true
false
true
but it's extremely unlikely that this is the right way to approach your problem. As I mentioned in a comment, edit your question to provide more information if you'd like some real help with it.
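The four near-identical commands above differ only in the key name, so the key can be passed in with -v. This is a sketch under the same assumptions as those commands (no commas or colons inside values); the function name values_for is made up for illustration.

```bash
# values_for KEY [FILE...] (hypothetical helper; reads stdin when no file is given)
# Prints every value stored under KEY, one per line.
values_for() {
  key=$1
  shift
  awk -v RS=, -F: -v key="$key" '
    { gsub(/[][\n"]/, "") }     # strip brackets, quotes and stray newlines
    $1 == key { print $2 }
  ' "$@"
}

# Possible use: values_for ids file
```

Comparing $1 to the key as a string, rather than matching a regex such as /^ids/, avoids accidental matches on keys that merely start with the same letters.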

Resources