Unix AWK field separator: finding the sum of one field grouped by another


I am using the awk command below, which prints each unique value of field $11 and the number of times it occurs in the file, separated by a comma. In addition, I would like the sum of field $14 (the last field) for each value in the output. Please help.
Sample line in the file:
EXSTAT|BNK|2014|11|05|15|29|46|23169|E582754245|QABD|S|000|351
Here $14 is the last value, 351.
bash-3.2$ grep 'EXSTAT|' abc.log|grep '|S|' |
awk -F"|" '{ a[$11]++ } END { for (b in a) { print b"," a[b] ; } }'
QDER,3
QCOL,1
QASM,36
QBEND,23
QAST,3
QGLBE,30
QCD,30
TBENO,1
QABD,9
QABE,5
QDCD,5
TESUB,1
QFDE,12
QCPA,3
QADT,80
QLSMR,6
bash-3.2$ grep 'EXSTAT|' abc.log
EXSTAT|BNK|2014|11|05|15|29|03|23146|E582754222|QGLBE|S|000|424
EXSTAT|BNK|2014|11|05|15|29|05|23147|E582754223|QCD|S|000|373
EXSTAT|BNK|2014|11|05|15|29|12|23148|E582754224|QASM|S|000|1592
EXSTAT|BNK|2014|11|05|15|29|13|23149|E582754225|QADT|S|000|660
EXSTAT|BNK|2014|11|05|15|29|14|23150|E582754226|QADT|S|000|261
EXSTAT|BNK|2014|11|05|15|29|14|23151|E582754227|QADT|S|000|250
EXSTAT|BNK|2014|11|05|15|29|15|23152|E582754228|QADT|S|000|245
EXSTAT|BNK|2014|11|05|15|29|15|23153|E582754229|QADT|S|000|258
EXSTAT|BNK|2014|11|05|15|29|17|23154|E582754230|QADT|S|000|261
EXSTAT|BNK|2014|11|05|15|29|18|23155|E582754231|QADT|S|000|263
EXSTAT|BNK|2014|11|05|15|29|18|23156|E582754232|QADT|S|000|250
EXSTAT|BNK|2014|11|05|15|29|19|23157|E582754233|QADT|S|000|270
EXSTAT|BNK|2014|11|05|15|29|19|23158|E582754234|QADT|S|000|264
EXSTAT|BNK|2014|11|05|15|29|20|23159|E582754235|QADT|S|000|245
EXSTAT|BNK|2014|11|05|15|29|20|23160|E582754236|QADT|S|000|241
EXSTAT|BNK|2014|11|05|15|29|21|23161|E582754237|QADT|S|000|237
EXSTAT|BNK|2014|11|05|15|29|21|23162|E582754238|QADT|S|000|229
EXSTAT|BNK|2014|11|05|15|29|22|23163|E582754239|QADT|S|000|234
EXSTAT|BNK|2014|11|05|15|29|22|23164|E582754240|QADT|S|000|237
EXSTAT|BNK|2014|11|05|15|29|23|23165|E582754241|QADT|S|000|254
EXSTAT|BNK|2014|11|05|15|29|23|23166|E582754242|QADT|S|000|402
EXSTAT|BNK|2014|11|05|15|29|24|23167|E582754243|QADT|S|000|223
EXSTAT|BNK|2014|11|05|15|29|24|23168|E582754244|QADT|S|000|226

Just add another associative array:
awk -F"|" '{a[$11]++;c[$11]+=$14}END{for(b in a){print b"," a[b]","c[b]}}'
tested below:
> cat temp
EXSTAT|BNK|2014|11|05|15|29|03|23146|E582754222|QGLBE|S|000|424
EXSTAT|BNK|2014|11|05|15|29|05|23147|E582754223|QCD|S|000|373
EXSTAT|BNK|2014|11|05|15|29|12|23148|E582754224|QASM|S|000|1592
EXSTAT|BNK|2014|11|05|15|29|13|23149|E582754225|QADT|S|000|660
EXSTAT|BNK|2014|11|05|15|29|14|23150|E582754226|QADT|S|000|261
EXSTAT|BNK|2014|11|05|15|29|14|23151|E582754227|QADT|S|000|250
EXSTAT|BNK|2014|11|05|15|29|15|23152|E582754228|QADT|S|000|245
EXSTAT|BNK|2014|11|05|15|29|15|23153|E582754229|QADT|S|000|258
EXSTAT|BNK|2014|11|05|15|29|17|23154|E582754230|QADT|S|000|261
EXSTAT|BNK|2014|11|05|15|29|18|23155|E582754231|QADT|S|000|263
EXSTAT|BNK|2014|11|05|15|29|18|23156|E582754232|QADT|S|000|250
EXSTAT|BNK|2014|11|05|15|29|19|23157|E582754233|QADT|S|000|270
EXSTAT|BNK|2014|11|05|15|29|19|23158|E582754234|QADT|S|000|264
EXSTAT|BNK|2014|11|05|15|29|20|23159|E582754235|QADT|S|000|245
EXSTAT|BNK|2014|11|05|15|29|20|23160|E582754236|QADT|S|000|241
EXSTAT|BNK|2014|11|05|15|29|21|23161|E582754237|QADT|S|000|237
EXSTAT|BNK|2014|11|05|15|29|21|23162|E582754238|QADT|S|000|229
EXSTAT|BNK|2014|11|05|15|29|22|23163|E582754239|QADT|S|000|234
EXSTAT|BNK|2014|11|05|15|29|22|23164|E582754240|QADT|S|000|237
EXSTAT|BNK|2014|11|05|15|29|23|23165|E582754241|QADT|S|000|254
EXSTAT|BNK|2014|11|05|15|29|23|23166|E582754242|QADT|S|000|402
EXSTAT|BNK|2014|11|05|15|29|24|23167|E582754243|QADT|S|000|223
EXSTAT|BNK|2014|11|05|15|29|24|23168|E582754244|QADT|S|000|226
> awk -F"|" '{a[$11]++;c[$11]+=$14}END{for(b in a){print b"," a[b]","c[b]}}' temp
QGLBE,1,424
QADT,20,5510
QASM,1,1592
QCD,1,373
>

You do not need grep to filter for EXSTAT lines; awk can do that as well.
For example:
awk 'BEGIN{FS="|"; OFS=","} $1=="EXSTAT" && $12=="S" {sum[$11]+=$14; count[$11]++}END{for (i in sum) print i,count[i],sum[i]}' abc.log
For the input file abc.log with contents
EXSTAT|BNK|2014|11|05|15|29|03|23146|E582754222|QGLBE|S|000|424
EXSTAT|BNK|2014|11|05|15|29|05|23147|E582754223|QCD|S|000|373
EXSTAT|BNK|2014|11|05|15|29|12|23148|E582754224|QASM|S|000|1592
EXSTAT|BNK|2014|11|05|15|29|13|23149|E582754225|QADT|S|000|660
EXSTAT|BNK|2014|11|05|15|29|14|23150|E582754226|QADT|S|000|261
EXSTAT|BNK|2014|11|05|15|29|14|23151|E582754227|QADT|S|000|250
EXSTAT|BNK|2014|11|05|15|29|15|23152|E582754228|QADT|S|000|245
EXSTAT|BNK|2014|11|05|15|29|15|23153|E582754229|QADT|S|000|258
EXSTAT|BNK|2014|11|05|15|29|17|23154|E582754230|QADT|S|000|261
EXSTAT|BNK|2014|11|05|15|29|18|23155|E582754231|QADT|S|000|263
EXSTAT|BNK|2014|11|05|15|29|18|23156|E582754232|QADT|S|000|250
EXSTAT|BNK|2014|11|05|15|29|19|23157|E582754233|QADT|S|000|270
EXSTAT|BNK|2014|11|05|15|29|19|23158|E582754234|QADT|S|000|264
EXSTAT|BNK|2014|11|05|15|29|20|23159|E582754235|QADT|S|000|245
EXSTAT|BNK|2014|11|05|15|29|20|23160|E582754236|QADT|S|000|241
EXSTAT|BNK|2014|11|05|15|29|21|23161|E582754237|QADT|S|000|237
EXSTAT|BNK|2014|11|05|15|29|21|23162|E582754238|QADT|S|000|229
EXSTAT|BNK|2014|11|05|15|29|22|23163|E582754239|QADT|S|000|234
EXSTAT|BNK|2014|11|05|15|29|22|23164|E582754240|QADT|S|000|237
EXSTAT|BNK|2014|11|05|15|29|23|23165|E582754241|QADT|S|000|254
EXSTAT|BNK|2014|11|05|15|29|23|23166|E582754242|QADT|S|000|402
EXSTAT|BNK|2014|11|05|15|29|24|23167|E582754243|QADT|S|000|223
EXSTAT|BNK|2014|11|05|15|29|24|23168|E582754244|QADT|S|000|226
it gives the output (the order of the lines may vary)
QASM,1,1592
QGLBE,1,424
QADT,20,5510
QCD,1,373
What does it do?
'BEGIN{FS="|"; OFS=","} is executed before the input file is processed. It sets FS, the input field separator, to | and OFS, the output field separator, to ,
$1=="EXSTAT" && $12=="S" {sum[$11]+=$14; count[$11]++} is the action applied to each line
$1=="EXSTAT" && $12=="S" checks that the first field is EXSTAT and the 12th field is S. The strings must be quoted: an unquoted EXSTAT or S is an uninitialized awk variable, so $1~EXSTAT would match every line.
sum[$11]+=$14 adds field $14 to the array sum, indexed by $11
count[$11]++ increments the array count, indexed by $11
END{for (i in sum) print i,count[i],sum[i]}' is executed at the end of the file and prints the contents of the arrays
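The same count-and-sum grouping can be sketched as a small shell function. This is only an illustration under a few assumptions: the name group_sum is made up here, the records arrive on standard input, exact string comparisons (==) are used for the filter, and the result is piped through sort because awk does not define the order of for (k in array).

```shell
# group_sum is a hypothetical helper name, not part of the original answer.
# Reads pipe-delimited records on stdin and prints "key,count,sum" per $11.
group_sum() {
    awk '
        BEGIN { FS = "|"; OFS = "," }
        # Exact string tests: only EXSTAT records with status S are used.
        $1 == "EXSTAT" && $12 == "S" {
            count[$11]++        # occurrences of each $11 value
            sum[$11] += $14     # running total of $14 per $11 value
        }
        END {
            for (k in count) print k, count[k], sum[k]
        }
    ' | sort                    # for-in order is unspecified, so sort it
}
```

It could be called as group_sum < abc.log.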

You can use a second array.
awk -F"|" '/EXSTAT\|/&&/\|S\|/{a[$11]++; s[$11]+=$14}
END{for(b in a)print b","a[b]","s[b]}' abc.log
Explanation
/EXSTAT\|/&&/\|S\|/{a[$11]++; s[$11]+=$14} on lines that contain both EXSTAT| and |S|, increment a[$11] and add $14 to s[$11], so the count and the sum cover the same lines.
END{for(b in a)print b","a[b]","s[b]} print every key in array a with its count from a and its sum from s, separated by commas.

#!awk -f
BEGIN {
    FS = "|"
}
$1 == "EXSTAT" && $12 == "S" {
    foo[$11] += $14
}
END {
    for (bar in foo)
        printf "%s,%s\n", bar, foo[bar]
}

Related

Merge rows with same value and every 100 lines in csv file using command

I have a csv file like below:
http://www.a.com/1,apple
http://www.a.com/2,apple
http://www.a.com/3,apple
http://www.a.com/4,apple
...
http://www.z.com/1,flower
http://www.z.com/2,flower
http://www.z.com/3,flower
...
I want to combine the csv file into a new csv file like below:
"http://www.a.com/1
http://www.a.com/2
http://www.a.com/3
http://www.a.com/4
",apple
"http://www.z.com/1
http://www.z.com/2
http://www.z.com/3
http://www.z.com/4
...
http://www.z.com/100
",flower
"http://www.z.com/101
http://www.z.com/102
http://www.z.com/103
http://www.z.com/104
...
http://www.z.com/200
",flower
I want each cell in the first column to hold at most 100 URLs.
The value from the second column should appear once next to the corresponding cell.
Is there a simple command to achieve this?
I tried the command below:
awk '{if(NR%100!=0)ORS="\t";else ORS="\n"}1' test.csv > result.csv

$ awk -F, '$2!=p || n==100 {if(NR!=1) print "\"," p; printf "\""; p=$2; n=0}
{print $1; n+=1} END {print "\"," p}' test.csv
"http://www.a.com/1
http://www.a.com/2
http://www.a.com/3
http://www.a.com/4
",apple
"http://www.z.com/1
http://www.z.com/2
http://www.z.com/3
",flower
First set the field separator to the comma (-F,). Then:
If the second field changes ($2!=p) or if we already printed 100 lines in the current batch (n==100):
if it is not the first line, print a double quote, a comma, the previous second field and a newline,
print a double quote,
store the new second field in variable p for later comparisons,
reset line counter n.
For all lines print the first field and increment line counter n.
At the end print a double quote, a comma and the last value of second field.
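The steps above can be sketched as a shell function with the batch size passed in, which makes it easy to try the logic on a small sample. The name batch_cells and the max variable are assumptions made for this illustration; the default of 100 comes from the question.

```shell
# batch_cells is a hypothetical name; its optional first argument is the
# maximum number of lines per quoted cell (100 if omitted).
batch_cells() {
    awk -F, -v max="${1:-100}" '
        # Start a new quoted cell when the label changes or the cell is full.
        $2 != p || n == max {
            if (NR != 1) print "\"," p   # close the previous cell
            printf "\""                  # open the next one
            p = $2; n = 0
        }
        { print $1; n++ }
        END { if (NR) print "\"," p }    # close the last cell, if any input
    '
}
```

For example, batch_cells 2 < test.csv starts a new cell after every two URLs that share the same label.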

1st solution: With your shown samples, please try the following awk code.
awk '
BEGIN{
s1="\""
FS=OFS=","
}
prev!=$2 && prev{
print s1 val s1,prev
val=""
}
{
val=(val?val ORS:"")$1
prev=$2
}
END{
if(val){
print s1 val s1,prev
}
}
' Input_file
2nd solution: In case your Input_file is NOT sorted by the 2nd column, try the following sort + awk code.
sort -t, -k2 Input_file |
awk '
BEGIN{
s1="\""
FS=OFS=","
}
prev!=$2 && prev{
print s1 val s1,prev
val=""
}
{
val=(val?val ORS:"")$1
prev=$2
}
END{
if(val){
print s1 val s1,prev
}
}
'
Output will be as follows:
"http://www.a.com/1
http://www.a.com/2
http://www.a.com/3
http://www.a.com/4",apple
"http://www.z.com/1
http://www.z.com/2
http://www.z.com/3",flower

Given:
cat file
http://www.a.com/1,apple
http://www.a.com/2,apple
http://www.a.com/3,apple
http://www.a.com/4,apple
http://www.z.com/1,flower
http://www.z.com/2,flower
http://www.z.com/3,flower
Here is a two pass awk to do this:
awk -F, 'FNR==NR{seen[$2]=FNR; next}
seen[$2]==FNR{
printf("\"%s%s\"\n,%s\n",data,$1,$2)
data=""
next
}
{data=data sprintf("%s\n",$1)}' file file
If you want to print either at the change of the $2 value or at some fixed line interval (like 100) you can do:
awk -F, -v n=100 'FNR==NR{seen[$2]=FNR; next}
seen[$2]==FNR || FNR%n==0{
printf("\"%s%s\"\n,%s\n",data,$1,$2)
data=""
next
}
{data=data sprintf("%s\n",$1)}' file file
Either prints:
"http://www.a.com/1
http://www.a.com/2
http://www.a.com/3
http://www.a.com/4"
,apple
"http://www.z.com/1
http://www.z.com/2
http://www.z.com/3"
,flower

Ignore delimiters in quotes and excluding columns dynamically in csv file

I have an awk command that reads a csv file with | as the separator. I use it in a shell script that removes a given set of columns from the output. The columns to exclude are passed in as a list such as 1 2 3.
Command Reference: http://wiki.bash-hackers.org/snipplets/awkcsv
awk -v FS='"| "|^"|"$' '{for i in $test; do $(echo $i=""); done print }' test.csv
$test is 1 2 3
I want the effect of $1="" $2="" $3="" before printing all columns, but I get this error:
awk: {for i in $test; do $(echo $i=""); done {print }
awk: ^ syntax error
This command works and prints all the columns:
awk -v FS='"| "|^"|"$' '{print }' test.csv
File 1
"first"| "second"| "last"
"fir|st"| "second"| "last"
"firtst one"| "sec|ond field"| "final|ly"
Expected output if I want to exclude columns 2 and 3 dynamically
first
fir|st
firtst one
I need help writing the for loop correctly.

With GNU awk for FPAT:
$ awk -v FPAT='"[^"]+"' '{print $1}' file
"first"
"fir|st"
"firtst one"
$ awk -v flds='1' -v FPAT='"[^"]+"' 'BEGIN{n=split(flds,f,/ /)} {for (i=1;i<=n;i++) printf "%s%s", $(f[i]), (i<n?OFS:ORS)}' file
"first"
"fir|st"
"firtst one"
$ awk -v flds='2 3' -v FPAT='"[^"]+"' 'BEGIN{n=split(flds,f,/ /)} {for (i=1;i<=n;i++) printf "%s%s", $(f[i]), (i<n?OFS:ORS)}' file
"second" "last"
"second" "last"
"sec|ond field" "final|ly"
$ awk -v flds='3 1' -v FPAT='"[^"]+"' 'BEGIN{n=split(flds,f,/ /)} {for (i=1;i<=n;i++) printf "%s%s", $(f[i]), (i<n?OFS:ORS)}' file
"last" "first"
"last" "fir|st"
"final|ly" "firtst one"
If you don't want your output fields separated by a blank char then set OFS to whatever you do want with -v OFS='whatever'. If you want to get rid of the surrounding quotes you can use gensub() (since we're using gawk anyway) or substr() on every field, e.g.:
$ awk -v OFS=';' -v flds='1 3' -v FPAT='"[^"]+"' 'BEGIN{n=split(flds,f,/ /)} {for (i=1;i<=n;i++) printf "%s%s", substr($(f[i]),2,length($(f[i]))-2), (i<n?OFS:ORS)}' file
first;last
fir|st;last
firtst one;final|ly
$ awk -v OFS=';' -v flds='1 3' -v FPAT='"[^"]+"' 'BEGIN{n=split(flds,f,/ /)} {for (i=1;i<=n;i++) printf "%s%s", gensub(/"/,"","g",$(f[i])), (i<n?OFS:ORS)}' file
first;last
fir|st;last
firtst one;final|ly

In GNU awk (for FPAT):
$ test="2 3" # fields to exclude in bash var $test
$ awk -v t="$test" ' # taken to awk var t
BEGIN { # first
FPAT="([^|]+)|( *\"[^\"]+\")" # instead of FS, use FPAT
split(t,a," ") # process t to e:
for(i in a) # a[1]=2 -> e[2], etc.
e[a[i]]
}
{
for(i=1;i<=NF;i++) # for each field
if((i in e)==0) { # if field # not in e
gsub(/^\"|\"$/,"",$i) # remove leading and trailing "
b=b (b==""?"":OFS) $i # put to buffer b
}
print b; b="" # output and reset buffer
}' file
first
fir|st
firtst one
FPAT is used because FS cannot handle a separator inside quotes.
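FPAT is specific to GNU awk. Where it is not available, a similar result can be sketched in POSIX awk by pulling out one quoted field at a time with match(). This is only an illustration: the function name keep_fields is made up here, it assumes every field is enclosed in double quotes as in the sample file, and the kept fields are joined with OFS (a space by default).

```shell
# keep_fields is a hypothetical helper; its first argument lists the
# 1-based field numbers to drop, separated by spaces (for example "2 3").
keep_fields() {
    awk -v drop="$1" '
        BEGIN {
            n = split(drop, d, " ")
            for (i = 1; i <= n; i++) skip[d[i]] = 1
        }
        {
            line = $0; nf = 0; kept = 0; out = ""
            # Take one "..." field at a time; pipes inside quotes are kept.
            while (match(line, /"[^"]*"/)) {
                nf++
                if (!(nf in skip)) {
                    # Strip the surrounding quotes from the matched text.
                    field = substr(line, RSTART + 1, RLENGTH - 2)
                    out = out (kept++ ? OFS : "") field
                }
                line = substr(line, RSTART + RLENGTH)
            }
            print out
        }
    '
}
```

With the sample file, keep_fields "2 3" < file prints only the first column.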

Vikram, if your actual Input_file has exactly the same format as the sample shown, the following may help. I will add an explanation shortly (tested with GNU awk 3.1.7, a rather old version).
awk -v num="2,3" 'BEGIN{
len=split(num, val,",")
}
{while($0){
match($0,/.[^"]*/);
if(substr($0,RSTART,RLENGTH+1) && substr($0,RSTART,RLENGTH+1) !~ /\"\| \"/ && substr($0,RSTART,RLENGTH+1) !~ /^\"$/ && substr($0,RSTART,RLENGTH+1) !~ /^\" \"$/){
array[++i]=substr($0,RSTART,RLENGTH+1)
};
$0=substr($0,RLENGTH+1);
};
for(l=1;l<=len;l++){
delete array[val[l]]
};
for(j=1;j<=length(array);j++){
if(array[j]){
gsub(/^\"|\"$/,"",array[j]);
printf("%s%s",array[j],j==length(array)?"":" ")
}
};
print "";
i="";
delete array
}' Input_file
EDIT1: Adding the code with an explanation.
awk -v num="2,3" 'BEGIN{ ##creating a variable named num whose value is the comma-separated list of fields to remove, and starting the BEGIN section here.
len=split(num, val,",") ##creating an array named val here whose delimiter is comma and creating len variable whose value is length of array val here.
}
{while($0){ ##Starting a while loop here which will run for a single line till that line is NOT getting null.
match($0,/.[^"]*/);##using match functionality which will look for matches from starting to till a " comes into match.
if(substr($0,RSTART,RLENGTH+1) && substr($0,RSTART,RLENGTH+1) !~ /\"\| \"/ && substr($0,RSTART,RLENGTH+1) !~ /^\"$/ && substr($0,RSTART,RLENGTH+1) !~ /^\" \"$/){##RSTART and RLENGTH are variables set by the match function when the regex matches the line/variable passed to it. This if condition checks, 1st: the substring from RSTART with length RLENGTH+1 is not NULL. 2nd: the substring does not contain " pipe space ". 3rd: the substring is not just a single ". 4th: the substring is not " space ". If all conditions are TRUE then do the following actions.
array[++i]=substr($0,RSTART,RLENGTH+1) ##creating an array named array whose index is variable i with increasing value of i and its value is substring of RSTART to till RLENGTH+1.
};
$0=substr($0,RLENGTH+1);##Now removing the matched part from current line which will decrease the length of line and avoid the while loop to become as infinite.
};
for(l=1;l<=len;l++){##Starting a loop here once while above loop is done which runs from starting of variable l=1 to value of len.
delete array[val[l]] ##Deleting here those values which we want to REMOVE from OPs request, so removing here.
};
for(j=1;j<=length(array);j++){##Start a for loop from the value of j=1 till the value of length of array.
if(array[j]){ ##Now making sure array value whose index is j is NOT NULL, if yes then perform following statements.
gsub(/^\"|\"$/,"",array[j]); ##Globally substituting starting " and ending " with NULL in value of array value.
printf("%s%s",array[j],j==length(array)?"":" ") ##Now printing the value of array and secondly printing space or null: if j is equal to the array length then print NULL else print space. It is because we do not want a space at the end of the line.
}
};
print ""; ##Because above printf will NOT print a new line, so printing a new line.
i=""; ##Nullifying variable i here.
delete array ##Deleting array here.
}' Input_file ##Mentioning Input_file here.

bash - In a csv file, get the last column of all duplicates and concatenate to the first occurrence

I have a sorted csv file, where several entries are duplicates, except for the last column. How can I concatenate all the last columns to the first occurrence of each entry ?
Input:
Test1,123,somestuff
Test1,123,differentstuff
Test2,345,otherstuff
Output:
Test1,123,somestuff, differentstuff
Test2,345,otherstuff
EDIT:
Obtaining the last column is easy (cut -d, -f3 test.csv); now I need to add it to the first occurrence of each entry.

Use awk utility:
awk -F, '{ k=$1 FS $2; a[k] = (k in a)? a[k] FS $3 : $3 }
END{ for(i in a) print i,a[i] }' OFS=',' csvfile
The output:
Test1,123,somestuff,differentstuff
Test2,345,otherstuff
-F, - field separator
k=$1 FS $2 - associative array key (grouping records by the first 2 field values)

Compare two columns of different files and add new column if it matches

I would like to compare the first two columns of two files, if matched need to print yes else no.
input.txt
123,apple,type1
123,apple,type2
456,orange,type1
6567,kiwi,type2
333,banana,type1
123,apple,type2
qualified.txt
123,apple,type4
6567,kiwi,type2
output.txt
123,apple,type1,yes
123,apple,type2,yes
456,orange,type1,no
6567,kiwi,type2,yes
333,banana,type1,no
123,apple,type2,yes
I was using the command below to split the data, planning to add one more column based on the result.
However, input.txt has duplicates in the 1st column, so this method does not work; the file is also huge.
Can we get output.txt with an awk one-liner?
comm -2 -3 input.txt qualified.txt

$ awk -F, 'NR==FNR {a[$1 FS $2];next} {print $0 FS (($1 FS $2) in a?"yes":"no")}' qualified.txt input.txt
123,apple,type1,yes
123,apple,type2,yes
456,orange,type1,no
6567,kiwi,type2,yes
333,banana,type1,no
123,apple,type2,yes
Explained:
NR==FNR { # for the first file
a[$1 FS $2];next # record each qualified pair of 1st and 2nd fields as an array key
}
{
print $0 FS (($1 FS $2) in a?"yes":"no") # output the input row followed by "yes" or "no"
} # depending on whether the key is found in array a
There is no need to set OFS, since $0 is not modified and so is not rebuilt.
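One caveat of the NR==FNR idiom: if the first file is empty, NR and FNR are also equal while the second file is read, so its lines would be treated as lookup data. A sketch that tests the file name instead is shown below. The wrapper name mark_qualified and the array name seen are assumptions for this illustration, and it assumes the two file names differ.

```shell
# mark_qualified is a hypothetical wrapper:
#   $1 is the lookup file, $2 is the file to annotate with yes/no.
mark_qualified() {
    awk -F, '
        # FILENAME == ARGV[1] stays false for the second file even when the
        # first file is empty, which NR == FNR does not guarantee.
        FILENAME == ARGV[1] { seen[$1 FS $2]; next }
        { print $0 FS ((($1 FS $2) in seen) ? "yes" : "no") }
    ' "$1" "$2"
}
```

It could be called as mark_qualified qualified.txt input.txt.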

You can use awk for this as below. I am not sure why you need it to be a one-liner, though.
awk -v FS="," -v OFS="," 'FNR==NR{map[$1]=$2;next} {if($1 in map == 0) {$0=$0FS"no"} else {$0=$0FS"yes"}}1' qualified.txt input.txt
123,apple,type1,yes
123,apple,type2,yes
456,orange,type1,no
6567,kiwi,type2,yes
333,banana,type1,no
123,apple,type2,yes
The logic is:
The FNR==NR block runs only for the first file, qualified.txt, and stores its column 2 values in the array map, indexed by column 1.
Then, for each line of the 2nd file, {if($1 in map == 0) {$0=$0FS"no"} else {$0=$0FS"yes"}}1 appends no if the column 1 value is not an index of the array and yes otherwise. Note that only column 1 is compared here, not columns 1 and 2.
-v FS="," -v OFS="," set the input and output field separators.

It looks like all you need is:
awk 'BEGIN{FS=OFS=","} NR==FNR{a[$1];next} {print $0, ($1 in a ? "yes" : "no")}' qualified.txt input.txt

Iterate through list in bash and run multiple grep commands

I would like to iterate through a list and grep for the items, then use awk to pull out important information from each grep result. (This is the way I thought to do it, but awk and grep aren't necessary if there is a better way).
The input file contains a number of lines that looks similar to this:
chr1 12345 . A G 3e-12 . AB=0;ABP=0;AC=0;AF=0;AN=2;AO=2;CIGAR=1X;
I have a number of locations that should match some part of the second column.
locList="123, 789"
And for each matching location I would like to get the information from columns 4 and 5 and write them to an output file with the corresponding location.
So the output for the above list should be:
123 A G
Something like this is what I'm thinking:
for i in locList; do
grep i inputFile.txt | awk '{print $2,$4,$5}'
done

Invoking grep/awk once per location will be highly inefficient. You want to invoke a single command that will do your parsing. For example, awk:
awk -v locList="12345 789" '
BEGIN {
# parse the location list, and create an array where
# the locations are the array indexes
n = split(locList, a)
for (i=1; i<=n; i++) locations[a[i]] = 1
}
$2 in locations {print $2, $4, $5}
' file
For the revised requirements (each location should match the start of column 2):
awk -v locList="123 789" '
BEGIN { n = split(locList, patterns) }
{
for (i=1; i<=n; i++) {
if ($2 ~ "^" patterns[i]) {
print $2, $4, $5
break
}
}
}
' file
The ~ operator is the regular expression matching operator.
That will output 12345 A G from your sample input. If you just want to output 123 A G then print patterns[i] instead of $2.

awk -v locList='123|789' '$2~"^("locList")" {print $2,$4,$5}' file
or if you prefer:
locList='123, 789'
awk -v locList="^(${locList//, /|})" '$2~locList {print $2,$4,$5}' file
or whatever other permutation you like. The point is you don't need a loop at all - just create a regexp from the list of numbers in locList and test that regexp once.
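If the list has to stay in its original comma-separated form and the shell has no ${var//...} substitution, the regexp can also be built inside awk. The sketch below is an illustration only: the function name match_locations is made up, the list is passed as the first argument, and the input is read from standard input.

```shell
# match_locations is a hypothetical helper; $1 is a list such as "123, 789".
# Prints columns 2, 4 and 5 of every stdin line whose column 2 starts
# with one of the listed locations.
match_locations() {
    awk -v locList="$1" '
        BEGIN {
            gsub(/, */, "|", locList)        # "123, 789" -> "123|789"
            re = "^(" locList ")"            # anchor at the start of $2
        }
        $2 ~ re { print $2, $4, $5 }
    '
}
```

It could be called as match_locations "123, 789" < file.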

What I would do:
locList="123 789"
for i in $locList; do awk -v var="$i" '$2 ~ "^" var {print var, $4, $5}' file; done
