Match column 1 of CSV, and then check if column 2 matches - bash

I currently have a Bash script that scrapes particular info from access logs and writes them to a CSV in the following format:
0004F2426702,75.214.224.151,16/Apr/2020
0004F2426702,75.214.224.151,17/Apr/2020
0004F2426702,75.214.224.151,18/Apr/2020
0004F2426702,80.111.224.252,18/Apr/2020
00085D19F072,75.214.224.151,16/Apr/2020
00085D20A469,75.214.224.151,16/Apr/2020
0018B9FFDD58,75.214.224.151,16/Apr/2020
64167F801BF5,81.97.142.178,16/Apr/2020
64167F801BF5,95.97.142.178,18/Apr/2020
0004F2426702,80.111.224.252,19/Apr/2020
But, now I am stuck!
I want to match on column 1 (the MAC address), and then check to see if column two matches. If not, print all the lines where column 1 matched.
The purpose of this script is to spot if the source IP has changed.

Using my favorite tool, GNU datamash, to do most of the work of grouping and counting the data:
$ datamash -st, -g1,2 unique 3 countunique 3 < input.csv | awk 'BEGIN {FS=OFS=","} $NF > 1 { NF--; print }'
0004F2426702,75.214.224.151,16/Apr/2020,17/Apr/2020,18/Apr/2020
0004F2426702,80.111.224.252,18/Apr/2020,19/Apr/2020
Pure awk:
$ awk 'BEGIN { FS = OFS = SUBSEP = "," }
{ if (++seen[$1,$2] == 1) dates[$1,$2] = $3; else dates[$1,$2] = dates[$1,$2] "," $3 }
END { for (macip in seen) if (seen[macip] > 1) print macip, dates[macip] }' input.csv
0004F2426702,75.214.224.151,16/Apr/2020,17/Apr/2020,18/Apr/2020
0004F2426702,80.111.224.252,18/Apr/2020,19/Apr/2020
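Note that both commands above group on the MAC/IP pair and count dates, so a MAC that changed address but was seen only once per address (64167F801BF5 in the sample) is not reported. A minimal two-pass sketch of the check the question describes, assuming the same three-column CSV layout; the function name changed_ips is only a label chosen for this illustration:

```shell
# Print every line whose MAC (column 1) was seen with more than one
# distinct IP (column 2). The file is read twice: pass 1 counts the
# distinct IPs per MAC, pass 2 prints matching lines in original order.
changed_ips() {
    awk -F, '
        NR == FNR {                      # first pass over the file
            if (!(($1, $2) in seen)) {   # first time this MAC/IP pair appears
                seen[$1, $2]
                ips[$1]++
            }
            next
        }
        ips[$1] > 1                      # second pass: MAC had 2 or more IPs
    ' "$1" "$1"
}
```

It would be called as changed_ips input.csv. Reading the file twice keeps the original line order and avoids storing every line in memory, at the cost of a second pass; it assumes the input is a regular, non-empty file rather than a pipe.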

awk to get first column if the a specific number in the line is greater than a digit

I have a data file (file.txt) contains the below lines:
123 pro=tegs, ETA=12:00, team=xyz,user1=tom,dom=dby.com
345 pro=rbs, team=abc,user1=chan,dom=sbc.int,ETA=23:00
456 team=efg, pro=bvy,ETA=22:00,dom=sss.co.uk,user2=lis
I'm expecting to get the first column ($1) only if the ETA= number is greater than 15, like here I will have 2nd and 3rd line first column only is expected.
345
456
I tried cat file.txt | awk -F '[,TPF=]' '{print $1}' but it prints the whole line for lines that have ETA at the end.
Using awk
$ awk -F"[=, ]" '{for (i=1;i<NF;i++) if ($i=="ETA") if ($(i+1)+0 > 15) print $1}' input_file
345
456
With the samples shown, please try the following GNU awk code. It uses GNU awk's match function with the regex (^[0-9]+).*\<ETA=([0-9]+):[0-9]+, which creates 2 capturing groups and saves their values into the array arr. If the 2nd element of arr is greater than 15, it prints the 1st element of arr, as required.
awk '
match($0,/(^[0-9]+).*\<ETA=([0-9]+):[0-9]+/,arr) && arr[2]+0>15{
print arr[1]
}
' Input_file
I would harness GNU AWK for this task following way, let file.txt content be
123 pro=tegs, ETA=12:00, team=xyz,user1=tom,dom=dby.com
345 pro=rbs, team=abc,user1=chan,dom=sbc.int,ETA=23:00
456 team=efg, pro=bvy,ETA=02:00,dom=sss.co.uk,user2=lis
then
awk 'substr($0,index($0,"ETA=")+4,2)+0>15{print $1}' file.txt
gives output
345
Explanation: I use the string functions index, to find where ETA= starts, and substr, to get the 2 characters after it; 4 is added because ETA= is 4 characters long and index gives the start position. I use +0 to convert the result to a number and then compare it with 15. Disclaimer: this solution assumes every row has ETA= followed by exactly 2 digits.
(tested in GNU Awk 5.0.1)
Whenever input contains tag=value pairs as yours does, it's best to first create an array of those mappings (v[] below) and then you can just access the values by their tags (names):
$ cat tst.awk
BEGIN {
FS = "[, =]+"
OFS = ","
}
{
delete v
for ( i=2; i<NF; i+=2 ) {
v[$i] = $(i+1)
}
}
v["ETA"]+0 > 15 {
print $1
}
$ awk -f tst.awk file
345
456
With that approach you can trivially enhance the script in future to access whatever values you like by their names, test them in whatever combinations you like, output them in whatever order you like, etc. For example:
$ cat tst.awk
BEGIN {
FS = "[, =]+"
OFS = ","
}
{
delete v
for ( i=2; i<NF; i+=2 ) {
v[$i] = $(i+1)
}
}
(v["pro"] ~ /b/) && (v["ETA"]+0 > 15) {
print $1, v["team"], v["dom"]
}
$ awk -f tst.awk file
345,abc,sbc.int
456,efg,sss.co.uk
Think about how you'd enhance any other solution to do the above or anything remotely similar.
It's unclear why you think your attempt would do anything of the sort. Your attempt uses a completely different field separator and does not compare anything against the number 15.
You'll also want to get rid of the useless use of cat.
When you specify a column separator with -F, that changes what the first column $1 actually means; it is then everything before the first occurrence of the separator. You probably want to split the line separately, on whitespace, to obtain the first column.
awk -F 'ETA=' '$2+0 > 15 { split($0, n, /[ \t]+/); print n[1] }' file.txt
The value in $2 will be the data after the first separator (and up until the next one) but using it in a numeric comparison simply ignores any non-numeric text after the number at the beginning of the field. So for example, on the first line, we are actually literally checking if 12:00, team=xyz,user1=tom,dom=dby.com is larger than 15 but it effectively checks if 12 is larger than 15 (which is obviously false).
When the condition is true, we split the original line $0 into the array n on sequences of whitespace, and then print the first element of this array.
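One caveat worth checking: awk only reduces a string to its leading number when the value is used arithmetically. A bare comparison between a field that does not look like a number and the constant 15 is carried out as a string comparison, which gives the right answer for zero-padded two-digit hours but not for something like 9:00. Adding +0 forces the numeric reading. A small sketch to see the difference (compare_eta is a hypothetical helper name, and the sample strings are the text that follows ETA=):

```shell
# Print two flags for the given string: the result of a bare comparison
# against 15 (done as a string comparison, because the value does not look
# like a number) and the result after forcing a numeric conversion with +0.
compare_eta() {
    awk -v s="$1" 'BEGIN { print (s > 15), (s + 0 > 15) }'
}

compare_eta '23:00'             # 1 1
compare_eta '12:00, team=xyz'   # 0 0
compare_eta '9:00'              # 1 0  (string comparison says "9:00" > "15")
```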
Using awk you could match ETA= followed by 1 or more digits. Then get the match without the ETA= part and check if the number is greater than 15 and print the first field.
awk 'match($0, /ETA=[0-9]+/) {
if(substr($0, RSTART+4, RLENGTH-4)+0 > 15) print $1
}' file
Output
345
456
If the first field should start with a number:
awk '/^[0-9]/ && match($0, /ETA=[0-9]+/) {
if(substr($0, RSTART+4, RLENGTH-4)+0 > 15) print $1
}' file

Use an array created using awk as a variable in another awk script

I am trying to use awk to extract data using a conditional statement containing an array created using another awk script.
The awk script I use for creating the array is as follows:
array=($(awk 'NR>1 { print $1 }' < file.tsv))
Then, to use this array in the other awk script
awk var="${array[@]}" 'FNR==1{ for(i=1;i<=NF;i++){ heading[i]=$i } next } { for(i=2;i<=NF;i++){ if($i=="1" && heading[i] in var){ close(outFile); outFile=heading[i]".txt"; print ">kmer"NR-1"\n"$1 >> (outFile) }}}' < input.txt
However, when I run this, the following error occurs.
awk: fatal: cannot open file 'foo' for reading (No such file or directory)
I've already looked at multiple posts on why this error occurs and on how to correctly implement a shell variable in awk, but none of these have worked so far. However, when removing the shell variable and running the script it does work.
awk 'FNR==1{ for(i=1;i<=NF;i++){ heading[i]=$i } next } { for(i=2;i<=NF;i++){ if($i=="1"){ close(outFile); outFile=heading[i]".txt"; print ">kmer"NR-1"\n"$1 >> (outFile) }}}' < input.txt
I really need that conditional statement but don't know what I am doing wrong with implementing the bash variable in awk and would appreciate some help.
Thx in advance.
That specific error message is because you forgot -v in front of var= (it should be awk -v var=, not just awk var=) but as others have pointed out, you can't set an array variable on the awk command line. Also note that array in your code is a shell array, not an awk array, and shell and awk are 2 completely different tools each with their own syntax, semantics, scopes, etc.
Here's how to really do what you're trying to do:
array=( "$(awk 'BEGIN{FS=OFS="\t"} NR>1 { print $1 }' < file.tsv)" )
awk -v xyz="${array[*]}" '
BEGIN{ split(xyz,tmp,RS); for (i in tmp) var[tmp[i]] }
... now use `var` as you were trying to ...
'
For example:
$ cat file.tsv
col1 col2
a b c d e
f g h i j
$ cat -T file.tsv
col1^Icol2
a b^Ic d e
f g h^Ii j
$ awk 'BEGIN{FS=OFS="\t"} NR>1 { print $1 }' < file.tsv
a b
f g h
$ array=( "$(awk 'BEGIN{FS=OFS="\t"} NR>1 { print $1 }' < file.tsv)" )
$ awk -v xyz="${array[*]}" '
BEGIN {
split(xyz,tmp,RS)
for (i in tmp) {
var[tmp[i]]
}
for (idx in var) {
print "<" idx ">"
}
}
'
<f g h>
<a b>
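The use of "${array[*]}" rather than "${array[@]}" in the commands above matters. Inside double quotes, the @ form expands to one word per element, so awk would receive only the first element attached to var= and would see the remaining elements as separate arguments, which is consistent with it trying to open a file named foo. The * form joins all elements into a single word. The sketch below shows the same rule with the positional parameters "$@" and "$*", which behave like the two array forms; count_args, demo and the sample values are made up for illustration:

```shell
# Report how many separate arguments the shell passed to the function.
count_args() { echo "$#"; }

demo() {
    set -- first foo bar    # stand-in for the array elements
    count_args var="$@"     # same rule as "${array[@]}": var=first, foo, bar
    count_args var="$*"     # same rule as "${array[*]}": one word
}

demo    # prints 3, then 1
```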
It's easier and more efficient to process both files in a single awk:
edit: fixed issues in comment, thanks @EdMorton
awk '
FNR == NR {
if ( FNR > 1 )
var[$1]
next
}
FNR == 1 {
for (i = 1; i <= NF; i++)
heading[i] = $i
next
}
{
for (i = 2; i <= NF; i++)
if ( $i == "1" && heading[i] in var) {
outFile = heading[i] ".txt"
print ">kmer" (FNR-1) "\n" $1 >> (outFile)
close(outFile)
}
}
' file.tsv input.txt
You might store string in variable, then use split function to turn that into array, consider following simple example, let file1.txt content be
A B C
D E F
G H I
and file2.txt content be
1
3
2
then
var1=$(awk '{print $1}' file1.txt)
awk -v var1="$var1" 'BEGIN{split(var1,arr)}{print "First column value in line number",$1,"is",arr[$1]}' file2.txt
gives output
First column value in line number 1 is A
First column value in line number 3 is G
First column value in line number 2 is D
Explanation: I store the output of the 1st awk command in a shell variable, which is then used as the 1st argument to the split function in the 2nd awk command. Disclaimer: this solution assumes all files involved use a delimiter compliant with default GNU AWK behavior, i.e. one or more whitespace characters always act as the delimiter.
(tested in gawk 4.2.1)

How to assign awk result variable to an array and is it possible to use awk inside another awk in loop

I've started to learn bash and totally stuck with the task. I have a comma separated csv file with records like:
id,location_id,organization_id,service_id,name,title,email,department
1,1,,,Name surname,department1 department2 department3,,
2,1,,,name Surname,department1,,
3,2,,,Name Surname,"department1 department2, department3",, e.t.c.
I need to format it this way:
name and surname must start with a capital letter;
add an email record that consists of the first letter of the name and the full surname, in lowercase;
create a new csv with records from the old csv with corrected fields.
I split csv on records using awk ( cause some fields contain fields with a comma between quotes "department1 department2, department3" ).
#!/bin/bash
input="$HOME/test.csv"
exec 0<$input
while read line; do
awk -v FPAT='"[^"]*"|[^,]*' '{
...
}' $input)
done
inside awk {...} (NF=8 for each record), I tried to use certain field values ($1 $2 $3 $4 $5 $6 $7 $8):
#it doesn't work
IFS=' ' read -a name_surname<<<$5 # Field 5 match to *name* in heading of csv
# Could I use inner awk with field values of outer awk ($5) to separate the field value of outer awk $5 ?
# as an example:
# $5="${awk '{${1^}${2^}}' $5}"
# where ${1^} and ${2^} fields of inner awk
name_surname[0]=${name_surname[0]^}
name_surname[1]=${name_surname[1]^}
$5="${name_surname[0]}' '${name_surname[1]}"
email_name=${name_surname[0]:0:1}
email_surname=${name_surname[1]}
domain='@domain'
$7="${email_name,}${email_surname,,}$domain" # match to field 7 *email* in heading of csv
how to add field values ($1 $2 $3 $4 $5 $6 $7 $8) to array and call function join for each for loop iteration to add record to new csv file?
function join { local IFS="$1"; shift; echo "$*"; }
result=$(join , ${arr[@]})
echo $result >> new.csv
This may be what you're trying to do (using gawk for FPAT as you already were doing) but without more representative sample input and the expected output it's a guess:
$ cat tst.sh
#!/usr/bin/env bash
awk '
BEGIN {
OFS = ","
FPAT = "[^"OFS"]*|\"[^\"]*\""
}
NR > 1 {
n = split($5,name,/\s*/)
$7 = tolower(substr(name[1],1,1) name[n]) "@example.com"
print
}
' "${@:--}"
$ ./tst.sh test.csv
1,1,,,Name surname,department1 department2 department3,nsurname@example.com,
2,1,,,name Surname,department1,nsurname@example.com,
3,2,,,Name Surname,"department1 department2, department3",nsurname@example.com,
I put the awk script inside a shell script since that looks like what you want, obviously you don't need to do that you could just save the awk script in a file and invoke it with awk -f.
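The script above fills in the email field but leaves the name as typed, while the question also asks for the name and surname to start with a capital letter. One possible way to do that, as a sketch using only POSIX awk string functions; the names capitalize and capitalize_words are hypothetical, and the same awk function could be applied to $5 before the print in the script above:

```shell
# Capitalize the first letter of every whitespace-separated word in "$1".
capitalize_words() {
    awk -v s="$1" '
        # Upper-case the first character of a word, leave the rest as is.
        function capitalize(w) {
            return toupper(substr(w, 1, 1)) substr(w, 2)
        }
        BEGIN {
            n = split(s, words, " ")
            out = ""
            for (i = 1; i <= n; i++)
                out = out (i > 1 ? " " : "") capitalize(words[i])
            print out
        }'
}

capitalize_words 'name surname'    # Name Surname
```

Leaving the rest of each word untouched is a deliberate choice here, so that a surname with inner capitals is not flattened; add tolower() around the second substr if that is not wanted.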
Completely working answer by Ed Morton.
In case it is helpful for someone, I added one more check: if the CSV file contains more than one email address built from the same name, an index number is added to the email local part. The output is sent to a file.
#!/usr/bin/env bash
input="$HOME/test.csv"
exec 0<$input
awk '
BEGIN {
OFS = ","
FPAT = "[^"OFS"]*|\"[^\"]*\""
}
(NR == 1) {print} #header of csv
(NR > 1) {
if (length($0) > 1) { #exclude empty lines
count = 0
n = split($5,name,/\s*/)
email_local_part = tolower(substr(name[1],1,1) name[n])
#count how many times this exact local part has been seen so far
count = ++seen[email_local_part]
#add number of occurrence to email address
if (count == 1) { $7 = email_local_part "@abc.com" }
else { --count; $7 = email_local_part count "@abc.com" }
print
}
}
' "${@:--}" > new.csv

Splitting a large, complex one column file into several columns with awk

I have a text file produced by some commercial software, looking like below. It consists of bracket-delimited sections, each of which contains several million elements, but the exact number changes from one case to another.
(1
2
3
...
)
(11
22
33
...
)
(111
222
333
...
)
I need to achieve an output like:
1; 11; 111
2; 22; 222
3; 33; 333
... ... ...
I found a complicated way that is:
perform sed operations to get
1
2
3
...
#
11
22
33
...
#
111
222
333
...
use awk as follows to split my file in several sub-files
awk -v RS="#" '{print > ("splitted-" NR ".txt")}'
remove white spaces from my subfiles again with sed
sed -i '/^[[:space:]]*$/d' splitted*.txt
join everything together:
paste splitted*.txt > out.txt
add a field separator (defined in my bash script)
awk -v sep=$my_sep 'BEGIN{OFS=sep}{$1=$1; print }' out.txt > formatted.txt
I feel this is crappy, as I loop over millions of lines several times.
Even if the run time is quite OK (~80 sec), I'd like to find a pure awk solution but can't get there.
Something like:
awk 'BEGIN{RS="(\\n)"; OFS=";"} { print something } '
I found some related questions, especially this one row to column conversion with awk, but it assumes a constant number of lines between brackets which I can't do.
Any help would be appreciated.
With GNU awk for multi-char RS and true multi dimensional arrays:
$ cat tst.awk
BEGIN {
RS = "(\\s*[()]\\s*)+"
OFS = ";"
}
NR>1 {
cell[NR][1]
split($0,cell[NR])
}
END {
for (rowNr=1; rowNr<=NF; rowNr++) {
for (colNr=2; colNr<=NR; colNr++) {
printf "%6s%s", cell[colNr][rowNr], (colNr<NR ? OFS : ORS)
}
}
}
$ awk -f tst.awk file
1; 11; 111
2; 22; 222
3; 33; 333
...; ...; ...
If you know you have 3 columns, you can do it in a very ugly way as following:
pr -3ts <file>
All that needs to be done then is to remove your brackets:
$ pr -3ts ~/tmp/f | awk 'BEGIN{OFS="; "}{gsub(/[()]/,"")}(NF){$1=$1; print}'
1; 11; 111
2; 22; 222
3; 33; 333
...; ...; ...
You can also do it in a single awk line, but it just complicates things. The above is quick and easy.
This awk program does the full generic version:
awk 'BEGIN{r=c=0}
/\)/{r=0; c++; next}
{gsub(/[( ]/,"")}
(NF){a[r++,c]=$1; rm=rm>r?rm:r}
END{ for(i=0;i<rm;++i) {
printf "%s", a[i,0];
for(j=1;j<c;++j) printf "; %s", a[i,j];
print ""
}
}' <file>
Could you please try the following, assuming that your actual Input_file is the same as the shown samples.
awk -v RS="" '
{
gsub(/\n|, /,",")
}
1' Input_file |
awk '
{
while(match($0,/\([^\)]*/)){
value=substr($0,RSTART+1,RLENGTH-2)
$0=substr($0,RSTART+RLENGTH)
num=split(value,array,",")
for(i=1;i<=num;i++){
val[i]=val[i]?val[i] OFS array[i]:array[i]
}
}
for(j=1;j<=num;j++){
print val[j]
}
delete val
delete array
value=""
}' OFS="; "
OR (the script above assumes that the number of values inside (...) is constant; the script below also works when the number of values inside (...) differs):
awk -v RS="" '
{
gsub(/\n/,",")
gsub(/, /,",")
}
1' Input_file |
awk '
{
while(match($0,/\([^\)]*/)){
value=substr($0,RSTART+1,RLENGTH-2)
$0=substr($0,RSTART+RLENGTH)
num=split(value,array,",")
for(i=1;i<=num;i++){
val[i]=val[i]?val[i] OFS array[i]:array[i]
max=num>max?num:max
}
}
for(j=1;j<=max;j++){
print val[j]
}
delete val
delete array
}' OFS="; "
Output will be as follows.
1; 11; 111
2; 22; 222
3; 33; 333
Explanation: Adding explanation for above code here.
awk -v RS="" ' ##Setting RS(record separator) as NULL here.
{ ##Starting BLOCK here.
gsub(/\n/,",") ##using gsub to substitute new line OR comma with space with comma here.
gsub(/, /,",")
}
1' Input_file | ##Mentioning 1 will be printing edited/non-edited line of Input_file. Using | means sending this output as Input to next awk program.
awk ' ##Starting another awk program here.
{
while(match($0,/\([^\)]*/)){ ##Using while loop which will run till a match is FOUND for (...) in lines.
value=substr($0,RSTART+1,RLENGTH-2) ##storing substring from RSTART+1 to till RLENGTH-1 value to variable value here.
$0=substr($0,RSTART+RLENGTH) ##Re-creating current line with substring value from RSTART+RLENGTH till last of line.
num=split(value,array,",") ##Splitting value variable into array named array whose delimiter is comma here.
for(i=1;i<=num;i++){ ##Using for loop which runs from i=1 to till value of num(length of array).
val[i]=val[i]?val[i] OFS array[i]:array[i] ##Creating array val whose index is value of variable i and concatenating its own values.
}
}
for(j=1;j<=num;j++){ ##Starting a for loop from j=1 to till value of num here.
print val[j] ##Printing value of val whose index is j here.
}
delete val ##Deleting val here.
delete array ##Deleting array here.
value="" ##Nullifying variable value here.
}' OFS="; " ##Making OFS value as ; with space here.
NOTE: This should work for more than 3 values inside (...) brackets also.
awk 'BEGIN { RS = "\\s*[()]\\s*"; FS = "\\s*" }
NF > 0 {
maxCol++
if (NF > maxRow)
maxRow = NF
for (row = 1; row <= NF; row++)
a[row,maxCol] = $row
}
END {
for (row = 1; row <= maxRow; row++) {
for (col = 1; col <= maxCol; col++)
printf "%s", a[row,col] ";"
print ""
}
}' yourFile
output
1;11;111;
2;22;222;
3;33;333;
...;...;...;
Change FS= "\\s*" to FS = "\n*" when you also want to allow spaces inside your fields.
This script supports columns of different lengths.
When benchmarking also consider replacing [i,j] with [i][j] for GNU awk. I'm unsure which one is faster and did not benchmark the script myself.
Here is the Perl one-liner solution
$ cat edouard2.txt
(1
2
3
a
)
(11
22
33
b
)
(111
222
333
c
)
$ perl -lne ' $x=0 if s/[)(]// ; if(/(\S+)/) { @t=@{$val[$x]};push(@t,$1);$val[$x++]=[@t] } END { print join(";",@{$val[$_]}) for(0..$#val) }' edouard2.txt
1;11;111
2;22;222
3;33;333
a;b;c
I would convert each section to a row and then transpose after, e.g. assuming you are using GNU awk:
<infile awk '{ gsub("[( )]", ""); $1=$1 } 1' RS='\\)\n\\(' OFS=';' |
datamash -t';' transpose
Output:
1;11;111
2;22;222
3;33;333
...;...;...

using awk to lookup and insert data

In my continuing crusade not to use MS Excel, I'd like to process some data, send it to a file, and then insert some records from a separate file into a third file using field $1 as the index. Is this possible?
I have data like this:
2600,foo,stack,1,04/02/2015,ACH Payment,ACH Settled,1500
2600,foo,stack,2,04/06/2015,Credit Card Sale,Settled,100
2600,foo,stack,3,04/07/2015,Credit Card Sale,Settled,157.13
2600,foo,stack,4,04/07/2015,ACH Credit,ACH Settled,.03
I have this to group it:
cat group.awk
#!/usr/bin/awk -f
BEGIN {
OFS = FS = ","
}
NR > 1 {
arr[$1 OFS $2 OFS $3]++
}
END {
for (key in arr)
print key, arr[key]
}
The group makes it like this:
2600,foo,stack,4
Simple multiplication is applied to fields 5, 6 and 7 where applicable--depends on fields 3.
In this example we can say the finished record looks like this:
2600,foo,stack,4,.2,19.8
Now in a separate file, I have this data:
2600,registered user,5hPASLJlHlgJR4AQc9sZQ==
basic flow is:
awk -f group.awk data.csv | awk -f math.awk > finished.csv
Then use awk (if it can do this) to look up field $1 in finished.csv and find corresponding record above in the separate file(bill.csv) and print to a third file or insert into bill.csv.
Expected output in third file(bill.csv):
x,y,,1111111,2600,,,,,,,19.8,,,registered user,,,,,,,,,,RS,,,N5hPASLJlHlgJR4AQc9sZQ==,z,a
x,y,,1111111,RS,z,a will be pre-populated, so I only need to insert three new records.
Is this something awk can accomplish?
Edit
Field $3 is the accountID that sets the multiplication on 5, 6 and 7.
Here's the idea:
bill.awk:
NR>1{if($3=="stack" && $4>199) $5=$4*0.03;
if($3=="stack" && $4<200) $5=$4*0.05
if($3=="user") $5=$4*.01
}1
total.awk:
awk -F, -v OFS="," 'NR>1{if($3=="stack" && $5<20) $6=20-$5;
if($3=="stack" && $5>20) $6=0;}1'
This part is working and final output is like above:
2600,foo,stack,4,.2,19.8
4*.05 = .2 & 20 - .2 = 19.8
But the minimum charge is $20
So we'll correct it:
4*.05 = .2 & 20 - .2 = 20
Extra populated fields came from a separate file (bill.csv) and I need to fill in 20 to the correct record on bill.csv
bill.csv contains everything needed except the 20
before:
x,y,,1111111,2600,,,,,,,,,,,registered user,,,,,,,,,,RS,,,N5hPASLJlHlgJR4AQc9sZQ==,z,a
after:
x,y,,1111111,2600,,,,,,,20,,,registered user,,,,,,,,,,RS,,,N5hPASLJlHlgJR4AQc9sZQ==,z,a
Is this a better explanation? Go on the assumption that group.awk, bill.awk and total.awk are working correctly. I just need to extract the correct total for field $1 and put it in bill.csv in the correct spot.
Maybe this last awk is what you need. I've tried to understand what you want, and I think it is just this awk way of merging:
To explain: we first save file A in an array, using the first field as the index. Then, for each line of file B, we check whether field 1 is among the indexes of our array, and if it is, we print the data from the two files together.
awk -F"," 'BEGIN {while ((getline < "record.dat") > 0){ a[$1]=$0; }} {if($1 in a){ print a[$1]","$0}}' file.dat
2600,foo,stack,4,10,10.4,2600,registered user,5hPASLJlHlgJR4AQc9sZQ==
This is the kind of solution you need:
$ cat fileA
2600,foo,stack,1,04/02/2015,ACH Payment,ACH Settled,1500
2600,foo,stack,2,04/06/2015,Credit Card Sale,Settled,100
2600,foo,stack,3,04/07/2015,Credit Card Sale,Settled,157.13
2600,foo,stack,4,04/07/2015,ACH Credit,ACH Settled,.03
2600,foo,stack,5,04/09/2015,ACH Payment,ACH Settled,147.10
$ cat fileB
2600,registered user,5hPASLJlHlgJR4AQc9sZQ==
$ cat tst.awk
BEGIN { FS=OFS="," }
NR==FNR{
cnts[$1][$2FS$3]++
next
}
{
for (val in cnts[$1]) {
cnt = cnts[$1][val]
print $1, val, cnt, cnt*2.5, $2, $3
}
}
$ awk -f tst.awk fileA fileB
2600,foo,stack,5,12.5,registered user,5hPASLJlHlgJR4AQc9sZQ==
but until you update your question we can't provide any more concrete help than that.
The above uses GNU awk 4.* for true 2D arrays.
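For the narrower step the question ends on, copying the finished total into the matching bill.csv record, a lookup along the following lines may be enough. This is only a sketch under assumptions read off the sample lines in the question: the key is field 1 of finished.csv and field 5 of bill.csv, the total is field 6 of finished.csv, and it belongs in field 12 of bill.csv. The function name fill_totals is hypothetical.

```shell
# $1 = finished.csv (key in field 1, total in field 6)   [assumed layout]
# $2 = bill.csv     (key in field 5, total goes in field 12)   [assumed layout]
fill_totals() {
    awk '
        BEGIN { FS = OFS = "," }
        NR == FNR { total[$1] = $6; next }     # first file: remember total per key
        ($5 in total) { $12 = total[$5] }      # second file: fill in field 12
        { print }                              # print every bill line, changed or not
    ' "$1" "$2"
}
```

It would be run as fill_totals finished.csv bill.csv > bill.new.csv, writing to a new file rather than editing bill.csv in place. Records whose key is not found in finished.csv are printed unchanged, and finished.csv is assumed to be non-empty.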
