I have a file with multiple delimiters, I m looking to compare the value after the first / when read from right left with another file.
code :-
awk -F'[/|]' NR==FNR{a[$3]; next} ($1 in a )' file1 file2 > output
cat file1
AAB/BBC/customer|fed|12931|
/customer|fed|982311|
BXC/DEF/OTTA|fed|92374|
AVD/customer|FST|8736481|
FFS/T6TT/BOSTON|money|18922|
GTS/trust/YYYY|opt|62376|
XXY/IJSH/trust|opt|62376|
cat file2
customer
trust
expected output :-
AAB/BBC/customer|fed|12931|
/customer|fed|12931|
AVD/customer|FST|8736481|
XXY/IJSH/trust|opt|62376|
$ awk -F\| ' # just use one FS
NR==FNR {
a[$1]
next
}
{
n=split($1,t,/\//) # ... and use split to the 1st field
if(t[n] in a) # and compare the last split part
print
}' file2 file1
Output:
AAB/BBC/customer|fed|12931|
/customer|fed|982311|
AVD/customer|FST|8736481|
XXY/IJSH/trust|opt|62376|
If you use this [/|] you will have 2 delimiters and you will not know what the value after the last pipe was.
Reading your question, you want to compare the first value after the last slash without pipe chars.
If there has to be a / present in the string, you can set that as the field separator and check if there are at least 2 fields using NF > 1
Then take the last field using $NF, split on | and check if the first part is present in one of the values of file2 which are stored in array a
$cat file1
AAB/BBC/customer|fed|12931|
/customer|fed|982311|
BXC/DEF/OTTA|fed|92374|
AVD/customer|FST|8736481|
FFS/T6TT/BOSTON|money|18922|
GTS/trust/YYYY|opt|62376|
XXY/IJSH/trust|opt|62376|
customer
Example code
awk -F/ '
NR==FNR {a[$1];next}
NF > 1 {
split($NF, t, "|")
if(t[1] in a) print
}
' file2 file1
Output
AAB/BBC/customer|fed|12931|
/customer|fed|982311|
AVD/customer|FST|8736481|
XXY/IJSH/trust|opt|62376|
I can't find a solution.
So here is the problem.
Result should be 100 rows (File1) with contents from File2 repeating 25 times.
What I want is to join the contents even though the number of rows is not equal. Keep repeating including lines from File2 until number of rows from File1 is met.
File1:
test1#domain.com
test2#domain2.com
test3#domain3.com
test4#domain4.com
File2:
A1,B11
A2,B22
A3,B33
A4,B44
What I want is to combine the files in the following to have the following expected result:
File3:
test1#domain.com,A1,B12
test2#domain2.com,A2,B22
test3#domain3.com,A3,B33
test4#domain4.com,A4,B44
Note here: After it finishes with the 4 rows from File2, start again from first line, then repeat.
test5#domain5.com,A1,B12
test6#domain6.com,A2,B22
test7#domain7.com,A3,B33
test8#domain8.com,A4,B44
The example in your question isn't clear but I THINK this is what you're trying to do:
$ awk -v OFS=',' 'NR==FNR{a[++n]=$0;next} {print $0, a[(FNR-1)%n+1]}' file2 file1
test1#domain.com,A1,B11
test2#domain2.com,A2,B22
test3#domain3.com,A3,B33
test4#domain4.com,A4,B44
test5#domain5.com,A1,B11
test6#domain6.com,A2,B22
The above was run against this input:
$ cat file1
test1#domain.com
test2#domain2.com
test3#domain3.com
test4#domain4.com
test5#domain5.com
test6#domain6.com
$
$ cat file2
A1,B11
A2,B22
A3,B33
A4,B44
Could you please try following.
awk '
BEGIN{
OFS=","
}
FNR==NR{
a[++count]=$0
next
}
{
count_curr++
count_curr=count_curr>count?1:count_curr
print a[count_curr],$0
}
' Input_file2 Input_file1
I have multiple files with different number of columns, i need to do a merge on first file and second file and do a left outer join in awk respective to first file and print all columns in both files matching the first column of both files.
I have tried below codes to get close to my output. But i can't print the ",', where no matching number is found in second file. Below is the code. Join needs sorting and takes more time than awk. My file sizes are big, like 30 million records.
awk -F ',' '{
if (NR==FNR){ r[$1]=$0}
else{ if($1 in r)
r[$1]=r[$1]gensub($1,"",1)}
}END{for(i in r){print r[i]}}' file1 file2
file1
number,column1,column2,..columnN
File2
numbr,column1,column2,..columnN
Output
number,file1.column1,file1.column2,..file1.columnN,file2.column1,file2.column3...,file2.columnN
file1
1,a,b,c
2,a,b,c
3,a,b,c
5,a,b,c
file2
1,x,y
2,x,y
5,x,y
6,x,y
7,x,y
desired output
1,a,b,c,x,y
2,a,b,c,x,y
3,a,b,c,,,
5,a,b,c,x,y
$ cat tst.awk
BEGIN { FS=OFS="," }
NR==FNR {
tail = gensub(/[^,]*,/,"",1)
if ( FNR == 1 ) {
empty = gensub(/[^,]/,"","g",tail)
}
file2[$1] = tail
next
}
{ print $0, ($1 in file2 ? file2[$1] : empty) }
$ awk -f tst.awk file2 file1
1,a,b,c,x,y
2,a,b,c,x,y
3,a,b,c,,
5,a,b,c,x,y
The above uses GNU awk for gensub(), with other awks it's just one more step to do [g]sub() on the appropriate variable after initially assigning it.
An interesting (to me at least!) alternative you might want to test for a performance difference is:
$ cat tst.awk
BEGIN { FS=OFS="," }
NR==FNR {
tail = gensub(/[^,]*,/,"",1)
idx[$1] = NR
file2[NR] = tail
if ( FNR == 1 ) {
file2[""] = gensub(/[^,]/,"","g",tail)
}
next
}
{ print $0, file2[idx[$1]] }
$ awk -f tst.awk file2 file1
1,a,b,c,x,y
2,a,b,c,x,y
3,a,b,c,,
5,a,b,c,x,y
but I don't really expect it to be any faster and it MAY even be slower.
you can try,
awk 'BEGIN{FS=OFS=","}
FNR==NR{d[$1]=substr($0,index($0,",")+1); next}
{print $0, ($1 in d?d[$1]:",")}' file2 file1
you get,
1,a,b,c,x,y
2,a,b,c,x,y
3,a,b,c,,
5,a,b,c,x,y
join to the rescue:
$ join -t $',' -a 1 -e '' -o 0,1.2,1.3,1.4,2.2,2.3 file1.txt file2.txt
Explanation:
-t $',': Field separator token.
-a 1: Do not discard records from file 1 if not present in file 2.
-e '': Missing records will be treated as an empty field.
-o: Output format.
file1.txt
1,a,b,c
2,a,b,c
3,a,b,c
5,a,b,c
file2.txt
1,x,y
2,x,y
5,x,y
6,x,y
7,x,y
Output
1,a,b,c,x,y
2,a,b,c,x,y
3,a,b,c,,
5,a,b,c,x,y
I am fresh to Shell or Bash. I have file1 with one column and about 5000 rows and file2 have five columns with 240k rows. How can I check whether the values of the 5000 rows in file1 within or not the second column of file2?
$wc -l file1
$5188
$wc -l file2
$240,888
You can do this with awk, something like this:
awk 'NR == FNR {a[$2] = $1; next} {if ($2 in a){print(a[$2], $1)}}' file1 file2
Basically you read the first file in and store its contents in an array "a". Then you read the second file and check if the second field of each line is contained within array "a" and print it if it is.
My answer assumes your fields are separated by white space, if they are not you will have to change the separator. So, if your fields are separated by commas, you will need:
awk -F, .....
The above syntax does work, and it can be further simplified as:
awk 'FNR==NR{a[$1]=$2; next} {print $1, a[$1]}' file2 file1