Vlookup using awk command - bash

I have two files on my Linux server.
File 1
9190784
9197256
9170546
9184139
9196854
File 2
S NO.,Column1,Column2,Column3
72070,9196854,TGM,AP
72071,9172071,BGM,MP
72072,9184139,AGM,KN
72073,9172073,TGM,AP
I want to write a script or a single-line bash command using awk, so that each element in File 1 is matched against Column1 in File 2 and Column1, Column2 and Column3 are printed. Also, if an entry is not found, it should print the entry from File 1 with NA in Column2 and Column3.
Output: it should redirect the output to a new file, as below.
new_file
9190784,TGM,AP
9197256,NA,NA
9170546,NA,NA
9184139,AGM,KN
9196854,TGM,AP
I hope the query is understandable. Can anyone please help me with this?

standard join operation with awk
$ awk 'BEGIN {FS=OFS=","}
NR==FNR {a[$2]=$3 OFS $4; next}
{print $1, (($1 in a)?a[$1]:"NA" OFS "NA")}' file2 file1
substring variation (not tested)
$ awk 'BEGIN {FS=OFS=","}
NR==FNR {a[substr($2,1,7)]=$3 OFS $4; next}
{key=substr($1,1,7);
print $1, ((key in a)?a[key]:"NA" OFS "NA")}' file2 file1
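To write the result into a new file as the question asks, redirect the output of either command, e.g.:
$ awk 'BEGIN {FS=OFS=","} NR==FNR {a[$2]=$3 OFS $4; next} {print $1, (($1 in a)?a[$1]:"NA" OFS "NA")}' file2 file1 > new_file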

Does it have to be awk? It can be done with join:
Having two files:
echo '9190784
9197256
9170546
9184139
9196854' >file2
echo 'S NO.,Column1,Column2,Column3
72070,9196854,TGM,AP
72071,9172071,BGM,MP
72072,9184139,AGM,KN
72073,9172073,TGM,AP' > file1
One can join using , as the separator (-t,), on the second field of file1 (-12), with its header line removed (tail -n +2) and sorted on that field (sort -t, -k2), against the first field of file2 (-21), sorted with plain sort:
join -t, -12 -21 -o1.2,1.3,1.4 <(tail -n +2 file1 | sort -t, -k2) <(sort file2)
will output:
9184139,AGM,KN
9196854,TGM,AP
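Note that this drops the entries from file2 that have no match. Assuming GNU join, the unmatched entries can be kept and padded with NA by adding -a2 (print unpairable lines from the second file) and -e NA (fill missing output fields), selecting the join field as 0 in -o; the result then comes out in sort order rather than the original order of file2:
join -t, -12 -21 -a2 -e NA -o 0,1.3,1.4 <(tail -n +2 file1 | sort -t, -k2) <(sort file2)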

Related

Writing specific columns of one file into another file: who can give me a more concise solution?

I have a troublesome problem with writing specific columns of one file into another file. In more detail: given file1 below, I need to write its first column, excluding the header row, into file2 as a single line with fields separated by the '|' sign. I have a solution using sed and awk, but it is missing the last step of inserting the result at the top of file2, and given how powerful awk and sed are, I believe there should be a more concise solution. Can anyone offer a more concise script?
sed '1d;s/ .//' ./file1 | awk '{printf "%s|", $1; }' | awk '{if (NR != 0) {print substr($1, 1, length($1) - 1)}}'
file1:
col_name data_type comment
aaa string null
bbb int null
ccc int null
file2:
xxx ccc(whatever is this)
The result of file2 should be this :
aaa|bbb|ccc
xxx ccc(whatever is this)
Assuming there's no whitespace in the column 1 data, in increasing length:
sed -i "1i$(awk 'NR > 1 {print $1}' file1 | paste -sd '|')" file2
or
ed file2 <<END
1i
$(awk 'NR > 1 {print $1}' file1 | paste -sd '|')
.
wq
END
or
{ awk 'NR > 1 {print $1}' file1 | paste -sd '|'; cat file2; } | sponge file2
or
mapfile -t lines < <(tail -n +2 file1)
col1=( "${lines[#]%%[[:blank:]]*}" )
new=$(IFS='|'; echo "${col1[*]}"; cat file2)
echo "$new" > file2
This might work for you (GNU sed):
sed -z 's/[^\n]*\n//;s/\(\S*\).*/\1/mg;y/\n/|/;s/|$/\n/;r file2' file1
Process file1 "wholemeal" by using the -z command line option.
Remove the first line.
Remove all columns other than the first.
Replace newlines by |'s.
Replace the last | by a newline.
Append file2.
Alternative using just command line utils:
tail +2 file1 | cut -d' ' -f1 | paste -s -d'|' | cat - file2
Tail file1 from line 2 onwards.
Using the results from the tail command, isolate the first column using a space as the column delimiter.
Using the results from the cut command, serialize all lines into one, delimited by |'s.
Using the results from the paste command, append file2 using the cat command.
I'm learning awk at the moment.
awk 'BEGIN{a=""} {if(NR>1) a = a $1 "|"} END{a=substr(a, 1, length(a)-1); print a}' file1
Edit: Here's another version that uses an array:
awk 'NR > 1 {a[++n]=$1} END{for(i=1; i<=n; ++i){if(i>1) printf("|"); printf("%s", a[i])} printf("\n")}' file1
Here is a simple Awk script to merge the files as per your spec.
awk '# From the first file, merge all lines except the first
NR == FNR { if (FNR > 1) { printf "%s%s", sep, $1; sep = "|"; } next }
# We are in the second file; add a newline after data from first file
FNR == 1 { printf "\n" }
# Simply print all lines from file2
1' file1 file2
The NR==FNR condition is true while we are reading the first input file: the overall line number NR is equal to the line number within the current file FNR. The final 1 is a common idiom for printing all input lines that make it this far into the script (the next in the first block prevents lines from the first file from reaching this point).
For conciseness, you can remove the comments.
awk 'NR == FNR { if (FNR > 1) { printf "%s%s", sep, $1; sep = "|"; } next }
FNR == 1 { printf "\n" } 1' file1 file2
Generally speaking, Awk can do everything sed can do, so piping sed into Awk (or vice versa) is nearly always a useless use of sed.
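For instance, the asker's sed | awk | awk pipeline that builds aaa|bbb|ccc can be collapsed into a single awk invocation (a minimal sketch along the same lines as the script above):
awk 'NR > 1 { printf "%s%s", sep, $1; sep = "|" } END { print "" }' file1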

Compare columns in two text files and match lines

I want to compare the second column (delimited by a whitespace) in file1:
n01443537/n01443537_481.JPEG n01443537
n01629819/n01629819_420.JPEG n01629819
n02883205/n02883205_461.JPEG n02883205
With the second column (delimited by a whitespace) in file2:
val_8447.JPEG n09256479
val_68.JPEG n01443537
val_1054.JPEG n01629819
val_1542.JPEG n02883205
val_8480.JPEG n03089624
If there is a match, I would like to print out the corresponding line of file2.
Desired output in this example:
val_68.JPEG n01443537
val_1054.JPEG n01629819
val_1542.JPEG n02883205
I tried the following, but the output file is empty:
awk -F' ' 'NR==FNR{c[$2]++;next};c[$2] > 0' file1.txt file2.txt > file3.txt
Also tried this, but the result was the same (empty output file):
awk 'NR==FNR{a[$2];next}$2 in a' file1 file2 > file3.txt
GNU join exists for this purpose.
join -o "2.1 2.2" -j 2 <(sort -k 2 file1) <(sort -k 2 file2)
Using awk:
awk 'FNR==NR{a[$NF]; next} $NF in a' file1 file2
val_68.JPEG n01443537
val_1054.JPEG n01629819
val_1542.JPEG n02883205
Here is a grep alternative with process substitution:
grep -f <(awk '{print " " $NF "$"}' file1) file2
Using print " " $NF "$" creates a regex like " n01443537$" so that we match only the last column in grep.
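With the sample file1 above, the process substitution feeds grep the following patterns (a leading space plus an end-of-line anchor, so only the second column can match):
 n01443537$
 n01629819$
 n02883205$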

sed: to merge contents of one CSV to another based on variable match

Attempting to correlate GEO IP data from one CSV to an access log of another.
Sample lines of data:
CSV1
Bob,App1,8-Jan-15,8.8.8.8
April,App3,2-Jan-15,5.5.5.5
George,App2,1-Feb-15,8.8.8.8
CSV2
8.8.8.8,US,United States,CA,California,Mountain View,94040,America/Los_Angeles
5.5.5.5,US,United States,FL,Florida,Miami
I want to search CSV1 for any IP listed in CSV2 and append fields 1,2,4 to CSV1 when the IP matches.
So far I have the following, but I believe I'm getting errors in the sed portion.
#!/bin/bash
for LINE in $( cat CSV2 | awk -F',' '{print $1 "," $2 "," $4}' )
do
$IP = $( echo $LINE | cut -d, -f1 )
sed -i.bak "s/"$IP/\""$LINE\"" CSV1
done
Desired Output:
Bob,App1,8-Jan-15,8.8.8.8,United States,CA
Dawn,App3,2-Jan-15,5.5.5.5,United States,FL
George,App2,1-Feb-15,8.8.8.8,United States,CA
Using the join command:
$ join -t , -1 4 -2 1 -o 1.1,1.2,1.3,1.4,2.3,2.4 <(sort -t, -k4,4 CSV1) <(sort -t, CSV2)
Bob,App1,8-Jan-15,8.8.8.8,United States,CA
Using sort is overkill here, but for files with more than one line, join requires its input to be sorted on the join key.
With awk
$ awk -F, -v OFS=, 'NR == FNR {a[$1] = $3 OFS $4; next} $4 in a {print $0, a[$4]}' CSV2 CSV1
Bob,App1,8-Jan-15,8.8.8.8,United States,CA

Values missing in awk

My Input files :
file1
231|35000
234|15000
242|60000
254|12313
345|50000
435|24300
file2
1|madhan|retl|231|tcs
2|vaisakh|retl|234|tcs
4|sam|ins|242|infy
5|tina|bfs|254|tcs
3|ram|bfs|345|infy
6|subbu|bfs|435|infy
Output:
Trying to get col1 and col2 of file1, plus col2 of file2, based on the common column (col1 of file1 = col4 of file2).
My code :
awk 'BEGIN { FS="|";} NR==FNR{a[$1] = $2;next} ($4 in a) {print $2 "|" $4 "|" a[$1]} ' file_1 file_2
Output I got:
madhan|231|
vaisakh|234|
sam|242|
tina|254|
ram|345|
subbu|435|
Can you help me understand why the last column is coming out blank?
Try something like:
join -t '|' -1 1 -2 4 file1 file2 | awk -F'|' '{print $1 "|" $2 "|" $4}'
Join on field 1 of file1 and field 4 of file2, then extract the fields you need using awk.
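If GNU join is available, the field selection can also be done by join itself via -o, skipping the extra awk (a variant sketch; like the pipeline above, it assumes both files are ordered on their join fields):
join -t '|' -1 1 -2 4 -o 1.1,1.2,2.2 file1 file2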
This should do:
awk -F\| 'FNR==NR {a[$1]=$0;next} {for (i in a) if (i==$4) print a[i]"|"$2}' file1 file2
231|35000|madhan
234|15000|vaisakh
242|60000|sam
254|12313|tina
345|50000|ram
435|24300|subbu
It stores file1 in array a, using the first field as the index.
Then it tests each index from the first file against the fourth field in file2.
If they are equal, it prints the data from file1 and the second field from file2.
It is coming up blank because the key you look up does not exist in the array. You store the first column of file1 as the key (which corresponds to the 4th column of file2), but then print a[$1], and $1 of file2 is the serial number, not a stored key; the lookup should be a[$4].
$ awk '
BEGIN { FS=OFS="|" }
NR==FNR { a[$1]=$2; next }
($4 in a) { print $2, $4, a[$4] }
' file1 file2
madhan|231|35000
vaisakh|234|15000
sam|242|60000
tina|254|12313
ram|345|50000
subbu|435|24300
If you need the order shown in your requested output, then:
$ awk 'BEGIN {FS=OFS="|"}NR==FNR{a[$4]=$2;next} ($1 in a) {print $0, a[$1]}' file2 file1
231|35000|madhan
234|15000|vaisakh
242|60000|sam
254|12313|tina
345|50000|ram
435|24300|subbu

using AWK to join certain lines of two files and then sort the merged file

I use the following AWK command to sort the contents after the first 24 lines of a file:
awk 'NR <= 24; NR > 24 {print $0 | "sort -k3,3 -k4,4n"}' file > newFile
Now I want to join two files first (for now, simply discarding the first 24 lines of both files) and then sort the merged file. Is there any way to do it without generating a temporary merged file?
awk 'FNR > 24' file1 file2 | sort -k3,3 -k4,4n > newFile
FNR is the file record number (resets to 1 for the first line of each file). If you insist on having the sort inside the awk script, you can use:
awk 'FNR > 24 { print $0 | "sort -k3,3 -k4,4n" }' file1 file2 > newFile
but I prefer the shell to do my piping.
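If the FNR/NR distinction is new, a quick way to see it for yourself (with any two files) is:
awk '{ print FILENAME, NR, FNR }' file1 file2
which shows NR counting across both files while FNR restarts at 1 for the second file.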
