Compare two files and combine different columns of two files together into a single file using shell

I have two files file1.txt and file2.txt.
file1.txt
Amal=123=amal#gmail.com
Anil=342=anil#gmail.com
Ajith=548=ajith#gmail.com
Aravind=998=arav#gmail.com
file2.txt
Anil=Active
Amal=Active
Ajith=Inactive
Aravind=Active
Midhun=Active
I need to add an extra column to file1.txt, taken from file2.txt, that says whether each person is active or inactive. I also need to drop the lines of file2.txt whose names are not present in file1.txt. (For example, Midhun is not present in file1.txt, so I need to remove Midhun from file2.txt.)
My output file should be
output.txt
Amal=123=Active
Anil=342=Active
Ajith=548=Inactive
Aravind=998=Active
I tried the following. But it is not working.
while IFS= read -r line
do
key=`echo $line | awk -F "=" '{print $1}'` < file1.txt
key2=`echo $line | awk -F "=" '{print $2}'` < file1.txt
value=`echo $line | awk -F "=" '{print $2}'` < file2.txt
echo "$key=$key2=$value"
done

EDIT: The OP changed the requirement, so I am adding this solution now.
awk '
BEGIN{
FS=OFS="="
}
FNR==NR{
a[$1]=$2
next
}
($1 in a){
$3=""
sub(/=$/,"")
print $0,a[$1]
}
' Input_file2 Input_file1
This should be a simple task for awk; please try the following.
awk 'BEGIN{FS=OFS="="} FNR==NR{a[$1]=$2;next} ($1 in a){print $0,a[$1]}' file2 file1
Explanation: Adding detailed explanation for above code here.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section for this program from here.
FS=OFS="=" ##Setting FS and OFS value as = here for all lines.
} ##Closing BLOCK for BEGIN here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when first Input_file is being read.
a[$1]=$2 ##Creating array a with index $1 and value $2.
next ##next will skip all further statements from here.
}
($1 in a){ ##Checking condition if $1 of current line(from file1) is present in array a then do following.
print $0,a[$1] ##Printing current line and value of array a with index $1 of current line here.
}
' file2 file1 ##Mentioning Input_file names here.
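As a minimal sketch of the same lookup idea, the version below prints the first two fields explicitly instead of blanking $3, which may be easier to read. The function name merge_status and the scratch directory are hypothetical and only there so the example is self-contained; the sample data is copied from the question.

```shell
# Hypothetical helper (not from the answers above): keep the first two fields
# of each data line and append the status looked up by name.
merge_status() {
    # $1 = data file (name=id=email), $2 = lookup file (name=status)
    awk 'BEGIN { FS = OFS = "=" }
         FNR == NR { status[$1] = $2; next }        # first file read: build the lookup
         $1 in status { print $1, $2, status[$1] }  # second file read: matching names only
        ' "$2" "$1"
}

# Sample data copied from the question, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'Amal=123=amal#gmail.com' 'Anil=342=anil#gmail.com' \
              'Ajith=548=ajith#gmail.com' 'Aravind=998=arav#gmail.com' > "$tmp/file1.txt"
printf '%s\n' 'Anil=Active' 'Amal=Active' 'Ajith=Inactive' \
              'Aravind=Active' 'Midhun=Active' > "$tmp/file2.txt"

merge_status "$tmp/file1.txt" "$tmp/file2.txt" > "$tmp/output.txt"
cat "$tmp/output.txt"
```

The output keeps the line order of file1.txt, and Midhun never appears because only lines of file1.txt are printed.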

No need for scripting. Sort the files and then it's a simple join.
join -t= <(sort file1.txt) <(sort file2.txt)
To comply with the OP's update, let's cut only the first two fields of file1:
join -t= <(sort file1.txt | cut -d= -f-2) <(sort file2.txt)

Related

How to compare two files and print the values of both the files which are different

There are 2 files. I need to sort them first, then compare them, and for every row that differs print the values from both File 1 and File 2.
file1:
pair,bid,ask
AED/MYR,3.918000,3.918000
AED/SGD,3.918000,3.918000
AUD/CAD,3.918000,3.918000
file2:
pair,bid,ask
AUD/CAD,3.918000,3.918000
AUD/CNY,3.918000,3.918000
AED/MYR,4.918000,4.918000
Output should be:
pair,inputbid,inputask,outputbid,outtputask
AED/MYR,3.918000,3.918000,4.918000,4.918000
The only difference between the 2 files is AED/MYR, which has different bid/ask rates. How can I print the differing values from file 1 and file 2?
I tried using below commands:
nawk -F, 'NR==FNR{a[$1]=$4;a[$2]=$5;next} !($4 in a) || !($5 in a) {print $1 FS a[$1] FS a[$2] FS $4 FS $5}' file1 file2
Result output as below:
pair,bid,ask,bid,ask
AUD/CAD,3.918000,3.918000,3.918000,3.918000
AUD/CHF,3.918000,3.918000,3.918000,3.918000
AUD/CNH,3.918000,3.918000,3.918000,3.918000
AUD/CNY,3.918000,3.918000,3.918000,3.918000
AED/MYR,3.918000,3.918000,4.918000,4.918000
We are still not able to get only the difference.
Could you please try following, written and tested in GNU awk with shown samples.
awk -v header="pair,inputbid,inputask,outputbid,outtputask" '
BEGIN{
FS=OFS=","
}
FNR==NR{
arr[$1]=$0
next
}
($1 in arr) && arr[$1]!=$0{
val=$1
$1=""
sub(/^,/,"")
if(!found){
print header
found=1
}
print arr[val],$0
}' Input_file1 Input_file2
Explanation: Adding detailed explanation for above.
awk -v header="pair,inputbid,inputask,outputbid,outtputask" ' ##Starting awk program from here and setting this to header value here.
BEGIN{ ##Starting BEGIN section of this program from here.
FS=OFS="," ##Setting field separator and output field separator as comma here.
}
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when Input_file1 is being read.
arr[$1]=$0 ##Creating arr with index $1 and keep value as current line.
next ##next will skip all further statements from here.
}
($1 in arr) && arr[$1]!=$0{ ##Checking condition if first field is present in arr and its value NOT equal to $0
val=$1 ##Creating val which has the first field (the pair) of the current line in it.
$1="" ##Nullifying first field here.
sub(/^,/,"") ##Substitute starting , with NULL here.
if(!found){ ##Checking if found is NULL then do following.
print header ##Printing header here only once.
found=1 ##Setting found here.
}
print arr[val],$0 ##Printing arr with index of val and current line here.
}' Input_file1 Input_file2 ##Mentioning Input_files here.
With bash process substitution, then join and then choosing with awk:
# print header
printf "%s\n" "pair,inputbid,inputask,outputbid,outtputask"
# remove first line from both files, then sort them on first field
# then join them on first field and output first 5 fields
join -t, -11 -21 -o1.1,1.2,1.3,2.2,2.3 <(tail -n +2 file1 | sort -t, -k1,1) <(tail -n +2 file2 | sort -t, -k1,1) |
# output only those lines whose columns differ
awk -F, '$2 != $4 || $3 != $5'
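For comparison, here is a hedged sketch that does the whole job in one awk pass without sorting, storing bid and ask from the first file and comparing field by field while reading the second. The function name diff_rates is hypothetical, and the header text is passed in as an argument (copied verbatim from the question) rather than hard-coded.

```shell
# Hypothetical helper: print pairs present in both files whose bid or ask differ.
diff_rates() {
    # $1 = first file, $2 = second file, $3 = header line to print once
    awk -v header="$3" '
        BEGIN { FS = OFS = "," }
        FNR == 1 { next }                                # skip the header of each file
        FNR == NR { bid[$1] = $2; ask[$1] = $3; next }   # first file: remember the rates
        ($1 in bid) && (bid[$1] != $2 || ask[$1] != $3) {
            if (!printed++) print header                 # header only if something differs
            print $1, bid[$1], ask[$1], $2, $3
        }' "$1" "$2"
}

# Sample data copied from the question.
tmp=$(mktemp -d)
printf '%s\n' 'pair,bid,ask' 'AED/MYR,3.918000,3.918000' \
              'AED/SGD,3.918000,3.918000' 'AUD/CAD,3.918000,3.918000' > "$tmp/file1"
printf '%s\n' 'pair,bid,ask' 'AUD/CAD,3.918000,3.918000' \
              'AUD/CNY,3.918000,3.918000' 'AED/MYR,4.918000,4.918000' > "$tmp/file2"

diff_rates "$tmp/file1" "$tmp/file2" 'pair,inputbid,inputask,outputbid,outtputask'
```

Rows are reported in the order they appear in the second file; pipe the result through sort if sorted output is required.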

Compare two files (file1 & file2) and add one column from file2 to file1 if the first column of the two files matches

I have two files (file1 and file2)
file1
ABC=14.2.0.7.SAMPLE=git.xyz/plugins/gitiles/+/refs/heads/clientpatch/abc/patch142007
DEF=14.3.0.5.SAMPLE=git.xyz/plugins/gitiles/+/refs/heads/clientpatch/def/patch143005
DEF=14.3.0.5.SAMPLE2=git.calypso/plugins/gitiles/+/refs/heads/clientpatch/def/patch14300-calib
HIJ=12.0.0.0.Sp3.SAMPLE3=git.xyz/plugins/gitiles/+/refs/heads/clientpatch/hij/patch120000sp3
MNO=16.1.0.28.SAMPLE=git.xyz/plugins/gitiles/+/refs/heads/clientpatch/mno/patch161028
.......(150 lines)
file2
IJK = open
ABC = closed
PQR = closed
DEF = open
HIJ = open
LMN = closed
MNO = closed
PQR = open
......(> 150 lines)
output file
ABC=14.2.0.7.SAMPLE=git.xyz/plugins/gitiles/+/refs/heads/client/abc/patch142007=closed
DEF=14.3.0.5.SAMPLE=git.xyz/plugins/gitiles/+/refs/heads/client/def/patch143005=open
DEF=14.3.0.5.SAMPLE2=git.xyz/plugins/gitiles/+/refs/heads/client/def/patch14300-calib=open
HIJ=12.0.0.0.Sp3.SAMPLE3=git.xyz/plugins/gitiles/+/refs/heads/client/hij/patch120000sp3=open
MNO=16.1.0.28.SAMPLE=git.xyz/plugins/gitiles/+/refs/heads/client/mno/patch161028=closed
I have tried the following script. But it is not giving me any output. Not even printing anything. No errors
while IFS= read -r line
do
key1=`echo $line | awk -F "=" '{print $1}'` < file1
key2=`echo $line | awk -F "=" '{print $2}'` < file1
key3=`echo $line | awk -F "=" '{print $3}'` < file1
key4=`echo $line | awk -F "=" '{print $1}'` < file2
value3=`echo $line | awk -F "=" '{print $2}'` < file2
if [ "$key1" == "$key4" ]; then
echo "$key1=$key2=$key3=$value3"
fi
done
Giving a brief description for how the code should work.
The code should compare the first columns of the two files (file1 and file2). If the names match, it should produce the output file listed above; otherwise it should move on to the next line. It should work whether or not my two files are sorted.
Any help will be appreciated. Thank you.
Or another approach with awk that stores the file2 values in an array and then appends the correct state to the appropriate line in file1:
awk -F' = ' 'NR==FNR {a[$1]=$2; next} {print $0"="a[$1]}' file2 FS="=" file1
Example Use/Output
$ awk -F' = ' 'NR==FNR {a[$1]=$2; next} {print $0"="a[$1]}' file2 FS="=" file1
ABC=14.2.0.7.SAMPLE=git.xyz/plugins/gitiles/+/refs/heads/clientpatch/abc/patch142007=closed
DEF=14.3.0.5.SAMPLE=git.xyz/plugins/gitiles/+/refs/heads/clientpatch/def/patch143005=open
DEF=14.3.0.5.SAMPLE2=git.calypso/plugins/gitiles/+/refs/heads/clientpatch/def/patch14300-calib=open
HIJ=12.0.0.0.Sp3.SAMPLE3=git.xyz/plugins/gitiles/+/refs/heads/clientpatch/hij/patch120000sp3=open
MNO=16.1.0.28.SAMPLE=git.xyz/plugins/gitiles/+/refs/heads/clientpatch/mno/patch161028=closed
Could you please try following.
awk '
BEGIN{
OFS="="
}
FNR==NR{
a[$1]=$NF
next
}
($1 in a){
print $0,a[$1]
}
' Input_file2 FS="=" Input_file1
Explanation: Adding detailed explanation for above code.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section from here.
OFS="=" ##Setting OFS as = here for all lines.
}
FNR==NR{ ##Checking condition if FNR==NR which will be TRUE when file2 is being read.
a[$1]=$NF ##Creating an array a with index $1 and value is last field.
next ##next will skip all further statements from here.
}
($1 in a){ ##Checking condition if $1 of current line is present in array a then do following.
print $0,a[$1] ##Printing current line and value of array a with index $1.
}
' file2 FS="=" file1 ##Mentioning Input_file file2 and file1 and setting FS="=" for file1 here.
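Both answers above switch the field separator between the two files. A possible alternative, sketched here under the assumption that keys and states never contain "=" themselves, is a single regular-expression separator that tolerates optional spaces around "=" so the same FS works for both files. The function name append_state is hypothetical.

```shell
# Hypothetical helper: append the state from the state file to each data line.
append_state() {
    # $1 = data file (key=version=url), $2 = state file (key = state)
    awk 'BEGIN { FS = " *= *"; OFS = "=" }        # "=" with optional spaces on both sides
         FNR == NR { state[$1] = $2; next }       # a later duplicate key overwrites an earlier one
         $1 in state { print $0, state[$1] }      # $0 is untouched, so the line is kept verbatim
        ' "$2" "$1"
}

# A few sample lines copied from the question.
tmp=$(mktemp -d)
printf '%s\n' \
  'ABC=14.2.0.7.SAMPLE=git.xyz/plugins/gitiles/+/refs/heads/clientpatch/abc/patch142007' \
  'DEF=14.3.0.5.SAMPLE=git.xyz/plugins/gitiles/+/refs/heads/clientpatch/def/patch143005' \
  > "$tmp/file1"
printf '%s\n' 'IJK = open' 'ABC = closed' 'PQR = closed' 'DEF = open' > "$tmp/file2"

append_state "$tmp/file1" "$tmp/file2"
```

Note that a key listed twice in file2 (PQR in the question) ends up with its last state, which is also how the answers above behave.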

Replace the first column in a file with another column in different file using shell

I have two files file1 and file2
file1
Shyam=123=12.3.4.5=user#gmail.com
Shyam=123=12.2.5.4=user#gmail.com
Joshwa=234=14.3.4.67=user#gmail.com
Anil=879=15.3.4.98=user#gmail.com
Anil=765=15.4.5.65=user#gmail.com
.......
file2
Shyam=ShyamLal
Joshwa=JoshwaSam
Anil=AnilAcharya
....
"=" is used as the separator in both file1 and file2.
I want to update the names as given in file2, i.e. Shyam should be replaced with ShyamLal, Joshwa with JoshwaSam and Anil with AnilAcharya. I don't want to use if-else conditions because I have a large amount of data.
My output should be like:
ShyamLal=123=12.3.4.5=user#gmail.com
ShyamLal=123=12.2.5.4=user#gmail.com
JoshwaSam=234=14.3.4.67=user#gmail.com
AnilAcharya=879=15.3.4.98=user#gmail.com
AnilAcharya=765=15.4.5.65=user#gmail.com.
I tried this. But don't know whether I am doing right
while IFS= read -r line
do
key=`echo $line | awk -F "=" '{print $1}'` < file1.txt
value=`echo $line | awk -F "=" '{print $2}' < file2.txt`
cat file1.txt | sed 's/$key/$value/g'
done
How can I proceed?
Could you please try following.
awk '
BEGIN{
FS=OFS="="
}
FNR==NR{
a[$1]=$2
next
}
($1 in a){
$1=a[$1]
}
1
' Input_file2 Input_file1
Explanation: Adding detailed explanation for above code here.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section here.
FS=OFS="=" ##Setting FS and OFS as = for all lines here.
} ##Closing BLOCK for BEGIN section of this program here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when Input_file file2 is being read.
a[$1]=$2 ##Creating an array named a with index $1 with value of $2 of current line.
next ##next will skip all further statements from here.
}
($1 in a){ ##Checking condition if $1 is present in array a this will be done when Input_file1 is being read.
$1=a[$1] ##Setting $1 to array a value with index $1 of current line.
}
1 ##1 will print edited/non-edited line here.
' file2 file1 ##Mentioning Input_file names here.
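The same approach wrapped in a small function, as a sketch with the question's sample data. The name rename_first_column is hypothetical. Lines whose first field has no entry in the mapping file pass through unchanged.

```shell
# Hypothetical helper: replace the first field using an old=new mapping file.
rename_first_column() {
    # $1 = data file, $2 = mapping file (old=new)
    awk 'BEGIN { FS = OFS = "=" }
         FNR == NR { new[$1] = $2; next }   # mapping file: remember the replacement
         $1 in new { $1 = new[$1] }         # data file: swap the name when one is known
         { print }                          # print every data line, edited or not
        ' "$2" "$1"
}

# Sample data copied from the question.
tmp=$(mktemp -d)
printf '%s\n' 'Shyam=123=12.3.4.5=user#gmail.com' 'Shyam=123=12.2.5.4=user#gmail.com' \
              'Joshwa=234=14.3.4.67=user#gmail.com' 'Anil=879=15.3.4.98=user#gmail.com' \
              > "$tmp/file1"
printf '%s\n' 'Shyam=ShyamLal' 'Joshwa=JoshwaSam' 'Anil=AnilAcharya' > "$tmp/file2"

rename_first_column "$tmp/file1" "$tmp/file2"
```

Because the lookup is an array access, the cost does not grow with the number of names the way a chain of if-else tests would.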

Condition on Nth character of string in a Mth column in bash

I have a sample
$ cat c.csv
a,1234543,c
b,1231456,d
c,1230654,e
I need to grep only the lines where the 4th character of the 2nd column is not 0 or 1.
Output must be
a,1234543,c
I know this only
awk -F, 'BEGIN { OFS = FS } $2 ~/^[2-9]/' c.csv
Is it possible to put a condition on 4th character?
Could you please try following.
awk 'BEGIN{FS=","} substr($2,4,1)!=0 && substr($2,4,1)!=1' Input_file
OR as per Ed's suggestion:
awk 'BEGIN{FS=","} substr($2,4,1)!~/[01]/' Input_file
Explanation: Adding a detailed explanation for above code here.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section from here.
FS="," ##Setting field separator as comma here.
} ##Closing BLOCK for this program BEGIN section.
substr($2,4,1)!=0 && substr($2,4,1)!=1 ##Checking that the 4th character of the 2nd field is neither 0 nor 1; if so, print the current line.
' Input_file ##Mentioning Input_file name here.
This might work for you (GNU sed or grep):
grep -vE '^([^,]*,){1}[^,]{3}[01]' file
or:
sed -E '/^([^,]*,){1}[^,]{3}[01]/d' file
Replace the 1 with m-1 to target the m-th column, and the 3 with n-1 to target the n-th character in that column.
Grep is the answer.
But here is another way using array and variable substitution
test=( $(cat c.csv) ) # load c.csv data to an array
echo ${test[@]//*,???[0-1]*/} # print all items from an array,
# but remove the ones that correspond to this regex *,???[0-1]*
# so 'b,1231456,d' and 'c,1230654,e' from example will be removed
# and only 'a,1234543,c' will be printed
There are many ways to do this with awk. The most literal form would be:
4th character of 2nd column is not 0 or 1
$ awk -F, '($2 !~ /^...[01]/)' file
$ awk -F, '($2 ~ /^...[^01]/)' file
These will also match a line a,abcdefg,b
2nd column is an integer and 4th character is not 0 or 1
$ awk -F, '($2+0==$2) && ($2 !~ /[.]/) && ($2 !~ /^...[01]/)' file
$ awk -F, '($2 ~ /^[0-9][0-9][0-9][2-9][0-9]*$/)' file
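To generalize to the N-th character of the M-th column, as the title asks, the column and position can be passed in as awk variables. This is only a sketch: the function name filter_nth_char and the variable names m and n are made up for illustration.

```shell
# Hypothetical helper: keep lines whose n-th character of column m is not 0 or 1.
filter_nth_char() {
    # $1 = CSV file, $2 = column number m, $3 = character position n
    # A column shorter than n characters yields an empty substring, which
    # does not match the pattern, so such lines are kept as well.
    awk -F, -v m="$2" -v n="$3" 'substr($m, n, 1) !~ /^[01]$/' "$1"
}

# Sample data copied from the question.
tmp=$(mktemp -d)
printf '%s\n' 'a,1234543,c' 'b,1231456,d' 'c,1230654,e' > "$tmp/c.csv"

filter_nth_char "$tmp/c.csv" 2 4
```

With the sample file and m=2, n=4 this keeps only the a,1234543,c line.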

Compare two columns of two files and display the third column with a note on whether they match or not in Unix

I would like to compare the first two columns of two files, file1.txt and file2.txt, and write to another file, output.txt, the third column of both file1 and file2 along with a note saying whether the first two columns match or not.
file1.txt
ab|2001|name1
cd|2000|name2
ef|2002|name3
gh|2003|name4
file2.txt
xy|2001|name5
cd|2000|name6
ef|2002|name7
gh|2003|name8
output.txt
name1 name5 does not match
name2 name6 matches
name3 name7 matches
name4 name8 matches
Welcome to Stack Overflow. Could you please try the following and let me know if it helps you.
awk -F"|" 'FNR==NR{a[$2]=$1;b[$2]=$3;next} ($2 in a) && ($1==a[$2]){print b[$2],$3,"matched properly.";next} {print b[$2],$3,"Does NOT matched."}' file1.txt file2.txt
EDIT: Adding a non-one liner form of solution too here.
awk -F"|" '
FNR==NR{
a[$2]=$1;
b[$2]=$3;
next
}
($2 in a) && ($1==a[$2]){
print b[$2],$3,"matched properly.";
next
}
{
print b[$2],$3,"Does NOT matched."
}
' file1.txt file2.txt
Explanation: Adding explanation for above code.
awk -F"|" ' ##Starting awk program from here and setting field separator as | here.
FNR==NR{ ##Checking condition if FNR==NR which will be TRUE when file1.txt is being read.
a[$2]=$1; ##Creating an array named a whose index is $2 and value is $1.
b[$2]=$3; ##Creating an array named b whose index is $2 and value is $3.
next ##next will skip all further statements from here.
} ##Closing BLOCK for FNR==NR condition here.
($2 in a) && ($1==a[$2]){ ##Checking condition if $2 is in array a AND first field is equal to value of array a value of index $2.
print b[$2],$3,"matched properly."; ##Printing array b value with index $2 and 3rd field with string value matched properly.
next ##next will skip all statements from here.
} ##Closing BLOCK for above condition here.
{
print b[$2],$3,"Does NOT matched." ##Printing value of array b with index $2 and 3rd field with string Does NOT matched here.
}
' file1.txt file2.txt ##Mentioning Input_file names here.
You can use paste and awk to get what you want.
Below solution is assuming the fields in file1 and file2 will be always delimited by "|"
paste -d "|" file1.txt file2.txt | awk -F "|" '{ if( $1 == $4 && $2 == $5 ){print $3, $6, "matches"} else {print $3, $6, "does not match"} }' > output.txt
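A variation without paste, sketched under the assumption that both files have the same number of rows and should be compared row by row (which is what the expected output in the question suggests). It indexes the first file by line number and uses the question's wording for the result. The function name compare_rows is hypothetical.

```shell
# Hypothetical helper: compare columns 1 and 2 of two files row by row.
compare_rows() {
    # $1 = first file, $2 = second file, both "|"-delimited
    awk -F'|' '
        FNR == NR { key[FNR] = $1 FS $2; name[FNR] = $3; next }  # first file, by row number
        {
            verdict = (key[FNR] == ($1 FS $2)) ? "matches" : "does not match"
            print name[FNR], $3, verdict
        }' "$1" "$2"
}

# Sample data copied from the question.
tmp=$(mktemp -d)
printf '%s\n' 'ab|2001|name1' 'cd|2000|name2' 'ef|2002|name3' 'gh|2003|name4' > "$tmp/file1.txt"
printf '%s\n' 'xy|2001|name5' 'cd|2000|name6' 'ef|2002|name7' 'gh|2003|name8' > "$tmp/file2.txt"

compare_rows "$tmp/file1.txt" "$tmp/file2.txt" > "$tmp/output.txt"
cat "$tmp/output.txt"
```

If the rows of the two files are not in the same order, a key-based lookup like the first answer is the safer choice.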

Resources