Compare two columns of two files and display the third column, indicating whether they match or not, in Unix shell
I would like to compare the first two columns of two files, file1.txt and file2.txt, and write to another file, output.txt, the third column of both files together with a note saying whether the first two columns match or not.
file1.txt
ab|2001|name1
cd|2000|name2
ef|2002|name3
gh|2003|name4
file2.txt
xy|2001|name5
cd|2000|name6
ef|2002|name7
gh|2003|name8
output.txt
name1 name5 does not match
name2 name6 matches
name3 name7 matches
name4 name8 matches
Welcome to Stack Overflow. Could you please try the following and let me know if it helps?
awk -F"|" 'FNR==NR{a[$2]=$1;b[$2]=$3;next} ($2 in a) && ($1==a[$2]){print b[$2],$3,"matched properly.";next} {print b[$2],$3,"Does NOT matched."}' file1.txt file2.txt
EDIT: Adding a multi-line form of the solution here as well.
awk -F"|" '
FNR==NR{
a[$2]=$1;
b[$2]=$3;
next
}
($2 in a) && ($1==a[$2]){
print b[$2],$3,"matched properly.";
next
}
{
print b[$2],$3,"Does NOT matched."
}
' file1.txt file2.txt
Explanation: an annotated version of the above code.
awk -F"|" ' ##Start the awk program and set the field separator to |.
FNR==NR{ ##FNR==NR is TRUE only while the first file, file1.txt, is being read.
a[$2]=$1; ##Create an array named a whose index is $2 and whose value is $1.
b[$2]=$3; ##Create an array named b whose index is $2 and whose value is $3.
next ##next skips all further statements for the current line.
} ##Close the block for the FNR==NR condition.
($2 in a) && ($1==a[$2]){ ##For file2.txt: check that $2 is an index of array a AND that $1 equals the value stored in a[$2].
print b[$2],$3,"matched properly."; ##Print the value of b[$2], the 3rd field and the string "matched properly.".
next ##next skips all further statements for the current line.
} ##Close the block for the above condition.
{ ##This block runs for every file2.txt line that did not match above.
print b[$2],$3,"Does NOT matched." ##Print the value of b[$2], the 3rd field and the string "Does NOT matched.".
}
' file1.txt file2.txt ##Mention the input file names.
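The expected output in the question pairs the files line by line, so a positional variant may also be worth a look. What follows is only a sketch under that assumption (line N of file1.txt corresponds to line N of file2.txt); the array names key and name are my own, and the sample files are recreated in a scratch directory.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' 'ab|2001|name1' 'cd|2000|name2' 'ef|2002|name3' 'gh|2003|name4' > file1.txt
printf '%s\n' 'xy|2001|name5' 'cd|2000|name6' 'ef|2002|name7' 'gh|2003|name8' > file2.txt

awk -F'|' '
FNR == NR {                 # first file: remember columns 1-2 and column 3 by line number
    key[FNR]  = $1 FS $2
    name[FNR] = $3
    next
}
{                           # second file: compare against the same line number of file1.txt
    verdict = (key[FNR] == ($1 FS $2)) ? "matches" : "does not match"
    print name[FNR], $3, verdict
}' file1.txt file2.txt > output.txt

cat output.txt
```

Unlike the keyed version above, this does not depend on the second column being unique, but it does depend on both files having their rows in the same order.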
You can use paste and awk to get what you want.
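The solution below assumes that the fields in file1 and file2 are always delimited by "|". Since paste joins the files line by line, it also relies on both files listing their rows in the same order.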
paste -d "|" file1.txt file2.txt | awk -F "|" '{ if( $1 == $4 && $2 == $5 ){print $3, $6, "matches"} else {print $3, $6, "does not match"} }' > output.txt
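To see why the awk part compares $1 with $4 and $2 with $5, it can help to look at what paste produces on its own. This is a small sketch using the first two sample rows only; merged.txt is a name I made up for the intermediate file.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' 'ab|2001|name1' 'cd|2000|name2' > file1.txt
printf '%s\n' 'xy|2001|name5' 'cd|2000|name6' > file2.txt

# paste glues line N of file1.txt to line N of file2.txt with "|",
# so every merged line has six fields: $1-$3 from file1, $4-$6 from file2.
paste -d '|' file1.txt file2.txt > merged.txt
cat merged.txt

awk -F'|' '{ print $3, $6, ($1 == $4 && $2 == $5 ? "matches" : "does not match") }' merged.txt > output.txt
cat output.txt
```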
Related
awk from file using echo and output to file
A.txt contains:
/*333*/
asdfasdfadfg
sadfasdfasgadas
###
/*555*/
hfawehfihohawe
aweihfiwahif
aiwehfwwh
###
/*777*/
jawejfiawjia
ajwiejfjeiie
eiuehhawefjj
###
B.txt contains:
555
777
I want to create the loop, for each string found in B.txt, then output the '/*'[the string] until right before the first '###' met to each own file (the string name is also used as file name). So based on the sample above, the result should be:
555.txt, which contains:
/*555*/
hfawehfihohawe
aweihfiwahif
aiwehfwwh
and 777.txt, which contains:
/*777*/
jawejfiawjia
ajwiejfjeiie
eiuehhawefjj
I tried this script but it outputs nothing:
for i in `cat B.txt`; do echo $i | awk '/{print "/*"$1}/{flag=1} /###/{flag=0} flag' A.txt > $i.txt; done
Thank you in advance
With your shown samples, please try the following awk code. Written and tested in GNU awk; it should work in any awk. Here file1 stands for A.txt and file2 for B.txt.
awk '
FNR==NR{
  if($0~/^\/\*/){
    line=$0
    gsub(/^\/\*|\*\/$/,"",line)
    arr[++count]=$0
    arr1[line]=count
    next
  }
  if($0~/^###/){
    next
  }
  arr[count]=(arr[count]?arr[count] ORS:"") $0
  next
}
($0 in arr1){
  outputFile=$0".txt"
  print arr[arr1[$0]] >> (outputFile)
  close(outputFile)
}
' file1 file2
Explanation: Adding a detailed explanation for the above code.
awk ' ##Start the awk program.
FNR==NR{ ##FNR==NR is TRUE only while file1 is being read.
  if($0~/^\/\*/){ ##If the current line starts with /* then do the following.
    line=$0 ##Copy $0 into the variable line.
    gsub(/^\/\*|\*\/$/,"",line) ##Use gsub to remove the leading /* and the trailing */ from line.
    arr[++count]=$0 ##Create arr with index ++count and value $0.
    arr1[line]=count ##Create arr1 with index line and value count.
    next ##next skips all further statements for the current line.
  }
  if($0~/^###/){ ##Skip the ### terminator lines so that they are not written to the output files.
    next
  }
  arr[count]=(arr[count]?arr[count] ORS:"") $0 ##Append the current line to the value already stored in arr[count].
  next ##next skips all further statements for the current line.
}
($0 in arr1){ ##For file2: if the current line is an index of arr1 then do the following.
  outputFile=$0".txt" ##Build the output file name from the current line plus .txt.
  print arr[arr1[$0]] >> (outputFile) ##Print the arr value with index arr1[$0] to outputFile.
  close(outputFile) ##Close outputFile to avoid a "too many open files" error.
}
' file1 file2 ##Mention the input file names.
Making a few alterations to your code provides the desired outcome with the example data provided:
while read -r f
do
  awk -v var="/[*]$f[*]/" '$0 ~ var {flag=1} /###/{flag=0} flag' A.txt > "$f".txt
done < B.txt

cat 555.txt
/*555*/
hfawehfihohawe
aweihfiwahif
aiwehfwwh

cat 777.txt
/*777*/
jawejfiawjia
ajwiejfjeiie
eiuehhawefjj

Does this solve your problem?
Here is another awk solution for this:
awk '
FNR == NR {
  map["/*" $0 "*/"] = $0
  next
}
$0 in map {
  fn = map[$0] ".txt"
}
/^###$/ {
  close(fn)
  fn = ""
}
fn {print > fn}' B.txt A.txt
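For anyone who wants to try the map-based approach end to end, here is a self-contained sketch. It assumes that every marker such as /*555*/ and every ### sits on a line of its own in A.txt, which is how I read the sample; the scratch directory is only there to keep the generated files out of the way.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' '/*333*/' 'asdfasdfadfg' 'sadfasdfasgadas' '###' \
              '/*555*/' 'hfawehfihohawe' 'aweihfiwahif' 'aiwehfwwh' '###' \
              '/*777*/' 'jawejfiawjia' 'ajwiejfjeiie' 'eiuehhawefjj' '###' > A.txt
printf '%s\n' 555 777 > B.txt

awk '
FNR == NR {                       # B.txt: build "/*555*/" -> "555"
    map["/*" $0 "*/"] = $0
    next
}
$0 in map { fn = map[$0] ".txt" } # a wanted marker opens a new output file
/^###$/ {                         # terminator: stop writing, close the file
    if (fn != "") close(fn)
    fn = ""
}
fn != "" { print > fn }           # copy lines only while a file is selected
' B.txt A.txt

cat 555.txt
```

Blocks whose marker is not listed in B.txt (333 here) never set fn, so no file is created for them.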
How to compare two files and print the values of both the files which are different
There are 2 files. I need to sort them first and then compare the 2 files and then the difference I need to print the value from File 1 and File 2.
file1:
pair,bid,ask
AED/MYR,3.918000,3.918000
AED/SGD,3.918000,3.918000
AUD/CAD,3.918000,3.918000
file2:
pair,bid,ask
AUD/CAD,3.918000,3.918000
AUD/CNY,3.918000,3.918000
AED/MYR,4.918000,4.918000
Output should be:
pair,inputbid,inputask,outputbid,outtputask
AED/MYR,3.918000,3.918000,4.918000,4.918000
The only difference in 2 files is AED/MYR with different bid/ask rates. How can I print difference value from file 1 and file 2. I tried using below commands:
nawk -F, 'NR==FNR{a[$1]=$4;a[$2]=$5;next} !($4 in a) || !($5 in a) {print $1 FS a[$1] FS a[$2] FS $4 FS $5}' file1 file2
Result output as below:
pair,bid,ask,bid,ask
AUD/CAD,3.918000,3.918000,3.918000,3.918000
AUD/CHF,3.918000,3.918000,3.918000,3.918000
AUD/CNH,3.918000,3.918000,3.918000,3.918000
AUD/CNY,3.918000,3.918000,3.918000,3.918000
AED/MYR,3.918000,3.918000,4.918000,4.918000
We are still not able to get only the difference.
Could you please try the following, written and tested in GNU awk with the shown samples.
awk -v header="pair,inputbid,inputask,outputbid,outtputask" '
BEGIN{
  FS=OFS=","
}
FNR==NR{
  arr[$1]=$0
  next
}
($1 in arr) && arr[$1]!=$0{
  val=$1
  $1=""
  sub(/^,/,"")
  if(!found){
    print header
    found=1
  }
  print arr[val],$0
}' Input_file1 Input_file2
Explanation: Adding a detailed explanation for the above.
awk -v header="pair,inputbid,inputask,outputbid,outtputask" ' ##Start the awk program and store the header line in the variable header.
BEGIN{ ##Start the BEGIN section of this program.
  FS=OFS="," ##Set the field separator and the output field separator to a comma.
}
FNR==NR{ ##FNR==NR is TRUE only while Input_file1 is being read.
  arr[$1]=$0 ##Create arr with index $1 and the whole current line as its value.
  next ##next skips all further statements for the current line.
}
($1 in arr) && arr[$1]!=$0{ ##For Input_file2: check that the first field is an index of arr and that the stored line differs from $0.
  val=$1 ##Save the first field in val.
  $1="" ##Empty the first field.
  sub(/^,/,"") ##Remove the leading comma left behind by the emptied field.
  if(!found){ ##If found is not set yet then do the following.
    print header ##Print the header, only once.
    found=1 ##Set found.
  }
  print arr[val],$0 ##Print the Input_file1 line stored under val followed by the rest of the current line.
}' Input_file1 Input_file2 ##Mention the input files.
With bash process substitution, then join and then choosing with awk:
# print header
printf "%s\n" "pair,inputbid,inputask,outputbid,outtputask"
# remove first line from both files, then sort them on first field
# then join them on first field and output first 5 fields
join -t, -11 -21 -o1.1,1.2,1.3,2.2,2.3 <(tail -n +2 file1 | sort -t, -k1) <(tail -n +2 file2 | sort -t, -k1) |
# output only those lines, that columns differ
awk -F, '$2 != $4 || $3 != $5'
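Here is a self-contained sketch of the same idea that can be pasted into a shell to check the behaviour on the question's sample data. It is one possible variant, not the only way: it stores bid and ask in two arrays (the names bid, ask and printed are mine), skips the header line of both files, and writes to a hypothetical diff.csv.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' 'pair,bid,ask' 'AED/MYR,3.918000,3.918000' \
              'AED/SGD,3.918000,3.918000' 'AUD/CAD,3.918000,3.918000' > file1
printf '%s\n' 'pair,bid,ask' 'AUD/CAD,3.918000,3.918000' \
              'AUD/CNY,3.918000,3.918000' 'AED/MYR,4.918000,4.918000' > file2

awk '
BEGIN { FS = OFS = "," }
FNR == 1 { next }                          # skip the header line of each file
FNR == NR {                                # file1: remember bid and ask per pair
    bid[$1] = $2
    ask[$1] = $3
    next
}
($1 in bid) && (bid[$1] != $2 || ask[$1] != $3) {
    if (!printed++)                        # emit the header once, before the first difference
        print "pair,inputbid,inputask,outputbid,outtputask"
    print $1, bid[$1], ask[$1], $2, $3
}' file1 file2 > diff.csv

cat diff.csv
```

Pairs that exist in only one of the files (AED/SGD and AUD/CNY here) are ignored, and no sorting is needed because the lookup goes through the array index.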
Replace the first column in a file with another column in different file using shell
I have two files, file1 and file2.
file1
Shyam=123=12.3.4.5=user#gmail.com
Shyam=123=12.2.5.4=user#gmail.com
Joshwa=234=14.3.4.67=user#gmail.com
Anil=879=15.3.4.98=user#gmail.com
Anil=765=15.4.5.65=user#gmail.com
.......
file2
Shyam=ShyamLal
Joshwa=JoshwaSam
Anil=AnilAcharya
....
"=" is used as the separator in file1 and file2. I want to update the names as given in file2, i.e. Shyam will be replaced with ShyamLal, Joshwa will be replaced with JoshwaSam and Anil will be replaced with AnilAcharya. I don't want to use if-else conditions, because I have a large amount of data. My output should be like:
ShyamLal=123=12.3.4.5=user#gmail.com
ShyamLal=123=12.2.5.4=user#gmail.com
JoshwaSam=234=14.3.4.67=user#gmail.com
AnilAcharya=879=15.3.4.98=user#gmail.com
AnilAcharya=765=15.4.5.65=user#gmail.com
I tried this, but I don't know whether I am doing it right:
while IFS= read -r line
do
  key=`echo $line | awk -F "=" '{print $1}'` < file1.txt
  value=`echo $line | awk -F "=" '{print $2}' < file2.txt`
  cat file1.txt | sed 's/$key/$value/g'
done
How can I proceed?
Could you please try the following.
awk '
BEGIN{
  FS=OFS="="
}
FNR==NR{
  a[$1]=$2
  next
}
($1 in a){
  $1=a[$1]
}
1
' Input_file2 Input_file1
Explanation: Adding a detailed explanation for the above code here.
awk ' ##Start the awk program.
BEGIN{ ##Start the BEGIN section.
  FS=OFS="=" ##Set FS and OFS to = for all lines.
} ##Close the BEGIN block of this program.
FNR==NR{ ##FNR==NR is TRUE only while the first input file, file2, is being read.
  a[$1]=$2 ##Create an array named a with index $1 and value $2 of the current line.
  next ##next skips all further statements for the current line.
}
($1 in a){ ##While file1 is being read, check whether $1 is an index of array a.
  $1=a[$1] ##Replace $1 with the value stored in a[$1].
}
1 ##1 prints the edited or unedited line.
' file2 file1 ##Mention the input file names.
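A quick way to convince yourself that names missing from file2 are left alone is to run the command on a reduced sample. In this sketch the row starting with Unknown and the output file name updated are hypothetical additions of mine, not part of the question.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' 'Shyam=123=12.3.4.5=user#gmail.com' \
              'Joshwa=234=14.3.4.67=user#gmail.com' \
              'Unknown=1=1.1.1.1=user#gmail.com' > file1   # last row: hypothetical, has no mapping
printf '%s\n' 'Shyam=ShyamLal' 'Joshwa=JoshwaSam' 'Anil=AnilAcharya' > file2

# Read the mapping (file2) first, then rewrite field 1 of file1 where a mapping exists.
awk 'BEGIN{FS=OFS="="} FNR==NR{a[$1]=$2; next} ($1 in a){$1=a[$1]} 1' file2 file1 > updated

cat updated
```

Because the lookup is a single array access per line, the cost does not grow with the number of names the way a chain of if-else tests or one sed call per name would.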
How to run a bash script in a loop
I wrote a bash script in order to pull substrings and save it to an output file from two input files that looks like this:
input file 1
>genotype1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
input file 2
gene1 10 20
gene2 40 50
genen x y
my script
>output_file
cat input_file2 | while read row; do
  echo $row > temp
  geneName=`awk '{print $1}' temp`
  startPos=`awk '{print $2}' temp`
  endPos=`awk '{print $3}' temp`
  length=$(expr $endPos - $startPos)
  for i in temp; do
    echo ">${geneName}" >> genes_fasta
    awk -v S=$startPos -v L=$length '{print substr($0,S,L)}' input_file1 >> output file
  done
done
How can I make it work in a loop for more than one string in the input file 1? The new input file looks like this:
>genotype1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>genotype2
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
>genotypen...
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn...
I would like to have a different out file for every genotype and that the file name would be the genotype name. Thank you!
If I'm understanding correctly, would you try the following:
awk '
FNR==NR {
    name[NR] = $1
    start[NR] = $2
    len[NR] = $3 - $2
    count = NR
    next
}
/^>/ {
    sub(/^>/,"")
    genotype=$0
    next
}
{
    for (i = 1; i <= count; i++) {
        print ">" name[i] > genotype
        print substr($0, start[i], len[i]) >> genotype
    }
    close(genotype)
}' input_file2 input_file1

input_file1:
>genotype1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>genotype2
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
>genotype3
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn

Input_file2:
gene1 10 20
gene2 40 50
gene3 20 25

[Results]
genotype1:
>gene1
aaaaaaaaaa
>gene2
aaaaaaaaaa
>gene3
aaaaa
genotype2:
>gene1
bbbbbbbbbb
>gene2
bbbbbbbbbb
>gene3
bbbbb
genotype3:
>gene1
nnnnnnnnnn
>gene2
nnnnnnnnnn
>gene3
nnnnn

[EDIT]
If you want to store the output files in a different directory, please try the following instead:
dir="./outdir"  # directory name to store the output files
                # you can modify the name as you want
mkdir -p "$dir"
awk -v dir="$dir" '
FNR==NR {
    name[NR] = $1
    start[NR] = $2
    len[NR] = $3 - $2
    count = NR
    next
}
/^>/ {
    sub(/^>/,"")
    genotype=$0
    next
}
{
    for (i = 1; i <= count; i++) {
        print ">" name[i] > (dir"/"genotype)
        print substr($0, start[i], len[i]) >> (dir"/"genotype)
    }
    close(dir"/"genotype)
}' input_file2 input_file1
The first two lines are executed in bash to define and create the destination directory. The directory name is then passed to awk via the -v option. The concatenated file name after > and >> is parenthesized because an unparenthesized concatenation there is parsed differently by different awk implementations. Hope this helps.
Could you please try the following, where I am assuming that the line of Input_file1 which starts with > should be compared with the first column of Input_file2 (the samples are confusing, so this has been written based on the OP's attempt). Note that the third argument of substr is a length, not an end position, so the length is computed as end minus start, as in the OP's script.
awk '
FNR==NR{
  start_point[$1]=$2
  end_point[$1]=$3
  next
}
/^>/{
  sub(/^>/,"")
  val=$0
  next
}
{
  print val ORS substr($0,start_point[val],end_point[val]-start_point[val])
  val=""
}
' Input_file2 Input_file1
Explanation: Adding an explanation for the above code.
awk ' ##Start the awk program.
FNR==NR{ ##FNR==NR is TRUE only while the first input file, Input_file2, is being read.
  start_point[$1]=$2 ##Create an array named start_point with index $1 of the current line and value $2.
  end_point[$1]=$3 ##Create an array named end_point with index $1 of the current line and value $3.
  next ##next skips all further statements for the current line.
}
/^>/{ ##If a line starts with > then do the following.
  sub(/^>/,"") ##Remove the leading >.
  val=$0 ##Create a variable val whose value is $0.
  next ##next skips all further statements for the current line.
}
{
  print val ORS substr($0,start_point[val],end_point[val]-start_point[val]) ##Print val, a newline (ORS) and the sub-string of the current line that starts at start_point[val] and has length end_point[val]-start_point[val].
  val="" ##Reset the variable val.
}
' Input_file2 Input_file1 ##Mention the input file names.
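Since both answers hinge on how substr counts, here is a tiny check. awk's substr(s, m, n) takes a 1-based start m and a length n, so a start/end pair has to be turned into a length first. The variable names S and E and the alphabet string are just for illustration.

```shell
# Positions 10..19 of the alphabet, expressed as start=10, end=20 (end is exclusive here,
# matching the "length = end - start" convention used in the question's script).
out=$(printf '%s\n' 'abcdefghijklmnopqrstuvwxyz' |
      awk -v S=10 -v E=20 '{ print substr($0, S, E - S) }')
printf '%s\n' "$out"
```

If the end position is meant to be inclusive, the length would be E - S + 1 instead; the question does not say which convention the coordinates follow.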
awk to input column data from one file to another based on a match
Objective
I am trying to fill out $9 (booking ref) and $10 (client) in file1.csv with information pulled from $2 (booking ref) and $3 (client) of file2.csv using "CAMPAIGN ID" ($5 in file1.csv and $1 in file2.csv). So where I have a match between the two files based on "CAMPAIGN ID" I want to print the columns of file2.csv into the matching rows of file1.csv.
File1.csv
INVOICE,CLIENT,PLATFORM,CAMPAIGN NAME,CAMPAIGN ID,IMPS,TFS,PRICE,Booking Ref,client
BOB-UK,clientname1,platform_1,campaign1,20572431,5383594,0.05,2692.18,,
BOB-UK,clientname2,platform_1,campaign2,20589101,4932821,0.05,2463.641,,
BOB-UK,clientname1,platform_1,campaign3,23030494,4795549,0.05,2394.777,,
BOB-UK,clientname1,platform_1,campaign4,22973424,5844194,0.05,2925.21,,
BOB-UK,clientname1,platform_1,campaign5,21489000,4251031,0.05,2122.552,,
BOB-UK,clientname1,platform_1,campaign6,23150347,3123945,0.05,1561.197,,
BOB-UK,clientname3,platform_1,campaign7,23194965,2503875,0.05,1254.194,,
BOB-UK,clientname3,platform_1,campaign8,20578983,1522448,0.05,765.1224,,
BOB-UK,clientname3,platform_1,campaign9,22243554,920166,0.05,463.0083,,
BOB-UK,clientname1,platform_1,campaign10,20572149,118865,0.05,52.94325,,
BOB-UK,clientname2,platform_1,campaign11,23077785,28077,0.05,14.40385,,
BOB-UK,clientname2,platform_1,campaign12,21811100,5439,0.05,5.27195,,
File2.csv
CAMPAIGN ID,Booking Ref,client
20572431,ref1,1
21489000,ref2,1
23030494,ref3,1
22973424,ref4,1
23150347,ref5,1
20572149,ref6,1
20578983,ref7,2
22243554,ref8,2
20589101,ref9,3
23077785,ref10,3
21811100,ref11,3
23194965,ref12,3
Desired Output
INVOICE,CLIENT,PLATFORM,CAMPAIGN NAME,CAMPAIGN ID,IMPS,TFS,PRICE,Booking Ref,client
BOB-UK,clientname1,platform_1,campaign1,20572431,5383594,0.05,2692.18,ref1,1
BOB-UK,clientname2,platform_1,campaign2,20589101,4932821,0.05,2463.641,ref9,3
BOB-UK,clientname1,platform_1,campaign3,23030494,4795549,0.05,2394.777,ref3,1
BOB-UK,clientname1,platform_1,campaign4,22973424,5844194,0.05,2925.21,ref4,1
BOB-UK,clientname1,platform_1,campaign5,21489000,4251031,0.05,2122.552,ref2,1
BOB-UK,clientname1,platform_1,campaign6,23150347,3123945,0.05,1561.197,ref5,1
BOB-UK,clientname3,platform_1,campaign7,23194965,2503875,0.05,1254.194,ref12,3
BOB-UK,clientname3,platform_1,campaign8,20578983,1522448,0.05,765.1224,ref7,2
BOB-UK,clientname3,platform_1,campaign9,22243554,920166,0.05,463.0083,ref8,2
BOB-UK,clientname1,platform_1,campaign10,20572149,118865,0.05,52.94325,ref6,1
BOB-UK,clientname2,platform_1,campaign11,23077785,28077,0.05,14.40385,ref10,3
BOB-UK,clientname2,platform_1,campaign12,21811100,5439,0.05,5.27195,ref11,3
What I've tried
From the research I've done online this appears to be possible using awk and join ("How to merge two files using AWK?" got me the closest out of what I found online). I've tried various awk codes I've found online and I can't seem to get any of them to achieve my goal. Below is the code I've been trying to massage and work that gets me the closest. At present the code is set up to try and populate just the booking ref, as I presume I can just rinse and repeat it for the client column. With this code I was able to get it to populate the booking ref, but it required me to move CAMPAIGN ID to $1 and all it did was replace the values.
NOTE: The order for file1.csv won't sync with file2.csv. All rows may be in a different order as shown in this example.
current code
awk -F"," -v OFS=',' 'BEGIN { while (getline < "fil2.csv") { f[$1] = $2; } } {print $0, f[$1] }' file1.csv
Can someone confirm where I'm going wrong with this code, as I've tried altering the columns in this (and the file) without success? Maybe it's just how I'm understanding the code itself.
Like this:
awk 'BEGIN{FS=OFS=","} NR==FNR{r[$1]=$2;c[$1]=$3;next} NR>1{$9=r[$5];$10=c[$5]} 1' \
    file2.csv file1.csv
Explanation in multi line form:
# Set input and output field delimiter to ,
BEGIN{
    FS=OFS=","
}
# Total row number is the same as the row number in file
# as long as we are reading the first file, file2.csv
NR==FNR{
    # Store booking ref and client id indexed by campaign id
    r[$1]=$2
    c[$1]=$3
    # Skip blocks below
    next
}
# From here code runs only on file1.csv
NR>1{
    # Set booking ref and client id according to the campaign id
    # in field 5
    $9=r[$5]
    $10=c[$5]
}
# Print the modified line of file1.csv (includes the header line)
{
    print
}
Could you please try the following.
awk '
BEGIN{
  FS=OFS=","
  print "INVOICE,CLIENT,PLATFORM,CAMPAIGN NAME,CAMPAIGN ID,IMPS,TFS,PRICE,Booking Ref,client"
}
FNR==NR && FNR>1{
  val=$1
  $1=""
  sub(/^,/,"")
  a[val]=$0
  next
}
($5 in a) && FNR>1{
  sub(/,*$/,"")
  print $0,a[$5]
}
' file2.csv file1.csv
Explanation: Adding an explanation for the above code.
awk ' ##Start the awk program.
BEGIN{ ##Start the BEGIN section of the code.
  FS=OFS="," ##Set FS (field separator) and OFS (output field separator) to a comma.
  print "INVOICE,CLIENT,PLATFORM,CAMPAIGN NAME,CAMPAIGN ID,IMPS,TFS,PRICE,Booking Ref,client" ##Print the header line once.
} ##Close the BEGIN section of this program.
FNR==NR && FNR>1{ ##FNR==NR is TRUE while file2.csv is being read; FNR>1 skips its header line.
  val=$1 ##Create the variable val whose value is $1.
  $1="" ##Empty $1.
  sub(/^,/,"") ##Remove the leading comma from this line.
  a[val]=$0 ##Create an array a whose index is val and whose value is $0.
  next ##next skips all further statements for the current line.
} ##Close the block for the FNR==NR condition.
($5 in a) && FNR>1{ ##While file1.csv is being read, check whether $5 is an index of array a (skipping the header line).
  sub(/,*$/,"") ##Remove all trailing commas from the line.
  print $0,a[$5] ##Print the current line followed by the value of a[$5].
} ##Close the block for the above ($5 in a) condition.
' file2.csv file1.csv ##Mention the input file names.
Output will be as follows.
INVOICE,CLIENT,PLATFORM,CAMPAIGN NAME,CAMPAIGN ID,IMPS,TFS,PRICE,Booking Ref,client
BOB-UK,clientname1,platform_1,campaign1,20572431,5383594,0.05,2692.18,ref1,1
BOB-UK,clientname2,platform_1,campaign2,20589101,4932821,0.05,2463.641,ref9,3
BOB-UK,clientname1,platform_1,campaign3,23030494,4795549,0.05,2394.777,ref3,1
BOB-UK,clientname1,platform_1,campaign4,22973424,5844194,0.05,2925.21,ref4,1
BOB-UK,clientname1,platform_1,campaign5,21489000,4251031,0.05,2122.552,ref2,1
BOB-UK,clientname1,platform_1,campaign6,23150347,3123945,0.05,1561.197,ref5,1
BOB-UK,clientname3,platform_1,campaign7,23194965,2503875,0.05,1254.194,ref12,3
BOB-UK,clientname3,platform_1,campaign8,20578983,1522448,0.05,765.1224,ref7,2
BOB-UK,clientname3,platform_1,campaign9,22243554,920166,0.05,463.0083,ref8,2
BOB-UK,clientname1,platform_1,campaign10,20572149,118865,0.05,52.94325,ref6,1
BOB-UK,clientname2,platform_1,campaign11,23077785,28077,0.05,14.40385,ref10,3
BOB-UK,clientname2,platform_1,campaign12,21811100,5439,0.05,5.27195,ref11,3
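As a compact way to try the lookup on a couple of rows, the sketch below recreates a two-row excerpt of both CSV files and fills $9 and $10 by campaign ID. It is a variant I put together for illustration, not a replacement for either answer: the array names ref and cli and the output name merged.csv are mine, and rows of file1.csv without a match in file2.csv are passed through unchanged.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' \
  'INVOICE,CLIENT,PLATFORM,CAMPAIGN NAME,CAMPAIGN ID,IMPS,TFS,PRICE,Booking Ref,client' \
  'BOB-UK,clientname1,platform_1,campaign1,20572431,5383594,0.05,2692.18,,' \
  'BOB-UK,clientname2,platform_1,campaign2,20589101,4932821,0.05,2463.641,,' > file1.csv
printf '%s\n' 'CAMPAIGN ID,Booking Ref,client' '20589101,ref9,3' '20572431,ref1,1' > file2.csv

awk '
BEGIN { FS = OFS = "," }
NR == FNR {                       # file2.csv: index booking ref and client by campaign ID
    if (FNR > 1) {                # ignore its header line
        ref[$1] = $2
        cli[$1] = $3
    }
    next
}
FNR > 1 && ($5 in ref) {          # file1.csv data rows with a known campaign ID
    $9  = ref[$5]
    $10 = cli[$5]
}
{ print }                         # header and unmatched rows pass through as they are
' file2.csv file1.csv > merged.csv

cat merged.csv
```

The row order of file2.csv does not matter here, which addresses the note in the question that the two files are not sorted the same way.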