Comparing the data from one file to another and printing the output - bash

I have three files named File1, File2 and File3. The contents of the three files are shown below:
File1:
// Class of "A2" of type "ONE".
// Class of "A3" of type "ONE".
// Class of "D1" of type "TWO".
// Class of "D2" of type "TWO".
// Class of "D3" of type "FOUR".
// Class of "D6" of type "FIVE."
File2:
#CLASS_NAMES = ("one",
"two",
"three",);
#CLASS_LIST_NAMES = ("ONE.A1",
"ONE.A2",
"ONE.A3",
"TWO.D1",
"TWO.D2");
File3:
D3
D4
D5
I need to check whether Class "D3" from File1 is present in the #CLASS_LIST_NAMES list of File2 or not.
If it is not present in File2's #CLASS_LIST_NAMES, then I need to check whether D3 is present in File3 or not.
If D3 is present in File3 the output should be PASS, and if it is not present in both File2 and File3 the output should be FAIL.
Similarly, I need to check every class from File1 (A2, A3, D1, D2, ...) against the #CLASS_LIST_NAMES of File2, and if a class is not present in File2, verify it against File3 and print the output as PASS or FAIL.
I tried the code below:
#!/bin/bash
sed -n '/#CLASS_LIST_NAMES =/,/)/p' File2
I'm stuck here; can anyone tell me what needs to be done next?
Desired output: since D6 from File1 is not found in either File2 or File3, it should print FAIL. The output should look like below:
Fail: D6 is not found

You can achieve this with grep and awk.
Use GNU grep, which supports the -P option:
awk 'NR==FNR{a[$0]; next} !($0 in a){print "Fail: "$0 " is not found"}' <(cat file3 <(grep -Po '(?<=\.)[^"]+' file2)) <(grep -Po '(?<=of ")\w+' file1)
If you want to extract only the class names present in the #CLASS_LIST_NAMES statement, use the one below.
awk 'NR==FNR{a[$0]; next} !($0 in a){print "Fail: "$0 " is not found"}' <(cat file3 <(sed -n '/#CLASS_LIST_NAMES/,/;$/p' file2 | grep -Po '(?<=\.)[^"]+')) <(grep -Po '(?<=of ")\w+' file1)
If the number of spaces in file1 is not consistent, you can process it using awk.
# expects the class name in the 4th column; the input format shouldn't change
awk 'NR==FNR{a[$0]; next} {gsub("\"","",$4)} !($4 in a){print "Fail: "$4" is not found"}' <(cat file3 <(sed -n '/#CLASS_LIST_NAMES/,/;$/p' file2 | grep -Po '(?<=\.)[^"]+')) file1
# alternative using FPAT if the position of the field can change, as long as it is the first double-quoted token
awk 'NR==FNR{a[$0]; next} {gsub("\"","",$1)} !($1 in a){print "Fail: "$1" is not found"}' <(cat file3 <(sed -n '/#CLASS_LIST_NAMES/,/;$/p' file2 | grep -Po '(?<=\.)[^"]+')) FPAT="\"[^ \"]+" file1
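If GNU grep (for -P) is not available, the same idea can be expressed with awk alone. This is only a rough sketch, assuming the lowercase file names used in the commands above and the exact layouts shown in the question:
awk '
  # file3: one class name per line
  FILENAME == ARGV[1] { seen[$1]; next }
  # file2: inside the #CLASS_LIST_NAMES block, keep the part after the dot
  FILENAME == ARGV[2] {
    if (/#CLASS_LIST_NAMES/) inlist = 1
    if (inlist && match($0, /\.[^"]+/)) seen[substr($0, RSTART + 1, RLENGTH - 1)]
    if (inlist && /\);/) inlist = 0
    next
  }
  # file1: the class name is the first double-quoted word after "of"
  match($0, /of "[^"]+"/) {
    name = substr($0, RSTART + 4, RLENGTH - 5)
    if (!(name in seen)) print "Fail: " name " is not found"
  }
' file3 file2 file1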

Related

Write specific columns of a file into another file; who can give me a more concise solution?

I have a troublesome problem about writing specific columns of one file into another file. In more detail, I have file1 as shown below, and I need to write its first column, excluding the first row, into file2 as one line with the values separated by the '|' sign. I now have a solution using sed and awk, but it is missing the last step of inserting the result at the top of file2, and I still believe there should be a more concise solution given how powerful awk, sed, etc. are. So, who can offer me another, more concise script?
sed '1d;s/ .//' ./file1 | awk '{printf "%s|", $1; }' | awk '{if (NR != 0) {print substr($1, 1, length($1) - 1)}}'
file1:
col_name data_type comment
aaa string null
bbb int null
ccc int null
file2:
xxx ccc(whatever is this)
The result of file2 should be this:
aaa|bbb|ccc
xxx ccc(whatever is this)
Assuming there's no whitespace in the column 1 data, here are some options, in increasing length:
sed -i "1i$(awk 'NR > 1 {print $1}' file1 | paste -sd '|')" file2
or
ed file2 <<END
1i
$(awk 'NR > 1 {print $1}' file1 | paste -sd '|')
.
wq
END
or
{ awk 'NR > 1 {print $1}' file1 | paste -sd '|'; cat file2; } | sponge file2
or
mapfile -t lines < <(tail -n +2 file1)
col1=( "${lines[#]%%[[:blank:]]*}" )
new=$(IFS='|'; echo "${col1[*]}"; cat file2)
echo "$new" > file2
This might work for you (GNU sed):
sed -z 's/[^\n]*\n//;s/\(\S*\).*/\1/mg;y/\n/|/;s/|$/\n/;r file2' file1
Process file1 "wholemeal" by using the -z command line option.
Remove the first line.
Remove all columns other than the first.
Replace newlines with |'s.
Replace the last | by a newline.
Append file2.
Alternative using just command line utils:
tail +2 file1 | cut -d' ' -f1 | paste -s -d'|' | cat - file2
Tail file1 from line 2 onwards.
Using the results from the tail command, isolate the first column using a space as the column delimiter.
Using the results from the cut command, serialize all lines into one, delimited by |'s.
Using the results from the paste command, append file2 using the cat command.
I'm learning awk at the moment.
awk 'BEGIN{a=""} {if(NR>1) a = a $1 "|"} END{a=substr(a, 1, length(a)-1); print a}' file1
Edit: Here's another version that uses an array:
awk 'NR > 1 {a[++n]=$1} END{for(i=1; i<=n; ++i){if(i>1) printf("|"); printf("%s", a[i])} printf("\n")}' file1
Here is a simple Awk script to merge the files as per your spec.
awk '# From the first file, merge all lines except the first
NR == FNR { if (FNR > 1) { printf "%s%s", sep, $1; sep = "|"; } next }
# We are in the second file; add a newline after data from first file
FNR == 1 { printf "\n" }
# Simply print all lines from file2
1' file1 file2
The NR==FNR condition is true when we are reading the first input file: the overall line number NR is equal to the line number within the current file, FNR. The final 1 is a common idiom for printing all input lines that make it this far into the script (the next in the first block prevents lines from the first file from reaching this far).
For conciseness, you can remove the comments.
awk 'NR == FNR { if (FNR > 1) { printf "%s%s", sep, $1; sep = "|"; } next }
FNR == 1 { printf "\n" } 1' file1 file2
Generally speaking, Awk can do everything sed can do, so piping sed into Awk (or vice versa) is nearly always a useless use of sed.
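For instance, the sed-into-awk-into-awk pipeline from the question, which builds the joined first column, could be a single awk call; a minimal sketch:
awk 'NR > 1 { printf "%s%s", sep, $1; sep = "|" } END { print "" }' file1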

Why does awk not work in a script file while selecting something between two files?

In my project, I have two files.
The content of file1 is like:
bme-zhangyl
chem-abbott
chem-hef
chem-lijun
chem-liuch
chem-lix
chem-nisf
chem-quanm
chem-sunli
chem-taohq
chem-wanggc
chem-wangyg
The content of file2 is like:
bme-zhangyl bme-zhangmm
phy-dongert phy-zhangwq
chem-lijun phy-zhangwq
ls-liulj bio-chenw
phy-zhangyb phy-zhangwq
mee-xingw mee-rongym
cs-likm cs-hisao
cs-nany cs-hisao
cs-pengym cs-hisao
chem-quanm cs-hisao
cs-likq cs-hisao
cs-wujx cs-liuyp
mse-mar mse-liangyy
ccse-xiezy ccse-xiezy
maad-chensm maad-wanmp
Now I have a script file, the content of which is:
#!/bash/sh
for i in $(cat file1)
do
groupname=`awk '($1=='"$i"'){print $2}' file2`
echo $groupname
done
But no luck: it displays nothing.
I have tried another way:
#!/bash/sh
for i in $(cat file1)
do
groupname=`awk '{if($1=='"$i"')print $2}' file2`
echo $groupname
done
and
#!/bash/sh
for i in $(cat file1)
do
groupname=`awk '{if($1==$i)print $2}' file2`
echo $groupname
done
They all fail. Nothing seems wrong to me; who can help?
The correct output should be:
bme-zhangmm
phy-zhangwq
cs-hisao
Using bare awk:
$ awk 'NR==FNR{a[$1];next}$1 in a{print $2}' file1 file2
Output:
bme-zhangmm
phy-zhangwq
cs-hisao
Explained:
$ awk '
NR==FNR { # add file1 strings to a hash
a[$1]
next
}
$1 in a { # if file2 field 1 keyword was hashed from file1
print $2 # output word from field 2
}' file1 file2
Updated: as a script:
#!/bin/sh
awk 'NR==FNR{a[$1];next}$1 in a{print $2}' file1 file2
I have tested:
groupname=`awk '{if ($1 == "'$i'") print $2}' UGfrompwdguprst`
and it works OK.
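As an aside, the usual way to hand a shell variable to awk is the -v option, which avoids the quoting problem entirely; a minimal sketch of the original loop rewritten that way (assuming the same file1/file2 names):
#!/bin/sh
while IFS= read -r i
do
    # pass the shell variable into awk as the awk variable "name"
    groupname=$(awk -v name="$i" '$1 == name { print $2 }' file2)
    echo "$groupname"
done < file1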

Non-matching words from file1 to file2

I have two files - file1 & file2.
file1 contains only words, say:
ABC
YUI
GHJ
I8O
..................
file2 contains many paragraphs:
dfghjo ABC kll njjgg bla bla
GHJ njhjckhv chasjvackvh ..
ihbjhi hbhibb jh jbiibi
...................
I am using the command below to get the matching lines in file2 that contain a word from file1:
grep -Ff file1 file2
(This gives the lines of file2 where words from file1 are found.)
I also need the words that are not matched/found in file2, and I am unable to find the un-matching words.
Can anyone help in getting the output below?
YUI
I8O
I am looking for a one-liner command (via grep, awk, sed), as I am using the pssh command and can't use while/for loops.
You can print only the matched parts with -o.
$ grep -oFf file1 file2
ABC
GHJ
Use that output as a list of patterns for a search in file1. Process substitution <(cmd) simulates a file containing the output of cmd. With -v you can print lines that did not match. If file1 contains two lines such that one line is a substring of another line you may want to add -x (only match whole lines) to prevent false positives.
$ grep -vxFf <(grep -oFf file1 file2) file1
YUI
I8O
Using Perl - both matched/non-matched in the same one-liner
$ cat sinw.txt
ABC
YUI
GHJ
I8O
$ cat sin_in.txt
dfghjo ABC kll njjgg bla bla
GHJ njhjckhv chasjvackvh ..
ihbjhi hbhibb jh jbiibi
$ perl -lne '
BEGIN { %x=map{chomp;$_=>1} qx(cat sinw.txt); $w="\\b".join("\|",keys %x)."\\b"}
print "$&" and delete($x{$&}) if /$w/ ;
END { print "\nnon-matched\n".join("\n", keys %x) }
' sin_in.txt
ABC
GHJ
non-matched
I8O
YUI
$
Getting only the non-matched
$ perl -lne '
BEGIN {
%x = map { chomp; $_=>1 } qx(cat sinw.txt);
$w = "\\b" . join("\|",keys %x) . "\\b"
}
delete($x{$&}) if /$w/;
END { print "\nnon-matched\n".join("\n", keys %x) }
' sin_in.txt
non-matched
I8O
YUI
$
Note that even a single use of the $& variable used to be very expensive for the whole program in Perl versions prior to 5.20.
Assuming your "words" in file1 are in more than 1 line :
while read line
do
for word in $line
do
if ! grep -q $word file2
then echo $word not found
fi
done
done < file1
For un-matching words, here's one GNU awk solution:
awk 'NR==FNR{a[$0];next} !($1 in a)' RS='[ \n]' file2 file1
YUI
I8O
Or !($0 in a); it's the same. Since I set RS='[ \n]', every space acts as a record separator too.
And note that I read file2 first, and then file1.
If file2 could be empty, you should change NR==FNR to a different file-checking method, like ARGIND==1 for GNU awk, FILENAME=="file2", or FILENAME==ARGV[1], etc.
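For instance, a sketch of the FILENAME-based variant (GNU awk, same file names assumed):
awk 'FILENAME=="file2"{a[$0];next} !($0 in a)' RS='[ \n]' file2 file1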
The same mechanism works for only the matched ones too:
awk 'NR==FNR{a[$0];next} $0 in a' RS='[ \n]' file2 file1
ABC
GHJ

Compare columns in two text files and match lines

I want to compare the second column (delimited by whitespace) in file1:
n01443537/n01443537_481.JPEG n01443537
n01629819/n01629819_420.JPEG n01629819
n02883205/n02883205_461.JPEG n02883205
With the second column (delimited by whitespace) in file2:
val_8447.JPEG n09256479
val_68.JPEG n01443537
val_1054.JPEG n01629819
val_1542.JPEG n02883205
val_8480.JPEG n03089624
If there is a match, I would like to print out the corresponding line of file2.
Desired output in this example:
val_68.JPEG n01443537
val_1054.JPEG n01629819
val_1542.JPEG n02883205
I tried the following, but the output file is empty:
awk -F' ' 'NR==FNR{c[$2]++;next};c[$2] > 0' file1.txt file2.txt > file3.txt
Also tried this, but the result was the same (empty output file):
awk 'NR==FNR{a[$2];next}$2 in a' file1 file2 > file3.txt
GNU join exists for this purpose.
join -o "2.1 2.2" -j 2 <(sort -k 2 file1) <(sort -k 2 file2)
Using awk:
awk 'FNR==NR{a[$NF]; next} $NF in a' file1 file2
val_68.JPEG n01443537
val_1054.JPEG n01629819
val_1542.JPEG n02883205
Here is a grep alternative with process substitution:
grep -f <(awk '{print " " $NF "$"}' file1) file2
Using print " " $NF "$" to create a regex like " n01443537$" so that we match only last column in grep.

Comparing 2 files and extracting elements from a file

I have two files. One has a list of names (only one column) and the second file has three columns: name, phone number, country.
What I want is to extract the data of the people whose names are not present in file1 but only present in file2.
#!/bin/bash
for i in `cat file1 `
do
cat file2 | awk '{ if ($1 != "'$i'") {print $1 "\t" $2 "\t" $3 }}'>>NonResp
done
What I get is a weird result with more data than expected.
Kindly help.
You can do this with grep:
grep -v -F -f file1 file2
awk '{print $1}' file2 | comm -1 -3 file1 - | join file2 -
The files must already be sorted for this to work properly.
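If they are not already sorted, an in-place sort along these lines should do it (sort -o writes the result back to the input file; same file names assumed):
sort -o file1 file1
sort -k1,1 -o file2 file2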
Explanation:
=> awk '{print $1}' file2 |
print only the first field of file2 and feed it to the next command (|)
=> comm -1 -3 file1 - |
compare file1 and the output of the last command (-) and suppress lines only in file1 (-1) as well as lines in both files (-3); that leaves lines only in file2, and feed these to the next command (|)
=> join file2 -
join the original file2 and the output from the last command (-) and write out the fields of the matching lines (whitespace between fields is squeezed, however)
Testcase:
cat <<EOF >file1
alan
bert
cindy
dave
fred
sunny
ted
EOF
cat <<EOF >file2
bert 01 AU
cindy 03 CZ
ginny 05 CN
ted 07 CH
zorro 09 AG
EOF
awk '{print $1}' file2 | comm -1 -3 file1 - | join file2 -
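With this test data, the pipeline should print the two lines whose names appear only in file2:
ginny 05 CN
zorro 09 AG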
Assuming the field delimiter in file2 is ",":
awk -F, 'FNR==NR{a[$1];next}!($1 in a)' file1 file2
if "," is not the delimiter ,then simply
awk 'FNR==NR{a[$1];next}!($1 in a)' file1 file2
would be sufficient.
