Find different records between two files using Unix Commands - shell

I have two files with the records shown below. How can I get the differing records using a shell script or Unix command? I don't want the records common to both files.
Thanks

It would have been nice to be able to copy and paste the input file text. Nevertheless:
comm -3 <(sort file1) <(sort file2) | sed 's/^\t//'
or
awk '
{count[$0]++}
END {for (line in count) if (count[line] == 1) print line}
' file1 file2
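As a quick check, here is a self-contained run of both approaches on two throwaway sample files (the contents are invented for illustration):

```shell
# Invented sample data: "apple" is unique to file1, "date" to file2.
printf 'apple\nbanana\ncherry\n' > file1
printf 'banana\ncherry\ndate\n'  > file2

# comm -3 suppresses the column of common lines; the sed strips the
# leading tab that marks lines unique to the second file.
comm -3 <(sort file1) <(sort file2) | sed 's/^\t//'

# Same result with awk: count every line across both files and keep
# the lines seen exactly once (output order is unspecified).
awk '{count[$0]++} END {for (line in count) if (count[line] == 1) print line}' file1 file2
```

Both commands print apple and date, the lines that appear in only one of the two files.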

This should work:
grep -vf File1 File2
grep -vf File2 File1
Thanks for correction #jm666, this is nicer
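One caveat with plain grep -vf: the lines of the pattern file are treated as regular expressions and matched as substrings, which can drop or keep the wrong records. Adding -F (fixed strings) and -x (whole-line match) avoids that; a small sketch with invented data:

```shell
printf 'foo\nbar\nbaz\n'  > file1
printf 'bar\nqux\nfoo.\n' > file2

# -F: patterns are fixed strings, -x: match whole lines, -v: invert.
grep -Fxvf file1 file2   # lines of file2 not present in file1
grep -Fxvf file2 file1   # lines of file1 not present in file2
```

Without -F and -x, the pattern foo would match the line foo. as a substring, and the pattern foo. would match foo, since . is a regex metacharacter.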

Finding common lines for two files using bash

I am trying to compare two files and output a file consisting of the names common to both.
File1
1990.A.BHT.s_fil 4.70
1991.H.BHT.s_fil 2.34
1992.O.BHT.s_fil 3.67
1993.C.BHT.s_fil -1.50
1994.I.BHT.s_fil -3.29
1995.K.BHT.s_fil -4.01
File2
1990.A.BHT_ScS.dat 1537 -2.21
1993.C.BHT_ScS.dat 1494 1.13
1994.I.BHT_ScS.dat 1545 0.15
1995.K.BHT_ScS.dat 1624 1.15
I want to compare the first part of the names (e.g. 1990.A.BHT) in both files and output a file, file3, that has the common names together with the values from the 2nd column of file1.
ex: file3 (output)
1990.A.BHT.s_fil 4.70
1993.C.BHT.s_fil -1.50
1994.I.BHT.s_fil -3.29
1995.K.BHT.s_fil -4.01
I used the following code, which uses the grep command:
while read line
do
grep $line file1 >> file3
done < file2
and
grep -wf file1 file2 > file3
I sort the files before using this script.
But I get an empty file3. Can someone help me with this please?
You need to remove everything starting from _ScS.dat from the lines in file2. Then you can use the result as patterns to match lines in file1.
grep -F -f <(sed 's/_ScS\.dat.*//' file2) file1 > file3
The -F option makes grep treat the patterns as fixed strings rather than regular expressions.
In your example data, the lines appear to be in sorted order. If you can guarantee that they always are, comm -1 -2 file1 file2 would print the common lines; bear in mind, though, that comm compares whole lines, so here you would first have to strip the differing suffixes from both files. If the files can be unsorted, do a
comm -1 -2 <(sort file1) <(sort file2)
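Putting the sed/grep answer together with a subset of the question's sample data:

```shell
# Two of the question's sample records in each file.
cat > file1 <<'EOF'
1990.A.BHT.s_fil 4.70
1993.C.BHT.s_fil -1.50
EOF
cat > file2 <<'EOF'
1990.A.BHT_ScS.dat 1537 -2.21
1995.K.BHT_ScS.dat 1624 1.15
EOF

# Strip everything from _ScS.dat onward to get the bare name prefixes,
# then use them as fixed-string patterns against file1.
grep -F -f <(sed 's/_ScS\.dat.*//' file2) file1 > file3
cat file3   # only the 1990.A.BHT record is common to both files
```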

Diff to get changed line from second file

I have two files, file1 and file2. I want to print the lines newly added to file2 using diff.
file1
/root/a
/root/b
/root/c
/root/d
file2
/root/new
/root/new_new
/root/a
/root/b
/root/c
/root/d
Expected output
/root/new
/root/new_new
I looked at the man page but couldn't find any info on this.
If you don't need to preserve the order, you could use the comm command like:
comm -13 <(sort file1) <(sort file2)
comm compares two sorted files and prints three columns of output: first the lines unique to file1, then the lines unique to file2, then the lines common to both. You can suppress any of the columns; here we turn off 1 and 3 with -13, so we see only the lines unique to the second file.
or you could use grep:
grep -wvFf file1 file2
Here we use -f to have grep get its patterns from file1. We then tell it to treat them as fixed strings with -F instead of regular expressions, to match whole words with -w, and to print only the lines with no matches with -v.
The following awk may help here. It prints all lines that are present in Input_file2 but not in Input_file1.
awk 'FNR==NR{a[$0];next} !($0 in a)' Input_file1 Input_file2
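Written out with comments (the file names and data here are placeholders), the one-liner works like this:

```shell
printf '/root/a\n/root/b\n'   > Input_file1
printf '/root/new\n/root/a\n' > Input_file2

# FNR==NR is true only while reading the first file: each of its lines
# is stored as a key in array a, and "next" skips the second block.
# For the second file, the bare condition !($0 in a) triggers awk's
# default action (print) for lines never seen in the first file.
awk 'FNR==NR{a[$0];next} !($0 in a)' Input_file1 Input_file2
```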
Try using a combination of diff and sed.
The raw diff output is:
$ diff file1 file2
0a1,2
> /root/new
> /root/new_new
Add sed to strip out everything but the lines beginning with ">":
$ diff file1 file2 | sed -n -e 's/^> //p'
/root/new
/root/new_new
This preserves the order. Note that it also assumes you are only adding lines to the second file.

How to search for a pattern having the special characters in awk

I have to match file1 with file2 line by line, but file1 is in the format shown below. If I use the awk command to search for such a line in file2, it throws a syntax error at '='.
File1 :
Country_code=US/base_div_nbr=18/retail_channel_code=1/visit_date=2010-01-02/load_time_stamp=20100102058100
Country_code=US/base_div_nbr=18/retail_channel_code=1/visit_date=2010-01-02/load_time_stamp=20100102091000
Country_code=US/base_div_nbr=18/retail_channel_code=1/visit_date=2010-01-02/load_time_stamp=20100102067000
File2:
Country_code=US/base_div_nbr=18/retail_channel_code=1/visit_date=2010-01-02/load_time_stamp=20100102058100
Country_code=US/base_div_nbr=18/retail_channel_code=1/visit_date=2010-01-02/load_time_stamp=20100102091000
I took total line from file1 as the search pattern to search in file2 using below command:
awk "/$line/ {print ;}" file2
Here the 3rd record of file1 is not found in file2, so I need to know these differences.
I am very new to shell scripting, so please advise.
This is really a job for comm, assuming you can sort both input files, but if you want to use awk something like this might do it depending on your unstated requirements:
awk 'NR==FNR {file1[NR]=$0; next} $0 != file1[FNR]' file1 file2
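Note that this awk compares the files positionally, line N of file2 against line N of file1, rather than as sets; a small invented example shows the behaviour:

```shell
printf 'a\nb\nc\n'    > f1
printf 'a\nX\nc\nd\n' > f2

# Store file1 by line number, then print any line of file2 that
# differs from the same-numbered line of file1 (extra trailing
# lines of file2 always differ from the unset, empty entry).
awk 'NR==FNR {file1[NR]=$0; next} $0 != file1[FNR]' f1 f2
```

This prints X and d: X replaced b on line 2, and d has no counterpart line in f1.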
If I understand correctly, you want to print the lines that are common to both files. In that case, awk is really not the best tool. You could instead do one of
comm -12 <(sort file1) <(sort file2)
or
grep -Fxf file1 file2
If you really want to do it with awk, you could try
awk 'FNR==NR{a[$0]; next} $0 in a' file1 file2
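Both the grep and awk forms print the lines of the second file that also occur somewhere in the first, in the second file's order; a quick invented example:

```shell
printf 'x\ny\nz\n' > f1
printf 'y\nq\nx\n' > f2

grep -Fxf f1 f2                            # whole-line fixed-string matches
awk 'FNR==NR{a[$0]; next} $0 in a' f1 f2   # same result
```

Both print y and x; q is dropped because it appears only in f2.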

bash delete lines in file containing lines from another file

file1 contains:
someword0
someword2
someword4
someword6
someword8
someword9
someword1
file2 contains:
someword2
someword3
someword4
someword5
someword7
someword11
someword1
So I want to have only the lines from file1 that file2 doesn't contain. How can I do this in bash?
That's the answer:
grep -v -x -f file2 file1
-v selects non-matching lines
-x matches whole lines only
-f file2 takes the patterns from file2
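Run against the sample files from the question, that command reproduces the expected result:

```shell
printf '%s\n' someword0 someword2 someword4 someword6 someword8 someword9 someword1 > file1
printf '%s\n' someword2 someword3 someword4 someword5 someword7 someword11 someword1 > file2

# Keep only the file1 lines that do not whole-line match any file2 line.
grep -v -x -f file2 file1
```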
You can use grep with -v, -w, -F, and -f:
grep -vwFf file2 file1
someword0
someword6
someword8
someword9
Check man grep for detailed info on all the grep options used here.
You could use the comm command as well (it requires sorted input, hence the process substitutions):
comm -23 <(sort file1) <(sort file2)
Explanation:
comm compares two files and prints, in 3 columns, lines unique to file1, file2, and lines in both.
Using the options -2 and -3 (or simply -23) suppresses printing of these columns, so you just get the lines unique to file1.
If your lines are unique, do a left join and filter out lines that exist in both tables.
join <(sort file1) <(sort file2) -o0,2.1 -a1 | awk '!$2'
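A commented sketch of the join idea (invented data; the awk step here prints just the key, a slight variation on the answer's !$2 filter):

```shell
printf 'a\nb\nc\n' > f1
printf 'b\nd\n'    > f2

# -a1 keeps unpaired lines from the first file; -o 0,2.1 outputs the
# join key plus field 1 of the second file, which is empty for
# unpaired lines. awk then keeps only the rows whose second field is
# empty, i.e. the lines of f1 that found no partner in f2.
join -a1 -o 0,2.1 <(sort f1) <(sort f2) | awk '!$2 {print $1}'
```

The output is a and c, the f1 lines absent from f2.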

How could I compare two files and remove similar rows in them (bash script)

I have two data files with a similar number of columns. I'd like to save file2 into another file (file3) while excluding the rows that already exist in file1.
grep -v -i -f file1 file2 > file3
But the problem is that the columns in file1 are separated by tabs ("\t") while in the other file they are separated by spaces (" "), so this command doesn't work.
Any suggestions?
Thanks folks!
You can convert tabs to spaces on the fly:
grep -vif <(tr '\t' ' ' < file1) file2 > file3
This is process substitution.
Try:
grep -Fxvf file1 file2
Switch meanings available from the grep man page.
grep -v -f is problematic because it searches file2 for each line in file1. With large files it will take a very long time. Try this instead:
comm -13 <(tr '\t' ' ' < file1 | sort) <(sort file2)
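A quick run with invented data showing the tab normalization:

```shell
# file1 is tab-separated, file2 space-separated.
printf 'one\tcol\nfoo\tbar\n' > file1
printf 'one col\nbaz qux\n'   > file2

# Normalize file1's tabs to spaces, sort both sides, then keep only
# the lines unique to file2 (-13 suppresses columns 1 and 3).
comm -13 <(tr '\t' ' ' < file1 | sort) <(sort file2)
```

Only baz qux is printed: one col matches once the tab is converted.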
