Finding common lines in two files that have some blank lines - bash

I have two almost identical files of code with the same number of lines.
I'm trying to create a file that contains the lines common to both files, with a blank line wherever the two files differ.
I tried comm, and it works well, but it does not give me blank lines in place of the differing lines; it just drops them, so the resulting file has fewer lines.
This is what I tried:
comm -1 -2 file1 file2

comm needs sorted input, so you could use process substitution like this:
comm -12 <(sort file1) <(sort file2)
To skip lines that consist only of spaces (change + to * in the pattern to drop completely empty lines as well):
comm -12 <(grep -Ev '^[ ]+$' file1 | sort) <(grep -Ev '^[ ]+$' file2 | sort)
To skip lines that consist only of spaces or tabs:
comm -12 <(grep -Ev $'^[ \t]+$' file1 | sort) <(grep -Ev $'^[ \t]+$' file2 | sort)
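Note that sorting changes the line order, so the comm approach cannot put a blank line at the position of each differing line. Since the two files have the same number of lines, a positional comparison may be closer to what the question asks for. A minimal sketch, assuming awk is available; the sample contents and the temporary file names are made up for illustration:

```bash
# Sample data, made up for illustration: only line 2 differs.
tmp=$(mktemp -d)
printf '%s\n' 'int a = 1;' 'int b = 2;' 'return a;' > "$tmp/file1"
printf '%s\n' 'int a = 1;' 'int b = 3;' 'return a;' > "$tmp/file2"

# Pass 1 (NR == FNR) stores file1 by line number. Pass 2 prints each file2
# line if it equals the file1 line at the same position, else an empty line.
# Appending "" forces a string comparison even for numeric-looking lines.
result=$(awk 'NR == FNR { a[FNR] = $0; next }
              { print ((($0 "") == (a[FNR] "")) ? $0 : "") }' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The output has one line per input line, with an empty line where the files differ. Redirecting awk's output straight to a file keeps trailing blank lines, which the $(...) capture used here would strip.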

Related

Compare what lines are missing in two files

I have two files file1.txt and file2.txt
cat file1.txt
home/user/city/a/1.txt
home/user/state/b/2.txt
home/user/county/d/4.txt
cat file2.txt
/home/user/city/a/1.txt
/home/user/state/b/2.txt
/home/user/county/c/3.txt
I am trying to figure out which *.txt files are missing by comparing the two files and printing the full path of each missing file.
Expected output
/home/user/county/c/3.txt
/home/user/county/d/4.txt
comm -3 <(sed 's/^/\//' file1.txt | sort) <(sort file2.txt) | awk '{print $1$2}'
Try diff. It tells you that line 2 has changed (2c2) and prints the corresponding line from each file.
% diff <(sort file1.txt | sed 's/^/\//') <(sort file2.txt)
2c2
< /home/user/county/d/4.txt
---
> /home/user/county/c/3.txt
(Also consider using comm, as it often already does the job, as pointed out in the other posts.)
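The comm pipeline works because comm -3 drops the common column and marks the lines unique to the second file with a leading tab. A small sketch using the sample data from the question, with temporary files standing in for the process substitutions (the temporary names are illustrative, and it assumes the paths themselves contain no tabs):

```bash
tmp=$(mktemp -d)
printf '%s\n' home/user/city/a/1.txt home/user/state/b/2.txt home/user/county/d/4.txt > "$tmp/file1.txt"
printf '%s\n' /home/user/city/a/1.txt /home/user/state/b/2.txt /home/user/county/c/3.txt > "$tmp/file2.txt"

# Add the missing leading slash to file1.txt, then sort both inputs with
# the same collation that comm will use.
sed 's/^/\//' "$tmp/file1.txt" | LC_ALL=C sort > "$tmp/a"
LC_ALL=C sort "$tmp/file2.txt" > "$tmp/b"

# -3 suppresses lines common to both; tr removes the tab that marks column 2.
result=$(LC_ALL=C comm -3 "$tmp/a" "$tmp/b" | tr -d '\t')

printf '%s\n' "$result"
rm -rf "$tmp"
```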

Finding common lines for two files using bash

I am trying to compare two files and produce a file containing the names common to both.
File1
1990.A.BHT.s_fil 4.70
1991.H.BHT.s_fil 2.34
1992.O.BHT.s_fil 3.67
1993.C.BHT.s_fil -1.50
1994.I.BHT.s_fil -3.29
1995.K.BHT.s_fil -4.01
File2
1990.A.BHT_ScS.dat 1537 -2.21
1993.C.BHT_ScS.dat 1494 1.13
1994.I.BHT_ScS.dat 1545 0.15
1995.K.BHT_ScS.dat 1624 1.15
I want to compare the first part of each name (e.g. 1990.A.BHT) in both files and write the lines of file1 whose names appear in both files, including the values in the 2nd column, to file3.
e.g. file3 (output):
1990.A.BHT.s_fil 4.70
1993.C.BHT.s_fil -1.50
1994.I.BHT.s_fil -3.29
1995.K.BHT.s_fil -4.01
I tried the following code, which uses grep:
while read line
do
grep $line file1 >> file3
done < file2
and
grep -wf file1 file2 > file3
I sorted the files before running this script.
But I get an empty file3. Can someone help me with this, please?
You need to remove everything from _ScS.dat onward from the lines in file2. Then you can use the result as patterns to match lines in file1.
grep -F -f <(sed 's/_ScS\.dat.*//' file2) file1 > file3
The -F option matches fixed strings rather than treating them as regular expressions.
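comm compares whole lines, so it only helps when the matching lines are identical in both files, which is not the case for the sample data above. For files where they are, and where you can guarantee that the lines are always sorted, comm -1 -2 file1 file2 would do the job. If they can be unsorted, use: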
comm -1 -2 <(sort file1) <(sort file2)
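Because the two files share only the leading part of each name, another option is to compare that key explicitly. A rough sketch with awk, assuming every name in file2 ends in _ScS.dat and every name in file1 ends in .s_fil, as in the sample data (the files under the temporary directory are illustrative):

```bash
tmp=$(mktemp -d)
printf '%s\n' '1990.A.BHT.s_fil 4.70' '1991.H.BHT.s_fil 2.34' '1993.C.BHT.s_fil -1.50' > "$tmp/file1"
printf '%s\n' '1990.A.BHT_ScS.dat 1537 -2.21' '1993.C.BHT_ScS.dat 1494 1.13' > "$tmp/file2"

# Pass 1 (file2): strip the _ScS.dat suffix from the name and remember the key.
# Pass 2 (file1): strip .s_fil from a copy of the name and print the whole
# line if that key was seen in file2.
result=$(awk 'NR == FNR { k = $1; sub(/_ScS\.dat$/, "", k); keys[k]; next }
              { k = $1; sub(/\.s_fil$/, "", k); if (k in keys) print }' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Unlike a substring match, this compares the full key, so a name that merely contains another name is not reported.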

Diff to get changed line from second file

I have two files file1 and file2. I want to print the new line added to file2 using diff.
file1
/root/a
/root/b
/root/c
/root/d
file2
/root/new
/root/new_new
/root/a
/root/b
/root/c
/root/d
Expected output
/root/new
/root/new_new
I looked at the man page, but there was no information on this.
If you don't need to preserve the order, you could use the comm command like:
comm -13 <(sort file1) <(sort file2)
comm compares two sorted files and prints three columns of output: first the lines unique to file1, then the lines unique to file2, then the lines common to both. You can suppress any of the columns, so in this example we turn off columns 1 and 3 with -13 and see only the lines unique to the second file.
or you could use grep:
grep -wvFf file1 file2
Here we use -f to have grep get its patterns from file1. We then tell it to treat them as fixed strings with -F instead of as patterns, match whole words with -w, and print only the lines that match none of them with -v.
The following awk command may help. It prints all the lines that are present in Input_file2 but not in Input_file1.
awk 'FNR==NR{a[$0];next} !($0 in a)' Input_file1 Input_file2
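The awk one-liner reads the first file into an array and then prints the lines of the second file that are not in it, so the original order of the second file is kept and no sorting is needed. A sketch against the sample data from the question (the temporary paths are illustrative, and the array name seen is just a more descriptive stand-in for a):

```bash
tmp=$(mktemp -d)
printf '%s\n' /root/a /root/b /root/c /root/d > "$tmp/file1"
printf '%s\n' /root/new /root/new_new /root/a /root/b /root/c /root/d > "$tmp/file2"

# FNR == NR holds only while the first file is being read: store each line
# as an array key. For the second file, print lines that were never stored.
result=$(awk 'FNR == NR { seen[$0]; next } !($0 in seen)' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```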
Try using a combination of diff and sed.
The raw diff output is:
$ diff file1 file2
0a1,2
> /root/new
> /root/new_new
Add sed to strip out everything but the lines beginning with ">":
$ diff file1 file2 | sed -n -e 's/^> //p'
/root/new
/root/new_new
This preserves the order. Note that it also assumes you are only adding lines to the second file.

bash delete lines in file containing lines from another file

file1 contains:
someword0
someword2
someword4
someword6
someword8
someword9
someword1
file2 contains:
someword2
someword3
someword4
someword5
someword7
someword11
someword1
So I want only the lines from file1 that file2 doesn't contain. How can I do this in bash?
That's the answer:
grep -v -x -f file2 file1
-v selects non-matching lines
-x matches whole lines only
-f file2 reads the patterns from file2
You can use grep -vf:
grep -vwFf file2 file1
someword0
someword6
someword8
someword9
Check man grep for detailed info on all the grep options used here.
You could use the comm command as well. It requires sorted input, and the sample files are not sorted, so sort them first:
comm -23 <(sort file1) <(sort file2)
Explanation:
comm compares two files and prints, in 3 columns, lines unique to file1, file2, and lines in both.
Using the options -2 and -3 (or simply -23) suppresses printing of these columns, so you just get the lines unique to file1.
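comm expects both inputs to be sorted, and the sample files in the question are not. A sketch that sorts first, using temporary files in place of process substitution (the temporary names are illustrative); note that the result comes out in sorted order rather than in the original order of file1:

```bash
tmp=$(mktemp -d)
printf '%s\n' someword0 someword2 someword4 someword6 someword8 someword9 someword1 > "$tmp/file1"
printf '%s\n' someword2 someword3 someword4 someword5 someword7 someword11 someword1 > "$tmp/file2"

# Sort both inputs with the same collation that comm will use.
LC_ALL=C sort "$tmp/file1" > "$tmp/sorted1"
LC_ALL=C sort "$tmp/file2" > "$tmp/sorted2"

# -2 drops lines unique to file2, -3 drops lines common to both,
# leaving only the lines unique to file1.
result=$(LC_ALL=C comm -23 "$tmp/sorted1" "$tmp/sorted2")

printf '%s\n' "$result"
rm -rf "$tmp"
```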
If your lines are unique, do a left join and filter out the lines that exist in both files.
join <(sort file1) <(sort file2) -o0,2.1 -a1 | awk '!$2'

How could I compare two files and remove similar rows in them (bash script)

I have two data files with a similar number of columns. I'd like to save file2 to another file (file3), excluding the rows that already exist in file1.
grep -v -i -f file1 file2 > file3
But the problem is that the columns in file1 are separated by "\t", while in the other file they are separated by just " ". Therefore this command does not work.
Any suggestions?
Thanks folks!
You can convert tabs to spaces on the fly:
grep -vif <(tr '\t' ' ' < file1) file2 > file3
This is process substitution.
Try:
grep -Fxvf file1 file2
The meanings of the switches are available in the grep man page.
grep -v -f is problematic because it searches file2 for each line in file1. With large files it will take a very long time. Try this instead:
comm -13 <(tr '\t' ' ' < file1 | sort) <(sort file2)
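The comm approach can be tried on a small case first. A sketch with made-up rows, assuming each tab in file1 corresponds to exactly one space in file2 (the temporary file names are illustrative):

```bash
tmp=$(mktemp -d)
# file1 is tab-separated, file2 is space-separated; the rows are made up.
printf 'a\t1\nb\t2\n' > "$tmp/file1"
printf '%s\n' 'a 1' 'c 3' 'b 2' 'd 4' > "$tmp/file2"

# Convert tabs to spaces in file1, then sort both inputs the same way.
tr '\t' ' ' < "$tmp/file1" | LC_ALL=C sort > "$tmp/norm1"
LC_ALL=C sort "$tmp/file2" > "$tmp/sorted2"

# -1 drops lines unique to file1, -3 drops lines common to both,
# leaving only the rows of file2 that are not in file1.
result=$(LC_ALL=C comm -13 "$tmp/norm1" "$tmp/sorted2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

If the columns in file2 can be separated by several spaces, the whitespace would need to be normalized in both files before comparing.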

Resources