Compare what lines are missing in two files - bash

I have two files file1.txt and file2.txt
cat file1.txt
home/user/city/a/1.txt
home/user/state/b/2.txt
home/user/county/d/4.txt
cat file2.txt
/home/user/city/a/1.txt
/home/user/state/b/2.txt
/home/user/county/c/3.txt
I am trying to figure out which *.txt files are missing by comparing the two files, and to print the full path of each missing file.
Expected output
/home/user/county/c/3.txt
/home/user/county/d/4.txt

comm -3 <(sed 's/^/\//' file1.txt | sort) <(sort file2.txt) | awk '{print $1$2}'

Try diff. It tells you that you have a changed line 2 (2c2) and gives you the corresponding lines as output.
% diff <(sort file1.txt | sed 's/^/\//') <(sort file2.txt)
2c2
< /home/user/county/d/4.txt
---
> /home/user/county/c/3.txt
(Also consider using comm, as it often already does the job, as pointed out in the other posts.)
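The comparison above can be reproduced end to end with the sample data from the question. This is only a minimal sketch, assuming a POSIX shell and a scratch directory from mktemp; the names workdir and missing are illustrative, and temporary files stand in for process substitution so the script does not depend on bash.

```shell
#!/bin/sh
# Sketch: list the paths that appear in only one of the two files.
export LC_ALL=C   # make sort and comm agree on collation order

workdir=$(mktemp -d)

# Sample data from the question (file1.txt lacks the leading slash).
printf '%s\n' home/user/city/a/1.txt home/user/state/b/2.txt \
    home/user/county/d/4.txt > "$workdir/file1.txt"
printf '%s\n' /home/user/city/a/1.txt /home/user/state/b/2.txt \
    /home/user/county/c/3.txt > "$workdir/file2.txt"

# Normalise file1.txt by prepending "/", then sort both inputs for comm.
sed 's|^|/|' "$workdir/file1.txt" | sort > "$workdir/a.sorted"
sort "$workdir/file2.txt" > "$workdir/b.sorted"

# comm -3 hides the common column; lines unique to the second file are
# indented with one tab, which tr removes.
missing=$(comm -3 "$workdir/a.sorted" "$workdir/b.sorted" | tr -d '\t')

printf '%s\n' "$missing"
rm -rf "$workdir"
```

With the sample data this prints the two paths listed under "Expected output" in the question. Stripping tabs with tr assumes that no path contains a tab character.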

Related

Diff to get changed line from second file

I have two files, file1 and file2. I want to print the new lines added to file2 using diff.
file1
/root/a
/root/b
/root/c
/root/d
file2
/root/new
/root/new_new
/root/a
/root/b
/root/c
/root/d
Expected output
/root/new
/root/new_new
I looked at the man page, but it had no information on this.
If you don't need to preserve the order, you could use the comm command like:
comm -13 <(sort file1) <(sort file2)
comm compares two sorted files and prints three columns of output: first the lines unique to file1, then the lines unique to file2, then the lines common to both. You can suppress any of the columns, so in this example we turn off columns 1 and 3 with -13 and see only the lines unique to the second file.
or you could use grep:
grep -wvFf file1 file2
Here we use -f to have grep get its patterns from file1. We then tell it to treat them as fixed strings with -F instead of as patterns, match whole words with -w, and print only lines with no matches with -v
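The three-column output of comm described above is easier to follow when seen next to the filtered form. The following is a rough sketch using the data from the question; it assumes a POSIX shell, and the variable names (workdir, all_columns, only_in_file2) are made up for illustration.

```shell
#!/bin/sh
# Sketch: show the full comm output, then only the second column.
export LC_ALL=C   # same collation for sort and comm

workdir=$(mktemp -d)
printf '%s\n' /root/a /root/b /root/c /root/d > "$workdir/file1"
printf '%s\n' /root/new /root/new_new /root/a /root/b /root/c /root/d \
    > "$workdir/file2"

# comm needs sorted input, so sort both files first.
sort "$workdir/file1" > "$workdir/file1.sorted"
sort "$workdir/file2" > "$workdir/file2.sorted"

# No flags: column 1 has no indent, column 2 has one leading tab,
# column 3 (common lines) has two leading tabs.
all_columns=$(comm "$workdir/file1.sorted" "$workdir/file2.sorted")

# -13 suppresses columns 1 and 3, leaving lines found only in file2.
only_in_file2=$(comm -13 "$workdir/file1.sorted" "$workdir/file2.sorted")

printf 'all columns:\n%s\n\nonly in file2:\n%s\n' "$all_columns" "$only_in_file2"
rm -rf "$workdir"
```

Because the inputs are sorted first, the result follows sort order rather than the original order of file2.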
The following awk command may help. It prints every line that is present in Input_file2 but not in Input_file1.
awk 'FNR==NR{a[$0];next} !($0 in a)' Input_file1 Input_file2
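How the two-file awk idiom works may be clearer with the parts spelled out. This is a sketch on the question's data, assuming a POSIX shell; the array name seen and the variables workdir and added are illustrative choices, not part of the original answer.

```shell
#!/bin/sh
# Sketch: print lines of the second file that never occur in the first.
workdir=$(mktemp -d)
printf '%s\n' /root/a /root/b /root/c /root/d > "$workdir/file1"
printf '%s\n' /root/new /root/new_new /root/a /root/b /root/c /root/d \
    > "$workdir/file2"

added=$(awk '
    # FNR == NR holds only while the first file is being read:
    # remember each of its lines as an array key and move on.
    FNR == NR { seen[$0]; next }
    # Second file: a true pattern with no action prints the line.
    !($0 in seen)
' "$workdir/file1" "$workdir/file2")

printf '%s\n' "$added"
rm -rf "$workdir"
```

No sorting is involved, so the lines come out in the order they have in the second file. One caveat: if the first file is empty, FNR == NR also holds for the second file and nothing is printed.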
Try using a combination of diff and sed.
The raw diff output is:
$ diff file1 file2
0a1,2
> /root/new
> /root/new_new
Add sed to strip out everything but the lines beginning with ">":
$ diff file1 file2 | sed -n -e 's/^> //p'
/root/new
/root/new_new
This preserves the order. Note that it also assumes you are only adding lines to the second file.

Finding common lines in two files that have some blank lines

I have two almost identical files containing code, with the same number of lines.
I'm trying to create a file that holds the lines common to both files, with a blank line wherever the two files differ.
I tried comm, and it works well, but it doesn't give me blank lines in place of the differing lines; it simply drops them, so the resulting file has fewer lines.
This is what I tried:
comm -1 -2 file1 file2
comm needs sorted files. So, you could use command substitution like this:
comm -12 <(sort file1) <(sort file2)
If you want to skip lines that consist only of spaces, then:
comm -12 <(grep -Ev '^[ ]+$' file1 | sort) <(grep -Ev '^[ ]+$' file2 | sort)
To skip blank lines that have spaces or tabs:
comm -12 <(grep -Ev $'^[ \t]+$' file1 | sort) <(grep -Ev $'^[ \t]+$' file2 | sort)
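Sorting discards line positions, so the comm commands above cannot produce the blank placeholders the question asks for. One possible alternative is to compare the files line by line. This is only a sketch, assuming a POSIX awk; the sample lines and the names other_file, other_line, got and aligned are hypothetical.

```shell
#!/bin/sh
# Sketch: keep lines that match at the same position, blank out the rest.
workdir=$(mktemp -d)

# Hypothetical inputs with the same number of lines.
printf '%s\n' 'int a = 1;' 'int b = 2;' 'return a + b;' > "$workdir/file1"
printf '%s\n' 'int a = 1;' 'int b = 3;' 'return a + b;' > "$workdir/file2"

aligned=$(awk -v other_file="$workdir/file2" '
{
    # Read the line at the same position from the second file.
    got = (getline other_line < other_file)
    # Appending "" forces a string comparison, so "1" and "1.0" differ.
    if (got > 0 && (other_line "") == ($0 ""))
        print
    else
        print ""
}' "$workdir/file1")

printf '%s\n' "$aligned"
rm -rf "$workdir"
```

The output has as many lines as file1, with an empty line at each position where the two files disagree.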

Find different records between two files using Unix Commands

I have two files with the records shown. How can I get the records that differ, using a shell script or Unix command? I don't want the records common to both files.
Thanks
It would have been nice to be able to copy and paste the input file text. Nevertheless:
comm -3 <(sort file1) <(sort file2) | sed 's/^\t//'
or
awk '
{count[$0]++}
END {for (line in count) if (count[line] == 1) print line}
' file1 file2
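The counting approach can be tried on a small input. Since the question's records are not available as text, the records below are hypothetical, as are the names workdir and different. The sketch assumes that no line is repeated within a single file; a line duplicated inside one file would be counted twice and dropped.

```shell
#!/bin/sh
# Sketch: print records that occur in exactly one of the two files.
export LC_ALL=C
workdir=$(mktemp -d)

# Hypothetical records; each line is unique within its own file.
printf '%s\n' 'alice,1' 'bob,2' 'carol,3' > "$workdir/file1"
printf '%s\n' 'bob,2' 'carol,3' 'dave,4' > "$workdir/file2"

# Count every line across both files; a count of 1 means the line was
# seen in only one file. The order of "for (line in count)" is
# unspecified, so the result is sorted to make it stable.
different=$(awk '
    { count[$0]++ }
    END { for (line in count) if (count[line] == 1) print line }
' "$workdir/file1" "$workdir/file2" | sort)

printf '%s\n' "$different"
rm -rf "$workdir"
```

Unlike comm -3, this does not show which file each record came from.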
This should work:
grep -vf File1 File2
grep -vf File2 File1
Thanks for the correction, @jm666; this is nicer.

bash delete lines in file containing lines from another file

file1 contains:
someword0
someword2
someword4
someword6
someword8
someword9
someword1
file2 contains:
someword2
someword3
someword4
someword5
someword7
someword11
someword1
So I want to keep only the lines from file1 that file2 doesn't contain. How can I do this in bash?
That's the answer:
grep -v -x -f file2 file1
-v selects non-matching lines
-x matches whole lines only
-f file2 reads the patterns from file2
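Why whole-line matching matters shows up as soon as one entry is a prefix of another. The sketch below uses a small hypothetical variation of the question's data and assumes a POSIX shell; the -F flag (fixed strings) is an addition to the command above, and the names loose and strict are illustrative.

```shell
#!/bin/sh
# Sketch: substring matching versus whole-line matching with grep -f.
workdir=$(mktemp -d)

# Hypothetical data: someword1 is a prefix of someword11.
printf '%s\n' someword1 someword11 someword2 > "$workdir/file1"
printf '%s\n' someword1 > "$workdir/file2"

# Without -x the pattern someword1 also matches inside someword11,
# so that line is removed as well.
loose=$(grep -v -f "$workdir/file2" "$workdir/file1")

# -x requires the whole line to match; -F treats each pattern as a
# literal string rather than a regular expression.
strict=$(grep -v -x -F -f "$workdir/file2" "$workdir/file1")

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
rm -rf "$workdir"
```

Here the loose form keeps only someword2, while the strict form keeps someword11 and someword2.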
You can use grep -vf:
grep -vwFf file2 file1
someword0
someword6
someword8
someword9
Check man grep for detailed info on all the grep options used here.
You could use the comm command as well. comm expects sorted input, and file1 and file2 above are not sorted, so sort them first:
comm -23 <(sort file1) <(sort file2)
Explanation:
comm compares two sorted files and prints three columns: lines unique to file1, lines unique to file2, and lines common to both.
The options -2 and -3 (or simply -23) suppress the second and third columns, so you get only the lines unique to file1.
If your lines are unique, do a left join and filter out lines that exist in both tables.
join <(sort file1) <(sort file2) -o0,2.1 -a1 | awk '!$2'

Searching for Strings

I would like to have a shell script that searches two files and returns a list of strings:
File A contains just a list of unique alphanumeric strings, one per line, like this:
accc_34343
GH_HF_223232
cwww_34343
jej_222
File B contains a list of SOME of those strings (sometimes more than once), and a second column of information, like this:
accc_34343 dog
accc_34343 cat
jej_222 cat
jej_222 horse
I would like to create a third file that contains a list of the strings from File A that are NOT in File B.
I've tried using some loops with grep -v, but that doesn't work. So, in the above example, the new file would have this as its contents:
GH_HF_223232
cwww_34343
Any help is greatly appreciated!
Here's what you can do:
grep -v -f <(awk '{print $1}' file_b) file_a > file_c
Explanation:
grep -v : Use -v option to grep to invert the matching
-f : Use -f option to grep to specify that the patterns are from file
<(awk '{print $1}' file_b): The <(awk '{print $1}' file_b) is to simply extract the first column values from file_b without using a temp file; the <( ... ) syntax is process substitution.
file_a : Tell grep that the file to be searched is file_a
> file_c : Output to be written to file_c
comm is used to find intersections and differences between files:
comm -23 <(sort fileA) <(cut -d' ' -f1 fileB | sort -u)
result:
GH_HF_223232
cwww_34343
I assume your shell is bash/zsh/ksh
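If process substitution is not available, the same comm pipeline can be written with temporary files. This is a sketch on the question's sample data, assuming a POSIX shell; the file names keys.sorted and fileA.sorted and the variable not_in_b are made up for illustration.

```shell
#!/bin/sh
# Sketch: strings from fileA that never appear in column 1 of fileB.
export LC_ALL=C   # same collation for sort and comm
workdir=$(mktemp -d)

printf '%s\n' accc_34343 GH_HF_223232 cwww_34343 jej_222 > "$workdir/fileA"
printf '%s\n' 'accc_34343 dog' 'accc_34343 cat' 'jej_222 cat' \
    'jej_222 horse' > "$workdir/fileB"

# Extract the first column of fileB and drop duplicate keys.
cut -d' ' -f1 "$workdir/fileB" | sort -u > "$workdir/keys.sorted"
sort "$workdir/fileA" > "$workdir/fileA.sorted"

# -23 keeps only the first column: lines unique to fileA.
not_in_b=$(comm -23 "$workdir/fileA.sorted" "$workdir/keys.sorted")

printf '%s\n' "$not_in_b"
rm -rf "$workdir"
```

With the sample data this prints GH_HF_223232 and cwww_34343, matching the result shown above.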
awk 'FNR==NR{a[$1];next}!($0 in a)' fileB fileA

Resources