Compare what lines are missing in two files

Compare what lines are missing in two files - bash

I have two files file1.txt and file2.txt
cat file1.txt
home/user/city/a/1.txt
home/user/state/b/2.txt
home/user/county/d/4.txt
cat file2.txt
/home/user/city/a/1.txt
/home/user/state/b/2.txt
/home/user/county/c/3.txt
I am trying to figure what *.txt files are missing by comparing both the files and printing the full path of the missing file.
Expected output
/home/user/county/c/3.txt
/home/user/county/d/4.txt

comm -3 <(sed 's/^/\//' file1.txt | sort) <(sort file2.txt) | awk '{print $1$2}'

Try diff. It tells you that you have a changed line 2 (2c2) and gives you the corresponding lines as output.
% diff <(sort file1 | sed 's/^/\//') <(sort file2)
2c2
< /home/user/county/d/4.txt
---
> /home/user/county/c/3.txt
(Also consider using comm as it's oftentimes already does the job, as pointed out in the other posts)

Related

Diff to get changed line from second file

I have two files file1 and file2. I want to print the new line added to file2 using diff.
file1
/root/a
/root/b
/root/c
/root/d
file2
/root/new
/root/new_new
/root/a
/root/b
/root/c
/root/d
Expected output
/root/new
/root/new_new
I looked into man page but there was no any info on this

If you don't need to preserve the order, you could use the comm command like:
comm -13 <(sort file1) <(sort file2)
comm compares 2 sorted files and will print 3 columns of output. First is the lines unique to file1, then lines unique to file2 then lines common to both. You can supress any columns, so we turn of 1 and 3 in this example with -13 so we will see only lines unique to the second file.
or you could use grep:
grep -wvFf file1 file2
Here we use -f to have grep get its patterns from file1. We then tell it to treat them as fixed strings with -F instead of as patterns, match whole words with -w, and print only lines with no matches with -v

Following awk may help you on same. This will tell you all those lines which are present in Input_file2 and not in Input_file1.
awk 'FNR==NR{a[$0];next} !($0 in a)' Input_file1 Input_file2

Try using a combination of diff and sed.
The raw diff output is:
$ diff file1 file2
0a1,2
> /root/new
> /root/new_new
Add sed to strip out everything but the lines beginning with ">":
$ diff file1 file2 | sed -n -e 's/^> //p'
/root/new
/root/new_new
This preserves the order. Note that it also assumes you are only adding lines to the second file.

Finding common lines in two files that have some blank lines

I got two almost identical files, same amount of lines and it's a code.
I'm trying to create a file of the common lines between these two files and also have blank lines where the lines are different.
I tried using comm, and it works good but doesn't provide me the blank lines I need on the bad lines, it just eliminates the lines and the common file is shorter(line count).
This is what I tried:
comm -1 -2 file1 file2

comm needs sorted files. So, you could use command substitution like this:
comm -12 <(sort file1) <(sort file2)
If you want to skip blank lines (spaces), then:
comm -12 <(grep -Ev '^[ ]+$' file1 | sort) <(grep -Ev '^[ ]+$' file2 | sort)
To skip blank lines that have spaces or tabs:
comm -12 <(grep -Ev $'^[ \t]+$' file1 | sort) <(grep -Ev $'^[ \t]+$' file2 | sort)

Find different records between two files using Unix Commands

I have two files with shown records. How can I get the different records using Shell script/Unix Command. I don't want the common records from the both the files.
Thanks

It would have been nice to be able to copy and paste the input file text. Nevertheless:
comm -3 <(sort file1) <(sort file2) | sed 's/^\t//'
or
awk '
{count[$0]++}
END {for (line in count) if (count[line] == 1) print line}
' file1 file2

This should work:
grep -vf File1 File2
grep -vf File2 File1
Thanks for correction #jm666, this is nicer

bash delete lines in file containing lines from another file

file1 contains:
someword0
someword2
someword4
someword6
someword8
someword9
someword1
file2 contains:
someword2
someword3
someword4
someword5
someword7
someword11
someword1
So I wan't to have only lines from file1 which file2 doesn't contains. How can I do this in bash ?
That's the answer:
grep -v -x -f file2 file1
-v for select non-matching lines
-x for matching whole lines only
-f f2 to get patterns from f2.

You can use grep -vf:
grep -vwFf file2 file1
someword0
someword6
someword8
someword9
Check man grep for detailed info on all the grep options used here.

You could use the comm command as well:
comm -23 file1 file2
Explanation:
comm compares two files and prints, in 3 columns, lines unique to file1, file2, and lines in both.
Using the options -2 and -3 (or simply -23) suppresses printing of these columns, so you just get the lines unique to file1.

If your lines are unique, do a left join and filter out lines that exist in both tables.
join <(sort file1) <(sort file2) -o0,2.1 -a1 | awk '!$2'

Searching for Strings

I would like to have a shell script that searches two files and returns a list of strings:
File A contains just a list of unique alphanumeric strings, one per line, like this:
accc_34343
GH_HF_223232
cwww_34343
jej_222
File B contains a list of SOME of those strings (some times more than once), and a second column of infomation, like this:
accc_34343 dog
accc_34343 cat
jej_222 cat
jej_222 horse
I would like to create a third file that contains a list of the strings from File A that are NOT in File B.
I've tried using some loops with grep -v, but that doesn't work. So, in the above example, the new file would have this as it's contents:
GH_HF_223232
cwww_34343
Any help is greatly appreciated!

Here's what you can do:
grep -v -f <(awk '{print $1}' file_b) file_a > file_c
Explanation:
grep -v : Use -v option to grep to invert the matching
-f : Use -f option to grep to specify that the patterns are from file
<(awk '{print $1}' file_b): The <(awk '{print $1}' file_b) is to simply extract the first column values from file_b without using a temp file; the <( ... ) syntax is process substitution.
file_a : Tell grep that the file to be searched is file_a
> file_c : Output to be written to file_c

comm is used to find intersections and differences between files:
comm -23 <(sort fileA) <(cut -d' ' -f1 fileB | sort -u)
result:
GH_HF_223232
cwww_34343
I assume your shell is bash/zsh/ksh

awk 'FNR==NR{a[$0];next}!($1 in a)' fileA fileB
check here

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Compare what lines are missing in two files - bash

comm -3 <(sed 's/^/\//' file1.txt | sort) <(sort file2.txt) | awk '{print $1$2}'

Related

Diff to get changed line from second file

Finding common lines in two files that have some blank lines

Find different records between two files using Unix Commands

bash delete lines in file containing lines from another file

Searching for Strings

Categories

Resources