How can I compare two files and remove similar rows (bash script)?

I have two data files with a similar number of columns. I'd like to save file2 to another file (file3), excluding the rows that already exist in file1.
grep -v -i -f file1 file2 > file3
The problem is that the columns in file1 are separated by a tab ("\t"), while in file2 they are separated by a single space (" "), so this command doesn't work.
Any suggestions?
Thanks, folks!

You can convert tabs to spaces on the fly:
grep -vif <(tr '\t' ' ' < file1) file2 > file3
This uses bash process substitution: <(...) lets grep read the tab-converted copy of file1 as if it were a file.
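A small end-to-end check of that command, using made-up sample rows (the file contents are assumptions for illustration): file1 is tab-separated and file2 is space-separated.

```shell
#!/usr/bin/env bash
# Hypothetical sample data in a throwaway directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '1\tfoo\tbar\n3\tbaz\tqux\n' > file1   # tab-separated
printf '1 foo bar\n2 abc def\n3 baz qux\n' > file2   # space-separated

# tr turns each tab in file1 into a space; grep reads the converted
# lines as patterns (-f), ignores case (-i), and keeps non-matching lines (-v).
grep -vif <(tr '\t' ' ' < file1) file2 > file3

cat file3    # prints: 2 abc def
```

One thing to watch for: an empty line in file1 becomes an empty pattern, which matches every line, so grep -v would then print nothing at all.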

Try:
grep -Fxvf file1 file2
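The meaning of each switch is in the grep man page: -F treats the patterns as fixed strings, -x matches whole lines only, -v prints the non-matching lines, and -f reads the patterns from a file.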
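On its own, -Fxvf still compares tab-separated lines from file1 against space-separated lines in file2, so it needs the same tab-to-space conversion as the previous answer. A sketch combining the two (the sample data is hypothetical):

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Hypothetical data: "1 foo barbell" contains "1 foo bar" as a substring,
# so a plain substring/regex match would drop it, but -x does not.
printf '1\tfoo\tbar\n' > file1
printf '1 foo bar\n1 foo barbell\n' > file2

# -F: fixed strings, -x: whole-line match, -v: print non-matching lines,
# -f: patterns come from file1 with its tabs turned into spaces.
grep -Fxvf <(tr '\t' ' ' < file1) file2 > file3

cat file3    # prints: 1 foo barbell
```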

grep -v -f can be slow: every line of file2 has to be checked against every pattern read from file1, so with large files it can take a very long time. Try this instead:
comm -13 <(tr '\t' ' ' < file1 | sort) <(sort file2)
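comm only works on sorted input and prints its results in sorted order, so the original row order of file2 is not preserved. A quick check with hypothetical data:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'b\t2\na\t1\n' > file1          # tab-separated
printf 'c 3\nb 2\na 1\n' > file2       # space-separated

# Column 1 = only in file1, column 2 = only in file2, column 3 = in both.
# -13 hides columns 1 and 3, leaving the rows unique to file2.
comm -13 <(tr '\t' ' ' < file1 | sort) <(sort file2) > file3

cat file3    # prints: c 3
```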

Related

Shell script for merging dotenv files with duplicate keys

Given two dotenv files,
# file1
FOO="X"
BAR="B"
and
# file2
FOO="A"
BAZ="C"
I want to run
$ ./merge.sh file1.env file2.env > file3.env
to get the following output:
# file3
FOO="A"
BAR="B"
BAZ="C"
So far, I have used the python-dotenv module to parse the files into dictionaries, merge them, and write them back. However, I feel there should be a simple shell solution that frees me from a third-party module for such a basic task.
Answer
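Alright, so I ended up using
$ sort -u -t '=' -k 1,1 file2 file1 | grep -v '^$\|^\s*#' > file3
which omits blank lines and comments. file2 is listed first because GNU sort -u keeps the first of each run of lines with equal keys, so the values from file2 win, as in the expected output. Nevertheless, the proposed awk solution works just as well.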
Another quite simple approach is to use sort:
sort -u -t '=' -k 1,1 file1 file2 > file3
results in a file where the keys from file1 take precedence over the keys from file2.
Using a simple awk script:
awk -F= '{a[$1]=$2}END{for(i in a) print i "=" a[i]}' file1 file2
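This stores every key/value pair in the array a and prints the array once both files have been read.
Keys in file2 override those in file1 because file2 is read last. Note that the output order of for (i in a) is unspecified, and because only $2 is stored, a value that itself contains "=" is cut off at that point.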
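If the key order matters, one way (a sketch, not taken from the answers here) is to remember the order in which keys are first seen. This version also splits on the first "=" only, so values containing "=" survive, and it skips blank lines and comments. The function name merge_env is hypothetical; saved in a script that passes "$@", it would give the ./merge.sh file1.env file2.env interface from the question.

```shell
#!/usr/bin/env bash
# merge_env: hypothetical helper. Later files override earlier ones;
# keys keep the position where they were first seen.
merge_env() {
  awk '
    /^[ \t]*#/ || /^[ \t]*$/ { next }       # skip comments and blank lines
    {
      i = index($0, "=")
      if (i == 0) next                      # skip lines without "="
      key = substr($0, 1, i - 1)
      if (!(key in val)) order[++n] = key   # remember first-seen position
      val[key] = substr($0, i + 1)          # later files overwrite the value
    }
    END { for (j = 1; j <= n; j++) print order[j] "=" val[order[j]] }
  ' "$@"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '# file1\nFOO="X"\nBAR="B"\n' > "$tmp/file1.env"
printf '# file2\nFOO="A"\nBAZ="C"\n' > "$tmp/file2.env"

merge_env "$tmp/file1.env" "$tmp/file2.env"
# FOO="A"
# BAR="B"
# BAZ="C"
```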
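To add only new keys from file2 without overwriting the existing values from file1, first append the non-blank lines of file2 to file1 (this modifies file1 in place), then keep only the first occurrence of each key: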
grep "\S" file2 >> file1
awk -F "=" '!a[$1]++' file1 > file3
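The two-step version above changes file1 in place. A sketch of the same "first value wins" behaviour that leaves both inputs untouched, by letting awk read file1 before file2. The NF test skips empty lines much like grep "\S" did, although a line containing only spaces would still get through:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf 'FOO="X"\nBAR="B"\n' > file1
printf 'FOO="A"\n\nBAZ="C"\n' > file2

# NF is 0 on empty lines, so they are skipped; !seen[$1]++ is true only
# the first time a key appears, so values from file1 (read first) win.
awk -F '=' 'NF && !seen[$1]++' file1 file2 > file3

cat file3
# FOO="X"
# BAR="B"
# BAZ="C"
```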

Diff to get changed line from second file

I have two files, file1 and file2. I want to print the new lines added to file2 using diff.
file1
/root/a
/root/b
/root/c
/root/d
file2
/root/new
/root/new_new
/root/a
/root/b
/root/c
/root/d
Expected output
/root/new
/root/new_new
I looked at the man page but found no information on this.
If you don't need to preserve the order, you could use the comm command like:
comm -13 <(sort file1) <(sort file2)
comm compares two sorted files and prints three columns of output: lines unique to file1, lines unique to file2, and lines common to both. You can suppress any of the columns; here -13 turns off columns 1 and 3, so only the lines unique to the second file are shown.
Or you could use grep:
grep -wvFf file1 file2
Here we use -f to have grep get its patterns from file1. We then tell it to treat them as fixed strings with -F instead of as patterns, match whole words with -w, and print only lines with no matches with -v
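Because -w matches whole words rather than whole lines, and "/" and "-" are not word characters, a pattern such as /root/a also matches a line like /root/a-old (a hypothetical extra line). Using -x (whole-line match) instead of -w avoids that; a quick comparison:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '/root/a\n/root/b\n' > file1
# /root/a-old is a hypothetical extra line for illustration.
printf '/root/new\n/root/a\n/root/a-old\n/root/b\n' > file2

grep -wvFf file1 file2 > words.out   # /root/a-old is dropped as well
grep -xvFf file1 file2 > lines.out   # only exact duplicates are dropped

cat words.out   # /root/new
cat lines.out   # /root/new, then /root/a-old
```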
The following awk command prints all lines that are present in Input_file2 but not in Input_file1, in their original order.
awk 'FNR==NR{a[$0];next} !($0 in a)' Input_file1 Input_file2
Try using a combination of diff and sed.
The raw diff output is:
$ diff file1 file2
0a1,2
> /root/new
> /root/new_new
Add sed to strip out everything but the lines beginning with ">":
$ diff file1 file2 | sed -n -e 's/^> //p'
/root/new
/root/new_new
This preserves the order. Note that it also assumes you are only adding lines to the second file.
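GNU diff can also do the filtering itself through its line-format options, which avoids parsing the "> " prefix with sed. This relies on GNU diffutils (--new-line-format, --old-line-format and --unchanged-line-format are not part of POSIX diff); %L stands for the line's contents including its newline:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '/root/a\n/root/b\n/root/c\n/root/d\n' > file1
printf '/root/new\n/root/new_new\n/root/a\n/root/b\n/root/c\n/root/d\n' > file2

# Print lines that exist only in file2; print nothing for lines only in
# file1 or in both. diff exits with 1 when the files differ, which is
# expected here, so only a status of 2 (trouble) is treated as failure.
diff --new-line-format='%L' --old-line-format='' \
     --unchanged-line-format='' file1 file2 > file3 || [ "$?" -eq 1 ]

cat file3
# /root/new
# /root/new_new
```

Like the sed version, this keeps file2's order.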

Find different records between two files using Unix Commands

I have two files containing the records shown. How can I get the records that differ using a shell script or Unix commands? I don't want the records that are common to both files.
Thanks
It would have been nice to be able to copy and paste the input file text. Nevertheless:
comm -3 <(sort file1) <(sort file2) | sed 's/^\t//'
or
awk '
{count[$0]++}
END {for (line in count) if (count[line] == 1) print line}
' file1 file2
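The question's records are not reproduced here, so the data below is made up. Both commands print the records that appear in only one of the files; comm prints them sorted, while the order of awk's for (line in count) loop is unspecified, so the awk output is sorted here just to compare the two. The sed \t escape assumes GNU sed:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
export LC_ALL=C                      # same collation for sort and comm
printf 'r1\nr2\nr3\n' > file1        # hypothetical records
printf 'r2\nr3\nr4\n' > file2

# comm -3 drops the "common" column; sed removes the tab that
# indents column 2 so both columns line up.
comm -3 <(sort file1) <(sort file2) | sed 's/^\t//' > comm.out

# Count every line across both files and keep those seen exactly once.
awk '{count[$0]++} END {for (line in count) if (count[line] == 1) print line}' \
    file1 file2 | sort > awk.out

cat comm.out    # r1, then r4
```

Note that the awk version also drops a record that appears twice in the same file and nowhere else, since it counts occurrences rather than files.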
This should work (the first command prints the records only in File2, the second those only in File1):
grep -vf File1 File2
grep -vf File2 File1
Thanks for the correction, @jm666; this is nicer.

bash delete lines in file containing lines from another file

file1 contains:
someword0
someword2
someword4
someword6
someword8
someword9
someword1
file2 contains:
someword2
someword3
someword4
someword5
someword7
someword11
someword1
So I want only the lines from file1 that file2 doesn't contain. How can I do this in bash?
That's the answer:
grep -v -x -f file2 file1
-v to select non-matching lines
-x to match whole lines only
-f file2 to read the patterns from file2
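Running that command on the question's data keeps file1's order. Without -F each line of file2 is treated as a regular expression, which is harmless here because the lines contain no special characters:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' someword0 someword2 someword4 someword6 someword8 someword9 someword1 > file1
printf '%s\n' someword2 someword3 someword4 someword5 someword7 someword11 someword1 > file2

# -v: keep non-matching lines, -x: whole-line matches only,
# -f file2: read the patterns from file2.
grep -v -x -f file2 file1 > file3

cat file3
# someword0
# someword6
# someword8
# someword9
```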
You can use grep -vwFf:
grep -vwFf file2 file1
someword0
someword6
someword8
someword9
Check man grep for detailed info on all the grep options used here.
You could use the comm command as well, but comm expects sorted input, so sort both files first:
comm -23 <(sort file1) <(sort file2)
Explanation:
comm compares two sorted files and prints, in three columns, the lines unique to file1, the lines unique to file2, and the lines in both.
The options -2 and -3 (or simply -23) suppress the second and third columns, so you get only the lines unique to file1 (in sorted order).
If your lines are unique, do a left join and filter out the lines that exist in both files:
join -a1 -o 0,2.1 <(sort file1) <(sort file2) | awk '!$2 {print $1}'

Searching for Strings

I would like to have a shell script that searches two files and returns a list of strings:
File A contains just a list of unique alphanumeric strings, one per line, like this:
accc_34343
GH_HF_223232
cwww_34343
jej_222
File B contains a list of SOME of those strings (sometimes more than once), and a second column of information, like this:
accc_34343 dog
accc_34343 cat
jej_222 cat
jej_222 horse
I would like to create a third file that contains a list of the strings from File A that are NOT in File B.
I've tried using some loops with grep -v, but that doesn't work. So, in the above example, the new file would have this as its contents:
GH_HF_223232
cwww_34343
Any help is greatly appreciated!
Here's what you can do:
grep -v -f <(awk '{print $1}' file_b) file_a > file_c
Explanation:
grep -v : Use -v option to grep to invert the matching
-f : Use -f option to grep to specify that the patterns are from file
<(awk '{print $1}' file_b): The <(awk '{print $1}' file_b) is to simply extract the first column values from file_b without using a temp file; the <( ... ) syntax is process substitution.
file_a : Tell grep that the file to be searched is file_a
> file_c : Output to be written to file_c
comm is used to find intersections and differences between files:
comm -23 <(sort fileA) <(cut -d' ' -f1 fileB | sort -u)
result:
GH_HF_223232
cwww_34343
I assume your shell is bash/zsh/ksh
awk 'FNR==NR{a[$1];next}!($0 in a)' fileB fileA
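To get the strings from file A that do not appear in file B, awk has to read fileB first and remember its first column, then print the fileA lines that were never seen. A sketch with the question's data (fileA's original order is kept):

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' accc_34343 GH_HF_223232 cwww_34343 jej_222 > fileA
printf '%s\n' 'accc_34343 dog' 'accc_34343 cat' 'jej_222 cat' 'jej_222 horse' > fileB

# While reading the first file (FNR==NR), store column 1 of fileB as a key;
# then print each fileA line whose text is not one of those keys.
awk 'FNR==NR {seen[$1]; next} !($0 in seen)' fileB fileA > fileC

cat fileC
# GH_HF_223232
# cwww_34343
```

One caveat of the FNR==NR idiom: if fileB is empty, the condition stays true while fileA is read, so nothing is printed.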
