How to compare entries in one file to two files? - shell

I have a file (named file1) which consists of names and their IPs. It looks something like this:
VM1 2.33.4.22
VM2 88.43.21.34
VM3 120.3.45.66
VM4 99.100.34.5
VM5 111.3.4.66
and I have two files (file2 and file3) which consist solely of IPs.
File 2 consists of:
120.3.45.66
88.43.21.34
File 3 consists of:
99.100.34.5
I want to compare file1 to file2 and file3 and get the names and IPs that are not present in either file2 or file3. So the output would be:
VM1 2.33.4.22
VM5 111.3.4.66
How can I get the desired output?

sed 's/\./\\./g; s/.*/ &$/' file2 file3 | grep -vf - file1
Use sed to turn the entries in file2 and file3 into suitably anchored regexes: escape the dots, then wrap each IP in a leading space and a trailing $.
Pipe this pattern list to grep, with -f - to read the pattern list from standard input and -v to print the non-matching lines of file1.
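For the sample data, the pattern list sed produces looks like this (the leading space ensures each IP only matches as a whole second field):
$ sed 's/\./\\./g; s/.*/ &$/' file2 file3
 120\.3\.45\.66$
 88\.43\.21\.34$
 99\.100\.34\.5$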

You can write a shell script that will do it for you.
#!/bin/sh
# usage: check.sh excludeA excludeB mainfile
cat "$1" "$2" > mergedFile.txt
grep -v -f mergedFile.txt "$3"
You can run the script using the following command:
sh check.sh file2 file3 file1
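If your shell supports process substitution (bash, ksh, zsh), the same thing works without the temporary file:
grep -v -f <(cat file2 file3) file1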

awk 'NR==FNR { out[$1]=1; next } !out[$2]' <(cat file2 file3) file1
This takes basically the same approach as the sed solution, using awk instead.
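Spelled out with comments, the awk program reads as follows (a sketch of the identical logic):
awk '
    NR==FNR { out[$1]=1; next }   # first input (file2 and file3 concatenated): mark each IP as seen
    !out[$2]                      # second input (file1): print lines whose IP was never marked
' <(cat file2 file3) file1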

Related

Finding common lines for two files using bash

I am trying to compare two files and output a file which consists of common names for both.
File1
1990.A.BHT.s_fil 4.70
1991.H.BHT.s_fil 2.34
1992.O.BHT.s_fil 3.67
1993.C.BHT.s_fil -1.50
1994.I.BHT.s_fil -3.29
1995.K.BHT.s_fil -4.01
File2
1990.A.BHT_ScS.dat 1537 -2.21
1993.C.BHT_ScS.dat 1494 1.13
1994.I.BHT_ScS.dat 1545 0.15
1995.K.BHT_ScS.dat 1624 1.15
I want to compare the first part of the names (e.g. 1990.A.BHT) in both files and write the common names, together with their values from the 2nd column of file1, to file3.
ex: file3 (output)
1990.A.BHT.s_fil 4.70
1993.C.BHT.s_fil -1.50
1994.I.BHT.s_fil -3.29
1995.K.BHT.s_fil -4.01
I used the following code, which uses the grep command:
while read line
do
grep $line file1 >> file3
done < file2
and
grep -wf file1 file2 > file3
I sort the files before using this script.
But I get an empty file3. Can someone help me with this please?
You need to remove everything starting from _ScS.dat from the lines in file2. Then you can use the result as a list of patterns to match lines in file1.
grep -F -f <(sed 's/_ScS\.dat.*//' file2) file1 > file3
The -F option matches fixed strings rather than treating them as regular expressions.
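For the sample file2, the pattern list that the sed command extracts is:
$ sed 's/_ScS\.dat.*//' file2
1990.A.BHT
1993.C.BHT
1994.I.BHT
1995.K.BHT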
In your example data the lines appear to be in sorted order. Note, though, that comm compares whole lines, so comm -1 -2 file1 file2 only works when matching lines are identical in both files; here the files would first have to be reduced to their common keys. If the input can be unsorted, sort it first:
comm -1 -2 <(sort file1) <(sort file2)
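For this particular data, a sketch that combines the two ideas: reduce both files to the shared key and let comm report the common ones.
$ comm -12 <(sed 's/\.s_fil.*//' file1 | sort) <(sed 's/_ScS\.dat.*//' file2 | sort)
1990.A.BHT
1993.C.BHT
1994.I.BHT
1995.K.BHT
The resulting key list can then be fed back to grep -F -f to pull the full lines out of file1.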

Shell script for merging dotenv files with duplicate keys

Given two dotenv files,
# file1
FOO="X"
BAR="B"
and
# file2
FOO="A"
BAZ="C"
I want to run
$ ./merge.sh file1.env file2.env > file3.env
to get the following output:
# file3
FOO="A"
BAR="B"
BAZ="C"
So far, I have used the python-dotenv module to parse the files into dictionaries, merge them, and write them back. However, I feel like there should be a simple shell solution that rids me of a third-party module for such a basic task.
Answer
Alright, so I ended up using
$ sort -u -t '=' -k 1,1 file1 file2 | grep -v '^$\|^\s*\#' > file3
which also omits blank lines and comments. Nevertheless, the proposed awk solution works just as well.
Another quite simple approach is to use sort:
sort -u -t '=' -k 1,1 file1 file2 > file3
results in a file where the keys from file1 take precedence over the keys from file2.
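One caveat, based on GNU sort behavior (worth verifying with your implementation): with -u, the first input line of each run of equal keys is the one that survives. So to make file2 win, as in the desired output above, list file2 first:
$ sort -u -t '=' -k 1,1 file2 file1 | grep -v '^\s*#'
BAR="B"
BAZ="C"
FOO="A"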
Using a simple awk script:
awk -F= '{a[$1]=$2}END{for(i in a) print i "=" a[i]}' file1 file2
This stores each key's value in the array a and prints the array contents once both files have been parsed.
Keys that appear in file2 override the ones in file1. Two caveats: for (i in a) does not guarantee any particular output order, and because of -F= the $2 field stops at a second =, truncating values that contain one.
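If the order matters or values may contain =, here is a variation (a sketch, not from the original answers) that preserves first-seen key order and keeps the whole value:
awk -F= '
    /^[[:space:]]*(#|$)/ { next }               # skip blank lines and comments
    !($1 in a) { order[++n] = $1 }              # remember first-seen key order
    { k = $1; sub(/^[^=]*=/, ""); a[k] = $0 }   # later files overwrite earlier values
    END { for (i = 1; i <= n; i++) print order[i] "=" a[order[i]] }
' file1 file2
For the sample files this prints FOO="A", BAR="B", BAZ="C", matching the desired file3.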
To add new values from file2 only, without overwriting the initial values from file1, append file2's non-blank lines to file1 and then keep just the first occurrence of each key:
grep "\S" file2 >> file1
awk -F "=" '!a[$1]++' file1 > file3

Compare 2 csv files and delete rows - Shell

I have two csv files. One has several columns; the other is just one column with domains. Simplified data for these files would be:
file1.csv:
John,example.org,MyCompany,Australia
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
file2.csv:
example.org
google.es
mysite.uk
The output should be
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
I have tried this solution
grep -v -f file2.csv file1.csv >output-file
Found here
http://www.unix.com/shell-programming-and-scripting/177207-removing-duplicate-records-comparing-2-csv-files.html
But since there is no explanation whatsoever of how the script works, and I suck at shell, I cannot tweak it to make it work for me.
A solution for this would be highly appreciated; a solution with some explanation would be awesome! :)
EDIT:
I have tried the line that was supposed to work, but for some reason it does not. Here is the output from my terminal. What's wrong with this?
Desktop $ cat file1.csv ; echo
John,example.org,MyCompany,Australia
Lenny ,domain.com,OtherCompany,US
Martha,mysite.com,ThirCompany,US
Desktop $ cat file2.csv ; echo
example.org
google.es
mysite.uk
Desktop $ grep -v -f file2.csv file1.csv
John,example.org,MyCompany,Australia
Lenny ,domain.com,OtherCompany,US
Martha,mysite.com,ThirCompany,US
Why doesn't grep remove the line
John,example.org,MyCompany,Australia
The line you posted works just fine. (If it does not on your machine, a common culprit is file2.csv containing Windows line endings or trailing whitespace: a pattern like example.org followed by an invisible \r will not match.)
$ grep -v -f file2.csv file1.csv
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
And here's an explanation. grep will search for a given pattern in a given file and print all lines that match. The simplest example of usage is:
$ grep John file1.csv
John,example.org,MyCompany,Australia
Here we used a simple pattern that is matched literally, but you can also use regular expressions (basic, extended, and even Perl-compatible ones).
To invert the logic, and print only the lines that do not match, we use the -v switch, like this:
$ grep -v John file1.csv
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
To specify more than one pattern, you can use the option -e pattern multiple times, like this:
$ grep -v -e John -e Lenny file1.csv
Martha,site.com,ThirdCompany,US
However, if there is a larger number of patterns to check, we can use the -f file option, which reads all the patterns from the specified file.
So, when we combine all of those, reading the patterns from a file with -f and inverting the matching logic with -v, we get the command you need.
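One caveat worth adding: grep treats each line of the pattern file as a basic regular expression, so the dot in example.org can match any character. Adding -F makes the patterns plain fixed strings:
$ grep -v -F -f file2.csv file1.csv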
One in awk:
$ awk -F, 'NR==FNR{a[$1];next} !($2 in a)' file2 file1
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
Explained:
$ awk -F, ' # using awk, comma-separated records
NR==FNR { # process the first file, file2
a[$1] # hash the domain to a
next # proceed to next record
}
!($2 in a) # process file1: if the domain in $2 is not in a, print the record
' file2 file1 # file order is important

UNIX - Simple merging of two files as in the input

Input File1:
HELLO
HOW
Input File2:
ARE
YOU
output file should be
HELLO
HOW
ARE
YOU
My input files will be in one folder, and my script has to fetch the input files from that folder and merge them in the order given above.
Thanks
You can simply use cat as shown below:
cat file1 file2
or, to concatenate all files in a folder (assuming there are not too many):
cat folder/*
sed '' file1 file2
Hope this works fine.
cat:
cat file1 file2 >output
perl:
perl -plne '' file1 file2 >output
awk (the pattern 1 is always true, so the default action, printing the line, runs for every input line):
awk '1' file1 file2 >output

Unix: One line bash command to merge 3 files together. extracting only the first line of each

I am having a hard time with my syntax here:
I have 3 files with various content, file1 file2 file3 (100+ lines). I am trying to merge them together, but only the first line of each file should be merged. The point is to do it using one line of bash code:
sed -n 1p file1 file2 file3 returns only the first line of file1, because sed numbers the concatenated input as one continuous stream.
You might want to try
head -n1 -q file1 file2 file3
The -q flag (a GNU extension) suppresses the ==> file <== headers that head would otherwise print when given multiple files.
It's not clear whether by merge you mean concatenate or join.
In awk by joining (each first line in the files printed side by side):
$ awk 'FNR==1{printf "%s ",$0}' file1 file2 file3
1 2 3
In awk by concatenating (each first line in the files printed one after another):
$ awk 'FNR==1' file1 file2 file3
1
2
3
I suggest you use head as explained in themel's answer. However, if you insist on using sed, you cannot simply pass all the files to it, since they are implicitly concatenated and you lose the information about where each file's first line is. So, if you really want to do it in sed, you need the shell to help you out:
for f in file1 file2 file3; do sed -n 1p "$f"; done
You can avoid calling external processes by using the read built-in command:
for f in file1 file2 file3; do read -r l < "$f"; echo "$l"; done > merged.txt
