egrep -v match lines containing some same text on each line - bash

So I have two files.
Example of file 1 content.
/n01/mysqldata1/mysql-bin.000001
/n01/mysqldata1/mysql-bin.000002
/n01/mysqldata1/mysql-bin.000003
/n01/mysqldata1/mysql-bin.000004
/n01/mysqldata1/mysql-bin.000005
/n01/mysqldata1/mysql-bin.000006
Example of file 2 content.
/n01/mysqlarch1/mysql-bin.000004
/n01/mysqlarch1/mysql-bin.000001
/n01/mysqlarch2/mysql-bin.000005
So I want to match based only on mysql-bin.00000X and not the rest of the file path in each file as they differ between file1 and file2.
Here's the command I'm trying to run
cat file1 | egrep -v file2
The output I'm hoping for here would be...
/n01/mysqldata1/mysql-bin.000002
/n01/mysqldata1/mysql-bin.000003
/n01/mysqldata1/mysql-bin.000006
Any help would be much appreciated.

Just compare based on everything after the last /:
$ awk -F/ 'FNR==NR {a[$NF]; next} !($NF in a)' f2 f1
/n01/mysqldata1/mysql-bin.000002
/n01/mysqldata1/mysql-bin.000003
/n01/mysqldata1/mysql-bin.000006
Explanation
This reads file2 into memory and then compares it against file1.
-F/ sets the field separator to /.
FNR==NR {a[$NF]; next} while reading the first file (file2), store the last field of every line as a key in the array a[]. Since the field separator is /, this is the mysql-bin.00000X part. Then skip to the next line.
!($NF in a) when reading the second file (file1), check whether the last field (the mysql-bin.00000X part) is a key in the array a[]. If it is not, print the line.
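For comparison, the same filtering can be sketched with grep. This is a sketch, assuming the mysql-bin.00000X basenames never occur elsewhere in file1's lines; the <(...) process substitution assumes bash or zsh:

```shell
# Strip everything up to the last '/' in file2 to get the basenames,
# then drop the file1 lines that contain any of them as fixed strings (-F).
grep -vFf <(sed 's|.*/||' file2) file1
```

With GNU grep an empty file2 means an empty pattern list, so nothing matches and every file1 line is printed.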
I'm having one problem that I've noticed when testing. If file2 is
empty, nothing is returned at all, whereas I would expect every line
in file1 to be returned. Is this something you could help me with
please? – user2841861.
Then the problem is that FNR==NR also holds while reading the second file: an empty file2 contributes no records, so FNR and NR are still equal when awk starts on file1. To prevent this, cross-check that the "read into a[]" action is only performed on the first file:
awk -F/ 'FNR==NR && ARGV[1]==FILENAME {a[$NF]; next} !($NF in a)' f2 f1
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
From man awk:
ARGV
The command-line arguments available to awk programs are stored in an
array called ARGV. ARGC is the number of command-line arguments
present. See section Other Command Line Arguments. Unlike most awk
arrays, ARGV is indexed from zero to ARGC - 1
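A quick throwaway demo of what ARGV holds (not part of the answer's command; f2 and f1 are just the sample files from above):

```shell
# Print every command-line argument awk received, then tag each record
# with the file it came from. ARGV[0] is the program name itself; the
# data files start at ARGV[1].
awk 'BEGIN { for (i = 0; i < ARGC; i++) print "ARGV[" i "] = " ARGV[i] }
     { print FILENAME ": " $0 }' f2 f1
```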

Related

Extracting lines from 2 files using AWK just return the last match

I'm a bit new to AWK and I'm trying to print the lines of one file whose first field exists in the other file. I copied the examples I found here exactly, but I don't know why it's printing only the last match.
File1
58000
72518
94850
File2
58000;123;abc
69982;456;rty
94000;576;ryt
94850;234;wer
84850;576;cvb
72518;345;ert
Result Expected
58000;123;abc
94850;234;wer
72518;345;ert
What I'm getting
94850;234;wer
awk -F';' 'NR==FNR{a[$1]++; next} $1 in a' file1 file2
What am I doing wrong?
awk (while usable here) isn't the correct tool for the job; grep with the -f option is. The -f file option reads patterns from file, one per line, and searches the input file for matches.
So in your case you want:
$ grep -f file1 file2
58000;123;abc
94850;234;wer
72518;345;ert
(note: I removed the trailing '\' from the data file, replace it if it wasn't a typo)
Using awk
If you did want to rewrite what grep is doing using awk, that is fairly simple. Just read the contents of file1 into an array and then for processing records from the second file, just check if field-1 is in the array, if so, print the record (default action), e.g.
$ awk -F';' 'FNR==NR {a[$1]=1; next} $1 in a' file1 file2
58000;123;abc
94850;234;wer
72518;345;ert
(same note about the trailing slash)
Thanks @RavinderSingh13!
The file1 really had some hidden characters, and I could see them using cat:
$ cat -v file1
58000^M
72518^M
94850^M
I removed them using sed -e "s/\r//g" file1 and the AWK command worked perfectly.
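For reference, some common ways to strip those carriage returns (the ^M bytes, i.e. DOS line endings). Note that sed without -i only writes to stdout rather than changing the file:

```shell
sed -i 's/\r$//' file1            # GNU sed: remove trailing \r in place
# tr -d '\r' < file1 > file1.fixed  # portable alternative
# dos2unix file1                    # if the dos2unix utility is installed
```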

Using cut and grep commands in unix

I have a file (file1.txt) with text as:
aaa,,,,,
aaa,10001781,,,,
aaa,10001782,,,,
bbb,10001783,,,,
My file2 contents are:
11111111
10001781
11111222
I need to search for the second field of file1 in file2 and delete the line from file1 if the pattern matches. So the output will be:
aaa,,,,,
aaa,10001782,,,,
bbb,10001783,,,,
Can I use grep and cut commands for this?
This prints lines from file1.txt only if the second field is not in file2:
$ awk -F, 'FNR==NR{a[$1]=1; next;} !a[$2]' file2 file1.txt
aaa,,,,,
aaa,10001782,,,,
bbb,10001783,,,,
How it works
This works by reading file2 and keeping track of all lines seen in an associative array a. Then, lines in file1.txt are printed only if their column 2 is not in a. In more detail:
FNR==NR{a[$1]=1; next;}
When reading file2, set a[$1] to 1 to signal that we have seen the value on this line. We then instruct awk to skip the rest of the commands and start over on the next line.
This section is only run for file2 because file2 is listed first on the command line and FNR==NR only when we are reading the first file listed on the command line. This is because FNR is the number of lines read from the current file and NR is the total number of lines read so far. These two are equal only for the first file.
!a[$2]
When reading file1.txt, a[$2] evaluates to true if column 2 was seen in file2. Since ! is negation, !a[$2] evaluates to true when column 2 was not seen. When this evaluates to true, the line is printed.
Alternative
This is the same logic, expressed in a slightly different style, as suggested in the comments by Tom Fenech:
$ awk -F, 'FNR==NR{a[$1]; next;} !($2 in a)' file2 file1.txt
aaa,,,,,
aaa,10001782,,,,
bbb,10001783,,,,
Solution with grep
$ grep -vf file2 file1.txt
aaa,,,,,
aaa,10001782,,,,
bbb,10001783,,,,
John1024's awk solution would be faster for large files though.
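One caution with the plain grep -vf above: the patterns match anywhere in the line, so a file2 value that happens to occur in another field would remove the wrong rows. A sketch that anchors the match to field 2, assuming the file2 values contain no regex metacharacters (they are plain digits here) and a shell with process substitution:

```shell
# Turn each file2 value V into the regex ^[^,]*,V, which matches V only
# as the second comma-separated field, then invert-match as before.
grep -vf <(sed 's/.*/^[^,]*,&,/' file2) file1.txt
```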

Separate and add numbers from an external file with .sh

Question #1
How can I read a column and add each entry from a file using .sh?
Example file:
10000:max:100:1,2:3,4
10001:jill:50:7,8:3,2
10002:fred:300:5,6:7,8
How do I use IFS=':' in a .sh script to read that file line by line and add up the third part, so that it outputs the sum, e.g. 450?
$ ./myProgram myFile.txt
450
A simple awk one-liner command would do this job.
$ awk -F: '{sum+=$3}END{print sum}' file
450
For each line, awk adds the column 3 value to the variable sum. Printing sum in the END block gives you the total. -F: sets the field separator to a colon.
It's simple. Try using awk like:
awk -F':' '{sum+=$3} END {print sum}' myfile.txt
Here -F sets the delimiter: we tell awk that fields are delimited with the colon ":" present in file myfile.txt.
We add $3 value to sum. And once that's done, we print the value of sum.
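Since the question asks specifically about IFS=':' in a .sh script, here is a pure-shell sketch, assuming bash and the five-field layout shown (the field names are illustrative):

```shell
#!/bin/bash
# Read each line, splitting on ':' into fields, and add up the third one.
# Usage: ./myProgram myFile.txt
sum=0
while IFS=':' read -r id name amount rest; do
    sum=$((sum + ${amount:-0}))   # treat a missing third field as 0
done < "$1"
echo "$sum"
```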

Compare and combine two files (shell script)

I have two input files.
I need to compare file1 and file2 and combine the lines that have the same first field in both files. (The rest of the line can be ignored for the comparison.)
Ideally, I'd like the fields in the output file to be pipe separated.
I'm thinking a simple shell script is all that I need.
file1
0001|14
9934|3
4555|33
file2
0001|coffee|grocery store
0003|gasoline
0005|pickup sticks
9934|protein bars
4555|car
Desired output:
file3
0001|14|0001|coffee|grocery store
9934|3|9934|protein bars
4555|33|4555|car
Any help would be greatly appreciated.
Using awk:
awk 'BEGIN{FS=OFS="|"} FNR==NR {a[$1]=$0; next} $1 in a {print a[$1], $0}' f1 f2
0001|14|0001|coffee|grocery store
9934|3|9934|protein bars
4555|33|4555|car
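An alternative sketch using the coreutils join command: it also pairs lines on the first field, but it needs both inputs sorted (the process substitution assumes bash/zsh) and prints the join field only once, so the output format differs slightly from the awk version above:

```shell
# Sort both files on the fly and join them on field 1,
# keeping '|' as the field separator in the output.
join -t '|' <(sort file1) <(sort file2)
```

As with the awk version, unpaired lines (0003, 0005) are dropped; the output comes out in sorted key order.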

awk to compare two files [duplicate]

This question already has answers here:
Fast way of finding lines in one file that are not in another?
(11 answers)
Closed 7 years ago.
I am trying to compare two files and want to print the matching lines... The lines within each file are unique.
File1.txt
GERMANY
FRANCE
UK
POLLAND
File2.txt
POLLAND
GERMANY
I tried with below command
awk 'BEGIN { FS="\n" } ; NR==FNR{A[$1]++;NEXT}A[$1]' File1.txt File2.txt
but it is printing the matching record twice, I want them to be printed once...
UPDATE
expected output
POLLAND
GERMANY
Current Output
POLLAND
GERMANY
POLLAND
GERMANY
grep together with -f (for file) is best for this:
$ grep -f f1 f2
POLLAND
GERMANY
And in fact, to match whole words only and disable regex interpretation, add -w and -F respectively:
$ grep -wFf f1 f2
POLLAND
GERMANY
If you really have to do it with awk, then you can use:
$ awk 'FNR==NR {a[$1]; next} $1 in a' f1 f2
POLLAND
GERMANY
FNR==NR is performed when reading the first file.
{a[$1]; next} stores in a[] the lines of the first file and goes to the next line.
$1 in a is evaluated when looping through the second file. It checks if the current line is within the a[] array.
Why wasn't your script working?
Because you used NEXT instead of next. awk is case-sensitive, so NEXT was treated as an ordinary (uninitialized) variable instead of the next statement, which meant the final pattern also ran while the first file was being read and the matches were printed twice.
Also, the BEGIN { FS="\n" } is unnecessary: the default FS (whitespace) already splits these lines correctly, and setting it to a newline can misbehave if lines carry trailing whitespace.
Your command should maybe be:
awk 'NR==FNR{A[$1]++;next}A[$1]' file1 file2
You have a stray semi-colon after the closing brace of BEGIN{} and "NEXT" in capital letters where awk expects the lowercase next statement.
Try this one-liner:
awk 'NR==FNR{name[$1]++;next}$1 in name' file1.txt file2.txt
You iterate through the first file (NR==FNR), storing the names in an array called name.
You use next to prevent the second action from happening until the first file is completely stored in the array.
Once the first file is done, awk starts on the second file and checks whether each line is present in the array. It prints the name if it exists.
FS is the field separator; you don't need to set it to a newline. The record separator RS is what splits input into lines, and we don't set it here because a newline is already its default value.
If you don't have to use awk, a better alternative might be the GNU coreutils program comm. From the man page:
comm -12 file1 file2
Print only lines present in both file1 and file2.
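One caveat worth remembering: comm requires both inputs to be sorted, so with unsorted files like these you would sort them on the fly (process substitution assumes bash/zsh), and the intersection then comes out in sorted order rather than the original order:

```shell
# Sort both inputs inline; -12 suppresses lines unique to each file,
# leaving only the lines common to both.
comm -12 <(sort File1.txt) <(sort File2.txt)
```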
