grep two files but some words not found - bash

I have a list of names in fileA and want to get those lines in fileB.csv that contain a name in the list.
fileA looks like
noah
liam
jacob
mason
william
fileB.csv looks like
noah,1
liam,2
yoyoyo,44
williams,4
william,5
I want to output
noah,1
liam,2
william,5
But I got
noah,1
liam,2
What I did is (sed 's/$/,/' fileA | grep -wf fileA fileB.csv)
or even grep -wf fileA fileB.csv
However, I have no idea why some words do not show up.

You are modifying fileA with sed so that you can use the modified content as the pattern list for grep. If so, you need to do it properly:
grep -wf <(sed 's/$/,/' fileA) fileB.csv
Otherwise, sed ... | grep -wf fileA fileB was just running the grep command on its own, without taking the sed cleanup into account: grep reads its patterns from fileA and searches fileB, ignoring the piped input entirely.
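To see that the piped data is being ignored (a small demo, not from the original thread): when grep is given file operands, it never reads its standard input, so the command behaves exactly as if the pipe were not there:
$ echo "this text is never read" | grep -wf fileA fileB.csv   # same as: grep -wf fileA fileB.csv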
However, there is no need to add any comma to make this work, since this alone does it:
$ grep -wf fileA fileB
noah,1
liam,2
william,5
Note also that adding a comma will break the matching when using -w:
$ echo "hello,bye" | grep -w "hello"
hello,bye
$ echo "hello,bye" | grep -w "hello,"
$

Related

Cut a field from one file and grep -v from another file

File1
Ada
Billy
Charles
Delta
Eight
File2
Ada,User,xxx
Beba,User,xxx
Charles,Admin,xxx
I am executing the following
Acc=`cut -d',' -f1 $PATH/File2
for account in `cat $File1 |grep -v Acc`
do
cat......
sed....
How to correct this?
Expected output (check which File2 accounts exist in File1):
Ada
Charles
This awk should work for you:
awk -F, 'FNR == NR {seen[$1]; next} $1 in seen' file2 file1
Ada
Charles
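For readers new to the two-file awk idiom, here is the same command spread out with comments (same behavior, just annotated):
awk -F, '
    FNR == NR { seen[$1]; next }   # first file (file2): remember each account name (field 1)
    $1 in seen                     # second file (file1): print the line if its name was seen
' file2 file1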
If this is not the output you're looking for then edit your question and add your expected output.
Your grep command prints the lines which do not contain the literal string Acc. You need the flag -f, which causes grep to accept a list of patterns from a file, something like this:
tmpf=/tmp/$$
cut -d',' -f1 File2 >$tmpf
for account in $(grep -f "$tmpf" File1)
do
...
done
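If your shell supports process substitution, the same idea works without a temporary file (a sketch along the same lines as above):
for account in $(grep -f <(cut -d',' -f1 File2) File1)
do
    ...
done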

Unix/bash :Print filename as first string before each line in a log file

Looking for help on how to prepend the name of the file as the first string on each row of the file.
I am trying to merge 2 files, but each row of the merged content should start with the name of the file it came from, followed by the original row, and so on. Let's consider 2 files named FileA and FileB; FileA has 2 lines and FileB has 2 lines.
FileA
Tom is Cat
Jerry is Mouse
FileB
Cat is Tom
Mouse is Jerry
Expected Output of merged file
FileA Tom is Cat
FileA Jerry is Mouse
FileB Cat is Tom
FileB Mouse is Jerry.
I am struggling to find a solution to this. Please help
Use sed to substitute the filename at the beginning of each line of the file:
sed 's/^/FileA /' fileA >> mergedFile
sed 's/^/FileB /' fileB >> mergedFile
For an arbitrary number of files you can loop over all the filenames and construct the sed substitution command dynamically, using the variable that holds the current filename.
while read -r f
do
sed "s|^|$f |" "$f"
done < file.txt > merge.txt
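Here file.txt is assumed to hold the names of the files to merge, one per line; for the two-file example it could be created like this:
printf '%s\n' FileA FileB > file.txt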
Using awk and brace expansion.
awk '{print FILENAME, $0}' file{A,B} | tee mergefile
The file names can be anything; if those are not what you have, just pass them as arguments to awk:
awk '{print FILENAME, $0}' filefoo filebar filemore ...
This can also be done with grep, if your grep has the -H option/flag:
grep -H . fileA fileB
Again, the filenames can be anything.
In the awk command above, tee is used to send the output both to stdout and to the file mergefile.
If you prefer ripgrep over grep, these two commands produce the same output:
$ grep --with-filename '' File*
$ rg --with-filename --no-heading --no-line-number '' File*
FileA:Tom is Cat
FileA:Jerry is Mouse
FileB:Cat is Tom
FileB:Mouse is Jerry
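Note that grep -H and rg print a colon after the filename, while the expected output uses a space. If the space-separated format matters, one small tweak (a sketch, assuming the filenames contain no colon) is to replace the first colon on each line:
grep -H . FileA FileB | sed 's/:/ /'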

moving records by getting the specific number's prefix

I want to segregate some records into another file in bash.
I have two files: one that contains all the records (FILE_A) and another that contains possible prefixes (FILE_B) of the numbers contained in FILE_A.
With the values of FILE_A and FILE_B below, I want to move the records from FILE_A whose numbers have prefixes contained in FILE_B. Please take note that I must compare only the 11th to 19th digits of FILE_A.
To further clarify my query, please also refer to the output below.
Thank you.
$cat FILE_A
xxxxxxxxxx234575234xxxx01234
xxxxxxxxxx755602188xxxx02345
xxxxxxxxxx044664690xxxx04567
xxxxxxxxxx044663581xxxx01234
xxxxxxxxxx082550123xxxx08234
note: num=11th to 19th digit
file that contains num_prefix
$cat FILE_B
04466358
0446646
02345
08234
note: num_prefix=all the values above
OUTPUT:
cat new_generated_file
xxxxxxxxxx234575234xxxx01234
xxxxxxxxxx044664690xxxx04567
xxxxxxxxxx044663581xxxx01234
It is important that the script compare only the 11th-19th digits of FILE_A against FILE_B, because the last 5 digits may otherwise affect the output.
For example, this attempt:
$ sed 's/^0//' File_B > File_C; grep -f File_C File_A
gives me this output
xxxxxxxxxx234575234xxxx01234
xxxxxxxxxx755602188xxxx02345
xxxxxxxxxx044664690xxxx04567
xxxxxxxxxx044663581xxxx01234
xxxxxxxxxx082550123xxxx08234
(xxxxxxxxxx755602188xxxx02345 and xxxxxxxxxx082550123xxxx08234 were not supposed to be there), because 08234 and 02345 (the last 5 digits) also match patterns in File_C.
You can use:
grep -f <(sed 's/^0//' fileB) fileA
xxxxxxxxxx234575234xxxxx
xxxxxxxxxx044664690xxxxx
xxxxxxxxxx044663581xxxxx
xxxxxxxxxx082340123xxxxx
Update:
sed 's/^0//' fileB > fileC
while read -r f; do
  echo "$f" | cut -c 11-19 | grep -qf fileC && echo "$f"
done < fileA
Update 2:
sed 's/^0//' fileB > fileC
cut -c 11-19 fileA | grep -f fileC > fileD
grep -F -f fileD fileA
#!/bin/ksh
sed -n '/^*$/ !{s/^0\{1,\}/0*/;s/^/^.{10}/;p;}' fileB > /tmp/CleanPrefix.egrep
egrep -f /tmp/CleanPrefix.egrep fileA
rm /tmp/CleanPrefix.egrep
This should also work for bash (drop the first line to stay in bash).
It uses an (e)grep regex as the selector, to avoid interference from the surrounding content.
It assumes every line of fileA starts with 10 characters (any printing characters) before the number.
It adapts the prefixes so a number starting with 0 matches with or without the leading zeros in fileA.
The sed script is POSIX-compliant, so use the --posix option on GNU sed.
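As an alternative sketch (not from the original answers, and assuming the records are fixed-width as shown), awk can anchor the comparison to exactly the 11th-19th characters while still treating leading zeros in the prefixes as optional:
awk '
    NR == FNR {                   # first file (FILE_B): collect the prefixes
        pre = $0
        sub(/^0+/, "", pre)       # drop leading zeros; they are treated as optional
        prefixes[pre]
        next
    }
    {                             # second file (FILE_A)
        num = substr($0, 11, 9)   # look only at the 11th-19th characters
        sub(/^0+/, "", num)
        for (p in prefixes)
            if (index(num, p) == 1) { print; next }
    }
' FILE_B FILE_A > new_generated_file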

grep "output of cat command - every line" in a different file

Sorry, the title of this question is a little confusing, but I couldn't think of anything else.
I am trying to do something like this
cat fileA.txt | grep `awk '{print $1}'` fileB.txt
fileA contains 100 lines while fileB contains 100 million lines.
What I want is to get each id from fileA, grep for that id in a different file (fileB), and print that line.
e.g. fileA.txt
1234
1233
e.g. fileB.txt
1234|asdf|2012-12-12
5555|asdd|2012-11-12
1233|fvdf|2012-12-11
Expected output is
1234|asdf|2012-12-12
1233|fvdf|2012-12-11
Getting rid of cat and awk altogether:
grep -f fileA.txt fileB.txt
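One optional refinement (not part of the original answer): grep -f treats each id as an unanchored pattern, so an id could in principle also match inside the date or another field. If the id is always the first |-delimited field, the patterns can be anchored to the start of the line:
grep -f <(sed 's/.*/^&|/' fileA.txt) fileB.txt   # turns each id such as 1234 into the pattern ^1234|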
awk alone can do that job well:
awk -F'|' 'NR==FNR{a[$0];next;}$1 in a' fileA fileB
see the test:
kent$ head a b
==> a <==
1234
1233
==> b <==
1234|asdf|2012-12-12
5555|asdd|2012-11-12
1233|fvdf|2012-12-11
kent$ awk -F'|' 'NR==FNR{a[$0];next;}$1 in a' a b
1234|asdf|2012-12-12
1233|fvdf|2012-12-11
EDIT
add explanation:
-F'|' # use | as the field separator (needed to split fileB's fields)
'NR==FNR{a[$0];next;} # while reading the first file (fileA), save each line in array a
$1 in a # if $1 (the 1st field) of the current fileB line is in array a, print that line from fileB
There are further details I cannot explain here, for example how awk handles two files and what NR and FNR mean. I suggest trying this awk line in case the accepted answer doesn't work for you, and reading some awk tutorials if you want to dig a little deeper.
If the ids are on distinct lines, you could use the -f option of grep, like this:
cut -d "|" -f1 < fileB.txt | grep -F -f fileA.txt
The cut command ensures that grep searches for the patterns only in the first field of fileB.txt.
From the man page:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line.
The empty file contains zero patterns, and therefore matches nothing.
(-f is specified by POSIX.)

Searching for Strings

I would like to have a shell script that searches two files and returns a list of strings:
File A contains just a list of unique alphanumeric strings, one per line, like this:
accc_34343
GH_HF_223232
cwww_34343
jej_222
File B contains a list of SOME of those strings (sometimes more than once), and a second column of information, like this:
accc_34343 dog
accc_34343 cat
jej_222 cat
jej_222 horse
I would like to create a third file that contains a list of the strings from File A that are NOT in File B.
I've tried using some loops with grep -v, but that doesn't work. So, in the above example, the new file would have this as its contents:
GH_HF_223232
cwww_34343
Any help is greatly appreciated!
Here's what you can do:
grep -v -f <(awk '{print $1}' file_b) file_a > file_c
Explanation:
grep -v : Use -v option to grep to invert the matching
-f : Use -f option to grep to specify that the patterns are from file
<(awk '{print $1}' file_b): This simply extracts the first-column values from file_b without using a temp file; the <( ... ) syntax is process substitution.
file_a : Tell grep that the file to be searched is file_a
> file_c : Output to be written to file_c
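A possible hardening of the same command (my addition, not part of the original answer): -F treats the names as fixed strings and -x requires them to match the whole line, so a name cannot match as a substring or be misread as a regular expression:
grep -v -x -F -f <(awk '{print $1}' file_b) file_a > file_c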
comm is used to find intersections and differences between sorted files:
comm -23 <(sort fileA) <(cut -d' ' -f1 fileB | sort -u)
result:
GH_HF_223232
cwww_34343
I assume your shell is bash/zsh/ksh, so the <( ... ) process substitution used above is available.
awk 'FNR==NR{a[$1];next} !($0 in a)' fileB fileA
