Searching for Strings - shell

I would like to have a shell script that searches two files and returns a list of strings:
File A contains just a list of unique alphanumeric strings, one per line, like this:
accc_34343
GH_HF_223232
cwww_34343
jej_222
File B contains a list of SOME of those strings (sometimes more than once), and a second column of information, like this:
accc_34343 dog
accc_34343 cat
jej_222 cat
jej_222 horse
I would like to create a third file that contains a list of the strings from File A that are NOT in File B.
I've tried using some loops with grep -v, but that doesn't work. So, in the above example, the new file would have this as its contents:
GH_HF_223232
cwww_34343
Any help is greatly appreciated!

Here's what you can do:
grep -v -f <(awk '{print $1}' file_b) file_a > file_c
Explanation:
grep -v : The -v option inverts the match, so grep prints the lines that do not match
-f : The -f option tells grep to read its patterns from a file
<(awk '{print $1}' file_b) : Extracts the first-column values from file_b without using a temp file; the <( ... ) syntax is process substitution
file_a : The file to be searched
> file_c : Redirects the output to file_c
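
One caveat worth checking with this kind of command: without -F and -x, each pattern is treated as a regular expression and may match anywhere in a line. The sketch below shows the difference on made-up data (jej_2222 is a hypothetical extra entry, not part of the question). It uses a temporary pattern file instead of process substitution so that it runs in a plain POSIX shell, and it assumes mktemp is available.

```shell
# Sample data; jej_2222 is a hypothetical entry added to show the pitfall.
tmp=$(mktemp -d)
printf '%s\n' accc_34343 GH_HF_223232 cwww_34343 jej_222 jej_2222 > "$tmp/file_a"
printf '%s\n' 'accc_34343 dog' 'jej_222 cat' > "$tmp/file_b"

# Extract the first column of file_b as the pattern list.
awk '{print $1}' "$tmp/file_b" > "$tmp/patterns"

# Patterns match as substrings: jej_2222 is dropped along with jej_222.
loose=$(grep -v -f "$tmp/patterns" "$tmp/file_a")

# -F: fixed strings, -x: whole-line matches only, so jej_2222 survives.
strict=$(grep -v -x -F -f "$tmp/patterns" "$tmp/file_a")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -rf "$tmp"
```

If the strings in File A can be prefixes of one another, the -x -F form is probably the safer choice.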

comm is used to find intersections and differences between files:
comm -23 <(sort fileA) <(cut -d' ' -f1 fileB | sort -u)
result:
GH_HF_223232
cwww_34343
This assumes your shell is bash, zsh, or ksh, since the command relies on process substitution.
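
comm expects both inputs to be sorted and prints three columns: lines only in the first file, lines only in the second, and lines common to both. The options -2 and -3 suppress the second and third columns. Here is a small sketch with the question's sample data, written with temporary files (the .sorted names are just illustrative) so it does not depend on process substitution:

```shell
tmp=$(mktemp -d)
printf '%s\n' accc_34343 GH_HF_223232 cwww_34343 jej_222 > "$tmp/fileA"
printf '%s\n' 'accc_34343 dog' 'accc_34343 cat' 'jej_222 cat' 'jej_222 horse' > "$tmp/fileB"

# comm requires sorted input; LC_ALL=C keeps sort and comm on the same byte ordering.
LC_ALL=C sort "$tmp/fileA" > "$tmp/a.sorted"
cut -d' ' -f1 "$tmp/fileB" | LC_ALL=C sort -u > "$tmp/b.sorted"

# -2 drops lines unique to the second file, -3 drops lines common to both,
# leaving only the lines that appear in fileA alone.
only_in_a=$(LC_ALL=C comm -23 "$tmp/a.sorted" "$tmp/b.sorted")
printf '%s\n' "$only_in_a"
rm -rf "$tmp"
```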

awk 'FNR==NR{a[$1];next}!($0 in a)' fileB fileA
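
A sketch of the two-pass awk idiom for this task, using the question's sample data. The file whose keys should be excluded (fileB) is read first, and fileA is filtered on the second pass. The array name seen is arbitrary.

```shell
tmp=$(mktemp -d)
printf '%s\n' accc_34343 GH_HF_223232 cwww_34343 jej_222 > "$tmp/fileA"
printf '%s\n' 'accc_34343 dog' 'accc_34343 cat' 'jej_222 cat' 'jej_222 horse' > "$tmp/fileB"

# First pass (fileB): FNR==NR holds, so remember every first-column key.
# Second pass (fileA): print lines whose whole content is not a remembered key.
missing=$(awk 'FNR==NR { seen[$1]; next } !($0 in seen)' "$tmp/fileB" "$tmp/fileA")
printf '%s\n' "$missing"
rm -rf "$tmp"
```

Unlike the comm approach, this keeps the lines in fileA's original order and needs no sorting.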

Related

Cut files from 1 files and grep -v from another files

File1
Ada
Billy
Charles
Delta
Eight
File2
Ada,User,xxx
Beba,User,xxx
Charles,Admin,xxx
I am executing the following
Acc=`cut -d',' -f1 $PATH/File2
for account in `cat $File1 |grep -v Acc`
do
cat......
sed....
How can I correct this?
Expected output
Accounts from File2 that also exist in File1:
Ada
Charles
This awk should work for you:
awk -F, 'FNR == NR {seen[$1]; next} $1 in seen' file2 file1
Ada
Charles
If this is not the output you're looking for then edit your question and add your expected output.
Your grep command prints the lines that do not contain the literal string Acc; it never looks at the variable. You need the -f flag, which makes grep read a list of patterns from a file, something like this:
tmpf=/tmp/$$
cut -d',' -f1 File2 >$tmpf
for account in $(grep -f "$tmpf" File1)
do
...
done
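
The name /tmp/$$ is predictable, so here is a sketch of the same approach using mktemp instead (assuming it is installed; it is not part of POSIX). It also adds -F -x so that account names are matched literally and as whole lines, and reads the matches line by line rather than word-splitting a command substitution. The file names accounts and matches are illustrative.

```shell
work=$(mktemp -d)
printf '%s\n' Ada Billy Charles Delta Eight > "$work/File1"
printf '%s\n' 'Ada,User,xxx' 'Beba,User,xxx' 'Charles,Admin,xxx' > "$work/File2"

# Account names are the first comma-separated field of File2.
cut -d',' -f1 "$work/File2" > "$work/accounts"

# -F -x: treat each account name as a literal, whole-line pattern.
grep -F -x -f "$work/accounts" "$work/File1" > "$work/matches"

# Process each matching account; here we only collect the names.
found=""
while IFS= read -r account; do
    found="$found[$account]"
done < "$work/matches"

printf '%s\n' "$found"
rm -rf "$work"
```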

How could I compare two files and remove similar rows in them (bash script)

I have two data files with the same number of columns. I'd like to save file2 to another file (file3) while excluding the rows that already exist in file1.
grep -v -i -f file1 file2> file3
The problem is that the columns in file1 are separated by "\t", while in the other file they are separated by just " ". Therefore this command line doesn't work.
Any suggestion??
Thanks folks!
You can convert tabs to spaces on the fly:
grep -vif <(tr '\t' ' ' < file1) file2 > file3
This is process substitution.
Try:
grep -Fxvf file1 file2
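The meanings of the switches are in the grep man page: -F treats patterns as fixed strings, -x matches whole lines only, -v inverts the match, and -f reads the patterns from a file.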
grep -v -f can be slow because it has to test the lines of file2 against every pattern from file1; with large files it may take a very long time. Try this instead (comm needs sorted input and prints its result in sorted order):
comm -13 <(tr '\t' ' ' < file1 | sort) <(sort file2)
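
The tab normalization and the whole-line matching from the answers above can be combined. The sketch below uses made-up rows and temporary files (file1.spaces and the .sorted names are illustrative) and shows that the grep variant keeps file2's order while the comm variant returns sorted output. It drops the -i from the question, so matching is case-sensitive.

```shell
tmp=$(mktemp -d)
printf 'a\t1\nb\t2\n' > "$tmp/file1"               # tab-separated
printf '%s\n' 'd 4' 'a 1' 'c 3' > "$tmp/file2"     # space-separated

# Normalize file1 so both files use a single space between columns.
tr '\t' ' ' < "$tmp/file1" > "$tmp/file1.spaces"

# Rows of file2 that are not in file1, in file2's original order.
kept=$(grep -v -x -F -f "$tmp/file1.spaces" "$tmp/file2")

# The same set via comm, which needs sorted input and emits sorted output.
LC_ALL=C sort "$tmp/file1.spaces" > "$tmp/f1.sorted"
LC_ALL=C sort "$tmp/file2" > "$tmp/f2.sorted"
kept_sorted=$(LC_ALL=C comm -13 "$tmp/f1.sorted" "$tmp/f2.sorted")

printf 'grep:\n%s\ncomm:\n%s\n' "$kept" "$kept_sorted"
rm -rf "$tmp"
```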

grep from file doesn't write result to an output file

I'm trying to grep strings from file2.csv using existing strings from file1.csv and write the matched lines to result.csv. I have the following bash script:
cat file1.csv | while read line; do
grep $line ./file2.csv > result.csv
done
But after all that, result.csv is always empty. When I grep file2.csv manually, everything works fine. What am I doing wrong?
file1.csv:
15098662745072
15098662745508
file2.csv:
";"0";"15098662745072";"4590";"4590";"
";"0";"15098662745508";"6400";"6400";"
";"0";"15098662745515";"6110";"6110";"
";"0";"15098662745812";"7970";"7970";"
expected result (result.csv):
";"0";"15098662745072";"4590";"4590";"
";"0";"15098662745508";"6400";"6400";"
> keeps overwriting the file. Use >> to append to it.
Instead of using a loop, you can simply use the -f option in grep to make grep read patterns from the file.
grep -f file1.csv file2.csv > result.csv
If you have to use a loop, use the following approach:
while read line; do
grep "$line" ./file2.csv
done < file1.csv > result.csv
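
To see the difference between the two redirections, here is a small sketch using the question's data. The output names inside.csv and outside.csv are made up for the comparison. With the redirection inside the loop, only the output of the last grep survives (and the file ends up empty if the last pattern matches nothing).

```shell
tmp=$(mktemp -d)
printf '%s\n' 15098662745072 15098662745508 > "$tmp/file1.csv"
printf '%s\n' \
    '";"0";"15098662745072";"4590";"4590";"' \
    '";"0";"15098662745508";"6400";"6400";"' \
    '";"0";"15098662745515";"6110";"6110";"' > "$tmp/file2.csv"

# Redirecting inside the loop truncates the file on every iteration.
while IFS= read -r line; do
    grep -F -- "$line" "$tmp/file2.csv" > "$tmp/inside.csv"
done < "$tmp/file1.csv"

# Redirecting the whole loop opens the file once and collects all matches.
while IFS= read -r line; do
    grep -F -- "$line" "$tmp/file2.csv"
done < "$tmp/file1.csv" > "$tmp/outside.csv"

inside_count=$(wc -l < "$tmp/inside.csv" | tr -d ' ')
outside_count=$(wc -l < "$tmp/outside.csv" | tr -d ' ')
printf 'inside: %s line(s), outside: %s line(s)\n' "$inside_count" "$outside_count"
rm -rf "$tmp"
```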
You should be using awk for this, not grep, because:
a) grep does not by default look for strings, it looks for regular expressions. You need to use fgrep or grep -F or awk instead of grep to search for strings.
b) You really only want to match the numbers from file1.csv when they appear as a full specific field in file2.csv, not wherever they occur on the line.
awk -F'";"' 'NR==FNR{a[$0];next} $3 in a' file1.csv file2.csv > result.csv
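
Point (b) can be illustrated with a made-up row in which the id appears in the fourth field rather than the third. The second line of file2.csv below is hypothetical, not from the question. grep counts it as a hit, while the awk field comparison does not.

```shell
tmp=$(mktemp -d)
printf '%s\n' 15098662745072 > "$tmp/file1.csv"
printf '%s\n' \
    '";"0";"15098662745072";"4590";"4590";"' \
    '";"0";"15098662745515";"15098662745072";"6110";"' > "$tmp/file2.csv"

# grep matches the id anywhere on the line, so both rows are selected.
grep_hits=$(grep -c -F -f "$tmp/file1.csv" "$tmp/file2.csv")

# awk compares only the third field, so just the first row is selected.
awk_hits=$(awk -F'";"' 'NR==FNR { ids[$0]; next } $3 in ids' \
    "$tmp/file1.csv" "$tmp/file2.csv" | wc -l | tr -d ' ')

printf 'grep: %s, awk: %s\n' "$grep_hits" "$awk_hits"
rm -rf "$tmp"
```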

grep "output of cat command - every line" in a different file

Sorry, the title of this question is a little confusing, but I couldn't think of anything else.
I am trying to do something like this
cat fileA.txt | grep `awk '{print $1}'` fileB.txt
fileA contains 100 lines while fileB contains 100 million lines.
What I want is to get each id from fileA, grep for that id in a different file (fileB), and print the matching line.
e.g. fileA.txt
1234
1233
e.g. fileB.txt
1234|asdf|2012-12-12
5555|asdd|2012-11-12
1233|fvdf|2012-12-11
Expected output is
1234|asdf|2012-12-12
1233|fvdf|2012-12-11
Getting rid of cat and awk altogether:
grep -f fileA.txt fileB.txt
awk alone can do that job well:
awk -F'|' 'NR==FNR{a[$0];next;}$1 in a' fileA fileB
see the test:
kent$ head a b
==> a <==
1234
1233
==> b <==
1234|asdf|2012-12-12
5555|asdd|2012-11-12
1233|fvdf|2012-12-11
kent$ awk -F'|' 'NR==FNR{a[$0];next;}$1 in a' a b
1234|asdf|2012-12-12
1233|fvdf|2012-12-11
EDIT
Added explanation:
-F'|' # use | as the field separator
'NR==FNR{a[$0];next;} # while reading the first file (fileA), save each line as a key in array a
$1 in a # for fileB, print the current line if $1 (the 1st field) is a key in array a
I cannot explain every detail here, such as how awk handles two files or what NR and FNR are. I suggest trying this awk line in case the accepted answer doesn't work for you. If you want to dig a little deeper, read some awk tutorials.
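
For readers wondering about NR and FNR: NR counts records across all input files, while FNR restarts at 1 for each file, so NR==FNR is true only while the first file is being read (assuming it is not empty). A small trace, with labels first and later that are purely illustrative:

```shell
tmp=$(mktemp -d)
printf '%s\n' 1234 1233 > "$tmp/a"
printf '%s\n' '1234|asdf|2012-12-12' '5555|asdd|2012-11-12' > "$tmp/b"

# Print both counters for every record; the label shows which branch
# of the NR==FNR test each record would take.
trace=$(awk '{ print (NR == FNR ? "first" : "later") " NR=" NR " FNR=" FNR }' \
    "$tmp/a" "$tmp/b")

printf '%s\n' "$trace"
rm -rf "$tmp"
```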
If the ids are on distinct lines you could use the -f option in grep as such:
cut -d "|" -f1 < fileB.txt | grep -F -f fileA.txt
The cut command ensures that only the first field is searched by grep. Note that this pipeline prints only the matching ids, not the full lines from fileB.txt.
From the man page:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line.
The empty file contains zero patterns, and therefore matches nothing.
(-f is specified by POSIX.)

How can I replace lines in a text file with lines from another file based on matching key fields?

input.txt
1,Ram,Fail
2,John,Fail
3,Ron,Success
param.txt (New Input)
1,Sam,Success
2,John,Sucess
Now I want to replace whole lines in input.txt with the matching lines from param.txt.
The 1st column acts as the primary key.
Output.txt
1,Sam,Success
2,John,Sucess
3,Ron,Success
I tried:
awk 'FNR==NR{a[$1]=$2 FS $3;next}{ print $0, a[$1]}' input.txt param.txt > Output.txt
But it is merging the file contents.
This might work for you (GNU sed):
sed 's|^\([^,]*,\).*|/^\1/c\\&|' param.txt | sed -f - input.txt
Explanation:
Convert param.txt into a sed script using the first field as an address to change the line in the input.txt. s|^\([^,]*,\).*|/^\1/c\\&|
Run the script against the input.txt. sed -f - input.txt
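
To make the first step concrete, this sketch prints the sed script that the first command generates from the sample param.txt lines. It only shows the generated text and does not run it against input.txt.

```shell
# Each param.txt line becomes "/^KEY,/c\LINE": an address matching the key,
# followed by a c (change) command carrying the replacement line.
generated=$(printf '%s\n' '1,Sam,Success' '2,John,Sucess' |
    sed 's|^\([^,]*,\).*|/^\1/c\\&|')

printf '%s\n' "$generated"
# prints:
# /^1,/c\1,Sam,Success
# /^2,/c\2,John,Sucess
```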
This can be done with one call to sort:
sort -t, -k1,1n -us param.txt input.txt
Use a stable numerical sort on the first comma-delimited field, and list param.txt before input.txt so that the correct, newer, lines are preferred when eliminating duplicates.
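
A quick check of this with the sample data. The sketch assumes GNU sort, which keeps the first line of each run of equal keys. POSIX does not say which duplicate -u retains, so other implementations may behave differently.

```shell
tmp=$(mktemp -d)
printf '%s\n' '1,Ram,Fail' '2,John,Fail' '3,Ron,Success' > "$tmp/input.txt"
printf '%s\n' '1,Sam,Success' '2,John,Sucess' > "$tmp/param.txt"

# -t, -k1,1n: compare only the first comma-separated field, numerically.
# -s keeps input order among equal keys; -u then keeps one line per key.
merged=$(sort -t, -k1,1n -u -s "$tmp/param.txt" "$tmp/input.txt")

printf '%s\n' "$merged"
rm -rf "$tmp"
```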
You could use join(1) to make this work:
$ join -t, -a1 -j1 input.txt param.txt | sed -E 's/,[^,]*,[^,]*(,[^,]*,[^,]*)/\1/'
1,Sam,Success
2,John,Sucess
3,Ron,Success
The trailing sed strips the fields that came from input.txt out of the replaced lines.
This works only if both input files are sorted on the first field.
Pure awk isn't really the right tool for the job. If you must use only awk, https://stackoverflow.com/a/5467806/1301972 is a good starting point for your efforts.
However, Unix provides some tools that will help with feeding awk the right input for what you're trying to do.
$ join -a1 -t, <(sort -n input.txt) <(sort -n param.txt) |
awk -F, 'NF > 3 {print $1 "," $4 "," $5; next}; {print}'
Basically, you're feeding awk a single file with the lines joined on the keys from input.txt. Then awk can parse out the fields you want for proper display or for redirection to your output file.
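
The same pipeline, sketched with temporary files instead of process substitution. One assumption to flag: join expects its inputs sorted lexically on the join field, which differs from numeric order once keys reach two digits, so this version sorts on the first field as text. The .sorted file names are illustrative.

```shell
tmp=$(mktemp -d)
printf '%s\n' '1,Ram,Fail' '2,John,Fail' '3,Ron,Success' > "$tmp/input.txt"
printf '%s\n' '1,Sam,Success' '2,John,Sucess' > "$tmp/param.txt"

# Sort both files on the first field using the same byte ordering as join.
LC_ALL=C sort -t, -k1,1 "$tmp/input.txt" > "$tmp/input.sorted"
LC_ALL=C sort -t, -k1,1 "$tmp/param.txt" > "$tmp/param.sorted"

# -a1 keeps unmatched input.txt lines; matched lines carry 5 fields, and
# awk keeps the key plus the two fields that came from param.txt.
updated=$(LC_ALL=C join -t, -a1 "$tmp/input.sorted" "$tmp/param.sorted" |
    awk -F, 'NF > 3 { print $1 "," $4 "," $5; next } { print }')

printf '%s\n' "$updated"
rm -rf "$tmp"
```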
This should work in awk
awk -F"," 'NR==FNR{a[$1]=$0;next} ($1 in a){ print a[$1]; next}1' param.txt input.txt
Test:
$ cat input.txt
1,Ram,Fail
2,John,Fail
3,Ron,Success
$ cat param.txt
1,Sam,Success
2,John,Sucess
$ awk -F"," 'NR==FNR{a[$1]=$0;next} ($1 in a){ print a[$1]; next}1' param.txt input.txt
1,Sam,Success
2,John,Sucess
3,Ron,Success
