Cut words from files based on grep - bash

I have a small bash script as follows :
cat foo.txt | grep "balt" > bar_file
Ideally, I would like every word that contains "balt" to be removed from the foo.txt file. Can I get some direction on how to move words from one file to another based on what is grepped?

As a side note: there is no need to use cat and pipe its output to grep, since you can pass the filename directly to grep, which saves one process.
As for your question, you can use the -o option of grep, together with \b word-boundary checks, to get only the matching words that contain balt:
$ cat foo.txt
abcd baltabcd xyz
xdef abbaltcd xyz
balt
$ grep -o '\b\w*balt\w*\b' foo.txt
baltabcd
abbaltcd
balt
$ grep -o '\b\w*balt\w*\b' foo.txt > bar_file
$ cat bar_file
baltabcd
abbaltcd
balt
$
As you can see, grep matches zero or more word characters before and after balt, and each matching word is written to another file.
Example words were: baltabcd, abbaltcd and balt
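The commands above only copy the matching words into bar_file. To also remove them from foo.txt, as the question asks, one possible approach is to run sed with the same pattern. This is a minimal sketch, not a definitive solution: it assumes a grep that supports -o and a sed that supports -E, it spells \w as the POSIX class [[:alnum:]_], and the scratch directory and variable names are hypothetical.

```shell
# Hypothetical scratch directory; the sample text mirrors foo.txt above.
workdir=$(mktemp -d)
printf '%s\n' 'abcd baltabcd xyz' 'xdef abbaltcd xyz' 'balt' > "$workdir/foo.txt"

# Step 1: save every word containing "balt" into bar_file.
grep -oE '[[:alnum:]_]*balt[[:alnum:]_]*' "$workdir/foo.txt" > "$workdir/bar_file"

# Step 2: delete those words from foo.txt, going through a temporary file
# so the input is never overwritten while it is still being read.
sed -E 's/[[:alnum:]_]*balt[[:alnum:]_]*//g' "$workdir/foo.txt" > "$workdir/foo.tmp" &&
  mv "$workdir/foo.tmp" "$workdir/foo.txt"

moved=$(cat "$workdir/bar_file")      # words that were moved out
remaining=$(cat "$workdir/foo.txt")   # what is left in foo.txt
printf '%s\n' "$remaining"
```

The whitespace around a removed word is left untouched, so a line such as "abcd baltabcd xyz" ends up with two spaces between the remaining words, and a line that held only a matching word becomes empty.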

Related

variables can't be used in grep pattern matching

I'm trying to use a shell script to get the string from one file, then use it to get the matched sentence.
The script looks like this:
function find_str (){
echo $1
grep -e "\/$1" info.txt
}
for word in $(<./name.TXT); do
#egrep -w "$word" info.txt #this can't work either
find_str $word
done
It turns out that find_str $word cannot match some string like "/WORD1 balabala"
Any suggestion about this short piece of script?
What is the "/$1" part about? Are you looking for literal slashes? And do you need the function?
$ cat > info.txt
testing 123 dog
nothing matches a catalyst
doggerel is not poetry
my cat is a maine coone
$ cat > name.txt
dog cat
With -w and GNU grep, only lines with matching whole words are listed:
$ for word in $(<name.txt); do grep -w "$word" info.txt; done
testing 123 dog
my cat is a maine coone
Without the -w flag, all lines containing a match are listed:
$ for word in $(<name.txt); do grep "$word" info.txt; done
testing 123 dog
doggerel is not poetry
nothing matches a catalyst
my cat is a maine coone
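The loop above starts one grep per word. As a hedged alternative, the words can be rewritten one per line and handed to a single grep through -f. This sketch assumes that name.txt holds whitespace-separated words with no leading whitespace and that the words should be matched literally (-F); the scratch directory, patterns.txt and the variable name are hypothetical.

```shell
# Hypothetical scratch directory; the sample data mirrors info.txt and name.txt above.
workdir=$(mktemp -d)
printf '%s\n' 'testing 123 dog' 'nothing matches a catalyst' \
  'doggerel is not poetry' 'my cat is a maine coone' > "$workdir/info.txt"
printf 'dog cat\n' > "$workdir/name.txt"

# Put each word on its own line so it can serve as one pattern.
tr -s '[:space:]' '\n' < "$workdir/name.txt" > "$workdir/patterns.txt"

# -w: whole words only, -F: literal strings, -f: read patterns from a file.
whole_word_matches=$(grep -wFf "$workdir/patterns.txt" "$workdir/info.txt")
printf '%s\n' "$whole_word_matches"
```

With a single grep the matching lines come out in the order they appear in info.txt, not grouped by word as in the loop.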

Keep lines in common between two files, but including duplicates [duplicate]

The following command gives me a list of matching expressions:
grep -f /tmp/list Filename* > /tmp/output
The list file is then parsed and used to search Filename* for the parsed string. The results are then saved to output.
How would I output the parsed string from list in the case where there is no match in Filename*?
Contents of the list file could be:
ABC
BLA
ZZZ
HJK
Example Files:
Filename1:5,ABC,123
Filename2:5,ZZZ,342
Result of Running Command:
BLA
HJK
Stack Overflow question 2480584 looks like it may be relevant, through the use of an if statement. However, I'm not sure how to output the parsed string to the output file. Would it require some type of read line?
TIA,
Mic
Obviously, grep -f list Filename* gives all matches of patterns from the file list in the files specified by Filename*, i.e.,
Filename1:5,ABC,123
Filename2:5,ZZZ,342
in your example.
By adding the -o (only print matching expression) and -h (do not print filename) flags, we can turn this into:
ABC
ZZZ
Now you want all patterns from list that do not appear in this output, which can be achieved by
grep -f list Filename* -o -h | grep -f /dev/stdin -v list
where the second grep takes its patterns from the output of the first and, by using the -v flag, gives all the lines of the file list that do not match those patterns.
This does it:
$ grep -v "$(cat Filename* | cut -d, -f2)" /tmp/list
BLA
HJK
Explanation
$ cat Filename* | cut -d, -f2
ABC
ZZZ
And then grep -v looks for the inverse matching.
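Both approaches can be tried on the sample data from the question. The sketch below follows the first one but stores the intermediate result in a file instead of reading /dev/stdin. It assumes that the entries in list are literal strings (-F) and that each entry fills a whole line (-x); the scratch directory, the file named found and the variable name are hypothetical.

```shell
# Hypothetical scratch directory holding the sample data from the question.
workdir=$(mktemp -d)
printf '%s\n' ABC BLA ZZZ HJK > "$workdir/list"
printf '5,ABC,123\n' > "$workdir/Filename1"
printf '5,ZZZ,342\n' > "$workdir/Filename2"

# Patterns that do occur: -o prints only the matched text, -h drops the filename.
grep -ohFf "$workdir/list" "$workdir"/Filename* | sort -u > "$workdir/found"

# Patterns that never occurred: lines of list that equal none of the found strings.
unmatched=$(grep -vxFf "$workdir/found" "$workdir/list")
printf '%s\n' "$unmatched"
```

If nothing matches at all, found is empty, and grep -v then prints every line of list.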

command to count occurrences of word in entire file

I am trying to count the occurrences of a word in a file.
If word occurs multiple times in a line, I will count is a 1.
The following command gives me a count, but it fails if a line has multiple occurrences of the word:
grep -c "word" filename.txt
Is there a one-liner?
You can use grep -o to show the exact matches and then count them:
grep -o "word" filename.txt | wc -l
Test
$ cat a
hello hello how are you
hello i am fine
but
this is another hello
$ grep -c "hello" a # Normal `grep -c` fails
3
$ grep -o "hello" a
hello
hello
hello
hello
$ grep -o "hello" a | wc -l # grep -o solves it!
4
Set RS in awk for a shorter one.
awk 'END{print NR-1}' RS="word" file
GNU awk allows this to be done in a single command, without the use of multiple piped commands:
awk -v w="word" '$1==w{n++} END{print n}' RS=' |\n' file
tr ' ' '\n' < file | grep -c word
This assumes that all words in the file have spaces between the words. If there's punctuation concatenating the word to itself, or otherwise no spaces on a single line between the word and itself, they'll count as one.
grep word filename.txt | wc -l
grep prints the lines that match, then wc -l prints the number of lines matched
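The answers above count slightly different things. The following sketch puts three of the counts side by side on a small sample: matching lines, substring occurrences and whole-word occurrences. The sample text (including the extra word "hellos") and the variable names are hypothetical, and the output of wc -l is stripped of whitespace because some implementations pad it.

```shell
# Hypothetical sample file; "hellos" is included to show the effect of -w.
sample=$(mktemp)
printf '%s\n' 'hello hello how are you' 'hellos i am fine' \
  'but' 'this is another hello' > "$sample"

# Number of lines that contain "hello" at least once.
line_count=$(grep -c 'hello' "$sample")

# Number of occurrences of the substring "hello", several per line if present.
substring_count=$(grep -o 'hello' "$sample" | wc -l | tr -d '[:space:]')

# Number of occurrences of "hello" as a whole word ("hellos" is not counted).
word_count=$(grep -ow 'hello' "$sample" | wc -l | tr -d '[:space:]')

echo "lines=$line_count substrings=$substring_count words=$word_count"
# prints: lines=3 substrings=4 words=3
```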

How to select (grep) many different patterns by bash via a pipe?

My task
I have a file A.txt with the following content.
aijdish uhuih
buh iiu hhuih
zhuh hiu
d uhiuhg ui
...
I want to select lines with these words aijdish, d, buh ...
I only know that I can:
cat A.txt | grep "aijdish" > temp.txt
cat A.txt | grep "d" >> temp.txt
cat A.txt | grep "buh" >> temp.txt
...
But I have several thousand words to select this time. How can I do this in bash?
Since you have many words you want to look for, I suggest putting the patterns into a file and using grep's -f option:
$ cat grep-pattern.txt
aijdish
buh
d
$ grep -f grep-pattern.txt inputfile
aijdish uhuih
buh iiu hhuih
d uhiuhg ui
But if you have words like d you might want to add the -w option to match only whole words and not parts of words.
grep -wf grep-pattern.txt inputfile
$ grep -E "aijdish|d|buh" inputfile
aijdish uhuih
buh iiu hhuih
d uhiuhg ui
Store the words to be searched in a file (say a.txt), then write a script that reads each line of a.txt and searches for it in the required file.
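A script along those lines could look like the sketch below. It assumes one word per line in a.txt, literal whole-word matching (-wF) and hypothetical names for the scratch directory and the variable; the data is the sample from the question.

```shell
# Hypothetical scratch directory with the sample data from the question.
workdir=$(mktemp -d)
printf '%s\n' 'aijdish uhuih' 'buh iiu hhuih' 'zhuh hiu' 'd uhiuhg ui' > "$workdir/A.txt"
printf '%s\n' aijdish d buh > "$workdir/a.txt"

: > "$workdir/temp.txt"               # start with an empty result file
while IFS= read -r word; do
  [ -n "$word" ] || continue          # skip blank lines in the word list
  # -w: whole words only, -F: literal string, -e: the pattern follows
  grep -wF -e "$word" "$workdir/A.txt" >> "$workdir/temp.txt"
done < "$workdir/a.txt"

selected=$(cat "$workdir/temp.txt")
printf '%s\n' "$selected"
```

This runs one grep per word, so with several thousand words it is likely to be much slower than a single grep -f call, and a line that contains two of the words is written twice.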

How to grep return result as the matching term

I would like to return only the first instance (case-insensitive) of the term I used to search (if there's a match), how would I do this?
example:
$ grep "exactly-this"
Binary file /Path/To/Some/Files/file.txt matches
I would like to return the result like:
$ grep "exactly-this"
exactly-this
grep has a built-in count argument.
You can use the -m option to make grep stop after a given number of matching lines:
grep -m 1 "exactly-this"
If you want to avoid the message in the case of binary files, use
grep -a -m 1 "exactly-this"
Note that this prints the whole line in which the match occurred. Since it is a binary file, that line may be very long.
What you need is the -o option of grep.
From the man page
-o, --only-matching
Prints only the matching part of the lines.
Test:
[jaypal:~/Temp] cat file
This is a file with some exactly this in the middle
with exactly this in the begining
and some at the very end in brackets (exactly this)
[jaypal:~/Temp] grep -o 'exactly this' file
exactly this
exactly this
exactly this
[jaypal:~/Temp] grep -om1 'exactly this' file
exactly this
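The question also asks for a case-insensitive search and for the first instance only. The options from both answers can be combined for that, as in this sketch. It assumes a grep that supports -a, -o and -m; the sample file and the variable name are hypothetical. The trailing head -n 1 is there because -m 1 stops after the first matching line, which may still contain several matches.

```shell
# Hypothetical sample file with mixed-case occurrences of the search term.
sample=$(mktemp)
printf '%s\n' 'nothing here' 'Some Exactly-This and exactly-this again' \
  'exactly-this at the end' > "$sample"

# -a: treat the file as text, -i: ignore case, -o: print only the match,
# -m 1: stop after the first matching line; head keeps the first match of that line.
first_match=$(grep -a -i -o -m 1 'exactly-this' "$sample" | head -n 1)
printf '%s\n' "$first_match"
# prints: Exactly-This
```

The match is printed as it is spelled in the file, not as it was typed in the pattern.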

Resources