searching/extracting specific word not the complete line in unix - shell

I would like to search for a word in a file in Unix and get back only the matching word, not the complete line.
For example:
Sample.text:
Hello abc hi aeabcft 123abc OK
Expected output:
abc
aeabcft
123abc
If I search for abc in Sample.text using grep, it returns the complete line, but I want only the words that contain abc.

You can use grep -Eo with an extended regex to print every matching word:
grep -Eo '\b[[:alnum:]]*abc[[:alnum:]]*\b' Sample.text
abc
aeabcft
123abc
As per man grep:
-o, --only-matching
Prints only the matching part of the lines.

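If you also need to do other processing on the file that grep can't accomplish, you could use Awk to print only the regex match on the line. Note that the three-argument match() is a GNU awk extension and that this prints only the first match on each line; pass a regex such as [[:alnum:]]*abc[[:alnum:]]* as r to capture the whole word.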
awk -v r="abc" '{m=match($0,r,a)}m{print a[0]}' file
Otherwise I'd just use anubhava's grep -o suggestion as it's shorter and clearer.
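For awks other than GNU awk, where the three-argument match() is unavailable, a portable variant can loop over the fields instead. The sketch below is one way to do that, assuming words are whitespace-separated and the search term is a literal substring rather than a regex; words_containing is a hypothetical helper name, not something from the answers above.

```shell
# Print every whitespace-separated word that contains a literal substring.
# words_containing is a hypothetical helper; input is read from stdin.
words_containing() {
    # $1 = substring to look for (compared literally via index(), not as a regex)
    awk -v s="$1" '{
        for (i = 1; i <= NF; i++)
            if (index($i, s) > 0) print $i
    }'
}

printf '%s\n' 'Hello abc hi aeabcft 123abc OK' | words_containing abc
```

Unlike the grep pattern, this keeps punctuation attached to a word, since it splits on whitespace only.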

Related

Duplicate entries in file

I have a file with content as below,
123 ABC
12345 ABC-test
In a shell script, I need to match one exact entry rather than getting two results, but I have been unable to do so.
For example:
grep "ABC"
returns both the entries, but I want a specific entry, i.e., if I search for "ABC", I should get only "123 ABC" and not the other entry.
Since you consider words to be whitespace-separated chunks, it is easier to use awk here since it reads lines (records) and splits them into fields (non-whitespace chunks) by default:
awk '$2=="ABC"' file > newfile
awk '/([[:space:]]|^)ABC([[:space:]]|$)/' file > newfile
Here, the first awk outputs all lines where the second field is exactly ABC. The second awk outputs all lines where ABC is preceded by whitespace or the start of the line and followed by whitespace or the end of the line.
See the online demo:
#!/bin/bash
s='123 ABC
12345 ABC-test'
awk '$2=="ABC"' <<< "$s"
awk '/([[:space:]]|^)ABC([[:space:]]|$)/' <<< "$s"
Output (printed once by each command):
123 ABC
You have to craft a proper regex (regular expression). In this case you want only those lines where ABC is not surrounded by other word characters (it sits on word boundaries):
grep -e '\bABC\b' file
The -e switch only introduces the pattern; it is -E that enables extended regular expressions. Note, however, that \b treats - as a boundary, so this pattern still matches ABC-test; the whitespace-based awk patterns above avoid that. See also a regex tutorial, e.g. https://www.regular-expressions.info/tutorial.html.
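A quick way to compare the two notions of a "word" on the sample data is sketched below. It assumes a grep that supports -w (GNU and BSD grep do); the variable names are chosen only for illustration.

```shell
sample='123 ABC
12345 ABC-test'

# -w uses word boundaries: "-" is a non-word character, so ABC-test still matches.
word_matches=$(printf '%s\n' "$sample" | grep -w 'ABC')

# Requiring whitespace or a line edge on both sides keeps only the exact entry.
exact_matches=$(printf '%s\n' "$sample" | grep -E '(^|[[:space:]])ABC([[:space:]]|$)')

printf 'word boundaries:\n%s\n\n' "$word_matches"
printf 'whitespace-delimited:\n%s\n' "$exact_matches"
```

The second pattern is the same one used in the awk answer, so either tool can apply it.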

Remove certain characters or keywords from a TXT file in bash

I was wondering if there is a way to remove certain keywords from a text file. Say I have a large file with lines like:
My name is John
My name is Peter
My name is Joe
Would there be a way to remove "My name is" without removing the entire line? Could this be done with grep somehow? I tried to find a solution but pretty much all of the ones I came across simply focus on deleting entire lines. Even if I could delete the text up until a certain column, that would fix my issue.
You need a text processing tool like sed or awk to do this, but not grep.
Try this:
sed 's/My name is//g' file
EDIT
Purpose of grep:
$ man grep | grep -A2 DESCRIPTION
DESCRIPTION
grep searches the named input FILEs (or standard input if no files are named, or if a single hyphen-minus (-) is given as file name) for lines containing a
match to the given PATTERN. By default, grep prints the matching lines.
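The substitution above removes the phrase wherever it occurs and leaves the space that followed it. If the phrase should be stripped only at the start of a line, together with that space, a slightly tighter variant might look like this (a sketch on the sample data; the variable names are illustrative):

```shell
names='My name is John
My name is Peter
My name is Joe'

# Anchor with ^ and include the trailing space so no leading blank is left behind.
stripped=$(printf '%s\n' "$names" | sed 's/^My name is //')

printf '%s\n' "$stripped"
```

Lines that do not start with the phrase pass through unchanged, since sed prints every line by default.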
With GNU grep:
grep -Po "My name is\K.*" file
Output with a leading white space:
John
Peter
Joe
-P: Interpret PATTERN as a Perl regular expression
-o: Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
\K: Discard everything matched before \K from the reported match.
Another, simpler grep approach:
grep -o '[^ ]*$' Input_file
-o prints only the matched part of the line; the regex [^ ]*$ matches the run of non-space characters at the end of the line, i.e. everything after the last space.
An awk solution that skips empty lines and prints the last field:
awk '!/^$/{print $NF}' file
John
Peter
Joe
Using cut:
cut -d' ' -f4 input_file
GNU cut features a complement option, used to remove the area specified with -f. If the input_file had surnames such as "My name is John Doe", the previous code would print "John", and this would print "John Doe":
cut --complement -d' ' -f1-3 input_file
cut is also the smallest of these binaries (on-disk size is only a rough proxy for memory use):
# these numbers will vary by *nix version and distro...
wc -c `which cut sed awk grep` | head -n -1 | sort -n
43224 /usr/bin/cut
109000 /bin/sed
215360 /bin/grep
662240 /usr/bin/awk

shell script cut from variables

The file is like this
aaa&123
bbb&234
ccc&345
aaa&456
aaa$567
bbb&678
I want to output the text after & on lines that contain "aaa":
123
456
I want to do it in a shell script. Consider the following code:
#!/bin/bash
raw=$(grep 'aaa' 1.txt)
var=$(cut -f2 -d"&" "$raw")
echo $var
It gives me an error like:
cut: aaa&123
aaa&456
aaa$567: No such file or directory
How can I fix it? And how do I use cut (or grep, or another tool) on the contents of an existing variable?
Many thanks!
With GNU grep:
grep -oP 'aaa&\K.*' file
Output:
123
456
\K: discard everything matched so far (here aaa&) from the reported match, so only the text after it is printed
From man grep:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
-P, --perl-regexp
Interpret PATTERN as a Perl compatible regular expression (PCRE)
Cyrus has my vote. An awk alternative if GNU grep is not available:
awk -F'&' 'NF==2 && $1 ~ /aaa/ {print $2}' file
Using & as the field separator, for lines with 2 fields (i.e. & must be present) and the first field contains "aaa", print the 2nd field.
The error in your script is that you are passing the grep output to cut as a filename. What you want is this:
grep 'aaa.*&' file | cut -d'&' -f2
The pattern means "aaa appears before an &"
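To run cut on text that is already in a variable, as the question asks, the variable's contents can be fed to cut on standard input, since cut reads stdin when no file is named. A minimal sketch, with raw hard-coded here to stand in for the grep output from the question:

```shell
# Stand-in for: raw=$(grep 'aaa' 1.txt)
raw='aaa&123
aaa&456
aaa$567'

# printf writes the variable to stdin of cut.
# -s suppresses lines that contain no "&" delimiter (such as aaa$567).
var=$(printf '%s\n' "$raw" | cut -s -d'&' -f2)

printf '%s\n' "$var"
```

Quoting "$raw" and "$var" preserves the newlines; an unquoted echo $var would join the values onto one line.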

Extract all characters after a match - shell script

I need to extract all characters after a pattern match.
For example:
NAME=John
Age=16
I need to extract all characters after "=". Output should be like
John
16
I can't use Perl or Jython for this because of some restrictions.
I tried grep, but this is as far as I got:
echo "NAME=John" |grep -o -P '=.{0,}'
You were pretty close:
grep -oP '(?<=\w=)\w+' file
does it.
Explanation
It looks for any word that follows word= and prints it.
-o stands for "Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line".
-P stands for "Interpret PATTERN as a Perl regular expression".
(?<=\w=)\w+ means: match only the \w+ that follows a word character and =. More info in the Regex tutorial on lookahead and lookbehind, and in the explanation by sudo_O.
Test
$ cat file
NAME=John
Age=16
$ grep -oP '(?<=\w=)\w+' file
John
16
One sed solution
sed -ne 's/.*=//gp' <filename>
another awk solution
awk -F= '$0=$2' <filename>
Explanation:
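In sed, we remove everything from the beginning of the line up to the last = (.* is greedy) and print the rest; -n together with the p flag prints only lines where a substitution happened.
In awk, we split the line into fields on =, and $0=$2 replaces the whole line with the second field. The assignment doubles as the pattern, so the line is printed only when that value is non-empty and not zero (a line such as Age=0 would be skipped).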
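If external tools are restricted, the same extraction can be done with POSIX shell parameter expansion alone. A minimal sketch, assuming each line has the form KEY=value and that everything after the first = is wanted; value_after_equals is a hypothetical helper name.

```shell
# Print the text after the first "=" on each input line that contains one.
# value_after_equals is a hypothetical helper; input is read from stdin.
value_after_equals() {
    while IFS= read -r line; do
        case $line in
            # ${line#*=} strips the shortest prefix ending in "="
            *=*) printf '%s\n' "${line#*=}" ;;
        esac
    done
}

printf '%s\n' 'NAME=John' 'Age=16' | value_after_equals
```

Because nothing here is evaluated as a number, a value of 0 or an empty value is printed like any other.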

How to extract specific lines from a file in bash?

I want to extract the string from a line which starts with a specific pattern from a file in shell script.
For example: I want the strings from lines that start with hello:
hi to_RAm
hello to_Hari
hello to_kumar
bye to_lilly
output should be
to_Hari
to_kumar
Can anyone help me?
sed is the most appropriate tool:
sed -n 's/^hello //p'
Use grep:
grep ^hello file | awk '{print $2}'
^ anchors the match so that only lines starting with "hello" are selected. This assumes you want to print the second word.
If you want to print all words except the first (the output keeps a leading space):
grep ^hello file | awk '{$1=""; print $0}'
You could use GNU grep's Perl-compatible regexes with a lookbehind (note that, without a ^ anchor, this matches "hello " anywhere in the line, not just at the start):
grep -oP '(?<=hello ).*'
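The grep-then-awk pipeline can also be collapsed into a single awk call. The sketch below assumes the first whitespace-separated field must be exactly hello, which is slightly stricter than the ^hello regex (that would also accept a line starting with "helloworld"); the variable names are illustrative.

```shell
greetings='hi to_RAm
hello to_Hari
hello to_kumar
bye to_lilly'

# Select lines whose first field is exactly "hello" and print the second field.
targets=$(printf '%s\n' "$greetings" | awk '$1 == "hello" { print $2 }')

printf '%s\n' "$targets"
```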
