Cut specific words matching a pattern from a text file - shell

I am trying to extract words from a text file matching a pattern using shell script. For example if a line contains
This is a sample text to illustrate my scenario text=info id=2342
Second line to illustrate text=sample id=q2312
I want the output to be like
text=info id=2342
text=sample id=q2312
Can someone please tell me how to achieve this using the cut/grep commands?

You can do the following:
grep -P -o 'text=\S+ id=\S+'
The -P flag makes grep use Perl-compatible regular expressions, \S+ matches one or more non-whitespace characters, and -o outputs only the matched portion.
This assumes you need the values of the "text" and "id" fields; modify the regular expression as required. For Perl regular expressions see here, or for extended regular expressions see man grep.
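Where grep was built without PCRE support and -P is unavailable, the same extraction can be sketched with an extended regular expression. This assumes a grep that supports -o (GNU and BSD grep do, although POSIX does not require it); the input is the two sample lines from the question.

```shell
# Feed the two sample lines from the question to grep.
# [^[:space:]]+ is the POSIX-class equivalent of \S+ (one or more non-whitespace characters).
result=$(printf '%s\n' \
  'This is a sample text to illustrate my scenario text=info id=2342' \
  'Second line to illustrate text=sample id=q2312' |
  grep -oE 'text=[^[:space:]]+ id=[^[:space:]]+')

printf '%s\n' "$result"
```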

This grep -oP should work:
grep -oP '\b(\w+=\w+(\s+|$))+' file
text=info id=2342
text=sample id=q2312

With cut you need an extra step (cut from the first = and add text= back):
echo "text=$(echo "This is a sample text to illustrate my scenario text=info id=2342" | cut -d= -f2-)"
With sed:
echo "This is a sample text to illustrate my scenario text=info id=2342" | sed 's/.* text=/text=/'
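The cut and sed commands above each operate on one hard-coded line. As a sketch, the same sed substitution can be applied to every line of input; -n together with the p flag is used so that lines without " text=" are skipped instead of being printed unchanged. The third input line is a made-up example of such a line.

```shell
# Apply the substitution to each line; print only lines where it succeeded.
result=$(printf '%s\n' \
  'This is a sample text to illustrate my scenario text=info id=2342' \
  'Second line to illustrate text=sample id=q2312' \
  'a line with no key/value pairs' |
  sed -n 's/.* text=/text=/p')

printf '%s\n' "$result"
```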

Related

grep for exact word in a file containing "."

I have a file named "TestGrep" that contains content as shown below
#!/bin/bash
/ParentFolder/a #email1.com
/ParentFolder/b #email2.com
/ParentFolder/.a #email1.com
/ParentFolder/.b #email2.com
/ParentFolder/ #email3.com
I am using the below grep command
grep -Fw "/ParentFolder/" TestGrep
The output is
/ParentFolder/.a #email1.com
/ParentFolder/.b #email2.com
/ParentFolder/ #email3.com
It is somehow ignoring the dots in the TestGrep file.
I want the output to be shown as below
/ParentFolder/ #email3.com
How can I use grep so that it matches only the exact string and returns the expected output?
Could you please try the following, which uses grep's -E option:
grep -E '/ParentFolder/\s+' Input_file
From man grep about -E option of grep:
-E, --extended-regexp
Interpret PATTERN as an extended regular expression
\s+ matches one or more whitespace characters.
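The reason -Fw also matches the ".a" and ".b" lines is that -w only requires the character after the match to be a non-word character (anything other than a letter, digit, or underscore), and "." qualifies. A sketch of two alternatives that avoid the GNU-specific \s: the POSIX class [[:space:]], and an exact comparison of the first whitespace-separated field with awk. The awk variant is a different technique from the answer above.

```shell
# Recreate the sample file content from the question.
input='#!/bin/bash
/ParentFolder/a #email1.com
/ParentFolder/b #email2.com
/ParentFolder/.a #email1.com
/ParentFolder/.b #email2.com
/ParentFolder/ #email3.com'

# Require whitespace directly after the trailing slash.
with_grep=$(printf '%s\n' "$input" | grep -E '/ParentFolder/[[:space:]]')

# Compare the whole first field as a literal string; no regex is involved.
with_awk=$(printf '%s\n' "$input" | awk '$1 == "/ParentFolder/"')

printf '%s\n' "$with_grep" "$with_awk"
```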

Read word after a specific word on the same line dont have space between them

How can I extract a word that comes after a specific word in bash ? More precisely, I have a file which has a line which looks like this:
Demo.txt
IN=../files/d
out=../files/d
dataload
name
I want to read "d" from the lines above.
sed -n '/\/files\// s~.*/files/\([^.]*\)\..*~\1~p' file
This code works if the line contains a ".", for example
IN=../files/d.txt
in which case it prints "d".
Here "d" has no "." as an end delimiter, so I want to read until the end of the line.
Input:
Demo.txt
IN=../files/d
out=../files/d
dataload
name
Expected output:
d
d
code: in bash
You could use GNU grep with PCRE :
grep -oP '/files/\K[^.]+' file
The -P flag makes grep use PCRE, -o makes it display only the matched part rather than the full line, and \K in the regex discards everything matched before it from the reported match.
Alternatively, if you don't have access to GNU grep, the following perl command has the same effect:
perl -nle 'print $& if m{/files/\K[^.]+}' file
This sed variant should work for you:
sed -n '/\/files\// s~.*/files/\([^.]*\).*~\1~p' file
d
d
The only change from the earlier sed command is that it no longer requires a \. right after the first capture group.
When you don't want to think about a single command solution, you can use
grep -Eo "/files/." Demo.txt | cut -d/ -f3
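If the lines are already being read in a shell loop, the word can also be taken with POSIX parameter expansion instead of an external tool. This is a different technique from the answers above, sketched under the assumption that the wanted word is everything after the last "/files/" up to the first "." (if there is one). The second input line is changed to d.txt to show both cases.

```shell
result=$(printf '%s\n' 'IN=../files/d' 'out=../files/d.txt' 'dataload' 'name' |
while IFS= read -r line; do
  case $line in
    (*/files/*)
      name=${line##*/files/}   # drop everything through the last "/files/"
      name=${name%%.*}         # drop an extension, if one is present
      printf '%s\n' "$name"
      ;;
  esac
done)

printf '%s\n' "$result"
```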

Grep multiple strings from text file

Okay so I have a textfile containing multiple strings, example of this -
Hello123
Halo123
Gracias
Thank you
...
I want grep to use these strings to find lines with matching strings/keywords from other files within a directory
example of text files being grepped -
123-example-Halo123
321-example-Gracias-com-no
321-example-match
so in this instance the output should be
123-example-Halo123
321-example-Gracias-com-no
With GNU grep:
grep -f file1 file2
-f FILE: Obtain patterns from FILE, one per line.
Output:
123-example-Halo123
321-example-Gracias-com-no
You should probably look at the manpage for grep to get a better understanding of what options are supported by the grep utility. However, there are a number of ways to achieve what you're trying to accomplish. Here's one approach:
grep -e "Hello123" -e "Halo123" -e "Gracias" -e "Thank you" list_of_files_to_search
However, since your search strings are already in a separate file, you would probably want to use this approach:
grep -f patternFile list_of_files_to_search
I can think of two possible solutions for your question:
Use multiple regular expressions - a regular expression for each word you want to find, for example:
grep -e Hello123 -e Halo123 file_to_search.txt
Use a single regular expression with an "or" operator. Using Perl regular expressions, it will look like the following:
grep -P "Hello123|Halo123" file_to_search.txt
EDIT:
As you mentioned in your comment, you want to use a list of words to find from a file and search in a full directory.
You can manipulate the words-to-find file to look like -e flags concatenation:
cat words_to_find.txt | sed 's/^/-e "/;s/$/"/' | tr '\n' ' '
This will return something like -e "Hello123" -e "Halo123" -e "Gracias" -e "Thank you", which you can then pass to grep using xargs:
cat words_to_find.txt | sed 's/^/-e "/;s/$/"/' | tr '\n' ' ' | xargs grep dir_to_search/*
As you can see, the last command also searches in all of the files in the directory.
SECOND EDIT: as PesaThe mentioned, the following command does this in a much simpler and more elegant way:
grep -f words_to_find.txt dir_to_search/*
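One detail worth adding to the -f approach: each line of the pattern file is treated as a regular expression unless -F is given, and an empty line in the pattern file matches every input line. A minimal sketch, assuming the keywords are meant literally; the scratch directory and the file name a.txt are hypothetical, and -h (supported by GNU and BSD grep) only suppresses the file-name prefix.

```shell
# Work in a scratch directory so nothing existing is touched.
workdir=$(mktemp -d)
mkdir "$workdir/dir_to_search"

printf '%s\n' 'Hello123' 'Halo123' 'Gracias' 'Thank you' > "$workdir/words_to_find.txt"
printf '%s\n' '123-example-Halo123' '321-example-Gracias-com-no' '321-example-match' \
  > "$workdir/dir_to_search/a.txt"

# -F: treat each pattern line as a fixed string; -h: omit file-name prefixes.
result=$(grep -hF -f "$workdir/words_to_find.txt" "$workdir/dir_to_search"/*)
printf '%s\n' "$result"

rm -r "$workdir"
```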

Grep for URL parsing - bash script programming

I am trying to learn some bash scripting and I can't understand how to use grep to split a URL, for example:
blabla1.com
blabla2.gov
blabla3.fr
I just want to keep com, gov and fr (without the '.' character) and ignore what comes before the '.'.
Thanks in advance ..
Grep is a tool for matching text. You need something else if you want to transform text. If you have the values in question in a bash variable, then what you ask is pretty easy:
authority=blabla.com
# Here's the important bit:
domain=${authority/*./}
echo "$domain"
The funny syntax in the middle evaluates to the result of a pattern substitution on the value of the variable authority.
If you're trying to do this on lines of a file, then the sed program is your friend:
sed 's/.*\.//' < input.file
This is again a pattern substitution, but sed uses regular expression patterns, whereas bash uses shell glob patterns.
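A more common spelling of the same idea is the ## prefix removal, which is part of POSIX parameter expansion and so also works in plain sh. It is shown here as an alternative sketch, not as what the answer above used.

```shell
result=$(for authority in blabla1.com blabla2.gov blabla3.fr; do
  # ${var##*.} removes the longest prefix that ends in a dot.
  printf '%s\n' "${authority##*.}"
done)

printf '%s\n' "$result"
```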
grep -E -o '[^.]+$' < input
-o instructs grep to print only the matching part of the line
-E switches on extended regexp which is needed for + quantifier
[^.]+$ matches one or more characters that are not a dot, anchored at the end of the line
Try this way:
grep -o -E '[a-z]{2,3}\b' input > output
-o, --only-matching: Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
$ cat input
blabla1.com
blabla2.gov
blabla3.fr
$ cat output
com
gov
fr
$ cut -d. -f2 file
com
gov
fr
If that's not all you need, post some more truly representative input and expected output so we can help you find the right solution.
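Note that cut -d. -f2 returns the second dot-separated field, which is the last one only when the name contains exactly one dot. A sketch that always takes the last field, using awk's NF; the www. prefix on the third name is a made-up example.

```shell
# -F. splits on literal dots; $NF is the last field on each line.
result=$(printf '%s\n' blabla1.com blabla2.gov www.blabla3.fr |
  awk -F. '{ print $NF }')

printf '%s\n' "$result"
```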

Extract all characters after a match - shell script

I need to extract all characters after a pattern match.
For example,
NAME=John
Age=16
I need to extract all characters after "=". Output should be like
John
16
I can't use Perl or Jython for this because of some restrictions.
I tried with grep, but this is as far as I got:
echo "NAME=John" |grep -o -P '=.{0,}'
You were pretty close:
grep -oP '(?<=\w=)\w+' file
does the job.
Explanation
It looks for any word that follows word= and prints it.
-o stands for "Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line".
-P stands for "Interpret PATTERN as a Perl regular expression".
(?<=\w=)\w+ is a lookbehind: it matches \w+ only when it is preceded by a word character and =, without including them in the match. More info in Regex tutorial - Lookahead and in this nice explanation by sudo_O.
Test
$ cat file
NAME=John
Age=16
$ grep -oP '(?<=\w=)\w+' file
John
16
One sed solution
sed -ne 's/.*=//gp' <filename>
another awk solution
awk -F= '$0=$2' <filename>
Explanation:
In sed we remove everything from the beginning of the line up to the = and print the rest.
In awk we split the line into fields separated by =; $0=$2 then replaces the whole line with the second field. Because the assignment is used as the pattern, the line is printed only when that field is non-empty and not numerically zero.
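Both one-liners assume a single "=" per line. If a value can itself contain "=", the greedy .*= in sed keeps only what follows the last "=", and awk's $2 stops at the second one. A sketch that removes only the key and the first "=", using a hypothetical third line QUERY=a=b; cut -d= -f2- would give the same result.

```shell
# [^=]*= matches the key and the first "=" only, so later "=" signs survive.
result=$(printf '%s\n' 'NAME=John' 'Age=16' 'QUERY=a=b' |
  sed 's/^[^=]*=//')

printf '%s\n' "$result"
```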
