Extract all characters after a match - shell script - shell

I am in need to extract all characters after a pattern match.
For example ,
NAME=John
Age=16
I need to extract all characters after "=". Output should be like
John
16
I cant go with perl or Jython for this purpose because of some restrictions.
I tried with grep , but to my knowledge I came as shown below only
echo "NAME=John" |grep -o -P '=.{0,}'

You were pretty close:
grep -oP '(?<=\w=)\w+' file
makes it.
Explanation
it looks for any word after word= and prints it.
-o stands for "Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line".
-P stands for "Interpret PATTERN as a Perl regular expression".
(?<=\w=)\w+ means: match only \w+ following word=. More info in [Regex tutorial - Lookahead][1] and in [this nice explanation by sudo_O][2].
Test
$ cat file
NAME=John
Age=16
$ grep -oP '(?<=\w=)\w+' file
John
16

One sed solution
sed -ne 's/.*=//gp' <filename>
another awk solution
awk -F= '$0=$2' <filename>
Explanation:
in sed we remove anything from the beginning of a line till a = and print the rest.
in awk we break the string in 2 parts, separated by =, now after that $0=$2 is making replacing the whole string with the second portion

Related

bash scripting: Can I get sed to output the original line, then a space, then the modified line?

I'm new to Unix in all its forms, so please go easy on me!
I have a bash script that will pipe an ls command with arbitrary filenames into sed, which will use an arbitrary replacement pattern on the files, and then this will be piped into awk for some processing. The catch is, awk needs to know both the original file name and the new one.
I've managed everything except getting the original file names into awk. For instance, let's say my files are test.* and my replacement pattern is 's:es:ar;', which would change every occurrence of "test" to "tart". For testing purposes I'm just using awk to print what it's receiving:
ls "$#" | sed "$pattern" | awk '{printf "0: %s\n1: %s\n2: %s\n", $0,$1,$2}'
where test.* is in $# and the pattern is stored in $pattern.
Clearly, this doesn't get me to where I want to be. The output is obviously
0: tart.c
1: tart.c
2:
If I could get sed to output "test.c tart.c", then I'd have two parameters for awk. I've played around with the pattern to no avail, even hardcoding "test.c" into the replacement. But of course that just gave me amateur results like "ttest.c art.c". Is it possible for sed to remember the input, then work it into the beginning of the output? Do I even have the right ideas? Thanks in advance!
Two ways to change the first t in a b in the duplicated field.
Duplicate (& replays the matched part), change first word and swap (remember 2 strings with a space in between):
echo test.c | sed -r 's/.*/& &/;s/t/b/;s/([^ ]*) (.*)/\2 \1/'
or with more magic (copy original value to buffer, make the change, insert value from buffer as the first line and replace eond of line with a space)
echo test.c | sed 'h;s/t/b/;x;G;s/\n/ /'
Use Perl instead of sed:
echo test.c | perl -lne 'print "$_ ", s/es/ar/r'
-l removes the newline from input and adds it after each print. The /r modifier to the substitution returns the modified string instead of changing the variable (Perl 5.14+ needed).
Old answer, not working for s/t/b/2 or s/.*/replaced/2:
You can duplicate the contents of the line with s/.*/& &/, then just tell sed that it should only apply the second substitution (this works at least in GNU sed):
echo test.c | sed 's/.*/& &/; s/es/ar/2'
$ echo 'foo' | awk '{old=$0; gsub(/o/,"e"); print old, $0}'
foo fee

Read word after a specific word on the same line dont have space between them

How can I extract a word that comes after a specific word in bash ? More precisely, I have a file which has a line which looks like this:
Demo.txt
IN=../files/d
out=../files/d
dataload
name
i want to read "d" from above line.
sed -n '/\/files\// s~.*/files/\([^.]*\)\..*~\1~p' file
this code helping if line having "."
IN=../files/d.txt
so its printing "d"
here we have "d" without "." as end delimeter. So i want to read till end of line.
i/p :
Demo.txt
IN=../files/d
out=../files/d
dataload
name
output looking for:
d
d
code: in bash
You could use GNU grep with PCRE :
grep -oP '/files/\K[^.]+' file
The -P flag makes grep use PCRE, the -o makes it display only the matched part rather than the full line, and the \K in the regex omits what precedes from the displayed matched part.
Alternatively if you don't have access to GNU grep, the following perl command will have the same effect :
perl -nle 'print $& if m{/files/\K[^.]+}' file
Sample run.
This sed variant should work for you:
sed -n '/\/files\// s~.*/files/\([^.]*\).*~\1~p' file
d
d
Minor change from earlier sed is that it doesn't match \. right after first capture group.
When you don't want to think about a single command solution, you can use
grep -Eo "/files/." Demo.txt | cut -d/ -f3

Grep for URL parsing - bash script programming

I am trying to learn some bash scripting and i can't understand how to use grep in order to split a URL link for example :
blabla1.com
blabla2.gov
blabla3.fr
I just want to keep com , gov and fr ( without the '.' character) ignore whats before '.'
Thanks in advance ..
Grep is a tool for matching text. You need something else if you want to transform text. If you have the values in question in a bash variable, then what you ask is pretty easy:
authority=blabla.com
# Here's the important bit:
domain=${authority/*./}
echo $domain
The funny syntax in the middle evaluates to the result of a pattern substitution on the value of variable temp.
If you're trying to do this on lines of a file, then the sed program is your friend:
sed 's/.*\.//' < input.file
This is again a pattern substitution, but sed uses regular expression patterns, whereas bash uses shell glob patterns.
grep -E -o '[^.]+$' < input
-o instructs grep to print only the matching part of the line
-E switches on extended regexp which is needed for + quantifier
[^.]+$ means any character which is not a dot at the end of the line
Try this way:
grep -o -E '[a-z]{2,3}\b' input > output
-o, --only-matching: Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
$ cat input
blabla1.com
blabla2.gov
blabla3.fr
$ cat output
com
gov
fr
$ cut -d. -f2 file
com
gov
fr
If that's not all you need, post some more truly representative input and expected output so we can help you find the right solution.

searching/extracting specific word not the complete line in unix

I would like to search a word in a file in Unix that should return only the word not the complete line.
For ex:
Sample.text:
Hello abc hi aeabcft 123abc OK
Expected output:
abc
aeabcft
123abc
If I search abc in file Sample.txt using grep, it will return complete line but I want the words that contains abc
You can use grep -Eo with an enhanced regex to search all matching words
grep -Eo '\b[[:alnum:]]*abc[[:alnum:]]*\b' Sample.text
abc
aeabcft
123abc
As per man grep:
-o, --only-matching
Prints only the matching part of the lines.
If you need to also do other processing to the file that grep can't accomplish, you could use Awk for printing only the regex match on the line.
awk -v r="abc" '{m=match($0,r,a)}m{print a[0]}' file
Otherwise I'd just use anubhava's grep -o suggestion as it's shorter and clearer.

bash grep newline

[Editorial insertion: Possible duplicate of the same poster's earlier question?]
Hi, I need to extract from the file:
first
second
third
using the grep command, the following line:
second
third
How should the grep command look like?
Instead of grep, you can use pcregrep which supports multiline patterns
pcregrep -M 'second\nthird' file
-M allows the pattern to match more than one line.
Your question abstract "bash grep newline", implies that you would want to match on the second\nthird sequence of characters - i.e. something containing newline within it.
Since the grep works on "lines" and these two are different lines, you would not be able to match it this way.
So, I'd split it into several tasks:
you match the line that contains "second" and output the line that has matched and the subsequent line:
grep -A 1 "second" testfile
you translate every other newline into the sequence that is guaranteed not to occur in the input. I think the simplest way to do that would be using perl:
perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;'
you do a grep on these lines, this time searching for string ##UnUsedSequence##third:
grep "##UnUsedSequence##third"
you unwrap the unused sequences back into the newlines, sed might be the simplest:
sed -e 's/##UnUsedSequence##/\n'
So the resulting pipe command to do what you want would look like:
grep -A 1 "second" testfile | perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;' | grep "##UnUsedSequence##third" | sed -e 's/##UnUsedSequence##/\n/'
Not the most elegant by far, but should work. I'm curious to know of better approaches, though - there should be some.
I don't think grep is the way to go on this.
If you just want to strip the first line from any file (to generalize your question), I would use sed instead.
sed '1d' INPUT_FILE_NAME
This will send the contents of the file to standard output with the first line deleted.
Then you can redirect the standard output to another file to capture the results.
sed '1d' INPUT_FILE_NAME > OUTPUT_FILE_NAME
That should do it.
If you have to use grep and just don't want to display the line with first on it, then try this:
grep -v first INPUT_FILE_NAME
By passing the -v switch, you are telling grep to show you everything but the expression that you are passing. In effect show me everything but the line(s) with first in them.
However, the downside is that a file with multiple first's in it will not show those other lines either and may not be the behavior that you are expecting.
To shunt the results into a new file, try this:
grep -v first INPUT_FILE_NAME > OUTPUT_FILE_NAME
Hope this helps.
I don't really understand what do you want to match. I would not use grep, but one of the following:
tail -2 file # to get last two lines
head -n +2 file # to get all but first line
sed -e '2,3p;d' file # to get lines from second to third
(not sure how standard it is, it works in GNU tools for sure)
So you just don't want the line containing "first"? -v inverts the grep results.
$ echo -e "first\nsecond\nthird\n" | grep -v first
second
third
Line? Or lines?
Try
grep -E -e '(second|third)' filename
Edit: grep is line oriented. you're going to have to use either Perl, sed or awk to perform the pattern match across lines.
BTW -E tell grep that the regexp is extended RE.
grep -A1 "second" | grep -B1 "third" works nicely, and if you have multiple matches it will even get rid of the original -- match delimiter
grep -E '(second|third)' /path/to/file
egrep -w 'second|third' /path/to/file
you could use
$ grep -1 third filename
this will print a string with match and one string before and after. Since "third" is in the last string you get last two strings.
I like notnoop's answer, but building on AndrewY's answer (which is better for those without pcregrep, but way too complicated), you can just do:
RESULT=`grep -A1 -s -m1 '^\s*second\s*$' file | grep -s -B1 -m1 '^\s*third\s*$'`
grep -v '^first' filename
Where the -v flag inverts the match.

Resources