Find Nth occurrence of a string using Regex - bash

So I have a string at the beginning of a line and can find all occurrences of it. I am using ^$string to match; there are thousands of these, and an error occurs on a specific one. Let's say I want to get to the 100th occurrence of this pattern: how would I do that?
For example, I can grep ^$string and list all of them, but I would like to find a specific one.

grep has a -m / --max-count option:
grep -m100 "^$string" inputFile | tail -1
will give you the 100th matching line.
Note:
-m100 makes grep stop reading the input file once 100 matches are found, which is very useful when reading a huge file.
tail -1 is cheap here because it only receives the (at most) 100 lines that grep outputs. If there are fewer than 100 matches, it prints the last match instead of nothing.

You can use sed to print only a single line of grep's output:
grep "^$string" inputFile | sed -n '100p'
-n disables sed's automatic printing, and 100p prints only the 100th line.
Or, as dan mentions in the comments:
grep "^$string" inputFile | sed '100!d;q'
This deletes every line except the 100th; on line 100, q prints it and quits, so sed stops reading early.
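The pipelines above report the matched text but not where it sits in the file, which matters when you are hunting for the line that causes an error. A minimal single-pass sketch with awk, assuming a hypothetical helper name nth_match and a pattern that is valid as an awk (ERE) regex, prints the file line number plus the Nth matching line, stops reading right away, and prints nothing if there are fewer than N matches:

```shell
# nth_match PATTERN N FILE (hypothetical helper name)
# Prints "LINENO:line" for the Nth line matching PATTERN, then stops reading.
# Prints nothing if FILE has fewer than N matching lines.
# Caveats: awk uses extended regexes (ERE), and -v expands backslash escapes.
nth_match() {
  awk -v pat="$1" -v n="$2" '
    $0 ~ pat && ++count == n { print NR ":" $0; exit }
  ' "$3"
}

# Usage: nth_match "^$string" 100 inputFile
```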

Related

Read the word after a specific word on the same line when there is no space between them

How can I extract a word that comes after a specific word in bash? More precisely, I have a file with lines that look like this:
Demo.txt
IN=../files/d
out=../files/d
dataload
name
I want to read "d" from the lines above.
sed -n '/\/files\// s~.*/files/\([^.]*\)\..*~\1~p' file
This code works if the line contains a ".", for example:
IN=../files/d.txt
There it prints "d".
Here, however, "d" has no "." after it as an end delimiter, so I want to read until the end of the line.
Input:
Demo.txt
IN=../files/d
out=../files/d
dataload
name
Expected output:
d
d
The code should be in bash.
You could use GNU grep with PCRE:
grep -oP '/files/\K[^.]+' file
The -P flag makes grep use PCRE, -o makes it display only the matched part rather than the full line, and \K in the regex drops everything matched before it from the displayed match.
Alternatively, if you don't have access to GNU grep, the following perl command has the same effect:
perl -nle 'print $& if m{/files/\K[^.]+}' file
This sed variant should work for you:
sed -n '/\/files\// s~.*/files/\([^.]*\).*~\1~p' file
d
d
The only change from your original sed is that it no longer requires \. right after the first capture group, so [^.]* can run to the end of the line.
If you don't need a single-command solution, you can use:
grep -Eo "/files/[^.]+" Demo.txt | cut -d/ -f3
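If you would rather not rely on an external tool at all, here is a minimal pure-bash sketch (not taken from any answer above) using parameter expansion: ${line#*/files/} strips everything up to and including the first /files/, and ${rest%%.*} drops everything from the first "." onward, which is a no-op when there is no dot. The helper name extract_after_files is hypothetical, and the input is assumed to be Demo.txt as in the question.

```shell
# extract_after_files FILE (hypothetical helper name)
# Prints the text after "/files/" on every line that contains it,
# stopping at the first "." if there is one, otherwise at end of line.
extract_after_files() {
  local line rest
  # "|| [ -n "$line" ]" also handles a last line without a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      */files/*)
        rest=${line#*/files/}        # remove shortest prefix ending in /files/
        printf '%s\n' "${rest%%.*}"  # remove longest suffix starting with "."
        ;;
    esac
  done < "$1"
}

# Usage: extract_after_files Demo.txt
```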

Get value of parameter from text file using sed

I am trying to extract the value of a parameter from the text file.
Below is my text file, with uri_param as the parameter.
application.txt
---------------
uri_param=frontier://tenant=stripe;env=qa;svc=new-service#stripe-ftr-qa.stripe.nxz.com:80
command:
--------
egrep ^uri_param application.txt | sed -e 's/.*=//'
I am expecting the string after the first = as output, i.e. frontier://tenant=stripe;env=qa;svc=new-service#stripe-ftr-qa.stripe.nxz.com:80, but the output I am getting is new-service#stripe-ftr-qa.stripe.nxz.com:80.
How can I fix this? What I have found so far is that .* in sed is greedy, so .*= matches everything up to the last = on the line.
sed -n '/^uri_param=/ {s///p;q;}' YourFile
This extracts only the first occurrence of uri_param: it removes uri_param= (an empty s/// pattern reuses the last regex, here /^uri_param=/), prints the result, and then quits.
It is fine for small and medium files (on a big file, e.g. 100 MB, grep may be faster).
sed -r 's/^[_0-9a-zA-Z]+=//' File_Name
You can also try it this way:
sed -r 's/[^=]+=(.*)/\1/' File_Name
We can filter it in the grep part:
grep -o "^uri_param=.*:[0-9]\{1,\}" infile|sed -e "s/^uri_param=//"
Or use a more flexible tool like gawk:
gawk 'match($0, "^uri_param=(.*:[0-9]+)", r){print r[1]}' infile
Note: if your URL doesn't always end with a port number, the pattern needs to be adjusted.
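The greedy behaviour described in the question has a direct analogue in bash parameter expansion: ${line#*=} removes the shortest prefix ending in "=" (everything up to the first "="), while ${line##*=} removes the longest one and reproduces the unwanted output. A minimal sketch, assuming a hypothetical helper name get_param and that the parameter sits at the start of its line as in application.txt:

```shell
# get_param NAME FILE (hypothetical helper name)
# Prints the value of the first line that starts with "NAME=",
# keeping any later "=" signs inside the value. Returns 1 if not found.
get_param() {
  local name=$1 line
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      "$name="*)
        printf '%s\n' "${line#*=}"  # shortest match: strip up to the FIRST "="
        return 0
        ;;
    esac
  done < "$2"
  return 1
}

# Usage: get_param uri_param application.txt
```

Swapping ${line#*=} for ${line##*=} would print only new-service#stripe-ftr-qa.stripe.nxz.com:80, the same result as the greedy sed.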

Printing a line of a file given line number

Is it possible, in UNIX, to print a particular line of a file? For example, I would like to print line 10 of file example.c. I tried with cat, ls, and awk, but apparently either they don't have this feature or I'm not able to read the man pages properly :-).
Using awk:
awk 'NR==10' file
Using sed:
sed '10!d' file
sed -n '10{p;q;}' example.c
will print the tenth line of example.c for you.
Try head and tail; you can specify how many lines to take and where to start.
To get the third line:
head -n 3 yourfile.c | tail -n 1
head -n 10 /tmp/asdf | tail -n 1
Unfortunately, the solutions that use head/tail will NOT work correctly if the line number is larger than the total number of lines in the file: they print the last line instead of nothing.
This will print line N, or nothing if N is beyond the total number of lines:
grep -n "" file | grep "^20:"
If you want to cut the line number from the output, pipe it through sed:
grep -n "" file | grep "^20:" | sed 's/^20://'
Try this:
cat -n <yourfile> | grep "^[[:space:]]*<NUMBER>[[:space:]].*$"
cat -n numbers the lines of the file, and the grep regex selects the line with that number ;-)
The original version matched the wrong lines, as mentioned in the comments.
The current one looks for an exact match, i.e. in this case we need a line starting with an arbitrary number (*) of whitespace characters, then the number, then a whitespace character, followed by anything (.*).
In case anyone stumbles over this regex and doesn't get it at all, here is a good tutorial to get you started: http://regex.learncodethehardway.org/book/ (it uses Python regexes as examples, though).
This might work for you:
sed '10q;d' file
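None of the commands above signal whether the line actually exists. A minimal awk sketch (the helper name print_line is hypothetical) that prints line N, stops reading immediately, prints nothing when N is past the end of the file, and also reports that case through a non-zero exit status:

```shell
# print_line N FILE (hypothetical helper name)
# Prints line N of FILE and stops reading; exit status 0 if the line exists.
# If FILE has fewer than N lines, prints nothing and exits with status 1.
print_line() {
  # "exit" in the main rule still runs END; END then sets the final status.
  awk -v n="$1" 'NR == n { print; found = 1; exit } END { exit !found }' "$2"
}

# Usage: print_line 10 example.c || echo "example.c has fewer than 10 lines"
```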

How to ignore all lines before a match occurs in bash?

I would like to ignore all lines that occur before a match in bash (also ignoring the matched line). An example input could be:
R1-01.sql
R1-02.sql
R1-03.sql
R1-04.sql
R2-01.sql
R2-02.sql
R2-03.sql
and if I match R2-01.sql in this already-sorted input, I would like to get:
R2-02.sql
R2-03.sql
There are many possible ways. For example, assuming your input is in list.txt:
PATTERN="R2-01.sql"
sed "0,/$PATTERN/d" <list.txt
Because the 0,/pattern/ address works only in GNU sed (e.g. it doesn't work on OS X), here is a workaround. ;)
PATTERN="R2-01.sql"
(echo "dummy-line-to-the-start" ; cat - ) < list.txt | sed "1,/$PATTERN/d"
This adds one dummy line at the start, so the real pattern can only match on line 2 or later, and 1,/pattern/ works: it deletes everything from line 1 (the dummy line) up to the pattern.
Or you can print the lines from the pattern onward and delete the first one:
sed -n '/pattern/,$p' < list.txt | sed '1d'
With awk, e.g.:
awk '/pattern/,0{if (!/pattern/)print}' < list.txt
Or, my favorite, use this perl command:
perl -ne 'print unless 1../pattern/' < list.txt
It deletes only the first line when the pattern is on the first line, because Perl's .. operator tests the right operand on the same line where the range starts.
Another solution is reverse-delete-reverse:
tail -r < list.txt | sed '/pattern/,$d' | tail -r
If you have the tac command, use it instead of tail -r. The interesting thing is that /pattern/,$d works on the last line, but 1,/pattern/d doesn't on the first.
How to ignore all lines before a match occurs in bash?
The question headline and your example don't quite match up.
Print all lines from "R2-01.sql" in sed:
sed -n '/R2-01.sql/,$p' input_file.txt
Where:
-n suppresses printing the pattern space to stdout
/ starts and ends the pattern to match (regular expression)
, separates the start of the range from the end
$ addresses the last line in the input
p echoes the pattern space in that range to stdout
input_file.txt is the input file
Print all lines after "R2-01.sql" in sed:
sed '1,/R2-01.sql/d' input_file.txt
Where:
1 addresses the first line of the input
, separates the start of the range from the end
/ starts and ends the pattern to match (regular expression)
d deletes the pattern space in that range
input_file.txt is the input file
Everything not deleted is echoed to stdout.
This is a little hacky, but it's easy to remember for quickly getting the output you need:
$ grep -A99999 "$match" "$file"
Obviously you need to pick a value for -A that's large enough to match all contents; if you use a value that's too small, the output will be silently truncated.
To ensure you get all output you can do:
$ grep -A$(wc -l < "$file") "$match" "$file"
Of course, at that point you might be better off with the sed solutions, since they don't require two reads of the file.
And if you don't want the matching line itself, you can pipe this command into tail -n +2 to skip the first line of output.
awk -v pattern=R2-01.sql '
print_it {print}
$0 ~ pattern {print_it = 1}
' list.txt
You can also do it with this, but I think jomo666's answer is better:
sed -nr '/R2-01.sql/,${/R2-01/d;p}' <<END
R1-01.sql
R1-02.sql
R1-03.sql
R1-04.sql
R2-01.sql
R2-02.sql
R2-03.sql
END
Perl is another option:
perl -ne 'if ($f){print} elsif (/R2-01\.sql/){$f++}' sql
To pass in the regex as an argument, use -s to enable Perl's rudimentary switch parsing:
perl -sne 'if ($f){print} elsif (/$r/){$f++}' -- -r=R2-01\\.sql file
This can be accomplished with grep by printing a large enough context after the $match. This example outputs the first matching line followed by up to 999,999 lines of "context".
grep -A999999 "$match" "$file"
For added safety (in case $match begins with a hyphen, say), you should use -e to force $match to be used as an expression.
grep -A999999 -e "$match" "$file"
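Most answers above treat R2-01.sql as a regular expression, so the "." matches any character. If you want to match the whole line literally, a minimal awk sketch (the helper name lines_after is hypothetical) compares each line with the given string and prints everything after the first exact match, skipping the matched line itself:

```shell
# lines_after MATCH FILE (hypothetical helper name)
# Prints every line after the first line exactly equal to MATCH.
# Concatenating "" forces a string (not numeric) comparison.
# Caveat: awk -v still interprets backslash escapes in MATCH.
lines_after() {
  awk -v m="$1" 'found { print; next } ($0 "") == m { found = 1 }' "$2"
}

# Usage: lines_after R2-01.sql list.txt
```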

bash grep newline

Hi, I need to extract from the file:
first
second
third
using the grep command, the following lines:
second
third
What should the grep command look like?
Instead of grep, you can use pcregrep which supports multiline patterns
pcregrep -M 'second\nthird' file
-M allows the pattern to match more than one line.
Your question title, "bash grep newline", implies that you want to match the character sequence second\nthird, i.e. something containing a newline.
Since grep works on lines and these are two different lines, you cannot match it this way.
So, I'd split it into several tasks:
Match the line that contains "second" and output it together with the following line:
grep -A 1 "second" testfile
Translate every other newline into a sequence that is guaranteed not to occur in the input. I think the simplest way to do that is with perl:
perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;'
Grep these lines, this time searching for the string ##UnUsedSequence##third:
grep "##UnUsedSequence##third"
Unwrap the unused sequences back into newlines; sed might be the simplest (\n in the replacement is a GNU sed extension):
sed -e 's/##UnUsedSequence##/\n/'
So the resulting pipe command to do what you want would look like:
grep -A 1 "second" testfile | perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;' | grep "##UnUsedSequence##third" | sed -e 's/##UnUsedSequence##/\n/'
Not the most elegant by far, but should work. I'm curious to know of better approaches, though - there should be some.
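As a simpler alternative to the four-stage pipeline above, a single awk pass can remember the previous line and print both lines when the previous one matches the first pattern and the current one matches the second. A minimal sketch, assuming a hypothetical helper name print_pair and awk (ERE) patterns:

```shell
# print_pair PAT1 PAT2 FILE (hypothetical helper name)
# Prints each pair of consecutive lines where the first matches PAT1
# and the second matches PAT2.
print_pair() {
  awk -v a="$1" -v b="$2" '
    NR > 1 && prev ~ a && $0 ~ b { print prev; print }
    { prev = $0 }  # remember the current line for the next record
  ' "$3"
}

# Usage: print_pair '^second$' '^third$' file
```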
I don't think grep is the way to go on this.
If you just want to strip the first line from any file (to generalize your question), I would use sed instead.
sed '1d' INPUT_FILE_NAME
This will send the contents of the file to standard output with the first line deleted.
Then you can redirect the standard output to another file to capture the results.
sed '1d' INPUT_FILE_NAME > OUTPUT_FILE_NAME
That should do it.
If you have to use grep and just don't want to display the line with first on it, then try this:
grep -v first INPUT_FILE_NAME
By passing the -v switch, you are telling grep to show you every line except those matching the expression you pass. In effect: show me everything but the line(s) with first in them.
However, the downside is that if "first" appears on several lines, all of those lines will be hidden too, which may not be the behavior you expect.
To shunt the results into a new file, try this:
grep -v first INPUT_FILE_NAME > OUTPUT_FILE_NAME
Hope this helps.
I don't really understand what you want to match. I would not use grep, but one of the following:
tail -n 2 file # to get the last two lines
tail -n +2 file # to get all but the first line
sed -e '2,3p;d' file # to get lines two through three
(not sure how standard it is, it works in GNU tools for sure)
So you just don't want the line containing "first"? -v inverts the grep results.
$ printf 'first\nsecond\nthird\n' | grep -v first
second
third
Line? Or lines?
Try
grep -E -e '(second|third)' filename
Edit: grep is line-oriented. You're going to have to use Perl, sed, or awk to perform the pattern match across lines.
BTW, -E tells grep that the regexp is an extended RE.
grep -A1 "second" file | grep -B1 "third" works nicely, and if you have multiple matches it will even get rid of the first grep's -- match delimiters
grep -E '(second|third)' /path/to/file
egrep -w 'second|third' /path/to/file
You could use:
$ grep -1 third filename
This prints each matching line plus one line before and after it. Since "third" is on the last line, you get the last two lines.
I like notnoop's answer, but building on AndrewY's answer (which is better for those without pcregrep, but way too complicated), you can just do:
RESULT=$(grep -A1 -s -m1 '^\s*second\s*$' file | grep -s -B1 -m1 '^\s*third\s*$')
grep -v '^first' filename
Where the -v flag inverts the match.
