grep excluding first char - bash

How do I find lines where a pattern occurs in the middle of the line? In the following example, I want to get only the 8th line and exclude the 1st and 5th lines when grepping for "#".
I know I can use grep "^#" to match # only as the first character, but how do I exclude that case?
#DD65WKN1:203:H7T67ADXX:2:2216:19936:100494 1:N:0:
GTCGTTCTTCAGGTTCTC
+
FFFFFIIIIFFFIFFFFF
#DD65WKN1:203:H7T67ADXX:2:2216:6629:100501 1:N:0:
TAAAGTAGCAAAAATG
+
FFFFFFFFIFBFIFFF#DD65WKN1:203:H7T67ADXX:2:2216:6629:100501 1:N:0:
TAAAGTAGCAAAAATG
+
FFFFFFFFIFBFIFFF
Thanks

You can require any character before the #, so that a # that appears only in the first position does not match:
$ grep '.#' file
FFFFFFFFIFBFIFFF#DD65WKN1:203:H7T67ADXX:2:2216:6629:100501 1:N:0:
Note that . matches any character, so this first solution would also match a line starting with ##. To rule that out, require a character other than # before the #:
grep '[^#]#' file
Or anchor the pattern so that the line must start with a run of non-# characters (at least one, as indicated by \+, which is a GNU extension in basic regular expressions) before the first #:
grep '^[^#]\+#' file
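To see how the first two patterns differ, here is a minimal sketch run against a made-up sample. The ##double and x#y lines are assumptions added for illustration and are not part of the question's data.

```shell
# Hypothetical sample; '##double' and 'x#y' are edge cases invented for this sketch.
sample=$(printf '%s\n' '#header' 'GTCG' 'FFFF#DD65' '##double' 'x#y')

# '.#' only needs some character before a '#', so '##double' matches too.
dot_matches=$(printf '%s\n' "$sample" | grep '.#')

# '[^#]#' needs a character other than '#' before a '#', so '##double' is skipped.
neg_matches=$(printf '%s\n' "$sample" | grep '[^#]#')

printf 'with .#:\n%s\n\nwith [^#]#:\n%s\n' "$dot_matches" "$neg_matches"
```

Neither pattern reports '#header', because nothing precedes its only #.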

Use grep with the -P (Perl-compatible regex) option, which supports negative lookbehind.
$ grep -P '(?<!^)#' file
FFFFFFFFIFBFIFFF#DD65WKN1:203:H7T67ADXX:2:2216:6629:100501 1:N:0:
The above grep command prints every line that contains a # anywhere other than the first position. Note that a line beginning with # is still printed if it contains a second # later on.

The best thing about Unix filters is that you can combine them:
grep --invert-match '^#' file | grep '#'
or more traditionally
sed '/^#/d' file | grep '#'
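One behavioural difference may be worth checking: the two-stage pipeline drops every line that starts with #, even if another # appears later, whereas a single pattern such as [^#]# keeps such a line. A small sketch, assuming an invented line '#id #tag' that is not in the original data:

```shell
# Hypothetical sample; '#id #tag' starts with '#' and has a second '#' later.
sample=$(printf '%s\n' '#id #tag' 'FFFF#DD65' 'GTCG')

# Stage 1 removes lines starting with '#'; stage 2 keeps lines that still contain one.
piped=$(printf '%s\n' "$sample" | grep -v '^#' | grep '#')

# A single pattern looks for a '#' after a non-'#' character anywhere in the line.
single=$(printf '%s\n' "$sample" | grep '[^#]#')

printf 'pipeline:\n%s\n\nsingle pattern:\n%s\n' "$piped" "$single"
```

Which behaviour is wanted depends on whether a leading # should disqualify the whole line.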

Related

Extracting all but a certain sequence of characters in Bash

In bash I need to extract a certain sequence of letters and numbers from a filename. In the example below I need to extract just the S??E?? section of the filenames. This must work with both upper/lowercase.
my.show.s01e02.h264.aac.subs.mkv
great.s03e12.h264.Dolby.mkv
what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4
Expected output would be:
s01e02
s03e12
S05E11
I've been trying to do this with sed but can't get it to work. This is what I have tried, without success:
sed 's/.*s[0-9][0-9]e[0-9][0-9].*//'
Many thanks for any help.
With sed we can match the desired string in a capture group, and use the I suffix for case-insensitive matching, to accomplish the desired result.
For the sake of this answer I'm assuming the filenames are in a file:
$ cat fnames
my.show.s01e02.h264.aac.subs.mkv
great.s03e12.h264.Dolby.mkv
what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4
One sed solution:
$ sed -E 's/.*\.(s[0-9][0-9]e[0-9][0-9])\..*/\1/I' fnames
s01e02
s03e12
S05E11
Where:
-E - enable extended regex support
\.(s[0-9][0-9]e[0-9][0-9])\. - match s??e?? with a pair of literal periods as bookends; the s??e?? (wrapped in parens) will be stored in capture group #1
\1 - replace the whole line with the contents of capture group #1
/I - use case-insensitive matching (a GNU sed extension)
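What the command above does not show is what happens to a name with no S??E?? part: sed prints it unchanged. Here is a hedged sketch of one way to suppress such lines, using -n together with the p flag. The third filename is made up, and bracket classes stand in for the I suffix so that the sketch does not depend on GNU sed.

```shell
# The last name is a hypothetical file without an episode tag.
names=$(printf '%s\n' 'my.show.s01e02.h264.aac.subs.mkv' \
                      'what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4' \
                      'no.episode.here.mkv')

pattern='s/.*\.([sS][0-9]{2}[eE][0-9]{2})\..*/\1/'

# Plain substitution: a line that does not match is printed as-is.
plain=$(printf '%s\n' "$names" | sed -E "$pattern")

# -n silences default output; the trailing p prints only substituted lines.
strict=$(printf '%s\n' "$names" | sed -n -E "${pattern}p")

printf 'plain:\n%s\n\nstrict:\n%s\n' "$plain" "$strict"
```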
I think your pattern is OK. With grep -o you get only the matched part of each line instead of the whole matching line. Because {2} is an interval expression, it needs -E (or \{2\} in basic regular expressions). So
grep -Eio 'S[0-9]{2}E[0-9]{2}'
solves your problem. Only digits are accepted in the numbered positions. You could also wrap it in an if, so that a filename without a match is flagged as malformed.
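The idea of putting the command in an if could look like the sketch below. The function name episode_tag and the filename holiday.video.mkv are hypothetical; -E is used because the {2} interval needs extended syntax.

```shell
# episode_tag NAME: print the sNNeNN part of NAME, or complain on stderr and fail.
episode_tag() {
  if tag=$(printf '%s\n' "$1" | grep -Eio 's[0-9]{2}e[0-9]{2}'); then
    printf '%s\n' "$tag"
  else
    printf 'no episode tag in: %s\n' "$1" >&2
    return 1
  fi
}

episode_tag 'great.s03e12.h264.Dolby.mkv'
episode_tag 'holiday.video.mkv' || echo 'skipping'
```

The non-zero return status lets a calling loop decide whether to skip or abort on a bad filename.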
Suppose you have those file names:
$ ls -1
great.s03e12.h264.Dolby.mkv
my.show.s01e02.h264.aac.subs.mkv
what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4
You can extract the substring this way:
$ printf "%s\n" * | sed -E 's/^.*([sS][0-9][0-9][eE][0-9][0-9]).*/\1/'
Or with grep:
$ printf "%s\n" *.m* | grep -o '[sS][0-9][0-9][eE][0-9][0-9]'
Either prints:
s03e12
s01e02
S05E11
You could use that same sed or grep on a file (with filenames in it) as well.

Grep-ing for a line beginning with a Maven version number

I'm trying to grep a file for a line that begins with a version number of the form:
X.Y.Z
where X, Y and Z are numbers between 0 and infinity.
As an example, say the line of interest begins 20.2.3
The following will return a result if the first character of the line is a digit:
grep ^[0-9]
The result is:
20.2.3
where only the leading 2 is what grep has 'matched on'.
However this will also match lines beginning 4000-43 which I do not want.
So in my regex naivety I tried the following grep:
grep ^[0-9]+\.[0-9]+\.[0-9]
thinking this would match any line beginning with any number followed by two other numbers separated by decimal-points. But it does not.
If I try:
grep ^[0-9]+
it doesn't match anything at all.
How do I modify my regex to match the number format I'm looking for?
The + (one or more) quantifier must be escaped with \ in BRE (basic regular expressions) mode, which is grep's default (\+ is a GNU extension):
grep '^[0-9]\+\.[0-9]\+\.[0-9]' <<<"20.2.3"
20.2.3
Alternatively, use the -E option to enable ERE (extended regular expressions) mode, where + needs no escape:
grep -E '^[0-9]+\.[0-9]+\.[0-9]' <<<"20.2.3"
20.2.3
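The difference between the two modes can be checked with grep -c, which prints the number of matching lines. This is a minimal sketch assuming GNU grep; the input 2+.0+.3 is invented to show that an unescaped + in BRE is a literal plus sign. The patterns are single-quoted so that the shell does not strip the backslashes before grep sees them.

```shell
line='20.2.3'

# BRE (grep's default): an unescaped + is an ordinary character, so nothing matches.
bre_plain=$(printf '%s\n' "$line" | grep -c '^[0-9]+\.[0-9]+\.[0-9]')

# BRE with \+ (GNU extension) and ERE with + both mean "one or more digits".
bre_escaped=$(printf '%s\n' "$line" | grep -c '^[0-9]\+\.[0-9]\+\.[0-9]')
ere=$(printf '%s\n' "$line" | grep -c -E '^[0-9]+\.[0-9]+\.[0-9]')

# The unescaped BRE pattern does match a line that contains literal plus signs.
bre_literal=$(printf '%s\n' '2+.0+.3' | grep -c '^[0-9]+\.[0-9]+\.[0-9]')

printf '%s %s %s %s\n' "$bre_plain" "$bre_escaped" "$ere" "$bre_literal"
```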
Your pattern is nearly right; just add \ before each + and quote it:
grep "^[0-9]\+\.[0-9]\+\.[0-9]" <filename>
Since you want lines starting with digits and periods but not starting with a period (-o weeds out anything following the version number):
$ echo 20.2.3 foo |
grep -o '^[0-9][0-9.]\+'
20.2.3
Note that this also accepts strings such as 20..3, because it does not enforce the X.Y.Z shape.
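A stricter variant, sketched here under the assumption that exactly three dot-separated numbers are wanted, spells out the X.Y.Z shape in ERE. The function name version_prefix is hypothetical.

```shell
# version_prefix LINE: print a leading X.Y.Z version, or fail if there is none.
version_prefix() {
  printf '%s\n' "$1" | grep -oE '^[0-9]+\.[0-9]+\.[0-9]+'
}

version_prefix '20.2.3 foo'                     # version followed by other text
version_prefix '4000-43' || echo 'no version'   # the line the question wants to exclude
```

Because grep exits with a non-zero status when nothing matches, the function can be used directly in an if or with ||.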

Find string then from there pull numbers

I'm just starting to code in bash and am not the best at it, but I have a situation. I have output like:
Configuration file 'hello2.conf' is in use by process 735.
Ending
I want to extract the process ID 735.
I have seen answers that extract ONLY the numbers from the output, but then I am left with 2735.
How can I go about extracting 735 from the output? I was thinking of searching for process and then grabbing the number after it, perhaps?
Thanks!
Use GNU grep with its Perl Compatible Regular Expression capabilities enabled with the -P flag and print only the matching entry using -o flag.
grep -Po 'process \K[0-9]+' <<<"Configuration file 'hello2.conf' is in use by process 735."
735
Use it in a command line as
.. | grep -Po 'process \K[0-9]+'
where the \K escape sequence stands for
\K: This sequence resets the starting point of the reported match. Any previously matched characters are not included in the final matched sequence.
You might want to use a bash regular expression:
[[ "$line" =~ ([0-9]+)\.$ ]] && echo "${BASH_REMATCH[1]}"
This matches a number followed by a period at the end of the line, captures the number, and prints it.
Good Luck!
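If your line always has the same shape, use cut -d" " -f 9 to take the ninth space-separated field. Note that this field is 735. including the trailing period, so append | tr -d . to remove it.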
sed can extract only the number at that specific location in the message (using \(...\) match grouping and \1 replacement). Lines that do not match, such as Ending, pass through unchanged.
... | sed "s#^Configuration file '.*' is in use by process \([0-9]*\)\.#\1#"
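If only the process ID should be printed and every other line dropped, one option is sed's -n option combined with the p flag. A minimal sketch, using the two output lines from the question as sample data and a shortened pattern that is an assumption about which part of the message is stable:

```shell
# Sample output copied from the question.
output=$(printf '%s\n' "Configuration file 'hello2.conf' is in use by process 735." 'Ending')

# -n plus the p flag prints only the line where the substitution happened.
pid=$(printf '%s\n' "$output" |
      sed -n 's/.*is in use by process \([0-9][0-9]*\)\.$/\1/p')

printf '%s\n' "$pid"
```

The digit in hello2.conf is not picked up, because the capture group only covers the digits after "process ".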

how to grep the following

I have an input file
RAKESH_ONE
RAKESH-TWO
RAKESH123
RAKESHTHREE
/RAKESH/
FIVERAKESH
456RAKESH
WELCOME123
This is RAKESH
I would like to get the output
RAKESH_ONE
RAKESH-TWO
/RAKESH/
This is RAKESH
I want to print the lines matching the pattern RAKESH. If the pattern is directly prefixed or suffixed with an alphanumeric character, the line should be skipped.
([^a-zA-Z0-9]+|^)RAKESH([^a-zA-Z0-9]+|$)
This matches RAKESH only where no alphanumeric character comes directly before or after it. It does not match the whole line, but when used with grep or sed it lets you output just the lines you need.
UPDATE
As requested, here's the full grep command. Use the -E option to use extended regex:
grep -E "([^a-zA-Z0-9]+|^)RAKESH([^a-zA-Z0-9]+|$)" file.txt
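As a quick check, the sketch below runs the pattern on the question's input and compares it with grep -w, which might look like a shortcut. The comparison is an addition for illustration: -w treats the underscore as a word character, so it would miss RAKESH_ONE.

```shell
# Input lines copied from the question.
input=$(printf '%s\n' RAKESH_ONE RAKESH-TWO RAKESH123 RAKESHTHREE /RAKESH/ \
                      FIVERAKESH 456RAKESH WELCOME123 'This is RAKESH')

# The answer's pattern: no letter or digit directly before or after RAKESH.
matches=$(printf '%s\n' "$input" |
          grep -E '([^a-zA-Z0-9]+|^)RAKESH([^a-zA-Z0-9]+|$)')

# grep -w counts '_' as part of a word, so RAKESH_ONE is not a whole-word match.
word_matches=$(printf '%s\n' "$input" | grep -w 'RAKESH')

printf 'pattern:\n%s\n\ngrep -w:\n%s\n' "$matches" "$word_matches"
```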

grep pipe searching for one word, not line

For some reason I cannot get this to output just the version of this line. I suspect it has something to do with how grep interprets the dash.
This command:
admin#DEV:~/TEMP$ sendemail
Yields the following:
sendemail-1.56 by Brandon Zehm
More output below omitted
The first line is of interest. I'm trying to store the version in a variable.
TESTVAR=$(sendemail | grep '\s1.56\s')
Does anyone see what I am doing wrong? Thanks
TESTVAR is just empty. Even without TESTVAR, the output is empty.
I just tried the following too, thinking this might work.
sendemail | grep '\<1.56\>'
I just tried it again while editing, and I think I have another issue. Perhaps I'm not handling the output correctly. It's outputting the entire line, but I can see that grep is finding 1.56 because it highlights it in the line.
$ TESTVAR=$(echo 'sendemail-1.56 by Brandon Zehm' | grep -Eo '1\.56')
$ echo "$TESTVAR"
1.56
The point is grep -Eo '1\.56' (the backslash makes the dot literal; an unescaped . matches any character).
From the grep man page:
-E, --extended-regexp
Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
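One detail to watch with -o is the dot in the pattern: unescaped, it matches any character. A small sketch with an invented input line (build 1956 is an assumption, not output from sendemail):

```shell
# Hypothetical input: '1956' is not a version, but an unescaped dot matches the 9.
loose=$(printf '%s\n' 'build 1956' | grep -Eo '1.56')

# With the dot escaped, only a literal '1.56' is reported.
strict=$(printf '%s\n' 'build 1956' 'sendemail-1.56 by Brandon Zehm' |
         grep -Eo '1\.56')

printf 'loose: %s\nstrict: %s\n' "$loose" "$strict"
```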
Your regular expression doesn't match the form of the version. You have specified that the version is surrounded by spaces, yet in front of it you have a dash.
Replace the first \s with the capitalized form \S (any non-whitespace character), or with an explicit set of characters, and it should match. Without -o, grep still prints the whole line.
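The mismatch can be reproduced with grep -c, which counts matching lines. This is a sketch that uses the POSIX class [[:space:]] in place of \s (a GNU extension) and passes the second pattern with -e because it starts with a hyphen:

```shell
banner='sendemail-1.56 by Brandon Zehm'

# Whitespace is required before the version, but the banner has a hyphen there.
with_space=$(printf '%s\n' "$banner" | grep -c '[[:space:]]1\.56[[:space:]]')

# Allowing a hyphen before the version and whitespace after it matches the banner.
with_hyphen=$(printf '%s\n' "$banner" | grep -c -e '-1\.56[[:space:]]')

printf 'space before: %s\nhyphen before: %s\n' "$with_space" "$with_hyphen"
```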
I'm wondering: In your example you seem to know the version (since you grep for it), so you could just assign the version string to the variable. I assume that you want to obtain any (unknown) version string there. The regular expression for this in sed could be (using POSIX character classes):
sendemail |sed -n -r '1 s/sendemail-([[:digit:]]+\.[[:digit:]]+).*/\1/ p'
The -n suppresses the normal default output of every line; -r enables extended regular expressions; the leading 1 tells sed to only work on line 1 (I assume the version appears in the first line). I anchored the version number to the telltale string sendemail- so that potential other numbers elsewhere in that line are not matched. If the program name changes or the hyphen goes away in future versions, this wouldn't match any longer though.
Both the grep solution above and this one have the disadvantage of reading the whole output, which (as emails go these days) may be long. In addition, grep would find all other lines in the program's output that contain the pattern (if it's indeed emails, somebody might discuss this problem in them, with examples!). If the version is indeed on the first line, piping through head -n 1 first would be efficient and prudent.
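Putting head in front of sed might look like the sketch below. fake_sendemail is a hypothetical stand-in for the real program, and its second output line is invented to show a stray number that should not be picked up; -E is used here in place of -r.

```shell
# Stand-in for the real program; the second line is invented for this sketch.
fake_sendemail() {
  printf '%s\n' 'sendemail-1.56 by Brandon Zehm' 'Changes since 1.55 are listed below'
}

# head -n 1 stops after the banner; sed then pulls out the MAJOR.MINOR part.
version=$(fake_sendemail | head -n 1 |
          sed -n -E 's/^sendemail-([0-9]+\.[0-9]+).*/\1/p')

printf '%s\n' "$version"
```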
jayadevan#jayadevan-Vostro-2520:~$ echo $sendmail
sendemail-1.56 by Brandon Zehm
jayadevan#jayadevan-Vostro-2520:~$ echo $sendmail | cut -f2 -d "-" | cut -f1 -d" "
1.56
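The same split on "-" and on the first space can also be done without external commands, using shell parameter expansion. This is a different technique from the cut pipeline above, sketched under the assumption that the banner keeps its name-version layout:

```shell
banner='sendemail-1.56 by Brandon Zehm'

rest=${banner#*-}      # drop everything up to and including the first '-'
version=${rest%% *}    # drop everything from the first space onward

printf '%s\n' "$version"
```

This avoids two cut processes, at the cost of being a little less obvious to read.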

Resources