Finding a particular string from a file - bash

I have a file that contains one particular string many times. How can I print all occurrences of the string on the screen along with letters following that word till the next space is encountered.
Suppose a line contained:
example="123asdfadf" foo=bar
I want to print example="123asdfadf".
I had tried using less filename | grep -i "example=*" but it was printing the complete lines in which example appeared.

$ grep -o "example[^ ]*" foo
example="abc"
example="123asdfadf"

Since -o is only supported by GNU grep, a portable solution would be to use sed:
sed -n 's/.*\(example=[^[:space:]]*\).*/\1/p' file

Related

extract and display matching strings with similar pattern

I have an extremely long line. It contains a lot of strings with a similar patterns like below;
t[0-4]_vmdk_[a-z]_anything
t followed by a single digit with possible 0-4, and them '_vmdk_', and any long string with possible [a-z], then finally "_" in the end. The rest could be anything.
Ex:
asdfasfsa/_asdf**t2_vmdk_abc_**badfad**t3_vmdk_xyz_**asdfasdf**t1_vmdk_efg_**asbafdfb....
Please help me to display all such strings. Thank you!
The simplest is probably with grep. If the input string comes from the standard output of another command (e.g. echo):
$ str='asdfasfsa/_asdf**t2_vmdk_abc_**badfad**t3_vmdk_xyz_**asdfasdf**t1_vmdk_efg_**asbafdfb....'
$ echo "$str" | grep -o 't[0-4]_vmdk_[a-z]*'
t2_vmdk_abc
t3_vmdk_xyz
t1_vmdk_efg
Explanation: the -o option prints "only the matched (non-empty) parts of a matching line".
If the input strings are stored in a file:
$ grep -o 't[0-4]_vmdk_[a-z]*' file.txt
t2_vmdk_abc
t3_vmdk_xyz
t1_vmdk_efg

Extracting all but a certain sequence of characters in Bash

In bash I need to extract a certain sequence of letters and numbers from a filename. In the example below I need to extract just the S??E?? section of the filenames. This must work with both upper/lowercase.
my.show.s01e02.h264.aac.subs.mkv
great.s03e12.h264.Dolby.mkv
what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4
Expected output would be:
s01e02
s03e12
S05E11
I've been trying to do this with SED but can't get it to work. This is what I have tried, without success:
sed 's/.*s[0-9][0-9]e[0-9][0-9].*//'
Many thanks for any help.
With sed we can match the desired string in a capture group, and use the I suffix for case-insensitive matching, to accomplish the desired result.
For the sake of this answer I'm assuming the filenames are in a file:
$ cat fnames
my.show.s01e02.h264.aac.subs.mkv
great.s03e12.h264.Dolby.mkv
what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4
One sed solution:
$ sed -E 's/.*\.(s[0-9][0-9]e[0-9][0-9])\..*/\1/I' fnames
s01e02
s03e12
S05E11
Where:
-E - enable extended regex support
\.(s[0-9][0-9]e[0-9][0-9])\. - match s??e?? with a pair of literal periods as bookends; the s??e?? (wrapped in parens) will be stored in capture group #1
\1 - print out capture group #1
/I - use case-insensitive matching
I think your pattern is ok. With the grep -o you get only the matched part of a string instead of matching lines. So
grep -io 'S[0-9]{2}E[0-9]{2}'
solves your problem. Compared to your pattern only numbers will be matched. Maybe you can put it in an if, so lines without a match show that something is wrong with the filename.
Suppose you have those file names:
$ ls -1
great.s03e12.h264.Dolby.mkv
my.show.s01e02.h264.aac.subs.mkv
what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4
You can extract the substring this way:
$ printf "%s\n" * | sed -E 's/^.*([sS][0-9][0-9][eE][0-9][0-9]).*/\1/'
Or with grep:
$ printf "%s\n" *.m* | grep -o '[sS][0-9][0-9][eE][0-9][0-9]'
Either prints:
s03e12
s01e02
S05E11
You could use that same sed or grep on a file (with filenames in it) as well.

How do I get rid of “--” line separator when using grep

I'm using the commands given below for splitting my fastq file into two separate paired end reads files:
grep '#.*/1' -A 3 24538_7#2.fq >24538_7#2_1.fq
grep '#.*/2' -A 3 24538_7#2.fq >24538_7#2_2.fq
But it's automatically introducing a -- line separator between the entries. Hence, making my fastq file inappropriate for further processing(because it then becomes an invalid fastq format).
So, I want to get rid of the line separator(--).
PS: I've found the answer for Linux machine but I'm using MacOS, and those didn't work on Mac terminal.
You can use the --no-group-separator option to suppress it (in GNU grep).
Alternatively, you could use (GNU) sed:
sed '\|#.*/1|,+3!d'
deletes all lines other than the one matching #.*/1 and the next three lines.
For macOS sed, you could use
sed -n '\|#.*/1|{N;N;N;p;}'
but this gets unwieldy quickly for more context lines.
Another approach would be to chain grep with itself:
grep '#.*/1' -A 3 file.fq | grep -v "^--"
The second grep selects non-matching (-v) lines that start with -- (though this pattern can sometimes be interpreted as a command line option, requiring some weird escaping like "[-][-]", which is why i put the ^ there).

How to extract a string at end of line after a specific word

I have different location, but they all have a pattern:
some_text/some_text/some_text/log/some_text.text
All locations don't start with the same thing, and they don't have the same number of subdirectories, but I am interested in what comes after log/ only. I would like to extract the .text
edited question:
I have a lot of location:
/s/h/r/t/log/b.p
/t/j/u/f/e/log/k.h
/f/j/a/w/g/h/log/m.l
Just to show you that I don't know what they are, the user enters these location, so I have no idea what the user enters. The only I know is that it always contains log/ followed by the name of the file.
I would like to extract the type of the file, whatever string comes after the dot
THe only i know is that it always contains log/ followed by the name
of the file.
I would like to extract the type of the file, whatever string comes
after the dot
based on this requirement, this line works:
grep -o '[^.]*$' file
for your example, it outputs:
text
You can use bash built-in string operations. The example below will extract everything after the last dot from the input string.
$ var="some_text/some_text/some_text/log/some_text.text"
$ echo "${var##*.}"
text
Alternatively, use sed:
$ sed 's/.*\.//' <<< "$var"
text
Not the cleanest way, but this will work
sed -e "s/.*log\///" | sed -e "s/\..*//"
This is the sed patterns for it anyway, not sure if you have that string in a variable, or if you're reading from a file etc.
You could also grab that text and store in a sed register for later substitution etc. All depends on exactly what you are trying to do.
Using awk
awk -F'.' '{print $NF}' file
Using sed
sed 's/.*\.//' file
Running from the root of this structure:
/s/h/r/t/log/b.p
/t/j/u/f/e/log/k.h
/f/j/a/w/g/h/log/m.l
This seems to work, you can skip the echo command if you really just want the file types with no record of where they came from.
$ for DIR in *; do
> echo -n "$DIR "
> find $DIR -path "*/log/*" -exec basename {} \; | sed 's/.*\.//'
> done
f l
s p
t h

Grep characters before and after match?

Using this:
grep -A1 -B1 "test_pattern" file
will produce one line before and after the matched pattern in the file. Is there a way to display not lines but a specified number of characters?
The lines in my file are pretty big so I am not interested in printing the entire line but rather only observe the match in context. Any suggestions on how to do this?
3 characters before and 4 characters after
$> echo "some123_string_and_another" | grep -o -P '.{0,3}string.{0,4}'
23_string_and
grep -E -o ".{0,5}test_pattern.{0,5}" test.txt
This will match up to 5 characters before and after your pattern. The -o switch tells grep to only show the match and -E to use an extended regular expression. Make sure to put the quotes around your expression, else it might be interpreted by the shell.
You could use
awk '/test_pattern/ {
match($0, /test_pattern/); print substr($0, RSTART - 10, RLENGTH + 20);
}' file
You mean, like this:
grep -o '.\{0,20\}test_pattern.\{0,20\}' file
?
That will print up to twenty characters on either side of test_pattern. The \{0,20\} notation is like *, but specifies zero to twenty repetitions instead of zero or more.The -o says to show only the match itself, rather than the entire line.
I'll never easily remember these cryptic command modifiers so I took the top answer and turned it into a function in my ~/.bashrc file:
cgrep() {
# For files that are arrays 10's of thousands of characters print.
# Use cpgrep to print 30 characters before and after search pattern.
if [ $# -eq 2 ] ; then
# Format was 'cgrep "search string" /path/to/filename'
grep -o -P ".{0,30}$1.{0,30}" "$2"
else
# Format was 'cat /path/to/filename | cgrep "search string"
grep -o -P ".{0,30}$1.{0,30}"
fi
} # cgrep()
Here's what it looks like in action:
$ ll /tmp/rick/scp.Mf7UdS/Mf7UdS.Source
-rw-r--r-- 1 rick rick 25780 Jul 3 19:05 /tmp/rick/scp.Mf7UdS/Mf7UdS.Source
$ cat /tmp/rick/scp.Mf7UdS/Mf7UdS.Source | cgrep "Link to iconic"
1:43:30.3540244000 /mnt/e/bin/Link to iconic S -rwxrwxrwx 777 rick 1000 ri
$ cgrep "Link to iconic" /tmp/rick/scp.Mf7UdS/Mf7UdS.Source
1:43:30.3540244000 /mnt/e/bin/Link to iconic S -rwxrwxrwx 777 rick 1000 ri
The file in question is one continuous 25K line and it is hopeless to find what you are looking for using regular grep.
Notice the two different ways you can call cgrep that parallels grep method.
There is a "niftier" way of creating the function where "$2" is only passed when set which would save 4 lines of code. I don't have it handy though. Something like ${parm2} $parm2. If I find it I'll revise the function and this answer.
With gawk , you can use match function:
x="hey there how are you"
echo "$x" |awk --re-interval '{match($0,/(.{4})how(.{4})/,a);print a[1],a[2]}'
ere are
If you are ok with perl, more flexible solution : Following will print three characters before the pattern followed by actual pattern and then 5 character after the pattern.
echo hey there how are you |perl -lne 'print "$1$2$3" if /(.{3})(there)(.{5})/'
ey there how
This can also be applied to words instead of just characters.Following will print one word before the actual matching string.
echo hey there how are you |perl -lne 'print $1 if /(\w+) there/'
hey
Following will print one word after the pattern:
echo hey there how are you |perl -lne 'print $2 if /(\w+) there (\w+)/'
how
Following will print one word before the pattern , then the actual word and then one word after the pattern:
echo hey there how are you |perl -lne 'print "$1$2$3" if /(\w+)( there )(\w+)/'
hey there how
If using ripgreg this is how you would do it:
grep -E -o ".{0,5}test_pattern.{0,5}" test.txt
You can use regexp grep for finding + second grep for highlight
echo "some123_string_and_another" | grep -o -P '.{0,3}string.{0,4}' | grep string
23_string_and
With ugrep you can specify -ABC context with option -o (--only-matching) to show the match with extra characters of context before and/or after the match, fitting the match plus the context within the specified -ABC width. For example:
ugrep -o -C30 pattern testfile.txt
gives:
1: ... long line with an example pattern to match. The line could...
2: ...nother example line with a pattern.
The same on a terminal with color highlighting gives:
Multiple matches on a line are either shown with [+nnn more]:
or with option -k (--column-number) to show each individually with context and the column number:
The context width is the number of Unicode characters displayed (UTF-8/16/32), not just ASCII.
I personally do something similar to the posted answers.. but since the dot key, like any keyboard key, can be tapped or held down.. and I often don't need a lot of context(if I needed more I might do the lines like grep -C but often like you I don't want lines before and after), so I find it much quicker for entering the command, to just tap the dot key for how many dots / how many characters, if it's a few then tapping the key, or hold it down for more.
e.g. echo zzzabczzzz | grep -o '.abc..'
Will have the abc pattern with one dot before and two after. ( in regex language, Dot matches any character). Others used dot too but with curly braces to specify repetition.
If I wanted to be strict re between (0 or x) characters and exactly y characters, then i'd use the curlies.. and -P, as others have done.
There is a setting re whether dot matches new line but you can look into that if it's a concern/interest.

Resources