Finding a whole string using grep - bash

I'm trying to find whole strings using grep. I am familiar with the -w flag, but it gives me a hard time since it treats a dot as a delimiter.
For example, I have a file named "a.txt" and a directory named a in some directory, and this is what happens:
> ls | grep -w a
a
a.txt
What I want it to find is only "a" and that's it.
How can I do that?

If you want a single a on a line, use
grep '^a$'
If you only take whitespace as the delimiter, use
grep '\([[:space:]]\|^\)a\([[:space:]]\|$\)'
(i.e. whitespace or beginning of the line, a, whitespace or end of the line).

Use the -x option of grep:
ls | grep -x a
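To see the difference between -w and -x on the example from the question, here is a small sketch. The listing is simulated with printf rather than ls so it can be run anywhere; the variable names are only for illustration.

```shell
# Simulate the directory listing from the question.
listing=$(printf 'a\na.txt\n')

# -w matches "a" as a word, and "." counts as a word boundary,
# so "a.txt" is reported too.
word_matches=$(printf '%s\n' "$listing" | grep -w a)

# -x requires the whole line to match, so only "a" is reported.
line_matches=$(printf '%s\n' "$listing" | grep -x a)

printf 'with -w:\n%s\n' "$word_matches"
printf 'with -x:\n%s\n' "$line_matches"
```

If the search term may contain regex characters, combining the flags as grep -Fx should treat it as a fixed string while still matching whole lines.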

A simpler approach too would be:
grep '^[[:space:]]*a[[:space:]]*$'
For a search term held in a variable, awk is friendlier: it compares the value as a plain string instead of interpreting it as a regex.
awk -v v="$var" '{ sub(/^[[:space:]]*/, ""); sub(/[[:space:]]*$/, ""); }; $0 == v;'
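A quick way to check the literal comparison is to search for a value that contains a regex metacharacter. This is only a sketch: the value a.b and the sample lines are made up. One caveat to keep in mind is that awk -v processes backslash escapes in the assigned value.

```shell
# Hypothetical search value containing a regex metacharacter.
var='a.b'

# Each line is trimmed of surrounding whitespace and compared as a plain
# string, so "a.b" does not match "axb" the way the regex a.b would.
result=$(printf '  a.b  \naxb\na.b.c\n' |
  awk -v v="$var" '{ sub(/^[[:space:]]*/, ""); sub(/[[:space:]]*$/, "") } $0 == v')

printf '%s\n' "$result"
```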

Related

grep -v *string* and grep -v string creating wildly different results

grep -v mystring myfile.txt
returns ~300KB
grep -v *mystring* myfile.txt
returns ~7GB
....what am I doing wrong here?
Your regular expression is wrong. grep takes a regular expression as its argument, alongside the command-line flags. The unquoted *mystring* you passed is a shell glob, which the shell expands to the set of filenames containing the string mystring before grep ever runs. So, assuming you have filenames containing mystring, your grep command becomes the following:
grep -v mystring1 foomystring2 foomystring3 myfile.txt
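which could produce unexpected results depending on the contents of those files. The right way would be to quote the pattern so the shell leaves it alone, and to use the regex .* (any run of characters) where you wrote *:
grep -v '.*mystring.*' myfile.txt
Since grep matches anywhere in a line, this selects the same lines as grep -v mystring myfile.txt.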
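The expansion can be observed directly by letting echo print the command line that grep would receive. This is a sketch under assumptions: the three file names are invented, and the work happens in a scratch directory.

```shell
# Work in a scratch directory so the glob has something to match.
# The file names below are made up for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch foomystring2 foomystring3 mystring1

# Unquoted: the shell replaces the glob with matching file names
# before the command runs.
unquoted=$(echo grep -v *mystring* myfile.txt)

# Quoted: the pattern is passed through unchanged.
quoted=$(echo grep -v '*mystring*' myfile.txt)

printf '%s\n%s\n' "$unquoted" "$quoted"
cd / && rm -rf "$workdir"
```

In the unquoted case the first expanded name ends up as the pattern and the remaining names become files to search, which is how the output can balloon.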

Read word after a specific word on the same line dont have space between them

How can I extract a word that comes after a specific word in bash ? More precisely, I have a file which has a line which looks like this:
Demo.txt
IN=../files/d
out=../files/d
dataload
name
I want to read "d" from the lines above.
sed -n '/\/files\// s~.*/files/\([^.]*\)\..*~\1~p' file
This works when the line contains a ".", for example
IN=../files/d.txt
in which case it prints "d".
Here, though, "d" has no "." after it as an end delimiter, so I want to read up to the end of the line.
i/p :
Demo.txt
IN=../files/d
out=../files/d
dataload
name
output looking for:
d
d
code: in bash
You could use GNU grep with PCRE :
grep -oP '/files/\K[^.]+' file
The -P flag makes grep use PCRE, the -o makes it display only the matched part rather than the full line, and the \K in the regex omits what precedes from the displayed matched part.
Alternatively if you don't have access to GNU grep, the following perl command will have the same effect :
perl -nle 'print $& if m{/files/\K[^.]+}' file
Sample run.
This sed variant should work for you:
sed -n '/\/files\// s~.*/files/\([^.]*\).*~\1~p' file
d
d
Minor change from earlier sed is that it doesn't match \. right after first capture group.
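As a sketch of how the capture behaves both with and without an extension, the input below mixes the two forms. The d.txt line is an assumption added for illustration, and the leading address is dropped because the substitution itself only prints lines that match.

```shell
# Sample input from the question, with one line given an extension.
input=$(printf '%s\n' 'Demo.txt' 'IN=../files/d' 'out=../files/d.txt' 'dataload' 'name')

# Capture everything after /files/ up to the first dot or the end of the line.
names=$(printf '%s\n' "$input" | sed -n 's~.*/files/\([^.]*\).*~\1~p')

printf '%s\n' "$names"
```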
When you don't want to think about a single command solution, you can use
grep -Eo "/files/." Demo.txt | cut -d/ -f3

Grep/Sed/Awk Options

How could you grep or use sed or awk to parse for a dynamic length substring? Here are some examples:
I need to parse out everything except for the "XXXXX.WAV" in these strings, but the strings are not a set length.
Sometimes its like this:
{"filename": "/assets/JFM/imaging/19001.WAV"},
{"filename": "/assets/JFM/imaging/19307.WAV"},
{"filename": "/assets/JFM/imaging/19002.WAV"}
And sometimes like this:
{"filename": "/assets/JFM/LN_405999/101.WAV"},
{"filename": "/assets/JFM/LN_405999/102.WAV"},
{"filename": "/assets/JFM/LN_405999/103.WAV"}
Is there a great dynamic way to parse for just the .WAV? Maybe if I start at "/" and parse until "?
Edit:
Expected output like this:
19001.WAV
19307.WAV
19002.WAV
Or:
101.WAV
102.WAV
103.WAV
Just use grep as proposed in comments:
grep -o '[^/]\{1,\}\.WAV' yourfile
If the wav file always contains numbers, this seems more explicit (same result):
grep -o '[0-9]\{1,\}\.WAV'
Assuming there are [ and ] lines at the beginning and end of your file, it looks like your input is JSON, in which case I would recommend installing and using jq rather than text-based utilities, and doing something like this:
jq -r '.[]|.filename|split("/")[-1]'
But failing that, any of the tools listed will work just fine.
grep -o '[^/]*\.WAV'
or
sed -ne 's,.*/\([^/]*\.WAV\).*$,\1,p'
or
awk -F'"' '/WAV/ {n=split($4,a,"/"); print a[n]}'
In each case there are a variety of other possible solutions as well.
Or with sed
$ sed 's,.*/,,; s,".*,,' x
101.WAV
102.WAV
103.WAV
Explanation:
s,.*/,, - delete everything up to and including the rightmost /
s,".*,, - delete everything starting with the leftmost " to the end of the line
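The two substitutions can be run one at a time to see what each leaves behind. A minimal sketch on a single line from the question; the variable names are illustrative only.

```shell
line='{"filename": "/assets/JFM/LN_405999/101.WAV"},'

# Step 1: delete everything up to and including the rightmost slash.
step1=$(printf '%s\n' "$line" | sed 's,.*/,,')

# Step 2: delete from the leftmost remaining double quote to the end.
step2=$(printf '%s\n' "$step1" | sed 's,".*,,')

printf '%s\n%s\n' "$step1" "$step2"
```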
another awk
awk -F'[/"]' '{print $(NF-1)}' file
19001.WAV
19307.WAV
19002.WAV
Try this -
awk -F'[{":}/]' '{print $(NF-2)}' f
19001.WAV
19307.WAV
19002.WAV
OR
egrep -o '[[:digit:]]{5}\.WAV' f
19001.WAV
19307.WAV
19002.WAV
OR
egrep -o '[[:digit:]]{5}\.[[:alpha:]]{3}' f
19001.WAV
19307.WAV
19002.WAV
You can easily change the digit and letter counts in the egrep patterns to suit other examples, but the awk command works for both cases as is.
All of the programs you listed use regex to parse the names, so I will show you an example using grep, being probably the most basic one for this case.
There are a couple of options, depending on the exact way you define the XXX part before the ".wav".
Option 1, as you pointed out is just the file name, i.e., everything after the last slash:
grep -hoi "[^/]\+\.WAV"
This reads as "any character besides slash" ([^/]) repeated at least once (\+), followed by a literal .WAV (\.WAV).
Option 2 would be to only grab the digits before the extension:
grep -hoi "[[:digit:]]\+\.WAV"
OR
grep -hoi "[0-9]\+\.WAV"
These read as "digits" ([[:digit:]] and [0-9] mean the same thing) repeated at least once (\+), followed by a literal .WAV (\.WAV).
In all cases, I recommend using the flags -h, -o, -i, which I have concatenated into a single option -hoi. -h suppresses the file name from the output. -o makes grep only output the portion that matches. -i makes the match case insensitive, so should your extension ever change to .wav instead of .WAV, you'll be fine.
Also, in all cases, the input is up to you. You can pipe it in from another program, which will look like
program | grep -hoi "[^/]\+\.WAV"
You can get it from a file using stdin redirection:
grep -hoi "[^/]\+\.WAV" < somefile.txt
Or you can just pass the filename to grep:
grep -hoi "[^/]\+\.WAV" somefile.txt
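The three input styles can be compared side by side. This is a sketch with assumptions: the scratch file and its lowercase .wav line are made up, and \{1,\} is used as the POSIX spelling of the \+ shown above.

```shell
# Scratch file with both path layouts and a lowercase extension (made up).
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"filename": "/assets/JFM/imaging/19001.WAV"},
{"filename": "/assets/JFM/LN_405999/101.wav"}
EOF

# The same pattern, fed three ways: pipe, redirection, file argument.
from_pipe=$(cat "$sample" | grep -hoi '[^/]\{1,\}\.WAV')
from_stdin=$(grep -hoi '[^/]\{1,\}\.WAV' < "$sample")
from_file=$(grep -hoi '[^/]\{1,\}\.WAV' "$sample")

printf '%s\n' "$from_file"
rm -f "$sample"
```

All three variables should hold the same two names; the -i flag is what lets the lowercase .wav line through.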
awk -F/ '{print substr($5,1,7)}' file
101.WAV
102.WAV
103.WAV

Grep (fgrep) bash exact match end of line

I have the below example file
d41d8cd98f00b204e9800998ecf8427e /home/abid/Testing/FileNamesTest/apersand $ file
d41d8cd98f00b204e9800998ecf8427e /home/abid/Testing/FileNamesTest/file[with square brackets]
d41d8cd98f00b204e9800998ecf8427e /home/abid/Testing/FileNamesTest/~$tempfile
017a3635ccb76250b2036d6aea330c80 /home/abid/Testing/FileNamesTest/FileThree
217a3635ccb76250b2036d6aea330c80 /home/abid/Testing/FileNamesTest/FileThreeDays
d41d8cd98f00b204e9800998ecf8427e /home/abid/Testing/FileNamesTest/single quote's
I want to grep on the last part of each line (the file name), and I'm after an exact match for that last part.
grep FileThree$ files.md5
017a3635ccb76250b2036d6aea330c80 /home/abid/Testing/FileNamesTest/FileThree
gives back an exact match and doesn't find "FileThreeDays", which is what I'm after. But because some of the file names contain square brackets, I'm having to use grep -F or fgrep. However, using fgrep as above doesn't work: it returns nothing.
How can I exact match the last part of the line using fgrep whilst still honoring the special characters above ~ / $ / ' / [ ] etc...or any other method using maybe awk...
Further....
Using fgrep without the $ returns both of these files. I only want an exact match (as with the $ used with grep above), but with fgrep the $ pattern doesn't return anything.
grep -F FileThree files.md5
017a3635ccb76250b2036d6aea330c80 /home/abid/Testing/FileNamesTest/FileThree
217a3635ccb76250b2036d6aea330c80 /home/abid/Testing/FileNamesTest/FileThreeDays
I can't tell all the details from your question, but it sounds like you can use grep and just escape the special characters: grep 'File\[Three\]Days$'
If you want to use fgrep, though, you can use some tr tricks to help you. If all you want is the filename (without the directory name), you can do something like
cat files.md5 | tr '/' '\n' | fgrep -x FileThreeDays
That tr command replaces slashes with newlines, so it puts each filename on its own line. With -x added, fgrep only reports lines that equal the search string exactly, so searching for FileThree will not also match FileThreeDays.
If you want the full filename with directory, it's a little trickier, but a similar approach will work. Assuming that there's always a double space between the SHA and the filename, and that there aren't any filenames with double spaces or tab characters in them, you can try something like this:
sed 's/  /\t/' files.md5 | tr '\t' '\n' | fgrep FileThreeDays
That sed command converts the double spaces to tabs. The tr command turns those tabs into newlines (the same trick as above).
I would use awk:
awk '{$1="";print}' file
$1="" cuts the first column to an empty string, and print prints the modified line - which only contains the filename now.
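However, this leaves a blank space at the start of each line. If you care about it and want to remove it, set the output field separator to an empty string (note that this also removes the spaces inside file names that contain them, such as "single quote's"):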
awk '{$1="";print}' OFS="" file
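For the exact match at the end of the line that the question asks about, one possible sketch compares the tail of each line as a plain string in awk, so brackets, $ and ~ need no escaping. The function name ends_with and the shortened hashes are made up for illustration; the leading slash is added so that only a complete file name matches.

```shell
# Sample checksum list modelled on the question (hashes shortened, made up).
list=$(mktemp)
cat > "$list" <<'EOF'
017a  /home/abid/Testing/FileNamesTest/FileThree
217a  /home/abid/Testing/FileNamesTest/FileThreeDays
d41d  /home/abid/Testing/FileNamesTest/file[with square brackets]
EOF

# ends_with NAME FILE: print lines of FILE whose text ends with "/NAME".
# NAME travels through the environment, so awk never treats it as a regex
# and does not process backslash escapes in it.
ends_with() {
  NAME="/$1" awk '{
    n = length(ENVIRON["NAME"])
    if (length($0) >= n && substr($0, length($0) - n + 1) == ENVIRON["NAME"]) print
  }' "$2"
}

exact=$(ends_with 'FileThree' "$list")
brackets=$(ends_with 'file[with square brackets]' "$list")
printf '%s\n%s\n' "$exact" "$brackets"
rm -f "$list"
```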

bash grep newline

Hi, I need to extract from the file:
first
second
third
using the grep command, the following line:
second
third
What should the grep command look like?
Instead of grep, you can use pcregrep which supports multiline patterns
pcregrep -M 'second\nthird' file
-M allows the pattern to match more than one line.
Your question title, "bash grep newline", implies that you want to match the second\nthird sequence of characters, i.e. something containing a newline.
Since grep works on lines and these are two different lines, you cannot match it this way.
So, I'd split it into several tasks:
you match the line that contains "second" and output the line that has matched and the subsequent line:
grep -A 1 "second" testfile
you translate every other newline into the sequence that is guaranteed not to occur in the input. I think the simplest way to do that would be using perl:
perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;'
you do a grep on these lines, this time searching for string ##UnUsedSequence##third:
grep "##UnUsedSequence##third"
you unwrap the unused sequences back into the newlines, sed might be the simplest:
sed -e 's/##UnUsedSequence##/\n/'
So the resulting pipe command to do what you want would look like:
grep -A 1 "second" testfile | perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;' | grep "##UnUsedSequence##third" | sed -e 's/##UnUsedSequence##/\n/'
Not the most elegant by far, but should work. I'm curious to know of better approaches, though - there should be some.
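As one possibly simpler approach, awk can remember the previous line and print the pair when both lines match. This is a sketch that assumes the two lines must be exactly "second" and "third"; the variable names are illustrative.

```shell
# Input from the question.
input=$(printf 'first\nsecond\nthird\n')

# Remember the previous line; when the pair (second, third) shows up,
# print both lines.
pair=$(printf '%s\n' "$input" |
  awk 'NR > 1 && prev == "second" && $0 == "third" { print prev; print } { prev = $0 }')

printf '%s\n' "$pair"
```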
I don't think grep is the way to go on this.
If you just want to strip the first line from any file (to generalize your question), I would use sed instead.
sed '1d' INPUT_FILE_NAME
This will send the contents of the file to standard output with the first line deleted.
Then you can redirect the standard output to another file to capture the results.
sed '1d' INPUT_FILE_NAME > OUTPUT_FILE_NAME
That should do it.
If you have to use grep and just don't want to display the line with first on it, then try this:
grep -v first INPUT_FILE_NAME
By passing the -v switch, you are telling grep to show you everything but the expression that you are passing. In effect show me everything but the line(s) with first in them.
However, the downside is that in a file with several lines containing first, none of those lines will be shown either, which may not be the behavior you are expecting.
To shunt the results into a new file, try this:
grep -v first INPUT_FILE_NAME > OUTPUT_FILE_NAME
Hope this helps.
I don't really understand what you want to match. I would not use grep, but one of the following:
tail -n 2 file # to get last two lines
tail -n +2 file # to get all but first line
sed -e '2,3p;d' file # to get lines from second to third
(not sure how standard it is, it works in GNU tools for sure)
So you just don't want the line containing "first"? -v inverts the grep results.
$ echo -e "first\nsecond\nthird\n" | grep -v first
second
third
Line? Or lines?
Try
grep -E -e '(second|third)' filename
Edit: grep is line oriented. you're going to have to use either Perl, sed or awk to perform the pattern match across lines.
BTW, -E tells grep that the regexp is an extended RE.
grep -A1 "second" | grep -B1 "third" works nicely, and if you have multiple matches it will even get rid of the original -- match delimiter
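A small sketch of that two-stage filter, on the question's input and on a made-up input where "second" is not followed by "third":

```shell
input=$(printf 'first\nsecond\nthird\n')

# First grep keeps "second" plus the line after it;
# second grep keeps "third" plus the line before it.
result=$(printf '%s\n' "$input" | grep -A1 'second' | grep -B1 'third')

# When "second" is not followed by "third", nothing comes out.
# "|| true" only hides grep's non-zero exit status for "no match".
no_match=$(printf 'second\nfourth\n' | grep -A1 'second' | grep -B1 'third' || true)

printf '%s\n' "$result"
```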
grep -E '(second|third)' /path/to/file
egrep -w 'second|third' /path/to/file
you could use
$ grep -1 third filename
This prints the matching line plus one line of context before and after it. Since "third" is on the last line, you get the last two lines.
I like notnoop's answer, but building on AndrewY's answer (which is better for those without pcregrep, but way too complicated), you can just do:
RESULT=`grep -A1 -s -m1 '^\s*second\s*$' file | grep -s -B1 -m1 '^\s*third\s*$'`
grep -v '^first' filename
Where the -v flag inverts the match.
