bash grep newline

Hi, I need to extract from the file:
first
second
third
using the grep command, the following line:
second
third
What should the grep command look like?

Instead of grep, you can use pcregrep which supports multiline patterns
pcregrep -M 'second\nthird' file
-M allows the pattern to match more than one line.
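A quick check, assuming the three sample lines are saved as a file named file:
printf 'first\nsecond\nthird\n' > file
pcregrep -M 'second\nthird' file
# second
# third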

Your question title, "bash grep newline", implies that you want to match the second\nthird sequence of characters - i.e. something containing a newline within it.
Since grep works on lines, and these two are on different lines, you will not be able to match them this way.
So, I'd split it into several tasks:
you match the line that contains "second" and output the line that has matched and the subsequent line:
grep -A 1 "second" testfile
you translate every other newline into a sequence that is guaranteed not to occur in the input. I think the simplest way to do that would be with perl:
perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;'
you do a grep on these lines, this time searching for string ##UnUsedSequence##third:
grep "##UnUsedSequence##third"
you unwrap the unused sequences back into newlines; sed is probably the simplest:
sed -e 's/##UnUsedSequence##/\n/'
So the resulting pipe command to do what you want would look like:
grep -A 1 "second" testfile | perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;' | grep "##UnUsedSequence##third" | sed -e 's/##UnUsedSequence##/\n/'
Not the most elegant by far, but should work. I'm curious to know of better approaches, though - there should be some.
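A quick stage-by-stage check on the sample file (GNU sed assumed for the \n in the replacement):
printf 'first\nsecond\nthird\n' > testfile
grep -A 1 "second" testfile
# second
# third
grep -A 1 "second" testfile | perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;'
# second##UnUsedSequence##third
The final grep then matches that single joined line, and the sed at the end splits it back into the two original lines.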

I don't think grep is the way to go on this.
If you just want to strip the first line from any file (to generalize your question), I would use sed instead.
sed '1d' INPUT_FILE_NAME
This will send the contents of the file to standard output with the first line deleted.
Then you can redirect the standard output to another file to capture the results.
sed '1d' INPUT_FILE_NAME > OUTPUT_FILE_NAME
That should do it.
If you have to use grep and just don't want to display the line with first on it, then try this:
grep -v first INPUT_FILE_NAME
By passing the -v switch, you are telling grep to show you everything but the expression that you are passing. In effect show me everything but the line(s) with first in them.
However, the downside is that every line containing "first" is suppressed, so a file with multiple occurrences of "first" will lose those other lines too, which may not be the behavior you are expecting.
To shunt the results into a new file, try this:
grep -v first INPUT_FILE_NAME > OUTPUT_FILE_NAME
Hope this helps.

I don't really understand what you want to match. I would not use grep, but one of the following:
tail -2 file # to get the last two lines
tail -n +2 file # to get all but the first line
sed -e '2,3p;d' file # to get lines from second to third
(not sure how standard it is, it works in GNU tools for sure)
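A quick check of each against the sample file (GNU tools assumed):
printf 'first\nsecond\nthird\n' > file
tail -2 file           # second, third
tail -n +2 file        # second, third (everything after the first line)
sed -e '2,3p;d' file   # second, third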

So you just don't want the line containing "first"? -v inverts the grep results.
$ echo -e "first\nsecond\nthird\n" | grep -v first
second
third

Line? Or lines?
Try
grep -E -e '(second|third)' filename
Edit: grep is line-oriented. You're going to have to use Perl, sed, or awk to perform a pattern match across lines.
BTW, -E tells grep that the regexp is an extended RE.

grep -A1 "second" | grep -B1 "third" works nicely, and if you have multiple matches it will even get rid of the original -- match delimiter

grep -E '(second|third)' /path/to/file
egrep -w 'second|third' /path/to/file

you could use
$ grep -1 third filename
This will print the matching line plus one line of context before and after it. Since "third" is on the last line, you get the last two lines.
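For example, with the question's three lines saved as filename:
printf 'first\nsecond\nthird\n' > filename
grep -1 third filename
# second
# third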

I like notnoop's answer, but building on AndrewY's answer (which is better for those without pcregrep, but way too complicated), you can just do:
RESULT=`grep -A1 -s -m1 '^\s*second\s*$' file | grep -s -B1 -m1 '^\s*third\s*$'`
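Then, assuming the three-line sample is saved as file (GNU grep's \s assumed):
echo "$RESULT"
# second
# third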

grep -v '^first' filename
Where the -v flag inverts the match.


Grep multiple strings from text file

Okay, so I have a text file containing multiple strings, for example:
Hello123
Halo123
Gracias
Thank you
...
I want grep to use these strings to find lines with matching strings/keywords from other files within a directory
example of text files being grepped -
123-example-Halo123
321-example-Gracias-com-no
321-example-match
so in this instance the output should be
123-example-Halo123
321-example-Gracias-com-no
With GNU grep:
grep -f file1 file2
-f FILE: Obtain patterns from FILE, one per line.
Output:
123-example-Halo123
321-example-Gracias-com-no
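To run this against every file in a directory, as the question asks (a sketch; dir_to_search is a hypothetical name):
grep -f file1 dir_to_search/*     # all files directly inside the directory
grep -rf file1 dir_to_search      # or recurse into subdirectories (GNU grep)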
You should probably look at the manpage for grep to get a better understanding of what options are supported by the grep utility. However, there are a number of ways to achieve what you're trying to accomplish. Here's one approach:
grep -e "Hello123" -e "Halo123" -e "Gracias" -e "Thank you" list_of_files_to_search
However, since your search strings are already in a separate file, you would probably want to use this approach:
grep -f patternFile list_of_files_to_search
I can think of two possible solutions for your question:
Use multiple regular expressions - a regular expression for each word you want to find, for example:
grep -e Hello123 -e Halo123 file_to_search.txt
Use a single regular expression with an "or" operator. Using Perl regular expressions, it will look like the following:
grep -P "Hello123|Halo123" file_to_search.txt
EDIT:
As you mentioned in your comment, you want to use a list of words to find from a file and search in a full directory.
You can manipulate the words-to-find file to look like -e flags concatenation:
cat words_to_find.txt | sed 's/^/-e "/;s/$/"/' | tr '\n' ' '
This will return something like -e "Hello123" -e "Halo123" -e "Gracias" -e "Thank you", which you can then pass to grep using xargs:
cat words_to_find.txt | sed 's/^/-e "/;s/$/"/' | tr '\n' ' ' | xargs grep dir_to_search/*
As you can see, the last command also searches in all of the files in the directory.
SECOND EDIT: as PesaThe mentioned, the following command does this in a much simpler and more elegant way:
grep -f words_to_find.txt dir_to_search/*
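If the words in the file should be matched literally rather than as regular expressions, adding -F is worth considering (a sketch):
grep -Ff words_to_find.txt dir_to_search/*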

Grep/Sed/Awk Options

How could you grep or use sed or awk to parse for a dynamic length substring? Here are some examples:
I need to parse out everything except for the "XXXXX.WAV" in these strings, but the strings are not a set length.
Sometimes its like this:
{"filename": "/assets/JFM/imaging/19001.WAV"},
{"filename": "/assets/JFM/imaging/19307.WAV"},
{"filename": "/assets/JFM/imaging/19002.WAV"}
And sometimes like this:
{"filename": "/assets/JFM/LN_405999/101.WAV"},
{"filename": "/assets/JFM/LN_405999/102.WAV"},
{"filename": "/assets/JFM/LN_405999/103.WAV"}
Is there a good, dynamic way to parse out just the .WAV name? Maybe start at the "/" and parse until the closing double quote?
Edit:
Expected output like this:
19001.WAV
19307.WAV
19002.WAV
Or:
101.WAV
102.WAV
103.WAV
Just use grep as proposed in comments:
grep -o '[^/]\{1,\}\.WAV' yourfile
If the wav file always contains numbers, this seems more explicit (same result):
grep -o '[0-9]\{1,\}\.WAV'
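A quick check, assuming one line of each sample format is saved as yourfile:
printf '{"filename": "/assets/JFM/imaging/19001.WAV"},\n{"filename": "/assets/JFM/LN_405999/101.WAV"},\n' > yourfile
grep -o '[^/]\{1,\}\.WAV' yourfile    # 19001.WAV and 101.WAV
grep -o '[0-9]\{1,\}\.WAV' yourfile   # same output here, since both names are numeric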
Assuming there are [ and ] lines at the beginning and end of your file, it looks like your input is JSON, in which case I would recommend installing and using jq rather than text-based utilities, and doing something like this:
jq -r '.[]|.filename|split("/")[-1]'
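For instance, assuming the complete JSON array is saved as files.json (the file name here is just illustrative):
jq -r '.[]|.filename|split("/")[-1]' files.json
# 19001.WAV
# 19307.WAV
# 19002.WAV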
But failing that, any of the tools listed will work just fine.
grep -o '[^/]*\.WAV'
or
sed -ne 's,.*/\([^/]*\.WAV\).*$,\1,p'
or
awk -F'"' '/WAV/ {split($4,a,"/"); print a[length(a)]}'
In each case there are a variety of other possible solutions as well.
Or with sed
$ sed 's,.*/,,; s,".*,,' x
101.WAV
102.WAV
103.WAV
Explanation:
s,.*/,, - delete everything up to and including the rightmost /
s,".*,, - delete everything starting with the leftmost " to the end of the line
another awk
awk -F'[/"]' '{print $(NF-1)}' file
19001.WAV
19307.WAV
19002.WAV
Try this -
awk -F'[{":}/]' '{print $(NF-2)}' f
19001.WAV
19307.WAV
19002.WAV
OR
egrep -o '[[:digit:]]{5}.WAV' f
19001.WAV
19307.WAV
19002.WAV
OR
egrep -o '[[:digit:]]{5}.[[:alpha:]]{3}' f
19001.WAV
19307.WAV
19002.WAV
You can easily change the digit and character counts in the egrep patterns as needed for other inputs, but the awk command works fine for both cases as-is.
All of the programs you listed use regex to parse the names, so I will show you an example using grep, being probably the most basic one for this case.
There are a couple of options, depending on the exact way you define the XXX part before the ".wav".
Option 1, as you pointed out is just the file name, i.e., everything after the last slash:
grep -hoi "[^/]\+\.WAV"
This reads as "any character besides slash" ([^/]) repeated at least once (\+), followed by a literal .WAV (\.WAV).
Option 2 would be to only grab the digits before the extension:
grep -hoi "[[:digit:]]\+\.WAV"
OR
grep -hoi "[0-9]\+\.WAV"
These read as "digits" ([[:digit:]] and [0-9] mean the same thing) repeated at least once (\+), followed by a literal .WAV (\.WAV).
In all cases, I recommend using the flags -h, -o, -i, which I have concatenated into a single option -hoi. -h suppresses the file name from the output. -o makes grep only output the portion that matches. -i makes the match case insensitive, so should your extension ever change to .wav instead of .WAV, you'll be fine.
Also, in all cases, the input is up to you. You can pipe it in from another program, which will look like
program | grep -hoi "[^/]\+\.WAV"
You can get it from a file using stdin redirection:
grep -hoi "[^/]\+\.WAV" < somefile.txt
Or you can just pass the filename to grep:
grep -hoi "[^/]\+\.WAV" somefile.txt
awk -F/ '{print substr($5,1,7)}' file
101.WAV
102.WAV
103.WAV

sed bash substitution only if variable has a value

I'm trying to find a way using variables and sed to do a specific text substitution using a changing input file, but only if there is a value given to replace the existing string with. No value= do nothing (rather than remove the existing string).
Example
Substitute.csv contains 5 lines (line 3 is blank):
this-has-text
this-has-text

this-has-text
this-has-text
and file.txt has one sentence:
"When trying this I want to be sure that text-this-has is left alone."
If I run the following command in a shell script
Text='text-this-has'
Change=`sed -n '3p' substitute.csv`
grep -rl $Text /home/username/file.txt | xargs sed -i "s|$Text|$Change|"
I end up with
"When trying this I want to be sure that is left alone."
But I'd like it to remain as
"When trying this I want to be sure that text-this-has is left alone."
Any way to tell sed "If I give you nothing new, do nothing"?
I apologize for the overthinking - bad habit. Essentially, what I'd like to accomplish is: if line 3 of the csv file has a value, replace $Text with $Change in place. If the line is empty, leave $Text as $Text.
Text='text-this-has'
Change=$(sed -n '3p' substitute.csv)
if [[ -n $Change ]]; then
grep -rl $Text /home/username/file.txt | xargs sed -i "s|$Text|$Change|"
fi
Just keep it simple and use awk:
awk -v t="$Text" -v c="$Change" 'c!=""{sub(t,c)} {print}' file
If you need inplace editing just use GNU awk with -i inplace.
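For example, a sketch of the in-place form with GNU awk 4.1 or later, using the file path from the question:
gawk -i inplace -v t="$Text" -v c="$Change" 'c!=""{sub(t,c)} {print}' /home/username/file.txt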
Given your clarified requirement, this is probably what you actually want:
awk -v t="$Text" 'NR==FNR{if (NR==3) c=$0; next} c!=""{sub(t,c)} {print}' Substitute.csv file.txt
Testing whether $Change has a value before launching into the grep and sed is undoubtedly the most efficient bash solution, although I'm a bit skeptical about the duplication of grep and sed; it saves a temporary file in the case of files which don't contain the target string, but at the cost of an extra scan up to the match in the case of files which do contain it.
If you're looking for typing efficiency, though, the following might be interesting:
find . -name '*.txt' -exec sed -i "s|$Text|${Change:-&}|" {} \;
Which will recursively find all files whose names end with the extension .txt and execute the sed command on each one. ${Change:-&} means "the value of $Change if it is set and non-empty, otherwise an &"; & in the replacement of a sed s command means "the matched text", so s|foo|&| replaces every occurrence of foo with itself. That's an expensive no-op, but if your time matters more than the CPU's time, it might be worth it.
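To see that parameter expansion on its own, a quick illustration at the prompt:
Text='text-this-has'
Change='';    echo "s|$Text|${Change:-&}|"   # s|text-this-has|&|   (a no-op substitution)
Change='new'; echo "s|$Text|${Change:-&}|"   # s|text-this-has|new|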

awk/sed extract string from between patterns

I know there have probably been a few hundred forms of this question asked on Stack Overflow, but I can't seem to find a suitable answer to my question.
I'm trying to parse through the /etc/ldap.conf file on a Linux box so that I can specifically pick out the description fields from between (description= and ):
-bash-3.2$ grep '^nss_base_passwd' /etc/ldap.conf
nss_base_passwd ou=People,dc=ca,dc=somecompany,dc=com?one?|(description=TD_FI)(description=TD_F6)(description=TD_F6)(description=TRI_142)(description=14_142)(description=REX5)(description=REX5)(description=1950)
I'm looking to extract these into their own list with no duplicates:
TD_FI
TD_F6
TRI_142
14_142
REX5
1950
(or all on one line with a proper delimiter)
I had played with sed for a few hours but couldn't get it to work - I'm not entirely sure how to use the global option.
You could use grep with the -P option:
$ grep '^nss_base_passwd' /etc/ldap.conf | grep -oP '(?<=description\=)[^)]*' | uniq
TD_FI
TD_F6
TRI_142
14_142
REX5
1950
Explanation:
A positive lookbehind is used in grep to print all the characters that come just after description=, up to the next ) bracket. The uniq command is used to remove the duplicates.
perl -nE 'say join(",", /description=\K([^)]+)/g) if /^nss_base_passwd/' /etc/ldap.conf
TD_FI,TD_F6,TD_F6,TRI_142,14_142,REX5,REX5,1950
Try this:
grep '^nss_base_passwd' /etc/ldap.conf |
grep -oE '[(]description=[^)]*' | sort -u |
cut -f2- -d=
Explanations:
With bash, if you end a line with | (or || or &&), the shell knows that the command continues on the next line, so you don't need to use \.
The second grep uses the -o flag to indicate that the matching expressions should be printed out, one per line. It also uses the -E flag to indicate that the pattern is an "Extended" (i.e. normal) regular expression.
Since -o will print the entire match, we need to extract the part after the prefix, for which we use cut, specifying a delimiter of =. -f2- means "all the fields starting with the second field", which we need in case there is an = in the description.
Avinash's answer was very close. Here is my improved version:
grep '^nss_base_passwd' /etc/ldap.conf | grep -Po '\(description=\K[^)]+' | sort -u
There is no need to use lookaround syntax when you can simply use \K (which is actually a shortcut for a corresponding zero-width assertion).
Also, you said that you want NO duplicates, but uniq will only remove duplicate adjacent lines, it will not remove duplicates if there is something in between. That's why I am using sort -u instead.
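A quick illustration of the difference, using made-up input:
printf 'REX5\n1950\nREX5\n' | uniq      # REX5, 1950, REX5 - the non-adjacent duplicate survives
printf 'REX5\n1950\nREX5\n' | sort -u   # 1950, REX5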

How to grep, excluding some patterns?

I'd like to find lines in files with an occurrence of some pattern and an absence of some other pattern. For example, I need to find all files/lines including loom except ones with gloom. So, I can find loom with this command:
grep -n 'loom' ~/projects/**/trunk/src/**/*.#(h|cpp)
Now, I want to search loom excluding gloom. However, both of following commands failed:
grep -v 'gloom' -n 'loom' ~/projects/**/trunk/src/**/*.#(h|cpp)
grep -n 'loom' -v 'gloom' ~/projects/**/trunk/src/**/*.#(h|cpp)
What should I do to achieve my goal?
EDIT 1: I mean that loom and gloom are character sequences (not necessarily whole words). So I need, for example, bloomberg in the command output, but don't need ungloomy.
EDIT 2: There is sample of my expectations.
Both of following lines are in command output:
I faced the icons that loomed through the veil of incense.
Arty is slooming in a gloomy day.
Both of following lines aren't in command output:
It’s gloomyin’ ower terrible — great muckle doolders o’ cloods.
In the south west round of the heigh pyntit hall
How about just chaining the greps?
grep -n 'loom' ~/projects/**/trunk/src/**/*.#(h|cpp) | grep -v 'gloom'
Another solution without chaining grep:
egrep '(^|[^g])loom' ~/projects/**/trunk/src/**/*.#(h|cpp)
Inside the brackets, you exclude the character g before any occurrence of loom, unless loom is at the start of the line.
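A quick check against two of the sample sentences from the question:
printf 'Arty is slooming in a gloomy day.\nIt’s gloomyin’ ower terrible\n' | egrep '(^|[^g])loom'
# Arty is slooming in a gloomy day.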
A bit old, but oh well...
The most up-voted solution from @houbysoft will not work, as it excludes any line with "gloom" in it, even if it also has "loom". According to the OP's expectations, we need to include lines with "loom", even if they also have "gloom" in them. The line "Arty is slooming in a gloomy day." needs to be in the output, but it will be excluded by a chained grep like
grep -n 'loom' ~/projects/**/trunk/src/**/*.#(h|cpp) | grep -v 'gloom'
Instead, the egrep regex example of Bentoy13 works better
egrep '(^|[^g])loom' ~/projects/**/trunk/src/**/*.#(h|cpp)
as it will include any line with "loom" in it, regardless of whether or not it has "gloom". On the other hand, if it only has gloom, it will not include it, which is precisely the behaviour OP wants.
Just use awk, it's much simpler than grep in letting you clearly express compound conditions.
If you want to skip lines that contain both loom and gloom:
awk '/loom/ && !/gloom/{ print FILENAME, FNR, $0 }' ~/projects/**/trunk/src/**/*.#(h|cpp)
or if you want to print them:
awk '/(^|[^g])loom/{ print FILENAME, FNR, $0 }' ~/projects/**/trunk/src/**/*.#(h|cpp)
and if the reality is you just want lines where loom appears as a word by itself:
awk '/\<loom\>/{ print FILENAME, FNR, $0 }' ~/projects/**/trunk/src/**/*.#(h|cpp)
-v is the "inverted match" flag, so piping is a very good way:
grep "loom" ~/projects/**/trunk/src/**/*.#(h|cpp)| grep -v "gloom"
Simply use grep -v multiple times.
#Content of file
[root@server]# cat file
1
2
3
4
5
#Exclude the line or match
[root@server]# cat file | grep -v 3
1
2
4
5
#Exclude the line or match multiple
[root#server]# cat file |grep -v "3\|5"
1
2
4
You might be looking for something like this?
grep -vn "gloom" `grep -l "loom" ~/projects/**/trunk/src/**/*.#(h|cpp)`
The backquotes are used like brackets for commands: with -l enabled, the command inside the backquotes returns the matching file names, and -vn then gives you what you wanted: filenames, line numbers, and the actual lines.
UPDATE Or with xargs:
grep -l "loom" ~/projects/**/trunk/src/**/*.#(h|cpp) | xargs grep -vn "gloom"
Please ignore what I've written above, it's rubbish: it prints the lines that don't contain "gloom" rather than the lines that do contain "loom". Use this instead:
grep -n "loom" `grep -l "loom" tt4.txt` | grep -v "gloom"
The backquoted part gets the filenames that contain "loom"; the outer grep -n then gets the lines with "loom", along with the filename and line number; and the final grep -v drops the lines that also contain "gloom".
You can use grep -P (perl regex) supported negative lookbehind:
grep -P '(?<!g)loom\b' ~/projects/**/trunk/src/**/*.#(h|cpp)
I added \b for word boundaries.
Question: search for 'loom' excluding 'gloom'.
Answer:
grep -w 'loom' ~/projects/**/trunk/src/**/*.#(h|cpp)
