How to ignore all lines before a match occurs in bash? - bash

I would like to ignore all lines which occur before a match in bash (also ignoring the matched line). An example of input could be
R1-01.sql
R1-02.sql
R1-03.sql
R1-04.sql
R2-01.sql
R2-02.sql
R2-03.sql
and if I match R2-01.sql in this already sorted input I would like to get
R2-02.sql
R2-03.sql

There are many ways to do this. For example, assuming that your input is in list.txt:
PATTERN="R2-01.sql"
sed "0,/$PATTERN/d" <list.txt
Because 0,/pattern/ works only in GNU sed (e.g. it doesn't work on OS X), here is a workaround. ;)
PATTERN="R2-01.sql"
(echo "dummy-line-to-the-start" ; cat - ) < list.txt | sed "1,/$PATTERN/d"
This adds one dummy line to the start, so the real pattern must be on line 1 or higher, and therefore 1,/pattern/ will work - deleting everything from line 1 (the dummy one) up to and including the pattern.
Or you can print the lines from the pattern onward and delete the first line of the output, like:
sed -n '/pattern/,$p' < list.txt | sed '1d'
with awk, e.g.:
awk '/pattern/,0{if (!/pattern/)print}' < list.txt
or, my favorite, the following perl command:
perl -ne 'print unless 1../pattern/' < list.txt
which correctly deletes only the 1st line when the pattern is on the 1st line...
Another solution is reverse-delete-reverse:
tail -r < list.txt | sed '/pattern/,$d' | tail -r
If you have the tac command, use it instead of tail -r. The interesting thing is that /pattern/,$d works when the pattern is on the last line, but 1,/pattern/d doesn't when the pattern is on the first.
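For example, with the sample list from the question saved as list.txt and tac available (GNU coreutils), this might look like:
$ tac list.txt | sed '/R2-01.sql/,$d' | tac
R2-02.sql
R2-03.sql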

How to ignore all lines before a match occurs in bash?
The question headline and your example don't quite match up.
Print all lines from "R2-01.sql" in sed:
sed -n '/R2-01.sql/,$p' input_file.txt
Where:
-n suppresses printing the pattern space to stdout
/ starts and ends the pattern to match (regular expression)
, separates the start of the range from the end
$ addresses the last line in the input
p echoes the pattern space in that range to stdout
input_file.txt is the input file
Print all lines after "R2-01.sql" in sed:
sed '1,/R2-01.sql/d' input_file.txt
1 addresses the first line of the input
, separates the start of the range from the end
/ starts and ends the pattern to match (regular expression)
d deletes the pattern space in that range
input_file.txt is the input file
Everything not deleted is echoed to stdout.
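For instance, with the sample input from the question saved as input_file.txt, the two commands might behave as follows:
$ sed -n '/R2-01.sql/,$p' input_file.txt
R2-01.sql
R2-02.sql
R2-03.sql
$ sed '1,/R2-01.sql/d' input_file.txt
R2-02.sql
R2-03.sql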

This is a little hacky, but it's easy to remember for quickly getting the output you need:
$ grep -A99999 $match $file
Obviously you need to pick a value for -A that's large enough to match all contents; if you use a too-small value the output will be silently truncated.
To ensure you get all output you can do:
$ grep -A$(wc -l < $file) $match $file
Of course at that point you might be better off with the sed solutions, since they don't require two reads of the file.
And if you don't want the matching line itself, you can simply pipe this command into tail -n +2 to skip the first line of output.
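For example, on the question's sample data (file name assumed to be list.txt):
$ grep -A99999 'R2-01.sql' list.txt | tail -n +2
R2-02.sql
R2-03.sql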

awk -v pattern=R2-01.sql '
print_it {print}
$0 ~ pattern {print_it = 1}
'
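The script reads standard input and only starts printing on the line after the first match, so on the sample list (file name assumed) a one-line form of it might be used like this:
$ awk -v pattern='R2-01.sql' 'print_it {print} $0 ~ pattern {print_it = 1}' list.txt
R2-02.sql
R2-03.sql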

You can do it with this, but I think jomo666's answer is better.
sed -nr '/R2-01.sql/,${/R2-01/d;p}' <<END
R1-01.sql
R1-02.sql
R1-03.sql
R1-04.sql
R2-01.sql
R2-02.sql
R2-03.sql
END

Perl is another option:
perl -ne 'if ($f){print} elsif (/R2-01\.sql/){$f++}' sql
To pass in the regex as an argument, use -s to enable a simple argument parser
perl -sne 'if ($f){print} elsif (/$r/){$f++}' -- -r=R2-01\\.sql file

This can be accomplished with grep, by printing a large enough context following the $match. This example will output the first matching line followed by 999,999 lines of "context".
grep -A999999 $match $file
For added safety (in case the $match begins with a hyphen, say) you should use -e to force $match to be used as an expression.
grep -A999999 -e "$match" "$file"

Related

How to properly validate a part of the output of a command in BASH [duplicate]

Given a file, for example:
potato: 1234
apple: 5678
potato: 5432
grape: 4567
banana: 5432
sushi: 56789
I'd like to grep for all lines that start with potato: but only pipe the numbers that follow potato:. So in the above example, the output would be:
1234
5432
How can I do that?
grep 'potato:' file.txt | sed 's/^.*: //'
grep looks for any line that contains the string potato:, then, for each of these lines, sed replaces (s/// - substitute) any characters (.*) from the beginning of the line (^) up to the last occurrence of the sequence ": " (colon followed by space) with the empty string (s/...// - substitute the first part with the second part, which is empty).
or
grep 'potato:' file.txt | cut -d' ' -f2
For each line that contains potato:, cut will split the line into multiple fields delimited by a space (-d' ' - d = delimiter, ' ' = the space character as the delimiter; something like -d" " would also have worked) and print the second field of each such line (-f2).
or
grep 'potato:' file.txt | awk '{print $2}'
For each line that contains potato:, awk will print the second field (print $2) which is delimited by default by spaces.
or
grep 'potato:' file.txt | perl -e 'for(<>){s/^.*: //;print}'
All lines that contain potato: are sent to an inline (-e) Perl script that takes all lines from stdin, then, for each of these lines, does the same substitution as in the first example above, then prints it.
or
awk '{if(/potato:/) print $2}' < file.txt
The file is sent via stdin (< file.txt sends the contents of the file via stdin to the command on the left) to an awk script that, for each line that contains potato: (if(/potato:/) returns true if the regular expression /potato:/ matches the current line), prints the second field, as described above.
or
perl -e 'for(<>){/potato:/ && s/^.*: // && print}' < file.txt
The file is sent via stdin (< file.txt, see above) to a Perl script that works similarly to the one above, but this time it also makes sure each line contains the string potato: (/potato:/ is a regular expression that matches if the current line contains potato:, and, if it does (&&), then proceeds to apply the regular expression described above and prints the result).
Or use regex assertions: grep -oP '(?<=potato: ).*' file.txt
grep -Po 'potato:\s\K.*' file
-P to use Perl regular expression
-o to output only the match
\s to match the space after potato:
\K to omit the match
.* to match rest of the string(s)
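Assuming GNU grep with PCRE support and the sample data in a file named file, this might give:
$ grep -Po 'potato:\s\K.*' file
1234
5432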
sed -n 's/^potato:[[:space:]]*//p' file.txt
One can think of Grep as a restricted Sed, or of Sed as a generalized Grep. In this case, Sed is one good, lightweight tool that does what you want -- though, of course, there exist several other reasonable ways to do it, too.
This will print everything after each match, on that same line only:
perl -lne 'print $1 if /^potato:\s*(.*)/' file.txt
This will do the same, except it will also print all subsequent lines:
perl -lne 'if ($found){print} elsif (/^potato:\s*(.*)/){print $1; $found++}' file.txt
These command-line options are used:
-n loop around each line of the input file
-l removes newlines before processing, and adds them back in afterwards
-e execute the perl code
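With the question's sample data in file.txt, the first of these might print:
$ perl -lne 'print $1 if /^potato:\s*(.*)/' file.txt
1234
5432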
You can use grep, as the other answers state. But you don't need grep, awk, sed, perl, cut, or any external tool. You can do it with pure bash.
Try this (semicolons are there to allow you to put it all on one line):
$ while read line;
do
if [[ "${line%%:\ *}" == "potato" ]];
then
echo ${line##*:\ };
fi;
done< file.txt
## tells bash to delete the longest match of ": " in $line from the front.
$ while read line; do echo ${line##*:\ }; done< file.txt
1234
5678
5432
4567
5432
56789
or if you wanted the key rather than the value, %% tells bash to delete the longest match of ": " in $line from the end.
$ while read line; do echo ${line%%:\ *}; done< file.txt
potato
apple
potato
grape
banana
sushi
The substring to split on is ":\ " because the space character must be escaped with the backslash.
You can find more like these at the linux documentation project.
Modern BASH has support for regular expressions:
while read -r line; do
if [[ $line =~ ^potato:\ ([0-9]+) ]]; then
echo "${BASH_REMATCH[1]}"
fi
done
grep potato file | grep -o "[0-9].*"

bash scripting: Can I get sed to output the original line, then a space, then the modified line?

I'm new to Unix in all its forms, so please go easy on me!
I have a bash script that will pipe an ls command with arbitrary filenames into sed, which will use an arbitrary replacement pattern on the files, and then this will be piped into awk for some processing. The catch is, awk needs to know both the original file name and the new one.
I've managed everything except getting the original file names into awk. For instance, let's say my files are test.* and my replacement pattern is 's:es:ar:', which would change every occurrence of "test" to "tart". For testing purposes I'm just using awk to print what it's receiving:
ls "$@" | sed "$pattern" | awk '{printf "0: %s\n1: %s\n2: %s\n", $0,$1,$2}'
where test.* is in $@ and the pattern is stored in $pattern.
Clearly, this doesn't get me to where I want to be. The output is obviously
0: tart.c
1: tart.c
2:
If I could get sed to output "test.c tart.c", then I'd have two parameters for awk. I've played around with the pattern to no avail, even hardcoding "test.c" into the replacement. But of course that just gave me amateur results like "ttest.c art.c". Is it possible for sed to remember the input, then work it into the beginning of the output? Do I even have the right ideas? Thanks in advance!
Two ways to change the first t into a b in the duplicated field.
Duplicate the line (& replays the matched part), change the first word, and swap the two words (remembering two strings with a space in between):
echo test.c | sed -r 's/.*/& &/;s/t/b/;s/([^ ]*) (.*)/\2 \1/'
or with more magic (copy the original value to the hold buffer, make the change, swap the buffers so the original comes first, append the changed copy from the hold buffer, and replace the newline between them with a space):
echo test.c | sed 'h;s/t/b/;x;G;s/\n/ /'
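With GNU sed, either variant should print the original string followed by the modified one, e.g.:
$ echo test.c | sed 'h;s/t/b/;x;G;s/\n/ /'
test.c best.c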
Use Perl instead of sed:
echo test.c | perl -lne 'print "$_ ", s/es/ar/r'
-l removes the newline from input and adds it after each print. The /r modifier to the substitution returns the modified string instead of changing the variable (Perl 5.14+ needed).
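Running it on the question's example might give:
$ echo test.c | perl -lne 'print "$_ ", s/es/ar/r'
test.c tart.c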
Old answer, not working for s/t/b/2 or s/.*/replaced/2:
You can duplicate the contents of the line with s/.*/& &/, then just tell sed that it should only apply the second substitution (this works at least in GNU sed):
echo test.c | sed 's/.*/& &/; s/es/ar/2'
$ echo 'foo' | awk '{old=$0; gsub(/o/,"e"); print old, $0}'
foo fee

how to grep everything between single quotes?

I am having trouble figuring out how to grep the characters between two single quotes.
I have this in a file
version: '8.x-1.0-alpha1'
and I would like to have the output like this (the version numbers can vary):
8.x-1.0-alpha1
I wrote the following but it does not work:
cat myfile.txt | grep -e 'version' | sed 's/.*\?'\(.*?\)'.*//g'
Thank you for your help.
Addition:
I used the sed command sed -n "s#version:\s*'\(.*\)'#\1#p"
I would also like to remove 8.x-, so I edited it to sed -n "s#version:\s*'8.x-\(.*\)'#\1#p".
This command only works on Linux and does not work on macOS. How can I change it to make it work on macOS?
sed -n "s#version:\s*'8.x-\(.*\)'#\1#p"
If you just want to have that information from the file, and only that you can quickly do:
awk -F"'" '/version/{print $2}' file
Example:
$ echo "version: '8.x-1.0-alpha1'" | awk -F"'" '/version/{print $2}'
8.x-1.0-alpha1
How does this work?
An awk program is a series of pattern-action pairs, written as:
condition { action }
condition { action }
...
where condition is typically an expression and action a series of commands.
-F "'": Here we tell awk to define the field separator FS to be a <single quote> '. This means the all lines will be split in fields $1, $2, ... ,$NF and between each field there is a '. We can now reference these fields by using $1 for the first field, $2 for the second ... etc and this till $NF where NF is the total number of fields per line.
/version/{print $2}: This is the condition-action pair.
condition: /version/:: The condition reads: If a substring in the current record/line matches the regular expression /version/ then do action. Here, this is simply translated as if the current line contains a substring version
action: {print $2}:: If the previous condition is satisfied, then print the second field. In this case, the second field would be what the OP requests.
There are now several things that can be done.
Improve the condition to be /^version:/ && NF==3, which reads: if the current line starts with the substring version: and the current line has 3 fields, then do action.
If you only want the first occurrence, you can tell awk to exit immediately after the match by updating the action to {print $2; exit} (see the sketch below).
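A sketch of that first-occurrence-only variant (same assumed file) might be:
$ awk -F"'" '/version/{print $2; exit}' file
8.x-1.0-alpha1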
I'd use GNU grep with pcre regexes:
grep -oP "version: '\\K.*(?=')" file
where we are looking for "version: '" and then the \K directive will forget what it just saw, leaving .*(?=') to match up to the last single quote.
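For instance (GNU grep with PCRE support assumed):
$ echo "version: '8.x-1.0-alpha1'" | grep -oP "version: '\K.*(?=')"
8.x-1.0-alpha1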
Try something like this: sed -n "s#version:\s*'\(.*\)'#\1#p" myfile.txt. This avoids the redundant cat and grep by finding the "version" line and extracting the contents between the single quotes.
Explanation:
the -n flag tells sed not to print lines automatically. We then use the p command at the end of our sed pattern to explicitly print when we've found the version line.
Search for pattern: version:\s*'\(.*\)'
version:\s* Match "version:" followed by any amount of whitespace
'\(.*\)' Match a single ', then capture everything until the next '
Replace with: \1; This is the first (and only) capture group above, containing contents between single quotes.
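Regarding the asker's follow-up about macOS: BSD sed does not understand \s, so a more portable variant of the same idea would be to swap in the POSIX class [[:space:]], e.g. (with the 8.x- prefix removal the asker wanted):
$ echo "version: '8.x-1.0-alpha1'" | sed -n "s#version:[[:space:]]*'8.x-\(.*\)'#\1#p"
1.0-alpha1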
When you only want what is between the quotes, you can use cut:
grep -e 'version' myfile.txt | cut -d "'" -f2
grep can almost do this alone:
grep -o "'.*'" file.txt
But this may also print lines you don't want: it will print all lines with two single quotes (') in them. And the output still has the single quotes (') around it:
'8.x-1.0-alpha1'
But sed alone can do it properly:
sed -rn "s/^version: +'([^']+)'.*/\1/p" file.txt

Delete all lines beginning with a # from a file

All of the lines with comments in a file begin with #. How can I delete all of the lines (and only those lines) which begin with #? Other lines containing #, but not at the beginning of the line should be ignored.
This can be done with a sed one-liner:
sed '/^#/d'
This says, "find all lines that start with # and delete them, leaving everything else."
I'm a little surprised nobody has suggested the most obvious solution:
grep -v '^#' filename
This solves the problem as stated.
But note that a common convention is for everything from a # to the end of a line to be treated as a comment:
sed 's/#.*$//' filename
though that treats, for example, a # character within a string literal as the beginning of a comment (which may or may not be relevant for your case) (and it leaves empty lines).
A line starting with arbitrary whitespace followed by # might also be treated as a comment:
grep -v '^ *#' filename
if whitespace is only spaces, or
grep -v '^[ 	]*#' filename
where the two characters inside the brackets are a space followed by a literal tab character (type "control-v tab").
For all these commands, omit the filename argument to read from standard input (e.g., as part of a pipe).
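For instance, the whitespace-tolerant form might behave like this when reading a pipe:
$ printf '   # indented comment\n# comment\nvalue=1\n' | grep -v '^ *#'
value=1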
The opposite of Raymond's solution:
sed -n '/^#/!p'
"don't print anything, except for lines that DON'T start with #"
you can directly edit your file with
sed -i '/^#/ d'
If you want also delete comment lines that start with some whitespace use
sed -i '/^\s*#/ d'
Usually, you want to keep the first line of your script if it is a shebang, so sed should not delete lines starting with #!. It should also delete lines that contain only a hash and no text. Putting it all together:
sed -i '/^\s*\(#[^!].*\|#$\)/d'
To be compatible with all sed variants you need to add a backup extension to the -i option:
sed -i.bak '/^\s*#/ d' $file
rm -Rf $file.bak
You can use the following for an awk solution -
awk '/^#/ {sub(/#.*/,"");getline;}1' inputfile
This answer builds upon the earlier answer by Keith.
egrep -v "^[[:blank:]]*#" should filter out comment lines.
egrep -v "^[[:blank:]]*(#|$)" should filter out both comments and empty lines, as is frequently useful.
For information about [:blank:] and other character classes, refer to https://en.wikipedia.org/wiki/Regular_expression#Character_classes.
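A quick sanity check (input assumed) might look like this:
$ printf '\t# tab-indented comment\n\n   # space-indented comment\nreal line\n' | egrep -v "^[[:blank:]]*(#|$)"
real line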
If you want to delete from the file starting with a specific word, then do this:
grep -v '^pattern' currentFileName > newFileName && mv newFileName currentFileName
So we remove all the lines starting with the pattern, write the remaining content into a new file, and then move the new file back over the source/current file.
You might want to remove empty lines as well:
sed -E '/(^$|^#)/d' inputfile
Delete all empty lines and also all lines starting with a # after any spaces:
sed -E '/^$|^\s*#/d' inputfile
For example, the following 3 lines would be deleted (line numbers shown only for illustration):
1. # first comment
2.
3. # second comment
After testing the command above, you can use option -i to edit the input file in place.
Just this!
Here it is with a loop over all files with a given extension:
ls -ltr *.filename_extension > list.lst
for i in $(cat list.lst | awk '{ print $8 }') # check that the filename really is in the 8th column of the ls -l output
do
echo $i
sed -i '/^#/d' $i
done

How to get the part of a file after the first line that matches a regular expression

I have a file with about 1000 lines. I want the part of my file after the line which matches my grep statement.
That is:
cat file | grep 'TERMINATE' # It is found on line 534
So, I want the file from line 535 to line 1000 for further processing.
How can I do that?
The following will print the line matching TERMINATE till the end of the file:
sed -n -e '/TERMINATE/,$p'
Explained: -n disables sed's default behavior of printing each line after executing its script on it, -e indicates a script to sed, /TERMINATE/,$ is an address (line) range selection meaning from the first line matching the TERMINATE regular expression (like grep) to the end of the file ($), and p is the print command which prints the current line.
This will print from the line that follows the line matching TERMINATE till the end of the file:
(from AFTER the matching line to EOF, NOT including the matching line)
sed -e '1,/TERMINATE/d'
Explained: 1,/TERMINATE/ is an address (line) range selection meaning from the first line of the input to the first line matching the TERMINATE regular expression, and d is the delete command which deletes the current line and skips to the next one. As sed's default behavior is to print the lines, it will print the lines after TERMINATE to the end of the input.
If you want the lines before TERMINATE:
sed -e '/TERMINATE/,$d'
And if you want both lines before and after TERMINATE in two different files in a single pass:
sed -e '1,/TERMINATE/w before
/TERMINATE/,$w after' file
The before and after files will contain the line with TERMINATE, so to process each you need to use:
head -n -1 before
tail -n +2 after
If you do not want to hard-code the filenames in the sed script, you can use:
before=before.txt
after=after.txt
sed -e "1,/TERMINATE/w $before
/TERMINATE/,\$w $after" file
But then you have to escape the $ that means the last line, so the shell will not try to expand it as a $w variable (note that we now use double quotes around the script instead of single quotes).
Note that the newline after each filename in the script is important so that sed knows where the filenames end.
How would you replace the hardcoded TERMINATE with a variable?
You would make a variable for the matching text and then do it the same way as the previous example:
matchtext=TERMINATE
before=before.txt
after=after.txt
sed -e "1,/$matchtext/w $before
/$matchtext/,\$w $after" file
to use a variable for the matching text with the previous examples:
## Print the line containing the matching text, till the end of the file:
## (from the matching line to EOF, including the matching line)
matchtext=TERMINATE
sed -n -e "/$matchtext/,\$p"
## Print from the line that follows the line containing the
## matching text, till the end of the file:
## (from AFTER the matching line to EOF, NOT including the matching line)
matchtext=TERMINATE
sed -e "1,/$matchtext/d"
## Print all the lines before the line containing the matching text:
## (from line-1 to BEFORE the matching line, NOT including the matching line)
matchtext=TERMINATE
sed -e "/$matchtext/,\$d"
The important points about replacing text with variables in these cases are:
Variables ($variablename) enclosed in single quotes ['] won't "expand" but variables inside double quotes ["] will. So, you have to change all the single quotes to double quotes if they contain text you want to replace with a variable.
The sed ranges also contain a $ immediately followed by a letter, like $p, $d, $w. These will also look like variables to be expanded, so you have to escape those $ characters with a backslash [\], like \$p, \$d, \$w (see the example below).
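A minimal check of the escaping point (sample input assumed):
$ matchtext=TERMINATE
$ printf 'line 1\nTERMINATE\nline 3\nline 4\n' | sed -e "1,/$matchtext/d"
line 3
line 4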
As a simple approximation you could use
grep -A100000 TERMINATE file
which greps for TERMINATE and outputs up to 100,000 lines following that line.
From the man page:
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines.
Places a line containing a group separator (--) between
contiguous groups of matches. With the -o or --only-matching
option, this has no effect and a warning is given.
A tool to use here is AWK:
cat file | awk 'BEGIN{ found=0} /TERMINATE/{found=1} {if (found) print }'
How does this work:
We set the variable 'found' to zero, which evaluates to false
if a match for 'TERMINATE' is found with the regular expression, we set it to one.
If our 'found' variable evaluates to True, print :)
The other solutions might consume a lot of memory if you use them on very large files.
If I understand your question correctly you do want the lines after TERMINATE, not including the TERMINATE-line. AWK can do this in a simple way:
awk '{if(found) print} /TERMINATE/{found=1}' your_file
Explanation:
Although it is not best practice, you could rely on the fact that all variables default to 0 or the empty string if not defined. So the first expression (if(found) print) will not print anything to start with.
After the printing is done, we check if this is the starter-line (that should not be included).
This will print all lines after the TERMINATE-line.
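A quick check with assumed sample input:
$ printf 'one\nTERMINATE\ntwo\nthree\n' | awk '{if(found) print} /TERMINATE/{found=1}'
two
three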
Generalization:
You have a file with start- and end-lines and you want the lines between those lines excluding the start- and end-lines.
start- and end-lines could be defined by a regular expression matching the line.
Example:
$ cat ex_file.txt
not this line
second line
START
A good line to include
And this line
Yep
END
Nope more
...
never ever
$ awk '/END/{found=0} {if(found) print} /START/{found=1}' ex_file.txt
A good line to include
And this line
Yep
$
Explanation:
If the end-line is found no printing should be done. Note that this check is done before the actual printing to exclude the end-line from the result.
Print the current line if found is set.
If the start-line is found then set found=1 so that the following lines are printed. Note that this check is done after the actual printing to exclude the start-line from the result.
Notes:
The code relies on the fact that all AWK variables default to 0 or the empty string if not defined. This is valid, but it may not be best practice, so you could add a BEGIN{found=0} to the start of the AWK expression.
If multiple start-end-blocks are found, they are all printed.
grep -A 10000000 'TERMINATE' file
is much, much faster than sed, especially when working on a really big file. It works for up to 10M lines after the match (or whatever number you put in), so there isn't any harm in making this big enough to handle just about anything you hit.
Use Bash parameter expansion like the following:
content=$(cat file)
echo "${content#*TERMINATE}"
There are many ways to do it with sed or awk:
sed -n '/TERMINATE/,$p' file
This looks for TERMINATE in your file and prints from that line up to the end of the file.
awk '/TERMINATE/,0' file
This is exactly the same behaviour as sed.
In case you know the number of the line from which you want to start printing, you can specify it together with NR (number of record, which eventually indicates the number of the line):
awk 'NR>=535' file
Example
$ seq 10 > a #generate a file with one number per line, from 1 to 10
$ sed -n '/7/,$p' a
7
8
9
10
$ awk '/7/,0' a
7
8
9
10
$ awk 'NR>=7' a
7
8
9
10
If for any reason, you want to avoid using sed, the following will print the line matching TERMINATE till the end of the file:
tail -n "+$(grep -n 'TERMINATE' file | head -n 1 | cut -d ":" -f 1)" file
And the following will print from the following line matching TERMINATE till the end of the file:
tail -n "+$(($(grep -n 'TERMINATE' file | head -n 1 | cut -d ":" -f 1)+1))" file
It takes two processes to do what sed can do in one process, and if the file changes between the execution of grep and tail, the result can be incoherent, so I recommend using sed. Moreover, if the file doesn't contain TERMINATE, the first command fails.
Alternatives to the excellent sed answer by jfg956, and which don't include the matching line:
awk '/TERMINATE/ {y=1;next} y' (Hai Vu's answer to 'grep +A': print everything after a match)
awk '/TERMINATE/ ? c++ : c' (Steven Penny's answer to 'grep +A': print everything after a match)
perl -ne 'print unless 1 .. /TERMINATE/' (tchrist's answer to 'grep +A': print everything after a match)
This could be one way of doing it, if you know on what line of the file your grep word is and how many lines the file has:
grep -A466 'TERMINATE' file
sed is a much better tool for the job:
sed -n '/re/,$p' file
where re is a regular expression.
Another option is grep's --after-context flag. You need to pass in a number of lines to end at; using wc -l on the file should give the right value to stop at. Combine this with -n and your match expression.
This will print all lines from the last found line "TERMINATE" till the end of the file:
LINE_NUMBER=`grep -o -n TERMINATE $YOUR_FILE_NAME | tail -n 1 | sed "s/:/ \\'/g" | awk -F" " '{print $1}'`
tail -n +$LINE_NUMBER $YOUR_FILE_NAME
