Print lines after a pattern until second occurrence of a different pattern - bash

So I know how to print lines from one pattern to another pattern:
sed -ne '/pattern_1/,/pattern_2/ p'
Which works for input that looks like this:
random_line_1
pattern_1
random_line_2
random_line_3
random_line_4
random_line_5
pattern_2
random_line_6
So that lines from pattern_1 to pattern_2 get printed.
But how can I print lines until the second occurrence of the second pattern:
random_line_1
pattern_1
pattern_2
random_line_3
random_line_4
random_line_5
pattern_2
random_line_6
I want to print the lines from pattern_1 to the second pattern_2 so that I get this as output:
pattern_1
pattern_2
random_line_3
random_line_4
random_line_5
pattern_2
More specifically, I am trying to capture text, starting at a header, that is surrounded by empty lines, that may or may not have text before the header and after the second empty line (where pattern_1 is the header and pattern_2 is the empty line):
Header:
<empty line>
Some_text
Some_more_text
Even_more_text
When_will_it_stop
<empty line>
Preferably, a sed answer would work best since I know a little bit about how it works, but I would be open to awk submissions, as long as every piece of the command is explained.

I'm not at a machine on which to test, but you should be able to do something very simple to understand just with grep and its "context" switches (-A, -B and -C).
So to delete all lines before pattern1, simply find pattern1 and all lines after (-A):
grep -A 9999 "pattern1" YourFile
Then, in the result, search for the second occurrence (-m2) of pattern2 and everything before (-B):
grep -A 9999 "pattern1" YourFile | grep -B 9999 -m2 "pattern2"
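Since the answer above was written without a machine to test on, here is a small check against the sample input from the question. It is only a sketch: the temporary file and the variable names are made up for illustration, and the -A, -B and -m options are extensions (available in GNU and BSD grep) rather than part of POSIX grep.

```shell
# Sample input from the question, written to a temporary file.
input=$(mktemp)
printf '%s\n' random_line_1 pattern_1 pattern_2 random_line_3 \
    random_line_4 random_line_5 pattern_2 random_line_6 > "$input"

# First grep: keep pattern_1 and everything after it.
# Second grep: stop at the second pattern_2 and keep everything before it.
result=$(grep -A 9999 'pattern_1' "$input" | grep -B 9999 -m 2 'pattern_2')

printf '%s\n' "$result"
rm -f "$input"
```

Keep in mind that 9999 is just an upper bound on the number of context lines, so it has to be larger than the block you expect.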

A simpler sed example for your specific case:
sed -ne '/pattern_1/,/pattern_2/{/pattern_1/N;p}'
This says that, within the range, the line after the header pattern_1 is appended to the pattern space (N) and printed together with it. Because that line is consumed by N, it is never tested against the end of the range: if the line after pattern_1 is pattern_2, that occurrence of pattern_2 does not end the range.
In other words:
sed -ne '/Header/,/^$/{/Header/N;p}'
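To see the effect on the header case, here is a rough test run. The sample text and variable names are invented for the example; the header is anchored as ^Header: and a ; is placed before the closing } so that the script does not rely on GNU extensions. The second sed only makes the empty lines visible in the output.

```shell
# Hypothetical sample: text (including an empty line) before the header,
# and more text after the second empty line.
sample=$(mktemp)
printf '%s\n' 'intro text' '' 'Header:' '' 'Some_text' 'Some_more_text' \
    '' 'trailing text' > "$sample"

# Print from the header to the second empty line after it; the first
# empty line is swallowed by N, so it cannot end the range.
block=$(sed -n '/^Header:/,/^$/{/^Header:/N;p;}' "$sample" |
    sed 's/^$/<empty line>/')

printf '%s\n' "$block"
rm -f "$sample"
```

This only works because the first empty line directly follows the header; if other lines can come in between, a counting approach is needed instead.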

Could you please try following.
awk '/pattern_1/{a=1}
a<3 && a;
a && /pattern_2/{a++}
' Input_file
Adding code with explanation as follows too.
awk '/pattern_1/{a=1} ##Search for pattern_1 in the line; if it is present, set variable a to 1.
a<3 && a; ##If a is non-zero and less than 3, the condition is true; no action is given, so the default action (print the current line) is performed.
a && /pattern_2/{a++} ##Once pattern_1 has been seen (a is non-zero), search for pattern_2 in the line and increment a by 1 each time it is found. Without the a && guard, a pattern_2 before pattern_1 would start the printing too early.
' Input_file ##Mentioning the Input_file name here.
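The same counter idea can be adapted to the header and empty-line case from the question. This is a sketch under assumptions: the sample text and the variable name seen are made up, and empty lines are only counted once the header has been seen, so an empty line earlier in the file does not start the output.

```shell
sample=$(mktemp)
printf '%s\n' 'intro text' '' 'Header:' '' 'Some_text' 'Some_more_text' \
    '' 'trailing text' > "$sample"

block=$(awk '
    /^Header:/    { seen = 1 }   # start counting at the header
    seen && seen < 3             # no action given: print the line
    seen && /^$/  { seen++ }     # count empty lines after the header
' "$sample" | sed 's/^$/<empty line>/')   # make empty lines visible

printf '%s\n' "$block"
rm -f "$sample"
```

Because the count is checked before it is incremented, the second empty line is still printed and everything after it is skipped.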

This might work for you (GNU sed):
sed -n '/pattern_1/{:a;N;s/pattern_2/&/2p;Ta}' file
Starting from the line that matches pattern_1, gather up the following lines in the pattern space until the substitution of the second occurrence of pattern_2 succeeds, at which point the pattern space is printed.
For your particular example, use:
sed -n '/Header:/{:a;N;s/\n$/&/mp2;Ta}' file
N.B. the m flag makes ^ and $ match at line boundaries within a multi-line pattern space. The T command is the opposite of the t command: t branches to a label (here :a) when a substitution has succeeded since the last input line was read, whereas T branches when none has.

Related

Find the first empty line and print line number

I want to assign the first value of
grep -E --line-number --with-filename '^$' filename
to a variable. This command will return the line number of every empty line in my data file, which occur at the same interval as the first empty line like:
filename:122:
filename:244:
filename:366:
Is there a way to only return the line number of the first empty line - i.e. 122?
That'd be easier with AWK.
awk '! NF { print NR; exit }' file
You can limit the number of matches per file using -m.
Therefore, to generate a list of files and line numbers of their first empty lines use
grep -E --line-number --with-filename -m1 '^$' list of files
or equivalent but shorter
grep -EnHm1 '^$' list of files
A fairly concise sed one-liner:
sed '/./d;=;q' file
You can specify the -n option if the extra newline isn't desired.
Notes: The title and text of the question are not consistent. An empty line is not the same as a blank line (which may contain spaces or tabs). The sed command above prints the line number of the first empty line. If the line number of the first blank line is intended, the . should be replaced with [^[:blank:]].
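Since the question asks for the value in a variable, any of the commands above can be wrapped in a command substitution. A minimal sketch, with a made-up data file and variable name, using /^$/ so that only truly empty lines count:

```shell
# Hypothetical data file whose first empty line is line 3.
data=$(mktemp)
printf 'first\nsecond\n\nfourth\n\n' > "$data"

# Print the number of the first empty line and stop reading.
first_empty=$(awk '/^$/ { print NR; exit }' "$data")

echo "$first_empty"
rm -f "$data"
```

If the file has no empty line, the variable simply ends up empty, so it is worth checking that before using it.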

Sed doubts when n occurrences are used

I'm trying to replace the nth occurrence of a substring in a file. I tried to achieve this using sed but all attempts failed to give me the desired output. Some of the attempts are:
sed 's/old/new/g'
sed 's/old/new/3'
sed 's/old/new/3g'
The most common usage of sed is to perform a replacement such as
sed 's/foo/bar/' file
This will replace the first occurrence of the string foo by the string bar and it will do this for every line in file.
If you want to replace the 3rd occurrence of the string foo only, but do this for every line, then you can write:
sed 's/foo/bar/3' file
Finally, if you want to replace all occurrences, then you use:
sed 's/foo/bar/g' file
Any combination such as
sed 's/foo/bar/3g' file
results in unspecified behaviour according to POSIX; GNU sed replaces the 3rd match and all matches after it on each line.
If you want to replace the nth occurrence in a file, then sed is not the right tool; perl or awk might be better.
If you know you have at most one occurrence of "foo" per line, you can do (here with n=3):
awk -v n=3 '/foo/{c++}(c==n){sub("foo","bar")}1' file
If more than a single occurrence per line might appear it becomes a bit more tricky, various solutions are possible:
awk 'BEGIN{FS="foo";OFS="bar";n=5}
(c<n) && (c+NF-1>=n) {
for(i=1;i<NF;++i) printf "%s%s", $i, ((++c==n) ? OFS : FS); print $NF; next
}
{if(NF) c+=NF-1; print}' file
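Another way to handle several occurrences per line is to walk through each line with index() and substr(). The following is only a sketch: replace_nth is a made-up helper name, it treats OLD as a literal string rather than a regular expression, and it assumes OLD is non-empty and that neither string contains backslashes (awk -v interprets escape sequences).

```shell
# usage: replace_nth N OLD NEW < input
replace_nth() {
    [ -n "$2" ] || return 1    # an empty OLD string makes no sense here
    awk -v n="$1" -v old="$2" -v new="$3" '
    {
        out = ""; rest = $0
        # Consume the line one occurrence of old at a time.
        while ((pos = index(rest, old)) > 0) {
            count++            # running count over the whole file
            out = out substr(rest, 1, pos - 1) (count == n ? new : old)
            rest = substr(rest, pos + length(old))
        }
        print out rest
    }'
}

result=$(printf 'foo a foo\nb foo foo\n' | replace_nth 3 foo bar)
printf '%s\n' "$result"
```

Here the third foo in the file is the first one on the second line, so only that one is replaced.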

substitute a letter at a specific position in the file itself using bash

I am trying to do this:
I have a file with content like below;
file:
abcdefgh
I am looking for a way to do this;
file:
aBCdefgh
So, make the 2nd and 3rd letter "capital/uppercase" in the file itself, because I have to do multiple conversions at different positions in a string in the file. Can someone please help me to know how to do this?
I came to know something like this below, but it does only for a single first character of the string in the file:
sed -i 's/^./\U&/' file
output:
Abcdefgh
Thanks much!
Change your sed approach to the following:
sed -i 's/\(.\)\(..\)/\1\U\2/' file
$ cat file
aBCdefgh
matching section:
\(.\) - match the 1st char of the string into the 1st captured group
\(..\) - match the next 2 chars placing into the 2nd captured group
replacement section:
\1 - points to the 1st parenthesized group \1 i.e. the 1st char
\U\2 - uppercase the characters from the 2nd captured group \2
Bonus approach, in case you want to capitalize, say, the 105th and 106th characters:
sed -Ei 's/(.{104})(..)/\1\U\2/' file
awk on duty.
echo "abcdefgh" | awk '{print substr($0,1,1) toupper(substr($0,2,2)) substr($0,4)}'
Output will be as follows.
aBCdefgh
In case you have a Input_file and you want to save the edits into same Input_file.
awk '{print substr($0,1,1) toupper(substr($0,2,2)) substr($0,4)}' Input_file > temp_file && mv temp_file Input_file
Explanation: Please do not run the code below; it is for explanation purposes only.
echo "abcdefgh" ##using echo command to print a string on the standard output.
| ##Pipe(|) is used for passing a command's standard output as standard input to another command (in this case echo is passing its standard output to awk).
awk '{ ##Starting awk here.
##The print command in awk is used to print anything: variables, strings, etc.
##substr is awk's built-in function which allows us to get specific parts of a line or variable. Its syntax is substr(line/variable, starting position, number of characters needed from that starting position); if you do not give a number of characters, it takes everything from the starting position to the end of the line.
##toupper is also an awk built-in function, which converts any text passed to it to UPPER CASE; in this case I am passing the 2nd and 3rd characters to it as per OP's request.
print substr($0,1,1) toupper(substr($0,2,2)) substr($0,4)}'
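Because the question mentions several conversions at different positions, the substr/toupper idea can be wrapped in a small helper that takes the start position and length as parameters. This is just a sketch; upper_range and its parameters are hypothetical names, not anything awk provides.

```shell
# usage: upper_range START LENGTH < input
upper_range() {
    awk -v s="$1" -v len="$2" '{
        # text before the range, the range in upper case, text after it
        print substr($0, 1, s - 1) toupper(substr($0, s, len)) substr($0, s + len)
    }'
}

result=$(printf 'abcdefgh\n' | upper_range 2 2)
printf '%s\n' "$result"
```

As with the awk command above, redirect the output to a temporary file and move it over the original if the file itself has to be changed.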

How do I delete every line from a file matching the first line

I want to delete every line of a file that matches the first line, but not delete the first line.
I've tried this code so far, but it deletes all matching patterns including the first line.
sed -i "1,/$VARIABLE_CONTAINING_PATTERN/d" $MY_FILE.txt
Following your description literally - to delete every line of a file that matches the first line, but not delete the first line, awk solution:
Let's say we have the following myfile.txt:
my pattern
some text
another pattern
regex
awk sed
my pattern
text text
my pattern
our patterns
awk 'NR==1{ pat=$0; print }NR>1 && $0!~pat' myfile.txt > tmp && mv tmp myfile.txt
Final myfile.txt contents:
my pattern
some text
another pattern
regex
awk sed
text text
our patterns
Using awk. Solution depends a bit on your definition of a match:
$ cat file
1
2
3
1
12
$ awk 'NR==1{p=$0;print;next} $0 != p' file
1
2
3
12
$ awk 'NR==1{p=$0;print;next} $0 !~ p' file
1
2
3
Therefore you should provide proper sample data with the expected output.
This might work for you (GNU sed):
sed -i '1h;1!G;/^\(.*\)\n\1$/!P;d' file
Copy the first line into the hold space (HS). For every line except the first, append the HS to the pattern space (PS). Compare the current line to the first line and print the current line if it is not the same. Delete the pattern space.
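If "matches" means "is exactly the same text as", a plain string comparison avoids surprises when the first line contains regex characters such as a dot. A small sketch with invented sample data; appending "" to both sides is there to force a string comparison, since awk may otherwise compare numeric-looking lines as numbers.

```shell
sample=$(mktemp)
printf '%s\n' 'a.c' 'abc' 'a.c' 'x' > "$sample"

# Keep line 1, then drop every later line that is identical to it.
result=$(awk 'NR == 1 { first = $0; print; next }
              ($0 "") != (first "")' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

With a regex match ($0 !~ first) the line abc would be dropped as well, because the dot matches any character.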

How to get the part of a file after the first line that matches a regular expression

I have a file with about 1000 lines. I want the part of my file after the line which matches my grep statement.
That is:
cat file | grep 'TERMINATE' # It is found on line 534
So, I want the file from line 535 to line 1000 for further processing.
How can I do that?
The following will print the line matching TERMINATE till the end of the file:
sed -n -e '/TERMINATE/,$p'
Explained: -n disables sed's default behavior of printing each line after executing its script on it, -e indicates a script to sed, /TERMINATE/,$ is an address (line) range selection meaning from the first line matching the TERMINATE regular expression (like grep) to the end of the file ($), and p is the print command, which prints the current line.
This will print from the line that follows the line matching TERMINATE till the end of the file:
(from AFTER the matching line to EOF, NOT including the matching line)
sed -e '1,/TERMINATE/d'
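Explained: 1,/TERMINATE/ is an address (line) range selection meaning from the first line of the input to the first line matching the TERMINATE regular expression, and d is the delete command, which deletes the current line and skips to the next line. As sed's default behavior is to print the lines, it will print the lines after TERMINATE to the end of the input. Note that the end of the range is only searched for from line 2 onward, so this does not work as intended if TERMINATE is on the very first line; GNU sed accepts 0,/TERMINATE/ for that case.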
If you want the lines before TERMINATE:
sed -e '/TERMINATE/,$d'
And if you want both lines before and after TERMINATE in two different files in a single pass:
sed -e '1,/TERMINATE/w before
/TERMINATE/,$w after' file
The before and after files will both contain the line with TERMINATE, so to process each you need to use:
head -n -1 before
tail -n +2 after
If you do not want to hard code the filenames in the sed script, you can:
before=before.txt
after=after.txt
sed -e "1,/TERMINATE/w $before
/TERMINATE/,\$w $after" file
But then you have to escape the $ meaning the last line so the shell will not try to expand the $w variable (note that we now use double quotes around the script instead of single quotes).
Note that the newline after each filename in the script is important, so that sed knows where the filename ends.
How would you replace the hardcoded TERMINATE by a variable?
You would make a variable for the matching text and then do it the same way as the previous example:
matchtext=TERMINATE
before=before.txt
after=after.txt
sed -e "1,/$matchtext/w $before
/$matchtext/,\$w $after" file
To use a variable for the matching text with the previous examples:
## Print the line containing the matching text, till the end of the file:
## (from the matching line to EOF, including the matching line)
matchtext=TERMINATE
sed -n -e "/$matchtext/,\$p"
## Print from the line that follows the line containing the
## matching text, till the end of the file:
## (from AFTER the matching line to EOF, NOT including the matching line)
matchtext=TERMINATE
sed -e "1,/$matchtext/d"
## Print all the lines before the line containing the matching text:
## (from line-1 to BEFORE the matching line, NOT including the matching line)
matchtext=TERMINATE
sed -e "/$matchtext/,\$d"
The important points about replacing text with variables in these cases are:
Variables ($variablename) enclosed in single quotes ['] won't "expand" but variables inside double quotes ["] will. So, you have to change all the single quotes to double quotes if they contain text you want to replace with a variable.
The sed ranges also contain a $ and are immediately followed by a letter like: $p, $d, $w. They will also look like variables to be expanded, so you have to escape those $ characters with a backslash [\] like: \$p, \$d, \$w.
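If the quoting rules above feel fragile, one alternative is to hand the pattern to awk with -v, so the script itself can stay in single quotes and no $ needs escaping. This is a sketch, not a replacement for the sed commands: the file and variable names are made up, and awk -v interprets backslash escapes in the value, so patterns containing backslashes need extra care.

```shell
matchtext='TERMINATE'
log=$(mktemp)
printf '%s\n' one TERMINATE two three > "$log"

# Print everything after the first line matching the pattern;
# the flag is set after the print, so the matching line is excluded.
after=$(awk -v pat="$matchtext" 'found { print } $0 ~ pat { found = 1 }' "$log")

printf '%s\n' "$after"
rm -f "$log"
```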
As a simple approximation you could use
grep -A100000 TERMINATE file
which greps for TERMINATE and outputs up to 100,000 lines following that line.
From the man page:
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines.
Places a line containing a group separator (--) between
contiguous groups of matches. With the -o or --only-matching
option, this has no effect and a warning is given.
A tool to use here is AWK:
cat file | awk 'BEGIN{ found=0} /TERMINATE/{found=1} {if (found) print }'
How does this work:
We set the variable 'found' to zero, evaluating false
If a match for 'TERMINATE' is found with the regular expression, we set it to one.
If our 'found' variable evaluates to True, print :)
The other solutions might consume a lot of memory if you use them on very large files.
If I understand your question correctly you do want the lines after TERMINATE, not including the TERMINATE-line. AWK can do this in a simple way:
awk '{if(found) print} /TERMINATE/{found=1}' your_file
Explanation:
Although not best practice, you could rely on the fact that all variables default to 0 or the empty string if not defined. So the first expression (if(found) print) will not print anything to start off with.
After the printing is done, we check if this is the starter-line (that should not be included).
This will print all lines after the TERMINATE-line.
Generalization:
You have a file with start- and end-lines and you want the lines between those lines excluding the start- and end-lines.
start- and end-lines could be defined by a regular expression matching the line.
Example:
$ cat ex_file.txt
not this line
second line
START
A good line to include
And this line
Yep
END
Nope more
...
never ever
$ awk '/END/{found=0} {if(found) print} /START/{found=1}' ex_file.txt
A good line to include
And this line
Yep
$
Explanation:
If the end-line is found no printing should be done. Note that this check is done before the actual printing to exclude the end-line from the result.
Print the current line if found is set.
If the start-line is found then set found=1 so that the following lines are printed. Note that this check is done after the actual printing to exclude the start-line from the result.
Notes:
The code relies on the fact that all AWK variables default to 0 or the empty string if not defined. This is valid, but it may not be best practice, so you could add a BEGIN{found=0} to the start of the AWK expression.
If multiple start-end-blocks are found, they are all printed.
grep -A 10000000 'TERMINATE' file
is much, much faster than sed, especially when working on a really big file. It works up to 10M lines (or whatever you put in), so there isn't any harm in making this big enough to handle about anything you hit.
Use Bash parameter expansion like the following:
content=$(cat file)
echo "${content#*TERMINATE}"
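Note that the expansion above keeps whatever follows TERMINATE on the matching line itself. If the output should start on the next line, a second expansion can strip everything up to the first newline. A sketch with made-up content; it assumes TERMINATE is not on the last line, otherwise there is no newline left to strip.

```shell
content=$(printf '%s\n' 'line one' 'xx TERMINATE yy' 'line three' 'line four')

# Remove the shortest prefix ending in TERMINATE
# (the rest of the matching line is still there).
rest=${content#*TERMINATE}

# A literal newline in a variable, usable in any POSIX shell.
nl='
'
# Remove the remainder of the matching line, including its newline.
rest=${rest#*"$nl"}

printf '%s\n' "$rest"
```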
There are many ways to do it with sed or awk:
sed -n '/TERMINATE/,$p' file
This looks for TERMINATE in your file and prints from that line up to the end of the file.
awk '/TERMINATE/,0' file
This is exactly the same behaviour as sed.
In case you know the number of the line from which you want to start printing, you can specify it together with NR (number of record, which eventually indicates the number of the line):
awk 'NR>=535' file
Example
$ seq 10 > a #generate a file with one number per line, from 1 to 10
$ sed -n '/7/,$p' a
7
8
9
10
$ awk '/7/,0' a
7
8
9
10
$ awk 'NR>=7' a
7
8
9
10
If for any reason, you want to avoid using sed, the following will print the line matching TERMINATE till the end of the file:
tail -n "+$(grep -n 'TERMINATE' file | head -n 1 | cut -d ":" -f 1)" file
And the following will print from the line following the one matching TERMINATE till the end of the file:
tail -n "+$(($(grep -n 'TERMINATE' file | head -n 1 | cut -d ":" -f 1)+1))" file
It takes several processes to do what sed can do in one, and if the file changes between the execution of grep and tail, the result can be incoherent, so I recommend using sed. Moreover, if the file does not contain TERMINATE, the first command fails.
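The failure on a file without a match can be avoided by checking the line number before calling tail. A minimal sketch; print_from_match is a hypothetical helper name, and the sample file exists only for the demonstration.

```shell
# usage: print_from_match PATTERN FILE
print_from_match() {
    line=$(grep -n -- "$1" "$2" | head -n 1 | cut -d: -f1)
    [ -n "$line" ] || return 1    # no match: print nothing, report failure
    tail -n "+$line" "$2"
}

sample=$(mktemp)
printf '%s\n' a TERMINATE b > "$sample"

found=$(print_from_match TERMINATE "$sample")
if print_from_match ABSENT "$sample" > /dev/null; then
    status=found
else
    status=missing
fi

printf '%s\n' "$found" "$status"
rm -f "$sample"
```

The race between grep and tail mentioned above is still there; the helper only handles the missing-match case.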
Alternatives to the excellent sed answer by jfg956, and which don't include the matching line:
awk '/TERMINATE/ {y=1;next} y' (Hai Vu's answer to 'grep +A': print everything after a match)
awk '/TERMINATE/ ? c++ : c' (Steven Penny's answer to 'grep +A': print everything after a match)
perl -ne 'print unless 1 .. /TERMINATE/' (tchrist's answer to 'grep +A': print everything after a match)
This could be one way of doing it. If you know in what line of the file you have your grep word and how many lines you have in your file:
grep -A466 'TERMINATE' file
sed is a much better tool for the job:
sed -n '/re/,$p' file
where re is a regular expression.
Another option is grep's --after-context flag. You need to pass in a number of lines; the line count reported by wc for the file is a safe upper bound. Combine this with -n and your match expression.
This will print all lines from the last found line "TERMINATE" till the end of the file:
LINE_NUMBER=$(grep -n TERMINATE "$YOUR_FILE_NAME" | tail -n 1 | cut -d: -f1)
tail -n "+$LINE_NUMBER" "$YOUR_FILE_NAME"

Resources