Grep penultimate line - bash

Like the title says, how can I filter with grep (or similar bash tool) the line-before-the-last-line of a (variable length) file?
That is, show everything EXCEPT the penultimate line.
Thanks

You can use a combination of head and tail like this for example:
$ cat input
one
two
three
HIDDEN
four
$ head -n -2 input ; tail -n 1 input
one
two
three
four
From the coreutils head documentation:
‘-n k’
‘--lines=k’
Output the first k lines. However, if k starts with a ‘-’, print all but the last k lines of each file. Size multiplier suffixes are the same as with the -c option.
So the head -n -2 part prints everything except the last two lines of its input, and tail -n 1 then adds the final line back.
This is unfortunately not portable. (POSIX does not allow negative values in the -n parameter.)

grep is the wrong tool for this. You can wing it with something like
# Get line count
count=$(wc -l <file)
# Subtract one
penultimate=$(expr $count - 1)
# Delete that line, i.e. print all other lines.
# This doesn't modify the file, just prints
# the requested lines to standard output.
sed "${penultimate}d" file
Bash has built-in arithmetic operators which are more elegant than expr; but expr is portable to other shells.
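A minimal sketch of the same line-count approach using shell arithmetic instead of expr. The function name drop_penultimate_by_count is made up for illustration, and the guard for files with fewer than two lines is an addition that is not part of the answer above (sed rejects the address 0).

```shell
# Hypothetical helper: print a file without its second-to-last line.
drop_penultimate_by_count() {
  file=$1
  # $(( ... )) also strips the padding some wc implementations add.
  count=$(( $(wc -l < "$file") ))
  if [ "$count" -lt 2 ]; then
    cat "$file"                     # no penultimate line to drop
  else
    sed "$((count - 1))d" "$file"   # delete line count-1, print the rest
  fi
}
```

Usage would look like drop_penultimate_by_count input, which leaves the file itself untouched.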
You could also do this in pure sed but I don't want to think about it. In Perl or awk, it would be easy to print the previous line and then at EOF print the final line.
Edit: I thought about sed after all.
sed -n '$!x;1!p' file
In more detail; unless we are at the last line ($), exchange the pattern space and the hold space (remember the current line; retrieve the previous line, if any). Then, unless this is the first line, print whatever is now in the pattern space (the previous line, except when we are on the last line).
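The awk variant mentioned above (print the previous line, then print the final line at EOF) could be sketched as follows. The function name and the variable names prev1 and prev2 are assumptions for illustration. It reads standard input once and keeps only two lines in memory.

```shell
# Hypothetical helper: copy stdin to stdout, skipping the penultimate line.
drop_penultimate() {
  awk '
    NR > 2 { print prev2 }            # line NR-2 can no longer be the penultimate
    { prev2 = prev1; prev1 = $0 }     # slide the two-line window
    END { if (NR > 0) print prev1 }   # the last line is always printed
  '
}

printf '%s\n' one two three HIDDEN four | drop_penultimate
```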

awk oneliner: (test with seq 10):
kent$ seq 10|awk '{a[NR]=$0}END{for(i=1;i<=NR;i++)if(i!=NR-1)print a[i]}'
1
2
3
4
5
6
7
8
10

Using ed:
printf '%s\n' H '$-1d' wq | ed -s file # in-place file edit
printf '%s\n' H '$-1d' ',p' Q | ed -s file # write to stdout, file left unchanged

Related

Using variables in sed -n x,yp to print lines from line number x to line number y

I have a file "testfile" with 10 lines. I want to print lines 3 (lower) to 6(upper) out of these lines. So I use cat testfile | sed -n 3,6p to print the lines. Now if I calculate the upper limit to be displayed based on some calculation and the result is saved in a variable say "y". Assume y=6, how can I use the same sed command now??
sed -n 3,$yp doesn't work as yp is considered like a variable here. How do I distinguish between $y and p here.
With curly braces:
sed -n 3,${y}p FILE
and the useless use of cat can be avoided, too. :)
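When both limits are computed, the same brace syntax applies to each of them. A small sketch, assuming a made-up helper name print_range; the trailing q command is an addition that stops sed once the upper line has been printed.

```shell
# Hypothetical helper: print lines x..y of a file, then stop reading.
print_range() {
  x=$1 y=$2 file=$3
  # Double quotes let the shell expand ${x} and ${y}; the braces keep
  # the sed commands p and q from being read as part of the names.
  sed -n "${x},${y}p;${y}q" "$file"
}
```

For the example above, print_range 3 6 testfile prints lines 3 to 6.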

remove n lines from STDOUT on bash

Do you have any bash solution to remove N lines from stdout?
like a 'head' command, print all lines, only except last N
Simple solution in bash:
find ./test_dir/ | sed '$d' | sed '$d' | sed '$d' | ...
but I need to copy the sed command N times
Any better solution?
except awk, python etc...
Use head with a negative number. In my example it will print all lines but last 3:
head -n -3 infile
if head -n -3 filename doesn't work on your system (like mine), you could also try the following approach (and maybe alias it or create a function in your .bashrc)
head -`echo "$(wc -l filename)" | awk '{ print $1 - 3; }'` filename
Where filename and 3 above are your file and number of lines respectively.
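The same line-count calculation can be written with shell arithmetic only. This is a sketch under the assumption that the input is a regular file (it is read twice); the name drop_last_lines is hypothetical.

```shell
# Hypothetical helper: print a file without its last n lines.
drop_last_lines() {
  n=$1 file=$2
  keep=$(( $(wc -l < "$file") - n ))   # number of lines that should survive
  if [ "$keep" -gt 0 ]; then
    head -n "$keep" "$file"
  fi                                   # otherwise nothing is left to print
}
```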
The tail command can also count from the start of a file, including on Mac OS / BSD. With a + prefix, tail starts its output at the given line number, so the expression below prints the file from line 3 onwards (skipping the first 2 lines):
tail -n +3 filename.ext
With a - prefix, tail counts from the end of the file, so this prints only the last 3 lines. Note that it does not skip them:
tail -n -3 filename.ext
The default for tail is the - prefix, thus counting from the end of the file. See a similar answer to a different question here: Print a file skipping first X lines in Bash

Can I grep only the first n lines of a file?

I have very long log files, is it possible to ask grep to only search the first 10 lines?
The magic of pipes;
head -10 log.txt | grep <whatever>
For folks who find this on Google, I needed to search the first n lines of multiple files, but to only print the matching filenames. I used
gawk 'FNR>10 {nextfile} /pattern/ { print FILENAME ; nextfile }' filenames
The FNR..nextfile stops processing a file once 10 lines have been seen. The //..{} prints the filename and moves on whenever the first match in a given file shows up. To quote the filenames for the benefit of other programs, use
gawk 'FNR>10 {nextfile} /pattern/ { print "\"" FILENAME "\"" ; nextfile }' filenames
Or use awk for a single process without |:
awk '/your_regexp/ && NR < 11' INPUTFILE
On each line, if your_regexp matches, and the number of records (lines) is less than 11, it executes the default action (which is printing the input line).
Or use sed:
sed -n '/your_regexp/p;10q' INPUTFILE
Checks your regexp and prints the line (-n means don't print the input, which is otherwise the default), and quits right after the 10th line.
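A variant of the awk answer that stops reading after the window, which matters for very long logs. This is only a sketch: the helper name grep_first_lines is made up, and because the pattern is passed with -v, awk processes backslash escapes in it.

```shell
# Hypothetical helper: search only the first n lines of a file.
grep_first_lines() {
  n=$1 pattern=$2 file=$3
  awk -v n="$n" -v re="$pattern" '
    NR > n { exit }   # stop reading once the first n lines are done
    $0 ~ re           # default action: print the matching line
  ' "$file"
}
```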
You have a few options using programs along with grep. The simplest in my opinion is to use head:
head -n10 filename | grep ...
head will output the first 10 lines (using the -n option), and then you can pipe that output to grep.
grep "pattern" <(head -n 10 filename)
head -10 log.txt | grep -A 2 -B 2 pattern_to_search
-A 2: print two lines after each matching line.
-B 2: print two lines before each matching line.
head -10 log.txt # read the first 10 lines of the file.
You can use the following line:
head -n 10 /path/to/file | grep [...]
The output of head -10 file can be piped to grep in order to accomplish this:
head -10 file | grep …
Using Perl:
perl -ne 'last if $. > 10; print if /pattern/' file
An extension to Joachim Isaksson's answer: Quite often I need something from the middle of a long file, e.g. lines 5001 to 5020, in which case you can combine head with tail:
head -5020 file.txt | tail -20 | grep x
This gets the first 5020 lines, then shows only the last 20 of those, then pipes everything to grep.
(Edited: fencepost error in my example numbers, added pipe to grep)
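The fencepost arithmetic for the head and tail combination can be kept in one place. A minimal sketch, assuming a hypothetical helper called line_window that reads standard input:

```shell
# Hypothetical helper: pass only lines first..last of stdin through.
line_window() {
  first=$1 last=$2
  # Keep the first `last` lines, then the final (last - first + 1) of those.
  head -n "$last" | tail -n "$(( last - first + 1 ))"
}

# Lines 5001 to 5020, as in the example above:
# line_window 5001 5020 < file.txt | grep x
```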
grep -A 10 <Pattern>
This is to grab the pattern and the next 10 lines after the pattern. This would work well only for a known pattern, if you don't have a known pattern use the "head" suggestions.
grep -m6 "string" cov.txt
This stops after the first 6 matching lines. It does not limit the search to the first 6 lines of the file.
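Note that -m counts matching lines, not input lines. The difference shows up on a small made-up sample (the sample function exists only for this illustration):

```shell
# Four sample lines; "hit" occurs on lines 1, 3 and 4.
sample() { printf '%s\n' hit miss hit hit; }

sample | grep -m 2 hit          # stops after the 2nd matching line (lines 1 and 3)
sample | head -n 2 | grep hit   # searches only the first 2 lines (line 1)
```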

How to get the part of a file after the first line that matches a regular expression

I have a file with about 1000 lines. I want the part of my file after the line which matches my grep statement.
That is:
cat file | grep 'TERMINATE' # It is found on line 534
So, I want the file from line 535 to line 1000 for further processing.
How can I do that?
The following will print the line matching TERMINATE till the end of the file:
sed -n -e '/TERMINATE/,$p'
Explained: -n disables the default behavior of sed of printing each line after executing its script on it, -e indicates a script to sed, /TERMINATE/,$ is an address (line) range selection meaning from the first line matching the TERMINATE regular expression (like grep) to the end of the file ($), and p is the print command, which prints the current line.
This will print from the line that follows the line matching TERMINATE till the end of the file:
(from AFTER the matching line to EOF, NOT including the matching line)
sed -e '1,/TERMINATE/d'
Explained: 1,/TERMINATE/ is an address (line) range selection meaning from the first line of the input to the first line matching the TERMINATE regular expression, and d is the delete command, which deletes the current line and skips to the next line. As the default behavior of sed is to print the lines, it will print the lines after TERMINATE to the end of the input.
If you want the lines before TERMINATE:
sed -e '/TERMINATE/,$d'
And if you want both lines before and after TERMINATE in two different files in a single pass:
sed -e '1,/TERMINATE/w before
/TERMINATE/,$w after' file
The before and after files will contain the line with terminate, so to process each you need to use:
head -n -1 before
tail -n +2 after
If you do not want to hard code the filenames in the sed script, you can:
before=before.txt
after=after.txt
sed -e "1,/TERMINATE/w $before
/TERMINATE/,\$w $after" file
But then you have to escape the $ meaning the last line so the shell will not try to expand the $w variable (note that we now use double quotes around the script instead of single quotes).
I forgot to mention that the newline after each filename in the script is important, so that sed knows where the filename ends.
How would you replace the hardcoded TERMINATE by a variable?
You would make a variable for the matching text and then do it the same way as the previous example:
matchtext=TERMINATE
before=before.txt
after=after.txt
sed -e "1,/$matchtext/w $before
/$matchtext/,\$w $after" file
to use a variable for the matching text with the previous examples:
## Print the line containing the matching text, till the end of the file:
## (from the matching line to EOF, including the matching line)
matchtext=TERMINATE
sed -n -e "/$matchtext/,\$p"
## Print from the line that follows the line containing the
## matching text, till the end of the file:
## (from AFTER the matching line to EOF, NOT including the matching line)
matchtext=TERMINATE
sed -e "1,/$matchtext/d"
## Print all the lines before the line containing the matching text:
## (from line-1 to BEFORE the matching line, NOT including the matching line)
matchtext=TERMINATE
sed -e "/$matchtext/,\$d"
The important points about replacing text with variables in these cases are:
Variables ($variablename) enclosed in single quotes ['] won't "expand" but variables inside double quotes ["] will. So, you have to change all the single quotes to double quotes if they contain text you want to replace with a variable.
The sed ranges also contain a $ and are immediately followed by a letter like: $p, $d, $w. They will also look like variables to be expanded, so you have to escape those $ characters with a backslash [\] like: \$p, \$d, \$w.
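One more point when the matching text comes from a variable: characters such as / or . in it are interpreted by sed. The sketch below escapes them first. It is an assumption-laden illustration, not part of the answer above: the helper name from_literal_match is made up, and the escaping covers only basic regular expression characters and the / delimiter.

```shell
# Hypothetical helper: print from the first line containing the literal
# text $1 to the end of the file $2.
from_literal_match() {
  # Backslash-escape ] [ \ . * ^ $ and the / used as the address delimiter.
  # The s command uses | as its own delimiter so that / needs no escaping here.
  escaped=$(printf '%s\n' "$1" | sed 's|[][\.*^$/]|\\&|g')
  sed -n "/$escaped/,\$p" "$2"
}
```

For example, from_literal_match 'x.y/z' file matches only a line that really contains x.y/z.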
As a simple approximation you could use
grep -A100000 TERMINATE file
which greps for TERMINATE and outputs up to 100,000 lines following that line.
From the man page:
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines.
Places a line containing a group separator (--) between
contiguous groups of matches. With the -o or --only-matching
option, this has no effect and a warning is given.
A tool to use here is AWK:
cat file | awk 'BEGIN{ found=0} /TERMINATE/{found=1} {if (found) print }'
How does this work:
We set the variable 'found' to zero, evaluating false
if a match for 'TERMINATE' is found with the regular expression, we set it to one.
If our 'found' variable evaluates to True, print :)
The other solutions might consume a lot of memory if you use them on very large files.
If I understand your question correctly you do want the lines after TERMINATE, not including the TERMINATE-line. AWK can do this in a simple way:
awk '{if(found) print} /TERMINATE/{found=1}' your_file
Explanation:
Although not best practice, you can rely on the fact that all variables default to 0 or the empty string if not defined. So the first expression (if(found) print) will not print anything to start off with.
After the printing is done, we check if this is the starter-line (that should not be included).
This will print all lines after the TERMINATE-line.
Generalization:
You have a file with start- and end-lines and you want the lines between those lines excluding the start- and end-lines.
start- and end-lines could be defined by a regular expression matching the line.
Example:
$ cat ex_file.txt
not this line
second line
START
A good line to include
And this line
Yep
END
Nope more
...
never ever
$ awk '/END/{found=0} {if(found) print} /START/{found=1}' ex_file.txt
A good line to include
And this line
Yep
$
Explanation:
If the end-line is found no printing should be done. Note that this check is done before the actual printing to exclude the end-line from the result.
Print the current line if found is set.
If the start-line is found then set found=1 so that the following lines are printed. Note that this check is done after the actual printing to exclude the start-line from the result.
Notes:
The code relies on the fact that all AWK variables default to 0 or the empty string if not defined. This is valid, but it may not be best practice, so you could add a BEGIN{found=0} to the start of the AWK expression.
If multiple start-end-blocks are found, they are all printed.
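The generalization can be parameterized by passing the two regular expressions in with -v. A minimal sketch: the helper name between and the variable names start_re and end_re are made up, and it includes the explicit BEGIN initialisation mentioned in the notes.

```shell
# Hypothetical helper: print the lines strictly between a start line and
# an end line. Arguments: start regex, end regex, file.
between() {
  awk -v start_re="$1" -v end_re="$2" '
    BEGIN         { found = 0 }   # explicit initialisation
    $0 ~ end_re   { found = 0 }   # checked first, so the end line is not printed
    found                         # print while inside a block
    $0 ~ start_re { found = 1 }   # checked last, so the start line is not printed
  ' "$3"
}
```

With the example file above, between START END ex_file.txt prints the three lines between the markers.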
grep -A 10000000 'TERMINATE' file
is much, much faster than sed, especially when working on a really big file. It works up to 10M lines (or whatever you put in), so there isn't any harm in making this big enough to handle about anything you hit.
Use Bash parameter expansion like the following:
content=$(cat file)
echo "${content#*TERMINATE}"
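The expansion above keeps whatever follows TERMINATE on the matching line itself. A second expansion can drop the rest of that line. This is a sketch only: the helper name after_terminate is hypothetical, it holds the whole file in memory, and it assumes TERMINATE occurs and is followed by at least one more line.

```shell
# Hypothetical helper: print what follows the first line containing TERMINATE.
after_terminate() {
  content=$(cat "$1")          # whole file in memory
  rest=${content#*TERMINATE}   # drop everything up to the first TERMINATE
  nl='
'
  rest=${rest#*"$nl"}          # drop the remainder of the matching line
  printf '%s\n' "$rest"
}
```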
There are many ways to do it with sed or awk:
sed -n '/TERMINATE/,$p' file
This looks for TERMINATE in your file and prints from that line up to the end of the file.
awk '/TERMINATE/,0' file
This is exactly the same behaviour as sed.
In case you know the number of the line from which you want to start printing, you can specify it together with NR (number of record, which eventually indicates the number of the line):
awk 'NR>=535' file
Example
$ seq 10 > a #generate a file with one number per line, from 1 to 10
$ sed -n '/7/,$p' a
7
8
9
10
$ awk '/7/,0' a
7
8
9
10
$ awk 'NR>=7' a
7
8
9
10
If for any reason, you want to avoid using sed, the following will print the line matching TERMINATE till the end of the file:
tail -n "+$(grep -n 'TERMINATE' file | head -n 1 | cut -d ":" -f 1)" file
And the following will print from the line after the first line matching TERMINATE till the end of the file:
tail -n "+$(($(grep -n 'TERMINATE' file | head -n 1 | cut -d ":" -f 1)+1))" file
It takes several processes to do what sed can do in one process, and if the file changes between the execution of grep and tail, the result can be incoherent, so I recommend using sed. Moreover, if the file does not contain TERMINATE, the command substitution is empty and the commands fail.
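If the tail approach is used anyway, the missing-match case can be checked explicitly. A minimal sketch with a made-up helper name, after_first_match, that returns a non-zero status when nothing matches:

```shell
# Hypothetical helper: print the lines after the first match,
# or fail without output if the pattern does not occur.
after_first_match() {
  pattern=$1 file=$2
  n=$(grep -n -e "$pattern" "$file" | head -n 1 | cut -d: -f1)
  if [ -z "$n" ]; then
    return 1                      # no match: signal failure
  fi
  tail -n "+$((n + 1))" "$file"   # start one line past the match
}
```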
Alternatives to the excellent sed answer by jfg956, and which don't include the matching line:
awk '/TERMINATE/ {y=1;next} y' (Hai Vu's answer to 'grep +A': print everything after a match)
awk '/TERMINATE/ ? c++ : c' (Steven Penny's answer to 'grep +A': print everything after a match)
perl -ne 'print unless 1 .. /TERMINATE/' (tchrist's answer to 'grep +A': print everything after a match)
This could be one way of doing it. If you know in what line of the file you have your grep word and how many lines you have in your file:
grep -A466 'TERMINATE' file
sed is a much better tool for the job:
sed -n '/re/,$p' file
where re is a regular expression.
Another option is grep's --after-context flag. You need to pass in a number to end at, using wc on the file should give the right value to stop at. Combine this with -n and your match expression.
This will print all lines from the last found line "TERMINATE" till the end of the file:
LINE_NUMBER=$(grep -n TERMINATE "$YOUR_FILE_NAME" | tail -n 1 | cut -d: -f1)
tail -n "+$LINE_NUMBER" "$YOUR_FILE_NAME"

How to ignore all lines before a match occurs in bash?

I would like to ignore all lines which occur before a match in bash (also ignoring the matched line). Example of input could be
R1-01.sql
R1-02.sql
R1-03.sql
R1-04.sql
R2-01.sql
R2-02.sql
R2-03.sql
and if I match R2-01.sql in this already sorted input I would like to get
R2-02.sql
R2-03.sql
Many ways possible. For example: assuming that your input is in list.txt
PATTERN="R2-01.sql"
sed "0,/$PATTERN/d" <list.txt
Because 0,/pattern/ works only in GNU sed (e.g. it doesn't work on OS X), here is a workaround. ;)
PATTERN="R2-01.sql"
(echo "dummy-line-to-the-start" ; cat - ) < list.txt | sed "1,/$PATTERN/d"
This adds one dummy line to the start, so the real pattern can be on line 2 at the earliest and 1,/pattern/ will work: it deletes everything from line 1 (the dummy one) up to and including the pattern.
Or you can print lines after the pattern and delete the 1st, like:
sed -n '/pattern/,$p' < list.txt | sed '1d'
with awk, e.g.:
awk '/pattern/,0{if (!/pattern/)print}' < list.txt
or, my favorite, use the next perl command:
perl -ne 'print unless 1../pattern/' < list.txt
This deletes only the first line when the pattern is on the first line.
another solution is reverse-delete-reverse
tail -r < list.txt | sed '/pattern/,$d' | tail -r
If you have the tac command, use it instead of tail -r. The interesting thing is that /pattern/,$d works when the pattern is on the last line, but 1,/pattern/d doesn't when it is on the first.
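The first-line case can be reproduced with a small made-up input (the list function below exists only for this illustration):

```shell
# The pattern is on line 1 and nowhere else.
list() { printf '%s\n' R2-01.sql R2-02.sql R2-03.sql; }

# With 1,/re/ the search for the end of the range starts on line 2,
# so nothing matches and every line is deleted.
list | sed '1,/R2-01/d'

# A flag-based awk prints once an earlier line has matched,
# so it handles a match on line 1 as well.
list | awk 'seen; /R2-01/ { seen = 1 }'
```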
How to ignore all lines before a match occurs in bash?
The question headline and your example don't quite match up.
Print all lines from "R2-01.sql" in sed:
sed -n '/R2-01.sql/,$p' input_file.txt
Where:
-n suppresses printing the pattern space to stdout
/ starts and ends the pattern to match (regular expression)
, separates the start of the range from the end
$ addresses the last line in the input
p echoes the pattern space in that range to stdout
input_file.txt is the input file
Print all lines after "R2-01.sql" in sed:
sed '1,/R2-01.sql/d' input_file.txt
1 addresses the first line of the input
, separates the start of the range from the end
/ starts and ends the pattern to match (regular expression)
d deletes the pattern space in that range
input_file.txt is the input file
Everything not deleted is echoed to stdout.
This is a little hacky, but it's easy to remember for quickly getting the output you need:
$ grep -A99999 $match $file
Obviously you need to pick a value for -A that's large enough to match all contents; if you use a too-small value the output will be silently truncated.
To ensure you get all output you can do:
$ grep -A $(wc -l < "$file") -e "$match" "$file"
Of course at that point you might be better off with the sed solutions, since they don't require two reads of the file.
And if you don't want the matching line itself, you can simply pipe this command into tail -n +2 to skip the first line of output.
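Put together, the grep approach might look like the sketch below. The helper name after_match is an assumption, and -A is an extension found in GNU and BSD grep rather than a POSIX option.

```shell
# Hypothetical helper: print the lines that follow the first matching line.
after_match() {
  match=$1 file=$2
  total=$(( $(wc -l < "$file") ))   # enough trailing context for the whole file
  # -e protects patterns that start with a hyphen; +2 skips the matching line.
  grep -A "$total" -e "$match" "$file" | tail -n +2
}
```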
awk -v pattern=R2-01.sql '
print_it {print}
$0 ~ pattern {print_it = 1}
'
You can do it with this, but I think jomo666's answer is better.
sed -nr '/R2-01.sql/,${/R2-01/d;p}' <<END
R1-01.sql
R1-02.sql
R1-03.sql
R1-04.sql
R2-01.sql
R2-02.sql
R2-03.sql
END
Perl is another option:
perl -ne 'if ($f){print} elsif (/R2-01\.sql/){$f++}' sql
To pass in the regex as an argument, use -s to enable a simple argument parser
perl -sne 'if ($f){print} elsif (/$r/){$f++}' -- -r=R2-01\\.sql file
This can be accomplished with grep, by printing a large enough context following the $match. This example will output the first matching line followed by 999,999 lines of "context".
grep -A999999 $match $file
For added safety (in case the $match begins with a hyphen, say) you should use -e to force $match to be used as an expression.
grep -A999999 -e "$match" "$file"
