How to start from the last line with tail? - bash

I have a huge log file. I need to find something in it and print the last matching line, like this:
tail -n +1 "$log" | awk '$9 ~ "'$something'" {print $0}' | tail -n1
But when I execute this command, tail starts from the first line and reads every line, so it runs for a few minutes.
How can I start reading from the last line and stop when I find a match? Then I might not need to read all the lines, and it could run in just a few seconds, because I only need the last line about $something.

Note you are saying tail -n +1 "$log", which is interpreted by tail as: start reading from line 1. So you are in fact doing cat "$log".
You probably want to say tail -n 1 "$log" (without the + before 1) to get the last n lines.
Also, if you want to get the last match of $something, you may want to use tac. This prints a file backwards: first the last line, then the penultimate... and finally the first one.
So if you do
tac "$log" | grep -m1 "$something"
this will print the last match of $something and then exit, because -mX prints the first X matches.
Or of course you can use awk as well:
tac "$log" | awk -v pattern="$something" '$9 ~ pattern {print; exit}'
Note the use of -v to pass the shell variable to awk. This way you avoid a confusing mixture of single and double quotes in your code.
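For example, with the variable set beforehand (the value below is made up), the whole pipeline reads:
something="ERROR"
tac "$log" | awk -v pattern="$something" '$9 ~ pattern {print; exit}'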

tac "$FILE" | grep -m 1 "$SOMETHING"
tac: the reverse of cat :-)
grep -m: search and stop on first occurrence

Instead of tail, use tac. It reads the file in reverse, so you can exit as soon as the first match (the last one in file order) is found:
tac "$log" | awk '$9 ~ "'$something'" {print $0;exit}'

tail -1000 takes only the last 1000 lines from your file.
You could grep that part, but you wouldn't know if the thing you grep for occurred in the earlier lines. There's no way to grep "backwards".
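If you usually expect the match near the end, a hedged sketch is to try a tail window first and fall back to a full reverse scan (the window size of 1000 is an arbitrary guess):
match=$(tail -n 1000 "$log" | grep "$something" | tail -n 1)
# fall back to scanning the whole file in reverse if the window had no match
[ -n "$match" ] || match=$(tac "$log" | grep -m1 "$something")
printf '%s\n' "$match"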

Related

Printing first and last match in file

Is there a cleaner solution to the following?
grep INFO messages | head -1
grep INFO messages | tail -1
The pattern (INFO) and the file (messages) here are arbitrary.
Try:
grep INFO messages | sed -n '1p;$p'
grep - will search for pattern from messages file
sed -n '1p;$p' - will print first (1p) and last($p) line
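A quick check with throwaway input:
printf 'INFO a\nx\nINFO b\nINFO c\n' > messages
grep INFO messages | sed -n '1p;$p'
This prints INFO a and INFO c. (If there is exactly one match, it is printed twice, since it is both the first and the last line.)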
You can use -m to establish how many matches you want:
For the first:
grep -m1 "INFO" messages
For the last, let's print the file backwards with tac and then use the same logic:
tac messages | grep -m1 "INFO"
This way, you avoid processing the whole file twice: you will just process it until a match is found.
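Putting the two together, a minimal sketch that prints the first and the last match, never scanning the file end-to-end more than necessary:
grep -m1 "INFO" messages        # first match: stops at the first hit
tac messages | grep -m1 "INFO"  # last match: reads backwards, stops at the first hit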
From man grep:
-m NUM, --max-count=NUM
Stop reading a file after NUM matching lines. If the input is
standard input from a regular file, and NUM matching lines are
output, grep ensures that the standard input is positioned to just
after the last matching line before exiting, regardless of the
presence of trailing context lines. This enables a calling process to
resume a search. When grep stops after NUM matching lines, it
outputs any trailing context lines. When the -c or --count option is
also used, grep does not output a count greater than NUM. When the -v
or --invert-match option is also used, grep stops after outputting
NUM non-matching lines.
man tac:
tac - concatenate and print files in reverse
This may be what you want:
awk '/INFO/{f[++c]=$0} END{ if (c>0) print f[1] ORS f[c] }' messages
or:
awk '/INFO/{f[++c]=$0} END{ if (c>0) print f[1]; if (c>1) print f[c] }' messages
but without sample input and expected output it's a guess.
I guess you could use awk:
awk '/INFO/{a[++i]=$0}END{print a[1];print a[i]}' messages
This will store every match in an array, which could be an issue for memory consumption if there are very many matches. An alternative would be to only store the first and the most recent:
awk '/INFO/{a[++i>2?2:i]=$0}END{print a[1];print a[2]}' messages
Or as Etan has suggested (thanks):
awk '/INFO/{a=$0}a&&!i++{print}END{if(a)print a}' messages
The advantage to this one is that if there are no matches, nothing will be printed.
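A quick sanity check of that last one (the input is made up):
printf 'x\nINFO one\ny\nINFO two\nINFO three\n' | awk '/INFO/{a=$0}a&&!i++{print}END{if(a)print a}'
This prints INFO one followed by INFO three, and prints nothing at all on input with no matches.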

remove n lines from STDOUT on bash

Do you have any bash solution to remove N lines from stdout?
like the head command, but printing all lines except the last N.
A simple solution in bash:
find ./test_dir/ | sed '$d' | sed '$d' | sed '$d' | ...
but I would need to repeat the sed command N times.
Any better solution?
(Excluding awk, python, etc.)
Use head with a negative number. In my example it will print all lines but last 3:
head -n -3 infile
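A quick check with seq:
seq 10 | head -n -3
This prints 1 through 7. Note the negative count is a GNU coreutils feature; see the next answer if your head doesn't support it.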
If head -n -3 filename doesn't work on your system (like mine), you could also compute the count yourself (and maybe alias it or create a function in your .bashrc):
head -n $(( $(wc -l < filename) - 3 )) filename
Where filename and 3 above are your file and the number of lines to drop, respectively.
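Since the question explicitly asks for bash without awk, here is a pure-bash sketch (remove_last_n is a hypothetical helper name): it keeps a ring buffer of N lines and prints each line only once it is N lines behind, so the last N lines are never printed:
remove_last_n() {
  local n=$1 i=0
  local -a buf
  while IFS= read -r line; do
    # once the buffer holds n lines, print the line that arrived n lines ago
    (( i >= n )) && printf '%s\n' "${buf[i % n]}"
    buf[i % n]=$line
    (( i++ ))
  done
}
find ./test_dir/ | remove_last_n 3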
The tail command accepts a +/- prefix on the line count on Mac OS / BSD as well. With the + prefix, the expression below prints the file starting at line 3, i.e. it skips the first 2 lines:
tail -n +3 filename.ext
With the - prefix (the default), it instead prints only the last 3 lines:
tail -n -3 filename.ext
Note that neither form drops lines from the end of the file; for that you need head -n -N as above. See a similar answer to a different question here: Print a file skipping first X lines in Bash
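To see the two prefixes side by side with throwaway input:
seq 10 | tail -n +3   # prints 3 through 10
seq 10 | tail -n 3    # prints 8, 9, 10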

Can I grep only the first n lines of a file?

I have very long log files, is it possible to ask grep to only search the first 10 lines?
The magic of pipes:
head -10 log.txt | grep <whatever>
For folks who find this on Google, I needed to search the first n lines of multiple files, but to only print the matching filenames. I used
gawk 'FNR>10 {nextfile} /pattern/ { print FILENAME ; nextfile }' filenames
The FNR..nextfile stops processing a file once 10 lines have been seen. The //..{} prints the filename and moves on whenever the first match in a given file shows up. To quote the filenames for the benefit of other programs, use
gawk 'FNR>10 {nextfile} /pattern/ { print "\"" FILENAME "\"" ; nextfile }' filenames
Or use awk for a single process without |:
awk '/your_regexp/ && NR < 11' INPUTFILE
On each line, if your_regexp matches, and the number of records (lines) is less than 11, it executes the default action (which is printing the input line).
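Note this still reads the whole file; it merely stops printing after line 10. For the very long logs in the question, a variant that exits as soon as line 10 has been processed:
awk 'NR > 10 { exit } /your_regexp/' INPUTFILE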
Or use sed:
sed -n '/your_regexp/p;10q' INPUTFILE
Checks your regexp and prints the line (-n means don't print the input, which is otherwise the default), and quits right after the 10th line.
You have a few options using programs along with grep. The simplest in my opinion is to use head:
head -n10 filename | grep ...
head will output the first 10 lines (using the -n option), and then you can pipe that output to grep.
grep "pattern" <(head -n 10 filename)
head -10 log.txt | grep -A 2 -B 2 pattern_to_search
-A 2: print two lines of context after each match.
-B 2: print two lines of context before each match.
head -10 log.txt # read the first 10 lines of the file.
You can use the following line:
head -n 10 /path/to/file | grep [...]
The output of head -10 file can be piped to grep in order to accomplish this:
head -10 file | grep …
Using Perl:
perl -ne 'last if $. > 10; print if /pattern/' file
An extension to Joachim Isaksson's answer: Quite often I need something from the middle of a long file, e.g. lines 5001 to 5020, in which case you can combine head with tail:
head -5020 file.txt | tail -20 | grep x
This gets the first 5020 lines, then shows only the last 20 of those, then pipes everything to grep.
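A quick check with generated input:
seq 10000 | head -5020 | tail -20
This prints 5001 through 5020, confirming the window.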
grep -A 10 <Pattern>
This grabs the matching line and the 10 lines after it. It works well only if you already know the pattern; if you don't have a known pattern, use the head suggestions above.
grep -m6 "string" cov.txt
This stops reading after the first 6 matching lines. Note that -m counts matches, not input lines, so it does not restrict the search to the first 6 lines of the file.
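To genuinely restrict the search to the first 6 lines, combine head and grep instead:
head -n 6 cov.txt | grep "string"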

get the second last line from shell pipeline

I want to get the second last line from the ls -l output.
I know that
ls -l|tail -n 2| head -n 1
can do this, just wondering if sed can do this in just one command?
ls -l|sed -n 'x;$p'
It can't do third-to-last though, because sed has only one hold space, so it can remember only one earlier line. And since it processes lines one at a time, it does not know that a line is next-to-last while processing it. awk could return the third-to-last, because it can keep an arbitrary number of variables, but the script would be much longer than tail -n X | head -n 1.
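For what it's worth, a sketch of that awk approach for the third-to-last line, using a three-slot ring buffer:
awk '{ buf[NR % 3] = $0 } END { if (NR >= 3) print buf[(NR - 2) % 3] }' file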
As an awk one-liner:
echo -e "aaa\nbbb\nccc\nddd" | awk '{v[c++]=$0}END{print v[c-2]}'
ccc
Try this to print the second-to-last line of a file:
sed -e '$!{h;d;}' -e x filename
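A quick check:
printf 'a\nb\nc\n' | sed -e '$!{h;d;}' -e x
prints b, the second-to-last line.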
tac filename | sed -n 2p
-- but involves a pipe, too

Reading a specific line of a file

What is the best way (best performance) to read a specific line of a file? Currently, I'm using the following command line:
head -line_number file_name | tail -1
P.S.: preferably using shell tools.
You could use sed.
# print line number 10
$ sed -n '10p' file_name
$ sed '10!d' file_name
$ sed '10q;d' file_name
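Of these three, '10q;d' is the one that stops reading at line 10 instead of scanning to the end, which matters on big files. A quick check:
seq 100 | sed '10q;d'
prints 10.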
#print 10th line
awk NR==10 file_name
awk -v linenum=10 'NR == linenum {print; exit}' file
If you know the lines are all the same length, then a program could seek directly to that line without reading all the preceding ones: something like od might be able to do that, or you could code it up in half a dozen lines in most any language. Look for a function called seek() or fseek().
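For illustration, a hedged sketch with dd, assuming every line is exactly 80 bytes including its newline (both numbers are invented):
dd if=file bs=80 skip=9 count=1 2>/dev/null   # seeks past 9 records and reads line 10 directly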
Otherwise, perhaps...
tail -n +N file | head -n 1
...as this asks tail to skip straight to the Nth line, so fewer lines are pushed needlessly through the pipe than with your head-to-tail solution.
ruby -ne '$.==10 and (print; exit)' file
I tried each of these a couple of times, avoiding the file cache, and found that head + tail was quick but ruby was the fastest:
$ wc -l myfile.txt
920391 myfile.txt
$ time awk NR==334227 myfile.txt
my_searched_line
real 0m14.963s
user 0m1.235s
sys 0m0.126s
$ time head -334227 myfile.txt |tail -1
my_searched_line
real 0m5.524s
user 0m0.569s
sys 0m0.725s
$ time sed '334227!d' myfile
my_searched_line
real 0m12.565s
user 0m0.814s
sys 0m0.398s
$ time ruby -ne '$.==334227 and (print; exit)' myfile
my_searched_line
real 0m0.750s
user 0m0.568s
sys 0m0.179s
