Make grep stop after first NON-matching line - bash

I'm trying to use grep to go through some logs and only select the most recent entries. The logs have years of heavy traffic on them so it's silly to do
tac error.log | grep 2012
tac error.log | grep "Jan.2012"
etc.
and wait for 10 minutes while it goes through several million lines which I already know are not going to match. I know there is the -m option to stop at the first match but I don't know of a way to make it stop at first non-match. I could do something like grep -B MAX_INT -m 1 2011 but that's hardly an optimal solution.
Can grep handle this or would awk make more sense?

How about using awk like this:
tac error.log | awk '{if(/2012/)print;else exit}'
This should exit as soon as a line not matching 2012 is found.

Here is a solution in python:
# foo.py
import sys, re
for line in sys.stdin:
if re.match(r'2012', line):
print line,
continue
break
you#host> tac foo.txt | python foo.py

I don't think grep supports this.
But here is my "why did we have awk again" answer:
tail -n `tac biglogfile | grep -vnm1 2012 | sed 's/:.*//' | xargs expr -1 +` biglogfile
Note that this isn't going to be exact if your log is being written to.

The excellent one-line scripts for sed page to the rescue:
# print section of file between two regular expressions (inclusive)
sed -n '/Iowa/,/Montana/p' # case sensitive
In other words, you should be able to do the following:
sed -n '/Jan 01 2012/,/Feb 01 2012/p' error.log | grep whatevs

Here is an example that parses the user's dot-plan file and stops at the first non-matching line:
PID=$$
while read ln; do
echo $ln | {
if grep "^[-*+] " >/dev/null; then
# matched
echo -e $ln
elif grep "^[#]" >/dev/null; then
# ignore comment line
:
else
# stop at first non-matching line
kill $PID
fi
}
done <$HOME/.plan
Of course this approach is considerably slower than if Grep reads the lines but at least you can incorporate several cases (not just the non-match).
For more complex scripts, it is worth noting that Bash can also apply regular expressions to variables, i.e. you can also do completely without grep.

Related

bash: pipe continuously into a grep

Not sure how to explain this but, what I am trying to achieve is this:
- tailing a file and grepping for a patter A
- then I want to pipe into another customGrepFunction where it matches pattern B, and if B matches echo something out. Need the customGrepFunction in order to do some other custom stuff.
The sticky part here is how to make the grepCustomFunction work here.In other words when only patternA matches echo the whole line and when both patterA & patternB match printout something custom:
when I only run:
tail -f file.log | grep patternA
I can see the pattenA rows are being printed/tailed however when I add the customGrepFunction nothing happens.
tail -f file.log | grep patternA | customGrepFunction
And the customGrepFunction should be available globally in my bin folder:
customGrepFunction(){
if grep patternB
then
echo "True"
fi
}
I have this setup however it doesn't do what I need it to do, it only echos True whenever I do Ctrl+C and exit the tailing.
What am I missing here?
Thanks
What's Going Wrong
The code: if grep patternB; then echo "true"; fi
...waits for grep patternB to exit, which will happen only when the input from tail -f file.log | grep patternA hits EOF. Since tail -f waits for new content forever, there will never be an EOF, so your if statement will never complete.
How To Fix It
Don't use grep on the inside of your function. Instead, process content line-by-line and use bash's native regex support:
customGrepFunction() {
while IFS= read -r line; do
if [[ $line =~ patternB ]]; then
echo "True"
fi
done
}
Next, make sure that grep isn't buffering content (if it were, then it would be written to your code only in big chunks, delaying until such a chunk is available). The means to do this varies by implementation, but with GNU grep, it would look like:
tail -f file.log | grep --line-buffered patternA | customGrepFunction

"grep"ing first 12 of last 24 character from a line

I am trying to extract "first 12 of last 24 character" from a line, i.e.,
for a line:
species,subl,cmp= 1 4 1 s1,torque= 0.41207E-09-0.45586E-13
I need to extract "0.41207E-0".
(I have not written the code, so don't curse me for its formatting. )
I have managed to do this via:
var_s=`grep "species,subl,cmp= $3 $4 $5" $tfile |sed -n '$s/.*\(........................\)$/\1/p'|sed -n '$s/\(............\).*$/\1/p'`
but, is there any more readable way of doing this, rather then counting dots?
EDIT
Thanks to both of you;
so, I have sed,awk grep and bash.
I will run that in loop, for 100's of file.
so, can you also suggest me which one is most efficient, wrt time?
One way with GNU sed (without counting dots):
$ sed -r 's/.*(.{11}).{12}/\1/' file
0.41207E-09
Similarly with GNU grep:
$ grep -Po '.{11}(?=.{12}$)' file
0.41207E-09
Perhaps a python solution may also be helpful:
python -c 'import sys;print "\n".join([a[-24:-13] for a in sys.stdin])' < file
0.41207E-09
I'm not sure your example data and question match up so just change the values in the {n} quantifier accordingly.
Simplest is using pure bash:
echo "${str:(-24):12}"
OR awk can also do that:
awk '{print substr($0, length($0)-23, 12)}' <<< $str
OUTPUT:
0.41207E-09
EDIT: For using bash solution on a file:
while read l; do echo "${l:(-24):12}"; done < file
Another one, less efficient but has the advantage of making you discover new tools
`echo "$str" | rev | cut -b 1-24 | rev | cut -b 1-12
You can use awk to get first 12 characters of last 24 characters from a line:
awk '{substr($0,(length($0)-23))};{print substr($0,(length($0)-10))}' myfile.txt

Getting head to display all but the last line of a file: command substitution and standard I/O redirection

I have been trying to get the head utility to display all but the last line of standard input. The actual code that I needed is something along the lines of cat myfile.txt | head -n $(($(wc -l)-1)). But that didn't work. I'm doing this on Darwin/OS X which doesn't have the nice semantics of head -n -1 that would have gotten me similar output.
None of these variations work either.
cat myfile.txt | head -n $(wc -l | sed -E -e 's/\s//g')
echo "hello" | head -n $(wc -l | sed -E -e 's/\s//g')
I tested out more variations and in particular found this to work:
cat <<EOF | echo $(($(wc -l)-1))
>Hola
>Raul
>Como Esta
>Bueno?
>EOF
3
Here's something simpler that also works.
echo "hello world" | echo $(($(wc -w)+10))
This one understandably gives me an illegal line count error. But it at least tells me that the head program is not consuming the standard input before passing stuff on to the subshell/command substitution, a remote possibility, but one that I wanted to rule out anyway.
echo "hello" | head -n $(cat && echo 1)
What explains the behavior of head and wc and their interaction through subshells here? Thanks for your help.
head -n -1 will give you all except the last line of its input.
head is the wrong tool. If you want to see all but the last line, use:
sed \$d
The reason that
# Sample of incorrect code:
echo "hello" | head -n $(wc -l | sed -E -e 's/\s//g')
fails is that wc consumes all of the input and there is nothing left for head to see. wc inherits its stdin from the subshell in which it is running, which is reading from the output of the echo. Once it consumes the input, it returns and then head tries to read the data...but it is all gone. If you want to read the input twice, the data will have to be saved somewhere.
Using sed:
sed '$d' filename
will delete the last line of the file.
$ seq 1 10 | sed '$d'
1
2
3
4
5
6
7
8
9
For Mac OS X specifically, I found an answer from a comment to this Q&A.
Assuming you are using Homebrew, run brew install coreutils then use the ghead command:
cat myfile.txt | ghead -n -1
Or, equivalently:
ghead -n -1 myfile.txt
Lastly, see brew info coreutils if you'd like to use the commands without the g prefix (e.g., head instead of ghead).
cat myfile.txt | echo $(($(wc -l)-1))
This works. It's overly complicated: you could just write echo $(($(wc -l)-1)) <myfile.txt or echo $(($(wc -l <myfile.txt)-1)). The problem is the way you're using it.
cat myfile.txt | head -n $(wc -l | sed -E -e 's/\s//g')
wc consumes all the input as it's counting the lines. So there is no data left to read in the pipe by the time head is started.
If your input comes from a file, you can redirect both wc and head from that file.
head -n $(($(wc -l <myfile.txt) - 1)) <myfile.txt
If your data may come from a pipe, you need to duplicate it. The usual tool to duplicate a stream is tee, but that isn't enough here, because the two outputs from tee are produced at the same rate, whereas here wc needs to fully consume its output before head can start. So instead, you'll need to use a single tool that can detect the last line, which is a more efficient approach anyway.
Conveniently, sed offers a way of matching the last line. Either printing all lines but the last, or suppressing the last output line, will work:
sed -n '$! p'
sed '$ d'
Here is a one-liner that can get you the desired output, and it can be used more generally for getting all lines from a file except the last n lines.
grep -n "" myfile.txt \ # output the line number for each line
| sort -nr \ # reverse the file by using those line numbers
| sed '1,4d' \ # delete first 4 lines (last 4 of the original file)
| sort -n \ # reverse the reversed file (correct the line order)
| sed 's/^[0-9]*://' # remove the added line numbers
Here is the above command in an actual single line and runnable (can't execute the above due to the added comments):
grep -n "" myfile.txt | sort -nr | sed '1,4d' | sort -n | sed 's/^[0-9]*://'
It's a little cumbersome, and this problem can be solved with more comprehensive commands like ghead, but when you can't or don't want to download such tools, it's nice to be able to do this with the more basic options. I've been in situations where it's simply not an option to get better tools.
awk 'NR>1{print p}{p=$0}'
For this job, an awk one-liner is a bit longer than a sed one.

get the second last line from shell pipeline

I want to get the second last line from the ls -l output.
I know that
ls -l|tail -n 2| head -n 1
can do this, just wondering if sed can do this in just one command?
ls -l|sed -n 'x;$p'
It can't do third to last though, because sed only has 1 hold space, so can only remember one older line. And since it processes the lines one at a time, it does not know the line will be next to last when processing it. awk could return thrid to last, because you can have arbitrary number of variables there, but the script would be much longer than the tail -n X|head -n 1.
In a awk one-liner :
echo -e "aaa\nbbb\nccc\nddd" | awk '{v[c++]=$0}END{print v[c-2]}'
ccc
Try this to delete second-last line in file
sed -e '$!{h;d;}' -e x filename
tac filename | sed -n 2p
-- but involves a pipe, too

how to proceed once a file containing something in shell

I am writing some BASH shell script that will continuously check a file to see if the file already contains "Completed!" before proceeding. (Of course, assume the file is being updated and will eventually contain the phrase "Completed!")
I am not sure how to do this. Thank you for your help.
You can do something like:
while ! grep -q -e 'Completed!' file ; do
sleep 1 # Or some other number of seconds
done
# Here the file contains completed
Amongst the standard utilities, tail has an option to keep reading from a file: tail -f. So filter the output of tail -f.
<some_file tail -f -n +1 | grep 'Completed!' | head -n 1 >/dev/null
There may be a delay due to buffering. You can at least reduce the delay by using fewer tools in the pipeline. In fact, some implementations of tail never buffer when you do tail -f, so the following snippet will return as soon as Completed! is written to the file.
<some_file tail -f -n +1 | sed -e '/Completed!/ q'
This assumes that the file is being appended to by some other tool. If the file is overwritten by the data-producing program after you start tail, this solution won't work. You can search the file periodically. On some systems you can call a notification mechanism to know whenever the file changes, e.g. with inotifywait under Linux.
I've done this in Kornshell:
tail -f somefile | while read line
do
echo $line
[[ $line == *Completed!* ]] && break
done
Note no quotes around the *Completed!* string. This allows the double square brackets to do glob pattern matching instead of string matching.
This seems to work in BASH too. However, the line with the Completed must end in a NL. Otherwise, it'll take an extra line before it breaks the loop.
You can use grep too:
tail -f somefile | while read line
do
echo $line
grep -iq "Completed!" && break
done
The -q parameter means quiet. If your grep doesn't take the -q parameter, you might have to pipe it to /dev/null. The -i is ignore case. Whether you want to do that is up to you.
The advantage is that you aren't doing any processing unless there's a line to read. Using sleep may mean you miss the line, or that you're processing when no line has been added to the file.
Using grep in a pipe you may turn on line buffering mode by adding the --line-buffered option!

Resources