Printing first and last match in file - bash

Is there a cleaner solution to the following?
grep INFO messages | head -1
grep INFO messages | tail -1
The pattern (INFO) and the filename (messages) are arbitrary here; only the structure matters.

Try:
grep INFO messages | sed -n '1p;$p'
grep searches the messages file for lines matching the pattern
sed -n '1p;$p' prints the first (1p) and last ($p) line of that output
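For example, with a small, hypothetical messages file:
printf '%s\n' 'INFO start' 'DEBUG noise' 'INFO middle' 'INFO end' > messages
grep INFO messages | sed -n '1p;$p'
INFO start
INFO end
Note that if there is exactly one match, it is printed twice, since it is both the first and the last line of the output.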

You can use -m to establish how many matches you want:
For the first:
grep -m1 "INFO" messages
For the last, let's print the file backwards with tac and then use the same logic:
tac messages | grep -m1 "INFO"
This way, you avoid processing the whole file twice: you will just process it until a match is found.
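As a sketch, capturing both into variables (same hypothetical messages file as above):
first=$(grep -m1 INFO messages)
last=$(tac messages | grep -m1 INFO)
printf '%s\n%s\n' "$first" "$last"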
From man grep:
-m NUM, --max-count=NUM
Stop reading a file after NUM matching lines. If the input is
standard input from a regular file, and NUM matching lines are
output, grep ensures that the standard input is positioned to just
after the last matching line before exiting, regardless of the
presence of trailing context lines. This enables a calling process to
resume a search. When grep stops after NUM matching lines, it
outputs any trailing context lines. When the -c or --count option is
also used, grep does not output a count greater than NUM. When the -v
or --invert-match option is also used, grep stops after outputting
NUM non-matching lines.
man tac:
tac - concatenate and print files in reverse

This may be what you want:
awk '/INFO/{f[++c]=$0} END{ if (c>0) print f[1] ORS f[c] }' messages
or:
awk '/INFO/{f[++c]=$0} END{ if (c>0) print f[1]; if (c>1) print f[c] }' messages
but without sample input and expected output it's a guess.

I guess you could use awk:
awk '/INFO/{a[++i]=$0}END{print a[1];print a[i]}' messages
This will store every match in an array, which could be an issue for memory consumption if there are very many matches. An alternative would be to only store the first and the most recent:
awk '/INFO/{a[++i>2?2:i]=$0}END{print a[1];print a[2]}' messages
Or as Etan has suggested (thanks):
awk '/INFO/{a=$0}a&&!i++{print}END{if(a)print a}' messages
The advantage to this one is that if there are no matches, nothing will be printed.
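A quick way to see that difference is to feed both versions empty input, e.g. /dev/null:
awk '/INFO/{a=$0}a&&!i++{print}END{if(a)print a}' /dev/null    # prints nothing
awk '/INFO/{a[++i>2?2:i]=$0}END{print a[1];print a[2]}' /dev/null    # prints two blank lines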

Related

Find Nth occurrence of a string using Regex

So I have a string at the beginning of a line and can find all occurrences of it. I am using ^$string to match. I have thousands of these, and an error occurs on a specific line. Say I was trying to get to the 100th occurrence of this pattern; how would I do so?
For example, I can grep ^$string and list all matches, but I would like to find one specific occurrence.
grep has a -m / --max-count option:
grep -m100 '^String' inputFile | tail -1
will give you the 100th matched line.
Note:
the -m100 makes grep stop reading the input file once 100 matches have been found, which is pretty useful if you are reading a huge file
the tail command is cheap here, since it only has to keep the last of at most 100 lines.
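If you'd rather avoid the pipe altogether, a one-pass awk sketch that prints only the 100th match and stops reading there:
awk '/^String/ && ++c==100{print; exit}' inputFile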
You can use sed to print only a single line of your grep's output:
grep "^$string" inputFile | sed -n '100p'
-n suppresses automatic printing of the input; 100p prints only the 100th line to the output stream.
Or, as @dan mentions in the comments:
grep "^$string" inputFile | sed '100!d;q'
(delete every line except the 100th, then quit immediately after printing it)

Making bash output a certain word from a .txt file

I have a question on Bash:
Like the title says, I need bash to output a certain word, depending on where it is in the file. In my example I have a simple .txt file.
I already found out that you can count the number of words within a file with the command:
wc -w < myFile.txt
An output example would be:
78501
There is certainly also a way to make cat show only word number x. Something like:
cat myFile.txt | wordno. 3125
desired-word
Note that I will welcome any command that gets this done, not only cat.
Alternatively, or in addition, I would be happy to know how to display a certain character in a file, based on its position in the file. Something like:
cat myFile.txt | characterno. 2342
desired-character
I already know how you can achieve this with a variable:
a="hello, how are you"
echo ${a:9:1}
w
The only problem is that a variable can only be so long. If it were as long as a whole .txt file, this wouldn't work.
I look forward to your answers!
You could use awk for this job: it splits the input at whitespace and prints the $wordnumber-th field. tr is used to turn the newlines into spaces first (deleting them outright with tr -d '\n' would glue the last word of one line to the first word of the next):
cat myFile.txt | tr '\n' ' ' | awk -v wordnumber=5 '{ print $wordnumber }'
And if you want, for example, the 5th character, you could do it like so:
head -c 5 myFile.txt | tail -c 1
Since you have not shown a sample Input_file or the expected output, I couldn't test this, but you could simply do it with awk, for example:
awk 'FNR==1{print substr($0,2342,1);next}' Input_file
Here we tell awk to look only at the 1st line (FNR==1); substr takes 1 character starting at position 2342, and next skips the remaining lines. You can adjust the position and length as needed.
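If the file spans multiple lines, a hedged variant is to concatenate the lines first (the newlines are dropped here, so positions shift if you want them counted; myFile.txt is the asker's example name):
awk '{s = s $0} END{print substr(s, 2342, 1)}' myFile.txt
Alternatively, head -c 2342 myFile.txt | tail -c 1 prints the 2342nd byte, counting newlines as characters.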
With gawk:
awk 'BEGIN{RS="[[:space:]]+"} NR==12345' file
or
gawk 'NR==12345' RS="[[:space:]]+" file
I'm setting the record separator to a sequence of whitespace characters, which includes newlines, and then printing the 12345th record.
To improve the average performance you can exit the script once the match is found:
gawk 'BEGIN{RS="[[:space:]]+"}NR==12345{print;exit}' file
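A quick check with a hypothetical two-line file:
printf 'one two\nthree four five\n' > file
gawk 'BEGIN{RS="[[:space:]]+"} NR==4{print; exit}' file
four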

How to start from the last line with tail?

I have a huge log file. I need to find something and print the last matching line. Like this:
tail -n +1 "$log" | awk '$9 ~ "'$something'" {print $0}' | tail -n1
But when I execute this command, tail starts from the 1st line and reads all the lines, so it runs for a few minutes.
How can I start reading from the last line and stop when I find something? Then I wouldn't need to read all the lines, and it would take just a few seconds, because I only need the last line about $something.
Note you are saying tail -n +1 "$log", which is interpreted by tail as: start reading from line 1. So you are in fact doing cat "$log".
You probably want to say tail -n 1 "$log" (without the + before 1) to get the last n lines.
Also, if you want to get the last match of $something, you may want to use tac. This prints a file backwards: first the last line, then the penultimate... and finally the first one.
So if you do
tac "$log" | grep -m1 "$something"
this will print the last match of $something and then exit, because -mX prints the first X matches.
Or of course you can use awk as well:
tac "$log" | awk -v pattern="$something" '$9 ~ pattern {print; exit}'
Note the use of -v to pass the variable to awk. This way you avoid a confusing mixture of single and double quotes in your code.
tac $FILE | grep $SOMETHING -m 1
tac: the reverse of cat :-)
grep -m: search and stop on first occurrence
Instead of tail, use tac. It reverses the file, and you can exit as soon as grep finds the first match:
tac "$log" | awk '$9 ~ "'$something'" {print $0;exit}'
tail -1000 takes only the last 1000 lines of your file.
You could grep that part, but you wouldn't know whether the thing you grep for also occurred in earlier lines. grep itself cannot search backwards; that is what tac is for.
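As a sketch of that approach ($log and $something as in the question; 1000 is an arbitrary window):
tail -n 1000 "$log" | grep "$something" | tail -n 1
If this prints nothing, the match (if any) lies earlier in the file, so you would have to widen the window or fall back to the tac approach above.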

Can I grep only the first n lines of a file?

I have very long log files; is it possible to ask grep to only search the first 10 lines?
The magic of pipes:
head -10 log.txt | grep <whatever>
For folks who find this on Google, I needed to search the first n lines of multiple files, but to only print the matching filenames. I used
gawk 'FNR>10 {nextfile} /pattern/ { print FILENAME ; nextfile }' filenames
The FNR..nextfile stops processing a file once 10 lines have been seen. The //..{} prints the filename and moves on whenever the first match in a given file shows up. To quote the filenames for the benefit of other programs, use
gawk 'FNR>10 {nextfile} /pattern/ { print "\"" FILENAME "\"" ; nextfile }' filenames
Or use awk for a single process without |:
awk '/your_regexp/ && NR < 11' INPUTFILE
On each line, if your_regexp matches, and the number of records (lines) is less than 11, it executes the default action (which is printing the input line).
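If the file is long, a sketch of a variant that also stops reading after the 10th line:
awk 'NR>10{exit} /your_regexp/' INPUTFILE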
Or use sed:
sed -n '/your_regexp/p;10q' INPUTFILE
Checks your regexp and prints the line (-n means don't print the input, which is otherwise the default), and quits right after the 10th line.
You have a few options using programs along with grep. The simplest in my opinion is to use head:
head -n10 filename | grep ...
head will output the first 10 lines (using the -n option), and then you can pipe that output to grep.
grep "pattern" <(head -n 10 filename)
head -10 log.txt | grep -A 2 -B 2 pattern_to_search
-A 2: print two lines after the match.
-B 2: print two lines before the match.
head -10 log.txt # read the first 10 lines of the file.
You can use the following line:
head -n 10 /path/to/file | grep [...]
The output of head -10 file can be piped to grep in order to accomplish this:
head -10 file | grep …
Using Perl:
perl -ne 'last if $. > 10; print if /pattern/' file
An extension to Joachim Isaksson's answer: Quite often I need something from the middle of a long file, e.g. lines 5001 to 5020, in which case you can combine head with tail:
head -5020 file.txt | tail -20 | grep x
This gets the first 5020 lines, then shows only the last 20 of those, then pipes everything to grep.
(Edited: fencepost error in my example numbers, added pipe to grep)
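An equivalent single sed process for the same range, as a sketch (the q stops sed from reading the rest of the file):
sed -n '5001,5020p;5020q' file.txt | grep x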
grep -A 10 <Pattern>
This grabs the pattern and the 10 lines after it. It works well only for a known pattern; if you don't have a known pattern, use the "head" suggestions.
grep -m6 "string" cov.txt
Note: -m6 stops after the first 6 matching lines; it does not restrict the search to the first 6 lines of the file.
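If the goal really is to search only the first 6 lines, combine head with grep:
head -n 6 cov.txt | grep "string"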

How to ignore all lines before a match occurs in bash?

I would like to ignore all lines that occur before a match in bash (also ignoring the matched line). An example of input could be:
R1-01.sql
R1-02.sql
R1-03.sql
R1-04.sql
R2-01.sql
R2-02.sql
R2-03.sql
and if I match R2-01.sql in this already sorted input I would like to get
R2-02.sql
R2-03.sql
Many ways are possible. For example, assuming that your input is in list.txt:
PATTERN="R2-01.sql"
sed "0,/$PATTERN/d" <list.txt
Because the 0,/pattern/ form works only in GNU sed (e.g. it doesn't work on OS X), here is a workaround. ;)
PATTERN="R2-01.sql"
(echo "dummy-line-to-the-start" ; cat - ) < list.txt | sed "1,/$PATTERN/d"
This adds one dummy line to the start, so the real pattern can only appear on line 2 or later; therefore 1,/pattern/ works, deleting everything from line 1 (the dummy one) up to and including the pattern.
Or you can print lines after the pattern and delete the 1st, like:
sed -n '/pattern/,$p' < list.txt | sed '1d'
with awk, e.g.:
awk '/pattern/,0{if (!/pattern/)print}' < list.txt
(note: this also skips any later lines that happen to match the pattern)
or, my favorite use the next perl command:
perl -ne 'print unless 1../pattern/' < list.txt
The flip-flop range 1../pattern/ suppresses everything up to and including the first match, and it still works when the pattern is on the 1st line.
Another solution is reverse-delete-reverse:
tail -r < list.txt | sed '/pattern/,$d' | tail -r
If you have the tac command, use it instead of tail -r. The interesting thing is that /pattern/,$d works when the pattern is on the last line, but 1,/pattern/d doesn't when it is on the first.
How to ignore all lines before a match occurs in bash?
The question headline and your example don't quite match up.
Print all lines from "R2-01.sql" in sed:
sed -n '/R2-01.sql/,$p' input_file.txt
Where:
-n suppresses printing the pattern space to stdout
/ starts and ends the pattern to match (regular expression)
, separates the start of the range from the end
$ addresses the last line in the input
p echoes the pattern space in that range to stdout
input_file.txt is the input file
Print all lines after "R2-01.sql" in sed:
sed '1,/R2-01.sql/d' input_file.txt
1 addresses the first line of the input
, separates the start of the range from the end
/ starts and ends the pattern to match (regular expression)
d deletes the pattern space in that range
input_file.txt is the input file
Everything not deleted is echoed to stdout.
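On the question-style sample input, the difference between the two commands is just whether the matched line itself is kept. A hypothetical run:
sed -n '/R2-01.sql/,$p' input_file.txt
R2-01.sql
R2-02.sql
R2-03.sql
sed '1,/R2-01.sql/d' input_file.txt
R2-02.sql
R2-03.sql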
This is a little hacky, but it's easy to remember for quickly getting the output you need:
$ grep -A99999 $match $file
Obviously you need to pick a value for -A that's large enough to match all contents; if you use a too-small value the output will be silently truncated.
To ensure you get all output you can do:
$ grep -A"$(wc -l < "$file")" $match $file
Of course at that point you might be better off with the sed solutions, since they don't require two reads of the file.
And if you don't want the matching line itself, you can simply pipe this command into tail -n +2 to skip the first line of output.
awk -v pattern=R2-01.sql '
print_it {print}
$0 ~ pattern {print_it = 1}
'
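For example, with the question's list saved as list.txt (a hypothetical run):
awk -v pattern=R2-01.sql '
print_it {print}
$0 ~ pattern {print_it = 1}
' list.txt
R2-02.sql
R2-03.sql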
You can also do it this way, but I think jomo666's answer is better:
sed -nr '/R2-01.sql/,${/R2-01/d;p}' <<END
R1-01.sql
R1-02.sql
R1-03.sql
R1-04.sql
R2-01.sql
R2-02.sql
R2-03.sql
END
Perl is another option:
perl -ne 'if ($f){print} elsif (/R2-01\.sql/){$f++}' sql
To pass in the regex as an argument, use -s to enable a simple argument parser
perl -sne 'if ($f){print} elsif (/$r/){$f++}' -- -r=R2-01\\.sql file
This can be accomplished with grep, by printing a large enough context following the $match. This example will output the first matching line followed by 999,999 lines of "context".
grep -A999999 $match $file
For added safety (in case the $match begins with a hyphen, say) you should use -e to force $match to be used as an expression.
grep -A999999 -e "$match" $file
