Extracting last 10 lines of a file that match "foo" - bash

I want to write the last ten lines that contain a specific word, such as "foo", to a new text file named, for instance, boo.txt.
How can I achieve this at the command prompt of a Unix terminal?

You can use grep and tail:
grep "foo" input.txt | tail -n 10 > boo.txt
The default number of lines printed by tail is 10, so you can omit the -n 10 part if you always want that many.
The > redirection will create boo.txt if it didn't exist. If it did exist prior to running this, the file will be truncated (i.e. emptied) first. So boo.txt will contain at most 10 lines of text in any case.
If you wanted to append to boo.txt, you should change the redirection to use >>.
grep "bar" input.txt | tail -n 42 >> boo.txt
You might also be interested in head if you are looking for the first occurrences of the string.
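As a quick sanity check, the pipeline can be tried on generated input (the file names here are just placeholders):

```shell
# Build a sample file: 15 "foo" lines interleaved with other lines.
for i in $(seq 1 15); do
  printf 'foo line %d\n' "$i"
  printf 'other line %d\n' "$i"
done > input.txt

# Keep only the last 10 matching lines.
grep "foo" input.txt | tail -n 10 > boo.txt

wc -l < boo.txt     # 10 lines end up in boo.txt
tail -n 1 boo.txt   # the last match: "foo line 15"
```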

grep foo /path/to/input/file | tail > boo.txt


Deleting each line in a file from index specified in another file in bash [duplicate]

I want to delete one or more specific line numbers from a file. How would I do this using sed?
If you want to delete lines from 5 through 10 and line 12th:
sed -e '5,10d;12d' file
This will print the results to the screen. If you want to save the results to the same file:
sed -i.bak -e '5,10d;12d' file
This will store the unmodified file as file.bak, and delete the given lines.
Note: Line numbers start at 1. The first line of the file is 1, not 0.
You can delete a particular single line with its line number by
sed -i '33d' file
This deletes line 33 and saves the updated file in place.
awk works as well; for example, to delete lines 5, 10 and 25:
awk 'NR!~/^(5|10|25)$/' file
$ cat foo
1
2
3
4
5
$ sed -e '2d;4d' foo
1
3
5
$
This is very often a symptom of an antipattern. The tool which produced the line numbers may well be replaced with one which deletes the lines right away. For example:
grep -nh error logfile | cut -d: -f1 | deletelines logfile
(where deletelines is the utility you are imagining you need) is the same as
grep -v error logfile
Having said that, if you are in a situation where you genuinely need to perform this task, you can generate a simple sed script from the file of line numbers. Humorously (but perhaps slightly confusingly) you can do this with sed.
sed 's%$%d%' linenumbers
This accepts a file of line numbers, one per line, and produces, on standard output, the same line numbers with d appended after each. This is a valid sed script, which we can save to a file, or (on some platforms) pipe to another sed instance:
sed 's%$%d%' linenumbers | sed -f - logfile
On some platforms, sed -f does not understand the option argument - to mean standard input, so you have to redirect the script to a temporary file, and clean it up when you are done, or maybe replace the lone dash with /dev/stdin or /proc/$pid/fd/1 if your OS (or shell) has that.
As always, you can add -i before the -f option to have sed edit the target file in place, instead of producing the result on standard output. On *BSDish platforms (including OSX) you need to supply an explicit argument to -i as well; a common idiom is to supply an empty argument; -i ''.
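The two-sed pipeline can be sketched end to end with throwaway file names:

```shell
# A 5-line file, and a file listing the line numbers to delete (2 and 4).
printf '%s\n' a b c d e > logfile
printf '%s\n' 2 4 > linenumbers

# Turn "2" and "4" into the sed script "2d" / "4d" and apply it.
sed 's%$%d%' linenumbers > script.sed
sed -f script.sed logfile   # prints a, c and e, one per line
```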
The shortest version, deleting the first line with sed:
sed -i '1d' file
As Brian states here, the form is <address><command>: here the <address> is 1 and the <command> is d.
I would like to propose a generalization with awk.
When the file is made of blocks of a fixed size, and the lines to delete are repeated for each block, awk can handle it like this:
awk '{nl=((NR-1)%2000)+1; if ( (nl<714) || ((nl>1025)&&(nl<1029)) ) print $0}' OriginFile.dat > MyOutputCuttedFile.dat
In this example the block size is 2000 and I want to print lines [1..713] and [1026..1028] of each block.
NR is the variable awk uses to store the current line number.
% gives the remainder (modulus) of the division of two integers.
nl=((NR-1)%BLOCKSIZE)+1 stores in the variable nl the line number inside the current block (see below).
|| and && are the logical operators OR and AND.
print $0 writes the full line.
Why ((NR-1)%BLOCKSIZE)+1:
(NR-1) We need a shift of one because 1%3=1, 2%3=2, but 3%3=0.
+1 We add again 1 because we want to restore the desired order.
+-----+------+----------+------------+
| NR | NR%3 | (NR-1)%3 | (NR-1)%3+1 |
+-----+------+----------+------------+
| 1 | 1 | 0 | 1 |
| 2 | 2 | 1 | 2 |
| 3 | 0 | 2 | 3 |
| 4 | 1 | 0 | 1 |
+-----+------+----------+------------+
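The table above can be reproduced directly from the shell, here with awk doing the arithmetic:

```shell
# Print NR, NR%3 and (NR-1)%3+1 for the first four input lines.
seq 4 | awk '{ print NR, NR%3, (NR-1)%3+1 }'
# 1 1 1
# 2 2 2
# 3 0 3
# 4 1 1
```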
cat -b /etc/passwd | sed -E 's/^( )+(<line_number>)(\t)(.*)/--removed---/g;s/^( )+([0-9]+)(\t)//g'
cat -b -> prints the lines with line numbers (non-blank lines only)
the first substitution matches the numbered line to remove (<line_number> is a placeholder) and replaces it
the second substitution strips the line numbers that cat added from the remaining lines

Omit lines from the beginning or end of a file in Bash [duplicate]

Given a text file a.txt, how to cut the head or tail from the file?
For example, remove the first 10 lines or the last 10 lines.
To list all but the last 10 lines of a file:
head -n -10 file
To list all but the first 10 lines of a file:
tail -n +11 file
(With tail, +N means "start at line N", so skipping the first 10 lines requires +11, not +10.)
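On a system with GNU coreutils, both forms can be checked on a numbered file (head -n -N is a GNU extension; note that +11, not +10, starts after the first ten lines):

```shell
seq 1 20 > nums.txt

head -n -10 nums.txt   # all but the last 10 lines: 1 through 10 (GNU head only)
tail -n +11 nums.txt   # all but the first 10 lines: 11 through 20
```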
To omit lines at the beginning of a file, you can just use tail. For example, given a file a.txt:
$ cat > a.txt
one
two
three
four
five
^D
...you can start at the third line, omitting the first two, by passing a number prepended with + for the -n argument:
$ tail -n +3 a.txt
three
four
five
(Some older implementations also accept the short form tail +3 a.txt, but modern GNU tail rejects it.)
To omit lines at the end of the file you can do the same with head, but only if you have the GNU coreutils version (the BSD version that ships with Mac OS X, for example, won't work). To omit the last two lines of the file, pass a negative number for the -n argument:
$ head -n -2 a.txt
one
two
three
If the GNU version of head isn't available on your system (and you're unable to install it) you'll have to resort to other methods, like those given by @ruifeng.
To cut the first 10 lines, you can use any one of these:
awk 'NR>10' file
sed '1,10d' file
sed -n '11,$p' file
To cut the last 10 lines, you can use
tac file | sed '1,10d' | tac
or use head
head -n -10 file
cat a.txt | sed '1,10d' | sed -n -e :a -e '1,10!{P;N;D;};N;ba'   # removes both the first and the last 10 lines
IFS=$'\n'; array=( $(cat file) )
for ((i=0; i<${#array[@]}-10; i++)); do echo "${array[i]}"; done
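A quick check of the tac-based pipeline on generated input (tac is part of GNU coreutils; BSD systems spell the same idea tail -r):

```shell
seq 1 15 > nums.txt

# Reverse the file, drop what are now the first 10 lines, reverse back.
tac nums.txt | sed '1,10d' | tac   # prints 1 through 5
```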

Redirecting the lines in to different files under a for loop in shell

I want to put certain lines in to two different files in a shell script. How should I put the syntax for this.
Example:
A for loop prints 6 lines, and I want that the first two lines should be appended to 1st file, and the last 4 lines should be appended to the other file.
There is no direct way to split one output stream like that with plain redirection. One option is to write everything to a log file and then copy the desired sections to the other files:
for i in {1..6}; do
echo $i
done > log
head -2 log >> logfile1 # Appends the first two lines to logfile1
tail -4 log >> logfile2 # Appends the last four lines to logfile2
Answer
If you're using BASH you can use tee to send the same input to both head -n2 and tail -n4 at the same time using a combination of process substitution and a pipe:
$ for i in {1..6}; do echo $i; done | tee >(head -n2 >first2.txt) | tail -n4 >last4.txt
$ cat first2.txt
1
2
$ cat last4.txt
3
4
5
6
Explanation
By default tee takes its STDIN and copies it to file(s) specified as arguments in addition to its STDOUT. Since process substitution returns a /dev/fd path to a file descriptor (echo >(true) to see an example) tee is able to write to that path like any other regular file.
Here's what the tee command looks like after substitution:
tee /dev/fd/xx | tail -n4 >last4.txt
Or more visually:
tee | tail -n4 >last4.txt
:
/dev/fd/xx
:
:..>(head -n2 >first2.txt)
So the output gets copied both to the head process (whose output is redirected to first2.txt) and to tee's own STDOUT, which is piped to the tail process.
Note that process substitution is a BASH-ism, so if you're using a different shell or concerned about POSIX compliance it might not be available.

remove n lines from STDOUT on bash

Do you have any bash solution to remove N lines from stdout?
like a 'head' command, print all lines, only except last N
A simple solution in bash:
find ./test_dir/ | sed '$d' | sed '$d' | sed '$d' | ...
but I need to repeat the sed command N times.
Any better solution (other than awk, python, etc.)?
Use head with a negative number. In my example it will print all lines but last 3:
head -n -3 infile
if head -n -3 filename doesn't work on your system (like mine), you could also try the following approach (and maybe alias it or create a function in your .bashrc)
head -`echo "$(wc -l filename)" | awk '{ print $1 - 3; }'` filename
Where filename and 3 above are your file and number of lines respectively.
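The same arithmetic reads a little more clearly with $(...) command substitution instead of backticks (filename and 3 are placeholders, as above):

```shell
seq 1 10 > filename

# Number of lines to keep: the total minus the last 3.
keep=$(( $(wc -l < filename) - 3 ))
head -n "$keep" filename   # prints 1 through 7
```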
The tail command accepts a +/- prefix on the -n argument. With a + prefix, output starts at the given line number, so the following skips the first two lines and prints from line 3 onward:
tail -n +3 filename.ext
With a - prefix, which is also the default, the number counts from the end of the file, printing the last N lines:
tail -n 3 filename.ext
Note that tail selects which lines to print rather than which to skip, so on its own it cannot drop the last N lines. See a similar answer to a different question here: Print a file skipping first X lines in Bash

Can I grep only the first n lines of a file?

I have very long log files, is it possible to ask grep to only search the first 10 lines?
The magic of pipes:
head -10 log.txt | grep <whatever>
For folks who find this on Google, I needed to search the first n lines of multiple files, but to only print the matching filenames. I used
gawk 'FNR>10 {nextfile} /pattern/ { print FILENAME ; nextfile }' filenames
The FNR..nextfile stops processing a file once 10 lines have been seen. The //..{} prints the filename and moves on whenever the first match in a given file shows up. To quote the filenames for the benefit of other programs, use
gawk 'FNR>10 {nextfile} /pattern/ { print "\"" FILENAME "\"" ; nextfile }' filenames
Or use awk for a single process without |:
awk '/your_regexp/ && NR < 11' INPUTFILE
On each line, if your_regexp matches, and the number of records (lines) is less than 11, it executes the default action (which is printing the input line).
Or use sed:
sed -n '/your_regexp/p;10q' INPUTFILE
Checks your regexp and prints the line (-n means don't print the input, which is otherwise the default), and quits right after the 10th line.
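A small demonstration of the sed version, with a generated log where "match" occurs on lines 3, 8 and 15:

```shell
# Build a 20-line file; lines 3, 8 and 15 contain "match".
seq 1 20 | sed 's/^3$/match 3/; s/^8$/match 8/; s/^15$/match 15/' > log.txt

# Print matching lines, but quit at line 10: line 15's match is never reached.
sed -n '/match/p; 10q' log.txt
# match 3
# match 8
```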
You have a few options using programs along with grep. The simplest in my opinion is to use head:
head -n10 filename | grep ...
head will output the first 10 lines (using the -n option), and then you can pipe that output to grep.
grep "pattern" <(head -n 10 filename)
head -10 log.txt | grep -A 2 -B 2 pattern_to_search
-A 2: print two lines after each match.
-B 2: print two lines before each match.
head -10 log.txt # read the first 10 lines of the file.
You can use the following line:
head -n 10 /path/to/file | grep [...]
The output of head -10 file can be piped to grep in order to accomplish this:
head -10 file | grep …
Using Perl:
perl -ne 'last if $. > 10; print if /pattern/' file
An extension to Joachim Isaksson's answer: Quite often I need something from the middle of a long file, e.g. lines 5001 to 5020, in which case you can combine head with tail:
head -5020 file.txt | tail -20 | grep x
This gets the first 5020 lines, then shows only the last 20 of those, then pipes everything to grep.
(Edited: fencepost error in my example numbers, added pipe to grep)
grep -A 10 <Pattern>
This is to grab the pattern and the next 10 lines after the pattern. This would work well only for a known pattern, if you don't have a known pattern use the "head" suggestions.
grep -m6 "string" cov.txt
This stops grep after the first 6 matching lines. Note that -m limits the number of matches, not the number of lines searched, so it only restricts the search to the first 6 lines if every line matches.
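A quick demonstration that -m limits matches rather than lines searched (cov.txt here is generated sample data):

```shell
# Six lines; "string" appears on lines 4 and 6.
printf '%s\n' x x x string y string > cov.txt

grep -m1 "string" cov.txt   # stops after the first match, on line 4
grep -c "string" cov.txt    # 2: both matches are counted without -m
```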
