Getting head to display all but the last line of a file: command substitution and standard I/O redirection - bash

I have been trying to get the head utility to display all but the last line of standard input. The actual code that I needed is something along the lines of cat myfile.txt | head -n $(($(wc -l)-1)). But that didn't work. I'm doing this on Darwin/OS X which doesn't have the nice semantics of head -n -1 that would have gotten me similar output.
None of these variations work either.
cat myfile.txt | head -n $(wc -l | sed -E -e 's/\s//g')
echo "hello" | head -n $(wc -l | sed -E -e 's/\s//g')
I tested out more variations and in particular found this to work:
cat <<EOF | echo $(($(wc -l)-1))
>Hola
>Raul
>Como Esta
>Bueno?
>EOF
3
Here's something simpler that also works.
echo "hello world" | echo $(($(wc -w)+10))
This one understandably gives me an illegal line count error. But it at least tells me that the head program is not consuming the standard input before passing stuff on to the subshell/command substitution, a remote possibility, but one that I wanted to rule out anyway.
echo "hello" | head -n $(cat && echo 1)
What explains the behavior of head and wc and their interaction through subshells here? Thanks for your help.

With GNU head (Linux, or GNU coreutils installed elsewhere), head -n -1 will give you all except the last line of its input. The BSD head shipped with OS X doesn't accept negative counts.

head is the wrong tool. If you want to see all but the last line, use:
sed \$d
The reason that
# Sample of incorrect code:
echo "hello" | head -n $(wc -l | sed -E -e 's/\s//g')
fails is that wc consumes all of the input and there is nothing left for head to see. wc inherits its stdin from the subshell in which it is running, which is reading from the output of the echo. Once it consumes the input, it returns and then head tries to read the data...but it is all gone. If you want to read the input twice, the data will have to be saved somewhere.
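A minimal sketch of that save-it-somewhere approach, assuming mktemp is available and the data is small enough to spool to a temporary file:
tmp=$(mktemp) || exit 1
cat myfile.txt > "$tmp"      # or redirect whatever pipeline produces your data into "$tmp"
head -n $(($(wc -l < "$tmp") - 1)) "$tmp"
rm -f "$tmp"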

Using sed:
sed '$d' filename
will delete the last line of the file.
$ seq 1 10 | sed '$d'
1
2
3
4
5
6
7
8
9

For Mac OS X specifically, I found an answer from a comment to this Q&A.
Assuming you are using Homebrew, run brew install coreutils then use the ghead command:
cat myfile.txt | ghead -n -1
Or, equivalently:
ghead -n -1 myfile.txt
Lastly, see brew info coreutils if you'd like to use the commands without the g prefix (e.g., head instead of ghead).
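The caveat that brew info coreutils prints boils down to putting the gnubin directory at the front of your PATH; the exact location can vary with your install, but it is usually along these lines:
export PATH="$(brew --prefix coreutils)/libexec/gnubin:$PATH"
head -n -1 myfile.txt    # now picks up GNU head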

cat myfile.txt | echo $(($(wc -l)-1))
This works. It's overly complicated: you could just write echo $(($(wc -l)-1)) <myfile.txt or echo $(($(wc -l <myfile.txt)-1)). The problem is the way you're using it.
cat myfile.txt | head -n $(wc -l | sed -E -e 's/\s//g')
wc consumes all the input as it's counting the lines. So there is no data left to read in the pipe by the time head is started.
If your input comes from a file, you can redirect both wc and head from that file.
head -n $(($(wc -l <myfile.txt) - 1)) <myfile.txt
If your data may come from a pipe, you need to duplicate it. The usual tool to duplicate a stream is tee, but that isn't enough here, because the two outputs from tee are produced at the same rate, whereas here wc needs to fully consume its input before head can start. So instead, you'll need to use a single tool that can detect the last line, which is a more efficient approach anyway.
Conveniently, sed offers a way of matching the last line. Either printing all lines but the last, or suppressing the last output line, will work:
sed -n '$! p'
sed '$ d'

Here is a one-liner that can get you the desired output, and it can be used more generally for getting all lines from a file except the last n lines.
grep -n "" myfile.txt \ # output the line number for each line
| sort -nr \ # reverse the file by using those line numbers
| sed '1,4d' \ # delete first 4 lines (last 4 of the original file)
| sort -n \ # reverse the reversed file (correct the line order)
| sed 's/^[0-9]*://' # remove the added line numbers
Here is the above command in an actual single line and runnable (can't execute the above due to the added comments):
grep -n "" myfile.txt | sort -nr | sed '1,4d' | sort -n | sed 's/^[0-9]*://'
It's a little cumbersome, and this problem can be solved with more comprehensive commands like ghead, but when you can't or don't want to download such tools, it's nice to be able to do this with the more basic options. I've been in situations where it's simply not an option to get better tools.
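If you want the "except the last n lines" count to be adjustable, the same pipeline works with the number in a shell variable; a sketch with n=4, matching the example above:
n=4
grep -n "" myfile.txt | sort -nr | sed "1,${n}d" | sort -n | sed 's/^[0-9]*://'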

awk 'NR>1{print p}{p=$0}'
For this job, an awk one-liner is a bit longer than a sed one.
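A quick sanity check with seq (any POSIX awk should behave the same way):
$ seq 1 5 | awk 'NR>1{print p}{p=$0}'
1
2
3
4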

Related

Why doesn't this sed command put a newline

I have a file, ciao.py, that has only one line in it: print("ciao")
I want to duplicate that line via a pipe stream. Doing it all in sed with cat ciao.py | sed 's/.*/&\n&/' would work, but I want to do it in two separate parts, simulating the case where I want to print the line and then pass it on to further commands.
If I do this:
cat ciao.py | sed 's/.*/&\n/' |tee >(xargs echo) | xargs echo
it does not work. It prints print("ciao") print("ciao") on the same line. I don't understand why, since I am adding a \n with sed.
I'd guess print(ciao) is appearing twice on the same line because xargs is calling echo with multiple arguments: by default, xargs passes the command you give it groups of input lines at a time.
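You can see that default grouping with a throwaway example (nothing specific to your file):
$ printf 'a\nb\nc\n' | xargs echo
a b c
$ printf 'a\nb\nc\n' | xargs -n 1 echo
a
b
c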
Is this what you're trying to do?
$ cat ciao.py | sed 's/.*/&\n/' |tee >(xargs -n 1 echo) | xargs -n 1 echo
print(ciao)
print(ciao)
or:
$ cat ciao.py | sed 's/.*/&\n/' |tee >(cat) | xargs -n 1 echo
print(ciao)
print(ciao)
There are, of course, better ways to get that output from that input, e.g.:
$ sed 'p' ciao.py
print("ciao")
print("ciao")

Make grep stop after first NON-matching line

I'm trying to use grep to go through some logs and only select the most recent entries. The logs have years of heavy traffic on them so it's silly to do
tac error.log | grep 2012
tac error.log | grep "Jan.2012"
etc.
and wait for 10 minutes while it goes through several million lines which I already know are not going to match. I know there is the -m option to stop at the first match but I don't know of a way to make it stop at first non-match. I could do something like grep -B MAX_INT -m 1 2011 but that's hardly an optimal solution.
Can grep handle this or would awk make more sense?
How about using awk like this:
tac error.log | awk '{if(/2012/)print;else exit}'
This should exit as soon as a line not matching 2012 is found.
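A quick way to convince yourself, using made-up input rather than your real log:
$ printf '2012 ok\n2012 also ok\n2011 older\n2012 never reached\n' | awk '{if(/2012/)print;else exit}'
2012 ok
2012 also ok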
Here is a solution in python:
# foo.py
import sys, re
for line in sys.stdin:
    if re.match(r'2012', line):
        print line,
        continue
    break
you#host> tac foo.txt | python foo.py
I don't think grep supports this.
But here is my "why did we have awk again" answer:
tail -n `tac biglogfile | grep -vnm1 2012 | sed 's/:.*//' | xargs expr -1 +` biglogfile
Note that this isn't going to be exact if your log is being written to.
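In case the one-liner is hard to read, the same idea split into steps looks roughly like this (the variable n is just for illustration):
n=$(tac biglogfile | grep -vnm1 2012 | sed 's/:.*//')   # position, counted from the end, of the first non-matching line
tail -n $((n - 1)) biglogfile                           # everything after it, i.e. the recent matching lines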
The excellent one-line scripts for sed page to the rescue:
# print section of file between two regular expressions (inclusive)
sed -n '/Iowa/,/Montana/p' # case sensitive
In other words, you should be able to do the following:
sed -n '/Jan 01 2012/,/Feb 01 2012/p' error.log | grep whatevs
Here is an example that parses the user's dot-plan file and stops at the first non-matching line:
PID=$$
while read ln; do
    echo $ln | {
        if grep "^[-*+] " >/dev/null; then
            # matched
            echo -e $ln
        elif grep "^[#]" >/dev/null; then
            # ignore comment line
            :
        else
            # stop at first non-matching line
            kill $PID
        fi
    }
done <$HOME/.plan
Of course this approach is considerably slower than letting grep read the lines itself, but at least you can incorporate several cases (not just the non-match).
For more complex scripts, it is worth noting that Bash can also apply regular expressions to variables, i.e. you can also do completely without grep.
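For example, a sketch of the same stop-at-first-non-match idea using only bash's [[ =~ ]] (adjust the pattern to your log format):
tac error.log | while IFS= read -r line; do
    [[ $line =~ 2012 ]] || break    # stop at the first line that does not match
    printf '%s\n' "$line"
done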

get the second last line from shell pipeline

I want to get the second last line from the ls -l output.
I know that
ls -l|tail -n 2| head -n 1
can do this, just wondering if sed can do this in just one command?
ls -l|sed -n 'x;$p'
It can't do third to last though, because sed only has one hold space, so it can only remember one older line. And since it processes the lines one at a time, it does not know the line will be next to last when processing it. awk could return third to last, because you can have an arbitrary number of variables there, but the script would be much longer than tail -n X | head -n 1.
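For what it's worth, a third-to-last version in awk could look something like this (it buffers every line, so it's not great for huge input):
ls -l | awk '{v[NR]=$0} END{print v[NR-2]}'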
In an awk one-liner:
echo -e "aaa\nbbb\nccc\nddd" | awk '{v[c++]=$0}END{print v[c-2]}'
ccc
Try this to print the second-last line of a file:
sed -e '$!{h;d;}' -e x filename
tac filename | sed -n 2p
-- but involves a pipe, too

Command to get nth line of STDOUT

Is there any bash command that will let you get the nth line of STDOUT?
That is to say, something that would take this
$ ls -l
-rw-r--r--@ 1 root wheel my.txt
-rw-r--r--@ 1 root wheel files.txt
-rw-r--r--@ 1 root wheel here.txt
and do something like
$ ls -l | magic-command 2
-rw-r--r--@ 1 root wheel files.txt
I realize this would be bad practice when writing scripts meant to be reused, BUT when working with the shell day to day it'd be useful to me to be able to filter my STDOUT in such a way.
I also realize this would be semi-trivial command to write (buffer STDOUT, return a specific line), but I want to know if there's some standard shell command to do this that would be available without me dropping a script into place.
Using sed, just for variety:
ls -l | sed -n 2p
This alternative looks more efficient since it stops reading the input once the required line is printed, but it may generate a SIGPIPE in the feeding process, which may in turn produce an unwanted error message:
ls -l | sed -n -e '2{p;q}'
I've seen that often enough that I usually use the first (which is easier to type, anyway), though ls is not a command that complains when it gets SIGPIPE.
For a range of lines:
ls -l | sed -n 2,4p
For several ranges of lines:
ls -l | sed -n -e 2,4p -e 20,30p
ls -l | sed -n -e '2,4p;20,30p'
ls -l | head -2 | tail -1
Alternative to the nice head / tail way:
ls -al | awk 'NR==2'
or
ls -al | sed -n '2p'
From sed1line:
# print line number 52
sed -n '52p' # method 1
sed '52!d' # method 2
sed '52q;d' # method 3, efficient on large files
From awk1line:
# print line number 52
awk 'NR==52'
awk 'NR==52 {print;exit}' # more efficient on large files
For the sake of completeness ;-)
shorter code
find / | awk NR==3
shorter life
find / | awk 'NR==3 {print $0; exit}'
Try this sed version:
ls -l | sed '2 ! d'
It says "delete all the lines that aren't the second one".
You can use awk:
ls -l | awk 'NR==2'
Update
The above will not give what we want here because of an off-by-one error: the first line of ls -l output is the total line. The following revised code will work instead:
ls -l | awk 'NR==3'
Another poster suggested
ls -l | head -2 | tail -1
but if you pipe head into tail, it looks like everything up to line N is processed twice.
Piping tail into head
ls -l | tail -n +2 | head -n1
would be more efficient?
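If you want to check which form is faster on your own machine, you can time both against a large throwaway file (big.txt here is just a hypothetical test file):
seq 1 10000000 > big.txt
time head -1000000 big.txt | tail -1
time tail -n +1000000 big.txt | head -n1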
Is Perl easily available to you?
$ perl -n -e 'if ($. == 7) { print; exit(0); }'
Obviously substitute whatever number you want for 7.
Yes, the most efficient way (as already pointed out by Jonathan Leffler) is to use sed with print & quit:
set -o pipefail # cf. help set
time -p ls -l | sed -n -e '2{p;q;}' # only print the second line & quit (on Mac OS X)
echo "$?: ${PIPESTATUS[*]}" # cf. man bash | less -p 'PIPESTATUS'
Hmm
sed did not work in my case.
I propose:
for "odd" lines 1,3,5,7... ls |awk '0 == (NR+1) % 2'
for "even" lines 2,4,6,8 ls |awk '0 == (NR) % 2'
For more completeness..
ls -l | (for ((x=0;x<2;x++)) ; do read ; done ; head -n1)
Throw away lines until you get to the second, then print out the first line after that. So, it prints the 3rd line.
If it's just the second line..
ls -l | (read; head -n1)
Put as many 'read's as necessary.
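Or wrap the idea in a small function so the count is a parameter (nth is just an illustrative name):
nth() { local i; for ((i=1; i<$1; i++)); do read -r || return; done; head -n1; }
ls -l | nth 2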

How do you pipe input through grep to another utility?

I am using 'tail -f' to follow a log file as it's updated; next I pipe the output of that to grep to show only the lines containing a search term ("org.springframework" in this case); finally, I'd like to pipe the output from grep to a third command, 'cut':
tail -f logfile | grep org.springframework | cut -c 25-
The cut command would remove the first 25 characters of each line for me if it could get the input from grep! (It works as expected if I eliminate 'grep' from the chain.)
I'm using cygwin with bash.
Actual results: When I add the second pipe to connect to the 'cut' command, the result is that it hangs, as if it's waiting for input (in case you were wondering).
Assuming GNU grep, add --line-buffered to your command line, eg.
tail -f logfile | grep --line-buffered org.springframework | cut -c 25-
Edit:
I see grep buffering isn't the only problem here, as cut doesn't allow linewise buffering.
you might want to try replacing it with something you can control, such as sed:
tail -f logfile | sed -u -n -e '/org\.springframework/ s/\(.\{0,25\}\).*$/\1/p'
or awk
tail -f logfile | awk '/org\.springframework/ {print substr($0, 0, 25);fflush("")}'
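If GNU coreutils' stdbuf is available (it usually is on Linux, though not necessarily under Cygwin), another option is to keep cut and just force line-buffered output on it, something like:
tail -f logfile | grep --line-buffered org.springframework | stdbuf -oL cut -c 25-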
On my system, about 8K was buffered before I got any output. This sequence worked to follow the file immediately:
tail -f logfile | while read line ; do echo "$line"| grep 'org.springframework'|cut -c 25- ; done
What you have should work fine -- that's the whole idea of pipelines. The only thing I'd point out is that, in the version of cut I have (GNU coreutils 6.10), cut -c 25- keeps everything from character 25 onward, i.e. it removes the first 24 characters rather than the first 25.
You're also searching for different patterns in your two examples, in case that's relevant.
