How can I send the output of the last pipe to two different commands? - bash

So, I have a text file with a bunch of numbers, one number per line. I do:
cat filename.txt | sort -n | head -1 to get the lowest number, and cat filename.txt | sort -n | tail -1 to get the highest number.
Is there a way to send the output of cat filename.txt | sort -n to two different commands in one line, and have the output (the highest number and the lowest number) next to each other?

You can do interesting things with tee and process substitutions, but the order of the output may not be stable (due to timing of processes)
sort -n filename.txt | tee >(tail -1 >/dev/tty) | head -1
In this case, I'd use sed to print the first and last line:
sort -n filename.txt | sed -n '1p; $p'
As @chepner suggests:
... | sed -n '1p; $p' | paste - - # tab separated
or
... | awk 'NR == 1 {first = $0} END {print first, $0}' # space separated
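For example, with a hypothetical three-number input:
$ printf '3\n1\n2\n' | sort -n | sed -n '1p; $p' | paste - -
1	3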

There is a useful command, tee: tee second.txt copies its input to the file second.txt as well as to standard output.
You can combine that with bash process substitution, e.g. tee >(wc).
So you can feed two or more commands, e.g. tee >(wc) | head
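Putting that together for this question, a sketch (min.txt and max.txt are hypothetical names; the >( ) job runs asynchronously, so its file may lag briefly behind the pipeline):
sort -n filename.txt | tee >(tail -n 1 > max.txt) | head -n 1 > min.txt
An order-stable alternative is to capture the sorted output once, e.g. sorted=$(sort -n filename.txt), and run head -1 and tail -1 on "$sorted" via here-strings.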

The flow of stdout from combined commands

I need to edit a bash script that sorts .vcf files. vcf files are roughly structured as shown below:
## header line
## header line
…
Data line
Data line
…
The script is called vcfsort and is part of a library for manipulating vcf files. It looks like this:
head -1000 $1 | grep "^#"; cat $@ | grep -v "^#" | sort -k1,1d -k2,2n
And it is run by writing vcfsort input.vcf > output.vcf.
I understand roughly what it does: since sorting should only be done on the data lines, it gets the header lines:
head -1000 $1 | grep "^#";
And combines it with sorted data lines:
cat $@ | grep -v "^#" | sort -k1,1d -k2,2n
I need the head command to read more lines. Instead of calling vcfsort like above, I thought I could just edit the script myself and write it out directly as a command like this:
head -10000 input.vcf | grep "^#"; cat input.vcf | grep -v "^#" | sort -k1,1d -k2,2n > output.vcf
This does not work as expected. My attempt above writes the correct output to stdout, if I leave out > output.vcf. However, if I include it, only the data lines are written to file and the header lines are written to stdout. So, I have a couple of questions:
In this stack overflow answer, it is said that to combine
semicolon-separated commands, they should be enclosed in parentheses. Why is that not the case in the vcfsort script?
Why is $@ used in the cat command instead of $1? $@ should refer to all of a shell script's arguments, but since only one is given (the input file), why not just use $1? If there is a reason for this, how can I transfer that to my command-line expression?
Why do I only get part of the stdout when I send it to a file?
Could you show me the edits I need to make to get my command to work as intended?
So the script takes the first 1000 lines of the first file, keeps only the header lines, and copies those comments to the output.
Next, it filters out the comment lines (leaving only data lines) from all of the files, and sorts them.
So if you use
vcfsort file1 file2 file3
then $1 = "file1", and only the header from file1 will appear in the output,
while $@ refers to all of the files: "file1 file2 file3".
If you need to get the headers from all files and merge them, I would recommend using a loop:
for file in "$@"; do
    head -1000 "$file" | grep "^#"
done
cat "$@" | grep -v "^#" | sort -k1,1d -k2,2n
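Saved back into vcfsort, it is called the same way as before (file names here are only examples):
vcfsort file1.vcf file2.vcf file3.vcf > merged.vcf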
Why do I only get part of the stdout when I send it to a file?
head -10000 input.vcf | grep "^#"; cat input.vcf | grep -v "^#" | sort -k1,1d -k2,2n > output.vcf
Each command executes separately (they are divided by the semicolon ";"). So in the example above you are only redirecting the sorted data-line output; the header part is never redirected to the file.
I would recommend dropping the redirection from the command and just using:
vcfsort input.vcf > output.vcf
This does not work as expected
May I know what was expected?
There are two command lists, separated by a ;, inside vcfsort:
head -1000 $1 | grep "^#"
cat $@ | grep -v "^#" | sort -k1,1d -k2,2n
Each list is a single pipeline, and the final command of each pipeline inherits its standard output from vcfsort, so that when you run
vcfsort input.vcf > output.vcf
both grep and sort write to output.vcf.
The equivalent using braces would be (replacing ; with a newline for readability)
# Quoting the parameter expansions is important, to protect
# against word-splitting and pathname expansion of the original arguments.
{ head -1000 "$1" | grep "^#"
cat "$#" | grep -v "^#" | sort -k1,1d -k2,2n
} > output.vcf
Output redirections apply only to a single command, not a command list. Here, a command group serves as that single command:
the standard output of the command group is output.vcf, and the two lists in the group inherit that just as before.
Your attempt
head -10000 input.vcf | grep "^#"; cat input.vcf | grep -v "^#" | sort -k1,1d -k2,2n > output.vcf
only opened output.vcf to use as the standard output for sort; the standard output of grep remains whatever it inherited from its parent, namely your terminal.
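Applied to your one-liner, a sketch of the fix using the same grouping (your file names and the larger head count):
{ head -10000 input.vcf | grep "^#"; cat input.vcf | grep -v "^#" | sort -k1,1d -k2,2n; } > output.vcf
The space after { and the ; before } are required by the shell's grammar.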

Deleting each line in a file from index specified in another file in bash [duplicate]

I want to delete one or more specific line numbers from a file. How would I do this using sed?
If you want to delete lines 5 through 10 and line 12:
sed -e '5,10d;12d' file
This will print the results to the screen. If you want to save the results to the same file:
sed -i.bak -e '5,10d;12d' file
This will store the unmodified file as file.bak, and delete the given lines.
Note: Line numbers start at 1. The first line of the file is 1, not 0.
You can delete a particular single line by its line number:
sed -i '33d' file
This will delete line 33 and save the updated file.
And awk works as well:
awk 'NR!~/^(5|10|25)$/' file
$ cat foo
1
2
3
4
5
$ sed -e '2d;4d' foo
1
3
5
$
This is very often a symptom of an antipattern. The tool which produced the line numbers may well be replaced with one which deletes the lines right away. For example,
grep -nh error logfile | cut -d: -f1 | deletelines logfile
(where deletelines is the utility you are imagining you need) is the same as
grep -v error logfile
Having said that, if you are in a situation where you genuinely need to perform this task, you can generate a simple sed script from the file of line numbers. Humorously (but perhaps slightly confusingly) you can do this with sed.
sed 's%$%d%' linenumbers
This accepts a file of line numbers, one per line, and produces, on standard output, the same line numbers with d appended after each. This is a valid sed script, which we can save to a file, or (on some platforms) pipe to another sed instance:
sed 's%$%d%' linenumbers | sed -f - logfile
On some platforms, sed -f does not understand the option argument - to mean standard input, so you have to redirect the script to a temporary file and clean it up when you are done, or perhaps replace the lone dash with /dev/stdin or /dev/fd/0 if your OS (or shell) has that.
As always, you can add -i before the -f option to have sed edit the target file in place, instead of producing the result on standard output. On *BSDish platforms (including OSX) you need to supply an explicit argument to -i as well; a common idiom is to supply an empty argument; -i ''.
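For example, both in-place spellings (a sketch; script.sed is assumed to hold the generated d commands):
sed -i -f script.sed logfile       # GNU sed
sed -i '' -f script.sed logfile    # BSD/macOS sed: explicit empty suffix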
The shortest, deleting the first line with sed:
sed -i '1d' file
As Brian states here, the form is <address><command>, where the <address> is 1 and the <command> is d.
I would like to propose a generalization with awk: when the file is made of blocks of a fixed size, and the lines to delete are repeated for each block, awk can work well:
awk '{nl=((NR-1)%2000)+1; if ( (nl<714) || ((nl>1025)&&(nl<1029)) ) print $0}' OriginFile.dat > MyOutputCuttedFile.dat
In this example the block size is 2000 and I want to print the lines [1..713] and [1026..1028].
NR is the variable used by awk to store the current line number.
% gives the remainder (or modulus) of the division of two integers;
nl=((NR-1)%BLOCKSIZE)+1 Here we write in the variable nl the line number inside the current block. (see below)
|| and && are the logical operators OR and AND.
print $0 writes the full line
Why ((NR-1)%BLOCKSIZE)+1:
(NR-1) We need a shift of one because 1%3=1, 2%3=2, but 3%3=0.
+1 We add again 1 because we want to restore the desired order.
+-----+------+----------+------------+
| NR | NR%3 | (NR-1)%3 | (NR-1)%3+1 |
+-----+------+----------+------------+
| 1 | 1 | 0 | 1 |
| 2 | 2 | 1 | 2 |
| 3 | 0 | 2 | 3 |
| 4 | 1 | 0 | 1 |
+-----+------+----------+------------+
cat -b /etc/passwd | sed -E 's/^( )+(<line_number>)(\t)(.*)/--removed---/g;s/^( )+([0-9]+)(\t)//g'
cat -b -> numbers each non-blank line
s/^( )+(<line_number>)(\t)(.*)/--removed---/g -> replaces the content of the target line with a placeholder (<line_number> is for you to fill in)
s/^( )+([0-9]+)(\t)//g -> removes the line numbers that cat printed

Getting head to display all but the last line of a file: command substitution and standard I/O redirection

I have been trying to get the head utility to display all but the last line of standard input. The actual code that I needed is something along the lines of cat myfile.txt | head -n $(($(wc -l)-1)). But that didn't work. I'm doing this on Darwin/OS X which doesn't have the nice semantics of head -n -1 that would have gotten me similar output.
None of these variations work either.
cat myfile.txt | head -n $(wc -l | sed -E -e 's/\s//g')
echo "hello" | head -n $(wc -l | sed -E -e 's/\s//g')
I tested out more variations and in particular found this to work:
cat <<EOF | echo $(($(wc -l)-1))
>Hola
>Raul
>Como Esta
>Bueno?
>EOF
3
Here's something simpler that also works.
echo "hello world" | echo $(($(wc -w)+10))
This one understandably gives me an illegal line count error. But it at least tells me that the head program is not consuming the standard input before passing stuff on to the subshell/command substitution, a remote possibility, but one that I wanted to rule out anyway.
echo "hello" | head -n $(cat && echo 1)
What explains the behavior of head and wc and their interaction through subshells here? Thanks for your help.
head -n -1 will give you all except the last line of its input.
head is the wrong tool. If you want to see all but the last line, use:
sed \$d
The reason that
# Sample of incorrect code:
echo "hello" | head -n $(wc -l | sed -E -e 's/\s//g')
fails is that wc consumes all of the input and there is nothing left for head to see. wc inherits its stdin from the subshell in which it is running, which is reading from the output of the echo. Once it consumes the input, it returns and then head tries to read the data...but it is all gone. If you want to read the input twice, the data will have to be saved somewhere.
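If you really do need to read piped data twice, one workaround (a sketch, not from the answers above; the temp file is my addition) is to buffer the stream first:
tmp=$(mktemp)
cat > "$tmp"                                  # drain stdin into a temp file
head -n "$(($(wc -l < "$tmp") - 1))" "$tmp"   # now the data can be read again
rm -f -- "$tmp"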
Using sed:
sed '$d' filename
will delete the last line of the file.
$ seq 1 10 | sed '$d'
1
2
3
4
5
6
7
8
9
For Mac OS X specifically, I found an answer from a comment to this Q&A.
Assuming you are using Homebrew, run brew install coreutils then use the ghead command:
cat myfile.txt | ghead -n -1
Or, equivalently:
ghead -n -1 myfile.txt
Lastly, see brew info coreutils if you'd like to use the commands without the g prefix (e.g., head instead of ghead).
cat myfile.txt | echo $(($(wc -l)-1))
This works. It's overly complicated: you could just write echo $(($(wc -l)-1)) <myfile.txt or echo $(($(wc -l <myfile.txt)-1)). The problem is the way you're using it.
cat myfile.txt | head -n $(wc -l | sed -E -e 's/\s//g')
wc consumes all the input as it's counting the lines. So there is no data left to read in the pipe by the time head is started.
If your input comes from a file, you can redirect both wc and head from that file.
head -n $(($(wc -l <myfile.txt) - 1)) <myfile.txt
If your data may come from a pipe, you need to duplicate it. The usual tool to duplicate a stream is tee, but that isn't enough here, because the two outputs from tee are produced at the same rate, whereas here wc needs to fully consume its input before head can start. So instead, you'll need to use a single tool that can detect the last line, which is a more efficient approach anyway.
Conveniently, sed offers a way of matching the last line. Either printing all lines but the last, or suppressing the last output line, will work:
sed -n '$! p'
sed '$ d'
Here is a one-liner that can get you the desired output, and it can be used more generally for getting all lines from a file except the last n lines.
grep -n "" myfile.txt \ # output the line number for each line
| sort -nr \ # reverse the file by using those line numbers
| sed '1,4d' \ # delete first 4 lines (last 4 of the original file)
| sort -n \ # reverse the reversed file (correct the line order)
| sed 's/^[0-9]*://' # remove the added line numbers
Here is the above command in an actual single line and runnable (can't execute the above due to the added comments):
grep -n "" myfile.txt | sort -nr | sed '1,4d' | sort -n | sed 's/^[0-9]*://'
It's a little cumbersome, and this problem can be solved with more comprehensive commands like ghead, but when you can't or don't want to download such tools, it's nice to be able to do this with the more basic options. I've been in situations where it's simply not an option to get better tools.
awk 'NR>1{print p}{p=$0}'
For this job, an awk one-liner is a bit longer than a sed one.
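For example:
$ seq 1 5 | awk 'NR>1{print p}{p=$0}'
1
2
3
4
Each line is stashed in p and printed one record late, so the last line is never emitted.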

remove n lines from STDOUT on bash

Do you have any bash solution to remove the last N lines from stdout?
Like the 'head' command: print all lines except the last N.
A simple solution in bash:
find ./test_dir/ | sed '$d' | sed '$d' | sed '$d' | ...
but I need to copy the sed command N times.
Any better solution? (Except awk, python, etc.)
Use head with a negative number. In my example it will print all lines but last 3:
head -n -3 infile
If head -n -3 filename doesn't work on your system (like mine), you could also try the following approach (and maybe alias it or create a function in your .bashrc):
head -`echo "$(wc -l filename)" | awk '{ print $1 - 3; }'` filename
Where filename and 3 above are your file and number of lines respectively.
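A simpler equivalent (a sketch; reading the file via redirection keeps the filename out of wc's output, so the awk step isn't needed):
head -n "$(($(wc -l < filename) - 3))" filename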
The tail command accepts a +/- prefix on its line count on Mac OS / BSD (and GNU) systems. With a + prefix it counts from the start of the file: the expression below starts output at line 3, i.e. skips the first 2 lines.
tail -n +3 filename.ext
With a - prefix (the default), it counts from the end of the file instead, printing the last 3 lines:
tail -n -3 filename.ext
See a similar answer to a different question here: Print a file skipping first X lines in Bash
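For example:
$ seq 1 6 | tail -n +3   # start output at line 3
3
4
5
6
$ seq 1 6 | tail -n 3    # print the last 3 lines
4
5
6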

Linux commands to output part of input file's name and line count

What Linux commands would you use successively, for a bunch of files, to count the number of lines in each file and write, to an output file, a line containing part of the corresponding input file's name and its line count? So for example, if we were looking at the file LOG_Yellow and it had 28 lines, the output file would have a line like this (Yellow and 28 are tab-separated):
Yellow 28
wc -l [filenames] | grep -v " total$" | sed s/[prefix]//
The wc -l generates the output in almost the right format; grep -v removes the "total" line that wc generates for you; sed strips the junk you don't want from the filenames.
wc -l * | head --lines=-1 > output.txt
produces output like this:
linecount1 filename1
linecount2 filename2
I think you should be able to work from here to extend to your needs.
edit: since I haven't seen the rules for your name extraction, I still leave the full name. However, unlike other answers I'd prefer to use head rather than grep, which not only should be slightly faster, but also avoids accidentally filtering out files whose own names end in " total".
edit2 (having read the comments): the following does the whole lot:
wc -l * | head --lines=-1 | sed s/LOG_// | awk '{print $2 "\t" $1}' > output.txt
wc -l * | grep -v " total$"
sends
28 LOG_Yellow
You can strip the prefix with sed and reverse the two columns with awk (provided there are no spaces in the file names):
wc -l * | egrep -v " total$" | sed s/[prefix]// | awk '{print $2 " " $1}'
Short of writing the script for you:
'for' for looping through your files
'echo -n' for printing the current file's name
'wc -l' for finding out the line count
And don't forget to redirect ('>' or '>>') your results to your output file; a sketch of that loop follows below.
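A sketch of that loop (the LOG_ prefix and output.txt come from the question; stripping the prefix with ${f#LOG_} is my assumption about the desired name extraction):
for f in LOG_*; do
    lines=$(( $(wc -l < "$f") ))   # arithmetic expansion strips wc's padding
    printf '%s\t%s\n' "${f#LOG_}" "$lines"
done > output.txt
printf with '\t' gives the tab separator directly, and wc -l < "$f" keeps the filename out of wc's output.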
