Collecting commands at the same place in pipe - bash

how can one have multiple commands (collected together) at the same position in a pipe? i.e. what comes before in the pipe gets fed into each of those commands separately and sequentially, and their combined output gets fed into the next step of the pipe. For example, how could I make some variation of this pipe
echo -e "1\n2\n3" | { head -1; tail -1; } | xargs echo
print 1 3, instead of just 1?
Thanks

what comes before in the pipe gets fed into each of those commands separately and sequentially, and their combined output gets fed into the next step of the pipe
This is not possible since of the nature of a pipe. Once data is read from a pipe it gets removed from the pipe. That's why there can be only one reader.
I would copy the output of the first command into a temporary file and than feed it to both commands head and tail. If you launch head and tail in a subshell you can feed their combined output to another command:
Btw, a code block between curly brackets does not launch a subshell. You need to use parentheses. The following command seems the closest to what you want:
echo -e "1\n2\n3" > file.tmp ; ( head -1 file.tmp ; tail -qn1 file.tmp; rm file.tmp ) | next_command
About the special use case: Printing the first and the last line from a file
I would use sed for that purpose:
sed -n '1p;$p'
This will print the first and the last line of a file. However it works only for files which contain at least two lines. If the file contains a single line it would get printed twice. You can get around this restriction using the following command:
sed -n 'x;s/^/1/;x;1p;${x;/11/{x;p}}'
The command above adds a 1 to the hold buffer on every line. At the end of the script (which could be after the first line) it checks if there are at least two 1 in the hold buffer and prints the last line if this is true.

Related

Can I do a Bash wildcard expansion (*) on an entire pipeline of commands?

I am using Linux. I have a directory of many files, I want to use grep, tail and wildcard expansion * in tandem to print the last occurrence of <pattern> in each file:
Input: <some command>
Expected Output:
<last occurrence of pattern in file 1>
<last occurrence of pattern in file 2>
...
<last occurrence of pattern in file N>
What I am trying now is grep "pattern" * | tail -n 1 but the output contains only one line, which is the last occurrence of pattern in the last file. I assume the reason is because the * wildcard expansion happens before pipelining of commands, so the tail runs only once.
Does there exist some Bash syntax so that I can achieve the expected outcome, i.e. let tail run for each file?
I know I can always use a for-loop to solve the problem. I'm just curious if the problem can be solved with a more condensed command.
I've also tried grep -m1 "pattern" <(tac *), and it seems like the aforementioned reasoning still applies: wildcard expansion applies to only to the immediate command it is associated with, and the "outer" command runs only once.
Wildcards are expanded on the command line before any command runs. For example if you have files foo and bar in your directory and run grep pattern * | tail -n1 then bash transforms this into grep pattern foo bar | tail -n1 and runs that. Since there's only one stream of output from grep, there's only one stream of input to tail and it prints the last line of that stream.
If you want to search each file and print the last line of grep's output separately you can use a loop:
for file in * ; do
grep pattern "${file}" | tail -n1
done
The problem with non-loop solutions is that tail doesn't inherently know where the output of one file ends and the output of another file begins, or indeed that there are even files involved on the other end of the pipe. It just knows input is coming in from somewhere and it has to print the last line of that input. If you didn't want a loop, you'd have to use a more powerful tool like awk and perhaps use the fact that grep prepends the names of matched files (if multiple files are matched, or with -H) to delimit the start and end of outputs from each file. But, the work to write an awk program that keeps track of the current file to know when its output ends and print its last line is probably more effort than is worth when the loop solution is so simple.
You can achieve what you want using xargs. For your example it would be:
ls * | xargs -n 1 sh -c 'grep "pattern" $0 | tail -n 1'
Can save you from having to write a loop.
You can do this with awk, although (as tjm3772 pointed out in their answer) it's actually more complicated than the shell for loop. For the record, here's what I came up with:
awk -v pattern="YourPatternHere" '(FNR==1 && line!="") {print line; line=""}; $0~pattern {line=$0}; END {if (line!="") print line}'
Explanation: when it finds a matching line ($0~pattern), it stores that line in the line variable ({line=$0}) (this means that at the end of the file, line will hold the last matching line.
(Note: if you want to just include a literal pattern in the program, remove the -v pattern="YourPatternHere" part and replace $0~pattern with just /YourPatternHere/)
There's no simple trigger to print a match at the end of each file, so that part's split into two pieces: if it's the first line of a file AND line is set because of a match in the previous file ((FNR==1 && line!="")), print line and then clear it so it's not mistaken for a match in the current file ({print line; line=""}). Finally, at the end of the final file (END), print a match found in that last file if there was one ({if (line!="") print line}).
Also, note that the print-at-beginning-of-new-file test must be before the check for a matching line, or else it'll get very confused if the first line of the new file matches.
So... yeah, a shell for loop is simpler (and much easier to get right).

echo last character of text file in Unix/Bash

I need to see the last characters of bunch of text files (or alternatively test whether they are "}" and give a list of files that test negative ). Is there an easy way to do this from the command line.
(Ideally the solution works without reading the whole file from the start because in addition to there being many they can also be quite large.
P.S.: Any answer would be great but I would really appreciate if the function and syntax of everything in the answer can be fully explained.
It can be done fairly easily with tail and then string indexing in bash. For example, you obtain the last line in a file with, tail -n1 file. You will need to store the line in a variable using command-substitution, e.g.
lastln=$(tail -n1 file)
Then it is simply a matter of indexing the last characters, e.g.
echo ${lastln:(-1)}
(note: when indexing from the end of the string, you must put the offset (e.g. -1 in parenthesis (-1) -- or -- you must leave a space before the -1, e.g. echo ${lastln: -1} is also valid.)
You can try this:
for file in file1 file2; do tail -n 1 "$file" | grep -q '}$' || echo "$file"; done
where you should replace file1 file2 with the list of files you want to analyze, e.g. * or the like. Now what happens here? The outer part
for file in file1 file2; do ...; done
is a simple loop over the files, where inside the loop, you can refer to the current file as $file. Then,
tail -n 1 "$file"
prints the last line of the given file and
| grep -q '}$'
redirects the output to grep (turned into silent mode with -q), which looks for '}' immediatly followed by the end of the line ($). The return value of this command can be used to chain another action: when grep returns non-zero (indicating failure, i.e., the pattern is not matched), the last part
|| echo "$file"
is executed, resulting in the list of files you need.

How to extract (read and delete) a line from file with a single command?

I would like to extract the first line from a file, read into a variable and delete right afterwards, with a single command. I know sed can read the first line as follows:
sed '1q' file.txt
or delete it as follows:
sed '1q;d' file.txt
but can I somehow do both with a single command?
The reason for this is that multiple processes will be reading the first line of the file, and I want to minimize the chances of them getting the same line.
It's impossible.
Except you read the manpage, and have Gnu-sed:
echo -e {1..3}"\n" > input
cat input
1
2
3
sed -n '1p;2,$ Woutput' input
1
cat output
2
3
Explanation:
sed -n '1p;2,$ Woutput' input
-n no output by default
1p; print line 1
2,$ from line 2 until $ last line
W (non posix) Write buffer to file
From the man page gnu sed:
w filename
Write the current pattern space to filename.
W filename
Write the first line of the current pattern space to filename. This is a GNU extension.
However, reading and experimenting takes longer, than opening the file in a full blown office suite and deleting the line by hand, or invoking a text-to-speech framework and training it, to do the job.
It doesn't work if invoked in posix style:
sed -n --posix '1p;2,$ Woutput' input
And you still have the hard hanwork of renaming output to input again.
I didn't try to write to input in place, because that could damage my carefully crafted input file - try it on own risk:
sed -n '1p;2,$ Winput' input
However, you might set up a filesystem notify job, which always rename freshly created output files to input again. But I fear you can't do it from within the sed command. Except ... (to be continued)

Using both GNU Utils with Mac Utils in bash

I am working with plotting extremely large files with N number of relevant data entries. (N varies between files).
In each of these files, comments are automatically generated at the start and end of the file and would like to filter these out before recombining them into one grand data set.
Unfortunately, I am using MacOSx, where I encounter some issues when trying to remove the last line of the file. I have read that the most efficient way was to use head/tail bash commands to cut off sections of data. Since head -n -1 does not work for MacOSx I had to install coreutils through homebrew where the ghead command works wonderfully. However the command,
tail -n+9 $COUNTER/test.csv | ghead -n -1 $COUNTER/test.csv >> gfinal.csv
does not work. A less than pleasing workaround was I had to separate the commands, use ghead > newfile, then use tail on newfile > gfinal. Unfortunately, this will take while as I have to write a new file with the first ghead.
Is there a workaround to incorporating both GNU Utils with the standard Mac Utils?
Thanks,
Keven
The problem with your command is that you specify the file operand again for the ghead command, instead of letting it take its input from stdin, via the pipe; this causes ghead to ignore stdin input, so the first pipe segment is effectively ignored; simply omit the file operand for the ghead command:
tail -n+9 "$COUNTER/test.csv" | ghead -n -1 >> gfinal.csv
That said, if you only want to drop the last line, there's no need for GNU head - OS X's own BSD sed will do:
tail -n +9 "$COUNTER/test.csv" | sed '$d' >> gfinal.csv
$ matches the last line, and d deletes it (meaning it won't be output).
Finally, as #ghoti points out in a comment, you could do it all using sed:
sed -n '9,$ {$!p;}' file
Option -n tells sed to only produce output when explicitly requested; 9,$ matches everything from line 9 through (,) the end of the file (the last line, $), and {$!p;} prints (p) every line in that range, except (!) the last ($).
I realize that your question is about using head and tail, but I'll answer as if you're interested in solving the original problem rather than figuring out how to use those particular tools to solve the problem. :)
One method using sed:
sed -e '1,8d;$d' inputfile
At this level of simplicity, GNU sed and BSD sed both work the same way. Our sed script says:
1,8d - delete lines 1 through 8,
$d - delete the last line.
If you decide to generate a sed script like this on-the-fly, beware of your quoting; you will have to escape the dollar sign if you put it in double quotes.
Another method using awk:
awk 'NR>9{print last} NR>1{last=$0}' inputfile
This works a bit differently in order to "recognize" the last line, capturing the previous line and printing after line 8, and then NOT printing the final line.
This awk solution is a bit of a hack, and like the sed solution, relies on the fact that you only want to strip ONE final line of the file.
If you want to strip more lines than one off the bottom of the file, you'd probably want to maintain an array that would function sort of as a buffered FIFO or sliding window.
awk -v striptop=8 -v stripbottom=3 '
{ last[NR]=$0; }
NR > striptop*2 { print last[NR-striptop]; }
{ delete last[NR-striptop]; }
END { for(r in last){if(r<NR-stripbottom+1) print last[r];} }
' inputfile
You specify how much to strip in variables. The last array keeps a number of lines in memory, prints from the far end of the stack, and deletes them as they are printed. The END section steps through whatever remains in the array, and prints everything not prohibited by stripbottom.

Shell script re-directing output with tee command buffers output in some cases and not in others

I've simplified a shell script down to two commands:
Terminal A (Redirect STDIN to a named pipe):
tee -a >>pipe
Terminal B (Read from the pipe used above):
tail -f pipe
The results I don't understand:
Result 1: Start tee, start tail: any input into the first terminal will be buffered and only show up in the 2nd after the tee command is stopped (ctrl-c).
Result 2: Start tee, start tail, stop tee, start tee again: Now only each line is buffered (the result I want). Results show up in terminal 2 at the end of each line of input into terminal 1.
Result 3 (for what it's worth): Start tail first, then tee: same result as #1.
I also wrote a similar script using exec and cat commands and it exhibits the same behavior.
I'm not an expert on this, but the behavior seems straightforward.
Suppose you apply tail to an ordinary text file; it will print the last 10 lines and quit. If you use tail -f, it will print the last 10 lines, then monitor the file; from then on it will print each new line that is appended to the file. This is the line buffering you're looking for.
Now apply tail -f to a named pipe. Whatever you put in the other end is like the initial contents of the file, and tail waits patiently for the end so that it can print the "last" 10 lines. When that process ends, it sends an "end of file" symbol (I don't know what that is, only that it exists) through the pipe, and tail prints-- and starts monitoring. If you then start one or more new processes that write to the pipe, tail takes the new lines as, well, new, and prints them out.
If you want to buffer and print all lines, you could start-and-stop tee to prime the pump, or just use
tail -n +1 -f pipe

Resources