How do I append onto pipes? - bash

So my question is: can I somehow send data to my program, and then send that same data AND its result to another program, without having to create a temporary file (in my case outputdata.txt)?
Preferably using Linux pipes/bash.
I currently do the following:
cat inputdata.txt | ./MyProg > outputdata.txt
cat inputdata.txt outputdata.txt | ./MyProg2

Here is another way, which can be extended to put the output of two programs together:
( Prog1; Prog2; Prog3; ... ) | ProgN
That at least works in Bash.
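Applied to the question above, a sketch of that grouping (it simply reads inputdata.txt a second time for the copy fed to MyProg):
( cat inputdata.txt; ./MyProg < inputdata.txt ) | ./MyProg2
The group's combined output, first the raw input and then MyProg's result, is what ./MyProg2 sees on its stdin.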

Choice 1 - fix MyProg to write the merged output of the input and its own output. Then you can do this.
./MyProg <inputdata.txt | ./MyProg2
Choice 2 - If you can't fix MyProg to write both input and output, you need to merge.
./MyProg <inputdata.txt | cat inputdata.txt - | ./MyProg2
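An equivalent sketch using process substitution (a bashism, so this assumes bash rather than plain sh):
cat inputdata.txt <(./MyProg < inputdata.txt) | ./MyProg2
Here cat concatenates the original input with MyProg's output coming through the substituted file descriptor, again without any temporary file.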

Related

Bash Terminal: Write only specific lines to logfile

I'm running a simulation with lots of terminal output, which would exceed my disk space if I saved all of it to a logfile (e.g. with "cmd > logfile"). I would like to follow the entire terminal output, but at the same time save specific data/lines/values from this output to a file.
1) Is there a way to do that in bash?
2) Or as an alternative: is it possible to save the logfile, extract the needed data and then delete the processed lines, to avoid generating a huge logfile?
If you want to save into logfile only the output containing mypattern but you want the see all the output at the terminal, you could issue:
cmd 2>&1 | tee /dev/tty | grep 'mypattern' > logfile
I also added 2>&1 after cmd, assuming that cmd may write to the standard error stream as well as to standard output.
What criteria are you using to decide which lines to keep?
1
One common filter is to just store stderr.
cmdlist 2>logfile # stdout still to console
2
For a more sophisticated filter, if you have specific patterns you want to save to the log, you can use sed. Here's a simplistic example -
seq 1 100 | sed '/.*[37]$/w tmplog'
This will generate numbers from 1 to 100 and send them all to the console, but capture all numbers that end with 3 or 7 to tmplog. It can also accept more complex lists of commands to help you be more comprehensive -
seq 1 100 | sed '/.*[37]$/w 37.log
/^2/w 37.log'
See the sed manual for more detailed breakdowns.
You probably also want error output, so it might be a good idea to save that too.
seq 1 100 2>errlog | sed '/.*[37]$/w patlog'
3
For a more complex space-saving plan, create a named pipe, and compress the log from that in a background process.
$: mkfifo transfer # creates a queue file on disk that
$: gzip < transfer > log.gz & # reads from FIFO in bg, compresses to log
$: seq 1 100 2>&1 | tee transfer # tee writes one copy to stdout, one to file
This will show all the output as it comes, but also duplicate a copy to the named pipe; gzip will read it from the named pipe and compress it.
3b
You could replace the tee with the sed for double-dipping space reduction if required -
$: mkfifo transfer
$: gzip < transfer > log.gz &
$: seq 1 100 2>&1 | sed '/.*[37]$/w transfer'
I don't really recommend this, as you might filter out something you didn't realize you would need.
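In either variant, it is worth cleaning up once the pipeline has finished (assuming the FIFO is still named transfer):
$: wait           # wait for the background gzip to flush and close log.gz
$: rm transfer    # the FIFO is only a filesystem entry; remove it when done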

Concatenating to stdout, then splitting on stdin - is this possible?

I am looking to solve the problem of writing a series of very large streams concatenated to stdout, and then reading those streams from stdin again, splitting them back into their original parts. The limitation I face is that at no time can I create any temporary files on disk.
I tried to use the unxz --single-stream option, but this isn't having the effect I'm expecting.
To demonstrate what I am trying to achieve, I have two scripts:
user@localhost:~# cat test-source.sh
#!/bin/bash
echo "one" | xz
echo "two" | xz
echo "three" | xz
This first script is then piped into a second script, which is intended to reverse the effect:
user@localhost:~# cat test-sink.sh
#!/bin/bash
unxz --single-stream
unxz --single-stream
unxz --single-stream
The above script is expected to output the following:
one
two
three
Instead I see the following:
user@localhost:~# ./test-source.sh | ./test-sink.sh
one
unxz: (stdin): File format not recognized
unxz: (stdin): File format not recognized
The xz above was just one option I tried; I am open to other suggestions. gzip wants to uncompress the whole stream at once, whereas I need to preserve the boundaries between the streams.
I understand that tar is no good, as it cannot accept a stream to tar from stdin.
Is there any other tool out there that can be used to script this?
I don't know if this will solve your problem or not (since it would require installing some software, which given the nature of this question may not be an option), but you inspired me to hack together something that does exactly what you were describing:
https://github.com/larsks/muxdemux
You can iteratively produce an output stream from several chunks, as in:
echo "one" | xz | mux
echo "two" | xz | mux
echo "three" | xz | mux
And then pass that to a demux command on the other side to extract the individual components. E.g., a trivial example:
$ (
echo "one" | xz | mux
echo "two" | xz | mux
echo "three" | xz | mux
) | demux -v
INFO:demux:processing stream 0 to stream-0.out
INFO:demux:processing stream 1 to stream-1.out
INFO:demux:processing stream 2 to stream-2.out
This takes the input streams and produces three files in your current directory.
It does other things, too, like optionally adding a sha256 hash to each stream for data integrity verification.
As an alternative tool I came up with tarmux, which provides a multiplexer / demultiplexer written in C and based on the tar file format provided by libarchive.
https://github.com/minfrin/tarmux
The test scripts now look like this:
Little-Net:trunk minfrin$ cat ./test-source.sh
#!/bin/bash
echo "one" | tarmux
echo "two" | tarmux
echo "three" | tarmux
And this:
Little-Net:trunk minfrin$ cat ./test-sink.sh
#!/bin/bash
tardemux
tardemux
tardemux
The output of tardemux can be piped into other commands, and at no point does a file touch a disk.
Given your source script, if I run:
sh test-source.sh | unxz
I get as output:
one
two
three
That seems to be the behavior you're asking for. Your attempt at running unxz --single-stream several times doesn't work because the first unxz process consumes all of the input, even though it only extracts the first stream.
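If installing extra tools is not an option, the framing can also be sketched in plain bash, for example by base64-encoding each compressed chunk onto a single line so the reader can split on newlines. This is only a sketch of the idea, not a hardened tool; it assumes GNU coreutils base64 (for -w0), and the function names are made up here:
# write one chunk: compress, then base64 it onto a single newline-terminated line
mux_chunk() {
    xz | base64 -w0
    echo
}
# read exactly one chunk: take one line, decode and decompress it
demux_chunk() {
    local line
    IFS= read -r line || return 1
    printf '%s' "$line" | base64 -d | unxz
}
Used like the scripts above, ( echo one | mux_chunk; echo two | mux_chunk ) | { demux_chunk; demux_chunk; } prints one and then two, with the stream boundaries preserved.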

Duplicate stdin to stdout

I am looking for a bash one-liner that duplicates stdin to stdout without interleaving. The only solution I have found so far is to use tee, but that produces interleaved output. What do I mean by this:
If e.g. a file f reads
a
b
I would like to execute
cat f | HERE_BE_COMMAND
to obtain
a
b
a
b
If I use tee - as the command, the output typically looks something like
a
a
b
b
Any suggestions for a clean solution?
Clarification
The cat f command is just an example of where the input can come from. In reality, it is a command that can (should) only be executed once. I also want to refrain from using temporary files, as the processed data is sort of sensitive and temporary files are always error-prone when the executed command gets interrupted. Furthermore, I am not interested in a solution that involves additional scripts (as stated above, it should be a one-liner) or preparatory commands that need to be executed prior to the actual duplication command.
Solution 1:
<command_which_produces_output> | { a="$(</dev/stdin)"; echo "$a"; echo "$a"; }
In this way, you're saving the content from the standard input in a (choose a better name please), and then echoing it twice.
Notice $(</dev/stdin) is a similar but more efficient way to do $(cat /dev/stdin).
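A quick sanity check with the example from the question:
printf 'a\nb\n' | { a="$(</dev/stdin)"; echo "$a"; echo "$a"; }
prints a, b, a, b in that order, with no interleaving.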
Solution 2:
Use tee in the following way:
<command_which_produces_output> | tee >(echo "$(</dev/stdin)")
Here, you're firstly writing to the standard output (that's what tee does), and also writing to a FIFO file created by process substitution:
>(echo "$(</dev/stdin)")
See for example the file it creates in my system:
$ echo >(echo "$(</dev/stdin)")
/dev/fd/63
Now, the echo "$(</dev/stdin)" part is just the way I found to read the entire input before printing it. It echoes the content read from the process substitution's standard input, but only once all the input has been read (unlike cat, which prints line by line).
Store the second input in a temp file.
cat f | tee /tmp/showlater
cat /tmp/showlater
rm /tmp/showlater
Update:
As shown in the comments (#j.a.), the solution above will need to be adjusted to the OP's real needs. Calling it would be easier from a function, and you have to decide what to do with errors in your initial commands and in the tee/cat/rm.
I recommend tee /dev/stdout.
cat f | tee /dev/stdout
One possible solution I found is the following awk command:
awk '{d[NR] = $0} END {for (i=1;i<=NR;i++) print d[i]; for (i=1;i<=NR;i++) print d[i]}'
However, I feel there must be a more "canonical" way of doing this.
A simple bash script?
But this will store all of stdin in memory; why not store the output to a file and read the file twice if you need it?
full=""
while read line
do
echo "$line"
full="$full$line\n"
done
printf $full
The best way would be to store the output in a file and show it later on. Using tee has the advantage of showing the output as it comes:
if tmpfile=$(mktemp); then
commands | tee "$tmpfile"
cat "$tmpfile"
rm "$tmpfile"
else
echo "Error creating temporary file" >&2
exit 1
fi
If the amount of output is limited, you can capture it and echo it twice (echoing "$output$output" in one go would glue the last line of the first copy to the first line of the second, because command substitution strips the trailing newline):
output=$(commands); echo "$output"; echo "$output"

Why doesn't piping to the same file work on some platforms?

In cygwin, the following code works fine
$ cat junk
bat
bat
bat
$ cat junk | sort -k1,1 |tr 'b' 'z' > junk
$ cat junk
zat
zat
zat
But in the Linux shell (GNU/Linux), it seems that overwriting doesn't work:
[41] othershell: cat junk
cat
cat
cat
[42] othershell: cat junk |sort -k1,1 |tr 'c' 'z'
zat
zat
zat
[43] othershell: cat junk |sort -k1,1 |tr 'c' 'z' > junk
[44] othershell: cat junk
Both environments run BASH.
I am asking this because sometimes, after I do text manipulation, this caveat forces me to make a tmp file. But I know that in Perl you can give the -i flag to overwrite the original file after some operations/manipulations. I just want to ask if there is any foolproof method in a unix pipeline, that I am not aware of, to overwrite the original file.
Four main points here:
"Useless use of cat." Don't do that.
You're not actually sorting anything with sort. Don't do that.
Your pipeline doesn't say what you think it does. Don't do that.
You're trying to over-write a file in-place while reading from it. Don't do that.
One of the reasons you are getting inconsistent behavior is that you are piping to a process that has redirection, rather than redirecting the output of the pipeline as a whole. The difference is subtle, but important.
What you want is to create a compound command with Command Grouping, so that you can redirect the input and output of the whole pipeline. In your case, this should work properly:
{ sort -k1,1 | tr 'c' 'z'; } < junk > sorted_junk
Please note that without anything to sort, you might as well skip the sort command too. Then your command can be run without the need for command grouping:
tr 'c' 'z' < junk > sorted_junk
Keep redirections and pipelines as simple as possible. It makes debugging your scripts much easier.
However, if you still want to abuse the pipeline for some reason, you could use the sponge utility from the moreutils package. The man page says:
sponge reads standard input and writes it out to the specified
file. Unlike a shell redirect, sponge soaks up all its input before
opening the output file. This allows constructing pipelines that read
from and write to the same file.
So, your original command line can be re-written like this:
cat junk | sort -k1,1 | tr 'c' 'z' | sponge junk
and since junk will not be overwritten until sponge receives EOF from the pipeline, you will get the results you were expecting.
In general this can be expected to break. The processes in a pipeline are all started up in parallel, so the > junk at the end of the line will usually truncate your input file before the process at the head of the pipeline has finished (or even started) reading from it.
Even if bash under Cygwin lets you get away with this, you shouldn't rely on it. The general solution is to redirect to a temporary file and then rename it when the pipeline is complete.
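A minimal sketch of that temporary-file-and-rename approach, reusing the commands from the question (junk.tmp is just an arbitrary name):
sort -k1,1 < junk | tr 'c' 'z' > junk.tmp && mv junk.tmp junk
The redirection truncates junk.tmp rather than junk, and the rename only happens once the pipeline has completed successfully.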
If you want to edit that file, you can just use the editor:
ex junk << EOF
%!(sort -k1,1 |tr 'b' 'z')
x
EOF
Overwriting the same file in a pipeline is not advisable, because when you make a mistake you can't get the data back (unless you have a backup or the file is under version control).
This happens because the input and output in a pipeline are buffered (which can give the impression that it works), while the commands actually run in parallel. Different platforms buffer the output in different ways (depending on the settings), so on some you end up with an empty file (because the output file is created/truncated at the start), and on others with a half-finished file.
The solution is to use a method where the file is only overwritten once the full input has been buffered and processed, i.e. at EOF.
This can be achieved by:
Use a utility which soaks up all its input before opening the output file.
This can be done with sponge (as opposed to unbuffer from the expect package).
Avoid the I/O redirection syntax (which can create the empty file before the command even starts).
For example, use tee (which buffers its standard streams):
cat junk | sort | tee junk
This only works because sort needs to read all of its input before it can produce any output; so if your command doesn't use sort, add one.
Another tool which can be used is stdbuf, which modifies the buffering of a command's standard streams and lets you specify the buffer size.
Use a text processor which can edit files in-place (such as sed or ex).
Example:
$ ex -s +'%!sort -k1' -cxa myfile.txt
$ sed -i '' s/foo/bar/g myfile.txt
Using the following simple script, you can make it work like you want to:
$ cat junk | sort -k1,1 |tr 'b' 'z' | overwrite_file.sh junk
overwrite_file.sh
#!/usr/bin/env bash
OUT=$(cat -)
FILENAME="$*"
echo "$OUT" | tee "$FILENAME"
Note that if you don't want the updated file to be sent to stdout, you can use this approach instead:
overwrite_file_no_output.sh
#!/usr/bin/env bash
OUT=$(cat -)
FILENAME="$*"
echo "$OUT" > "$FILENAME"

How to run the first process from a list in a file, deleting the first line as if the file were a queue and I called "pop"?

How do I run the first process from a list of processes stored in a file, and immediately delete the first line, as if the file were a queue and I called "pop"?
I'd like to call the first command listed in a simple text file with \n as the separator in a pop-like fashion:
Figure 1:
cmdqueue.lst :
proc_C1
proc_C2
proc_C3
.
.
Figure 2:
Pop the first command via popcmd:
proc_A | proc_B | popcmd cmdqueue.lst | proc_D
Figure 3:
cmdqueue.lst :
proc_C2
proc_C3
proc_C4
.
.
Ooh, that's an amusing one-liner.
Okay, here's the deal. What you want is a program that, when called, prints the first line of the file to stdout, then deletes that line from the file. Sounds like a job for sed(1).
Try
proc_A | proc_B | `(head -1 cmdstack.lst; sed -i -e '1d' cmdstack.lst)` | proc_D
I'm sure that someone who has already had their coffee could change the sed program so it doesn't need the head(1) call, but this works, and shows off using a subshell ("( foo )" runs in a sub-process).
pop-cmd.py:
#!/usr/bin/env python
import os, shlex, sys
from subprocess import call
filename = sys.argv[1]
lines = open(filename).readlines()
if lines:
    command = lines[0].rstrip()
    open(filename, "w").writelines(lines[1:])
    if command:
        sys.exit(call(shlex.split(command) + sys.argv[2:]))
Example:
proc_A | proc_B | python pop-cmd.py cmdstack.lst | proc_D
I assume that you are also constantly appending to the file, so rewriting it puts you in danger of overwriting data. For this type of task I think you would be better off using individual files for each queue entry, using date/time to determine the order; then, as you process each file, you could append its data to a log file and delete the trigger file.
Really need more information in order to suggest a good solution. It's important to know how the file is getting updated. Is it a lot of separate processes, just one process, etc.
I think you would need to rewrite the file - e.g. run a command to list all lines but the first, write that to a temporary file and rename it to the original. That could be done using tail or awk or perl depending on the commands you have available.
If you want to treat a file like a stack, then a better approach would be to have the top of the stack at the end of the file.
Thus you can easily cut off the file at the beginning of the last line (= pop), and simply append to the file as you push.
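A minimal sketch of that idea (the poplast name is made up here, and it assumes GNU head, whose -n -1 form drops the last line):
# pop: run the command on the last line, then drop that line from the file
poplast() {
    cmd=$(tail -n 1 "$1")
    head -n -1 "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    $cmd
}
# push: simply append a new command to the end of the file
echo "proc_C5" >> cmdqueue.lst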
You can use a little bash script; name it "popcmd":
#!/bin/bash
cmd=$(head -n 1 "$1")
tail -n +2 "$1" > ~tmp~
mv -f ~tmp~ "$1"
$cmd
edit: Using sed in place of the middle two lines, like Charlie Martin showed, is much more elegant, of course:
#!/bin/bash
cmd=$(head -n 1 "$1")
sed -i -e '1d' "$1"
$cmd
edit: You can use this exactly as in your example usage code:
proc_A | proc_B | popcmd cmdstack.lst | proc_D
You can't remove bytes from the beginning of a file, so cutting out line 1 means rewriting the rest of the file (which isn't actually that much work for the programmer; it's what every other answer here has written for you :) ).
I'd recommend keeping the whole thing in memory and using a classic stack rather than a file.
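A minimal sketch of keeping the list in memory as a bash array and popping from it directly (the proc_* names are the placeholders from the question):
queue=(proc_C1 proc_C2 proc_C3)     # the whole command list lives in memory
next=${queue[0]}                    # peek at the first command
queue=("${queue[@]:1}")             # pop it off the front
proc_A | proc_B | $next | proc_D    # run it in the pipeline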
