How do these process substitutions work?

Can someone please explain how these process substitutions work?
(echo "YES")> >(read str; echo "1:${str}:first";)> >(read sstr; echo "2:$sstr:two")> >(read ssstr; echo "3:$ssstr:three")
Output
1:2:3:YES:three:two:first
I've figured out that the 'ssstr' substitution got FD 60, sstr FD 61, and str FD 62 (right to left).
But how is (echo "YES") connected to the input of FD 60, the output of FD 60 to the input of FD 61, and so on, until FD 62 finally prints to the terminal?
All of this runs against the direction of the redirections.
How are they nested, and how are they connected?
It's driving me crazy.
Thanks.

First off, don't actually write code like this :)
The process substitutions are the constructs >(...). The (...)> isn't a specific construct; it's just a subshell followed by an output redirection.
This example is a single command, (echo "YES"), followed by three output redirections:
> >(read str; echo "1:${str}:first";)
> >(read sstr; echo "2:$sstr:two")
> >(read ssstr; echo "3:$ssstr:three")
The last one is the one that is actually applied to the original command; something like echo word >foo >bar >baz would create all three files, but the output of echo would only be written to baz.
Similarly, all three process substitutions start a new process, but the output YES is only written to the last one. So read ssstr gets its input from echo YES.
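You can see the same rule with ordinary files (foo, bar, and baz here are just throwaway names for illustration):
echo word >foo >bar >baz
wc -c foo bar baz    # foo and bar are created but left empty; only baz contains "word"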
At this point, I think you are seeing what amounts to undefined behavior. The three process substitutions run in the reverse order they were created, as if the OS pushed each process on to a stack as the next one is created, then schedules them by popping them off the stack, but I don't think that order is guaranteed by anything.
In each case, though, the standard output of each process substitution is fixed at the moment it is set up: it points at whatever the command's standard output was right then, which is the pipe feeding the previously created process substitution (or the terminal, for the first one). Each substitution therefore reads from the one created after it and writes to the one created before it, so the command ends up being similar to
echo YES | {
    read ssstr
    echo "3:$ssstr:three" | {
        read sstr
        echo "2:$sstr:two" | {
            read str
            echo "1:$str:first"
        }
    }
}
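As a quick sanity check, running that nested pipeline reproduces the output from the question:
echo YES | { read ssstr; echo "3:$ssstr:three" | { read sstr; echo "2:$sstr:two" | { read str; echo "1:$str:first"; }; }; }
1:2:3:YES:three:two:first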

Related

Synchronized Output With Bash's Process Substitution

I have to repeatedly call an inflexible external tool that takes as arguments some input data and an output file to which it will write the processed data, for example:
some_prog() { echo "modified_$1" > "$2"; }
For varying input, I want to call some_prog, filter the output and write the output of all calls into the same file "out_file". Additionally, I want to add a header line to the output file before each call of some_prog. Given the following dummy filter:
slow_filter() {
    read input; sleep "0.000$(($RANDOM % 10))"; echo "filtered_$input"
}
I wrote the following code:
rm -f out_file
for input in test_input{1..8}; do
echo "#Header_for_$input" >> "out_file"
some_prog $input >( slow_filter >> "out_file" )
done
However, this will produce an out_file like this:
#Header_for_test_input1
#Header_for_test_input2
#Header_for_test_input3
#Header_for_test_input4
#Header_for_test_input5
#Header_for_test_input6
#Header_for_test_input7
#Header_for_test_input8
filtered_modified_test_input4
filtered_modified_test_input1
filtered_modified_test_input2
filtered_modified_test_input5
filtered_modified_test_input6
filtered_modified_test_input3
filtered_modified_test_input8
filtered_modified_test_input7
The output I expected was:
#Header_for_test_input1
filtered_modified_test_input1
#Header_for_test_input2
filtered_modified_test_input2
#Header_for_test_input3
filtered_modified_test_input3
#Header_for_test_input4
filtered_modified_test_input4
#Header_for_test_input5
filtered_modified_test_input5
#Header_for_test_input6
filtered_modified_test_input6
#Header_for_test_input7
filtered_modified_test_input7
#Header_for_test_input8
filtered_modified_test_input8
I realized that the >( ) process substitution forks the shell. Is there a way to synchronize the output of the subshells? Or is there another elegant solution to this problem? I want to avoid the obvious approach of writing to different files in each iteration because, in my code, the for loop has a few 100,000 iterations.
Write the header inside the process substitution, specifically in a command group with the filter so that the concatenated output is written to out_file as one stream.
rm -f out_file
for input in test_input{1..8}; do
some_prog "$input" >( { echo "#Header_for_$input"; slow_filter; } >> "out_file" )
done
As process substitution is truly asynchronous and there doesn't appear to be a way to wait for it to complete before executing the next iteration of the loop, I would use an explicit named pipe.
rm -f out_file pipe
mkfifo pipe
for input in test_input{1..8}; do
    some_prog "$input" pipe &
    echo "#Header_for_$input" >> out_file
    slow_filter < pipe >> out_file
done
(If some_prog doesn't work with a named pipe for some reason, you can use a regular file. In that case, you shouldn't run the command in the background.)
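If the fifo shouldn't outlive the script, a cleanup trap is a natural companion to this pattern; a minimal sketch:
mkfifo pipe
trap 'rm -f pipe' EXIT    # remove the named pipe whenever the script exits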
Since chepner's approach using a named pipe seems to be very slow in my "real world script" (about 10 times slower than this solution), the easiest and safest way to achieve what I want seems to be a temporary file:
rm -f out_file
tmp_file="$(mktemp --tmpdir my_temp_XXXXX.tmp)"
for input in test_input{1..8}; do
    some_prog "$input" "$tmp_file"
    {
        echo "#Header_for_$input"
        slow_filter < "$tmp_file"
    } >> out_file
done
rm "$tmp_file"
This way, the temporary file tmp_file gets overwritten in each iteration such that it can be kept in memory if the system's temp directory is a RAM disk.
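If you want to be sure the temporary file really lives in RAM, one option (assuming a Linux system where /dev/shm is a tmpfs) is to point mktemp there:
tmp_file="$(mktemp --tmpdir=/dev/shm my_temp_XXXXX.tmp)"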

How do I prepend to a stream in Bash?

Suppose I have the following command in bash:
one | two
one runs for a long time producing a stream of output and two performs a quick operation on each line of that stream, but two doesn't work at all unless the first value it reads tells it how many values to read per line. one does not output that value, but I know what it is in advance (let's say it's 15). I want to send a 15\n through the pipe before the output of one. I do not want to modify one or two.
My first thought was to do:
echo "$(echo 15; one)" | two
That gives me the correct output, but it doesn't stream through the pipe at all until the command one finishes. I want the output to start streaming right away through the pipe, since it takes a long time to execute (months).
I also tried:
echo 15; one | two
Which, of course, outputs 15, but doesn't send it through the pipe to two.
Is there a way in bash to pass '15\n' through the pipe and then start streaming the output of one through the same pipe?
You just need the shell grouping construct:
{ echo 15; one; } | two
The spaces around the braces and the trailing semicolon are required.
To test:
one() { sleep 5; echo done; }
two() { while read line; do date "+%T - $line"; done; }
{ printf "%s\n" 1 2 3; one; } | two
16:29:53 - 1
16:29:53 - 2
16:29:53 - 3
16:29:58 - done
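A plain subshell works as well and doesn't need the spaces or the trailing semicolon; since the left-hand side of a pipeline runs in a subshell anyway, the difference here is negligible:
( echo 15; one ) | two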
Use command grouping:
{ echo 15; one; } | two
Done!
You could do this with sed:
Here is an example 'one' script that emits one line per second, to show that it is line-buffered and still running:
#!/bin/bash
while true; do
    echo "TICK $(date)"
    sleep 1
done
Then pipe that through this sed command. Note that for your specific example, 'ArbitraryText' would be the number of fields (15); I used ArbitraryText here so it's obvious which line was inserted. On OS X, -l makes sed line-buffered; with GNU sed, I believe the equivalent option is -u.
$ ./one | sed -l '1i\
> ArbitraryText
> '
This instructs sed to insert one line before processing the rest of the input; everything else passes through untouched.
The end result is processed line by line, without chunk buffering (or waiting for the input script to finish):
ArbitraryText
TICK Fri Jun 28 13:26:56 PDT 2013
...etc
You should then be able to pipe that into 'two' as you normally would.

Bash add to end of file (>>) if not duplicate line

Normally I use something like this for processes I run on my servers:
./runEvilProcess.sh >> ./evilProcess.log
However, I'm currently using Doxygen, and it produces lots of duplicate output.
Example output:
QGDict::hashAsciiKey: Invalid null key
QGDict::hashAsciiKey: Invalid null key
QGDict::hashAsciiKey: Invalid null key
So you end up with a very messy log.
Is there a way to add a line to the log file only if it isn't the same as the last line added?
A rough sketch of what I mean (but I'm not sure how to do this in bash):
$previousLine = ""
$outputLine = getNextLine()
if($previousLine != $outputLine) {
$outputLine >> logfile.log
$previousLine = $outputLine
}
If the process returns duplicate lines in a row, pipe the output of your process through uniq:
$ ./t.sh
one
one
two
two
two
one
one
$ ./t.sh | uniq
one
two
one
If the logs are sent to the standard error stream, you'll need to redirect that too:
$ ./yourprog 2>&1 | uniq >> logfile
(This won't help if the duplicates come from multiple runs of the program - but then you can pipe your log file through uniq when reviewing it.)
Create a filter script (filter.sh):
while read -r line; do
    if [ "$last" != "$line" ]; then
        echo "$line"
        last=$line
    fi
done
and use it:
./runEvilProcess.sh | sh filter.sh >> evillog
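If you'd rather not keep a separate filter script around, the same "skip consecutive duplicates" logic fits in an awk one-liner (a sketch of an equivalent filter):
./runEvilProcess.sh | awk 'NR == 1 || $0 != prev { print } { prev = $0 }' >> evillog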

Can you echo F-Keys through a Pipe to another program?

I am trying to write a short bash hack that requires piping F-key keystrokes to another program.
Basically, what I am trying to do is:
(echo "1"; for x in {1..9}; do echo "123<F1>34<F3>"; done; echo "<F1>") | ./program
where <F#> stands for the F-key with that number.
Is this possible? If so, can someone point me to the docs or something?
Depending on your terminal, a function-key is just a sequence of characters. You can see what they are with cat:
$ cat
^[OP
^[OQ
^[OR
This is me hitting F1, F2, F3 in sequence. So to echo them into your program, you can just echo those control codes (the ^[ at the start of each sequence is the ESC character), and you should be all set.
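For example, on an xterm-style terminal where F1 sends ESC O P and F3 sends ESC O R (matching the cat output above), you could emit the bytes yourself; the exact sequences vary by terminal, which is why the tput approach below is more portable:
printf '123\eOP34\eOR\n' | ./program    # \eOP = F1, \eOR = F3 on many xterm-like terminals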
To expand on zigdon's answer, you can use tput to produce the right sequences for your terminal:
f1=$(tput kf1)
f3=$(tput kf3)
# ...
(echo 1; for x in {1..9}; do echo "123${f1}34${f3}"; done; echo "$f1") | ./program
The "kf1" and "kf3" names are the terminfo capabilities for the F1 and F3 keys, respectively.

Capturing multiple line output into a Bash variable

I've got a script 'myscript' that outputs the following:
abc
def
ghi
in another script, I call:
declare RESULT=$(./myscript)
and $RESULT gets the value
abc def ghi
Is there a way to store the result either with the newlines, or with '\n' character so I can output it with 'echo -e'?
Actually, RESULT contains what you want — to demonstrate:
echo "$RESULT"
What you show is what you get from:
echo $RESULT
As noted in the comments, the difference is that (1) the double-quoted version of the variable (echo "$RESULT") preserves internal spacing of the value exactly as it is represented in the variable — newlines, tabs, multiple blanks and all — whereas (2) the unquoted version (echo $RESULT) replaces each sequence of one or more blanks, tabs and newlines with a single space. Thus (1) preserves the shape of the input variable, whereas (2) creates a potentially very long single line of output with 'words' separated by single spaces (where a 'word' is a sequence of non-whitespace characters; there needn't be any alphanumerics in any of the words).
Another pitfall with this is that command substitution — $() — strips trailing newlines. Probably not always important, but if you really want to preserve exactly what was output, you'll have to use another line and some quoting:
RESULTX="$(./myscript; echo x)"
RESULT="${RESULTX%x}"
This is especially important if you want to handle arbitrary file names, since a file name can itself end in newlines; losing them would mean operating on the wrong file.
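A quick way to see both the stripping and the x trick, using printf to stand in for a script whose output ends in blank lines:
out=$(printf 'abc\n\n\n')           # command substitution strips the trailing newlines
printf '%s' "$out" | od -c          # the trailing newlines are gone
outx=$(printf 'abc\n\n\n'; echo x)  # protect them with a trailing x
out=${outx%x}
printf '%s' "$out" | od -c          # all three newlines are still there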
In case that you're interested in specific lines, use a result-array:
declare RESULT=($(./myscript)) # (..) = array
echo "First line: ${RESULT[0]}"
echo "Second line: ${RESULT[1]}"
echo "N-th line: ${RESULT[N]}"
In addition to the answer given by @l0b0, I just had the situation where I needed to both keep any trailing newlines output by the script and check the script's return code.
And the problem with l0b0's answer is that the 'echo x' was resetting $? back to zero... so I managed to come up with this very cunning solution:
RESULTX="$(./myscript; echo x$?)"
RETURNCODE=${RESULTX##*x}
RESULT="${RESULTX%x*}"
Parsing multiple output
Introduction
So your myscript outputs 3 lines; it could look like:
myscript() { echo $'abc\ndef\nghi'; }
or
myscript() { local i; for i in abc def ghi ;do echo $i; done ;}
OK, this is a function, not a script (no need for the ./ path), but the output is the same:
myscript
abc
def
ghi
Considering result code
To check the result code, the test function becomes:
myscript() { local i;for i in abc def ghi ;do echo $i;done;return $((RANDOM%128));}
1. Storing multiple output in one single variable, showing newlines
Your operation is correct:
RESULT=$(myscript)
For the result code, you could add:
RCODE=$?
or even on the same line:
RESULT=$(myscript) RCODE=$?
Then
echo $RESULT $RCODE
abc def ghi 66
echo "$RESULT"
abc
def
ghi
echo ${RESULT@Q}
$'abc\ndef\nghi'
printf '%q\n' "$RESULT"
$'abc\ndef\nghi'
but to show the variable definition, use declare -p:
declare -p RESULT RCODE
declare -- RESULT="abc
def
ghi"
declare -- RCODE="66"
2. Parsing multiple output in array, using mapfile
Storing the answer in the myvar variable:
mapfile -t myvar < <(myscript)
echo ${myvar[2]}
ghi
Showing $myvar:
declare -p myvar
declare -a myvar=([0]="abc" [1]="def" [2]="ghi")
Considering result code
In case you have to check the result code, you could:
RESULT=$(myscript) RCODE=$?
mapfile -t myvar <<<"$RESULT"
declare -p myvar RCODE
declare -a myvar=([0]="abc" [1]="def" [2]="ghi")
declare -- RCODE="40"
3. Parsing multiple output with consecutive reads in a command group
{ read firstline; read secondline; read thirdline;} < <(myscript)
echo $secondline
def
Showing variables:
declare -p firstline secondline thirdline
declare -- firstline="abc"
declare -- secondline="def"
declare -- thirdline="ghi"
I often use:
{ read foo; read foo total use free foo; } < <(df -k /)
Then
declare -p use free total
declare -- use="843476"
declare -- free="582128"
declare -- total="1515376"
Considering result code
Same prepended step:
RESULT=$(myscript) RCODE=$?
{ read firstline; read secondline; read thirdline;} <<<"$RESULT"
declare -p firstline secondline thirdline RCODE
declare -- firstline="abc"
declare -- secondline="def"
declare -- thirdline="ghi"
declare -- RCODE="50"
After trying most of the solutions here, the easiest thing I found was the obvious - using a temp file. I'm not sure what you want to do with your multiple line output, but you can then deal with it line by line using read. About the only thing you can't really do is easily stick it all in the same variable, but for most practical purposes this is way easier to deal with.
./myscript.sh > /tmp/foo
while read line ; do
    echo 'whatever you want to do with $line'
done < /tmp/foo
Quick hack to make it do the requested action:
result=""
./myscript.sh > /tmp/foo
while read line ; do
    result="$result$line\n"
done < /tmp/foo
echo -e $result
Note this adds an extra line. If you work on it you can code around it, I'm just too lazy.
EDIT: While this case works perfectly well, people reading this should be aware that you can easily squash your stdin inside the while loop, thus giving you a script that will run one line, clear stdin, and exit. Like ssh will do that I think? I just saw it recently, other code examples here: https://unix.stackexchange.com/questions/24260/reading-lines-from-a-file-with-bash-for-vs-while
One more time! This time with a different filehandle (stdin, stdout, stderr are 0-2, so we can use &3 or higher in bash).
result=""
./test>/tmp/foo
while read line <&3; do
    result="$result$line\n"
done 3</tmp/foo
echo -e $result
You can also use mktemp, but the above is just a quick code example. Usage for mktemp looks like:
filenamevar=`mktemp /tmp/tempXXXXXX`
./test > $filenamevar
Then use $filenamevar like you would the actual name of a file. Probably doesn't need to be explained here but someone complained in the comments.
How about this? It will read each line into a variable, which can then be used subsequently.
Say the myscript output is redirected to a file called myscript_output:
awk 'BEGIN { while ((getline var < "myscript_output") > 0) { print var } close("myscript_output") }'
