Difference between x | y and y <(x) in bash?

Is there a difference between command1 | command2 and command2 <(command1)?
For example, git diff | more vs more <(git diff)
My understanding is that both take the stdout of command1 and pipe it to the stdin of command2.

The main difference is that <(...), called "process substitution", is translated by the shell into a filename that is passed as a regular argument to the command; it doesn't send anything to the command's standard input. This means that it can't be used directly with commands such as tr which don't take a filename argument:
$ tr a-z A-Z <(echo hello)
usage: tr [-Ccsu] string1 string2
tr [-Ccu] -d string1
tr [-Ccu] -s string1
tr [-Ccu] -ds string1 string2
However, you can always put another < in front of the <(...) to turn it into an input redirection instead:
$ tr a-z A-Z < <(echo hello)
HELLO
And because it generates a filename, you can use process substitution with commands that take more than one file argument:
$ diff -u <(echo $'foo\nbar\nbaz') <(echo $'foo\nbaz\nzoo')
--- /dev/fd/63 2016-07-15 14:48:52.000000000 -0400
+++ /dev/fd/62 2016-07-15 14:48:52.000000000 -0400
@@ -1,3 +1,3 @@
 foo
-bar
 baz
+zoo
The other significant difference is that the parts of a pipeline run in subshells, which can't have side effects in the parent environment:
$ echo hello | read x
$ echo $x
# nothing - x is not set
But with process substitution, only the process inside the parentheses is in a subshell; the surrounding command can still have side effects:
$ read x < <(echo hello)
$ echo $x
hello
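As an aside, bash's lastpipe option (bash 4.2+, effective when job control is off, as it is in scripts) runs the last element of a pipeline in the current shell, which is another way to let read survive:

```shell
# lastpipe makes the final pipeline stage run in the current shell,
# so the variable assignment persists (non-interactive shell assumed):
bash -c 'shopt -s lastpipe; echo hello | read -r x; echo "$x"'
# prints: hello
```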
It's also worth mentioning that you can write into a process with >(...), although there are fewer cases where that's useful:
$ echo hello > >(cat)
hello

a | b takes the stdout of command a and feeds it to command b as b's stdin.
a > b takes the stdout of command a and redirects/writes it to file b.
a < b takes the contents of file b and feeds it to command a as its stdin.
In other words, | pipes output between programs, while < and > pipe files into/out of programs.
Your version with <( ) runs an extra process while accomplishing essentially the same thing.
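You can also see the filename that the shell substitutes for <(...) directly; the exact path varies by system (a /dev/fd entry on Linux, possibly a named FIFO elsewhere):

```shell
# echo receives the substituted filename as an ordinary argument:
bash -c 'echo <(true)'
# e.g. /dev/fd/63
```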

What is the meaning of the number of + signs in stderr in Bash when "set -x"

In Bash, help set shows:
-x Print commands and their arguments as they are executed.
Here's test code:
# make script
echo '
#!/bin/bash
set -x
n=$(echo "a" | wc -c)
for i in $(seq $n)
do
file=test_$i.txt
eval "ls -l | head -$i"> $file
rm $file
done
' > test.sh
# execute
chmod +x test.sh
./test.sh 2> stderr
# check
cat stderr
Output
+++ echo a
+++ wc -c
++ n=2
+++ seq 2
++ for i in $(seq $n)
++ file=test_1.txt
++ eval 'ls -l | head -1'
+++ ls -l
+++ head -1
++ rm test_1.txt
++ for i in $(seq $n)
++ file=test_2.txt
++ eval 'ls -l | head -2'
+++ ls -l
+++ head -2
++ rm test_2.txt
What is the meaning of the number of + signs at the beginning of each row in the file? It's kind of obvious, but I want to avoid misinterpreting.
In addition, can a single + sign appear there? If so, what is the meaning of it?
The number of + represents subshell nesting depth.
Note that the entire test.sh script is being run in a subshell because it doesn't begin with #!/bin/bash. This has to be on the first line of the script, but it's on the second line because you have a newline at the beginning of the echo argument that contains the script.
When a script is run this way, it's executed by the original shell in a subshell, approximately like
( source test.sh )
Change that to
echo '#!/bin/bash
set -x
n=$(echo "a" | wc -c)
for i in $(seq $n)
do
file=test_$i.txt
eval "ls -l | head -$i"> $file
rm $file
done
' > test.sh
and the top-level commands being run in the script will have a single +.
So for example the command
n=$(echo "a" | wc -c)
produces the output
++ echo a
++ wc -c
+ n=' 2'
echo a and wc -c are executed in the subshell created for the command substitution, so they get two +, while n=<result> is executed in the original shell with a single +.
From man bash:
-x
After expanding each simple command, for command, case command, select command, or arithmetic for command, display the expanded value of PS4, followed by the command and its expanded arguments or associated word list.
So what's PS4 here?
PS4
The value of this parameter is expanded as with PS1 and the value is printed before each command bash displays during an execution trace. The first character of the expanded value of PS4 is replicated multiple times, as necessary, to indicate multiple levels of indirection. The default is + .
The meaning of "indirection" is not further explained, as far as I can find...
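A small experiment shows what the replication looks like in practice: only the first character of the expanded PS4 is repeated per nesting level, while the rest is printed once:

```shell
# Set PS4 to "> " and trace a top-level command plus a command
# substitution; the ">" is doubled for the nested level:
bash -c 'PS4="> "; set -x; : top; x=$(: nested)' 2>&1
# > : top
# >> : nested
# > x=
```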

replacing newlines with the string '\n' with POSIX tools

Yes, I know there are a number of questions which seem to ask the same thing, but AFAICS none really answers what I want.
What I want is to replace every occurrence of a newline (LF) with the string \n, with no implicitly assumed newlines, using POSIX-only utilities (no GNU extensions or Bashisms), with input read from stdin and no buffering of it.
So for example:
printf 'foo' | magic
should give foo
printf 'foo\n' | magic
should give foo\n
printf 'foo\n\n' | magic
should give foo\n\n
The usually given answers don't do this, e.g.:
awk
printf 'foo' | awk 1 ORS='\\n' gives foo\n, whereas it should give just foo
so it adds a \n when there was no newline.
sed
would work for just foo, but in all other cases, like:
printf 'foo\n' | sed ':a;N;$!ba;s/\n/\\n/g' gives foo, whereas it should give foo\n
it misses the final newline.
Since I do not want any sort of buffering, I cannot just check whether the input ended in a newline and then append the missing \n manually.
And anyway... it would use GNU extensions.
sed -z 's/\n/\\n/g'
does work (even retains the NULs correctly), but again, GNU extension.
tr
can only replace with one character, whereas I need two.
The only working solution I'd have so far is with perl:
perl -p -e 's/\n/\\n/'
which works just as desired in all cases, but as I've said, I'd like to have a solution for environments where just the basic POSIX utilities are there (so no Perl or using any GNU extensions).
Thanks in advance.
The following will work with all POSIX versions of the tools being used and with any POSIX text permissible characters as input whether a terminating newline is present or not:
$ magic() { { cat -u; printf '\n'; } | awk -v ORS= '{print sep $0; sep="\\n"}'; }
$ printf 'foo' | magic
foo$
$ printf 'foo\n' | magic
foo\n$
$ printf 'foo\n\n' | magic
foo\n\n$
The function first appends a newline to the incoming piped data, ensuring that what awk reads is a valid POSIX text file (which must end in a newline), so it is guaranteed to work with all POSIX-compliant awks. The awk command then discards that terminating newline we added and replaces all the others with \n, as required.
The only utility above that has to process input without a terminating newline is cat, but POSIX just talks about "files" as input to cat, not "text files" as in the awk and sed specs, and so every POSIX-compliant version of cat can handle input without a terminating newline.
You can (I think) do this with pure POSIX shell. I am assuming you are working with text, not arbitrary binary data that can include null bytes.
magic () {
    # IFS= and -r preserve leading/trailing whitespace and backslashes
    while IFS= read -r x; do
        printf '%s\\n' "$x"
    done
    printf '%s' "$x"
}
read assumes POSIX text lines (terminated with a newline), but it still populates x with anything it reads until the end of its input when no linefeed is seen. So as long as read succeeds, you have a proper line (minus the linefeed) in x that you can write back, but with a literal \n instead of a linefeed.
Once the loop breaks, output whatever (if anything) is left in x after the failed read, but without a trailing literal \n.
$ [ "$(printf foo | magic)" = foo ] && echo passed
passed
$ [ "$(printf 'foo\n' | magic)" = 'foo\n' ] && echo passed
passed
$ [ "$(printf 'foo\n\n' | magic)" = 'foo\n\n' ] && echo passed
passed
Here is a tr + sed solution that works with POSIX utilities only (note that \xNN escapes in sed are a GNU extension, so the BEL byte is spliced in with printf instead):
printf 'foo' | tr '\n' '\7' | sed "s/$(printf '\7')/\\\\n/g"
foo
printf 'foo\n' | tr '\n' '\7' | sed "s/$(printf '\7')/\\\\n/g"
foo\n
printf 'foo\n\n' | tr '\n' '\7' | sed "s/$(printf '\7')/\\\\n/g"
foo\n\n
Details:
the tr command replaces each line break with the BEL character (octal 007)
the sed command replaces each BEL with \n
(this assumes the input itself contains no BEL bytes)

How can I pass input to output in bash?

I am trying to streamline a README, where I can easily pass commands and their outputs to a document. This step seems harder than I thought it would be.
I am trying to pass the input and output to a file, but everything I try displays either echo test or test.
The latest iteration, which is becoming absurd is:
echo test | xargs echo '#' | cat <(echo) <(cat -) just shows # test
I would like the results to be:
echo test
# test
You can make a bash function to demonstrate a command and its output like this:
democommand() {
    printf '#'
    printf ' %q' "$@"
    printf '\n'
    "$@"
}
This prints "#", then each argument the function was passed (i.e. the command and its arguments) with a space before each one (and the %q makes it quote/escape them as needed), then a newline, and then finally it runs all of its arguments as a command. Here's an example:
$ democommand echo test
# echo test
$ democommand ls
# ls
Desktop Downloads Movies Pictures Sites
Documents Library Music Public
Now, as for why your command didn't work... well, I'm not clear what you thought it was doing, but here's what it's actually doing:
The first command in the pipeline, echo test, simply prints the string "test" to its standard output, which is piped to the next command in the chain.
xargs echo '#' takes its input ("test") and adds it to the command it's given (echo '#') as additional arguments. Essentially, it executes the command echo '#' test. This outputs "# test" to the next command in the chain.
cat <(echo) <(cat -) is rather complicated, so let's break it down:
echo prints a blank line
cat - simply copies its input (which at this point in the pipeline is still coming from the output of the xargs command, i.e. "# test").
cat <(echo) <(cat -) takes the output of those two <() commands and concatenates them together, resulting in a blank line followed by "# test".
Pass the command as a literal string so that you can both print and evaluate it:
doc() { printf '$ %s\n%s\n' "$1" "$(eval "$1")"; }
Running:
doc 'echo foo | tr f c' > myfile
Will make myfile contain:
$ echo foo | tr f c
coo
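Several entries compose naturally; each doc call appends one command/output pair (doc as defined in the answer above):

```shell
# Build a README section from a list of demonstrated commands:
doc() { printf '$ %s\n%s\n' "$1" "$(eval "$1")"; }
doc 'echo foo | tr f c'
doc 'seq 3'
# $ echo foo | tr f c
# coo
# $ seq 3
# 1
# 2
# 3
```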

Pipe stdout to command which itself needs to read from own stdin

I would like to get the stdout of one process into another process without using stdin, as that is needed for another purpose.
In short I want to accomplish something like that:
echo "a" >&4
cat | grep -f /dev/fd/4
I got it running using a file as the source for file descriptor 4, but that is not what I want:
# Variant 1
cat file | grep -f /dev/fd/4 4<pattern
# Variant 2
exec 4<pattern
cat | grep -f /dev/fd/4
exec 4<&-
My best try is this, but I get the following error message:
# Variant 3
cat | (
echo "a" >&4
grep -f /dev/fd/4
) <&4
Error message:
test.sh: line 5: 4: Bad file descriptor
What is the best way to accomplish that?
You don't need to use multiple streams to do this:
$ printf foo > pattern
$ printf '%s\n' foo bar | grep -f pattern
foo
If instead of a static file you want to use the output of a command as the input to -f you can use a process substitution:
$ printf '%s\n' foo bar | grep -f <(echo foo)
foo
For POSIX shells that lack process substitution (e.g. dash, ash, yash, etc.): if the command accepts its search input as a string (grep does) and the string of search targets isn't especially large (i.e. it doesn't exceed the command-line length limit), there's always command substitution:
$ printf '%s\n' foo bar baz | grep $(echo foo)
foo
Or if the input file is multi-line, separating quoted search items with '\n' works the same as grep's OR operator \|:
$ printf '%s\n' foo bar baz | grep "$(printf "%s\n" foo bar)"
foo
bar
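If you do want a dedicated descriptor, as in the question's variant 3, the missing piece is opening fd 4 before the pipeline rather than inside it; a sketch using a here-document on fd 4 (assumes /dev/fd support, as on Linux):

```shell
# fd 4 carries the pattern list, stdin carries the data:
printf '%s\n' foo bar | grep -f /dev/fd/4 4<<'EOF'
foo
EOF
# prints: foo
```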

Get exact output of a shell command

The bash manual says regarding command substitution:
Bash performs the expansion by executing command and replacing the command substitution with the standard output of the command, with any trailing newlines deleted.
Demonstration - 3 characters, newlines first:
$ output="$(printf "\n\nx")"; echo -n "$output" | wc -c
3
Here the newlines are not at the end, and do not get removed, so the count is 3.
Demonstration - 3 characters, newlines last:
$ output="$(printf "x\n\n")"; echo -n "$output" | wc -c
1
Here the newlines are removed from the end, so the count is 1.
TL;DR
What is a robust work-around to get the binary-clean output of a command into a variable?
Bonus points for Bourne shell compatibility.
The only way to do it in a "Bourne compatible" way is to use external utilities.
Besides writing one in C, you can use xxd and expr (for example):
$ output="$(printf "x\n\n"; printf "X")" # get the output ending in "X".
$ printf '%s' "${output}" | xxd -p # transform the string to hex.
780a0a58
$ hexstr="$(printf '%s' "${output}" | xxd -p)" # capture the hex
$ expr "$hexstr" : '\(.*\)..' # remove the last two hex ("X").
780a0a
$ hexstr="$(expr "$hexstr" : '\(.*\)..')" # capture the shorter str.
$ printf '%s' "$hexstr" | xxd -p -r | wc -c # convert back to binary.
3
Shortened:
$ output="$(printf "x\n\n"; printf "X")"
$ hexstr="$(printf '%s' "${output}" | xxd -p )"
$ expr "$hexstr" : '\(.*\)..' | xxd -p -r | wc -c
3
The command xxd is being used for its ability to convert back to binary.
Note that wc -c can be surprising with Unicode (multibyte) characters:
$ printf "Voilà" | wc -c
6
$ printf "★" | wc -c
3
It will print the count of bytes, not characters.
The length of a variable ${#var} will also fail in older shells.
Of course, to get this to run in a Bourne shell you must use `…` instead of $(…).
In bash, the ${parameter%word} form of Shell Parameter Expansion can be used:
$ output="$(printf "x\n\n"; echo X)"; echo -n "${output%X}" | wc -c
3
This substitution is also specified by POSIX.1-2008.
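Wrapped up as a helper, the sentinel approach captures a command's output byte-for-byte (the function name exact_output is mine, not from the answer):

```shell
# Capture exact stdout (trailing newlines included) into $output:
exact_output() {
    output="$("$@"; printf X)"  # X guards the trailing newlines
    output=${output%X}          # POSIX ${var%word} strips the guard
}

exact_output printf 'x\n\n'
printf %s "$output" | wc -c
# 3
```

Note that the command's exit status is masked by the sentinel printf; capture it inside the substitution if you need it.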
