How to assign output of multiple shell commands to variables when using tee? - bash

I want to tee and get the results from multiple shell commands connected in the pipeline. I made a simple example to explain the point. Suppose I want to count the numbers of 'a', 'b' and 'c'.
echo "abcaabbcabc" | tee >(tr -dc 'a' | wc -m) >(tr -dc 'b' | wc -m) >(tr -dc 'c' | wc -m) > /dev/null
Then I tried to assign the result from each count to a shell variable, but they all end up empty.
echo "abcaabbcabc" | tee >(A=$(tr -dc 'a' | wc -m)) >(B=$(tr -dc 'b' | wc -m)) >(C=$(tr -dc 'c' | wc -m)) > /dev/null && echo $A $B $C
What is the right way to do it?

Use files. They are the single most reliable solution. Each of the commands may take a different amount of time to run, and there is no easy way to synchronize the command redirections, so the most reliable way is to use a separate "entity" to collect all the data:
tmpa=$(mktemp) tmpb=$(mktemp) tmpc=$(mktemp)
trap 'rm "$tmpa" "$tmpb" "$tmpc"' EXIT
echo "abcaabbcabc" |
tee >(tr -dc 'a' | wc -m > "$tmpa") >(tr -dc 'b' | wc -m > "$tmpb") |
tr -dc 'c' | wc -m > "$tmpc"
A=$(<"$tmpa")
B=$(<"$tmpb")
C=$(<"$tmpc")
rm "$tmpa" "$tmpb" "$tmpc"
trap '' EXIT
Second way:
You can prepend the data from each stream with a custom prefix. Then sort all lines (basically, buffer them) on the prefix and then read them. The example script will generate only a single number from each process substitution, so it's easy to do:
read -r A B C < <(
  echo "abcaabbcabc" |
  tee >(
    tr -dc 'a' | wc -m | sed 's/^/A /'
  ) >(
    tr -dc 'b' | wc -m | sed 's/^/B /'
  ) >(
    tr -dc 'c' | wc -m | sed 's/^/C /'
  ) >/dev/null |
  sort |
  cut -d' ' -f2 |
  paste -sd' '
)
echo A="$A" B="$B" C="$C"
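A variation on the same labeling idea, as a rough sketch of my own (not from the original answer): instead of relying on the sort order and positional fields, read each tagged line and assign by its label, which keeps working even if the values contain spaces.
while read -r tag value; do
  case $tag in
    A) A=$value ;;
    B) B=$value ;;
    C) C=$value ;;
  esac
done < <(
  echo "abcaabbcabc" |
  tee >(tr -dc 'a' | wc -m | sed 's/^/A /') \
      >(tr -dc 'b' | wc -m | sed 's/^/B /') \
      >(tr -dc 'c' | wc -m | sed 's/^/C /') \
      >/dev/null
)
echo A="$A" B="$B" C="$C"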
Using temporary files with flock to synchronize the output of child processes could look like this:
tmpa=$(mktemp) tmpb=$(mktemp) tmpc=$(mktemp)
trap 'rm "$tmpa" "$tmpb" "$tmpc"' EXIT
echo "abcaabbcabc" |
(
  flock 3
  flock 4
  flock 5
  tee >(
    tr -dc 'a' | wc -m |
    { sleep 0.1; cat; } > "$tmpa"
    # unblock main thread
    flock -u 3
  ) >(
    tr -dc 'b' | wc -m |
    { sleep 0.2; cat; } > "$tmpb"
    # unblock main thread
    flock -u 4
  ) >(
    tr -dc 'c' | wc -m |
    { sleep 0.3; cat; } > "$tmpc"
    # unblock main thread
    flock -u 5
  ) >/dev/null
  # wait for subprocesses to finish
  # need to re-open the files to block on them
  (
    flock 3
    flock 4
    flock 5
  ) 3<"$tmpa" 4<"$tmpb" 5<"$tmpc"
) 3<"$tmpa" 4<"$tmpb" 5<"$tmpc"
A=$(<"$tmpa")
B=$(<"$tmpb")
C=$(<"$tmpc")
declare -p A B C
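For comparison, a simpler synchronization sketch of my own: if the input can be replayed (here it is just a literal string), you can skip tee entirely, run the three counters as background jobs and let wait do the synchronization. This does not apply to a one-shot stream, which is what the tee/flock approaches above are for.
tmpa=$(mktemp) tmpb=$(mktemp) tmpc=$(mktemp)
input="abcaabbcabc"
tr -dc 'a' <<<"$input" | wc -m > "$tmpa" &
tr -dc 'b' <<<"$input" | wc -m > "$tmpb" &
tr -dc 'c' <<<"$input" | wc -m > "$tmpc" &
wait    # block until all three background pipelines finish
A=$(<"$tmpa") B=$(<"$tmpb") C=$(<"$tmpc")
rm "$tmpa" "$tmpb" "$tmpc"
declare -p A B C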

You can use an associative array for a full letter frequency analysis:
#!/usr/bin/env bash
declare -A letter_frequency
while read -r v k; do
letter_frequency[$k]="$v"
done < <(
grep -o '[[:alnum:]]' <<<"abcaabbcabc" |
sort |
uniq -c
)
for k in "${!letter_frequency[@]}"; do
printf '%c = %d\n' "$k" "${letter_frequency[$k]}"
done
Output:
c = 3
b = 4
a = 4
Or to only assign $A, $B and $C as in your example:
#!/usr/bin/env bash
{
read -r A _
read -r B _
read -r C _
}< <(
grep -o '[[:alnum:]]' <<<"abcaabbcabc" |
sort |
uniq -c
)
printf 'a=%d\nb=%d\nc=%d\n' "$A" "$B" "$C"
grep -o '[[:alnum:]]': print each alphanumeric character on its own line
sort: sort the lines of characters
uniq -c: count each instance and output the count and character for each
< <( command group; ): the output of this command group is fed to the stdin of the command group before it
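For the sample string, the intermediate output of that pipeline looks roughly like this (uniq -c left-pads the counts):
$ grep -o '[[:alnum:]]' <<<"abcaabbcabc" | sort | uniq -c
      4 a
      4 b
      3 c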
If you need to count occurrences of non-printable characters, newlines, spaces or tabs, you have to make all these commands produce and handle null-delimited lists. It can certainly be done with the GNU versions of these tools; I leave that to you as an exercise.
Solution to count arbitrary characters (except null):
As demonstrated, it also works with Unicode.
#!/usr/bin/env bash
declare -A character_frequency
declare -i v
while read -d '' -r -N 8 v && read -r -d '' -N 1 k; do
character_frequency[$k]="$v"
done < <(
grep --only-matching --null-data . <<<$'a¹bc✓ ✓\n\t\t\u263A☺ ☺ aabbcabc' |
head --bytes -2 | # trim the newline added by grep
sort --zero-terminated | # sort null delimited list
uniq --count --zero-terminated # count occurrences of char (null delim)
)
for k in "${!character_frequency[@]}"; do
printf '%q = %d\n' "$k" "${character_frequency[$k]}"
done
Output:
$'\n' = 1
$'\t' = 2
☺ = 3
\ = 7
✓ = 2
¹ = 1
c = 3
b = 4
a = 4

Related

How to find all non-dictionary words in a file in bash/zsh?

I'm trying to find all words in a file that don't exist in the dictionary. If I look for a single word the following works
b=ther; look $b | grep -i "^$b$" | ifne -n echo $b => ther
b=there; look $b | grep -i "^$b$" | ifne -n echo $b => [no output]
However if I try to run a "while read" loop
while read a; do look $a | grep -i "^$a$" | ifne -n echo "$a"; done < <(tr -s '[[:punct:][:space:]]' '\n' <lotr.txt |tr '[:upper:]' '[:lower:]')
The output seems to contain all (?) words in the file. Why doesn't this loop only output non-dictionary words?
Regarding ifne
If stdin is non-empty, ifne -n reprints stdin to stdout. From the manpage:
-n Reverse operation. Run the command if the standard input is empty.
Note that if the standard input is not empty, it is passed through
ifne in this case.
strace on ifne confirms this behavior.
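A quick way to observe that pass-through behavior yourself, assuming ifne from moreutils is installed:
$ printf 'some input\n' | ifne -n echo "stdin was empty"
some input
$ printf '' | ifne -n echo "stdin was empty"
stdin was empty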
Alternative
Perhaps, as an alternative:
#!/bin/bash -e

export PATH=/bin:/sbin:/usr/bin:/usr/sbin

while read a; do
  look "$a" | grep -qi "^$a$" || echo "$a"
done < <(
  tr -s '[[:punct:][:space:]]' '\n' < lotr.txt \
  | tr '[A-Z]' '[a-z]' \
  | sort -u \
  | grep .
)

Generate random passwords in shell with one special character

I have the following code:
</dev/urandom tr -dc 'A-Za-z0-9##$%&_+=' | head -c 16
which is randomly generating passwords perfectly.
I want two changes:
It should only contain one special character listed above
It should choose a random length
I tried with length = $(($RANDOM%8+9))
then putting length as
</dev/urandom tr -dc 'A-Za-z0-9##$%&_+=' | head -c$length
but got no positive result.
#! /bin/bash
chars='##$%&_+='
{ </dev/urandom LC_ALL=C grep -ao '[A-Za-z0-9]' \
| head -n$((RANDOM % 8 + 9))
echo ${chars:$((RANDOM % ${#chars})):1} # Random special char.
} \
| shuf \
| tr -d '\n'
LC_ALL=C prevents characters like ř from appearing.
grep -o outputs just the matching substring, i.e. a single character.
shuf shuffles the lines. I originally used sort -R, but it kept the same characters together (ff1#22MvbcAA).
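If you want this as a reusable function, here is a minimal sketch of one way to wrap it (the function name and the 9-16 length range are my own choices, not part of the original answer):
random_pw() {
  local chars='##$%&_+='
  {
    # 9 to 16 alphanumeric characters
    </dev/urandom LC_ALL=C grep -ao '[A-Za-z0-9]' | head -n "$((RANDOM % 8 + 9))"
    # exactly one special character from the list above
    echo "${chars:$((RANDOM % ${#chars})):1}"
  } | shuf | tr -d '\n'
  echo    # trailing newline so the password is easy to copy
}
random_pw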
## Objective: Generating Random Password:
function random_password () {
[[ ${#1} -gt 0 ]] && { local length=${1}; } || { local length=16; }
export DEFAULT_PASSWORDLENGTH=${length};
export LC_CTYPE=C;
local random="$(
tr -cd "[:graph:]" < /dev/urandom \
| head -c ${length} \
| sed -e 's|\`|~|g' \
-e 's|\$(|\\$(|g';
)";
echo -e "${random}";
return 0;
}; alias random-password='random_password';
$ random-password 32 ;
)W#j*deZ2#eMuhU4TODO&eu&r)&.#~3F
# Warning: Do not consider these other options
# date +%s | sha256sum | base64 | head -c 32 | xargs -0;
# Output: MGFjNDlhMTE2ZWJjOTI4OGI4ZTFiZmEz
# dd if=/dev/urandom count=200 bs=1 2>/dev/null \
# | tr -cd "[:graph:]" \
# | cut -c-${length} \
# | xargs -0;
# Output: AuS*D=!wkHR.4DZ_la

$(shell …) command in Makefile is not executed correctly, but works in bash/sh

I want to count the number of nodes in a graphviz file in a Makefile to use it to start a process for each node.
When I run
grep -- -\> graph.gv | while read line; do for w in $line; do echo $w; done; done | grep [Aa-Zz] | sort | uniq | wc -l
in the shell, it prints the number of nodes as expected.
However, when I use it in my Makefile
NODES := $(shell grep -- -\> graph.gv | while read line; do for w in $line; do echo $w; done; done | grep [Aa-Zz] | sort | uniq | wc -l)
${NODES} is always 0.
You'll need to escape the $ sign. Say:
NODES := $(shell grep -- -\> graph.gv | while read line; do for w in $$line; do echo $$w; done; done | grep [Aa-Zz] | sort | uniq | wc -l)

How can I show the results of each step in a pipeline of continuous transformations without looping?

Consider the following example Bash one-liner in which the letters "h", "e" and "o" are removed from the word "hello" one at a time, in that order. Only the two "l" letters remain;
$ echo "hello" | tr -d h | tr -d e | tr -d o
ll
I am trying to find a method for displaying the output of each command to the screen within the one liner, so others running it can see what is going on. Continuing with the above example I would like output as follows;
$ echo "hello" | tr -d h | tr -d e | tr -d o
hello
ello
llo
ll
Is this possible? As per the operation of the one-liner above, we are carrying the output from command to command with the vertical pipe. So I assume I would have to break from the pipe to print to stdout, which would then interrupt the "command chain" I have written. Or perhaps tee can be used here, but I can't seem to achieve the desired effect. UPDATE: tee won't work because its output is still within the boundaries of the pipe, duh!
Many thanks.
This only works on a terminal:
echo hello | tee /dev/tty |
tr -d h | tee /dev/tty |
tr -d e | tee /dev/tty |
tr -d o
The /dev/tty device redirects output to the current terminal, no matter where the normal output goes.
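To see the difference, redirect the pipeline's final output somewhere else; the copies sent to /dev/tty still show up on screen (this assumes an interactive terminal is attached):
echo hello | tee /dev/tty | tr -d h | tee /dev/tty | tr -d e > /tmp/out
# the terminal shows "hello" and "ello", while /tmp/out ends up containing "llo"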
You can tee the output directly to the terminal:
echo hello | tee /proc/$$/fd/1 |
tr -d h | tee /proc/$$/fd/1 |
tr -d e | tee /proc/$$/fd/1 |
tr -d o
$$ is the shell's PID.
IMO, if this is a real-world problem and not a "how is it possible in bash" exercise, then instead of the bash madness of starting X tr and Y tee processes, you can try a simple perl one-liner:
echo hello | perl -nale 'foreach $i (qw(h e o)) {@F=map{s/$i//g;print $_;$_}@F}'
will print:
ello
llo
ll
or if you really want bash (as @chepner suggested)
echo "hello" | while read w; do for c in h e o; do w=${w//$c};echo $w; done;done
or
while read w; do for c in h e o; do w=${w//$c};echo $w; done;done < wordlist
With a small loop:
w="hello" ; for c in h e o .; do echo $w; w=$(echo $w | tr -d $c); done
The . is only there as a dummy so that the final result is printed as well. After reading the question more carefully, I tested and found that it works in a pipe chain too:
w="hello" ; for c in h e o .; do echo $w; w=$(echo $w | tr -d $c); done | less
# pure bash builtins, see chepner's comment
w="hello" ; for c in h e o .; do echo $w; w=${w/$c}; done
But this WILL work:
tmp=`echo "hello"`; echo $tmp; tmp=`echo $tmp | tr -d h`; echo $tmp; tmp=`echo $tmp | tr -d e`; echo $tmp; echo $tmp | tr -d o
It's ugly, but it works. The solution stores each pass in a variable and then shows them all:
tmp=`echo "hello"`
echo $tmp
tmp=`echo $tmp | tr -d h`
echo $tmp
tmp=`echo $tmp | tr -d e`
echo $tmp
echo $tmp | tr -d o
I cannot find a better solution.
You could do it using a fifo (or a regular file) on which you listen:
mkfifo tmp.fifo
tail -f tmp.fifo
Then just run your command on a separate shell:
echo hello | tee -a tmp.fifo | tr -d e | tee -a tmp.fifo | tr -d o
You can use tee - as - represents the console:
echo "hello" | tee - | tr -d h | tee - | tr -d e | tee - | tr -d o
Hope that helped!
EDIT:
This solution WILL NOT WORK because stdout gets blocked and reused by all the commands that produce output by means of stdout.
I will not delete the answer, as it clearly states that it doesn't solve the problem and ALSO SHOWS that this kind of solution is NOT VALID and why. Anyone who finds themselves in the same situation will discover this information, and that will be useful.
Room for one more?
echo "hello" | tee >(tr -d h | tee >(tr -d e | tee >(tr -d o) ) )
or
echo "hello" |
tee >(tr -d h |
tee >(tr -d e |
tee >(tr -d o ) ) )
output is:
hello
ubuntu@ubuntu:~$ ello
llo
ll
a little persuasive coercion is required to stop the output being corrupted by the prompt (ubuntu@ubuntu:~$):
cat <(echo "hello" | tee >(tr -d h | tee >(tr -d e | tee >(tr -d o) ) ) )
output is:
hello
ello
llo
ll

OS X / Linux: pipe into two processes?

I know about
program1 | program2
and
program1 | tee outputfile | program2
but is there a way to feed program1's output into both program2 and program3?
You can do this with tee and process substitution.
program1 | tee >(program2) >(program3)
The output of program1 will be piped to whatever is inside ( ), in this case program2 and program3.
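For example, with stand-in commands for program2 and program3 (the file names here are just for illustration):
ls -1 | tee >(wc -l > line_count.txt) >(sort -r > reversed.txt) > /dev/null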
Intro about parallelisation
This seems trivial, but it is not only possible, it also produces concurrent (simultaneous) processes.
You may have to take care of some particular effects, like order of execution, execution time, etc.
There are some samples at the end of this post.
Compatible answer first
As this question is tagged shell and unix, I will first give a POSIX-compatible answer (for bashisms, see further down).
Yes, there is a way, using unnamed pipes.
In this sample, I will generate a range of 100,000 numbers, randomize them and compress the result with 4 different compression tools to compare the compression ratios.
For this, I will first run the preparation:
GZIP_CMD=`which gzip`
BZIP2_CMD=`which bzip2`
LZMA_CMD=`which lzma`
XZ_CMD=`which xz`
MD5SUM_CMD=`which md5sum`
SED_CMD=`which sed`
Note: specifying the full path to commands prevents some shell interpreters (like busybox) from running their built-in compressors, and doing it this way ensures the same syntax will run regardless of the OS installation (paths can differ between macOS, Ubuntu, RedHat, HP-UX and so on).
The syntax NN>&1 (where NN is a number between 3 and 63) generates an unnamed pipe which can be found at /dev/fd/NN. (File descriptors 0 to 2 are already open: 0: STDIN, 1: STDOUT and 2: STDERR.)
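A minimal toy example of my own to illustrate that mechanism: the 3>&1 redirection routes descriptor 3 into the pipe, so anything written to fd 3 ends up on cat's stdin.
( echo via-fd-3 >&3 ) 3>&1 | cat
# prints: via-fd-3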
Try this (tested under dash, busybox and bash):
(((( seq 1 100000 | shuf | tee /dev/fd/4 /dev/fd/5 /dev/fd/6 /dev/fd/7 | $GZIP_CMD >/tmp/tst.gz ) 4>&1 | $BZIP2_CMD >/tmp/tst.bz2 ) 5>&1 | $LZMA_CMD >/tmp/tst.lzma ) 6>&1 | $XZ_CMD >/tmp/tst.xz ) 7>&1 | $MD5SUM_CMD
or more readable:
GZIP_CMD=`which gzip`
BZIP2_CMD=`which bzip2`
LZMA_CMD=`which lzma`
XZ_CMD=`which xz`
MD5SUM_CMD=`which md5sum`
(
  (
    (
      (
        seq 1 100000 |
        shuf |
        tee /dev/fd/4 /dev/fd/5 /dev/fd/6 /dev/fd/7 |
        $GZIP_CMD >/tmp/tst.gz
      ) 4>&1 |
      $BZIP2_CMD >/tmp/tst.bz2
    ) 5>&1 |
    $LZMA_CMD >/tmp/tst.lzma
  ) 6>&1 |
  $XZ_CMD >/tmp/tst.xz
) 7>&1 |
$MD5SUM_CMD
2e67f6ad33745dc5134767f0954cbdd6 -
Since shuf randomizes the order, if you try this you will obtain different results,
ls -ltrS /tmp/tst.*
-rw-r--r-- 1 user user 230516 oct 1 22:14 /tmp/tst.bz2
-rw-r--r-- 1 user user 254811 oct 1 22:14 /tmp/tst.lzma
-rw-r--r-- 1 user user 254892 oct 1 22:14 /tmp/tst.xz
-rw-r--r-- 1 user user 275003 oct 1 22:14 /tmp/tst.gz
but you will still be able to compare the md5 checksums:
SED_CMD=`which sed`
for chk in gz:$GZIP_CMD bz2:$BZIP2_CMD lzma:$LZMA_CMD xz:$XZ_CMD;do
${chk#*:} -d < /tmp/tst.${chk%:*} |
$MD5SUM_CMD |
$SED_CMD s/-$/tst.${chk%:*}/
done
2e67f6ad33745dc5134767f0954cbdd6 tst.gz
2e67f6ad33745dc5134767f0954cbdd6 tst.bz2
2e67f6ad33745dc5134767f0954cbdd6 tst.lzma
2e67f6ad33745dc5134767f0954cbdd6 tst.xz
Using bash features
Using some bashisms, this could look nicer; for example, use tee /dev/fd/{4,5,6,7} instead of tee /dev/fd/4 /dev/fd/5 /...
(((( seq 1 100000 | shuf | tee /dev/fd/{4,5,6,7} | gzip >/tmp/tst.gz ) 4>&1 |
bzip2 >/tmp/tst.bz2 ) 5>&1 | lzma >/tmp/tst.lzma ) 6>&1 |
xz >/tmp/tst.xz ) 7>&1 | md5sum
29078875555e113b31bd1ae876937d4b -
works the same way.
Final check
This won't create any files, but lets you compare the compressed sizes of a range of sorted integers between 4 different compression tools (for fun, I used 4 different ways of formatting the output):
(
  (
    (
      (
        (
          seq 1 100000 |
          tee /dev/fd/{4,5,6,7} |
          gzip |
          wc -c |
          sed s/^/gzip:\ \ / >&3
        ) 4>&1 |
        bzip2 |
        wc -c |
        xargs printf "bzip2: %s\n" >&3
      ) 5>&1 |
      lzma |
      wc -c |
      perl -pe 's/^/lzma: /' >&3
    ) 6>&1 |
    xz |
    wc -c |
    awk '{printf "xz: %9s\n",$1}' >&3
  ) 7>&1 |
  wc -c
) 3>&1
gzip: 215157
bzip2: 124009
lzma: 17948
xz: 17992
588895
This demonstrates how stdin and stdout can be redirected in subshells and merged back in the console for the final output.
Syntax >(...) and <(...)
Recent bash versions permit a new syntax feature.
seq 1 100000 | wc -l
100000
seq 1 100000 > >( wc -l )
100000
wc -l < <( seq 1 100000 )
100000
Just as | is an unnamed pipe connected to /dev/fd/0, the syntax <() generates a temporary unnamed pipe on another file descriptor, /dev/fd/XX.
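On Linux you can see the generated pipe name directly (the descriptor number may vary):
echo <(true)
/dev/fd/63
The next command uses the same mechanism to check all four archives in one go: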
md5sum <(zcat /tmp/tst.gz) <(bzcat /tmp/tst.bz2) <(
lzcat /tmp/tst.lzma) <(xzcat /tmp/tst.xz)
29078875555e113b31bd1ae876937d4b /dev/fd/63
29078875555e113b31bd1ae876937d4b /dev/fd/62
29078875555e113b31bd1ae876937d4b /dev/fd/61
29078875555e113b31bd1ae876937d4b /dev/fd/60
More sophisticated demo
This requires the GNU file utility to be installed. It determines the command to run from the file extension or the file type.
for file in /tmp/tst.*;do
cmd=$(which ${file##*.}) || {
cmd=$(file -b --mime-type $file)
cmd=$(which ${cmd#*-})
}
read -a md5 < <($cmd -d <$file|md5sum)
echo $md5 \ $file
done
29078875555e113b31bd1ae876937d4b /tmp/tst.bz2
29078875555e113b31bd1ae876937d4b /tmp/tst.gz
29078875555e113b31bd1ae876937d4b /tmp/tst.lzma
29078875555e113b31bd1ae876937d4b /tmp/tst.xz
This lets you do the same thing as before with the following syntax:
seq 1 100000 |
shuf |
tee >(
  echo gzip. $( gzip | wc -c )
) >(
  echo gzip, $( wc -c < <(gzip))
) >(
  gzip | wc -c | sed s/^/gzip:\ \ /
) >(
  bzip2 | wc -c | xargs printf "bzip2: %s\n"
) >(
  lzma | wc -c | perl -pe 's/^/lzma: /'
) >(
  xz | wc -c | awk '{printf "xz: %9s\n",$1}'
) > >(
  echo raw: $(wc -c)
) |
xargs printf "%-8s %9d\n"
raw: 588895
xz: 254556
lzma: 254472
bzip2: 231111
gzip: 274867
gzip, 274867
gzip. 274867
Note: I used different ways to compute the gzip compressed count.
Note: Because these operations run simultaneously, the output order will depend on the time required by each command.
Going further about parallelisation
If you run this on a multi-core or multi-processor computer, try comparing this:
i=1
time for file in /tmp/tst.*;do
cmd=$(which ${file##*.}) || {
cmd=$(file -b --mime-type $file)
cmd=$(which ${cmd#*-})
}
read -a md5 < <($cmd -d <$file|md5sum)
echo $((i++)) $md5 \ $file
done |
cat -n
which may render:
1 1 29078875555e113b31bd1ae876937d4b /tmp/tst.bz2
2 2 29078875555e113b31bd1ae876937d4b /tmp/tst.gz
3 3 29078875555e113b31bd1ae876937d4b /tmp/tst.lzma
4 4 29078875555e113b31bd1ae876937d4b /tmp/tst.xz
real 0m0.101s
with this:
time (
  i=1 pids=()
  for file in /tmp/tst.*;do
    cmd=$(which ${file##*.}) || {
      cmd=$(file -b --mime-type $file)
      cmd=$(which ${cmd#*-})
    }
    (
      read -a md5 < <($cmd -d <$file|md5sum)
      echo $i $md5 \ $file
    ) & pids+=($!)
    ((i++))
  done
  wait ${pids[@]}
) |
cat -n
could give:
1 2 29078875555e113b31bd1ae876937d4b /tmp/tst.gz
2 1 29078875555e113b31bd1ae876937d4b /tmp/tst.bz2
3 4 29078875555e113b31bd1ae876937d4b /tmp/tst.xz
4 3 29078875555e113b31bd1ae876937d4b /tmp/tst.lzma
real 0m0.070s
where the ordering depends on the time used by each fork.
The bash manual mentions how it emulates the >(...) syntax using either named pipes or named file descriptors, so if you don't want to depend on bash, perhaps you could do that manually in your script.
mknod FIFO p
program3 < FIFO &
program1 | tee FIFO | program2
wait
rm FIFO
Other answers introduce the concept. Here is an actual demonstration:
$ echo "Leeroy Jenkins" | tee >(md5sum > out1) >(sha1sum > out2) > out3
$ cat out1
11e001d91e4badcff8fe22aea05a7458 -
$ echo "Leeroy Jenkins" | md5sum
11e001d91e4badcff8fe22aea05a7458 -
$ cat out2
5ed25619ce04b421fab94f57438d6502c66851c1 -
$ echo "Leeroy Jenkins" | sha1sum
5ed25619ce04b421fab94f57438d6502c66851c1 -
$ cat out3
Leeroy Jenkins
Of course you can > /dev/null instead of out3.
You can always save the output of program1 to a file and then feed the file to program2 and program3 as input.
program1 > temp; program2 < temp; program3 < temp;
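A slightly tidier variant of the same idea, as a sketch, using mktemp and cleaning up afterwards (program1, program2 and program3 are the placeholders from the question):
tmp=$(mktemp)
program1 > "$tmp"
program2 < "$tmp"
program3 < "$tmp"
rm -f "$tmp"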
use (;) syntax... try ps aux | (head -n 1; tail -n 1)
