is there a way other than read/echo to read one line from a file descriptor to stdout without closing the fd? - bash

I ran into a situation where I was doing:
outputStuff |
filterStuff |
transformStuff |
doMoreStuff |
endStuff > endFile
I want to be able to insert some debug tracing stuff in a fashion like:
tee debugFile.$((debugNum++)) |
but obviously the pipes create subshells, so I wanted to do this instead.
exec 5< <(seq 1 100)
outputStuff |
tee debugFile.$(read -u 5;echo $REPLY;) |
filterStuff |
tee debugFile.$(read -u 5;echo $REPLY;) |
transformStuff |
tee debugFile.$(read -u 5;echo $REPLY;) |
doMoreStuff |
endStuff > endFile
That is, I want every debug line I insert to be identical, so I don't have to worry about stepping on various stuff. The read/REPLY echo seems really ugly. I suppose I could wrap it in a function, but is there a way to read one line from a file descriptor to stdout without closing the fd? (head -1 would close the fd if I did head -1 <&3.)
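For reference, the function wrapper alluded to above might look like this (a sketch; the name readLine is arbitrary):
readLine() { IFS= read -r -u "$1" && printf '%s\n' "$REPLY"; }
# so each inserted debug line becomes:
#   tee debugFile.$(readLine 5) |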

I tried several things, but in the end @Etan Reisner proved to me (unintentionally) that even if there is a way to do what you asked (clever, Etan), it's not what you actually want. If you want to be sure to read the numbers back sequentially then the reads have to be serialized, which the commands in a pipeline are not.
Indeed, that applies to your original approach as well, since command substitutions are performed in subshells. I think you could reliably do it like this, though:
debugNum=1
eval "
outputStuff |
tee debugFile.$((debugNum++)) |
filterStuff |
transformStuff |
doMoreStuff |
tee debugFile.$((debugNum++)) |
endStuff > endFile
"
That way all the substitutions are performed by the parent shell, on the string, before any of the commands is launched.
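A quick way to convince yourself of the expansion order (a minimal sketch):
debugNum=1
eval "echo debugFile.$((debugNum++)); echo debugFile.$((debugNum++))"
# prints debugFile.1 and debugFile.2: both arithmetic substitutions were
# expanded while the string was built, before eval executed anything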

Put all the reads inside a single compound statement, and redirect input from descriptor 5 for that.
{ outputStuff | tee debugFile.$(read -u 5;echo $REPLY;) |
filterStuff | tee debugFile.$(read -u 5;echo $REPLY;) |
transformStuff | tee debugFile.$(read -u 5;echo $REPLY;) |
doMoreStuff |
endStuff
} 5< <(seq 1 100) > endFile
Now, file descriptor 5 is opened once (and closed once), and each call to read gets successive lines from that descriptor.
(You can also simplify this a bit; unless you are providing outputStuff with input via the keyboard, there doesn't seem to be a need to use file descriptor 5 instead of standard input, since only outputStuff is reading from standard input. All the other programs are reading their standard input via the pipeline.)
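The successive-read behaviour is easy to check in isolation (a minimal sketch):
{ read -u 5; echo "$REPLY"; read -u 5; echo "$REPLY"; } 5< <(seq 1 100)
# prints 1 then 2: fd 5 stays open between the two reads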

Here's an approach that doesn't need seq at all:
Define a function that constructs your pipeline recursively.
buildWithDebugging() {
  local -a nextCmd=( )
  local debugNum=$1; shift
  while (( $# )) && [[ $1 != '|' ]]; do
    nextCmd+=( "$1" )
    shift
  done
  if (( ${#nextCmd[@]} )); then
    "${nextCmd[@]}" \
      | tee "debugFile.$debugNum" \
      | buildWithDebugging "$((debugNum + 1))" "$@"
  else
    cat # noop
  fi
}
...and, to use it:
buildWithDebugging 0 \
  outputStuff '|' \
  filterStuff '|' \
  transformStuff '|' \
  doMoreStuff '|' \
  endStuff > endFile
A more robust version would pass the pipeline components in the style of Pascal strings rather than C strings -- which is to say, instead of using literal |s as separators, it would precede each group of command words with its length:
buildWithDebugging 0 \
  1 outputStuff \
  3 filterStuff filterArg filterSecondArg \
  2 transformStuff filterArg \
  1 doMoreStuff \
  1 endStuff > endFile
Building this should be a completely trivial exercise for the reader. :)
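For the record, a sketch of what that might look like (untested, and assuming every segment, including the last one, is preceded by its word count):
buildWithDebugging() {
  local debugNum=$1; shift
  local len=$1; shift
  local -a nextCmd=( "${@:1:len}" )   # grab the next $len words
  shift "$len"
  if (( $# )); then
    "${nextCmd[@]}" |
      tee "debugFile.$debugNum" |
      buildWithDebugging "$((debugNum + 1))" "$@"
  else
    "${nextCmd[@]}"                   # last segment: run it directly
  fi
}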

At the cost of evenly padding your numbers you can do this with dd, though you don't end up with a nicer-looking command for your trouble. =)
exec 5< <(seq -w 10 -1 1)
echo -n |
{ echo "d 1:$(dd bs=1 count=3 2>/dev/null <&5)"; cat; } |
{ echo "d 2:$(dd bs=1 count=3 2>/dev/null <&5)"; cat; } |
{ echo "d 3:$(dd bs=1 count=3 2>/dev/null <&5)"; cat; } |
{ echo "d 4:$(dd bs=1 count=3 2>/dev/null <&5)"; cat; } |
cat
You can also just use a shorter variable with read: read -u 5 a;echo $a but that only saves you two characters.

Related

How to find all non-dictionary words in a file in bash/zsh?

I'm trying to find all words in a file that don't exist in the dictionary. If I look for a single word the following works
b=ther; look $b | grep -i "^$b$" | ifne -n echo $b => ther
b=there; look $b | grep -i "^$b$" | ifne -n echo $b => [no output]
However if I try to run a "while read" loop
while read a; do look $a | grep -i "^$a$" | ifne -n echo "$a"; done < <(tr -s '[[:punct:][:space:]]' '\n' <lotr.txt |tr '[:upper:]' '[:lower:]')
The output seems to contain all (?) words in the file. Why doesn't this loop only output non-dictionary words?
Regarding ifne
If stdin is non-empty, ifne -n reprints stdin to stdout. From the manpage:
-n Reverse operation. Run the command if the standard input is empty
Note that if the standard input is not empty, it is passed through
ifne in this case.
strace on ifne confirms this behavior.
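A trivial pair of tests makes the behavior visible (assuming moreutils' ifne is installed):
$ printf 'x\n' | ifne -n echo empty
x
$ printf '' | ifne -n echo empty
empty
So in the loop, every word is printed either way: dictionary words are passed through by ifne (as grep's output), and non-dictionary words are printed by echo.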
Alternative
Perhaps, as an alternative:
#!/bin/bash -e

export PATH=/bin:/sbin:/usr/bin:/usr/sbin

while read a; do
  look "$a" | grep -qi "^$a$" || echo "$a"
done < <(
  tr -s '[[:punct:][:space:]]' '\n' < lotr.txt \
    | tr '[A-Z]' '[a-z]' \
    | sort -u \
    | grep .
)

Misbehaving head with redirection

In a reply to Piping a file through tail and head via tee, a strange behaviour of head has been observed in the following construct when working with huge files:
#! /bin/bash
for i in {1..1000000} ; do echo $i ; done > /tmp/n
( tee >(sed -n '1,3p' >&3 ) < /tmp/n | tail -n2 ) 3>&1 # Correct
echo '#'
( tee >(tac | tail -n3 | tac >&3 ) < /tmp/n | tail -n2 ) 3>&1 # Correct
echo '#'
( tee >(head -n3 >&3 ) < /tmp/n | tail -n2 ) 3>&1 # Not correct!?
Output:
1
2
3
999999
1000000
#
1
2
3
999999
1000000
#
1
2
3
15504
15
Question:
Why doesn't the last command output the same lines as the previous two?
This is because head exits as soon as it has transferred the first three lines. Subsequently, tee is killed with SIGPIPE, because the reading end of the pipe it is writing to (its FILE argument, the process substitution) has been closed -- but not before it manages to output some lines to its stdout.
If you execute just this:
tee >(head -n3 >/dev/null) < /tmp/n
You will see more clearly what happens.
OTOH, tac reads the whole file, as it has to reverse it; sed also reads its entire input, since without a q command it processes every line.
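The SIGPIPE can also be observed directly via tee's exit status (a sketch; 141 = 128 + 13, and 13 is SIGPIPE):
$ tee >(head -n3 >/dev/null) < /tmp/n > /dev/null; echo "tee exited with $?"
tee exited with 141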

How can I show the results of each step in a pipeline of continuous transformations without looping?

Consider the following example Bash one-liner, in which the letters "h", "e" and "o" are removed from the word "hello" one at a time, in that order, so that only the two "l" letters remain:
$ echo "hello" | tr -d h | tr -d e | tr -d o
ll
I am trying to find a method for displaying the output of each command to the screen within the one-liner, so others running it can see what is going on. Continuing with the above example, I would like output as follows:
$ echo "hello" | tr -d h | tr -d e | tr -d o
hello
ello
llo
ll
Is this possible? As per the operation of the one-liner above, we are carrying the output from command to command with the vertical pipe, so I assume I would have to break out of the pipe to print to stdout, which would then interrupt the "command chain" I have written. Or perhaps tee can be used here, but I can't seem to achieve the desired effect. UPDATE: tee won't work because its output is still within the boundaries of the pipe, duh!
Many thanks.
This only works on a terminal:
echo hello | tee /dev/tty |
tr -d h | tee /dev/tty |
tr -d e | tee /dev/tty |
tr -d o
The /dev/tty device redirects output to the current terminal, no matter where the normal output goes.
You can tee the output directly to the terminal:
echo hello | tee /proc/$$/fd/1 |
tr -d h | tee /proc/$$/fd/1 |
tr -d e | tee /proc/$$/fd/1 |
tr -d o
$$ is the shell's PID.
IMO, if this is a real-world problem and not a "how is this possible in bash" exercise, then instead of the bash madness of starting X tr and Y tee processes, you can try a simple perl one-liner:
echo hello | perl -nale 'foreach $i (qw(h e o)) {@F=map{s/$i//g;print $_;$_}@F}'
will print:
ello
llo
ll
or if you really want bash (as @chepner suggested)
echo "hello" | while read w; do for c in h e o; do w=${w//$c};echo $w; done;done
or
while read w; do for c in h e o; do w=${w//$c};echo $w; done;done < wordlist
With a small loop:
w="hello" ; for c in h e o .; do echo $w; w=$(echo $w | tr -d $c); done
The . is a dummy character that makes the loop print the final result as well. After reading the question more carefully, I tested and found that it works in a pipe chain too:
w="hello" ; for c in h e o .; do echo $w; w=$(echo $w | tr -d $c); done | less
# pure bash builtins, see chepner's comment
w="hello" ; for c in h e o .; do echo $w; w=${w/$c}; done
But this WILL work:
tmp=`echo "hello"`; echo $tmp; tmp=`echo $tmp | tr -d h`; echo $tmp; tmp=`echo $tmp | tr -d e`; echo $tmp; echo $tmp | tr -d o
It's ugly, but it works. The solution creates variables to store each pass and then shows them all:
tmp=`echo "hello"`
echo $tmp
tmp=`echo $tmp | tr -d h`
echo $tmp
tmp=`echo $tmp | tr -d e`
echo $tmp
echo $tmp | tr -d o
I cannot find a better solution.
You could do it using a fifo (or a regular file) on which you listen:
mkfifo tmp.fifo
tail -f tmp.fifo
Then just run your command in a separate shell:
echo hello | tee -a tmp.fifo | tr -d e | tee -a tmp.fifo | tr -d o
You can use tee - as - represents the console:
echo "hello" | tee - | tr -d h | tee - | tr -d e | tee - | tr -d o
Hope that helped!
EDIT:
This solution WILL NOT WORK as intended: the duplicated output never reaches the terminal, because everything tee emits on standard output stays inside the pipe and is consumed by the next command.
I will not delete the answer, as it clearly states that it doesn't solve the problem, but it ALSO SHOWS that this kind of solution is NOT VALID and why. Anyone who finds themselves in the same situation will discover this information, and that will be useful.
Room for one more?
echo "hello" | tee >(tr -d h | tee >(tr -d e | tee >(tr -d o) ) )
or
echo "hello" |
tee >(tr -d h |
tee >(tr -d e |
tee >(tr -d o ) ) )
output is:
hello
ubuntu@ubuntu:~$ ello
llo
ll
a little persuasive coercion is required to eliminate the corruption by the ubuntu@ubuntu:~$ prompt:
cat <(echo "hello" | tee >(tr -d h | tee >(tr -d e | tee >(tr -d o) ) ) )
output is:
hello
ello
llo
ll

OS X / Linux: pipe into two processes?

I know about
program1 | program2
and
program1 | tee outputfile | program2
but is there a way to feed program1's output into both program2 and program3?
You can do this with tee and process substitution.
program1 | tee >(program2) >(program3)
The output of program1 will be piped to whatever is inside ( ), in this case program2 and program3.
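For instance (a sketch; the output order may vary, since the substituted processes run concurrently):
$ echo hello | tee >(tr a-z A-Z) >(wc -c) > /dev/null
HELLO
6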
Intro about parallelisation
This seems trivial, but it is not only possible; doing so also generates concurrent, simultaneous processes.
You may have to take care of some particular effects, like order of execution, execution time, etc.
There are some samples at the end of this post.
Compatible answer first
As this question is tagged shell and unix, I will first give a POSIX-compatible answer (for bashisms, read further down).
Yes, there is a way to use unnamed pipes.
In this sample, I will generate a range of 100'000 numbers, randomize them and compress the result using 4 different compression tools to compare the compression ratio...
To do this, I will first run some preparation:
GZIP_CMD=`which gzip`
BZIP2_CMD=`which bzip2`
LZMA_CMD=`which lzma`
XZ_CMD=`which xz`
MD5SUM_CMD=`which md5sum`
SED_CMD=`which sed`
Note: specifying the full path to commands prevents some shell interpreters (like busybox) from running a built-in compressor instead, and doing it this way ensures the same syntax will run regardless of the OS installation (paths can differ between macOS, Ubuntu, RedHat, HP-UX and so on).
The syntax NN>&1 (where NN is a number between 3 and 63) generates an unnamed pipe that can be found at /dev/fd/NN. (File descriptors 0 to 2 are already open: 0 is STDIN, 1 is STDOUT and 2 is STDERR.)
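Here is a minimal sketch of the trick on its own (my example, assuming /dev/fd is available, as the full command below also requires):
( seq 1 5 | tee /dev/fd/3 | gzip > /tmp/five.gz ) 3>&1 | wc -l
# prints 5: the copy tee wrote to /dev/fd/3 left the subshell through the
# pipe, while the original stream went to gzip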
Try this (tested under dash, busybox and bash):
(((( seq 1 100000 | shuf | tee /dev/fd/4 /dev/fd/5 /dev/fd/6 /dev/fd/7 | $GZIP_CMD >/tmp/tst.gz ) 4>&1 | $BZIP2_CMD >/tmp/tst.bz2 ) 5>&1 | $LZMA_CMD >/tmp/tst.lzma ) 6>&1 | $XZ_CMD >/tmp/tst.xz ) 7>&1 | $MD5SUM_CMD
or more readable:
GZIP_CMD=`which gzip`
BZIP2_CMD=`which bzip2`
LZMA_CMD=`which lzma`
XZ_CMD=`which xz`
MD5SUM_CMD=`which md5sum`
(
  (
    (
      (
        seq 1 100000 |
        shuf |
        tee /dev/fd/4 /dev/fd/5 /dev/fd/6 /dev/fd/7 |
        $GZIP_CMD >/tmp/tst.gz
      ) 4>&1 |
      $BZIP2_CMD >/tmp/tst.bz2
    ) 5>&1 |
    $LZMA_CMD >/tmp/tst.lzma
  ) 6>&1 |
  $XZ_CMD >/tmp/tst.xz
) 7>&1 |
$MD5SUM_CMD
2e67f6ad33745dc5134767f0954cbdd6 -
Since shuf randomizes the ordering, you will obtain different results if you try this:
ls -ltrS /tmp/tst.*
-rw-r--r-- 1 user user 230516 oct 1 22:14 /tmp/tst.bz2
-rw-r--r-- 1 user user 254811 oct 1 22:14 /tmp/tst.lzma
-rw-r--r-- 1 user user 254892 oct 1 22:14 /tmp/tst.xz
-rw-r--r-- 1 user user 275003 oct 1 22:14 /tmp/tst.gz
but you should be able to compare the md5 checksums:
SED_CMD=`which sed`
for chk in gz:$GZIP_CMD bz2:$BZIP2_CMD lzma:$LZMA_CMD xz:$XZ_CMD;do
${chk#*:} -d < /tmp/tst.${chk%:*} |
$MD5SUM_CMD |
$SED_CMD s/-$/tst.${chk%:*}/
done
2e67f6ad33745dc5134767f0954cbdd6 tst.gz
2e67f6ad33745dc5134767f0954cbdd6 tst.bz2
2e67f6ad33745dc5134767f0954cbdd6 tst.lzma
2e67f6ad33745dc5134767f0954cbdd6 tst.xz
Using bash features
Using some bashisms, this can look nicer; for example, use /dev/fd/{4,5,6,7} instead of tee /dev/fd/4 /dev/fd/5 /...
(((( seq 1 100000 | shuf | tee /dev/fd/{4,5,6,7} | gzip >/tmp/tst.gz ) 4>&1 |
bzip2 >/tmp/tst.bz2 ) 5>&1 | lzma >/tmp/tst.lzma ) 6>&1 |
xz >/tmp/tst.xz ) 7>&1 | md5sum
29078875555e113b31bd1ae876937d4b -
This works the same way.
Final check
This won't create any files, but it lets you compare the compressed sizes of a range of sorted integers across the 4 different compression tools (for fun, I used 4 different ways of formatting the output):
(
  (
    (
      (
        (
          seq 1 100000 |
          tee /dev/fd/{4,5,6,7} |
          gzip |
          wc -c |
          sed s/^/gzip:\ \ / >&3
        ) 4>&1 |
        bzip2 |
        wc -c |
        xargs printf "bzip2: %s\n" >&3
      ) 5>&1 |
      lzma |
      wc -c |
      perl -pe 's/^/lzma: /' >&3
    ) 6>&1 |
    xz |
    wc -c |
    awk '{printf "xz: %9s\n",$1}' >&3
  ) 7>&1 |
  wc -c
) 3>&1
gzip: 215157
bzip2: 124009
lzma: 17948
xz: 17992
588895
This demonstrates how stdin and stdout can be redirected in subshells and merged back in the console for the final output.
Syntax >(...) and <(...)
Recent bash versions permit a new syntax feature.
seq 1 100000 | wc -l
100000
seq 1 100000 > >( wc -l )
100000
wc -l < <( seq 1 100000 )
100000
Just as | is an unnamed pipe connected on /dev/fd/0 (stdin), the <() syntax generates a temporary unnamed pipe on another file descriptor, /dev/fd/XX.
md5sum <(zcat /tmp/tst.gz) <(bzcat /tmp/tst.bz2) <(
lzcat /tmp/tst.lzma) <(xzcat /tmp/tst.xz)
29078875555e113b31bd1ae876937d4b /dev/fd/63
29078875555e113b31bd1ae876937d4b /dev/fd/62
29078875555e113b31bd1ae876937d4b /dev/fd/61
29078875555e113b31bd1ae876937d4b /dev/fd/60
More sophisticated demo
This requires the GNU file utility to be installed. It determines the command to run from the extension or file type.
for file in /tmp/tst.*;do
  cmd=$(which ${file##*.}) || {
    cmd=$(file -b --mime-type $file)
    cmd=$(which ${cmd#*-})
  }
  read -a md5 < <($cmd -d <$file|md5sum)
  echo $md5 \ $file
done
29078875555e113b31bd1ae876937d4b /tmp/tst.bz2
29078875555e113b31bd1ae876937d4b /tmp/tst.gz
29078875555e113b31bd1ae876937d4b /tmp/tst.lzma
29078875555e113b31bd1ae876937d4b /tmp/tst.xz
This lets you do the same thing as before with the following syntax:
seq 1 100000 |
  shuf |
  tee >(
    echo gzip. $( gzip | wc -c )
  ) >(
    echo gzip, $( wc -c < <(gzip))
  ) >(
    gzip | wc -c | sed s/^/gzip:\ \ /
  ) >(
    bzip2 | wc -c | xargs printf "bzip2: %s\n"
  ) >(
    lzma | wc -c | perl -pe 's/^/lzma: /'
  ) >(
    xz | wc -c | awk '{printf "xz: %9s\n",$1}'
  ) > >(
    echo raw: $(wc -c)
  ) |
  xargs printf "%-8s %9d\n"
raw: 588895
xz: 254556
lzma: 254472
bzip2: 231111
gzip: 274867
gzip, 274867
gzip. 274867
Note that I used three different ways to compute the gzip compressed count.
Note: because these operations run simultaneously, the output order depends on the time required by each command.
Going further about parallelisation
If you are running a multi-core or multi-processor computer, try comparing this:
i=1
time for file in /tmp/tst.*;do
  cmd=$(which ${file##*.}) || {
    cmd=$(file -b --mime-type $file)
    cmd=$(which ${cmd#*-})
  }
  read -a md5 < <($cmd -d <$file|md5sum)
  echo $((i++)) $md5 \ $file
done |
cat -n
which may render:
1 1 29078875555e113b31bd1ae876937d4b /tmp/tst.bz2
2 2 29078875555e113b31bd1ae876937d4b /tmp/tst.gz
3 3 29078875555e113b31bd1ae876937d4b /tmp/tst.lzma
4 4 29078875555e113b31bd1ae876937d4b /tmp/tst.xz
real 0m0.101s
with this:
time (
  i=1 pids=()
  for file in /tmp/tst.*;do
    cmd=$(which ${file##*.}) || {
      cmd=$(file -b --mime-type $file)
      cmd=$(which ${cmd#*-})
    }
    (
      read -a md5 < <($cmd -d <$file|md5sum)
      echo $i $md5 \ $file
    ) & pids+=($!)
    ((i++))
  done
  wait ${pids[@]}
) |
cat -n
could give:
1 2 29078875555e113b31bd1ae876937d4b /tmp/tst.gz
2 1 29078875555e113b31bd1ae876937d4b /tmp/tst.bz2
3 4 29078875555e113b31bd1ae876937d4b /tmp/tst.xz
4 3 29078875555e113b31bd1ae876937d4b /tmp/tst.lzma
real 0m0.070s
where the ordering depends on the time taken by each fork.
The bash manual mentions how it emulates the >(...) syntax using either named pipes or named file descriptors, so if you don't want to depend on bash, perhaps you could do that manually in your script.
mknod FIFO p
program3 < FIFO &
program1 | tee FIFO | program2
wait
rm FIFO
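With concrete commands it could look like this (a sketch; wc -l plays program3 and wc -c plays program2):
mkfifo FIFO
wc -l < FIFO &                # "program3": counts lines
seq 1 10 | tee FIFO | wc -c   # program1 | tee FIFO | program2
wait                          # prints 21 (bytes) and 10 (lines), order may vary
rm FIFO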
Other answers introduce the concept. Here is an actual demonstration:
$ echo "Leeroy Jenkins" | tee >(md5sum > out1) >(sha1sum > out2) > out3
$ cat out1
11e001d91e4badcff8fe22aea05a7458 -
$ echo "Leeroy Jenkins" | md5sum
11e001d91e4badcff8fe22aea05a7458 -
$ cat out2
5ed25619ce04b421fab94f57438d6502c66851c1 -
$ echo "Leeroy Jenkins" | sha1sum
5ed25619ce04b421fab94f57438d6502c66851c1 -
$ cat out3
Leeroy Jenkins
Of course you can > /dev/null instead of out3.
You can always save the output of program1 to a file and then feed it to program2 and program3 as input.
program1 > temp; program2 < temp; program3 < temp;
use (;) syntax... try ps aux | (head -n 1; tail -n 1)

How to assign output of multiple shell commands to variables when using tee?

I want to tee and get the results from multiple shell commands connected in a pipeline. I made a simple example to explain the point. Suppose I want to count the numbers of 'a', 'b' and 'c' characters.
echo "abcaabbcabc" | tee >(tr -dc 'a' | wc -m) >(tr -dc 'b' | wc -m) >(tr -dc 'c' | wc -m) > /dev/null
Then I tried to assign the result from each count to a shell variable, but they all end up empty.
echo "abcaabbcabc" | tee >(A=$(tr -dc 'a' | wc -m)) >(B=$(tr -dc 'b' | wc -m)) >(C=$(tr -dc 'c' | wc -m)) > /dev/null && echo $A $B $C
What is the right way to do it?
Each >(...) process substitution runs in its own subshell, so variables assigned inside it never reach the parent shell. Use files. They are the single most reliable solution. Each command may take a different amount of time to run, and there is no easy way to synchronize the command redirections, so the most reliable way is to use a separate "entity" to collect all the data:
tmpa=$(mktemp) tmpb=$(mktemp) tmpc=$(mktemp)
trap 'rm "$tmpa" "$tmpb" "$tmpc"' EXIT
echo "abcaabbcabc" |
tee >(tr -dc 'a' | wc -m > "$tmpa") >(tr -dc 'b' | wc -m > "$tmpb") |
tr -dc 'c' | wc -m > "$tmpc"
A=$(<"$tmpa")
B=$(<"$tmpb")
C=$(<"$tmpc")
rm "$tmpa" "$tmpb" "$tmpc"
trap '' EXIT
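The variables are then available in the current shell:
echo "$A $B $C"   # prints: 4 4 3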
Second way:
You can prepend the data from each stream with a custom prefix. Then sort all lines (basically, buffer them) on the prefix and then read them. The example script will generate only a single number from each process substitution, so it's easy to do:
read -r A B C < <(
  echo "abcaabbcabc" |
    tee >(
      tr -dc 'a' | wc -m | sed 's/^/A /'
    ) >(
      tr -dc 'b' | wc -m | sed 's/^/B /'
    ) >(
      tr -dc 'c' | wc -m | sed 's/^/C /'
    ) >/dev/null |
    sort |
    cut -d' ' -f2 |
    paste -sd' '
)
echo A="$A" B="$B" C="$C"
Using temporary files with flock to synchronize the output of child processes could look like this:
tmpa=$(mktemp) tmpb=$(mktemp) tmpc=$(mktemp)
trap 'rm "$tmpa" "$tmpb" "$tmpc"' EXIT
echo "abcaabbcabc" |
(
  flock 3
  flock 4
  flock 5
  tee >(
    tr -dc 'a' | wc -m |
      { sleep 0.1; cat; } > "$tmpa"
    # unblock main thread
    flock -u 3
  ) >(
    tr -dc 'b' | wc -m |
      { sleep 0.2; cat; } > "$tmpb"
    # unblock main thread
    flock -u 4
  ) >(
    tr -dc 'c' | wc -m |
      { sleep 0.3; cat; } > "$tmpc"
    # unblock main thread
    flock -u 5
  ) >/dev/null
  # wait for subprocesses to finish
  # need to re-open the files to block on them
  (
    flock 3
    flock 4
    flock 5
  ) 3<"$tmpa" 4<"$tmpb" 5<"$tmpc"
) 3<"$tmpa" 4<"$tmpb" 5<"$tmpc"
A=$(<"$tmpa")
B=$(<"$tmpb")
C=$(<"$tmpc")
declare -p A B C
You can use this full-featured letter frequency analysis:
#!/usr/bin/env bash
declare -A letter_frequency
while read -r v k; do
  letter_frequency[$k]="$v"
done < <(
  grep -o '[[:alnum:]]' <<<"abcaabbcabc" |
    sort |
    uniq -c
)
for k in "${!letter_frequency[@]}"; do
  printf '%c = %d\n' "$k" "${letter_frequency[$k]}"
done
Output:
c = 3
b = 4
a = 4
Or to only assign $A, $B and $C as in your example:
#!/usr/bin/env bash
{
  read -r A _
  read -r B _
  read -r C _
} < <(
  grep -o '[[:alnum:]]' <<<"abcaabbcabc" |
    sort |
    uniq -c
)
printf 'a=%d\nb=%d\nc=%d\n' "$A" "$B" "$C"
grep -o '[[:alnum:]]': output each alphanumeric character on its own line
sort: sort the lines of characters
uniq -c: count the instances of each character and output count and character pairs
< <( command group ): the output of this command group is fed to the stdin of the compound command before it
If you need to count occurrences of non-printable characters (newlines, spaces, tabs), you have to make all of these commands output and consume null-delimited lists. It can certainly be done with the GNU versions of these tools; I leave it to you as an exercise.
Solution to count arbitrary characters (except null):
As demonstrated, it also works with Unicode.
#!/usr/bin/env bash
declare -A character_frequency
declare -i v
while read -d '' -r -N 8 v && read -r -d '' -N 1 k; do
  character_frequency[$k]="$v"
done < <(
  grep --only-matching --null-data . <<<$'a¹bc✓ ✓\n\t\t\u263A☺ ☺ aabbcabc' |
    head --bytes -2 |              # trim the newline added by grep
    sort --zero-terminated |       # sort the null-delimited list
    uniq --count --zero-terminated # count occurrences of each char (null delim)
)
for k in "${!character_frequency[@]}"; do
  printf '%q = %d\n' "$k" "${character_frequency[$k]}"
done
Output:
$'\n' = 1
$'\t' = 2
☺ = 3
\ = 7
✓ = 2
¹ = 1
c = 3
b = 4
a = 4
