Misbehaving head with redirection - bash

In a reply to Piping a file through tail and head via tee, a strange behaviour of head has been observed in the following construct when working with huge files:
#! /bin/bash
for i in {1..1000000} ; do echo $i ; done > /tmp/n
( tee >(sed -n '1,3p' >&3 ) < /tmp/n | tail -n2 ) 3>&1 # Correct
echo '#'
( tee >(tac | tail -n3 | tac >&3 ) < /tmp/n | tail -n2 ) 3>&1 # Correct
echo '#'
( tee >(head -n3 >&3 ) < /tmp/n | tail -n2 ) 3>&1 # Not correct!?
Output:
1
2
3
999999
1000000
#
1
2
3
999999
1000000
#
1
2
3
15504
15
Question:
Why does the last command not output the same lines as the previous two?

This is because head exits as soon as it has transferred the first three lines. tee is then killed with SIGPIPE, because the reading end of the pipe it is writing to (the process substitution) has been closed, but not before it manages to copy some lines to its stdout.
If you execute just this:
tee >(head -n3 >/dev/null) < /tmp/n
you will see more clearly what happens.
OTOH, tac reads the whole file, since it has to reverse it; so does sed, which keeps reading to the end of its input even after printing the requested lines.
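If you want to keep the head variant, one workaround (a minimal sketch, assuming the same /tmp/n file as above) is to drain the rest of tee's output after head exits, so tee never receives SIGPIPE:
( tee >( { head -n3 >&3; cat >/dev/null; } ) < /tmp/n | tail -n2 ) 3>&1
Here cat keeps the process substitution's reading end open until EOF, discarding everything after the first three lines.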

Related

Ignoring all but the (multi-line) results of the last query sent to a program

I have an executable that accepts queries from stdin and responds to them, reading until EOF. Additionally I have an input file and a special command, let's call those EXEC, FILE and CMD respectively.
What I need to do is:
Pass FILE to EXEC as input.
Disregard all the output corresponding to commands read from FILE (send it to /dev/null).
Pass CMD as the last command.
Fetch output for the last command and save it in a variable.
EXEC's output can be multiline for each query.
I know how to pass FILE + CMD into the EXEC:
echo ${CMD} | cat ${FILE} - | ${EXEC}
but I have no idea how to fetch only output resulting from CMD.
Is there a magical one-liner that does this?
After looking around I've found the following partial solution:
mkfifo mypipe
(tail -f mypipe) | ${EXEC} &
cat ${FILE} | while read line; do
    echo ${line} > mypipe
done
echo ${CMD} > mypipe
This allows me to redirect my input, but now the output gets printed to screen. I want to ignore all the output produced by EXEC in the while loop and get only what it prints for the last line.
I tried what first came into my mind, which is:
(tail -f mypipe) | ${EXEC} > somefile &
But it didn't work, the file was empty.
This is race-prone -- I'd suggest putting in a delay after the kill, or using an explicit sigil to determine when it's been received. That said:
#!/usr/bin/env bash
# route FD 4 to your output routine
exec 4> >(
    output=; trap 'output=1' USR1
    while IFS= read -r line; do
        [[ $output ]] && printf '%s\n' "$line"
    done
); out_pid=$!
# Capture the PID for the process substitution above; note that this requires a very
# new version of bash (4.4?)
[[ $out_pid ]] || { echo "ERROR: Your bash version is too old" >&2; exit 1; }
# Run your program in another process substitution, and close the parent's handle on FD 4
exec 3> >("$EXEC" >&4) 4>&-
# cat your file to FD 3...
cat "$file" >&3
# UGLY HACK: Wait to let your program finish flushing output from those commands
sleep 0.1
# notify the subshell writing output to disk that the ignored input is done...
kill -USR1 "$out_pid"
# UGLY HACK: Wait to let the subprocess actually receive the signal and set output=1
sleep 0.1
# ...and then write the command for which you actually want content logged.
echo "command" >&3
In validating this answer, I'm doing the following:
EXEC=stub_function
stub_function() {
    local count line
    count=0
    while IFS= read -r line; do
        (( ++count ))
        printf '%s: %s\n' "$count" "$line"
    done
}
cat >file <<EOF
do-not-log-my-output-1
do-not-log-my-output-2
do-not-log-my-output-3
EOF
file=file
export -f stub_function
export file EXEC
Output is only:
4: command
You could pipe it into a sed:
var=$(YOUR COMMAND | sed '$!d')
This will put only the last line into the variable.
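A quick illustration:
var=$(printf '%s\n' one two three | sed '$!d')
echo "$var"    # prints: three
Note that sed '$!d' deletes every line except the last, so this only helps if EXEC's answer to CMD is a single line.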
I think your program EXEC does something special (opens a connection or remembers state). When that is not the case, you can simply run it twice:
${EXEC} < ${FILE} > /dev/null
myvar=$(echo ${CMD} | ${EXEC})
Or with normal commands:
# Do not use (printf "==%s==\n" 1 2 3 ; printf "oo%soo\n" 4 5 6) | cat
printf "==%s==\n" 1 2 3 | cat > /dev/null
myvar=$(printf "oo%soo\n" 4 5 6 | cat)
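Afterwards the variable holds only the second printf's output:
echo "$myvar"
# oo4oo
# oo5oo
# oo6oo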
When you need to give all input to one process, perhaps you can think of a marker that you can filter on:
(printf "==%s==\n" 1 2 3 ; printf "%s\n" "marker"; printf "oo%soo\n" 4 5 6) | cat | sed '1,/marker/ d'
You should examine your EXEC to see what could serve as a marker. When it is running SQL, you might use something like
(cat ${FILE}; echo 'select "DamonMarker" from dual;' ; echo ${CMD} ) |
${EXEC} | sed '1,/DamonMarker/ d'
and capture it in a variable with
myvar=$( (cat ${FILE}; echo 'select "DamonMarker" from dual;' ; echo ${CMD} ) |
${EXEC} | sed '1,/DamonMarker/ d' )

Redirect output to multiple files with tee and grep

I have spent a lot of hours trying to get this to run:
redirecting a script's output (stdout + stderr) to logfile 1, and a grep of it to logfile 2.
The first logfile should contain the complete output, and the second logfile only the start and end lines (via grep).
I tried different syntaxes, but nothing worked.
./run.sh 2>&1 | tee -a /var/log/log1.log | (grep 'START|END') > /var/log/myscripts.log
./run.sh 2>&1 | tee -a /var/log/log1.log | grep 'Start' > /var/log/myscripts.log
./run.sh 2>&1 | tee -a /var/log/log1.log | egrep 'Start' > /var/log/myscripts.log
./run.sh 2>&1 | tee -a /var/log/log1.log | grep -E 'Start' > /var/log/myscripts.log
The output is redirected only to the first log; the second log is empty.
I don't know why; do you have any ideas?
Example lines from the output.
This should appear complete in log1.log
(the script starts Java via a shell script):
26.09.2014 20:38:51 | start script > load_stats.sh
26.09.2014 20:38:51 | [DB DATA]
26.09.2014 20:38:51 | Host > locahost
26.09.2014 20:38:51 | User > leroy
... more ...
26.09.2014 20:39:23 | fin script > load_stats.sh
I want to grep these two lines into myscripts.log:
26.09.2014 20:38:51 | start script > load_stats.sh
26.09.2014 20:39:23 | fin script > load_stats.sh
I think the problem is the format: the timestamp, the whitespace.
I thought grep 'word' would catch both of these lines, but it doesn't.
Stupid.
./run.sh 2>&1 | tee -a /var/log/log1.log | sed -nE '/(start script|end script)/p' >> /var/log/myscripts.log
did not work: log1 was OK, myscripts.log empty.
tail -f -n 500 /var/log/log1.log | sed -nE '/(start script|end script)/p'
works well in the shell, but not in the full combination:
execute a script > redirect to log 1 > redirect and filter (grep, egrep, sed, ...) to log 2
This works fine for me:
$ cat <<_DATA | tee out1 | grep -E 'START|END' > out2
hello
START1
foo
END2
bar
_DATA
$ cat out1
hello
START1
foo
END2
bar
$ cat out2
START1
END2
This should work:
./run_test.sh 2>&1 | tee -a /var/log/log1.log | grep -E 'start script|fin Main-Job' > /var/log/myscripts.log
Here -a appends to log1.log, grep -E is the same as egrep, and > overwrites/creates myscripts.log. It prints
26.09.2014 20:38:51 | start script > load_stats.sh
26.09.2014 20:39:23 | fin Main-Job > load_stats.sh
If you want to overwrite/create log1.log, delete the -a.
If you want to append to myscripts.log, use >>.
egrep is deprecated; it is better to use grep -E.
Also, instead of the grep you can use sed too, like:
sed -nE '/(start script|fin Main-Job)/p'
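If you prefer a single writer for both logs, a process substitution can host the filter (a bash-only sketch, reusing the same patterns):
./run.sh 2>&1 | tee -a /var/log/log1.log >(grep -E 'start script|fin Main-Job' >> /var/log/myscripts.log) > /dev/null
Here tee appends the full stream to log1.log, the process substitution appends only the matching lines to myscripts.log, and the terminal copy is discarded.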

is there a way other than read/echo to read one line from a file descriptor to stdout without closing the fd?

I ran into a situation where I was doing:
outputStuff |
filterStuff |
transformStuff |
doMoreStuff |
endStuff > endFile
I want to be able to insert some debug tracing stuff in a fashion like:
tee debugFile.$((debugNum++)) |
but obviously the pipes create subshells, so I wanted to do this instead.
exec 5< <(seq 1 100)
outputStuff |
tee debugFile.$(read -u 5;echo $REPLY;) |
filterStuff |
tee debugFile.$(read -u 5;echo $REPLY;) |
transformStuff |
tee debugFile.$(read -u 5;echo $REPLY;) |
doMoreStuff |
endStuff > endFile
I.e., I want every debug line I insert to be identical, so I don't have to worry about stepping on various stuff. The read/REPLY echo seems really ugly... I suppose I could wrap it in a function... but is there a way to read one line from a file descriptor to stdout without closing the fd (the way head -1 would close it if I did head -1 <&3)?
I tried several things, but in the end @Etan Reisner proved to me (unintentionally) that even if there is a way to do what you asked (clever, Etan), it's not what you actually want. If you want to be sure to read the numbers back sequentially then the reads have to be serialized, which the commands in a pipeline are not.
Indeed, that applies to your original approach as well, since command substitutions are performed in subshells. I think you could reliably do it like this, though:
debugNum=1
eval "
outputStuff |
tee debugFile.$((debugNum++)) |
filterStuff |
transformStuff |
doMoreStuff |
tee debugFile.$((debugNum++)) |
endStuff > endFile
"
That way all the substitutions are performed by the parent shell, on the string, before any of the commands is launched.
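You can check the string the parent shell builds by substituting echo for eval (an abbreviated sketch):
debugNum=1
echo "outputStuff | tee debugFile.$((debugNum++)) | ... | tee debugFile.$((debugNum++)) | endStuff"
# prints: outputStuff | tee debugFile.1 | ... | tee debugFile.2 | endStuff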
Put all the reads inside a single compound statement, and redirect input from descriptor 5 for that.
{ outputStuff | tee debugFile.$(read -u 5;echo $REPLY;) |
filterStuff | tee debugFile.$(read -u 5;echo $REPLY;) |
transformStuff | tee debugFile.$(read -u 5;echo $REPLY;) |
doMoreStuff |
endStuff
} 5< <(seq 1 100) > endFile
Now, file descriptor 5 is opened once (and closed once), and each call to read gets successive lines from that descriptor.
(You can also simplify this a bit; unless you are providing outputStuff with input via the keyboard, there doesn't seem to be a need to use file descriptor 5 instead of standard input, since only outputStuff is reading from standard input. All the other programs are reading their standard input via the pipeline.)
Here's an approach that doesn't need seq at all:
Define a function that constructs your pipeline recursively.
buildWithDebugging() {
    local -a nextCmd=( )
    local debugNum=$1; shift
    while (( $# )) && [[ $1 != '|' ]]; do
        nextCmd+=( "$1" )
        shift
    done
    (( $# )) && shift  # consume the '|' separator
    if (( ${#nextCmd[@]} )); then
        "${nextCmd[@]}" \
            | tee "debugFile.$debugNum" \
            | buildWithDebugging "$((debugNum + 1))" "$@"
    else
        cat # noop
    fi
}
...and, to use it:
buildWithDebugging 0 \
    outputStuff '|' \
    filterStuff '|' \
    transformStuff '|' \
    doMoreStuff '|' \
    endStuff > endFile
A more secure version would use pipeline components in the style of Pascal strings rather than C strings -- which is to say, instead of using literal |s, precede each group of command words with its length:
buildWithDebugging 0 \
    1 outputStuff \
    3 filterStuff filterArg filterSecondArg \
    2 transformStuff filterArg \
    1 doMoreStuff \
    1 endStuff > endFile
Building this should be a completely trivial exercise for the reader. :)
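For instance, here is a hedged sketch of that length-prefixed variant (buildWithDebuggingP is a hypothetical name):
buildWithDebuggingP() {
    local debugNum=$1; shift
    (( $# )) || { cat; return; }        # nothing left: pass the data through
    local len=$1; shift
    local -a nextCmd=( "${@:1:len}" )   # take the next $len words as one command
    shift "$len"
    "${nextCmd[@]}" \
        | tee "debugFile.$debugNum" \
        | buildWithDebuggingP "$((debugNum + 1))" "$@"
}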
At the cost of evenly padding your numbers, you can do this with dd, though you don't end up with a nicer-looking command for your trouble. =)
exec 5< <(seq -w 10 -1 1)
echo -n |
{ echo "d 1:$(dd bs=1 count=3 2>/dev/null <&5)"; cat; } |
{ echo "d 2:$(dd bs=1 count=3 2>/dev/null <&5)"; cat; } |
{ echo "d 3:$(dd bs=1 count=3 2>/dev/null <&5)"; cat; } |
{ echo "d 4:$(dd bs=1 count=3 2>/dev/null <&5)"; cat; } |
cat
You can also just use a shorter variable with read: read -u 5 a; echo $a, but that only saves you two characters.
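Or, as the question itself suggests, wrap the read/echo pair in a function so every inserted line is identical (nextdbg is a hypothetical name; the serialization caveat from the first answer still applies):
exec 5< <(seq 1 100)
nextdbg() { read -r -u 5 && printf '%s' "$REPLY"; }
outputStuff | tee debugFile.$(nextdbg) | filterStuff | tee debugFile.$(nextdbg) | endStuff > endFile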

OS X / Linux: pipe into two processes?

I know about
program1 | program2
and
program1 | tee outputfile | program2
but is there a way to feed program1's output into both program2 and program3?
You can do this with tee and process substitution.
program1 | tee >(program2) >(program3)
The output of program1 will be piped to whatever is inside ( ), in this case program2 and program3.
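Note that tee's own standard output is still connected to the terminal here; if only program2 and program3 should see the data, discard it:
program1 | tee >(program2) >(program3) > /dev/null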
Intro about parallelisation
This seems trivial, but doing so is not only possible, it also spawns concurrent, simultaneous processes.
You may have to take care of some particular effects, such as order of execution, execution time, etc.
There are some samples at the end of this post.
Compatible answer first
As this question is tagged shell and unix, I will first give a POSIX-compatible answer (for bashisms, see further down).
Yes, there is a way to use unnamed pipes.
In this sample, I will generate a range of 100,000 numbers, shuffle them, and compress the result using 4 different compression tools to compare the compression ratios.
To do this, I will first run some preparation:
GZIP_CMD=`which gzip`
BZIP2_CMD=`which bzip2`
LZMA_CMD=`which lzma`
XZ_CMD=`which xz`
MD5SUM_CMD=`which md5sum`
SED_CMD=`which sed`
Note: specifying the full path to commands prevents some shell interpreters (like BusyBox) from running a built-in compressor, and ensures the same syntax will run regardless of the OS installation (paths can differ between MacOS, Ubuntu, RedHat, HP-UX and so on).
The syntax NN>&1 (where NN is a number between 3 and 63) duplicates the pipe onto file descriptor NN, which can then be reached at /dev/fd/NN. (File descriptors 0 to 2 are already open: 0 is STDIN, 1 STDOUT and 2 STDERR.)
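A minimal demonstration of the idea: with 4>&1 on a subshell whose stdout is a pipe, anything written to fd 4 inside flows into that pipe.
( echo "via fd 4" >&4 ) 4>&1 | cat
This prints via fd 4, delivered to cat entirely through file descriptor 4.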
Try this (tested under dash, busybox and bash):
(((( seq 1 100000 | shuf | tee /dev/fd/4 /dev/fd/5 /dev/fd/6 /dev/fd/7 | $GZIP_CMD >/tmp/tst.gz ) 4>&1 | $BZIP2_CMD >/tmp/tst.bz2 ) 5>&1 | $LZMA_CMD >/tmp/tst.lzma ) 6>&1 | $XZ_CMD >/tmp/tst.xz ) 7>&1 | $MD5SUM_CMD
or more readable:
GZIP_CMD=`which gzip`
BZIP2_CMD=`which bzip2`
LZMA_CMD=`which lzma`
XZ_CMD=`which xz`
MD5SUM_CMD=`which md5sum`
(
    (
        (
            (
                seq 1 100000 |
                    shuf |
                    tee /dev/fd/4 /dev/fd/5 /dev/fd/6 /dev/fd/7 |
                    $GZIP_CMD >/tmp/tst.gz
            ) 4>&1 |
            $BZIP2_CMD >/tmp/tst.bz2
        ) 5>&1 |
        $LZMA_CMD >/tmp/tst.lzma
    ) 6>&1 |
    $XZ_CMD >/tmp/tst.xz
) 7>&1 |
$MD5SUM_CMD
2e67f6ad33745dc5134767f0954cbdd6 -
Since shuf shuffles randomly, you will obtain different results if you try this:
ls -ltrS /tmp/tst.*
-rw-r--r-- 1 user user 230516 oct 1 22:14 /tmp/tst.bz2
-rw-r--r-- 1 user user 254811 oct 1 22:14 /tmp/tst.lzma
-rw-r--r-- 1 user user 254892 oct 1 22:14 /tmp/tst.xz
-rw-r--r-- 1 user user 275003 oct 1 22:14 /tmp/tst.gz
but the MD5 checksums should all match:
SED_CMD=`which sed`
for chk in gz:$GZIP_CMD bz2:$BZIP2_CMD lzma:$LZMA_CMD xz:$XZ_CMD; do
    ${chk#*:} -d < /tmp/tst.${chk%:*} |
        $MD5SUM_CMD |
        $SED_CMD s/-$/tst.${chk%:*}/
done
2e67f6ad33745dc5134767f0954cbdd6 tst.gz
2e67f6ad33745dc5134767f0954cbdd6 tst.bz2
2e67f6ad33745dc5134767f0954cbdd6 tst.lzma
2e67f6ad33745dc5134767f0954cbdd6 tst.xz
Using bash features
Using some bashisms, this can look nicer; for example, brace expansion lets you write tee /dev/fd/{4,5,6,7} instead of spelling out each /dev/fd/N:
(((( seq 1 100000 | shuf | tee /dev/fd/{4,5,6,7} | gzip >/tmp/tst.gz ) 4>&1 |
bzip2 >/tmp/tst.bz2 ) 5>&1 | lzma >/tmp/tst.lzma ) 6>&1 |
xz >/tmp/tst.xz ) 7>&1 | md5sum
29078875555e113b31bd1ae876937d4b -
It works the same.
Final check
This won't create any files, but lets you compare the compressed sizes of a range of sorted integers across the 4 compression tools (for fun, I used 4 different ways of formatting the output):
(
    (
        (
            (
                (
                    seq 1 100000 |
                        tee /dev/fd/{4,5,6,7} |
                        gzip |
                        wc -c |
                        sed s/^/gzip:\ \ / >&3
                ) 4>&1 |
                bzip2 |
                wc -c |
                xargs printf "bzip2: %s\n" >&3
            ) 5>&1 |
            lzma |
            wc -c |
            perl -pe 's/^/lzma: /' >&3
        ) 6>&1 |
        xz |
        wc -c |
        awk '{printf "xz: %9s\n",$1}' >&3
    ) 7>&1 |
    wc -c
) 3>&1
gzip: 215157
bzip2: 124009
lzma: 17948
xz: 17992
588895
This demonstrates how stdin and stdout can be redirected in subshells and merged back in the console for the final output.
Syntax >(...) and <(...)
Recent bash versions offer another syntax for this:
seq 1 100000 | wc -l
100000
seq 1 100000 > >( wc -l )
100000
wc -l < <( seq 1 100000 )
100000
Just as | is an unnamed pipe on /dev/fd/0, the <() syntax creates a temporary unnamed pipe on another file descriptor, /dev/fd/XX:
md5sum <(zcat /tmp/tst.gz) <(bzcat /tmp/tst.bz2) <(
lzcat /tmp/tst.lzma) <(xzcat /tmp/tst.xz)
29078875555e113b31bd1ae876937d4b /dev/fd/63
29078875555e113b31bd1ae876937d4b /dev/fd/62
29078875555e113b31bd1ae876937d4b /dev/fd/61
29078875555e113b31bd1ae876937d4b /dev/fd/60
More sophisticated demo
This requires the GNU file utility to be installed. It determines the command to run from the file extension or, failing that, from the MIME type:
for file in /tmp/tst.*; do
    cmd=$(which ${file##*.}) || {
        cmd=$(file -b --mime-type $file)
        cmd=$(which ${cmd#*-})
    }
    read -a md5 < <($cmd -d <$file|md5sum)
    echo $md5 \ $file
done
29078875555e113b31bd1ae876937d4b /tmp/tst.bz2
29078875555e113b31bd1ae876937d4b /tmp/tst.gz
29078875555e113b31bd1ae876937d4b /tmp/tst.lzma
29078875555e113b31bd1ae876937d4b /tmp/tst.xz
This lets you do the same thing as before with the following syntax:
seq 1 100000 |
    shuf |
    tee >(
        echo gzip. $( gzip | wc -c )
    ) >(
        echo gzip, $( wc -c < <(gzip))
    ) >(
        gzip | wc -c | sed s/^/gzip:\ \ /
    ) >(
        bzip2 | wc -c | xargs printf "bzip2: %s\n"
    ) >(
        lzma | wc -c | perl -pe 's/^/lzma: /'
    ) >(
        xz | wc -c | awk '{printf "xz: %9s\n",$1}'
    ) > >(
        echo raw: $(wc -c)
    ) |
    xargs printf "%-8s %9d\n"
raw: 588895
xz: 254556
lzma: 254472
bzip2: 231111
gzip: 274867
gzip, 274867
gzip. 274867
Note that I used three different ways to compute the gzip compressed count.
Note: because these operations run simultaneously, the output order depends on the time each command takes.
Going further about parallelisation
If you are on a multi-core or multi-processor computer, try comparing this:
i=1
time for file in /tmp/tst.*; do
    cmd=$(which ${file##*.}) || {
        cmd=$(file -b --mime-type $file)
        cmd=$(which ${cmd#*-})
    }
    read -a md5 < <($cmd -d <$file|md5sum)
    echo $((i++)) $md5 \ $file
done |
    cat -n
which may render:
1 1 29078875555e113b31bd1ae876937d4b /tmp/tst.bz2
2 2 29078875555e113b31bd1ae876937d4b /tmp/tst.gz
3 3 29078875555e113b31bd1ae876937d4b /tmp/tst.lzma
4 4 29078875555e113b31bd1ae876937d4b /tmp/tst.xz
real 0m0.101s
with this:
time (
    i=1 pids=()
    for file in /tmp/tst.*; do
        cmd=$(which ${file##*.}) || {
            cmd=$(file -b --mime-type $file)
            cmd=$(which ${cmd#*-})
        }
        (
            read -a md5 < <($cmd -d <$file|md5sum)
            echo $i $md5 \ $file
        ) & pids+=($!)
        ((i++))
    done
    wait ${pids[@]}
) |
    cat -n
could give:
1 2 29078875555e113b31bd1ae876937d4b /tmp/tst.gz
2 1 29078875555e113b31bd1ae876937d4b /tmp/tst.bz2
3 4 29078875555e113b31bd1ae876937d4b /tmp/tst.xz
4 3 29078875555e113b31bd1ae876937d4b /tmp/tst.lzma
real 0m0.070s
where the ordering depends on the time taken by each fork.
The bash manual mentions that it implements the >(...) syntax using either named pipes or the /dev/fd method of naming open files, so if you don't want to depend on bash, perhaps you could do the same manually in your script:
mknod FIFO p
program3 < FIFO &
program1 | tee FIFO | program2
wait
rm FIFO
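To make the cleanup robust when the script is interrupted, you could register a trap instead of the trailing rm (a small sketch; mkfifo is the portable equivalent of mknod FIFO p):
trap 'rm -f FIFO' EXIT
mkfifo FIFO
program3 < FIFO &
program1 | tee FIFO | program2
wait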
Other answers introduce the concept. Here is an actual demonstration:
$ echo "Leeroy Jenkins" | tee >(md5sum > out1) >(sha1sum > out2) > out3
$ cat out1
11e001d91e4badcff8fe22aea05a7458 -
$ echo "Leeroy Jenkins" | md5sum
11e001d91e4badcff8fe22aea05a7458 -
$ cat out2
5ed25619ce04b421fab94f57438d6502c66851c1 -
$ echo "Leeroy Jenkins" | sha1sum
5ed25619ce04b421fab94f57438d6502c66851c1 -
$ cat out3
Leeroy Jenkins
Of course you can > /dev/null instead of out3.
You can always save the output of program1 to a file and then feed it to program2 and program3 as input:
program1 > temp; program2 < temp; program3 < temp;
Use the ( cmd1; cmd2 ) syntax: try ps aux | (head -n 1; tail -n 1). (Caveat: head may read more than one line from a pipe before exiting, so this is not fully reliable.)

How to assign output of multiple shell commands to variable when using tee?

I want to tee and get the results from multiple shell commands connected in a pipeline. I made a simple example to explain the point. Suppose I want to count the numbers of 'a', 'b' and 'c' characters.
echo "abcaabbcabc" | tee >(tr -dc 'a' | wc -m) >(tr -dc 'b' | wc -m) >(tr -dc 'c' | wc -m) > /dev/null
Then I tried to assign the result from each count to a shell variable, but they all end up empty.
echo "abcaabbcabc" | tee >(A=$(tr -dc 'a' | wc -m)) >(B=$(tr -dc 'b' | wc -m)) >(C=$(tr -dc 'c' | wc -m)) > /dev/null && echo $A $B $C
What is the right way to do it?
Use files. They are the single most reliable solution. The assignments in your attempt happen inside the >(...) subshells, so they can never propagate back to the parent shell. Each command may also take a different amount of time to run, and there is no easy way to synchronize command redirections, so the most reliable way is to use a separate "entity" to collect all the data:
tmpa=$(mktemp) tmpb=$(mktemp) tmpc=$(mktemp)
trap 'rm "$tmpa" "$tmpb" "$tmpc"' EXIT
echo "abcaabbcabc" |
tee >(tr -dc 'a' | wc -m > "$tmpa") >(tr -dc 'b' | wc -m > "$tmpb") |
tr -dc 'c' | wc -m > "$tmpc"
A=$(<"$tmpa")
B=$(<"$tmpb")
C=$(<"$tmpc")
rm "$tmpa" "$tmpb" "$tmpc"
trap '' EXIT
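For the example input this yields A=4, B=4 and C=3, since abcaabbcabc contains four a's, four b's and three c's:
echo "$A $B $C"    # 4 4 3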
Second way:
You can prefix the data from each stream with a custom tag, then sort all lines on the tag (which effectively buffers them) and read them back. The example generates only a single number from each process substitution, so it's easy to do:
read -r A B C < <(
    echo "abcaabbcabc" |
        tee >(
            tr -dc 'a' | wc -m | sed 's/^/A /'
        ) >(
            tr -dc 'b' | wc -m | sed 's/^/B /'
        ) >(
            tr -dc 'c' | wc -m | sed 's/^/C /'
        ) >/dev/null |
        sort |
        cut -d' ' -f2 |
        paste -sd' '
)
echo A="$A" B="$B" C="$C"
Using temporary files with flock to synchronize the output of child processes could look like this:
tmpa=$(mktemp) tmpb=$(mktemp) tmpc=$(mktemp)
trap 'rm "$tmpa" "$tmpb" "$tmpc"' EXIT
echo "abcaabbcabc" |
(
    flock 3
    flock 4
    flock 5
    tee >(
        tr -dc 'a' | wc -m |
            { sleep 0.1; cat; } > "$tmpa"
        # unblock main thread
        flock -u 3
    ) >(
        tr -dc 'b' | wc -m |
            { sleep 0.2; cat; } > "$tmpb"
        # unblock main thread
        flock -u 4
    ) >(
        tr -dc 'c' | wc -m |
            { sleep 0.3; cat; } > "$tmpc"
        # unblock main thread
        flock -u 5
    ) >/dev/null
    # wait for subprocesses to finish
    # need to re-open the files to block on them
    (
        flock 3
        flock 4
        flock 5
    ) 3<"$tmpa" 4<"$tmpb" 5<"$tmpc"
) 3<"$tmpa" 4<"$tmpb" 5<"$tmpc"
A=$(<"$tmpa")
B=$(<"$tmpb")
C=$(<"$tmpc")
declare -p A B C
Alternatively, you can use this letter-frequency analysis:
#!/usr/bin/env bash
declare -A letter_frequency
while read -r v k; do
    letter_frequency[$k]="$v"
done < <(
    grep -o '[[:alnum:]]' <<<"abcaabbcabc" |
        sort |
        uniq -c
)
for k in "${!letter_frequency[@]}"; do
    printf '%c = %d\n' "$k" "${letter_frequency[$k]}"
done
Output:
c = 3
b = 4
a = 4
Or to only assign $A, $B and $C as in your example:
#!/usr/bin/env bash
{
    read -r A _
    read -r B _
    read -r C _
} < <(
    grep -o '[[:alnum:]]' <<<"abcaabbcabc" |
        sort |
        uniq -c
)
printf 'a=%d\nb=%d\nc=%d\n' "$A" "$B" "$C"
grep -o '[[:alnum:]]': output each alphanumeric character on its own line
sort: sort the lines of characters
uniq -c: count each distinct character and output the count and the character
< <( command group ): feed the output of this command group to the stdin of the preceding command group
If you need to count occurrences of non-printable characters, newlines, spaces or tabs, you have to make all these commands produce and handle null-delimited lists. It can certainly be done with the GNU versions of these tools. I leave it to you as an exercise.
A solution that counts arbitrary characters (except null), and which, as demonstrated, also works with Unicode:
#!/usr/bin/env bash
declare -A character_frequency
declare -i v
while read -d '' -r -N 8 v && read -r -d '' -N 1 k; do
    character_frequency[$k]="$v"
done < <(
    grep --only-matching --null-data . <<<$'a¹bc✓ ✓\n\t\t\u263A☺ ☺ aabbcabc' |
        head --bytes -2 |              # trim the newline added by grep
        sort --zero-terminated |       # sort null-delimited list
        uniq --count --zero-terminated # count occurrences of each char (null-delimited)
)
for k in "${!character_frequency[@]}"; do
    printf '%q = %d\n' "$k" "${character_frequency[$k]}"
done
Output:
$'\n' = 1
$'\t' = 2
☺ = 3
\ = 7
✓ = 2
¹ = 1
c = 3
b = 4
a = 4
