Is there a shell command to delay a buffer? - bash

I am looking for a shell command X such as, when I execute:
command_a | X 5000 | command_b
the stdout of command_a is written to the stdin of command_b (at least) 5 seconds later.
A kind of delaying buffer.
As far as I know, buffer/mbuffer can write at a constant rate (a fixed number of bytes per second). Instead, I would like a constant delay in time (if t=0 is when X reads a chunk of command_a output, then t=5000 is when it must write that chunk to command_b).
[edit] I've implemented it: https://github.com/rom1v/delay

I know you said you're looking for a shell command, but what about using a subshell to your advantage? Something like:
command_a | (sleep 5; command_b)
So to grep a file cat-ed through (I know, I know, bad use of cat, but just an example):
cat filename | (sleep 5; grep pattern)
A more complete example:
$ cat testfile
The
quick
brown
fox
$ cat testfile | (sleep 5; grep brown)
# A 5-second sleep occurs here
brown
Or even, as Michael Kropat recommends, a group command with sleep would also work (and is arguably more correct). Like so:
$ cat testfile | { sleep 5; grep brown; }
Note: don't forget the semicolon after your command (here, the grep brown), as it is necessary!

As it seemed such a command did not exist, I implemented it in C:
https://github.com/rom1v/delay
delay [-b <dtbufsize>] <delay>

Something like this?
#!/bin/bash
while read -r line
do
    sleep 5
    echo "$line"
done
Save the file as "slowboy", then do
chmod +x slowboy
and run as
command_a | ./slowboy | command_b

This might work
time_buffered () {
  delay=$1
  while read -r line; do
    printf "%d %s\n" "$(date +%s)" "$line"
  done | while read -r ts line; do
    now=$(date +%s)
    if (( now - ts < delay )); then
      sleep $(( delay - (now - ts) ))  # sleep only for the remaining part of the delay
    fi
    printf "%s\n" "$line"
  done
}
commandA | time_buffered 5 | commandB
The first loop tags each line of its input with a timestamp and immediately feeds it to the second loop. The second loop checks the timestamp of each line and, if necessary, sleeps until $delay seconds have passed since the line was first read before outputting it.

Your question intrigued me, and I decided to come back and play with it. Here is a basic implementation in Perl. It's probably not portable (ioctl), tested on Linux only.
The basic idea is:
read available input every X microseconds
store each input chunk in a hash, with current timestamp as key
also push current timestamp on a queue (array)
lookup oldest timestamps on queue and write + discard data from the hash if delayed long enough
repeat
Max buffer size
There is a max size for stored data. If reached, additional data will not be read until space becomes available after writing.
Performance
It is probably not fast enough for your requirements (several MB/s). My max throughput was 639 KB/s; see below.
Testing
# Measure max throughput:
$ pv < /dev/zero | ./buffer_delay.pl > /dev/null
# Interactive manual test, use two terminal windows:
$ mkfifo data_fifo
terminal-one $ cat > data_fifo
terminal-two $ ./buffer_delay.pl < data_fifo
# now type in terminal-one and see it appear delayed in terminal-two.
# It will be line-buffered because of the terminals, not a limitation
# of buffer_delay.pl
buffer_delay.pl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;
use Time::HiRes qw(gettimeofday usleep);
require 'sys/ioctl.ph';
$|++;
my $delay_usec = 3 * 1000000; # (3s) delay in microseconds
my $buffer_size_max = 10 * 1024 * 1024 ; # (10 MB) max bytes our buffer is allowed to contain.
# When buffer is full, incoming data will not be read
# until space becomes available after writing
my $read_frequency = 10; # Approximate read frequency in Hz (will not be exact)
my %buffer; # the data we are delaying, saved in chunks by timestamp
my @timestamps; # keys to %buffer, used as a queue
my $buffer_size = 0; # num bytes currently in %buffer, compare to $buffer_size_max
my $time_slice = 1000000 / $read_frequency; # microseconds, min time for each discrete read-step
my $sel = IO::Select->new([\*STDIN]);
my $overflow_unread = 0; # Num bytes waiting when $buffer_size_max is reached
while (1) {
my $now = sprintf "%d%06d", gettimeofday; # timestamp, used to label incoming chunks
# input available?
if ($overflow_unread || $sel->can_read($time_slice / 1000000)) {
# how much?
my $available_bytes;
if ($overflow_unread) {
$available_bytes = $overflow_unread;
}
else {
$available_bytes = pack("L", 0);
ioctl (STDIN, FIONREAD(), $available_bytes);
$available_bytes = unpack("L", $available_bytes);
}
# will it fit?
my $remaining_space = $buffer_size_max - $buffer_size;
my $try_to_read_bytes = $available_bytes;
if ($try_to_read_bytes > $remaining_space) {
$try_to_read_bytes = $remaining_space;
}
# read input
if ($try_to_read_bytes > 0) {
my $input_data;
my $num_read = read (STDIN, $input_data, $try_to_read_bytes);
die "read error: $!" unless defined $num_read;
exit if $num_read == 0; # EOF
$buffer{$now} = $input_data; # save input
push @timestamps, $now; # save the timestamp
$buffer_size += length $input_data;
if ($overflow_unread) {
$overflow_unread -= length $input_data;
}
elsif (length $input_data < $available_bytes) {
$overflow_unread = $available_bytes - length $input_data;
}
}
}
# write + delete any data old enough
my $then = $now - $delay_usec; # when data is old enough
while (scalar @timestamps && $timestamps[0] < $then) {
my $ts = shift @timestamps;
print $buffer{$ts} if defined $buffer{$ts};
$buffer_size -= length $buffer{$ts};
die "Serious problem\n" unless $buffer_size >= 0;
delete $buffer{$ts};
}
# usleep any remaining time up to $time_slice
my $time_left = (sprintf "%d%06d", gettimeofday) - $now;
usleep ($time_slice - $time_left) if $time_slice > $time_left;
}
Feel free to post comments and suggestions below!

Related

BASH: How to write values generated by a for loop to a file quickly

I have a for loop in bash that writes values to a file. However, because there are a lot of values, the process takes a long time, which I think can be saved by improving the code.
nk=1152
nb=24
for k in $(seq 0 $((nk-1))); do
for i in $(seq 0 $((nb-1))); do
for j in $(seq 0 $((nb-1))); do
echo -e "$k\t$i\t$j"
done
done
done > file.dat
I've moved the output action to after the entire loop is done, rather than using echo -e "$k\t$i\t$j" >> file.dat, to avoid opening and closing the file many times. However, the speed at which the script writes to the file is still rather slow, ~10 kbps.
Is there a better way to improve the IO?
Many thanks
Jacek
It looks like the seq calls are fairly punishing since that is a separate process. Try this just using shell math instead:
for ((k=0;k<=$nk-1;k++)); do
for ((i=0;i<=$nb-1;i++)); do
for ((j=0;j<=$nb-1;j++)); do
echo -e "$k\t$i\t$j"
done
done
done > file.dat
It takes just 7.5s on my machine.
Another way is to compute the sequences just once and use them repeatedly, saving a lot of shell calls:
nk=1152
nb=24
kseq=$(seq 0 $((nk-1)))
bseq=$(seq 0 $((nb-1)))
for k in $kseq; do
for i in $bseq; do
for j in $bseq; do
echo -e "$k\t$i\t$j"
done
done
done > file.dat
This is not really "better" than the first option, but it shows how much of the time is spent spinning up instances of seq versus actually getting stuff done.
Bash isn't always the best for this. Consider this Ruby equivalent which runs in 0.5s:
#!/usr/bin/env ruby
nk=1152
nb=24
nk.times do |k|
nb.times do |i|
nb.times do |j|
puts "%d\t%d\t%d" % [ k, i, j ]
end
end
end
The most time-consuming part is calling seq in a nested loop. Keep in mind that each time you call seq, the shell loads the command from disk, forks a process to run it, captures the output, and stores the whole output sequence in memory.
Instead of calling seq you could use an arithmetic loop:
#!/usr/bin/env bash
declare -i nk=1152
declare -i nb=24
declare -i i j k
for ((k=0; k<nk; k++)); do
for (( i=0; i<nb; i++)); do
for (( j=0; j<nb; j++)); do
printf '%d\t%d\t%d\n' "$k" "$i" "$j"
done
done
done > file.dat
Running seq in a subshell consumes most of the time.
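To see how much those forks cost, here is a quick, illustrative measurement sketch (the counts are arbitrary and timings will vary by machine):
# one seq call, all 1000 iterations stay in the shell:
time for i in $(seq 1 1000); do :; done
# one seq call (one fork) per iteration:
time for ((i = 0; i < 1000; i++)); do seq 1 1 >/dev/null; done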
Switch to a different language that provides all the needed features without shelling out. For example, in Perl:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $nk = 1152;
my $nb = 24;
for my $k (0 .. $nk - 1) {
for my $i (0 .. $nb - 1) {
for my $j (0 .. $nb - 1) {
say "$k\t$i\t$j"
}
}
}
The original bash solution runs for 22 seconds, the Perl one finishes in 0.1 seconds. The output is identical.
@Jacek: I don't think the I/O is the problem, but the number of child processes spawned. I would store the result of seq 0 $((nb-1)) into an array and loop over the array, i.e.
nb_seq=( $(seq 0 $((nb-1))) )
...
for i in "${nb_seq[@]}"; do
for j in "${nb_seq[@]}"; do
seq is slow. I once wrote this function specially for this case:
$ que () { printf -v _N %$1s; _N=(${_N// / 1}); printf "${!_N[*]}"; }
$ que 10
0 1 2 3 4 5 6 7 8 9
And you can try writing everything to a variable first, then the whole variable into a file:
store+="$k\t$i\t$j\n"
printf "$store" > file
No, it's even worse that way.

Show average file output speed once for loop is complete, for benchmarking purposes?

Sorry for being unclear, my fellow mates.
So, to elaborate and possibly answer my own question: while Distro1Analysis.txt is being written to, calculate the output speed in kB/s, and when the output is done, average the output speed and print it to the screen.
The second part, really its own question, is quite simple. I'm not a computer scientist or advanced programmer, but I am certain there's a relatively easy way to improve the overall execution speed of the script. I'm asking what the speed culprit is: how the script was written, the chosen programs, the mix of programs (i.e., is it faster to use 3 instances of the same program as opposed to one instance of 3 different programs)? For instance, could recursion be used, and how?
I was originally going to ask how to benchmark the speed of a program running one command, but it seemed simpler to use an overarching (global) benchmark, hence the question. But any help you can provide would be useful.
Rdepends Version
ps -A &>> Distro1Analysis.txt && sudo service --status-all &>> Distro1Analysis.txt && \
for z in $(dpkg -l | awk '/^[hi]i/{print $2}' | grep -v '^lib'); do \
printf "\n$z:" && \
aptitude show $z | grep -E 'Uncompressed Size' && \
result=$(apt-rdepends 2>/dev/null $z | grep -v "Depends")
final=$(apt show 2>/dev/null $result | grep -E "Package|Installed-Size" | sed "/APT/d;s/Installed-Size: //");
if [[ (${#final} -le 700) ]]; then echo $final; else :; fi done &>> Distro1Analysis.txt
Depends Version
ps -A &>> Distro1Analysis.txt && sudo service --status-all &>> Distro1Analysis.txt && \
for z in $(dpkg -l | awk '/^[hi]i/{print $2}' | grep -v '^lib'); do \
printf "\n$z:" && \
aptitude show $z | grep -E 'Uncompressed Size' && \
printf "\n" && \
apt show 2>/dev/null $(aptitude search '!~i?reverse-depends("^'$z'$")' -F "%p" | \
sed 's/:i386$//') | grep -E 'Package|Installed-Size' | sed '/APT/d;s/^.*Package:/\t&/;N;s/\n/ /'; done &>> Distro1Analysis.txt
calculate output speed in kb/s and when output is done then average
output speed and print to screen
Here's an answer that basically works by:
Starting your script to run in the background.
Checking the size of its output file every two seconds with du -b.
Run the following bash script like so: $ bash scriptoutmon.sh subscript.sh Distro1Analysis.txt 12 10 2
scriptoutmon.sh usage:
$1 : Path to the subscript to run
$2 : Path to output file to monitor
$3 : How long to run scriptoutmon.sh script in seconds.
$4 : How long to run the subscript ($1), in seconds.
$5 : Tick length for displayed updates in seconds.
scriptoutmon.sh:
#!/bin/bash
# Date: 2020-04-13T23:03Z
# Author: Steven Baltakatei Sandoval
# License: GPLv3+ https://www.gnu.org/licenses/gpl-3.0.en.html
# Description: Runs subscript and measures change in file size of a specified file.
# Usage: scriptoutmon.sh [ path to subscript ] [ path to subscript output file ] [ script TTL (s) ] [ subscript TTL (s) ] [ tick size (s) ]
# References:
# [1]: Adrian Pronk (2013-02-22). "Floating point results in Bash integer division". https://stackoverflow.com/a/15015920
# [2]: chronitis (2012-11-15). "bc: set number of digits after decimal point". https://askubuntu.com/a/217575
# [3]: ypnos (2020-02-12). "Differences of size in du -hs and du -b". https://stackoverflow.com/a/60196741
# == Function Definitions ==
echoerr() { echo "$@" 1>&2; } # display message via stderr
getSize() { echo $(du -b "$1" | awk '{print $1}'); } # output file size in bytes. See [3].
# == Initialize settings ==
SUBSCRIPT_PATH="$1" # path to subscript to run
SUBSCRIPT_OUTPUT_PATH="$2" # path to output file generated by subscript
SCRIPT_TTL="$3" # set script time-to-live in seconds
SUBSCRIPT_TTL="$4" # set subscript time-to-live in seconds
TICK_SIZE="$5" # update tick size (in seconds)
# == Perform work ==
timeout $SUBSCRIPT_TTL bash "$SUBSCRIPT_PATH" & # run subscript for SUBSCRIPT_TTL seconds.
# note: SUBSCRIPT_OUTPUT_PATH should be path of output file generated by subscript.sh .
if [ -f $SUBSCRIPT_OUTPUT_PATH ]; then SUBSCRIPT_OUTPUT_INITIAL_SIZE=$(getSize "$SUBSCRIPT_OUTPUT_PATH"); else SUBSCRIPT_OUTPUT_INITIAL_SIZE="0"; fi # save initial size if file exists.
echoerr "Running $(basename "$SUBSCRIPT_PATH") and then monitoring rate of file size changes to $(basename "$SUBSCRIPT_OUTPUT_PATH")." # explain displayed output
# Calc and display subscript output file size changes
while [ $SECONDS -lt $SCRIPT_TTL ]; do # loop while script age (in seconds) less than SCRIPT_TTL.
if [ $SECONDS -ge $TICK_SIZE ]; then # if after first tick
OUTPUT_PREVIOUS_SIZE="$OUTPUT_CURRENT_SIZE" ; # save size previous tick
OUTPUT_CURRENT_SIZE=$(getSize "$SUBSCRIPT_OUTPUT_PATH") ; # save size current tick
BYTES_WRITTEN=$(( $OUTPUT_CURRENT_SIZE - $OUTPUT_PREVIOUS_SIZE )) ; # calc size difference between current and previous ticks.
WRITE_SPEED_BYTES_PER_SECOND=$(($BYTES_WRITTEN / $TICK_SIZE)) ; # calc write speed in bytes per second
WRITE_SPEED_KILOBYTES_PER_SECOND=$( echo "scale=3; $WRITE_SPEED_BYTES_PER_SECOND / 1000" | bc -l ) ; # calc write speed in kilobytes per second. See [1], [2].
echo "File size change rate (KB/sec):"$WRITE_SPEED_KILOBYTES_PER_SECOND ;
else # if first tick
OUTPUT_CURRENT_SIZE=$(getSize "$SUBSCRIPT_OUTPUT_PATH") # save size current tick (initial)
fi
sleep "$TICK_SIZE"; # wait a tick
done
SUBSCRIPT_OUTPUT_FINAL_SIZE=$(getSize "$SUBSCRIPT_OUTPUT_PATH") # save final size
# == Display results ==
SUBSCRIPT_OUTPUT_TOTAL_CHANGE_BYTES=$(( $SUBSCRIPT_OUTPUT_FINAL_SIZE - $SUBSCRIPT_OUTPUT_INITIAL_SIZE )) # calc total size change in bytes
SUBSCRIPT_OUTPUT_TOTAL_CHANGE_KILOBYTES=$( echo "scale=3; $SUBSCRIPT_OUTPUT_TOTAL_CHANGE_BYTES / 1000" | bc -l ) # calc total size change in kilobytes. See [1], [2].
echoerr "$SUBSCRIPT_OUTPUT_TOTAL_CHANGE_KILOBYTES kilobytes added to $SUBSCRIPT_OUTPUT_PATH size in $SUBSCRIPT_TTL seconds."
exit 0;
You should get output like this:
baltakatei@debianwork:/tmp$ bash scriptoutmon.sh subscript.sh Distro1Analysis.txt 12 10 2
Running subscript.sh and then monitoring rate of file size changes to Distro1Analysis.txt.
File size change rate (KB/sec):6.302
File size change rate (KB/sec):.351
File size change rate (KB/sec):.376
File size change rate (KB/sec):.345
File size change rate (KB/sec):.335
15.419 kilobytes added to Distro1Analysis.txt size in 10 seconds.
baltakatei@debianwork:/tmp$
Increase $3 and $4 to monitor the script longer (perhaps to let it finish its work).
The second part, its own question really
I'd suggest making it a separate question.

Calculate average execution time of a program using Bash

To get the execution time of any executable, say a.out, I can simply write time ./a.out. This will output a real time, user time and system time.
Is it possible to write a bash script that runs the program numerous times and calculates and outputs the average real execution time?
You could write a loop, collect the output of the time command, and pipe it to awk to compute the average:
avg_time() {
#
# usage: avg_time n command ...
#
n=$1; shift
(($# > 0)) || return # bail if no command given
for ((i = 0; i < n; i++)); do
{ time -p "$@" &>/dev/null; } 2>&1 # ignore the output of the command
# but collect time's output in stdout
done | awk '
/real/ { real = real + $2; nr++ }
/user/ { user = user + $2; nu++ }
/sys/ { sys = sys + $2; ns++}
END {
if (nr>0) printf("real %f\n", real/nr);
if (nu>0) printf("user %f\n", user/nu);
if (ns>0) printf("sys %f\n", sys/ns)
}'
}
Example:
avg_time 5 sleep 1
would give you
real 1.000000
user 0.000000
sys 0.000000
This can be easily enhanced to (see the sketch after this list):
sleep for a given amount of time between executions
sleep for a random time (within a certain range) between executions
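For instance, a minimal sketch of the first enhancement, keeping the structure of avg_time above (the name avg_time_paused and the pause handling are illustrative, not part of the original answer):
avg_time_paused() {
    #
    # usage: avg_time_paused n pause command ...
    #
    n=$1; pause=$2; shift 2
    (($# > 0)) || return            # bail if no command given
    for ((i = 0; i < n; i++)); do
        { time -p "$@" &>/dev/null; } 2>&1   # keep only time's output
        sleep "$pause"              # fixed pause; use $((RANDOM % pause + 1)) for a random one
    done | awk '
        /real/ { real += $2; nr++ }
        END { if (nr > 0) printf("real %f\n", real/nr) }'
}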
Meaning of time -p from man time:
-p
When in the POSIX locale, use the precise traditional format
"real %f\nuser %f\nsys %f\n"
(with numbers in seconds) where the number of decimals in the
output for %f is unspecified but is sufficient to express the
clock tick accuracy, and at least one.
You may want to check out this command-line benchmarking tool as well:
sharkdp/hyperfine
Total execution time vs sum of single execution time
Careful! Dividing the sum of N rounded execution times is imprecise!
Instead, we could divide the total execution time of N iterations (by N):
avg_time_alt() {
local -i n=$1
local foo real sys user
shift
(($# > 0)) || return;
{ read foo real; read foo user; read foo sys ;} < <(
{ time -p for((;n--;)){ "$@" &>/dev/null ;} ;} 2>&1
)
printf "real: %.5f\nuser: %.5f\nsys : %.5f\n" $(
bc -l <<<"$real/$n;$user/$n;$sys/$n;" )
}
Nota: This uses bc instead of awk to compute the average. For the demo, we create a temporary bc script to serve as a sample workload:
printf >/tmp/test-pi.bc "scale=%d;\npi=4*a(1);\nquit\n" 60
This computes π to 60 decimal places, then exits quietly. (You can adapt the number of decimals to your host.)
Demo:
avg_time_alt 1000 sleep .001
real: 0.00195
user: 0.00008
sys : 0.00016
avg_time_alt 1000 bc -ql /tmp/test-pi.bc
real: 0.00172
user: 0.00120
sys : 0.00058
Whereas codeforester's function answers:
avg_time 1000 sleep .001
real 0.000000
user 0.000000
sys 0.000000
avg_time 1000 bc -ql /tmp/test-pi.bc
real 0.000000
user 0.000000
sys 0.000000
Alternative, inspired by choroba's answer, using Linux's /proc
Ok, you could consider:
avgByProc() {
local foo start end n=$1 e=$1 values times
shift;
export n;
{
read foo;
read foo;
read foo foo start foo
} < /proc/timer_list;
mapfile values < <(
for((;n--;)){ "$@" &>/dev/null;}
read -a endstat < /proc/self/stat
{
read foo
read foo
read foo foo end foo
} </proc/timer_list
printf -v times "%s/100/$e;" ${endstat[@]:13:4}
bc -l <<<"$((end-start))/10^9/$e;$times"
)
printf -v fmt "%-7s: %%.5f\\n" real utime stime cutime cstime
printf "$fmt" ${values[#]}
}
This is based on /proc:
man 5 proc | grep [su]time\\\|timer.list | sed 's/^/> /'
(14) utime %lu
(15) stime %lu
(16) cutime %ld
(17) cstime %ld
/proc/timer_list (since Linux 2.6.21)
Then now:
avgByProc 1000 sleep .001
real : 0.00242
utime : 0.00015
stime : 0.00021
cutime : 0.00082
cstime : 0.00020
Here utime and stime represent user and system time for bash itself, while cutime and cstime represent child user and child system time, which are the most interesting.
Nota: In this case, the (sleep) command won't use many resources.
avgByProc 1000 bc -ql /tmp/test-pi.bc
real : 0.00175
utime : 0.00015
stime : 0.00025
cutime : 0.00108
cstime : 0.00032
This becomes clearer...
Of course, since timer_list and self/stat are accessed successively but not atomically, differences may appear between real (nanosecond-based) and c?[su]time (tick-based, i.e. 1/100th of a second)!
From bashoneliners
adapted to transform (,) to (.) for i18n support
hardcoded to 10, adapt as needed
returns only the "real" value, the one you most likely want
Oneliner
for i in {1..10}; do time "$@"; done 2>&1 | grep ^real | sed s/,/./ | sed -e s/.*m// | awk '{sum += $1} END {print sum / NR}'
I made a "fuller" version
outputs the results of every execution so you know the right thing is executed
shows every run time, so you can glance for outliers
But really, if you need advanced stuff just use hyperfine.
GREEN='\033[0;32m'
PURPLE='\033[0;35m'
RESET='\033[0m'
# example: perf sleep 0.001
# https://serverfault.com/questions/175376/redirect-output-of-time-command-in-unix-into-a-variable-in-bash
perfFull() {
TIMEFORMAT=%R # `time` outputs only a number, not 3 lines
export LC_NUMERIC="en_US.UTF-8" # `time` outputs `0.100` instead of local format, like `0,100`
times=10
echo -e -n "\nWARMING UP ${PURPLE}$#${RESET}"
"$@" # execute passed parameters
echo -e -n "RUNNING ${PURPLE}$times times${RESET}"
exec 3>&1 4>&2 # redirects subshell streams
durations=()
for _ in `seq $times`; {
durations+=(`{ time "$@" 1>&3 2>&4; } 2>&1`) # passes stdout through so only `time` is captured
}
exec 3>&- 4>&- # reset subshell streams
printf '%s\n' "${durations[@]}"
total=0
for duration in "${durations[@]}"; {
total=$(bc <<< "scale=3;$total + $duration")
}
average=($(bc <<< "scale=3;$total/$times"))
echo -e "${GREEN}$average average${RESET}"
}
It's probably easier to record the start and end time of the execution and divide the difference by the number of executions.
#!/bin/bash
times=10
start=$(date +%s)
for ((i=0; i < times; i++)) ; do
run_your_executable_here
done
end=$(date +%s)
bc -l <<< "($end - $start) / $times"
I used bc to calculate the average, as bash doesn't support floating point arithmetics.
To get more precision, you can switch to nanoseconds:
start=$(date +%s.%N)
and similarly for $end.
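Assembled, a higher-precision variant of the same script (only the date format changes):
#!/bin/bash
times=10
start=$(date +%s.%N)
for ((i=0; i < times; i++)) ; do
run_your_executable_here
done
end=$(date +%s.%N)
bc -l <<< "($end - $start) / $times"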

While loop computed hash compare in bash?

I am trying to write a script to count the number of zero fill sectors for a dd image file. This is what I have so far, but it is throwing an error saying it cannot open file #hashvalue#. Is there a better way to do this or what am I missing? Thanks in advance.
count=1
zfcount=0
while read Stuff; do
count+=1
if [ $Stuff == "bf619eac0cdf3f68d496ea9344137e8b" ]; then
zfcount+=1
fi
echo $Stuff
done < "$(dd if=test.dd bs=512 2> /dev/null | md5sum | cut -d ' ' -f 1)"
echo "Total Sector Count Is: $count"
echo "Zero Fill Sector Count is: $zfcount"
Doing this in bash is going to be extremely slow -- on the order of 20 minutes for a 1GB file.
Use another language, like Python, which can do this in a few seconds (if storage can keep up):
python3 -c '
import sys
total = 0
zero = 0
# open in binary mode so NUL bytes are read verbatim
with open(sys.argv[1], "rb") as f:
    while True:
        a = f.read(512)
        if not a:
            break
        total += 1
        if all(x == 0 for x in a):  # bytes iterate as ints in Python 3
            zero += 1
print("Total sectors: " + str(total))
print("Zeroed sectors: " + str(zero))
' yourfilehere
Your error message comes from this line:
done < "$(dd if=test.dd bs=512 2> /dev/null | md5sum | cut -d ' ' -f 1)"
What that does is read your entire test.dd, calculate the md5sum of that data, and parse out just the hash value; then, by virtue of being included inside $( ... ), it substitutes that hash value in place, so you end up with that line essentially acting like this:
done < e6e8c42ec6d41563fc28e50080b73025
(except, of course, you have a different hash). So, your shell attempts to read from a file named like the hash of your test.dd image, can't find the file, and complains.
Also, it appears that you are under the assumption that dd if=test.dd bs=512 ... will feed you 512-byte blocks one at a time to iterate over. This is not the case. dd will read the file in bs-sized blocks and write it in the same sized blocks, but it does not insert a separator or synchronize in any way with whatever is on the other side of its pipeline.
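For reference, a minimal corrected sketch of the bash approach (still very slow, as warned above; stat -c %s is GNU coreutils, and the zero-sector hash is the one from the question):
zero_md5="bf619eac0cdf3f68d496ea9344137e8b"   # md5 of 512 zero bytes
count=0
zfcount=0
sectors=$(( $(stat -c %s test.dd) / 512 ))    # number of 512-byte sectors
for ((i = 0; i < sectors; i++)); do
    # hash exactly one sector per iteration by seeking with dd
    hash=$(dd if=test.dd bs=512 skip="$i" count=1 2>/dev/null | md5sum | cut -d ' ' -f 1)
    count=$((count + 1))
    [ "$hash" = "$zero_md5" ] && zfcount=$((zfcount + 1))
done
echo "Total Sector Count Is: $count"
echo "Zero Fill Sector Count is: $zfcount"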

How to make a pipe loop in bash

Assume that I have programs P0, P1, ...P(n-1) for some n > 0. How can I easily redirect the output of program Pi to program P(i+1 mod n) for all i (0 <= i < n)?
For example, let's say I have a program square, which repeatedly reads a number and then prints the square of that number, and a program calc, which sometimes prints a number after which it expects to be able to read the square of it. How do I connect these programs such that whenever calc prints a number, square squares it and returns it to calc?
Edit: I should probably clarify what I mean with "easily". The named pipe/fifo solution is one that indeed works (and I have used in the past), but it actually requires quite a bit of work to do properly if you compare it with using a bash pipe. (You need to get a not yet existing filename, make a pipe with that name, run the "pipe loop", clean up the named pipe.) Imagine you could no longer write prog1 | prog2 and would always have to use named pipes to connect programs.
I'm looking for something that is almost as easy as writing a "normal" pipe. For instance something like { prog1 | prog2 } >&0 would be great.
After spending quite some time yesterday trying to redirect stdout to stdin, I ended up with the following method. It isn't really nice, but I think I prefer it over the named pipe/fifo solution.
read | { P0 | ... | P(n-1); } >/dev/fd/0
The { ... } >/dev/fd/0 is to redirect stdout to stdin for the pipe sequence as a whole (i.e. it redirects the output of P(n-1) to the input of P0). Using >&0 or something similar does not work; this is probably because bash assumes 0 is read-only while it doesn't mind writing to /dev/fd/0.
The initial read-pipe is necessary because without it both the input and output file descriptor are the same pts device (at least on my system) and the redirect has no effect. (The pts device doesn't work as a pipe; writing to it puts things on your screen.) By making the input of the { ... } a normal pipe, the redirect has the desired effect.
To illustrate with my calc/square example:
function calc() {
# calculate sum of squares of numbers 0,..,10
sum=0
for ((i=0; i<10; i++)); do
echo $i # "request" the square of i
read ii # read the square of i
echo "got $ii" >&2 # debug message
let sum=$sum+$ii
done
echo "sum $sum" >&2 # output result to stderr
}
function square() {
# square numbers
read j # receive first "request"
while [ "$j" != "" ]; do
let jj=$j*$j
echo "square($j) = $jj" >&2 # debug message
echo $jj # send square
read j # receive next "request"
done
}
read | { calc | square; } >/dev/fd/0
Running the above code gives the following output:
square(0) = 0
got 0
square(1) = 1
got 1
square(2) = 4
got 4
square(3) = 9
got 9
square(4) = 16
got 16
square(5) = 25
got 25
square(6) = 36
got 36
square(7) = 49
got 49
square(8) = 64
got 64
square(9) = 81
got 81
sum 285
Of course, this method is quite a bit of a hack. Especially the read part has an undesired side effect: termination of the "real" pipe loop does not lead to termination of the whole. I couldn't think of anything better than read, as it seems you can only determine that the pipe loop has terminated by trying to write something to it.
A named pipe might do it:
$ mkfifo outside
$ <outside calc | square >outside &
$ echo "1" >outside ## Trigger the loop to start
This is a very interesting question. I (vaguely) remember an assignment very similar in college 17 years ago. We had to create an array of pipes, where our code would get filehandles for the input/output of each pipe. Then the code would fork and close the unused filehandles.
I'm thinking you could do something similar with named pipes in bash. Use mknod or mkfifo to create a set of pipes with unique names you can reference, then fork your programs; a sketch follows.
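A minimal sketch of that idea for n=3 (P0, P1 and P2 are placeholder programs; the trigger write mirrors the two-process fifo example above and also unblocks the circular fifo opens):
dir=$(mktemp -d)                      # scratch directory for the fifos
mkfifo "$dir/fifo0" "$dir/fifo1" "$dir/fifo2"
P0 < "$dir/fifo2" > "$dir/fifo0" &    # each Pi reads fifo(i-1 mod 3)
P1 < "$dir/fifo0" > "$dir/fifo1" &    # and writes fifo(i)
P2 < "$dir/fifo1" > "$dir/fifo2" &
echo "1" > "$dir/fifo2"               # trigger the loop to start
wait                                  # wait for the ring to finish
rm -r "$dir"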
My solution uses pipexec (most of the function implementation comes from your answer):
square.sh
function square() {
# square numbers
read j # receive first "request"
while [ "$j" != "" ]; do
let jj=$j*$j
echo "square($j) = $jj" >&2 # debug message
echo $jj # send square
read j # receive next "request"
done
}
square "$@"
calc.sh
function calc() {
# calculate sum of squares of numbers 0,..,10
sum=0
for ((i=0; i<10; i++)); do
echo $i # "request" the square of i
read ii # read the square of i
echo "got $ii" >&2 # debug message
let sum=$sum+$ii
done
echo "sum $sum" >&2 # output result to stderr
}
calc "$@"
The command
pipexec [ CALC /bin/bash calc.sh ] [ SQUARE /bin/bash square.sh ] \
"{CALC:1>SQUARE:0}" "{SQUARE:1>CALC:0}"
The output (same as in your answer)
square(0) = 0
got 0
square(1) = 1
got 1
square(2) = 4
got 4
square(3) = 9
got 9
square(4) = 16
got 16
square(5) = 25
got 25
square(6) = 36
got 36
square(7) = 49
got 49
square(8) = 64
got 64
square(9) = 81
got 81
sum 285
Comment: pipexec was designed to start processes and build arbitrary pipes in between. Because bash functions cannot be handled as processes, the functions must live in separate files and run in a separate bash.
Named pipes.
Create a series of fifos using mkfifo,
i.e. fifo0, fifo1
Then attach each process in turn to the pipes you want:
processn < fifo(n-1) > fifon
I doubt sh/bash can do it.
ZSH would be a better bet, with its MULTIOS and coproc features.
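For what it's worth, bash (4.0 and later) also has a coproc keyword. A minimal sketch of the calc/square exchange using it; the inline while loop is a simplified stand-in for the square function above:
coproc SQUARE { while read -r j; do echo $((j * j)); done; }
sum=0
for ((i = 0; i < 10; i++)); do
    echo "$i" >&"${SQUARE[1]}"    # send a request to the coprocess
    read -r ii <&"${SQUARE[0]}"   # read the square back
    sum=$((sum + ii))
done
echo "sum $sum"                   # prints: sum 285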
A command stack can be composed as a string from an array of arbitrary commands
and evaluated with eval. The following example gives the result 65536.
function square ()
{
read n
echo $((n*n))
} # ---------- end of function square ----------
declare -a commands=( 'echo 4' 'square' 'square' 'square' )
#-------------------------------------------------------------------------------
# build the command stack using pipes
#-------------------------------------------------------------------------------
declare stack=${commands[0]}
for (( COUNTER=1; COUNTER<${#commands[@]}; COUNTER++ )); do
stack="${stack} | ${commands[${COUNTER}]}"
done
#-------------------------------------------------------------------------------
# run the command stack
#-------------------------------------------------------------------------------
eval "$stack"
