terminate a shell script without waiting for early parts of pipeline - bash

Consider the following script:
#!/bin/bash
function long_running {
for i in $(seq 1 10); do
echo foo
sleep 100
done
}
long_running | head -n 1
This produces the expected output (one line "foo") but sleeps (for the specified 100 seconds) before terminating. I would like the script to terminate immediately when head does. How can I force bash to actually quit immediately? Even changing the last line to
long_running | (head -n 1; exit)
or similar doesn't work; I can't get set -e, another common suggestion, to work even if I force a failure with, say, (head -n 1; false) or the like.
(This is a simplified version of my real code (obviously) which doesn't sleep; just creates a fairly complex set of nested pipelines searching for various solutions to a constraint problem; as I only need one and don't care which I get, I'd like to be able to make the script terminate by adding head -n 1 to the invocation...)

How about feeding the function's output to head through process substitution, like this:
#!/bin/bash
function long_running {
for i in $(seq 1 10); do
echo foo
sleep 100
done
}
head -n 1 <(long_running)
If you raise -n to a larger number, head has to sit through the sleeps until it has collected that many lines, but the script still exits as soon as head completes.
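Why this helps: bash waits for every command in a pipeline to exit, and long_running only receives SIGPIPE the next time it tries to write, which is after the sleep. Bash does not wait for a process substitution, so the script moves on as soon as head exits. The sketch below illustrates this under assumptions: the sleep is shortened to 3 seconds, and the SECONDS-based timing exists only to make the effect visible.

```shell
#!/bin/bash
# Stand-in for the real producer; the sleep is shortened to 3s for the demo.
long_running() {
  for i in 1 2 3; do
    echo foo
    sleep 3
  done
}

start=$SECONDS
# head exits after one line; bash does not wait for the process substitution.
first_line=$(head -n 1 < <(long_running))
elapsed=$((SECONDS - start))
echo "got '$first_line' after ${elapsed}s"
```

The producer is left running in the background until its next write fails, so it may outlive the script by one sleep interval.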


bash: manager processes run

I have multiple files $tmpdir/$i.dirlist, each containing entries to be passed to rsync.
Each file has, depending on the amount of data, 10, sometimes 50, and even 150 entries.
How can I use a for or while loop with an if test so that only 2 entries from each file run at a time (say there are 100 files), waiting for some processes to complete and launching new entries whenever the total number of running rsync processes drops below a fixed limit defined in a parameter (200 in this case)?
Any ideas on how to do it?
Edit:
About the rsync entries:
Each file $tmpdir/*.dirlist contains entries (200 in this example) with directory paths like:
==> /tmp/rsync.23611/0.dirlist <==
system/root/etc/ssl
system/root/etc/dbus-1
system/root/etc/lirc
system/root/etc/sysctl.d
==> /tmp/rsync.23611/1.dirlist <==
system/root/etc/binfmt.d
system/root/etc/cit
system/root/etc/gdb
==> /tmp/rsync.23611/2.dirlist <==
system/root/usr/local
system/root/usr/bin
system/root/usr/lib
Right now I run it with a simple for loop:
for i in $(seq 1 $rsyncs); do
while read r; do
rsync $rsyncopts backup@$host:$remotepath/$ri $r 2>&1 |
tee $tmpdir/$i.dirlist.log ;
done < $tmpdir/$i.dirlist &
done
I have also seen this example of limiting the number of parallel processes:
for ARG in $*; do
command $ARG &
NPROC=$(($NPROC+1))
if [ "$NPROC" -ge 4 ]; then
wait
NPROC=0
fi
done
Assuming the maximum value of $i is 100, your code above still stays below the maximum of 200 processes that you want to allow.
So one solution would be to run twice as many processes. I suggest dividing your main loop for i in $(seq 1 $rsyncs); do ... into two loops running concurrently: for i in $(seq 1 2 $rsyncs); do ... for the odd values of $i, and for i in $(seq 2 2 $rsyncs); do ... for the even values of $i.
for i in $(seq 1 2 $rsyncs); do # i = 1 3 5 ...
while read r; do
rsync $rsyncopts backup@$host:$remotepath/$ri $r 2>&1 |
tee $tmpdir/$i.dirlist.log ;
done < $tmpdir/$i.dirlist &
done & # added an ampersand here
for i in $(seq 2 2 $rsyncs); do # i = 2 4 6 ...
while read r; do
rsync $rsyncopts backup@$host:$remotepath/$ri $r 2>&1 |
tee $tmpdir/$i.dirlist.log ;
done < $tmpdir/$i.dirlist &
done
Edit: Since my approach above doesn't convince you, let us try something completely different. First, create a list of all the processes you want to run and store these in an array:
processes=() # create an empty bash array
for i in $(seq 1 $rsyncs); do
while read r; do
# add the full rsync command line to the array
processes+=("rsync $rsyncopts backup@$host:$remotepath/$ri $r 2>&1 | tee $tmpdir/$i.dirlist.log");
done < $tmpdir/$i.dirlist
done
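Once you have that array, launch, say, 200 processes, and then enter a loop that waits for one process to finish and launches the next one:
for ((j=0; j<200 && j<${#processes[@]}; j++)); do
eval "${processes[$j]}" & # eval is needed because each entry contains a pipe and a redirection
done
while [ -n "${processes[$j]}" ]; do
wait -n # wait for any one process to finish (needs bash 4.3+)
eval "${processes[$j]}" & # launch one more process
j=$((j+1)) # increment in the parent shell, not inside the background command
done
wait # wait for the remaining processes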
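The same idea can be sketched with an explicit counter instead of a fixed first batch. This is only an illustration under assumptions: the task strings are hypothetical stand-ins for the rsync command lines, max_jobs plays the role of the 200-process limit, and wait -n requires bash 4.3 or newer.

```shell
#!/bin/bash
# Hypothetical task list: each entry is a command line stored as a string.
workdir=$(mktemp -d)
max_jobs=3
tasks=()
for n in 1 2 3 4 5 6; do
  tasks+=("sleep 0.$((n % 3 + 1)); touch '$workdir/task$n'")
done

running=0
launched=0
for task in "${tasks[@]}"; do
  # Pool is full: block until any one background job exits (bash 4.3+).
  if (( running >= max_jobs )); then
    wait -n
    running=$((running - 1))
  fi
  eval "$task" &            # eval so the ';' inside the string is honoured
  running=$((running + 1))
  launched=$((launched + 1))
done
wait                        # collect the jobs that are still running

finished=("$workdir"/task*)
echo "launched $launched tasks, ${#finished[@]} finished"
rm -rf "$workdir"
```

The counter is updated in the parent shell only, because expansions inside a backgrounded command run in the child and their side effects are lost.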
Please try this and tell us.

Waiting for the last 10 parallel started jobs to finish in bash [duplicate]

I have the following bash script, which starts a program several times in parallel and passes a control variable to each execution.
The program uses several resources, so after it has been started 10 times in parallel I want to wait until those 10 instances have finished.
I am currently doing this very roughly, by sleeping after every 10 iterations for the longest time the 10 parallel programs could need to finish.
Is there a straightforward way to implement this behavior?
steps=$((500/20))
echo $steps
START=0
for((i=START;i < steps; i++))
do
for((j=START;j < steps;j++))
do
for((k=START;k < steps;k++))
do
n=$(($j*steps +$k))
idx=$(($i*$steps*$steps + $n))
if ! ((idx % 10)); then
echo "waiting for the last 10 programs"
sleep 10
else
./someprogram $idx &
fi
done
done
done
Well, since you already have code in place to check for every 10th iteration (idx % 10), the wait builtin seems perfect. From the docs:
wait: wait [-n] [id ...]
[...] Waits for each process identified by an ID, which may be a process ID or a
job specification, and reports its termination status. If ID is not
given, waits for all currently active child processes, and the return
status is zero.
So, by calling wait each time idx % 10 == 0, you are actually waiting for all previously started child processes to finish. And if you are not spawning anything other than someprogram, you'll be waiting for the last (up to 10) instances to finish.
Your script with wait:
#!/bin/bash
steps=$((500/20))
START=0
for ((i=START; i<steps; i++)); do
for ((j=START; j<steps; j++)); do
for ((k=START; k<steps; k++)); do
idx=$((i*steps*steps + j*steps + k))
if ! ((idx % 10)); then
wait
else
./someprogram $idx &
fi
done
done
done
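Note that the loop above, like the original script, never starts someprogram for indexes divisible by 10, because those iterations only wait. If every index should run, one possible variant launches first and waits after each full batch. This is a sketch under assumptions: the someprogram function is a hypothetical stand-in that records its index in a scratch directory, and total is shortened to 25.

```shell
#!/bin/bash
# Hypothetical stand-in for ./someprogram: records its index as an empty file.
outdir=$(mktemp -d)
someprogram() { sleep 0.1; : > "$outdir/$1"; }

batch=10
total=25
for ((idx = 0; idx < total; idx++)); do
  someprogram "$idx" &
  # After every full batch, block until all running children have exited.
  if (( (idx + 1) % batch == 0 )); then
    wait
  fi
done
wait   # collect the final, possibly partial, batch

done_files=("$outdir"/*)
completed=${#done_files[@]}
echo "completed $completed of $total runs"
rm -rf "$outdir"
```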
Also, notice you don't have to use $var (dollar prefix) inside arithmetic expansion $((var+1)).
I'm assuming above your actual script does some additional processing before calling someprogram, but if all you need is to call someprogram on consecutive indexes, 10 instances at a time, you might consider using xargs or GNU parallel.
For example, with xargs:
seq 0 1000 | xargs -n1 -P10 ./someprogram
or, with additional arguments to someprogram:
seq 0 1000 | xargs -P10 -I{} ./someprogram --option --index='{}' someparam
With GNU parallel:
seq 0 1000 | parallel -P10 ./someprogram '{}'
seq 0 1000 | parallel -P10 ./someprogram --option --index='{}' someparam
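To try the xargs form without the real program, a small self-contained sketch can help. It assumes an xargs that supports -P (GNU and BSD both do), and sh -c here is only a stand-in for ./someprogram that reports the index it was given.

```shell
#!/bin/bash
# Assumption: 'sh -c' stands in for ./someprogram and just reports its index.
# -n1 passes one index per invocation, -P4 keeps up to four running at once.
results=$(seq 0 19 | xargs -n1 -P4 sh -c 'echo "finished $1"' _)
count=$(printf '%s\n' "$results" | grep -c '^finished [0-9]*$')
echo "$count runs completed"
```

The order of the output lines is not guaranteed, since the invocations run concurrently.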

How to loop argument in bash to call a function

I apologize if this question has already been asked, but English isn't my native language and I couldn't find the answer. I'd like a bash script that executes a program, which I'll call MyProgram, with a fixed number of arguments consisting of random numbers. I'd like to have something like this:
./MyProgram for(i = 0; i < 1000; i++) $(($RANDOM%200-100))
How should I go about this?
You (mostly) just have the loop and the actual program call inverted.
for ((i=0; i < 1000; i++)); do
./MyProgram $((RANDOM%200 - 100))
done
If, however, you actually want 1000 different arguments passed to a single call, you have to build up a list first.
args=()
for ((i=0; i < 1000; i++)); do
args+=( $((RANDOM%200 - 100)) )
done
./MyProgram "${args[@]}"
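A quick way to check the array approach is to swap in a function that reports what it received. In this sketch my_program is a hypothetical stand-in for ./MyProgram; the range check relies on RANDOM % 200 - 100 always lying between -100 and 99.

```shell
#!/bin/bash
# Hypothetical stand-in for ./MyProgram: prints how many arguments it got.
my_program() { echo "$#"; }

args=()
for ((i = 0; i < 1000; i++)); do
  args+=( $((RANDOM % 200 - 100)) )   # each value lies in [-100, 99]
done

# The quoted [@] expansion passes every element as its own argument.
count=$(my_program "${args[@]}")

in_range=1
for a in "${args[@]}"; do
  (( a >= -100 && a <= 99 )) || in_range=0
done
echo "passed $count arguments; all in range: $in_range"
```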
The expression
$RANDOM % 200 - 100
is roughly equivalent to this perl one-liner:
perl -E 'say int(200*rand() -100) for (1..1000)'
so, for example,
perl -E 'say int(200*rand() -100) for (1..1000)' | xargs -n1 ./MyProgram
will run like:
./MyProgram -10
./MyProgram 13
... 1000 times ...
./MyProgram 55
./MyProgram -31
If you need 1000 args in a single call,
./MyProgram $(perl -E 'say int(200*rand() -100) for (1..1000)')
will produce
./MyProgram 5 -41 -81 -79 -14 ... 1000 numbers ... -63 -9 95 -9 -29
In addition to what @chepner says, you can also use the for ... in style of for loop. This looks like:
for a in one two three; do
echo "${a}"
done
which would produce the result:
one
two
three
In other words, the list of words after the in part, separated by spaces, is looped over, with each iteration of the loop having a different word in the variable a.
To call your program 1000 times (or just modify to produce the list of arguments to run it once as in @chepner's answer) you could then do:
for a in $(seq 1 1000); do
./MyProgram $((RANDOM%200 - 100))
done
where the output of the seq command is providing the list of values to loop over. Although the traditional for loop may be more immediately obvious to many programmers, I like for ... in because it can be applied in lots of situations. A crude and mostly pointless ls, for example:
for a in *; do
echo "${a}"
done
for ... in is probably the bit of "advanced" bash that I find the most useful, and I make use of it very frequently.

How to make shell script non blocking?

Just a simple question:
Why is the following one-liner not working? How can I make the IO non-blocking?
$ while true; do date; sleep 1; done | tail -f
The problem is not with non-blocking IO; it's with your choice of tail.
This prints out each line with a colon (all of them):
while true; do date; sleep 1; done | grep :
The problem with tail is that it has to reach the end of its input before it can print the last ten lines and start following. In your case the input never ends, so tail never gets to print anything.
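The behaviour is easy to see with a producer that does end. In this minimal sketch (the five-line loop is just an illustrative producer), tail prints only once the pipe reaches end-of-file, at which point it knows which lines are the last ones.

```shell
#!/bin/bash
# Finite producer: the loop ends, so the pipe reaches end-of-file
# and tail can decide which lines are the last two.
last_two=$(for i in 1 2 3 4 5; do echo "line $i"; done | tail -n 2)
echo "$last_two"
```

With while true as the producer, end-of-file never arrives, so the same pipeline would print nothing.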
If what you want to do is continually replace the last line with the date you could do this:
while true; do echo -en "\r"`date`; sleep 1; done

How to make a pipe loop in bash

Assume that I have programs P0, P1, ...P(n-1) for some n > 0. How can I easily redirect the output of program Pi to program P(i+1 mod n) for all i (0 <= i < n)?
For example, let's say I have a program square, which repeatedly reads a number and then prints the square of that number, and a program calc, which sometimes prints a number, after which it expects to be able to read the square of it. How do I connect these programs such that whenever calc prints a number, square squares it and returns it to calc?
Edit: I should probably clarify what I mean with "easily". The named pipe/fifo solution is one that indeed works (and I have used in the past), but it actually requires quite a bit of work to do properly if you compare it with using a bash pipe. (You need to get a not yet existing filename, make a pipe with that name, run the "pipe loop", clean up the named pipe.) Imagine you could no longer write prog1 | prog2 and would always have to use named pipes to connect programs.
I'm looking for something that is almost as easy as writing a "normal" pipe. For instance something like { prog1 | prog2 } >&0 would be great.
After spending quite some time yesterday trying to redirect stdout to stdin, I ended up with the following method. It isn't really nice, but I think I prefer it over the named pipe/fifo solution.
read | { P0 | ... | P(n-1); } >/dev/fd/0
The { ... } >/dev/fd/0 is to redirect stdout to stdin for the pipe sequence as a whole (i.e. it redirects the output of P(n-1) to the input of P0). Using >&0 or something similar does not work; this is probably because bash assumes 0 is read-only while it doesn't mind writing to /dev/fd/0.
The initial read-pipe is necessary because without it both the input and output file descriptor are the same pts device (at least on my system) and the redirect has no effect. (The pts device doesn't work as a pipe; writing to it puts things on your screen.) By making the input of the { ... } a normal pipe, the redirect has the desired effect.
To illustrate with my calc/square example:
function calc() {
# calculate sum of squares of numbers 0,..,9
sum=0
for ((i=0; i<10; i++)); do
echo $i # "request" the square of i
read ii # read the square of i
echo "got $ii" >&2 # debug message
let sum=$sum+$ii
done
echo "sum $sum" >&2 # output result to stderr
}
function square() {
# square numbers
read j # receive first "request"
while [ "$j" != "" ]; do
let jj=$j*$j
echo "square($j) = $jj" >&2 # debug message
echo $jj # send square
read j # receive next "request"
done
}
read | { calc | square; } >/dev/fd/0
Running the above code gives the following output:
square(0) = 0
got 0
square(1) = 1
got 1
square(2) = 4
got 4
square(3) = 9
got 9
square(4) = 16
got 16
square(5) = 25
got 25
square(6) = 36
got 36
square(7) = 49
got 49
square(8) = 64
got 64
square(9) = 81
got 81
sum 285
Of course, this method is quite a bit of a hack. The read part in particular has an undesired side effect: termination of the "real" pipe loop does not lead to termination of the whole. I couldn't think of anything better than read, as it seems that you can only determine that the pipe loop has terminated by trying to write something to it.
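For the two-program case, a different technique from the /dev/fd/0 redirection is bash's own coproc keyword, which starts a command with a pipe in each direction. The sketch below is not the method described above; it assumes bash 4.1 or newer, folds the calc logic into the main script, and uses the names SQ, sq_in, sq_out and sq_pid purely for illustration.

```shell
#!/bin/bash
# square answers each number it reads with that number's square.
square() {
  local j
  while read -r j; do
    echo $((j * j))
  done
}

# Start square as a coprocess; bash stores its pipe descriptors in SQ.
coproc SQ { square; }
sq_out=${SQ[0]}   # read square's answers from this descriptor
sq_in=${SQ[1]}    # write requests to this descriptor
sq_pid=$SQ_PID    # saved because bash unsets SQ_PID when the coprocess exits

sum=0
for ((i = 0; i < 10; i++)); do
  echo "$i" >&"$sq_in"      # "request" the square of i
  read -r ii <&"$sq_out"    # read the square of i
  sum=$((sum + ii))
done

exec {sq_in}>&-   # close the request pipe so square sees end-of-file
wait "$sq_pid"
echo "sum $sum"
```

Closing the write end lets the loop terminate cleanly, which avoids the dangling read of the hack above.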
A named pipe might do it:
$ mkfifo outside
$ <outside calc | square >outside &
$ echo "1" >outside ## Trigger the loop to start
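A fuller sketch of the named-pipe loop, including the creation and cleanup steps, might look like the following. It rests on assumptions: calc and square are simplified versions of the functions from the question, the temporary directory and the result file are illustrative, and no trigger write is used because calc sends the first request itself.

```shell
#!/bin/bash
# The temp directory and file names are illustrative only.
workdir=$(mktemp -d)
fifo="$workdir/loop"
mkfifo "$fifo"

calc() {     # requests squares on stdout, reads the answers on stdin
  local sum=0 i ii
  for ((i = 0; i < 10; i++)); do
    echo "$i"
    read -r ii
    sum=$((sum + ii))
  done
  echo "$sum" > "$workdir/result"
}

square() {   # answers each request with its square
  local j
  while read -r j; do
    echo $((j * j))
  done
}

# calc's stdout feeds square through the anonymous pipe;
# square's stdout feeds calc's stdin through the named pipe.
calc < "$fifo" | square > "$fifo"

result=$(cat "$workdir/result")
echo "sum $result"
rm -rf "$workdir"
```

The two opens of the fifo unblock each other, since one side opens it for reading and the other for writing.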
This is a very interesting question. I (vaguely) remember an assignment very similar in college 17 years ago. We had to create an array of pipes, where our code would get filehandles for the input/output of each pipe. Then the code would fork and close the unused filehandles.
I'm thinking you could do something similar with named pipes in bash. Use mknod or mkfifo to create a set of pipes with unique names you can reference, then fork your program.
My solution uses pipexec (most of the function implementation comes from your answer):
square.sh
function square() {
# square numbers
read j # receive first "request"
while [ "$j" != "" ]; do
let jj=$j*$j
echo "square($j) = $jj" >&2 # debug message
echo $jj # send square
read j # receive next "request"
done
}
square "$@"
calc.sh
function calc() {
# calculate sum of squares of numbers 0,..,9
sum=0
for ((i=0; i<10; i++)); do
echo $i # "request" the square of i
read ii # read the square of i
echo "got $ii" >&2 # debug message
let sum=$sum+$ii
done
echo "sum $sum" >&2 # output result to stderr
}
calc "$@"
The command
pipexec [ CALC /bin/bash calc.sh ] [ SQUARE /bin/bash square.sh ] \
"{CALC:1>SQUARE:0}" "{SQUARE:1>CALC:0}"
The output (same as in your answer)
square(0) = 0
got 0
square(1) = 1
got 1
square(2) = 4
got 4
square(3) = 9
got 9
square(4) = 16
got 16
square(5) = 25
got 25
square(6) = 36
got 36
square(7) = 49
got 49
square(8) = 64
got 64
square(9) = 81
got 81
sum 285
Comment: pipexec was designed to start processes and build arbitrary pipes in between. Because bash functions cannot be handled as processes, there is the need to have the functions in separate files and use a separate bash.
Named pipes.
Create a series of fifos using mkfifo,
e.g. fifo0, fifo1.
Then attach each process in turn to the pipes you want:
processn < fifo(n-1) > fifon
I doubt sh/bash can do it.
ZSH would be a better bet, with its MULTIOS and coproc features.
A command stack can be composed as string from an array of arbitrary commands
and evaluated with eval. The following example gives the result 65536.
function square ()
{
read n
echo $((n*n))
} # ---------- end of function square ----------
declare -a commands=( 'echo 4' 'square' 'square' 'square' )
#-------------------------------------------------------------------------------
# build the command stack using pipes
#-------------------------------------------------------------------------------
declare stack=${commands[0]}
for (( COUNTER=1; COUNTER<${#commands[@]}; COUNTER++ )); do
stack="${stack} | ${commands[${COUNTER}]}"
done
#-------------------------------------------------------------------------------
# run the command stack
#-------------------------------------------------------------------------------
eval "$stack"
