Use bash wait in for-loop [duplicate]

(I have searched and expected this question to have been asked before, but couldn't find anything quite like this, although there are plenty of similar questions.)
I want this for-loop to run in 3 different threads/processes, and wait seems to be the right command:
for file in 1.txt 2.txt 3.txt 4.txt 5.txt
do something lengthy &
i=$((i + 1))
wait $!
done
But this construct, I guess, just starts one job and then waits until it is done before starting the next. I could place wait outside the loop, but how do I then
Access the pids?
Limit it to 3 threads?

The jobs builtin can list the currently running background jobs, so you can use that to limit how many you create. To limit your jobs to three, try something like this:
for file in 1.txt 2.txt 3.txt 4.txt 5.txt; do
if [ $(jobs -r | wc -l) -ge 3 ]; then
wait $(jobs -r -p | head -1)
fi
# Start a slow background job here:
(echo Begin processing $file; sleep 10; echo Done with $file)&
done
wait # wait for the last jobs to finish
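On bash 4.3 or newer, a shorter variant of the same idea (a sketch, not part of the original answer) uses wait -n to block until any one job finishes instead of picking a specific pid:
for file in 1.txt 2.txt 3.txt 4.txt 5.txt; do
  # if 3 or more jobs are running, block until one of them exits
  while [ "$(jobs -r | wc -l)" -ge 3 ]; do
    wait -n
  done
  (echo "Begin processing $file"; sleep 10; echo "Done with $file") &
done
wait  # wait for the remaining jobs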

GNU Parallel might be worth a look.
My first attempt,
parallel -j 3 'bash -c "sleep {}; echo {};"' ::: 4 1 2 5 3
can, according to the inventor of parallel, be shortened to
parallel -j3 sleep {}\; echo {} ::: 4 1 2 5 3
which prints (in completion order):
1
2
4
3
5
and quoting the semicolon differently, which is friendlier to type, like this:
parallel -j3 sleep {}";" echo {} ::: 4 1 2 5 3
works too.
It doesn't look trivial and I have only tested it twice so far, once to answer this question. parallel --help shows a source where there is more info; the man page is a little bit shocking. :)
parallel -j 3 "something lengthy {}" ::: {1..5}.txt
might work, depending on whether something lengthy is a program (fine) or plain bash code (to call a bash function with parallel, you first need to export it with export -f, as a later answer here shows).
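A minimal sketch of that exported-function route, assuming GNU Parallel and using a made-up function name:
something_lengthy() {
  # stand-in for the real work
  echo "start $1"; sleep 2; echo "done $1"
}
export -f something_lengthy
parallel -j 3 something_lengthy {} ::: 1.txt 2.txt 3.txt 4.txt 5.txt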
On xUbuntu-Linux 16.04, parallel wasn't installed by default, but it is available in the repositories.

Building on Rob Davis' answer:
#!/bin/bash
qty=3
for file in 1.txt 2.txt 3.txt 4.txt 5.txt; do
while [ $(jobs -r | wc -l) -ge $qty ]; do
sleep 1
# jobs #(if you want an update every second on what is running)
done
echo -n "Begin processing $file"
something_lengthy $file &
echo $!
done
wait

You can use a subshell approach, for example:
( (sleep 10) &
p1=$!
(sleep 20) &
p2=$!
(sleep 15) &
p3=$!
wait
echo "all finished ..." )
Note that wait waits for all children inside the subshell. You can use the modulo operator (%) with 3 and use the remainder to check for the 1st, 2nd, and 3rd process IDs (if needed), or use it to run 3 jobs in parallel, as sketched below.
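A minimal sketch of that modulo idea, batching the jobs in groups of 3 (the sleep stands in for the real work):
i=0
for file in 1.txt 2.txt 3.txt 4.txt 5.txt; do
  (sleep 10; echo "done with $file") &   # stand-in for the lengthy command
  i=$(( (i + 1) % 3 ))
  [ "$i" -eq 0 ] && wait                 # after every 3rd job, wait for the batch
done
wait                                     # wait for the final partial batch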
Hope this helps.


Parallel runs in N-process batches in BASH

I have a for loop and I want to process it 4 times in parallel at a time.
I tried the following code from the page https://unix.stackexchange.com/questions/103920/parallelize-a-bash-for-loop:
task(){
sleep 0.5; echo "$1";
}
N=4
(
for thing in a b c d e f g; do
((i=i%N)); ((i++==0)) && wait
task "$thing" &
done
)
I have stored the above file as test.sh, the output I get it is as follows:
path$ ./test.sh
a
b
c
d
path$ e
f
g
and the cursor doesn't come back to my terminal after 'g'; it waits/sleeps indefinitely. I want the cursor to come back to my terminal, and I also don't understand why the output 'e' has my path preceding it. Shouldn't the output be displayed as 'a' to 'g' continuously, and then the script stop?
It's pretty hard to understand what you want, but I think you want to do 7 things, called a, b, c ... g, in parallel, with no more than 4 instances at a time.
If so, you could try this:
echo {a..g} | xargs -P4 -n1 bash -c 'echo "$1"; sleep 2' {}
That sends the letters a..g into xargs, which then starts a new bash shell for each letter, passing one letter at a time (-n1). The literal {} becomes the shell's $0 and the letter becomes $1, which the shell echoes before waiting 2 seconds and exiting - so you can see the pause.
The -P4 tells xargs to run 4 instances of bash at a time in parallel.
When run, with -P4 the output appears in groups of 4; with -P2, two at a time.
Or, more simply, if you don't mind spending 10 seconds installing GNU Parallel:
parallel -j4 -k 'echo {}; sleep 2' ::: {a..g}
If you press Enter you can see that you are back in a normal shell. If you want to wait for the batched processes before the script exits, just add wait at the end of the script.
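A corrected version of the script from the question might then look like this (the surrounding subshell is no longer needed once the final wait is there):
task(){
  sleep 0.5; echo "$1"
}
N=4
for thing in a b c d e f g; do
  ((i=i%N)); ((i++==0)) && wait   # every Nth job, wait for the current batch
  task "$thing" &
done
wait   # block until the last batch finishes, so the prompt only returns when done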

How to run given function in Bash in parallel?

There have been some similar questions, but my problem is not "run several programs in parallel" - which can be trivially done with parallel or xargs.
I need to parallelize Bash functions.
Let's imagine code like this:
for i in "${list[@]}"
do
for j in "${other[@]}"
do
# some processing in here - 20-30 lines of almost pure bash
done
done
Some of the processing requires calls to external programs.
I'd like to run some (4-10) tasks, each running for different $i. Total number of elements in $list is > 500.
I know I can put the whole for j ... done loop in external script, and just call this program in parallel, but is it possible to do without splitting the functionality between two separate programs?
sem is part of GNU Parallel and is made for this kind of situation.
for i in "${list[@]}"
do
for j in "${other[@]}"
do
# some processing in here - 20-30 lines of almost pure bash
sem -j 4 dolong task
done
done
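One detail worth adding: sem returns as soon as the job is queued, so you will usually want to block at the end until everything has finished. A sketch (dolong task stands in for the real work, as above):
for i in "${list[@]}"; do
  for j in "${other[@]}"; do
    sem -j 4 dolong task   # queue the job; at most 4 run at once
  done
done
sem --wait                 # block until all queued jobs have finished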
If you prefer the function approach, GNU Parallel can do the dual for loop in one go (it runs dowork once for every combination of an element from list and one from other):
dowork() {
echo "Starting i=$1, j=$2"
sleep 5
echo "Done i=$1, j=$2"
}
export -f dowork
parallel dowork ::: "${list[@]}" ::: "${other[@]}"
Edit: Please consider Ole's answer instead.
Instead of a separate script, you can put your code in a separate bash function. You can then export it, and run it via xargs:
#!/bin/bash
dowork() {
sleep $((RANDOM % 10 + 1))
echo "Processing i=$1, j=$2"
}
export -f dowork
for i in "${list[@]}"
do
for j in "${other[@]}"
do
printf "%s\0%s\0" "$i" "$j"
done
done | xargs -0 -n 2 -P 4 bash -c 'dowork "$#"' --
An efficient solution that can also run multi-line commands in parallel:
for ...your_loop...; do
if test "$(jobs | wc -l)" -ge 8; then
wait -n
fi
{
command1
command2
...
} &
done
wait
In your case:
for i in "${list[@]}"
do
for j in "${other[@]}"
do
if test "$(jobs | wc -l)" -ge 8; then
wait -n
fi
{
your
commands
here
} &
done
done
wait
If there are already 8 bash jobs running, wait -n waits for at least one of them to complete; when there are fewer, the loop starts new ones asynchronously.
Benefits of this approach:
It's very easy to use for multi-line commands. All your variables are automatically "captured" in scope; there is no need to pass them around as arguments.
It's relatively fast. Compare this, for example, to parallel (I'm quoting official man):
parallel is slow at starting up - around 250 ms the first time and 150 ms after that.
Only needs bash to work.
Downsides:
There is a possibility that there were 8 jobs when we counted them but fewer by the time we started waiting (this happens if a job finishes in the milliseconds between the two commands). This can make us wait with fewer jobs than required. However, the loop will resume when at least one job completes, or immediately if there are 0 jobs running (wait -n exits immediately in this case).
If you already have some commands running asynchronously (&) within the same bash script, you'll have fewer worker processes in the loop.

Parallel processing or threading in Shell scripting

I am writing a shell script in which a command runs and takes 2 minutes every time, and there is nothing I can do about that. But if I want to run this command 100 times in the script, the total time would be 200 minutes, and that is a big issue; nobody wants to wait 200 minutes. What I want is to run all 100 commands in parallel so that the output comes in 2 minutes, or maybe somewhat more, but not 200 minutes.
Any help with this, in any way, would be appreciated.
GNU Parallel is what you want, unless you want to reinvent the wheel. Here are some more detailed examples, but the short of it:
ls | parallel gzip # gzip all files in a directory
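For the asker's specific case of running the same 2-minute command 100 times (MyCommand is a placeholder here), something like this should do it:
# run MyCommand 100 times; -N0 passes no arguments from the input,
# and -j0 runs as many jobs in parallel as possible
parallel -j0 -N0 MyCommand ::: {1..100}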
... run all 100 commands in parallel so that the output will come in 2 min
This is only possible if the command is CPU-bound and you have 100 processors on your system.
There's no utility built into the shell to run commands in parallel. What you can do is run your command in the background:
for ((i=0;i<100;i++))
do
MyCommand &
done
With & (background), each execution is scheduled as soon as possible. But this doesn't guarantee that your code will be executed in less than 200 minutes; it depends on how many processors your system has.
If you have only one processor and each execution of the command is doing actual computation for its 2 minutes, then the processor is already doing work, meaning no cycles are wasted. In that case, running the commands in parallel won't help, because there is only one processor and it isn't free either; the processes will just be waiting for their turn to be executed.
If you have more than one processors, then the above method (for loop) might help in reducing the total execution time.
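If you'd rather cap the concurrency yourself, xargs -P (used in other answers here) limits the number of simultaneous processes; MyCommand is again a placeholder:
# run MyCommand 100 times, at most 4 at a time; each run receives one
# number as an argument, which it is free to ignore
seq 100 | xargs -n1 -P4 MyCommand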
As @KingsIndian said, you can background tasks, which sort of lets them run in parallel. Beyond this, you can also keep track of them by process ID:
#!/bin/bash
# Function to be backgrounded
track() {
sleep $1
printf "\nFinished: %d\n" "$1"
}
start=$(date '+%s')
rand3="$(jot -s\ -r 3 5 10)"
# If you don't have `jot` (*BSD/OSX), substitute your own numbers here.
#rand3="5 8 10"
echo "Random numbers: $rand3"
# Make an associative array in which you'll record pids.
declare -A pids
# Background an instance of the track() function for each number, record the pid.
for n in $rand3; do
track $n &
pid=$!
echo "Backgrounded: $n (pid=$pid)"
pids[$pid]=$n
done
# Watch your stable of backgrounded processes.
# If a pid goes away, remove it from the array.
while [ -n "${pids[*]}" ]; do
sleep 1
for pid in "${!pids[@]}"; do
if ! ps "$pid" >/dev/null; then
unset pids[$pid]
echo "unset: $pid"
fi
done
if [ -z "${!pids[*]}" ]; then
break
fi
printf "\rStill waiting for: %s ... " "${pids[*]}"
done
printf "\r%-25s \n" "Done."
printf "Total runtime: %d seconds\n" "$((`date '+%s'` - $start))"
You should also take a look at the Bash documentation on coprocesses.
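For reference, a minimal coproc sketch (not from the original answer; it assumes bc is installed):
# start bc as a coprocess; bash exposes its stdin/stdout as file descriptors
coproc BC { bc; }
echo "2 + 3" >&"${BC[1]}"    # write to the coprocess's stdin
read -r result <&"${BC[0]}"  # read its answer back
echo "result: $result"       # prints: result: 5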

Is there a better way to run a command N times in bash?

I occasionally run a bash command line like this:
n=0; while [[ $n -lt 10 ]]; do some_command; n=$((n+1)); done
To run some_command a number of times in a row -- 10 times in this case.
Often some_command is really a chain of commands or a pipeline.
Is there a more concise way to do this?
If your range has a variable, use seq, like this:
count=10
for i in $(seq $count); do
command
done
Simply:
for run in {1..10}; do
command
done
Or as a one-liner, for those that want to copy and paste easily:
for run in {1..10}; do command; done
Using a constant:
for ((n=0;n<10;n++)); do
some_command;
done
Using a variable (can include math expressions):
x=10; for ((n=0; n < (x / 2); n++)); do some_command; done
Another simple way to hack it:
seq 20 | xargs -Iz echo "Hi there"
runs echo 20 times.
Notice that seq 20 | xargs -Iz echo "Hi there z" would output:
Hi there 1
Hi there 2
...
If you're using the zsh shell:
repeat 10 { echo 'Hello' }
Where 10 is the number of times the command will be repeated.
Using GNU Parallel you can do:
parallel some_command ::: {1..1000}
If you do not want the number as argument and only run a single job at a time:
parallel -j1 -N0 some_command ::: {1..1000}
Watch the intro video for a quick introduction:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial (http://www.gnu.org/software/parallel/parallel_tutorial.html). Your command line will love you for it.
A simple function in the bash config file (~/.bashrc often) could work well.
function runx() {
for ((n=0;n<$1;n++))
do "${@:2}"
done
}
Call it like this.
$ runx 3 echo 'Hello world'
Hello world
Hello world
Hello world
Another form of your example:
n=0; while (( n++ < 10 )); do some_command; done
for _ in {1..10}; do command; done
Note the underscore instead of using a variable.
If you are OK doing it periodically, you could run the following command to run it every 1 sec indefinitely. You can put other custom checks in place to run it n number of times.
watch -n 1 some_command
If you wish to have visual confirmation of changes, append --differences to the watch command.
According to the OSX man page, there's also
The --cumulative option makes highlighting "sticky", presenting a
running display of all positions that have ever changed. The -t
or --no-title option turns off the header showing the interval,
command, and current time at the top of the display, as well as the
following blank line.
xargs is fast:
#!/usr/bin/bash
echo "while loop:"
n=0; time while (( n++ < 10000 )); do /usr/bin/true ; done
echo -e "\nfor loop:"
time for ((n=0;n<10000;n++)); do /usr/bin/true ; done
echo -e "\nseq,xargs:"
time seq 10000 | xargs -I{} -P1 -n1 /usr/bin/true
echo -e "\nyes,xargs:"
time yes x | head -n10000 | xargs -I{} -P1 -n1 /usr/bin/true
echo -e "\nparallel:"
time parallel --will-cite -j1 -N0 /usr/bin/true ::: {1..10000}
On a modern 64-bit Linux, this gives:
while loop:
real 0m2.282s
user 0m0.177s
sys 0m0.413s
for loop:
real 0m2.559s
user 0m0.393s
sys 0m0.500s
seq,xargs:
real 0m1.728s
user 0m0.013s
sys 0m0.217s
yes,xargs:
real 0m1.723s
user 0m0.013s
sys 0m0.223s
parallel:
real 0m26.271s
user 0m4.943s
sys 0m3.533s
This makes sense, as the xargs command is a single native process that spawns the /usr/bin/true command multiple times, whereas the for and while loops are interpreted entirely in Bash. Of course this only works for a single command; if you need to run multiple commands in each iteration of the loop, the loop will be just as fast as, or maybe faster than, passing sh -c 'command1; command2; ...' to xargs.
The -P1 could also be changed to, say, -P8 to spawn 8 processes in parallel to get another big boost in speed.
I don't know why GNU parallel is so slow. I would have thought it would be comparable to xargs.
For one, you can wrap it up in a function:
function manytimes {
n=0
times=$1
shift
while [[ $n -lt $times ]]; do
"$@"
n=$((n+1))
done
}
Call it like:
$ manytimes 3 echo "test" | tr 'e' 'E'
tEst
tEst
tEst
xargs and seq will help:
function __run_times { seq 1 $1 | { shift; xargs -i -- "$@"; } }
For example:
abon@abon:~$ __run_times 3 echo hello world
hello world
hello world
hello world
All of the existing answers appear to require bash, and don't work with a standard BSD UNIX /bin/sh (e.g., ksh on OpenBSD).
The below code should work on any BSD:
$ echo {1..4}
{1..4}
$ seq 4
sh: seq: not found
$ for i in $(jot 4); do echo e$i; done
e1
e2
e3
e4
$
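Alternatively, a plain counter loop needs nothing beyond POSIX sh, so it works in any of these shells (a generic sketch matching the jot example above):
n=0
while [ "$n" -lt 4 ]; do
  n=$((n + 1))
  echo "e$n"
done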
I solved it with this loop, where repeat is an integer giving the number of iterations:
repeat=10
for n in $(seq $repeat);
do
command1
command2
done
You can use this command to repeat your command 10 times or more:
for i in {1..10}; do your_command; done
for example:
for i in {1..10}; do speedtest; done
Yet another answer: use empty alternatives in braces, which curl expands itself via its URL globbing:
# calls curl 4 times
curl -s -w "\n" -X GET "http:{,,,}//www.google.com"
Tested on CentOS 7 and macOS.
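The shell has an analogous brace expansion when the braces are unquoted; a quick way to see what the four empty alternatives produce:
$ echo http:{,,,}//www.google.com
http://www.google.com http://www.google.com http://www.google.com http://www.google.com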
For loops are probably the right way to do it, but here is a fun alternative:
echo -e {1..10}"\n" |xargs -n1 some_command
If you need the iteration number as a parameter for your invocation, use:
echo -e {1..10}"\n" |xargs -I# echo now I am running iteration #
Edit: It was rightly commented that the solution given above works smoothly only with simple command runs (no pipes, etc.). You can always use sh -c to do more complicated stuff, but it's probably not worth it.
Another method I use typically is the following function:
rep() { s=$1; shift; e=$1; shift; for x in $(seq $s $e); do c=${@//#/$x}; sh -c "$c"; done; }
now you can call it as:
rep 3 10 echo iteration "#"
The first two numbers give the range. The # (quoted so the shell doesn't treat it as the start of a comment) gets translated to the iteration number. Now you can use this with pipes too:
rep 1 10 "ls R#/|wc -l"
which gives you the number of files in directories R1 .. R10.
The script file
bash-3.2$ cat test.sh
#!/bin/bash
echo "The argument is arg: $1"
for ((n=0;n<$1;n++));
do
echo "Hi"
done
and the output below
bash-3.2$ ./test.sh 3
The argument is arg: 3
Hi
Hi
Hi
bash-3.2$
A little bit naive but this is what I usually remember off the top of my head:
for i in 1 2 3; do
some commands
done
Very similar to @joe-koberg's answer. His is better, especially if you need many repetitions; it's just harder for me to remember other syntax because I haven't been using bash a lot in recent years, at least not for scripting.
How about the alternate form of for mentioned in (bashref)Looping Constructs?
