Parallel command inside for loop - Bash [duplicate] - bash

Here's an example program:
#!/bin/bash
for x in {1..5}
do
output[$x]=$(echo $x) &
done
wait
for x in {1..5}
do
echo ${output[$x]}
done
I would expect this to run and print out the values assigned to each member of the output array, but it prints nothing. Removing the & correctly assigns the variables. Must I use different syntax to achieve this in parallel?

This
output[$x]=$(echo $x) &
puts the whole assignment in a background task (sub-process) and that's why you're not seeing the result, since it's not propogated to the parent process.
You can use wait to wait for subprocesses, but returning results (other than status codes) is going to be difficult. Perhaps you can write intermediate results to a file, and collect those results after all processes have finished ? (not nice, I appreciate)

If you want to avoid writing files, you can use GNU parallel:
#!/bin/bash
output=(`parallel -k --gnu echo {1} ::: {1..5}`)
for i in ${output[#]}
do
echo $i
done
The -k is to preserve the order of the output

Use parset from GNU Parallel:
#!/bin/bash
typeset -A output
parset output echo {} ::: {1..5}
for x in {1..5}
do
echo ${output[$x]}
done

Related

bash: how may I improve for loop refered to the variable? [duplicate]

I occasionally run a bash command line like this:
n=0; while [[ $n -lt 10 ]]; do some_command; n=$((n+1)); done
To run some_command a number of times in a row -- 10 times in this case.
Often some_command is really a chain of commands or a pipeline.
Is there a more concise way to do this?
If your range has a variable, use seq, like this:
count=10
for i in $(seq $count); do
command
done
Simply:
for run in {1..10}; do
command
done
Or as a one-liner, for those that want to copy and paste easily:
for run in {1..10}; do command; done
Using a constant:
for ((n=0;n<10;n++)); do
some_command;
done
Using a variable (can include math expressions):
x=10; for ((n=0; n < (x / 2); n++)); do some_command; done
Another simple way to hack it:
seq 20 | xargs -Iz echo "Hi there"
run echo 20 times.
Notice that seq 20 | xargs -Iz echo "Hi there z" would output:
Hi there 1
Hi there 2
...
If you're using the zsh shell:
repeat 10 { echo 'Hello' }
Where 10 is the number of times the command will be repeated.
Using GNU Parallel you can do:
parallel some_command ::: {1..1000}
If you do not want the number as argument and only run a single job at a time:
parallel -j1 -N0 some_command ::: {1..1000}
Watch the intro video for a quick introduction:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial (http://www.gnu.org/software/parallel/parallel_tutorial.html). You command line
with love you for it.
A simple function in the bash config file (~/.bashrc often) could work well.
function runx() {
for ((n=0;n<$1;n++))
do ${*:2}
done
}
Call it like this.
$ runx 3 echo 'Hello world'
Hello world
Hello world
Hello world
Another form of your example:
n=0; while (( n++ < 10 )); do some_command; done
for _ in {1..10}; do command; done
Note the underscore instead of using a variable.
If you are OK doing it periodically, you could run the following command to run it every 1 sec indefinitely. You can put other custom checks in place to run it n number of times.
watch -n 1 some_command
If you wish to have visual confirmation of changes, append --differences prior to the ls command.
According to the OSX man page, there's also
The --cumulative option makes highlighting "sticky", presenting a
running display of all positions that have ever changed. The -t
or --no-title option turns off the header showing the interval,
command, and current time at the top of the display, as well as the
following blank line.
Linux/Unix man page can be found here
xargs is fast:
#!/usr/bin/bash
echo "while loop:"
n=0; time while (( n++ < 10000 )); do /usr/bin/true ; done
echo -e "\nfor loop:"
time for ((n=0;n<10000;n++)); do /usr/bin/true ; done
echo -e "\nseq,xargs:"
time seq 10000 | xargs -I{} -P1 -n1 /usr/bin/true
echo -e "\nyes,xargs:"
time yes x | head -n10000 | xargs -I{} -P1 -n1 /usr/bin/true
echo -e "\nparallel:"
time parallel --will-cite -j1 -N0 /usr/bin/true ::: {1..10000}
On a modern 64-bit Linux, gives:
while loop:
real 0m2.282s
user 0m0.177s
sys 0m0.413s
for loop:
real 0m2.559s
user 0m0.393s
sys 0m0.500s
seq,xargs:
real 0m1.728s
user 0m0.013s
sys 0m0.217s
yes,xargs:
real 0m1.723s
user 0m0.013s
sys 0m0.223s
parallel:
real 0m26.271s
user 0m4.943s
sys 0m3.533s
This makes sense, as the xargs command is a single native process that spawns the /usr/bin/true command multiple time, instead of the for and while loops that are all interpreted in Bash. Of course this only works for a single command; if you need to do multiple commands in each iteration the loop, it will be just as fast, or maybe faster, than passing sh -c 'command1; command2; ...' to xargs
The -P1 could also be changed to, say, -P8 to spawn 8 processes in parallel to get another big boost in speed.
I don't know why GNU parallel is so slow. I would have thought it would be comparable to xargs.
For one, you can wrap it up in a function:
function manytimes {
n=0
times=$1
shift
while [[ $n -lt $times ]]; do
$#
n=$((n+1))
done
}
Call it like:
$ manytimes 3 echo "test" | tr 'e' 'E'
tEst
tEst
tEst
xargs and seq will help
function __run_times { seq 1 $1| { shift; xargs -i -- "$#"; } }
the view :
abon#abon:~$ __run_times 3 echo hello world
hello world
hello world
hello world
All of the existing answers appear to require bash, and don't work with a standard BSD UNIX /bin/sh (e.g., ksh on OpenBSD).
The below code should work on any BSD:
$ echo {1..4}
{1..4}
$ seq 4
sh: seq: not found
$ for i in $(jot 4); do echo e$i; done
e1
e2
e3
e4
$
I solved with this loop, where repeat is an integer that represents the loops's number
repeat=10
for n in $(seq $repeat);
do
command1
command2
done
You can use this command to repeat your command 10 times or more
for i in {1..10}; do **your command**; done
for example
for i in {1..10}; do **speedtest**; done
Yet another answer: Use parameter expansion on empty parameters:
# calls curl 4 times
curl -s -w "\n" -X GET "http:{,,,}//www.google.com"
Tested on Centos 7 and MacOS.
For loops are probably the right way to do it, but here is a fun alternative:
echo -e {1..10}"\n" |xargs -n1 some_command
If you need the iteration number as a parameter for your invocation, use:
echo -e {1..10}"\n" |xargs -I# echo now I am running iteration #
Edit: It was rightly commented that the solution given above would work smoothly only with simple command runs (no pipes, etc.). you can always use a sh -c to do more complicated stuff, but not worth it.
Another method I use typically is the following function:
rep() { s=$1;shift;e=$1;shift; for x in `seq $s $e`; do c=${#//#/$x};sh -c "$c"; done;}
now you can call it as:
rep 3 10 echo iteration #
The first two numbers give the range. The # will get translated to the iteration number. Now you can use this with pipes too:
rep 1 10 "ls R#/|wc -l"
with give you the number of files in directories R1 .. R10.
The script file
bash-3.2$ cat test.sh
#!/bin/bash
echo "The argument is arg: $1"
for ((n=0;n<$1;n++));
do
echo "Hi"
done
and the output below
bash-3.2$ ./test.sh 3
The argument is arg: 3
Hi
Hi
Hi
bash-3.2$
A little bit naive but this is what I usually remember off the top of my head:
for i in 1 2 3; do
some commands
done
Very similar to #joe-koberg's answer. His is better especially if you need many repetitions, just harder for me to remember other syntax because in last years I'm not using bash a lot. I mean not for scripting at least.
How about the alternate form of for mentioned in (bashref)Looping Constructs?

trouble capturing output of a subshell that has been backgrounded

Attempting to make a "simple" parallel function in bash. The problem is currently that when the line to capture the output is backgrounded, the output is lost. If that line is not backgrounded, the output is captured fine, but this of course defeats the purpose of the function.
#!/usr/bin/env bash
cluster="${1:-web100s}"
hosts=($(inventory.pl bash "$cluster" | sort -V))
cmds="${2:-uptime}"
parallel=10
cx=0
total=0
for host in "${hosts[#]}"; do
output[$total]=$(echo -en "$host: ")
echo "${output[$total]}"
output[$total]+=$(ssh -o ConnectTimeout=5 "$host" "$cmds") &
cx=$((cx + 1))
total=$((total + 1))
if [[ $cx -gt $parallel ]]; then
wait >&/dev/null
cx=0
fi
done
echo -en "***** DONE *****\n Results\n"
for ((i=0; i<= $total; i++)); do
echo "${output[$i]}"
done
That's because your command (the assignment) is run in a subshell, so this assignment can't influence the parent shell. This boils down to this:
a=something
a='hello senorsmile' &
echo "$a"
Can you guess what the output is? the output is, of course,
something
and not hello senorsmile. The only way for the subshell to communicate with the parent shell is to use an IPC (interprocess communication), in one form or another. I don't have any solution to propose, I only tried to explain why it fails.
If you think of it, it should make sense. What do you think of this?
a=$( echo a; sleep 1000000000; echo b ) &
The command immediately returns (after forking)... but the output is only going to be fully available in... over 31 years.
Assigning a shell variable in the background this way is effectively meaningless. Bash does have built in co-processing which should work for you:
http://www.gnu.org/software/bash/manual/bashref.html#Coprocesses

How to run given function in Bash in parallel?

There have been some similar questions, but my problem is not "run several programs in parallel" - which can be trivially done with parallel or xargs.
I need to parallelize Bash functions.
Let's imagine code like this:
for i in "${list[#]}"
do
for j in "${other[#]}"
do
# some processing in here - 20-30 lines of almost pure bash
done
done
Some of the processing requires calls to external programs.
I'd like to run some (4-10) tasks, each running for different $i. Total number of elements in $list is > 500.
I know I can put the whole for j ... done loop in external script, and just call this program in parallel, but is it possible to do without splitting the functionality between two separate programs?
sem is part of GNU Parallel and is made for this kind of situation.
for i in "${list[#]}"
do
for j in "${other[#]}"
do
# some processing in here - 20-30 lines of almost pure bash
sem -j 4 dolong task
done
done
If you like the function better GNU Parallel can do the dual for loop in one go:
dowork() {
echo "Starting i=$1, j=$2"
sleep 5
echo "Done i=$1, j=$2"
}
export -f dowork
parallel dowork ::: "${list[#]}" ::: "${other[#]}"
Edit: Please consider Ole's answer instead.
Instead of a separate script, you can put your code in a separate bash function. You can then export it, and run it via xargs:
#!/bin/bash
dowork() {
sleep $((RANDOM % 10 + 1))
echo "Processing i=$1, j=$2"
}
export -f dowork
for i in "${list[#]}"
do
for j in "${other[#]}"
do
printf "%s\0%s\0" "$i" "$j"
done
done | xargs -0 -n 2 -P 4 bash -c 'dowork "$#"' --
An efficient solution that can also run multi-line commands in parallel:
for ...your_loop...; do
if test "$(jobs | wc -l)" -ge 8; then
wait -n
fi
{
command1
command2
...
} &
done
wait
In your case:
for i in "${list[#]}"
do
for j in "${other[#]}"
do
if test "$(jobs | wc -l)" -ge 8; then
wait -n
fi
{
your
commands
here
} &
done
done
wait
If there are 8 bash jobs already running, wait will wait for at least one job to complete. If/when there are less jobs, it starts new ones asynchronously.
Benefits of this approach:
It's very easy for multi-line commands. All your variables are automatically "captured" in scope, no need to pass them around as arguments
It's relatively fast. Compare this, for example, to parallel (I'm quoting official man):
parallel is slow at starting up - around 250 ms the first time and 150 ms after that.
Only needs bash to work.
Downsides:
There is a possibility that there were 8 jobs when we counted them, but less when we started waiting. (It happens if a jobs finishes in those milliseconds between the two commands.) This can make us wait with fewer jobs than required. However, it will resume when at least one job completes, or immediately if there are 0 jobs running (wait -n exits immediately in this case).
If you already have some commands running asynchronously (&) within the same bash script, you'll have fewer worker processes in the loop.

Set variables in parallel in bash

Here's an example program:
#!/bin/bash
for x in {1..5}
do
output[$x]=$(echo $x) &
done
wait
for x in {1..5}
do
echo ${output[$x]}
done
I would expect this to run and print out the values assigned to each member of the output array, but it prints nothing. Removing the & correctly assigns the variables. Must I use different syntax to achieve this in parallel?
This
output[$x]=$(echo $x) &
puts the whole assignment in a background task (sub-process) and that's why you're not seeing the result, since it's not propogated to the parent process.
You can use wait to wait for subprocesses, but returning results (other than status codes) is going to be difficult. Perhaps you can write intermediate results to a file, and collect those results after all processes have finished ? (not nice, I appreciate)
If you want to avoid writing files, you can use GNU parallel:
#!/bin/bash
output=(`parallel -k --gnu echo {1} ::: {1..5}`)
for i in ${output[#]}
do
echo $i
done
The -k is to preserve the order of the output
Use parset from GNU Parallel:
#!/bin/bash
typeset -A output
parset output echo {} ::: {1..5}
for x in {1..5}
do
echo ${output[$x]}
done

Is there a better way to run a command N times in bash?

I occasionally run a bash command line like this:
n=0; while [[ $n -lt 10 ]]; do some_command; n=$((n+1)); done
To run some_command a number of times in a row -- 10 times in this case.
Often some_command is really a chain of commands or a pipeline.
Is there a more concise way to do this?
If your range has a variable, use seq, like this:
count=10
for i in $(seq $count); do
command
done
Simply:
for run in {1..10}; do
command
done
Or as a one-liner, for those that want to copy and paste easily:
for run in {1..10}; do command; done
Using a constant:
for ((n=0;n<10;n++)); do
some_command;
done
Using a variable (can include math expressions):
x=10; for ((n=0; n < (x / 2); n++)); do some_command; done
Another simple way to hack it:
seq 20 | xargs -Iz echo "Hi there"
run echo 20 times.
Notice that seq 20 | xargs -Iz echo "Hi there z" would output:
Hi there 1
Hi there 2
...
If you're using the zsh shell:
repeat 10 { echo 'Hello' }
Where 10 is the number of times the command will be repeated.
Using GNU Parallel you can do:
parallel some_command ::: {1..1000}
If you do not want the number as argument and only run a single job at a time:
parallel -j1 -N0 some_command ::: {1..1000}
Watch the intro video for a quick introduction:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial (http://www.gnu.org/software/parallel/parallel_tutorial.html). You command line
with love you for it.
A simple function in the bash config file (~/.bashrc often) could work well.
function runx() {
for ((n=0;n<$1;n++))
do ${*:2}
done
}
Call it like this.
$ runx 3 echo 'Hello world'
Hello world
Hello world
Hello world
Another form of your example:
n=0; while (( n++ < 10 )); do some_command; done
for _ in {1..10}; do command; done
Note the underscore instead of using a variable.
If you are OK doing it periodically, you could run the following command to run it every 1 sec indefinitely. You can put other custom checks in place to run it n number of times.
watch -n 1 some_command
If you wish to have visual confirmation of changes, append --differences prior to the ls command.
According to the OSX man page, there's also
The --cumulative option makes highlighting "sticky", presenting a
running display of all positions that have ever changed. The -t
or --no-title option turns off the header showing the interval,
command, and current time at the top of the display, as well as the
following blank line.
Linux/Unix man page can be found here
xargs is fast:
#!/usr/bin/bash
echo "while loop:"
n=0; time while (( n++ < 10000 )); do /usr/bin/true ; done
echo -e "\nfor loop:"
time for ((n=0;n<10000;n++)); do /usr/bin/true ; done
echo -e "\nseq,xargs:"
time seq 10000 | xargs -I{} -P1 -n1 /usr/bin/true
echo -e "\nyes,xargs:"
time yes x | head -n10000 | xargs -I{} -P1 -n1 /usr/bin/true
echo -e "\nparallel:"
time parallel --will-cite -j1 -N0 /usr/bin/true ::: {1..10000}
On a modern 64-bit Linux, gives:
while loop:
real 0m2.282s
user 0m0.177s
sys 0m0.413s
for loop:
real 0m2.559s
user 0m0.393s
sys 0m0.500s
seq,xargs:
real 0m1.728s
user 0m0.013s
sys 0m0.217s
yes,xargs:
real 0m1.723s
user 0m0.013s
sys 0m0.223s
parallel:
real 0m26.271s
user 0m4.943s
sys 0m3.533s
This makes sense, as the xargs command is a single native process that spawns the /usr/bin/true command multiple time, instead of the for and while loops that are all interpreted in Bash. Of course this only works for a single command; if you need to do multiple commands in each iteration the loop, it will be just as fast, or maybe faster, than passing sh -c 'command1; command2; ...' to xargs
The -P1 could also be changed to, say, -P8 to spawn 8 processes in parallel to get another big boost in speed.
I don't know why GNU parallel is so slow. I would have thought it would be comparable to xargs.
For one, you can wrap it up in a function:
function manytimes {
n=0
times=$1
shift
while [[ $n -lt $times ]]; do
$#
n=$((n+1))
done
}
Call it like:
$ manytimes 3 echo "test" | tr 'e' 'E'
tEst
tEst
tEst
xargs and seq will help
function __run_times { seq 1 $1| { shift; xargs -i -- "$#"; } }
the view :
abon#abon:~$ __run_times 3 echo hello world
hello world
hello world
hello world
All of the existing answers appear to require bash, and don't work with a standard BSD UNIX /bin/sh (e.g., ksh on OpenBSD).
The below code should work on any BSD:
$ echo {1..4}
{1..4}
$ seq 4
sh: seq: not found
$ for i in $(jot 4); do echo e$i; done
e1
e2
e3
e4
$
I solved with this loop, where repeat is an integer that represents the loops's number
repeat=10
for n in $(seq $repeat);
do
command1
command2
done
You can use this command to repeat your command 10 times or more
for i in {1..10}; do **your command**; done
for example
for i in {1..10}; do **speedtest**; done
Yet another answer: Use parameter expansion on empty parameters:
# calls curl 4 times
curl -s -w "\n" -X GET "http:{,,,}//www.google.com"
Tested on Centos 7 and MacOS.
For loops are probably the right way to do it, but here is a fun alternative:
echo -e {1..10}"\n" |xargs -n1 some_command
If you need the iteration number as a parameter for your invocation, use:
echo -e {1..10}"\n" |xargs -I# echo now I am running iteration #
Edit: It was rightly commented that the solution given above would work smoothly only with simple command runs (no pipes, etc.). you can always use a sh -c to do more complicated stuff, but not worth it.
Another method I use typically is the following function:
rep() { s=$1;shift;e=$1;shift; for x in `seq $s $e`; do c=${#//#/$x};sh -c "$c"; done;}
now you can call it as:
rep 3 10 echo iteration #
The first two numbers give the range. The # will get translated to the iteration number. Now you can use this with pipes too:
rep 1 10 "ls R#/|wc -l"
with give you the number of files in directories R1 .. R10.
The script file
bash-3.2$ cat test.sh
#!/bin/bash
echo "The argument is arg: $1"
for ((n=0;n<$1;n++));
do
echo "Hi"
done
and the output below
bash-3.2$ ./test.sh 3
The argument is arg: 3
Hi
Hi
Hi
bash-3.2$
A little bit naive but this is what I usually remember off the top of my head:
for i in 1 2 3; do
some commands
done
Very similar to #joe-koberg's answer. His is better especially if you need many repetitions, just harder for me to remember other syntax because in last years I'm not using bash a lot. I mean not for scripting at least.
How about the alternate form of for mentioned in (bashref)Looping Constructs?

Resources