Set variables in parallel in bash

Here's an example program:
#!/bin/bash
for x in {1..5}
do
output[$x]=$(echo $x) &
done
wait
for x in {1..5}
do
echo ${output[$x]}
done
I would expect this to run and print out the values assigned to each member of the output array, but it prints nothing. Removing the & correctly assigns the variables. Must I use different syntax to achieve this in parallel?

This line
output[$x]=$(echo $x) &
puts the whole assignment into a background job (a subshell), which is why you don't see the result: variables set in a subshell are not propagated back to the parent shell.
You can use wait to wait for the background jobs, but returning results (other than exit statuses) is going to be difficult. Perhaps you can write intermediate results to files and collect them after all the processes have finished? (Not nice, I appreciate.)
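A minimal sketch of that file-based approach, using the same 1..5 loop as the question. The temporary directory from mktemp -d and the one-file-per-job layout are assumptions of this sketch, not part of the original answer; $(< file) is a bash shorthand for $(cat file).

```bash
#!/bin/bash
# Sketch: each background job writes its result to its own file,
# and the parent reads the files back after `wait`.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT        # remove the scratch directory on exit

for x in {1..5}; do
    # A background subshell cannot set the parent's variables,
    # but it can write a file that the parent reads later.
    echo "$x" > "$tmpdir/$x" &
done
wait                                # block until every background job has finished

output=()
for x in {1..5}; do
    output[$x]=$(< "$tmpdir/$x")    # read each job's result back in
done

for x in {1..5}; do
    echo "${output[$x]}"
done
```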

If you want to avoid writing files, you can use GNU parallel:
#!/bin/bash
output=($(parallel -k --gnu echo {1} ::: {1..5}))
for i in "${output[@]}"
do
echo "$i"
done
The -k option keeps the output in the same order as the input.

Use parset from GNU Parallel:
#!/bin/bash
typeset -A output
parset output echo {} ::: {1..5}
for x in {1..5}
do
echo ${output[$x]}
done

Related

How can I save environment variables in a file using BASH? [duplicate]

I have two shell scripts that I'd like to invoke from a C program. I would like shell variables set in the first script to be visible in the second. Here's what it would look like:
a.sh:
var=blah
<save vars>
b.sh:
<restore vars>
echo $var
The best I've come up with so far is a variant on "set > /tmp/vars" to save the variables and "eval $(cat /tmp/vars)" to restore them. The "eval" chokes when it tries to restore a read-only variable, so I need to grep those out. A list of these variables is available via "declare -r". But there are some vars which don't show up in this list, yet still can't be set in eval, e.g. BASH_ARGC. So I need to grep those out, too.
At this point, my solution feels very brittle and error-prone, and I'm not sure how portable it is. Is there a better way to do this?

One way to avoid setting problematic variables is by storing only those which have changed during the execution of each script. For example,
a.sh:
set > /tmp/pre
foo=bar
set > /tmp/post
grep -v -F -f/tmp/pre /tmp/post > /tmp/vars
b.sh:
eval "$(cat /tmp/vars)"
echo $foo
/tmp/vars contains this:
PIPESTATUS=([0]="0")
_=
foo=bar
Evidently, evaluating the first two lines has no adverse effect.

If you can use a common prefix on your variable names, here is one way to do it:
# save the variables
yourprefix_width=1200
yourprefix_height=2150
yourprefix_length=1975
yourprefix_material=gravel
yourprefix_customer_array=("Acme Plumbing" "123 Main" "Anytown")
declare -p "${!yourprefix@}" > varfile
# load the variables
while read -r line
do
if [[ $line == declare\ * ]]
then
eval "$line"
fi
done < varfile
Of course, your prefix will be shorter. You could also validate the variable names when loading them, to make sure they conform to your naming scheme.
The advantage of only evaluating lines that start with declare is that it is safer than blindly running the whole file through eval.
If you need to, you can filter out variables that are marked read-only, or select only the variables that are marked for export.
Other commands of interest (some may vary by Bash version):
export - without arguments, lists all exported variables using a declare format
declare -px - same as the previous command
declare -pr - lists readonly variables
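As a sketch of the read-only filtering mentioned above: the loop below reuses the yourprefix_ naming, adds a made-up read-only variable yourprefix_locked to show it being skipped, and reads the attribute letters from the start of each declare -p line. It assumes each saved value fits on one line, which holds for simple values like these and matters for the line-by-line loader above.

```bash
#!/bin/bash
# Sketch: save only the yourprefix_* variables that are NOT read-only.
yourprefix_width=1200
yourprefix_material=gravel
declare -r yourprefix_locked=42      # hypothetical read-only variable, should be skipped

: > varfile                          # start with an empty file
for name in "${!yourprefix@}"; do
    decl=$(declare -p "$name")       # e.g. declare -r yourprefix_locked="42"
    flags=${decl#declare -}          # drop the leading "declare -"
    flags=${flags%% *}               # keep only the attribute letters ("-" if none)
    [[ $flags == *r* ]] && continue  # skip read-only variables
    printf '%s\n' "$decl" >> varfile
done
cat varfile
```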

If a.sh can call b.sh, exported variables will carry over. Alternatively, have a parent script set all the necessary values and then call both. Those are the most secure and reliable methods I can think of.
Not sure if it's accepted dogma, but:
bash -c 'export foo=bar; env > xxxx'
env $(cat xxxx) otherscript.sh
otherscript.sh will then run with the environment that was saved to xxxx.
Update:
See also
man execle
for how to set environment variables for another process from within C, if you need to do that, as well as
man getenv
and http://www.crasseux.com/books/ctutorial/Environment-variables.html
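Because env $(cat xxxx) word-splits the saved file, a value containing a space (for example foo="bar baz") would be broken apart. A sketch of an alternative, swapping in bash's own export -p, whose output is quoted so it can be sourced back in. The temporary file and the two bash -c calls standing in for a.sh and b.sh are assumptions of this sketch.

```bash
#!/bin/bash
# Sketch: save exported variables with `export -p` (values come out quoted)
# and restore them in a separate bash process by sourcing the file.
vars_file=$(mktemp) || exit 1

# Stand-in for a.sh: export a value containing a space, then save all exports.
bash -c 'export foo="bar baz"; export -p > "$1"' _ "$vars_file"

# Stand-in for b.sh: a fresh shell sources the file and can see $foo.
# Errors (e.g. from read-only variables) are discarded.
restored=$(bash -c '. "$1" 2>/dev/null; printf "%s" "$foo"' _ "$vars_file")

echo "$restored"
rm -f "$vars_file"
```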

An alternative to saving and restoring shell state would be to make the C program and the shell program work in parallel: the C program starts the shell program, which runs a.sh, then notifies the C program (perhaps passing some information it's learned from executing a.sh), and when the C program is ready for more it tells the shell program to run b.sh. The shell program would look like this:
. a.sh
echo "information gleaned from a"
read -r arguments_for_b
. b.sh
And the general structure of the C program would be:
set up two pipes, one for C->shell and one for shell->C
fork, exec the shell wrapper
read information gleaned from a on the shell->C pipe
more processing
write arguments for b on the C->shell pipe
wait for child process to end
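The structure above uses a C program, but the same handshake can be sketched entirely in bash with coproc (bash 4.0+), which creates both pipes for you. In this sketch bash stands in for the C side, the wrapper body prints placeholder strings instead of sourcing a.sh and b.sh, and "--verbose" is an invented argument. The file descriptors are duplicated with exec {var}<&... (bash 4.1+) because bash may close the coprocess's own descriptors once it exits.

```bash
#!/bin/bash
# Sketch: two-pipe handshake, with coproc playing the role of fork/exec + pipes.
coproc WRAPPER {
    echo "information gleaned from a"        # stand-in for: . a.sh
    read -r arguments_for_b                  # wait for the "C side" to answer
    echo "b ran with: $arguments_for_b"      # stand-in for: . b.sh
}
wrapper_pid=$WRAPPER_PID
# Keep our own copies of the pipe ends (shell->C for reading, C->shell for writing).
exec {wrapper_out}<&"${WRAPPER[0]}" {wrapper_in}>&"${WRAPPER[1]}"

read -r from_a <&"$wrapper_out"              # read info on the shell->C pipe
echo "got: $from_a"
printf '%s\n' "--verbose" >&"$wrapper_in"    # write arguments for b on the C->shell pipe
read -r from_b <&"$wrapper_out"

exec {wrapper_out}<&- {wrapper_in}>&-        # close our copies
wait "$wrapper_pid"                          # wait for the child process to end
echo "$from_b"
```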

I went looking for something similar and couldn't find it either, so I wrote the two scripts below. To start, just run shellstate; you will probably also want at least set -i and set -o emacs, which reset_shellstate doesn't do for you. I don't know of a way to ask bash which variables it considers special.
~/bin/reset_shellstate:
#!/bin/bash
__="$PWD/shellstate_${1#_}"
trap '
declare -p >"'"$__"'"
trap >>"'"$__"'"
echo cd \""$PWD"\" >>"'"$__"'" # setting PWD did this already, but...
echo set +abefhikmnptuvxBCEHPT >>"'"$__"'"
echo set -$- >>"'"$__"'" # must be last before sed, see $s/s//2 below
sed -ri '\''
$s/s//2
s,^trap --,trap,
/^declare -[^ ]*r/d
/^declare -[^ ]* [A-Za-z0-9_]*[^A-Za-z0-9_=]/d
/^declare -[^ ]* [^= ]*_SESSION_/d
/^declare -[^ ]* BASH[=_]/d
/^declare -[^ ]* (DISPLAY|GROUPS|SHLVL|XAUTHORITY)=/d
/^declare -[^ ]* WINDOW(ID|PATH)=/d
'\'' "'"$__"'"
shopt -op >>"'"$__"'"
shopt -p >>"'"$__"'"
declare -f >>"'"$__"'"
echo "Shell state saved in '"$__"'"
' 0
unset __
~/bin/shellstate:
#!/bin/bash
shellstate=shellstate_${1#_}
test -s "$shellstate" || reset_shellstate "$1"
shift
bash --noprofile --init-file "$shellstate" -is "$@"
exit $?

bash: how may I improve for loop referred to the variable? [duplicate]

I occasionally run a bash command line like this:
n=0; while [[ $n -lt 10 ]]; do some_command; n=$((n+1)); done
To run some_command a number of times in a row -- 10 times in this case.
Often some_command is really a chain of commands or a pipeline.
Is there a more concise way to do this?

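If the count is held in a variable, use seq, because brace expansion such as {1..$count} happens before variable expansion and will not work:
count=10
for i in $(seq "$count"); do
command
done
If the count is a constant, simply: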
for run in {1..10}; do
command
done
Or as a one-liner, for those that want to copy and paste easily:
for run in {1..10}; do command; done
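A small sketch showing why the variable case needs seq or an arithmetic for loop: brace expansion is performed before $count is expanded, so {1..$count} is never turned into a sequence. The variable names brace and cstyle exist only for this illustration.

```bash
#!/bin/bash
count=3
# Brace expansion runs before parameter expansion, so this loop runs once,
# with the literal word "{1..3}".
brace=$(for i in {1..$count}; do echo "$i"; done)
# An arithmetic for loop evaluates count at run time instead.
cstyle=$(for ((i = 1; i <= count; i++)); do echo "$i"; done)
echo "$brace"
echo "$cstyle"
```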

Using a constant:
for ((n=0;n<10;n++)); do
some_command;
done
Using a variable (can include math expressions):
x=10; for ((n=0; n < (x / 2); n++)); do some_command; done

Another simple way to hack it:
seq 20 | xargs -Iz echo "Hi there"
runs echo 20 times.
Notice that seq 20 | xargs -Iz echo "Hi there z" would output:
Hi there 1
Hi there 2
...

If you're using the zsh shell:
repeat 10 { echo 'Hello' }
Where 10 is the number of times the command will be repeated.

Using GNU Parallel you can do:
parallel some_command ::: {1..1000}
If you do not want the number as argument and only run a single job at a time:
parallel -j1 -N0 some_command ::: {1..1000}
Watch the intro video for a quick introduction:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial (http://www.gnu.org/software/parallel/parallel_tutorial.html). Your command line will love you for it.

A simple function in your bash config file (often ~/.bashrc) could work well:
function runx() {
local n
for ((n=0; n<$1; n++))
do "${@:2}"
done
}
Call it like this:
$ runx 3 echo 'Hello world'
Hello world
Hello world
Hello world

Another form of your example:
n=0; while (( n++ < 10 )); do some_command; done

for _ in {1..10}; do command; done
Note the underscore used as a throwaway loop variable, since the counter itself is never used.

If you are OK with running it periodically, you could use the following command to run it every second indefinitely. You would need to add your own checks to stop it after n runs.
watch -n 1 some_command
If you want visual confirmation of changes, add --differences before the command (watch --differences -n 1 some_command).
According to the OS X man page, there's also:
The --cumulative option makes highlighting "sticky", presenting a
running display of all positions that have ever changed. The -t
or --no-title option turns off the header showing the interval,
command, and current time at the top of the display, as well as the
following blank line.
See also the Linux/Unix man page for watch.

xargs is fast:
#!/usr/bin/bash
echo "while loop:"
n=0; time while (( n++ < 10000 )); do /usr/bin/true ; done
echo -e "\nfor loop:"
time for ((n=0;n<10000;n++)); do /usr/bin/true ; done
echo -e "\nseq,xargs:"
time seq 10000 | xargs -I{} -P1 -n1 /usr/bin/true
echo -e "\nyes,xargs:"
time yes x | head -n10000 | xargs -I{} -P1 -n1 /usr/bin/true
echo -e "\nparallel:"
time parallel --will-cite -j1 -N0 /usr/bin/true ::: {1..10000}
On a modern 64-bit Linux system, this gives:
while loop:
real 0m2.282s
user 0m0.177s
sys 0m0.413s
for loop:
real 0m2.559s
user 0m0.393s
sys 0m0.500s
seq,xargs:
real 0m1.728s
user 0m0.013s
sys 0m0.217s
yes,xargs:
real 0m1.723s
user 0m0.013s
sys 0m0.223s
parallel:
real 0m26.271s
user 0m4.943s
sys 0m3.533s
This makes sense, as xargs is a single native process that spawns /usr/bin/true many times, whereas the for and while loops are interpreted by Bash. Of course, this only works for a single command; if you need to run multiple commands in each iteration, the loop will be just as fast as, or maybe faster than, passing sh -c 'command1; command2; ...' to xargs.
The -P1 could also be changed to, say, -P8 to spawn 8 processes in parallel to get another big boost in speed.
I don't know why GNU parallel is so slow. I would have thought it would be comparable to xargs.
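The -P remark above can be tried with a small sketch: four jobs that each sleep one second finish in roughly one second with -P 4, instead of four. -P is not in POSIX, but both GNU and BSD xargs support it; $SECONDS only has one-second resolution, so the measured time is approximate.

```bash
#!/bin/bash
# Sketch: same xargs pattern as the benchmark, but up to 4 commands at once.
start=$SECONDS
seq 4 | xargs -P 4 -I{} sleep 1     # each input line runs one `sleep 1`
elapsed=$(( SECONDS - start ))
echo "4 one-second jobs took about ${elapsed}s"
```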

For one, you can wrap it up in a function:
function manytimes {
local n=0
local times=$1
shift
while [[ $n -lt $times ]]; do
"$@"
n=$((n+1))
done
}
Call it like:
$ manytimes 3 echo "test" | tr 'e' 'E'
tEst
tEst
tEst

xargs and seq will help:
function __run_times { seq 1 "$1" | { shift; xargs -I{} -- "$@"; } }
Example:
abon@abon:~$ __run_times 3 echo hello world
hello world
hello world
hello world

All of the existing answers appear to require bash, and don't work with a standard BSD UNIX /bin/sh (e.g., ksh on OpenBSD).
The below code should work on any BSD:
$ echo {1..4}
{1..4}
$ seq 4
sh: seq: not found
$ for i in $(jot 4); do echo e$i; done
e1
e2
e3
e4
$
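If jot isn't available either, a plain while loop with POSIX arithmetic expansion needs neither seq, jot, nor brace expansion. A minimal sketch that produces the same e1 to e4 output:

```sh
#!/bin/sh
# Sketch: count to 4 using only POSIX sh features (no seq, jot, or {1..4}).
i=1
while [ "$i" -le 4 ]; do
    echo "e$i"
    i=$((i + 1))
done
```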

I solved it with this loop, where repeat is the number of iterations:
repeat=10
for n in $(seq $repeat);
do
command1
command2
done

You can use this command to repeat your command 10 times (or more):
for i in {1..10}; do your_command; done
For example:
for i in {1..10}; do speedtest; done

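Yet another answer: use curl's own URL globbing with an empty list. Because {,,,} is inside double quotes, the shell passes it through unexpanded and curl turns it into four identical URLs:
# makes 4 requests with a single curl invocation
curl -s -w "\n" -X GET "http:{,,,}//www.google.com"
Tested on CentOS 7 and macOS.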

For loops are probably the right way to do it, but here is a fun alternative:
echo -e {1..10}"\n" |xargs -n1 some_command
If you need the iteration number as a parameter for your invocation, use:
echo -e {1..10}"\n" |xargs -I# echo now I am running iteration #
Edit: It was rightly pointed out in the comments that the solution above only works smoothly for simple commands (no pipes, etc.). You can always use sh -c for more complicated things, but it's hardly worth it.
Another method I typically use is the following function:
rep() { s=$1; shift; e=$1; shift; for x in $(seq "$s" "$e"); do c=${*//@/$x}; sh -c "$c"; done; }
Now you can call it as:
rep 3 10 echo iteration @
The first two numbers give the range, and each @ is replaced by the iteration number. Now you can use this with pipes too:
rep 1 10 "ls R@/|wc -l"
will give you the number of files in each of the directories R1 .. R10.

The script file
bash-3.2$ cat test.sh
#!/bin/bash
echo "The argument is arg: $1"
for ((n=0;n<$1;n++));
do
echo "Hi"
done
and the output below
bash-3.2$ ./test.sh 3
The argument is arg: 3
Hi
Hi
Hi
bash-3.2$

A little bit naive but this is what I usually remember off the top of my head:
for i in 1 2 3; do
some commands
done
Very similar to @joe-koberg's answer. His is better, especially if you need many repetitions; I just find the other syntax harder to remember because I haven't used bash much in recent years, at least not for scripting.

How about the alternate (arithmetic) form of for, for (( expr1 ; expr2 ; expr3 )), described in the Bash Reference Manual under Looping Constructs?

trouble capturing output of a subshell that has been backgrounded

Attempting to make a "simple" parallel function in bash. The problem is currently that when the line to capture the output is backgrounded, the output is lost. If that line is not backgrounded, the output is captured fine, but this of course defeats the purpose of the function.
#!/usr/bin/env bash
cluster="${1:-web100s}"
hosts=($(inventory.pl bash "$cluster" | sort -V))
cmds="${2:-uptime}"
parallel=10
cx=0
total=0
for host in "${hosts[@]}"; do
output[$total]=$(echo -en "$host: ")
echo "${output[$total]}"
output[$total]+=$(ssh -o ConnectTimeout=5 "$host" "$cmds") &
cx=$((cx + 1))
total=$((total + 1))
if [[ $cx -gt $parallel ]]; then
wait >&/dev/null
cx=0
fi
done
echo -en "***** DONE *****\n Results\n"
for ((i=0; i<= $total; i++)); do
echo "${output[$i]}"
done

That's because your command (the assignment) is run in a subshell, so this assignment can't influence the parent shell. This boils down to this:
a=something
a='hello senorsmile' &
echo "$a"
Can you guess what the output is? It is, of course,
something
and not hello senorsmile. The only way for the subshell to communicate with the parent shell is to use some form of IPC (interprocess communication), such as a file or a pipe. I don't have a solution to propose; I only tried to explain why it fails.
If you think about it, it makes sense. What would you expect from this?
a=$( echo a; sleep 1000000000; echo b ) &
The command returns immediately (after forking), but the output would only be fully available in over 31 years.
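One form of IPC that fits the question's loop is a file per host. A sketch under stated assumptions: run_one is a hypothetical stand-in for ssh -o ConnectTimeout=5 "$host" "$cmds", the host names are invented, and batching with a plain wait mirrors the question's own cx/parallel counter rather than a true job pool.

```bash
#!/bin/bash
# Sketch: capture each backgrounded command's output in its own file.
run_one() { echo "$1: up"; }                   # hypothetical stand-in for the ssh call

hosts=(web101 web102 web103 web104 web105)     # illustrative host names
parallel=2                                     # jobs per batch
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT

for i in "${!hosts[@]}"; do
    run_one "${hosts[$i]}" > "$tmpdir/$i" 2>&1 &   # output goes to a file, not a variable
    if (( (i + 1) % parallel == 0 )); then
        wait                                       # let the current batch finish
    fi
done
wait                                               # catch the last partial batch

output=()
for i in "${!hosts[@]}"; do
    output[$i]=$(< "$tmpdir/$i")
done
printf '%s\n' "${output[@]}"
```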

Assigning a shell variable in the background this way is effectively meaningless. Bash does have built-in coprocesses, though, which should work for you:
http://www.gnu.org/software/bash/manual/bashref.html#Coprocesses
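A minimal coprocess sketch (bash 4.0+): the slow command runs in the background while the parent keeps working, and its output is then read back into a variable. The names WORKER and worker_out are illustrative; the read end is duplicated with exec {var}<&... (bash 4.1+) because bash may close the coprocess's own descriptors once it exits. Only one coprocess at a time is reliably supported, so for many parallel jobs the file-based approach is likely simpler.

```bash
#!/bin/bash
# Sketch: run a slow command as a coprocess and capture its output later.
coproc WORKER { sleep 1; echo "result from background"; }
worker_pid=$WORKER_PID
exec {worker_out}<&"${WORKER[0]}"   # our own copy of the coprocess's stdout pipe

echo "parent keeps running while the coprocess works"

read -r result <&"$worker_out"      # blocks until the coprocess writes a line
exec {worker_out}<&-                # close our copy
wait "$worker_pid"
echo "captured: $result"
```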
