Using GNU parallel to parallelize script with various arguments - parallel-processing

I am interested in running a script in parallel and have started looking at the GNU parallel tool, but I'm having a bit of trouble. My script doSomething takes 3 arguments, and I would like to run it in parallel on different values for those arguments. How can I do this?
I've tried: parallel ./doSomething {1} {2} {3} ::: {0..5} ::: {0..5} ::: {0..5} but it just seems to hang.
Any help would be greatly appreciated, thanks!

Please try:
parallel --gnu echo ./doSomething {1} {2} {3} ::: {0..5} ::: {0..5} ::: {0..5}
If that prints the expected command lines, GNU Parallel itself is working, and your command blocks because ./doSomething behaves differently when called from GNU Parallel than when it is called directly. One possible reason is that ./doSomething depends on having a tty connected.
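One way to probe the tty theory without GNU Parallel is to run the script by hand with its standard input detached from the terminal. The following is a minimal sketch, assuming bash; the helper name stdin_kind is made up for illustration and is not part of GNU Parallel:

```bash
#!/bin/bash
# Hypothetical helper: report whether standard input is attached to a terminal.
stdin_kind() {
    if [ -t 0 ]; then
        echo "tty"
    else
        echo "not a tty"
    fi
}

# With stdin redirected away from the terminal, the check reports "not a tty".
stdin_kind < /dev/null
```

If running ./doSomething 0 0 0 < /dev/null by hand behaves differently from running it with the terminal attached, a terminal dependency is a plausible cause of the problem.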

Related

Correct order of parallel execution of shell `time` command

I need to execute the command below (as part of a script) but I don't know in what order to put things so that it executes correctly. What I am trying to do is to give file.smt2 as input to optimathsat, execute it, get the execution time. But I want this to be done several times in parallel using all CPU cores.
parallel -j+0 time Desktop/optimathsat-1.5.1-macos-64-bit/bin/optimathsat < file.smt2 &>results.csv
I added #!/bin/bash -x at the beginning of my file to look at what is happening and this was the output:
+ parallel -j+0 time file.smt2
parallel: Warning: Input is read from the terminal. You are either an expert
parallel: Warning: (in which case: YOU ARE AWESOME!) or maybe you forgot.
parallel: Warning: ::: or :::: or -a or to pipe data into parallel.
...from the 1st line, I can tell the order is wrong. From lines 2, 3 and 4, something is missing from the syntax. How can I fix this?
So I take it you do not care about the results, but only the timing:
seq $(parallel --number-of-threads) |
parallel -j+0 -N0 --joblog my.log 'Desktop/optimathsat-1.5.1-macos-64-bit/bin/optimathsat < file.smt2'
cat my.log
-N0 inserts 0 arguments.
Consider reading GNU Parallel 2018 (printed, online) - at least chapter 1+2. Your command line will thank you for it.

How do I use GNU parallel retries feature without passing any parameters?

I like the feature
parallel -q --retries 5 ./myprogram
But GNU parallel doesn't seem to work unless I pass it a set of args. So I have to do something like this
seq 1 | parallel -q --retries 5 ./myprogram
Is there a way to tell GNU Parallel I don't want to pass it args, and just want to use it as a wrapper for retries?
Is there a bash way to do retries 5 without doing a bash for loop testing exit code?
You clearly know you are abusing GNU Parallel :) and thus should not be surprised if there is no elegant way of doing it.
One way to do it is to use -N0
parallel -N0 -q --retries 5 ./myprogram ::: dummy
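For comparison, the loop-based approach the question mentions can be wrapped in a small function so the loop only has to be written once. This is a sketch in plain bash, not a GNU Parallel feature; the names retry and flaky are hypothetical, and flaky merely stands in for ./myprogram:

```bash
#!/bin/bash
# Hypothetical helper: retry MAX_TRIES CMD [ARGS...]
# Runs CMD until it exits 0 or MAX_TRIES attempts have been made.
retry() {
    local max_tries=$1
    shift
    local attempt
    for (( attempt = 1; attempt <= max_tries; attempt++ ))
    do
        "$@" && return 0
    done
    return 1
}

# Demo stand-in for ./myprogram: fails on the first two calls, then succeeds.
calls=0
flaky() {
    calls=$((calls + 1))
    [ "$calls" -ge 3 ]
}

if retry 5 flaky; then
    echo "succeeded after $calls calls"
fi
```

With a real program the call would be retry 5 ./myprogram.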

Parallel command inside for loop - Bash [duplicate]

Here's an example program:
#!/bin/bash
for x in {1..5}
do
output[$x]=$(echo $x) &
done
wait
for x in {1..5}
do
echo ${output[$x]}
done
I would expect this to run and print out the values assigned to each member of the output array, but it prints nothing. Removing the & correctly assigns the variables. Must I use different syntax to achieve this in parallel?
This
output[$x]=$(echo $x) &
puts the whole assignment in a background task (sub-process), and that's why you're not seeing the result: it's not propagated to the parent process.
You can use wait to wait for subprocesses, but returning results (other than status codes) is going to be difficult. One option is to write intermediate results to files and collect them after all processes have finished (not nice, I appreciate).
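The file-based approach could be sketched as follows, assuming bash and mktemp are available; the variable name tmpdir is chosen here for illustration:

```bash
#!/bin/bash
# Sketch: each background job writes to its own file; the parent reads them after wait.
tmpdir=$(mktemp -d)

for x in {1..5}
do
    # The file outlives the background subshell, unlike a variable assignment.
    echo "$x" > "$tmpdir/$x" &
done
wait

for x in {1..5}
do
    # This assignment runs in the parent shell, so it sticks.
    output[$x]=$(cat "$tmpdir/$x")
    echo "${output[$x]}"
done

rm -r "$tmpdir"
```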
If you want to avoid writing files, you can use GNU parallel:
#!/bin/bash
output=(`parallel -k --gnu echo {1} ::: {1..5}`)
for i in "${output[@]}"
do
echo $i
done
The -k is to preserve the order of the output
Use parset from GNU Parallel:
#!/bin/bash
typeset -A output
parset output echo {} ::: {1..5}
for x in {1..5}
do
echo ${output[$x]}
done

How to pass part of an argument to a gnu parallel command

I'm trying to run a GNU parallel command and pass it a bunch of dates, something like this but a more complex command:
parallel '/some/binary {}' ::: 20131017 20131018
This works, but I need the dates to span two different months, so the command should look like this for argument 20131018:
'/some/binary 201310/20131018'
That is, the first part of the argument is split off and used as a prefix. How can I achieve this effect? Thinking in terms of bash variables, I imagine:
'/some/binary {:4}/{}' ::: 20130910 20131018 etc...
The command for parallel is interpreted as a shell command, so you can just do
parallel --gnu 'var="{}"; /some/binary "${var:0:6}/$var"' ::: 20131017 20131018
This will execute
/some/binary 201310/20131017
/some/binary 201310/20131018
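The substring expansion used there can be tried on its own, without GNU Parallel. A minimal sketch, assuming bash (the ${var:offset:length} form is not POSIX sh):

```bash
#!/bin/bash
# ${var:offset:length} extracts a substring; offset 0, length 6 gives YYYYMM.
for var in 20130910 20131018
do
    echo "/some/binary ${var:0:6}/$var"
done
# Prints:
# /some/binary 201309/20130910
# /some/binary 201310/20131018
```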
From GNU Parallel version 20140722 you can:
parallel /some/binary '{=s/..$//=}'/{} ::: 20131017 20131018
For pure ugly, don't forget about awk to munge data with the result piped to parallel:
$ echo 20131017 > foo
$ echo 20131018 >> foo
$ awk '{printf "%s/%s\n", substr($1,1,6), $1}' foo | parallel echo
Ugly aside, this is pipeline friendly. A plain print along with some OFS magic would work more cleanly than printf I suspect. You could alternately use sed if that's your jam.
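The print-with-OFS variant might look like the sketch below. The function name munge_dates is made up for illustration; note that awk's substr() is 1-indexed, so the YYYYMM prefix is substr($1, 1, 6):

```bash
#!/bin/bash
# Sketch: build YYYYMM/YYYYMMDD with print and OFS instead of printf.
munge_dates() {
    awk -v OFS=/ '{ print substr($1, 1, 6), $1 }'
}

printf '%s\n' 20131017 20131018 | munge_dates
# Prints:
# 201310/20131017
# 201310/20131018
```

The output can then be piped on to parallel in the same way as the printf version.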
That said, I'd personally modify /some/binary to not expect such wonky input.
