Parallel execution with a fixed order - macOS

#!/bin/bash
doone() {
    tracelength="$1"
    short="$2"
    long="$3"
    ratio="$4"
    echo "$tracelength $short $long $ratio" >> results.csv
    python3 main.py "$tracelength" "$short" "$long" "$ratio" >> file.smt2
    gtime -f "%U" /Users/Desktop/optimathsat-1.5.1-macos-64-bit/bin/optimathsat < file.smt2
}
export -f doone
step=0.1
parallel doone \
    ::: 200 300 \
    :::: <(seq 0 $step 0.2) \
    ::::+ <(seq 1 -$step 0.8) \
    :::: <(seq 0 $step 0.1) \
    ::: {1..2} &> results.csv
I need the data in results.csv to be in order. Every job prints its inputs, which are the four variables mentioned at the beginning ($tracelength, $short, $long and $ratio), followed by the execution time of that job, all on one line. So far my results look something like this:
0.00
0.00
0.00
0.00
200 0 1 0
200 0 1 0.1
200 0.1 0.9 0
How can I fix the order? And why is the execution time always 0.00? file.smt2 is a big file; there is no way its execution time can be 0.00.

It is really a bad idea to append to the same file in parallel. You are going to have race conditions all over the place.
You are doing that with both results.csv and file.smt2.
So if you write to a file in doone, make sure it has a unique name (e.g. by using myfile.$$).
To see if race conditions are your problem, you can make GNU Parallel run one job at a time: parallel --jobs 1.
If the problem goes away when run that way, then you can probably get away with:
doone() {
    tracelength="$1"
    short="$2"
    long="$3"
    ratio="$4"
    # No >> is needed here, as all output is sent to results.csv
    echo "$tracelength $short $long $ratio"
    tmpfile=file.smt.$$
    cp file.smt2 $tmpfile
    python3 main.py "$tracelength" "$short" "$long" "$ratio" >> $tmpfile
    # Be aware that the output from gtime and optimathsat will be put into
    # results.csv - making results.csv not a CSV-file
    gtime -f "%U" /Users/Desktop/optimathsat-1.5.1-macos-64-bit/bin/optimathsat < $tmpfile
    rm $tmpfile
}
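As for getting the lines of results.csv into the order of the arguments (the other half of the question): GNU Parallel's -k/--keep-order option makes the output appear in the same order as the inputs, regardless of which job finishes first. A sketch, reusing the invocation from the question (the &> is kept because gtime writes its timing to stderr):

parallel -k doone \
    ::: 200 300 \
    :::: <(seq 0 $step 0.2) \
    ::::+ <(seq 1 -$step 0.8) \
    :::: <(seq 0 $step 0.1) \
    ::: {1..2} &> results.csv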
If results.csv is just a log file, consider using parallel --joblog my.log instead.
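A sketch of that (my.log is just an illustrative name):

parallel --joblog my.log doone \
    ::: 200 300 \
    :::: <(seq 0 $step 0.2) \
    ::::+ <(seq 1 -$step 0.8) \
    :::: <(seq 0 $step 0.1) \
    ::: {1..2}

The joblog gets one line per job with the columns Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal and Command (the format shown in the joblog example further down this page), so each runtime stays tied to the arguments that produced it.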
If the problem does not go away, then your problem is elsewhere. In that case make an MCVE (https://stackoverflow.com/help/mcve): your example is not complete, as you refer to file.smt2 and optimathsat without providing them, so we cannot run your example.

Related

Parallelizing part of a shell script

#!/bin/bash
for tracelength in 10 20 50 100 ; do
    step=0.1
    short=0
    long=1
    for firstloop in {1..10}; do
        ratio=0
        for secondloop in {1..10} ; do
            for repeat in {1..20} ; do
                echo $tracelength $short $long $ratio >results.csv
                python3 main.py "$tracelength" "$short" "$long" "$ratio" > file.smt2
                /usr/bin/time /Users/Desktop/optimathsat-1.5.1-macos-64-bit/bin/optimathsat < file.smt2 > results.csv
            done
            ratio=$(echo "scale=10; $ratio + $step" | bc)
        done
        short=$(echo "scale=10; $short + $step" | bc)
        long=$(echo "scale=10; $long - $step" | bc)
    done
done
I want to parallelize the innermost loop (repeat).
I have installed GNU Parallel and I know some of the basics, but because the loop body has more than one command I have no idea how to parallelize it.
I transferred the content of the loop to another script, which I guess is the way to go, but my three commands need to take the variables (tracelength, ratio, short, long) and run according to them. Any idea how to pass the parameters from a script to a subscript? Or do you maybe have a better idea for the parallelization?
I am editing the question because I used the answer below, but now my execution time is always 0.00 regardless of how big file.smt2 is.
this is the new version of code:
#!/bin/bash
doone() {
    tracelength="$1"
    short="$2"
    long="$3"
    ratio="$4"
    #echo "$tracelength $short $long $ratio" >> results.csv
    python3 main.py "$tracelength" "$short" "$long" "$ratio" >> file.smt2
    gtime -f "%U" /Users/Desktop/optimathsat-1.5.1-macos-64-bit/bin/optimathsat < file.smt2
}
export -f doone
step=0.2
parallel doone \
    ::: 200 300 \
    :::: <(seq 0 $step 0.5) \
    ::::+ <(seq 1 -$step 0.5) \
    :::: <(seq 0 $step 0.5) \
    ::: {1..5} &> results.csv
In your original code you overwrite results.csv again and again. I assume that is a mistake and that you instead want everything combined into one big CSV file:
doone() {
    tracelength="$1"
    short="$2"
    long="$3"
    ratio="$4"
    echo "$tracelength $short $long $ratio"
    python3 main.py "$tracelength" "$short" "$long" "$ratio" |
        /usr/bin/time /Users/Desktop/optimathsat-1.5.1-macos-64-bit/bin/optimathsat
}
export -f doone
step=0.1
parallel doone \
    ::: 10 20 50 100 \
    :::: <(seq 0 $step 0.9) \
    ::::+ <(seq 1 -$step 0.1) \
    :::: <(seq 0 $step 0.9) \
    ::: {1..20} > results.csv
If you want a separate output file per run:
parallel --results outputdir/ doone \
    ::: 10 20 50 100 \
    :::: <(seq 0 $step 0.9) \
    ::::+ <(seq 1 -$step 0.1) \
    :::: <(seq 0 $step 0.9) \
    ::: {1..20}
If you want a single CSV file containing the arguments and run time, use:
parallel --results output.csv doone \
    ::: 10 20 50 100 \
    :::: <(seq 0 $step 0.9) \
    ::::+ <(seq 1 -$step 0.1) \
    :::: <(seq 0 $step 0.9) \
    ::: {1..20}
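The difference from --results outputdir/ above is the layout: when the name ends in .csv, GNU Parallel writes a single CSV file with one row per job, containing the argument values, the run time and the captured output (see the --results section of man parallel for the exact columns). A rough way to eyeball the result (a sketch; assumes no commas inside fields):

column -s, -t < output.csv | less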

How to do floating point comparisons in an if-statement within a GNU Parallel block?

I want to run a batch process in parallel. For this I pipe a list to parallel. When I have an if-statement that compares two floating point numbers (taken from here), the code no longer runs. How can this be solved?
LIMIT=25
ps | parallel -j2 '
echo "Do stuff for {} to determine NUM"
NUM=33.3333 # set to demonstrate
if (( $(echo "$NUM > $LIMIT" | bc -l) )); then
echo "react..."
fi
echo "Do stuff..."
'
Prints:
Do stuff for \ \ PID\ TTY\ \ \ \ \ \ \ \ \ \ TIME\ CMD to determine NUM
Do stuff...
(standard_in) 2: syntax error
#... snipp
LIMIT is not set inside the shell that parallel runs. echo "$NUM > $LIMIT" | bc -l expands to echo "33.3333 > " | bc -l, which results in the syntax error reported by bc. You need to export/pass its value to the shell run from inside parallel. Try this:
LIMIT=25
ps | parallel -j2 '
    LIMIT="'"$LIMIT"'"
    echo "Do stuff for {} to determine NUM"
    NUM=33.3333 # set to demonstrate
    if (( $(echo "$NUM > $LIMIT" | bc -l) )); then
        echo "react..."
    fi
    echo "Do stuff..."
'
Or better, use env_parallel, which is designed for such problems.
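A sketch of the inline variant with env_parallel (assuming it has been activated in the current shell first, as its documentation describes; env_parallel transfers variables like LIMIT into the job's shell):

. "$(which env_parallel.bash)"
LIMIT=25
ps | env_parallel -j2 '
    echo "Do stuff for {} to determine NUM"
    NUM=33.3333 # set to demonstrate
    if (( $(echo "$NUM > $LIMIT" | bc -l) )); then
        echo "react..."
    fi
    echo "Do stuff..."
'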
Side note: GNU parallel was designed for executing jobs in parallel using one or more computers. For scripts running on one computer it is better to stick with the xargs command, which is more commonly available (so you don't need to install some package each time you move your script to another machine).
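For completeness, a rough xargs equivalent of the parallel call above (a sketch; plain sh has no (( )) arithmetic, so the bc result is tested with [ ], and LIMIT must be exported so the child shell can see it):

LIMIT=25
export LIMIT
# -P2 runs two jobs at a time; {} is substituted textually by xargs,
# so beware of input lines containing quotes
ps | xargs -P2 -I{} sh -c '
    echo "Do stuff for {} to determine NUM"
    NUM=33.3333 # set to demonstrate
    if [ "$(echo "$NUM > $LIMIT" | bc -l)" -eq 1 ]; then
        echo "react..."
    fi
    echo "Do stuff..."
'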
While GNU Parallel is designed to deal correctly with commands spanning multiple lines, I personally find that hard to read. I prefer using a function:
doit() {
    arg="$1"
    echo "Do stuff for $arg to determine NUM"
    NUM=33.3333 # set to demonstrate
    if (( $(echo "$NUM > $LIMIT" | bc -l) )); then
        echo "react..."
    fi
    echo "Do stuff..."
}
export -f doit
LIMIT=25
export LIMIT
ps | parallel -j2 doit
Instead of the exports you can use env_parallel:
ps | env_parallel -j2 doit
If your environment is too big, use env_parallel --session before starting:
#!/bin/bash
env_parallel --session
# Define functions and variables _after_ running --session
doit() {
[...]
}
LIMIT=25
ps | env_parallel -j2 doit

GNU Parallel: re-run when it fails, with a while loop

Assuming we have a csv file
1
2
3
4
Here is the code:
cat A.csv | while read A; do
echo "echo $A" > $A.sh
echo "$A.sh"
done | xargs -I {} parallel --joblog test.log --jobs 2 -k sh ::: {}
The above is a simplified case, but it captures the bulk of the real task. The parallel here will run like this:
parallel --joblog test.log --jobs 2 -k sh ::: 1.sh 2.sh 3.sh 4.sh
Now assume 3.sh failed for some reason. Is there an easy way to rerun the failed 3.sh within the same parallel command in the current shell script setting? I have tried the following, but it doesn't work and is quite lengthy.
cat A.csv | while read A; do
echo "echo $A" > $A.sh
echo "$A.sh"
done | xargs -I {} parallel --joblog test.log --jobs 2 -k sh ::: {}
# The above will do this:
# parallel --joblog test.log --jobs 2 -k sh ::: 1.sh 2.sh 3.sh 4.sh
cat A.csv | while read A; do
echo "echo $A" > $A.sh
echo "$A.sh"
done | xargs -I {} parallel --resume-failed --joblog test.log --jobs 2 -k sh ::: {}
# The above will do this:
# parallel --resume-failed --joblog test.log --jobs 2 -k sh ::: 1.sh 2.sh 3.sh 4.sh
######## 2017-09-25
Thanks Ole. I have tried the following:
doit() {
    myarg="$1"
    if [ $myarg -eq 3 ]
    then
        exit 1
    else
        echo do crazy stuff with "$myarg"
    fi
}
export -f doit
parallel -k --retries 3 --joblog ole.log doit :::: A.csv
It returns the log file like this:
Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
1 : 1506362303.003 0.016 0 22 0 0 doit 1
2 : 1506362303.006 0.013 0 22 0 0 doit 2
3 : 1506362303.026 0.002 0 0 1 0 doit 3
4 : 1506362303.014 0.006 0 22 0 0 doit 4
However, I don't see doit 3 being retried 3 times as expected. Could you enlighten me? Thanks.
First: Generating .sh files seems like a bad idea. You can most likely just make a function instead:
doit() {
    myarg="$1"
    echo do crazy stuff with "$myarg"
}
export -f doit
To retry a failing command use --retries:
parallel --retries 3 doit :::: file.csv
If your CSV-file has multiple columns, use --colsep:
parallel --retries 3 --colsep '\t' doit :::: file.csv
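With --colsep, each column becomes one positional argument to the function; a sketch with two illustrative column names:

doit() {
    myarg="$1"
    myotherarg="$2"
    echo do crazy stuff with "$myarg" and "$myotherarg"
}
export -f doit
parallel --retries 3 --colsep '\t' doit :::: file.csv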
Using this:
doit() {
    myarg="$1"
    if [ $myarg -eq 3 ] ; then
        echo do not do crazy stuff with "$myarg"
        exit 1
    else
        echo do crazy stuff with "$myarg"
    fi
}
export -f doit
This will retry the '3' job 3 times:
parallel -k --retries 3 --joblog ole.log doit ::: 1 2 3 4
It will only log the last attempt. To be convinced it actually runs three times, run with the output unbuffered:
parallel -u --retries 3 --joblog ole.log doit ::: 1 2 3 4

Is this the fastest way to test CPU load using shell scripting?

I'm relatively new to shell scripting and I'm in the process of writing my own health checking scripts using bash.
Is the following script to test CPU load the best I can have in terms of performance, readability and maintainability?
#!/bin/sh
getloadavg5 () {
    echo $(cat /proc/loadavg | cut -f2 -d' ')
}
getnumcpus () {
    echo $(cat /proc/cpuinfo | grep '^processor' | wc -l)
}
awk \
    -v failthold=0.8 \
    -v warnthold=0.7 \
    -v loadavg=$(getloadavg5) \
    -v numcpus=$(getnumcpus) \
    'BEGIN {
        ratio=loadavg/numcpus
        if (ratio >= failthold) exit 2
        if (ratio >= warnthold) exit 1
        exit 0
    }'
This might be more suitable for the code review stackexchange, but without condoning the use of load averages in this way, here are some ideas:
#!/bin/sh
read -r one five fifteen rest < /proc/loadavg
cpus=$(grep -c '^processor' /proc/cpuinfo)
awk \
    -v failthold=0.8 \
    -v warnthold=0.7 \
    -v loadavg="$five" \
    -v numcpus="$cpus" \
    'BEGIN {
        ratio=loadavg/numcpus
        if (ratio >= failthold) exit 2
        if (ratio >= warnthold) exit 1
        exit 0
    }'
It doesn't have any of the unnecessary cats/echos.
It also happens to run faster thanks to forking 1 or 2 times (depending on shell) instead of ~10, but if performance is an issue then shell scripts should be avoided in general.
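If you want to check the cost of the extra forks yourself, a rough micro-benchmark (a sketch; old.sh and new.sh are illustrative names for the original and revised scripts):

time sh -c 'for i in $(seq 100); do sh old.sh; done'
time sh -c 'for i in $(seq 100); do sh new.sh; done'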

Bash increment input file name

I have been trying to write a better bash script to run a specified program repeatedly with different input files. This is the basic brute-force version that works, but I want to be able to use a loop to change the argument before ".txt".
#!/bin/bash
./a.out 256.txt >> output.txt
./a.out 512.txt >> output.txt
./a.out 1024.txt >> output.txt
./a.out 2048.txt >> output.txt
./a.out 4096.txt >> output.txt
./a.out 8192.txt >> output.txt
./a.out 16384.txt >> output.txt
./a.out 32768.txt >> output.txt
./a.out 65536.txt >> output.txt
./a.out 131072.txt >> output.txt
./a.out 262144.txt >> output.txt
./a.out 524288.txt >> output.txt
I attempted to make a for loop and change the argument:
#!/bin/bash
arg=256
for((i=1; i<12; i++))
{
    #need to raise $args to a power of i
    ./a.out $args.txt << output.txt
}
but I get an error from ./a.out stating that ".txt" does not exist. What is the proper way to raise args to a power of i and use that as the argument to ./a.out?
This is all that you need to do:
for ((i=256; i<=524288; i*=2)); do ./a.out "$i.txt"; done > output.txt
Every time the loop iterates, i is multiplied by 2, which produces the sequence that you want. Rather than redirecting the output of each iteration separately to the file, I have also moved the redirection outside the loop. This way, the file will only contain the contents from the loop.
In your question, $args is empty (I guess that you meant to put $arg), which is why your filename is just .txt. Also, you have used << rather than >>, which I assumed was a typo.
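If you specifically want the power-of-i form the question asks about, bash arithmetic supports ** for exponentiation; a sketch equivalent to the loop above:

# 256 * 2^i for i = 0..11 gives 256, 512, ..., 524288
for ((i=0; i<12; i++)); do
    ./a.out "$((256 * 2**i)).txt"
done > output.txt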
Check this:
seq 12 | xargs -i echo "256 * 2 ^ ({} - 1)" | bc | xargs -i echo ./a.out {}.txt
If it's OK, then drop the echo and add >> output.txt:
seq 12 | xargs -i echo "256 * 2 ^ ({} - 1)" | bc | xargs -i ./a.out {}.txt >> output.txt
