I'm trying to run multiple rake tasks named on the command line in parallel and running into some issues.
A demo Rakefile is:
task :one do
puts 'one'
sh 'sleep 100'
end
task :two do
puts 'two'
end
Running rake --jobs 2 one two results in this:
one
sleep 100
<sleeps, does not execute task "two">
Meanwhile, a similar Makefile does what I would expect:
one:
echo hi
sleep 100
two:
echo two
sleep 100
.PHONY: one two
make --jobs 2 one two results in:
echo hi
echo two
two
hi
sleep 100
sleep 100
Based on the docs, I would expect tasks named on the command line to be treated like prerequisites and executed in parallel.
What else do I need to do to get rake to run targets in parallel?
Related
I need to run n times the same test case and count the number of times the test pass or fail. I wrote the following snippet of code:
CONCATNAME=$CLASSNAME”#”$METHODNAME
PASSTEST=0
FAILTEST=0
TOSEARCH="BUILD SUCCESS"
i=0
for i in $( seq 1 $NROUNDS )
do
echo "exec command: mvn -Dtest="$CONCATNAME" test"
message=$(mvn -Dtest=$CONCATNAME test)
if echo "$message" | grep -q "$TOSEARCH"; then
PASSTEST=$((PASSTEST+1))
else
FAILTEST=$((FAILTEST+1))
fi
done
but the execution takes several minutes to complete. Is it possible without modifying the POM.xml file to run the mvn -Dtest command in parallel?
Is it possible in Bash to spawn multiple processes and after the last process finishes, report how many of the processes terminated correctly/didn't core dump?
Or would it be better to do this in Python?
(I'd ideally like to report which command failed, if any)
You can hopefully leverage GNU Parallel and its failure handling. General example:
parallel ::: ./processA ./processB ./processC
Specific example... here I run 3 simple jobs, each surrounded by single quotes and set it up to stop once all jobs are complete or failed:
parallel --halt soon,fail=100% ::: 'echo 0 && exit 0' 'echo 1 && exit 1' 'echo 2 && exit 2'
Output
0
1
parallel: This job failed:
echo 1 && exit 1
2
parallel: This job failed:
echo 2 && exit 2
By default, it will run N jobs in parallel, where N is the number of cores your CPU has, if you just want the jobs to be run sequentially, use:
parallel -j 1 ...
Obviously you could pipe the output through grep -c "This job failed" to count the failures.
Assuming you have a file with the commands:
cmd1
cmd2
cmd3
Then this will give you the number of failed jobs as long as you have at most 100 failures:
cat file | parallel
a=$?; echo $((`wc -l <file`-$a))
To get exactly which jobs failed use --joblog.
cat file | parallel --joblog my.log
# Find non-zero column 7
grep -v -P '(.*\t){6}0\t.*\t' my.log
It's easy.
First run your jobs in the background. Remember the pids.
Then for each child execute wait $pid and see the wait exit status, which is equal to the exit status of the childs pid you pass to it.
If the exit status is zero, it means the child terminated successfully.
#!/bin/bash
exit 0 &
childs+=($!)
exit 1 &
childs+=($!)
exit 2 &
childs+=($!)
echo 1 &
childs+=($!)
successes=0
for i in "${childs[#]}"; do
wait $i
if (($? == 0)); then
((successes++))
fi
done
# will print that 2 processes (exit 0 and echo 1) terminated successfully
printf "$successes processes terminated correctly and didn't core dump\n"
I want to distribute the work from a master server to multiple worker servers using batches.
Ideally I would have a tasks.txt file with the list of tasks to execute
cmd args 1
cmd args 2
cmd args 3
cmd args 4
cmd args 5
cmd args 6
cmd args 7
...
cmd args n
and each worker server will connect using ssh, read the file and mark each line as in progress or done
#cmd args 1 #worker1 - done
#cmd args 2 #worker2 - in progress
#cmd args 3 #worker3 - in progress
#cmd args 4 #worker1 - in progress
cmd args 5
cmd args 6
cmd args 7
...
cmd args n
I know how to make the ssh connection, read the file, and execute remotely but don't know how to make the read and write an atomic operation, in order to not have cases where 2 servers start the same task, and how to update the line.
I would like for each worker to go to the list of tasks and lock the next available task in the list rather than the server actively commanding the workers, as I will have a flexible number of workers clones that I will start or close according to how fast I will need the tasks to complete.
UPDATE:
and my ideea for the worker script would be :
#!/bin/bash
taskCmd=""
taskLine=0
masterSSH="ssh usr#masterhost"
tasksFile="/path/to/tasks.txt"
function getTask(){
while [[ $taskCmd == "" ]]
do
sleep 1;
taskCmd_and_taskLine=$($masterSSH "#read_and_lock_next_available_line $tasksFile;")
taskCmd=${taskCmd_and_taskLine[0]}
taskLine=${taskCmd_and_taskLine[1]}
done
}
function updateTask(){
message=$1
$masterSSH "#update_currentTask $tasksFile $taskLine $message;"
}
function doTask(){
return $taskCmd;
}
while [[ 1 -eq 1 ]]
do
getTask
updateTask "in progress"
doTask
taskErrCode=$?
if [[ $taskErrCode -eq 0 ]]
then
updateTask "done, finished successfully"
else
updateTask "done, error $taskErrCode"
fi
taskCmd="";
taskLine=0;
done
You can use flock to concurrently access the file:
exec 200>>/some/any/file ## create a file descriptor
flock -w 30 200 ## concurrently access /some/any/file, timeout of 30 sec.
You can point the file descriptor to your tasks list or any other file, but of course the same file in order to flock work. The lock will me removed as soon as the process that created it is done or fail. You can also remove the lock by yourself when you don't need it anymore:
flock -u 200
An usage sample:
ssh user#x.x.x.x '
set -e
exec 200>>f
echo locking...
flock -w 10 200
echo working...
sleep 5
'
set -e fails the script if any step fails. Play with the sleep time and execute this script in parallel. Just one sleep will execute at a time.
Check if you are reinventing GNU Parallel:
parallel -S worker1 -S worker2 command ::: arg1 arg2 arg3
GNU Parallel is a general parallelizer and makes is easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to. It can often replace a for loop.
If you have 32 different jobs you want to run on 4 CPUs, a straight forward way to parallelize is to run 8 jobs on each CPU:
GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel
try to implement something like
while read line; do
echo $line
#check if the line contains the # char, if not execute the ssh, else nothing to do
checkAlreadyDone=$(grep "^#" $line)
if [ -z "${checkAlreadyDone}" ];then
<insert here the command to execute ssh call>
<here, if everything has been executed without issue, you should
add a commad to update the file taskList.txt
one option could be to insert a sed command but it should be tested>
else
echo "nothing to do for $line"
fi
done < taskList.txt
Regards
Claudio
I think I have successfully implemented one: https://github.com/guo-yong-zhi/DistributedTaskQueue
It is mainly based on bash, ssh and flock, and python3 is required for string processing.
Need help in a bash script.
The goal is:
- Run several commands in parallel
- Exit 1 if any command return not-zero exit status
I.e.
Run with middle command has error:
$ ./parallel_commands "echo 1" "_echo 2" "echo 3" && echo "OK"
1
3
./parallel_commands: line 4: _echo: command not found
OK <- Incorrect
Run with all commands have errors:
$ ./parallel_commands "_echo 1" "_echo 2" "_echo 3" && echo "OK"
./parallel_commands: line 4: _echo: command not found
./parallel_commands: line 4: _echo: command not found
./parallel_commands: line 4: _echo: command not found
-> Result is fail -> Correct
Bash script:
#!/bin/bash
for cmd in "$#"; do {
$cmd & pid=$!
PID_LIST+=" $pid";
} done
trap "kill $PID_LIST" SIGINT
wait $PID_LIST
Thanks.
You are probably looking for something like this using GNU Parallel:
parallel ::: "echo 1" "_echo 2" "echo 3" && echo OK
GNU Parallel is a general parallelizer and makes is easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straight forward way to parallelize is to run 8 jobs on each CPU:
GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel
I have 3 commands that I am trying to run when the system start as a cronjob.
# Sleep at startup
sleep 2m
#command num 1:
./trace.out
sleep 5
#Command num 2:
java -jar file.jar
sleep 5
#Command num 3:
sh ./script.sh
is there any way to make this script more efficient using a loop, some way to make sure every script is running before executing the next one.
I would use && between each command as it executes each command, only if the previous one succeeded! For example:
# Sleep at startup
sleep 2m
./trace.out && java -jar file.jar && sh ./script.sh