I need help with a bash script.
The goal is:
- Run several commands in parallel
- Exit 1 if any command returns a non-zero exit status
For example:
Run where the middle command fails:
$ ./parallel_commands "echo 1" "_echo 2" "echo 3" && echo "OK"
1
3
./parallel_commands: line 4: _echo: command not found
OK <- Incorrect
Run where all commands fail:
$ ./parallel_commands "_echo 1" "_echo 2" "_echo 3" && echo "OK"
./parallel_commands: line 4: _echo: command not found
./parallel_commands: line 4: _echo: command not found
./parallel_commands: line 4: _echo: command not found
-> Result is fail -> Correct
Bash script:
#!/bin/bash
for cmd in "$@"; do {
$cmd & pid=$!
PID_LIST+=" $pid";
} done
trap "kill $PID_LIST" SIGINT
wait $PID_LIST
Thanks.
You are probably looking for something like this using GNU Parallel:
parallel ::: "echo 1" "_echo 2" "echo 3" && echo OK
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to. It can often replace a for loop.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU.
GNU Parallel instead spawns a new process when one finishes, keeping the CPUs active and thus saving time.
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel
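If GNU Parallel cannot be installed, the script from the question can be sketched in plain bash. The original always reports the status of the last command because wait, when given several PIDs, returns only the exit status of the last one; waiting on each PID separately avoids that. The function name run_parallel is made up for this sketch, and each argument is assumed to be a complete command line.

```bash
#!/bin/bash
# run_parallel CMD...  (hypothetical name)
# Run every CMD in the background; return 1 if any of them failed.
run_parallel() {
    local pids=() pid cmd status=0
    for cmd in "$@"; do
        bash -c "$cmd" &          # each argument is a full command line
        pids+=("$!")
    done
    # Forward Ctrl-C to the children while we are waiting
    trap 'kill "${pids[@]}" 2>/dev/null' INT
    for pid in "${pids[@]}"; do
        wait "$pid" || status=1   # wait PID returns that child's exit status
    done
    trap - INT
    return "$status"
}
```

With this in place, run_parallel "echo 1" "_echo 2" "echo 3" && echo "OK" should no longer print OK, because the failure of the middle command is recorded.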
Related
Is it possible in Bash to spawn multiple processes and after the last process finishes, report how many of the processes terminated correctly/didn't core dump?
Or would it be better to do this in Python?
(I'd ideally like to report which command failed, if any)
You can hopefully leverage GNU Parallel and its failure handling. General example:
parallel ::: ./processA ./processB ./processC
Specific example... here I run 3 simple jobs, each surrounded by single quotes and set it up to stop once all jobs are complete or failed:
parallel --halt soon,fail=100% ::: 'echo 0 && exit 0' 'echo 1 && exit 1' 'echo 2 && exit 2'
Output
0
1
parallel: This job failed:
echo 1 && exit 1
2
parallel: This job failed:
echo 2 && exit 2
By default, it runs N jobs in parallel, where N is the number of cores your CPU has. If you want the jobs to run sequentially, use:
parallel -j 1 ...
Obviously you could pipe the output through grep -c "This job failed" to count the failures.
Assuming you have a file with the commands:
cmd1
cmd2
cmd3
GNU Parallel's exit status is the number of failed jobs (as long as at most 100 jobs fail), so subtracting it from the number of commands gives the number of jobs that succeeded:
cat file | parallel
a=$?; echo $((`wc -l <file`-$a))
To get exactly which jobs failed use --joblog.
cat file | parallel --joblog my.log
# Find non-zero column 7
grep -v -P '(.*\t){6}0\t.*\t' my.log
It's easy.
First run your jobs in the background. Remember the pids.
Then, for each child, run wait $pid and check its exit status, which equals the exit status of the child whose PID you passed to it.
If the exit status is zero, it means the child terminated successfully.
#!/bin/bash
exit 0 &
childs+=($!)
exit 1 &
childs+=($!)
exit 2 &
childs+=($!)
echo 1 &
childs+=($!)
successes=0
for i in "${childs[@]}"; do
wait $i
if (($? == 0)); then
((successes++))
fi
done
# will print that 2 processes (exit 0 and echo 1) terminated successfully
printf "%d processes terminated correctly and didn't core dump\n" "$successes"
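To also report which command failed, as the question asks, the PIDs can be mapped back to their command lines. This is a minimal sketch assuming bash 4 or newer (for associative arrays); the function name report_failures is hypothetical, and the return value is only meaningful for up to 255 failures.

```bash
#!/bin/bash
# report_failures CMD...  (hypothetical name; needs bash 4+ for local -A)
# Run every CMD in the background, print the ones that failed,
# and return the number of failures.
report_failures() {
    local -A cmd_of=()            # pid -> command line
    local pids=() pid cmd rc failures=0
    for cmd in "$@"; do
        bash -c "$cmd" &
        pids+=("$!")
        cmd_of[$!]=$cmd
    done
    for pid in "${pids[@]}"; do
        rc=0
        wait "$pid" || rc=$?      # exit status of this particular child
        if (( rc != 0 )); then
            echo "failed (exit $rc): ${cmd_of[$pid]}"
            failures=$((failures + 1))
        fi
    done
    return "$failures"
}
```

For example, report_failures "exit 0" "exit 2" "true" prints one line for the second command and returns 1.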
I have a bash script where I would like to run two processes in parallel, and have the script fail if either of the processes returns non-zero. A minimal example of my initial attempt is:
#!/bin/bash
set -e
(sleep 3 ; true ) &
(sleep 4 ; false ) &
wait %1 && wait %2
echo "Still here, exit code: $?"
As expected this doesn't print the message because wait %1 && wait %2 fails and the script exits due to the set -e. However, if the waits are reversed such that the first one has the non-zero status (wait %2 && wait %1), the message is printed:
$ bash wait_test.sh
Still here, exit code: 1
Putting each wait on its own line works as I want and exits the script if either of the processes fail, but the fact that it doesn't work with && makes me suspect that I'm misunderstanding something here.
Can anyone explain what's going on?
You can achieve what you want quite elegantly with GNU Parallel and its "fail handling".
In general, it will run as many jobs in parallel as you have CPU cores.
In your case, try this, which says "exit with failed status if one or more jobs failed":
#!/bin/bash
cat <<EOF | parallel --halt soon,fail=1
echo Job 1; exit 0
echo Job 2; exit 1
EOF
echo GNU Parallel exit status: $?
Sample Output
Job 1
Job 2
parallel: This job failed:
echo Job 2; exit 1
GNU Parallel exit status: 1
Now run it such that no job fails:
#!/bin/bash
cat <<EOF | parallel --halt soon,fail=1
echo Job 1; exit 0
echo Job 2; exit 0
EOF
echo GNU Parallel exit status: $?
Sample Output
Job 1
Job 2
GNU Parallel exit status: 0
If you dislike the heredoc syntax, you can put the list of jobs in a file called jobs.txt like this:
echo Job 1; exit 0
echo Job 2; exit 0
Then run with:
parallel --halt soon,fail=1 < jobs.txt
From the bash manual, in the section on set:
-e Exit immediately if a pipeline (which may consist of a single simple command), a list, or a compound command (see SHELL GRAMMAR above), exits with a non-zero status. The shell does not exit if the command that fails is part of the command list immediately following a while or until keyword, part of the test following the if or elif reserved words, part of any command executed in a && or || list except the command following the final && or ||, any command in a pipeline but the last, or if the command's return value is being inverted with !. If a compound command other than a subshell returns a non-zero status because a command failed while -e was being ignored, the shell does not exit. A trap on ERR, if set, is executed before the shell exits. This option applies to the shell environment and each subshell environment separately (see COMMAND EXECUTION ENVIRONMENT above), and may cause subshells to exit before executing all the commands in the subshell.
tl;dr
In a bash script, for a command list like this
command1 && command2
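set -e is ignored for command1, because it is not the command following the final &&. If command1 fails, the list returns non-zero but the script keeps running. Only a failure of command2, the last command in the list, makes the script exit. That is why wait %1 && wait %2 stops the script when %2 fails, while wait %2 && wait %1 does not.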
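The rule quoted from the manual can be checked with a small experiment. This is only a sketch: each case runs in its own bash process so that set -e cannot end the outer script, and the variable names are made up.

```bash
#!/bin/bash
# Each case runs in its own bash so that set -e cannot end this script.

# The failing command is NOT the last one in the && list: -e is ignored.
first=$(bash -c 'set -e; false && true; echo "still here"')

# The failing command IS the last one in the && list: the shell exits.
last=$(bash -c 'set -e; true && false; echo "still here"') || last_status=$?

echo "failure first: '${first}'"
echo "failure last:  '${last}' (status ${last_status:-0})"
```

The first case prints "still here" even though the list failed; the second produces no output and exits with status 1.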
I have a shell script that parses a flatfile and for each line in it, executes a hive script in parallel.
xargs -P 5 -d $'\n' -n 1 bash -c '
IFS=$(printf "\t") read -r arg1 arg2 arg3 <<<"$1"
eval "hive -hiveconf tableName=$arg1 -f ../hive/LoadTables.hql" 2> ../path/LogFile-$arg1
' _ < ../path/TableNames.txt
Question is how can I capture the exit codes from each parallel process, so even if one child process fails, exit the script at the end with the error code.
Unfortunately I can't use gnu parallel.
I suppose that you are looking for something fancier, but a simple solution is to record failures in a temporary file and check it afterwards:
FilewithErrors=/tmp/errors.txt
FinalError=0
rm -f "$FilewithErrors" # start from a clean slate
export FilewithErrors # make the path visible to the bash -c children
xargs -P 5 -d $'\n' -n 1 bash -c '
IFS=$(printf "\t") read -r arg1 arg2 arg3 <<<"$1"
hive -hiveconf tableName="$arg1" -f ../hive/LoadTables.hql 2> "../path/LogFile-$arg1" || echo "$arg1" >> "$FilewithErrors"
' _ < ../path/TableNames.txt
if [ -e "$FilewithErrors" ]; then FinalError=1; fi
rm -f "$FilewithErrors"
exit $FinalError
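As an alternative sketch that avoids the temporary file: xargs itself exits with a non-zero status when any invocation fails (GNU xargs documents status 123 when an invocation exits with status 1 to 125), so the parent script can test that directly. The check function below is only a stand-in for the hive call, and the input values are invented.

```bash
#!/bin/bash
# Stand-in for the real job: fails only for the argument "bad".
check() { [ "$1" != "bad" ]; }
export -f check                       # make it visible to the bash -c children

status=0
printf '%s\n' good bad good |
    xargs -P 5 -n 1 bash -c 'check "$1"' _ || status=$?

echo "xargs exit status: $status"
[ "$status" -eq 0 ] || echo "at least one job failed"
```

This only tells you that something failed, not which job; the per-job log files or the temporary file above are still needed for that.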
As per the comments: Use GNU Parallel installed as a personal or minimal installation as described in http://git.savannah.gnu.org/cgit/parallel.git/tree/README
From man parallel
EXIT STATUS
Exit status depends on --halt-on-error if one of these are used: success=X,
success=Y%, fail=Y%.
0 All jobs ran without error. If success=X is used: X jobs ran without
error. If success=Y% is used: Y% of the jobs ran without error.
1-100 Some of the jobs failed. The exit status gives the number of failed jobs.
If Y% is used the exit status is the percentage of jobs that failed.
101 More than 100 jobs failed.
255 Other error.
If you need the exact error code (and not just whether the job failed or not) use: --joblog mylog.
You can probably do something like:
cat ../path/TableNames.txt |
parallel --colsep '\t' --halt now,fail=1 hive -hiveconf tableName={1} -f ../hive/LoadTables.hql '2>' ../path/LogFile-{1}
fail=1 will stop spawning new jobs if one job fails, and exit with the exit code from the job.
now will kill the remaining jobs. If you want the remaining jobs to exit of "natural causes", use soon instead.
I want to distribute the work from a master server to multiple worker servers using batches.
Ideally I would have a tasks.txt file with the list of tasks to execute
cmd args 1
cmd args 2
cmd args 3
cmd args 4
cmd args 5
cmd args 6
cmd args 7
...
cmd args n
and each worker server will connect using ssh, read the file and mark each line as in progress or done
#cmd args 1 #worker1 - done
#cmd args 2 #worker2 - in progress
#cmd args 3 #worker3 - in progress
#cmd args 4 #worker1 - in progress
cmd args 5
cmd args 6
cmd args 7
...
cmd args n
I know how to make the ssh connection, read the file, and execute remotely but don't know how to make the read and write an atomic operation, in order to not have cases where 2 servers start the same task, and how to update the line.
I would like for each worker to go to the list of tasks and lock the next available task in the list rather than the server actively commanding the workers, as I will have a flexible number of workers clones that I will start or close according to how fast I will need the tasks to complete.
UPDATE:
My idea for the worker script would be:
#!/bin/bash
taskCmd=""
taskLine=0
masterSSH="ssh usr@masterhost"
tasksFile="/path/to/tasks.txt"
function getTask(){
while [[ $taskCmd == "" ]]
do
sleep 1;
taskCmd_and_taskLine=$($masterSSH "#read_and_lock_next_available_line $tasksFile;")
taskCmd=${taskCmd_and_taskLine[0]}
taskLine=${taskCmd_and_taskLine[1]}
done
}
function updateTask(){
message=$1
$masterSSH "#update_currentTask $tasksFile $taskLine $message;"
}
function doTask(){
eval "$taskCmd"; # run the task; its exit status becomes doTask's status
}
while [[ 1 -eq 1 ]]
do
getTask
updateTask "in progress"
doTask
taskErrCode=$?
if [[ $taskErrCode -eq 0 ]]
then
updateTask "done, finished successfully"
else
updateTask "done, error $taskErrCode"
fi
taskCmd="";
taskLine=0;
done
You can use flock to serialize concurrent access to the file:
exec 200>>/some/any/file ## open the file on file descriptor 200
flock -w 30 200 ## take an exclusive lock on /some/any/file, waiting up to 30 sec.
The file descriptor can point to your task list or to any other file, but every process must use the same file for flock to work. The lock is released as soon as the process that holds it finishes or fails. You can also release the lock yourself when you no longer need it:
flock -u 200
A usage sample:
ssh user@x.x.x.x '
set -e
exec 200>>f
echo locking...
flock -w 10 200
echo working...
sleep 5
'
set -e fails the script if any step fails. Play with the sleep time and execute this script in parallel. Just one sleep will execute at a time.
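Building on the flock idea, here is a rough sketch of how a worker could atomically claim the next free line of the task list. It assumes util-linux flock and GNU sed (for sed -i) on the master. The function name claim_next_task, the separate lock file next to the task list, and the marker format (copied from the question) are all choices made for this illustration. The worker name must not contain characters that are special to sed, such as / or &.

```bash
#!/bin/bash
# claim_next_task FILE WORKER  (hypothetical helper, meant to run on the master)
# Under an exclusive lock, mark the first unclaimed line of FILE as
# "in progress" and print "<line number><TAB><command>".
# Returns 1 if the lock could not be taken, 2 if no task is left.
claim_next_task() {
    local file=$1 worker=$2
    (
        flock -w 30 200 || exit 1             # give up after 30 seconds
        local hit n cmd
        # First line that does not start with '#', reported as "N:text"
        hit=$(grep -n -m 1 -v '^#' "$file") || exit 2
        n=${hit%%:*}
        cmd=${hit#*:}
        # Rewrite only line N: prefix '#' and append the worker marker
        sed -i "${n}s/.*/#& #${worker} - in progress/" "$file"
        printf '%s\t%s\n' "$n" "$cmd"
    ) 200>>"$file.lock"                       # lock file shared by all workers
}
```

A worker could then run something like this through ssh on the master and split the reply on the tab to get the line number and the command. Because the search and the rewrite happen while the lock is held, two workers should not be able to claim the same line.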
Check if you are reinventing GNU Parallel:
parallel -S worker1 -S worker2 command ::: arg1 arg2 arg3
Try to implement something like:
while read -r line; do
echo "$line"
# check whether the line starts with the # char; if not, execute the ssh call, else there is nothing to do
checkAlreadyDone=$(grep "^#" <<<"$line")
if [ -z "${checkAlreadyDone}" ]; then
<insert here the command to execute ssh call>
<here, if everything has been executed without issue, you should
add a command to update the file taskList.txt
one option could be to insert a sed command but it should be tested>
else
echo "nothing to do for $line"
fi
done < taskList.txt
Regards
Claudio
I think I have successfully implemented one: https://github.com/guo-yong-zhi/DistributedTaskQueue
It is mainly based on bash, ssh and flock, and python3 is required for string processing.
I have a shell script with a for loop. Does the loop wait for the command in its body to finish before starting the next iteration?
Thanks in advance
Here is my code. Will the commands execute sequentially or in parallel?
for m in "${mode[@]}"
do
cmd="exec $perlExecutablePath $perlScriptFilePath --owner $j -rel $i -m $m"
$cmd
eval "$cmd"
done
Assuming that you haven't background-ed the command, then yes.
For example:
for i in {1..10}; do cmd; done
waits for cmd to complete before continuing the loop, whereas:
for i in {1..10}; do cmd & done
doesn't.
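The difference is easy to observe by timing both forms. A small sketch, using sleep 1 as a stand-in for cmd and bash's SECONDS counter:

```bash
#!/bin/bash
# Sequential: each sleep must finish before the next iteration starts.
SECONDS=0
for i in 1 2 3; do sleep 1; done
sequential=$SECONDS

# Background: all three start at once; wait blocks until they are all done.
SECONDS=0
for i in 1 2 3; do sleep 1 & done
wait
background=$SECONDS

echo "sequential: ${sequential}s, background: ${background}s"
```

The sequential loop takes about three seconds, while the backgrounded one takes roughly as long as a single sleep. Note the explicit wait: without it the script would carry on, and possibly exit, while the background jobs are still running.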
If you want to run your commands in parallel, I would suggest changing your loop to something like this:
for m in "${mode[@]}"
do
"$perlExecutablePath" "$perlScriptFilePath" --owner "$j" -rel "$i" -m "$m" &
done
This runs each command in the background, so it doesn't wait for one command to finish before the next one starts.
An alternative would be to look at GNU Parallel, which is designed for this purpose.
Using GNU Parallel it looks like this:
parallel "$perlExecutablePath" "$perlScriptFilePath" --owner "$j" -rel "$i" -m {} ::: "${mode[@]}"