Obtain the exit code for a known process id - bash

I have a list of processes started one after the other, running in parallel. I need to know the exit code of each of these processes as it completes, without waiting for all of the processes to finish.
While status=$?; echo $status gives the exit code of the last command executed, how do I get the exit code of any completed process, given its process id?

You can do that with GNU Parallel like this:
parallel --halt=now,done=1 ::: ./job1 ./job2 ./job3
The --halt=now,done=1 means: as soon as any one job is done, kill all outstanding jobs immediately and exit with the exit status of the completed job.
There are options to halt on success or on failure as well as on completion, and the number of successful, failing or completed jobs can also be given as a percentage. See the --halt section of the documentation.
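For example (a sketch; the exact option spellings are documented in the --halt section of the manual), killing everything as soon as the first job fails, or stopping new jobs once half of them have succeeded:
# Kill all outstanding jobs as soon as any one job fails,
# and exit with that job's exit status:
parallel --halt now,fail=1 ::: ./job1 ./job2 ./job3

# Start no new jobs once 50% of them have succeeded,
# but let the running jobs finish:
parallel --halt soon,success=50% ::: ./job1 ./job2 ./job3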

Save each background job's PID using a wrapper shell function. Afterwards, the exit status of each job can be queried:
#!/bin/bash
jobs=()
function run_child() {
    "$@" &
    jobs+=($!)
}

run_child sleep 1
run_child sleep 2
run_child false

for job in "${jobs[@]}"; do
    wait "$job"
    echo "Exit Code $?"
done
Output:
Exit Code 0
Exit Code 0
Exit Code 1
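If you also want to reap jobs in the order they finish (rather than the order they were started), newer bash versions let wait -n report which PID was reaped; a minimal sketch, assuming bash 5.1 or later for the -p option:
#!/bin/bash
# Requires bash >= 5.1 for `wait -p`.
sleep 2 &
sleep 1 &
false &

# Reap the jobs in completion order, not submission order.
for _ in 1 2 3; do
    wait -n -p finished_pid
    echo "PID $finished_pid exited with code $?"
done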

Related

Fish shell: how to wait for a background job and get its exit status?

I'm trying to figure out how to run multiple long-running commands in parallel in the fish shell, but with proper error handling so that my overall script/one-liner doesn't succeed unless they all do. But I can't figure out how to deal with exit status.
The posix spec for wait says that it should communicate exit status:
The exit status returned by the wait utility shall be the exit status of the process requested by the last pid operand.
And the GitHub thread about adding wait to fish spends a lot of time discussing exit statuses (but was then closed suddenly, years later, without explanation). Yet fish's wait does not seem to actually report exit statuses:
> ls nonexistent; echo $status
ls: cannot access 'nonexistent': No such file or directory
2
> ls nonexistent &; wait $last_pid; echo $status
ls: cannot access 'nonexistent': No such file or directory
0
fish: Job 1, 'ls nonexistent &' has ended
How can I wait for a background job to terminate and be able to tell whether it succeeded or not?
wait does not yet expose the exit status, but you can get it with an on-process-exit event handler:
sleep 3 &
function sleep_ended --on-process-exit $last_pid
echo "pid $argv[2] exited with status $argv[3]"
end
wait

Why does an asynchronous child become a zombie although the parent waits for it?

I use the following code to start some long running task asynchronously but detect if it fails at the very beginning:
sleep 0.3 &
long_running &
wait -n
# [Error handling]
# Do other stuff.
# Wait for completion of 'long_running'.
wait -n
# [Error handling]
If I send SIGINT (using Ctrl+C) to the script while it is waiting for the long-running child, the long-running task just continues and becomes a zombie after completion.
Furthermore, the parent script consumes full CPU. I have to SIGKILL the parent to get rid of the processes.
I know that SIGINT is ignored by the child (which is probably the reason it continues to completion), but why does the parent get into such a confusing state?
It works (as expected) if I kill the child when SIGINT is received (the commented-out trap below), but I want to understand why it does not work the other way.
Below is the complete script. Please refer also to https://gist.github.com/doak/08b69c500c91a7fade9f2c61882c93b4 for an even more complete example/try-out:
#!/usr/bin/env bash
count="count=100000" # Adapt that 'dd' lasts about 3s. Comment out to run forever.
#fail=YES # Demonstrates failure of background task.
# This would work.
#trap "jobs -p | xargs kill" SIGINT
echo executing long running asynchronous task ...
sleep 0.3 &
dd if=/dev/zero$fail of=/dev/null bs=1M $count &
wait -n
errcode=$?
if test $errcode -ne 0; then
    echo "failed"
    exit $errcode
fi
echo waiting for completion ...
wait -n
errcode=$?
echo finished
exit $errcode
It could be that my question is related to this C question, although it discusses the system call wait(): Possible for parent process to HANG on "wait" step if child process becomes a ZOMBIE or CRASHES?
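For reference, a minimal sketch of the workaround the question mentions, with the SIGINT trap enabled so the background jobs are killed rather than left running (long_running stands in for the real task):
#!/usr/bin/env bash
# Forward Ctrl+C to the background jobs so nothing keeps running
# (and later turns into a zombie) after the script is interrupted.
trap 'jobs -p | xargs kill' SIGINT

sleep 0.3 &
long_running &

wait -n             # first job to finish (detects early failure)
errcode=$?
if test $errcode -ne 0; then
    exit $errcode
fi
wait -n             # wait for the long-running job to complete
exit $?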

Bash - kill ALL parallel processes if one fails

I have 2 processes that I run via:
init_thing & start_thing
init_thing polls the logs of start_thing for a particular line that it considers to show that start_thing has successfully begun, then executes a few commands against it (e.g. adding users).
The init_thing function could fail with a non-zero exit code if it considers start_thing to have timed out.
The start_thing function could fail, but if successful it runs forever.
What I want to do is kill start_thing if init_thing fails.
I've seen use of GNU parallel in a lot of answers, but it seems to rely on both processes completing (i.e. exiting with a zero exit-code), which in my case doesn't apply.
Is there a way to do this with bash? Perhaps using parallel in a way that I haven't seen/understood?
An ERR trap may be useful, where the pid variable contains the PID of the process to kill:
trap 'kill $pid' ERR
On reflection, it is clearer to write it explicitly:
init_thing || {
    echo "something went wrong, killing $pid"
    kill "$pid"
}
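Putting that together for the start_thing/init_thing case from the question, a minimal sketch (the two function names are the question's; everything else is illustrative):
#!/bin/bash
start_thing &        # runs forever if it starts successfully
pid=$!

if ! init_thing; then
    echo "init_thing failed, killing start_thing (pid $pid)"
    kill "$pid"
    exit 1
fi

# init_thing succeeded; start_thing keeps running, so wait on it.
wait "$pid"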

Bash files: run process in parallel and stop when one is over

I would like to start two C programs in parallel from a bash script, and have the second one stop when the first one finishes.
The wait builtin waits for both processes to finish, which is not what I want.
Thanks for any suggestion.
GNU parallel can do this kind of job. Check the termination section of its documentation; it can shut down the remaining processes based on the exit code (either success or failure):
parallel -j2 --halt now,success=1 ::: 'cmd1 args' 'cmd2 args'
When one of the jobs finishes successfully, it sends the TERM signal to the other jobs (and falls back to the KILL signal if they do not terminate).
With $! you get the PID of the last command started in the background. See some nice examples here: Bash `wait` command, waiting for more than 1 PID to finish execution
For your particular problem I imagine something like this:
#!/bin/bash

command_master() {
    echo "Command_master"
    sleep 1
}

command_tokill() {
    echo "Command_tokill"
    sleep 10
}

command_master & pid_master=$!
command_tokill & pid_tokill=$!

wait "$pid_master"
kill "$pid_tokill"
wait -n is what you are looking for: it waits for the next job to finish. You can then list the PIDs of the remaining jobs with jobs -p if you want to kill them.
prog1 & pids=( $! )
prog2 & pids+=( $! )
wait -n
kill "${pids[#]}"
This requires bash.
The two programs are started as background jobs, and the shell waits for one of them to exit.
When this happens, kill is used to terminate both processes (this will print an error for the one that has already exited).
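An equivalent variant using jobs -p, as mentioned above, instead of collecting the PIDs by hand (the -r flag is GNU xargs and simply avoids running kill with no arguments):
prog1 &
prog2 &
wait -n                    # returns as soon as either program exits
jobs -p | xargs -r kill    # kill whatever is still running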

How can I check the exit code of individual processes run in parallel by GNU Parallel

I have an array in a Linux shell script. The array contains a list of bash commands.
For instance:
args=( "ls" "mv /abc/file1 /xyz/file2" "hive -e 'select * from something'" )
Now I am executing these commands in the array using GNU parallel as below:
parallel ::: "${args[#]}"
I want to check the exit code of each individual process when it finishes. I am aware that $? will give me the number of processes that failed, but I want to know the exit code of each individual process. How can I catch the exit codes of the individual processes executed by GNU parallel?
Use the --halt 1 option, which makes parallel stop launching new jobs once one fails, and return that job's exit code. From man parallel:
--halt-on-error val
--halt val
How should GNU parallel terminate if one of more jobs fail?
0 Do not halt if a job fails. Exit status will be the
number of jobs failed. This is the default.
1 Do not start new jobs if a job fails, but complete the
running jobs including cleanup. The exit status will be
the exit status from the last failing job.
2 Kill off all jobs immediately and exit without cleanup.
The exit status will be the exit status from the
failing job.
1-99% If val% of the jobs fail and minimum 3: Do not start
new jobs, but complete the running jobs including
cleanup. The exit status will be the exit status from
the last failing job.
--joblog logfile
Logfile for executed jobs. Save a list of the executed jobs to logfile in the following TAB separated format: sequence number, sshlogin, start time as seconds since epoch, run time in seconds, bytes in files transferred, bytes in files returned, exit status, signal, and command run.
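If you want the exit code of every command rather than halting on the first failure, --joblog may be the more direct answer to the question; a sketch, assuming the args array from above and a hypothetical log path:
parallel --joblog /tmp/joblog ::: "${args[@]}"

# Column 7 of the TAB-separated job log is each command's exit status,
# column 9 is the command itself (the first line is a header).
awk -F'\t' 'NR > 1 { print "exit=" $7, "cmd=" $9 }' /tmp/joblog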
