# assumes a command was just started in the background, e.g.: some_command &
CHILD=$!
sleep 2
if kill -KILL "${CHILD}" 2>/dev/null; then
    echo "*** timed out after 2 seconds"
    KILLED=yes
else
    echo "terminated within time limit"
    KILLED=no
fi
wait "${CHILD}"
I'm a little confused about what is going on here and how the 'if' executes. My understanding is that this checks whether killing the child process succeeded; if so, it sets the KILLED variable to yes and prints a message. Otherwise, it sets KILLED to no and prints a different message.
I thought that a successful command returns 0? If that's true, wouldn't the 'if' interpret that as false and execute the else?
I'm also confused about what the printed messages mean. I think I'm not understanding the difference between 'timed out' and 'terminated' (I would have expected the 'terminated' message where the 'timed out' message is, and vice versa).
Thanks!
It's a little counter-intuitive if you're coming from a language like C or Java, but in bash, a 0 exit status is actually interpreted as true. Here's an excerpt from the manual:
The most compact syntax of the if command is:
if TEST-COMMANDS; then CONSEQUENT-COMMANDS; fi
The TEST-COMMAND list is executed, and if its return status is zero, the CONSEQUENT-COMMANDS list is executed. The return status is the exit status of the last command executed, or zero if no condition tested true.
This is pretty useful, because there are usually a lot of ways a process can fail (giving different non-zero statuses), but only one way for everything to work correctly (zero status).
I think your other questions answer themselves after that :-)
kill returns an exit code of 0 (true) if the process still existed and the signal was delivered. That means the child was still running after 2 seconds, i.e. it timed out, so KILLED=yes.
kill returns an exit code of 1 (false) if the kill failed, most likely because the process had already finished. That means it terminated within the time limit, so KILLED=no.
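To see the rule in action, here is a small sketch (not part of the original answer) that reports which branch if takes for a few commands, including the two kill outcomes from the question. which_branch is a hypothetical helper name, and sleep 30 stands in for the child process.

```shell
#!/usr/bin/env bash
# Sketch: `if` takes its then-branch when the test command exits with 0.

# Hypothetical helper: run a command and report which branch `if` takes.
which_branch() {
    if "$@" 2>/dev/null; then
        echo "then (status 0)"
    else
        # Inside else, $? is still the exit status of the test command.
        echo "else (status $?)"
    fi
}

which_branch true      # prints: then (status 0)
which_branch false     # prints: else (status 1)

# kill -KILL exits 0 only if the signal was delivered, i.e. the PID exists.
sleep 30 &
child=$!
first=$(which_branch kill -KILL "$child")    # child still running
wait "$child" 2>/dev/null                    # reap it so the PID is gone
second=$(which_branch kill -KILL "$child")   # child already gone
echo "first kill:  $first"
echo "second kill: $second"
```

The first kill corresponds to the question's "timed out" branch (the child was still alive after the sleep), and the second to "terminated within time limit".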
I need to run a command with the timeout command, so that if it does not finish within x seconds, the script starts again from the beginning; otherwise it proceeds and executes other commands. Can I do that?
I already tried using case, but it only works with a single case; with two cases I get an error. Does anybody know how to do this with timeout? Maybe with a user-defined signal like -s USR1, but I don't know how to set that up, or whether I can execute a command with a signal :/
If some_command completes within 10 seconds, this will continue on with the rest of the script. If some_command times out, then it is repeated until it doesn't time out:
while timeout 10 some_command; [ $? -eq 124 ]
do
:
done
echo "Continuing on with script"
How it works
If a command times out, then, by default, timeout exits with code 124. We can use this to test whether the command timed out and, hence, whether it needs to be repeated. In more detail:
while timeout 10 some_command; [ $? -eq 124 ]; do
This starts a while loop by executing some_command with a 10-second timeout and then testing whether the exit code from timeout is 124. If it is, the while loop repeats.
:
The command : is a no-op. The body of a while loop must contain at least one command, and since we don't need one here, we use this no-op.
done
This marks the end of the while loop.
Special case: ill-behaved commands
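This assumes that some_command responds properly to the TERM signal. If it doesn't, timeout will not kill it on its own: you need to send KILL instead (timeout -s KILL 10 some_command) or follow up with it (timeout -k 5 10 some_command). In that case the exit code is 128+9 (137) rather than 124, so the loop test has to allow for that too.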
Documentation
From man timeout:
If the command times out, and --preserve-status is not set, then
exit with status 124. Otherwise, exit with the status of COMMAND.
If no signal is specified, send the TERM signal upon timeout. The
TERM signal kills any process that does not block or catch that
signal. It may be necessary to use the KILL (9) signal, since this
signal cannot be caught, in which case
the exit status is 128+9 rather than 124.
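The following variant of the loop rests on a few assumptions that are not in the original answer: GNU coreutils timeout, a hypothetical run_with_retry helper, a retry cap so that a command that always hangs can't loop forever, and -k 2 to escalate to KILL for commands that ignore TERM. Because of that escalation, it treats both 124 and 128+9 (137) as timeouts.

```shell
#!/usr/bin/env bash
# Sketch: re-run a command until it finishes within the time limit.
# Assumptions: GNU coreutils `timeout`; the retry cap and the helper name
# run_with_retry are hypothetical choices, not from the original answer.

# Usage: run_with_retry SECONDS MAX_TRIES CMD [ARGS...]
run_with_retry() {
    local limit=$1 max=$2 try=1 status
    shift 2
    while :; do
        # -k 2: if CMD ignores TERM, send KILL two seconds later.
        timeout -k 2 "$limit" "$@"
        status=$?
        # 124 = timed out (killed by TERM); 137 = 128+9 (killed by KILL).
        if [ "$status" -ne 124 ] && [ "$status" -ne 137 ]; then
            return "$status"    # finished in time: pass its status through
        fi
        if [ "$try" -ge "$max" ]; then
            echo "giving up after $try timed-out attempts" >&2
            return "$status"
        fi
        try=$((try + 1))
    done
}
```

For example, run_with_retry 10 5 some_command makes at most five attempts. One trade-off: a command that genuinely exits with status 137 on its own would also be retried.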
I saw here the use of:
while ps | grep " $my_pid "
Question: in a while -command- construct like this, what is the while loop checking: the command's return code or its stdout?
It's checking the return value of the process pipeline, which happens to be the return value of the last element in that pipeline (unless pipefail is set, but it usually isn't). The bash doco has this to say:
while list-1; do list-2; done
The while command continuously executes the list list-2 as long as the last command in the list list-1 returns an exit status of zero.
Elsewhere, it states:
The return status of a pipeline is the exit status of the last command
So this while statement continues as long as grep returns zero. And the grep doco states:
The exit status is 0 if a line is selected.
So the intent is almost certainly to keep looping as long as the process you're monitoring is still alive.
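The sketch below (not from the original answer) illustrates the points above. The loop reacts only to the pipeline's exit status; here that status comes from grep -q, which prints nothing at all. The sketch also shows how pipefail changes which command's status counts.

```shell
#!/usr/bin/env bash
# Sketch: while/if act on a pipeline's exit status, never on its output.

false | true
no_pipefail=$?          # 0: without pipefail only the last command counts

set -o pipefail
false | true
with_pipefail=$?        # 1: with pipefail, the rightmost non-zero status wins
set +o pipefail

echo "without pipefail: $no_pipefail, with pipefail: $with_pipefail"

# grep -q prints nothing, so this loop is driven purely by grep's status.
count=0
while printf 'alive\n' | grep -q alive; do
    count=$((count + 1))
    if [ "$count" -ge 3 ]; then break; fi
done
echo "loop body ran $count times"    # prints: loop body ran 3 times
```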
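Of course, this is a rather "flaky" way of detecting whether your process is still running, because grep matches text anywhere in each line of ps output, not just in the PID column. For a start, if the pattern weren't padded with spaces and my_pid were 60, that grep would return zero if any of the processes 60, 602, or 3060 were running.
Without the padding, it would also return zero if, like I often do, you have some sleep 60 or sleep 3600 processes in flight (with a ps that shows command arguments), no matter their process ID. The spaces narrow the match, but a standalone 60 in any other column can still produce a false positive.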
Perhaps a better way, assuming you're on a system with procfs, is to use something like:
while [[ -d /proc/$my_pid ]] ; do ...
This solves everything except PID recycling: a different process could start up with the same PID between checks. Since most UNIX-like systems allocate PIDs sequentially with wrap-around, that's very unlikely.
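For comparison, here is a sketch of two liveness checks. The procfs test is the one suggested above. kill -0 is a swapped-in alternative, not from the original answer, that works without procfs. It sends no signal, but it also fails if you lack permission to signal the process, so a live process owned by another user would look dead. wait_for_exit is a hypothetical helper name. Neither check avoids the PID-recycling caveat.

```shell
#!/usr/bin/env bash
# Sketch of two "is this PID still alive?" checks.
# is_alive_proc: requires a Linux-style procfs (the approach above).
# is_alive_kill0: `kill -0` sends no signal; it only checks that the PID exists
# and that we may signal it (an alternative, not from the original answer).

is_alive_proc()  { [[ -d /proc/$1 ]]; }
is_alive_kill0() { kill -0 "$1" 2>/dev/null; }

# Hypothetical helper: poll until PID is gone. Meant for processes that are not
# children of this shell; for your own children, `wait "$pid"` is simpler.
wait_for_exit() {
    while is_alive_kill0 "$1"; do
        sleep 0.5
    done
}

sleep 30 &
my_pid=$!
is_alive_kill0 "$my_pid" && echo "$my_pid is running"
kill "$my_pid"
# Reap it: until then it lingers as a zombie, which both checks report as alive.
wait "$my_pid" 2>/dev/null
is_alive_kill0 "$my_pid" || echo "$my_pid is gone"
```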
The abort documentation says abort will
Terminate execution immediately, effectively by calling Kernel.exit(false).
What exactly does "immediately" mean? What is the difference between abort and exit with non-true status?
"Exit, Exit! Abort, Raise…Get Me Outta Here!" describes everything you'd want to know, I think.
In short:
Kernel.exit(code) "exits" the script immediately and returns code to the OS; however, just before doing so, it runs any at_exit handlers your code has registered. (Under the hood it raises SystemExit, so ensure blocks also run first.)
Kernel.exit!(code) does the same, but exits immediately: no at_exit handlers are called.
Kernel.abort(message) prints message to STDERR and then exits with failure status 1.
Exit codes on their own are a blunt tool for detecting problems and debugging code. However, they are very simple to use, and having the parent process read them is almost trivial. Hence exit and exit!.
If you can spend more time making the error checking robust, you'll want real error messages, not just codes. Traditionally, you print them to STDERR, if it exists. You can print to STDERR manually with $stderr.puts (plain puts writes to STDOUT), but exit codes will still be used at the lowest level.
Printing to STDERR does not automatically mark your job as failed, so abort exists to let you print a message and quit in one step. A default exit code of 1 is enough to mark the failure, since it's assumed that all the real contextual information is in the error message you provide.
Also note that any unhandled exception, such as raise "wtf" with no rescue anywhere, actually behaves as if Kernel.abort were called: it prints to STDERR and exits with status 1.
You said exit(false), but the exit! documentation says the parameter is the status code to use.
I've just checked that on Windows and Ruby 1.9.3:
exit 0 # quits with code: 0
exit 1 # quits with code: 1
exit false # quits with code: 1
exit true # quits with code: 0
which really surprised me, as I'd assumed false would be coerced to 0 in the traditional C way. (Ruby does document this mapping: true means success and false means failure, matching shell conventions rather than C's.) So you may prefer integers like 0 or 1, to be perfectly clear about which code will be used.
I'm creating a startup/shutdown script for WebSEAL. It's written to allow several instances to be stopped/started in parallel. The only problem is verifying that it completed without issue. With other infrastructures, I could simply grep for a particular keyword in the output (which I redirect to a log file), but WebSEAL does not give any success/error message.
Instead, I thought I'd use $? to store the exit status in a dynamically named variable that will be checked after the startups have occurred (during log consolidation).
Here is the code that starts/stops the instance and then creates the variable:
${PDCOMMAND} >> ${LOGDIR}/${APP}.txt 2>&1 &
let return_${APP}=$?
PDCOMMAND is a valid start/stop command, e.g. pdweb start my_instance
APP is the name of the instance, e.g. my_instance
The goal is that return_${APP} (return_my_instance) will have a value of 0 (success) or 1 (failure) when I check it at a later point in the script.
Is there a problem using $? for a command that may not technically have completed at the time the variable is set, or is $? only set once the command completes? Say I have 3 instances:
instance_1, instance_2, instance_3
If I ran the following:
pdweb start instance_1 &
let return_instance_1=$?
pdweb start instance_2 &
let return_instance_2=$?
pdweb start instance_3 &
let return_instance_3=$?
Would return_instance_[1|2|3] have the correct values if the instances took different amounts of time to start? If instance_3 starts before instance_1, for example, will return_instance_3 still get instance_3's result?
Basically, I'm trying to figure out how the command line treats an asynchronous request in regards to the exit status.
Thanks in advance
No; an exit status is only available once the command finishes (that's why it's called an "exit status"). If you successfully spawned a service and it is up and running, it does not have an exit status yet. Right after cmd &, $? is always 0, because bash reports the status of an asynchronous command as 0.
If I've guessed correctly what you're trying to accomplish, you could record $! after starting each instance, wait a "reasonable" time (a few seconds?), and check that the processes you started are still running. If they have terminated, there was a problem.
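If pdweb start exits once the instance is up (an assumption about WebSEAL's behavior), you can get each command's real exit status by saving $! and calling wait on it later. The sketch below needs bash 4+ for associative arrays. A hypothetical fake_start stands in for the real command and simulates unequal start-up times.

```shell
#!/usr/bin/env bash
# Sketch: launch several jobs in parallel, then collect each exit status.
# Needs bash 4+ (associative arrays). fake_start is a hypothetical stand-in for
# a real launcher such as `pdweb start NAME`, assumed to exit once it is done.

fake_start() {
    case $1 in
        instance_1) sleep 2 ;;              # slowest to finish
        instance_2) sleep 1; return 1 ;;    # pretend this one fails
        *)          sleep 0 ;;              # fastest
    esac
}

false &
launch_status=$?    # 0 even though false fails: `&` only reports the launch
wait "$!"
late_status=$?      # 1: the real exit status, available once the job ends

declare -A pid_of status_of
for app in instance_1 instance_2 instance_3; do
    # Real script: ${PDCOMMAND} >>"${LOGDIR}/${app}.txt" 2>&1 &
    fake_start "$app" &
    pid_of[$app]=$!            # PID of the job just started
done

for app in "${!pid_of[@]}"; do
    wait "${pid_of[$app]}"     # blocks until that job ends; returns its status
    status_of[$app]=$?
done

for app in instance_1 instance_2 instance_3; do
    echo "$app exited with status ${status_of[$app]}"
done
```

Because each status is keyed by instance name, it doesn't matter which instance finishes first. If the command instead keeps running as the server itself, wait would block indefinitely, and the "still running after a few seconds" check described above is the better fit.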
I'm currently wrapping scripts in begin; rescue; end. This works, but it's annoying to comment and uncomment in two different places, and so on. Is there something like PHP's error_reporting(0);, but applied to the exit code and STDERR output?
You could try trapping the EXIT signal:
The special signal name "EXIT" or signal number zero will be invoked just prior to program termination.
Something like this should guarantee that your script always returns zero to the operating system:
Signal.trap('EXIT') { exit 0 }
For example, this script:
Signal.trap('EXIT') { exit 0 }
exit 1
actually returns zero to the OS, even though the script terminates with exit 1.
I'm not entirely sure what you're asking, so here is an answer based on my understanding; more detail would help. A def body acts as an implicit begin block, so you can rescue at the method level without an explicit begin:
def run_script    # hypothetical method name
  # some code...
rescue
  abort
end