Best practices on setting exit status codes - bash

When implementing my own scripts, is it the best practice to exit with different exit codes for different failure scenarios? Or should I just return exit code 1 for failure and 0 for success providing the reason on stderr?

Providing a descriptive error message to stderr is fine and well for interactive users, but if you expect your scripts to be used by other scripts/programs, you should have distinctive error codes for different failures, so the calling script could make an informed decision on how to handle the failure.
If the calling program does not wish to handle different failures differently, it can always just test whether the return code is non-zero, but don't assume this is the case.
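As a rough sketch of that idea in Python: the exit codes 2 and 3 and their meanings are made up for illustration, and the child is an inline one-liner standing in for a real script. The caller branches on the specific code and still falls back to a generic non-zero check:

```python
import subprocess
import sys

# Hypothetical exit codes for an imaginary script; not a standard.
EXIT_OK = 0
EXIT_BAD_INPUT = 2
EXIT_UPSTREAM_DOWN = 3

def run_child(code):
    """Run a stand-in child process that simply exits with `code`."""
    child = [sys.executable, "-c", f"import sys; sys.exit({code})"]
    return subprocess.run(child).returncode

def decide(returncode):
    """Map the child's exit status to an action the caller could take."""
    if returncode == EXIT_OK:
        return "continue"
    if returncode == EXIT_UPSTREAM_DOWN:
        return "retry later"
    if returncode == EXIT_BAD_INPUT:
        return "fix input and rerun"
    return "abort"  # any other non-zero status: generic failure

for code in (EXIT_OK, EXIT_BAD_INPUT, EXIT_UPSTREAM_DOWN, 1):
    print(code, "->", decide(run_child(code)))
```

A caller that does not care about the distinction only needs the last branch, which is why distinct codes cost it nothing.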

There are some recommendations (see Wikipedia), but none of them are normative, except the rule that the status is 0 if and only if the command succeeded:
*In Unix and other POSIX-compatible systems, the wait system call sets a status value of type int packed as a bitfield with various types of child termination information. If the child terminated by exiting (as determined by the WIFEXITED macro; the usual alternative being that it died from an uncaught signal), SUS specifies that the low-order 8 bits of the exit status can be retrieved from the status value using the WEXITSTATUS macro in wait.h;[6][7] when using the POSIX waitid system call (added with POSIX-2001), the range of the status is no longer limited and can be in the full integer range.
POSIX-compatible systems typically use a convention of zero for success and non zero for error.[8] Some conventions have developed as to the relative meanings of various error codes; for example GNU recommend that codes with the high bit set be reserved for serious errors,[3] and FreeBSD have documented an extensive set of preferred interpretations.[9] Meanings for 15 status codes 64 through 78 are defined in sysexits.h. These historically derive from sendmail and other message transfer agents, but they have since found use in many other programs.[10]*
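To make the quoted mechanics concrete, here is a small Unix-only sketch using Python's os module, which exposes the same wait macros. The choice of EX_USAGE as the child's exit code is arbitrary and only for illustration:

```python
import os

pid = os.fork()
if pid == 0:
    # Child: exit immediately with the sysexits.h "usage error" code.
    os._exit(os.EX_USAGE)

# Parent: waitpid returns the packed status, not the exit code itself.
_, status = os.waitpid(pid, 0)

exit_code = None
if os.WIFEXITED(status):
    # Normal termination: extract the low-order 8 bits of the exit status.
    exit_code = os.WEXITSTATUS(status)
    print("child exited with", exit_code)
elif os.WIFSIGNALED(status):
    # The usual alternative: the child died from an uncaught signal.
    print("child killed by signal", os.WTERMSIG(status))
```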


I have some Fortran code that I'm parallelizing with MPI which is doing truly bizarre things. First, there's a variable nstartg that I broadcast from the boss process to all the workers:
call mpi_bcast(nstartg,1,mpi_integer,0,mpi_comm_world,ierr)
The variable nstartg is never altered again in the program. Later on, I have the boss process send eproc elements of an array edge to the workers:
if (me==0) then
  do n=1,ntasks-1
    ! (determine the starting point estart and the number eproc
    !  of values to send)
    call mpi_send(edge(estart),eproc,mpi_integer,n,n,mpi_comm_world,ierr)
  end do
end if
with a matching receive statement if me is non-zero. (I've left out some other code for readability; there's a good reason I'm not using scatterv.)
Here's where things get weird: the variable nstartg gets altered to n instead of keeping its actual value. For example, on process 1, after the mpi_recv, nstartg = 1, and on process 2 it's equal to 2, and so forth. Moreover, if I change the code above to
call mpi_send(edge(estart),eproc,mpi_integer,n,n+1234567,mpi_comm_world,ierr)
and change the tag accordingly in the matching call to mpi_recv, then on process 1, nstartg = 1234568; on process 2, nstartg = 1234569, etc.
What on earth is going on? All I've changed is the tag that mpi_send/recv are using to identify the message; provided the tags are unique so that the messages don't get mixed up, this shouldn't change anything, and yet it's altering a totally unrelated variable.
On the boss process, nstartg is unaltered, so I can fix this by broadcasting it again, but that's hardly a real solution. Finally, I should mention that compiling and running this code using electric fence hasn't picked up any buffer overflows, nor did -fbounds-check throw anything at me.
The most probable cause is that you pass an INTEGER scalar as the actual status argument to MPI_RECV, when it should really be declared as an array with an implementation-specific size, available as the MPI_STATUS_SIZE constant:
integer :: status(MPI_STATUS_SIZE)
The receive operation writes the message tag into one of the status fields (its implementation-specific index is available as the MPI_TAG constant, so the value can be read as status(MPI_TAG)). If your status is just a scalar INTEGER, the receive writes past it and overwrites several other local variables. In your case it just so happens that nstartg sits right above status on the stack.
If you do not care about the receive status, you can pass the special constant MPI_STATUS_IGNORE instead.
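The overwrite can be mimicked with a toy model in Python's ctypes. This is not MPI: STATUS_SIZE, TAG_INDEX, the Frame layout and fake_recv are all hypothetical stand-ins, and a real compiler may lay out locals differently. It only shows how writing a multi-element status into a scalar spills into the neighbouring variables:

```python
import ctypes

STATUS_SIZE = 4  # hypothetical stand-in for MPI_STATUS_SIZE
TAG_INDEX = 1    # hypothetical stand-in for the MPI_TAG index (0-based here)

class Frame(ctypes.Structure):
    # Toy "stack frame": a scalar status followed directly by other locals.
    _fields_ = [
        ("status", ctypes.c_int),
        ("nstartg", ctypes.c_int),
        ("other_a", ctypes.c_int),
        ("other_b", ctypes.c_int),
    ]

def fake_recv(status_addr, source, tag):
    """Fill STATUS_SIZE ints at status_addr, as a receive fills a status array."""
    fields = (ctypes.c_int * STATUS_SIZE).from_address(status_addr)
    fields[0] = source
    fields[TAG_INDEX] = tag
    fields[2] = 0
    fields[3] = 0

frame = Frame(status=0, nstartg=42, other_a=7, other_b=8)
# The callee believes it was handed a 4-element array, but only 1 int is "status".
fake_recv(ctypes.addressof(frame), source=0, tag=1234568)

print(frame.nstartg, frame.other_a, frame.other_b)
```

In this model nstartg ends up holding the tag, which matches the symptom described in the question.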

How to define Severity in SNMP?

Hi, I am trying to understand the SNMP trap mechanism. From what I have read, there are two types of traps, generic and enterprise-specific. Now, in my Java code, I want to capture the description from a specific OID:
// variable binding for Enterprise Specific objects, Severity (should be defined in MIB file)
pdu.add(new VariableBinding(new OID(trapOid), new OctetString("Major")));
Here, instead of "Major", what should I specify to get the severity for that specific OID?
Any help would be highly appreciated.
In general, the severity is not an attribute of an SNMP trap.
Usually a custom severity mapping is defined in the vendor-specific MIB file as a variable binding of the specific trap. Here is an example:
sysLogMessageSeverity OBJECT-TYPE
    SYNTAX INTEGER {
        emergency (0),      --system is unusable
        alert (1),          --action must be taken immediately
        critical (2),       --critical conditions
        error (3),          --error conditions
        warning (4),        --warning conditions
        notice (5),         --normal but significant condition
        informational (6),  --informational messages
        debug (7)           --debug-level messages
    }
    ACCESS read-only
    STATUS mandatory
    DESCRIPTION
        "Severity level of the message"
    ::= { sysLogMibObjects 5 }
Please also note that most modern NMSs allow the user to assign a custom severity to any received SNMP trap based on user-defined rules.
I have used two approaches before:
1. Adding a severity variable to the MIB and including it in every trap sent.
2. Classifying the events that cause traps as Critical, Major, and so on, and assigning an enterprise trap id range to each class. For example, traps with ids in the range (1,100) are Critical, traps with ids in the range (101,200) are Major, and so on.
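On the receiving side, both approaches boil down to a small lookup. The following is only a sketch: the enum values are copied from the sysLogMessageSeverity example earlier in the thread, the ranges are assumed to be inclusive, and the names SyslogSeverity, TRAP_ID_RANGES and severity_from_trap_id are made up for illustration.

```python
from enum import IntEnum

class SyslogSeverity(IntEnum):
    """Approach 1: decode a severity varbind carried in the trap."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFORMATIONAL = 6
    DEBUG = 7

# Approach 2: severity implied by the enterprise trap id range.
# Only the two ranges mentioned in the answer are listed (assumed inclusive).
TRAP_ID_RANGES = [
    (range(1, 101), "Critical"),
    (range(101, 201), "Major"),
]

def severity_from_trap_id(trap_id):
    """Return the severity class for a trap id, or None if unclassified."""
    for id_range, severity in TRAP_ID_RANGES:
        if trap_id in id_range:
            return severity
    return None

print(SyslogSeverity(2).name)
print(severity_from_trap_id(150))
```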

How to identify an unknown exit code?

I have the problem that a Mac application I wrote often exits suddenly with exit code 33, which means nothing to me, and without any further indication of what went wrong. I already searched the whole source code for the number 33, but I couldn't find anything (I was hoping for a line of code like exit(33)).
Can you give me any hint on how I could track down this problem? Is there a way, for example, to set a breakpoint on the exit function or something like that?
There are no predefined meanings for a process's exit code. The C standard defines EXIT_SUCCESS and EXIT_FAILURE without numeric values. On Unix-like systems they are defined as 0 and 1. Unix limits exit codes to an unsigned 8-bit integer, so they range from 0 to 255, but the meaning of each exit code (except 0 for success) is up to the developer.
FreeBSD defines a couple of values as documented on the sysexits(3) manpage. But the number 33 is not among them.
The best way to debug this problem would be to set breakpoints on the various exit functions (exit, _exit) and see when and where they get called.
The problem was that there was an exit call, exit(12321), in my code, which gets reported in the console as 33. It seems the status parameter of exit(int) cannot be an arbitrary int value: only the low 8 bits reach the parent, and 12321 = 48 * 256 + 33.
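The truncation is easy to reproduce. This sketch assumes a Unix-like system and uses an inline Python child as a stand-in for the Mac application:

```python
import subprocess
import sys

requested = 12321

# The child asks to exit with a status far outside 0..255.
child = [sys.executable, "-c", f"import os; os._exit({requested})"]
reported = subprocess.run(child).returncode

print("low 8 bits of requested status:", requested & 0xFF)
print("status seen by the parent:", reported)
```

So when an unknown exit code shows up, it is worth searching the source not only for that number but for any exit argument that equals it modulo 256.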

EINTR and non-blocking calls

As is known, some blocking calls like read and write can return -1 and set errno to EINTR, and we need to handle this.
My question is: does this also apply to non-blocking calls, e.g. on a socket set to O_NONBLOCK?
Some articles and sources I have read say that non-blocking calls don't need to bother with this, but I have found no authoritative reference for it. If that is the case, does it hold across different implementations?
I cannot give you a definitive answer to this question, and the answer may further vary from system to system, but I would expect a non-blocking socket to never fail with EINTR. If you take a look at the man pages of various systems for the socket functions bind(), connect(), send(), and recv(), or look those up in the POSIX standard, you'll notice something interesting: all these functions except one may return -1 and set errno to EINTR. The one function that is not documented to ever fail with EINTR is bind(). And bind() is also the only function in that list that will never block by default. So it seems that only blocking functions may fail because of EINTR, including read() and write(); if these functions never block, they will also never fail with EINTR, and if you use O_NONBLOCK, those functions will never block.
It would also make no sense from a logical perspective. E.g. consider you are using blocking I/O and you call read() and this call has to block, but while it was blocking, a signal is sent to your process and thus the read request is unblocked. How should the system handle this situation? Claiming that read() did succeed? That would be a lie; it did not succeed because no data was read. Claiming it did succeed, but zero bytes of data were read? This wouldn't be correct either, since a "zero read result" is used to indicate end-of-stream (or end-of-file), so your process would have to assume that no data was read because the end of a file has been reached (or a socket/pipe has been closed at the other end), which simply isn't the case. The end-of-file (or end-of-stream) has not been reached; if you call read() again, it will be able to return more data. So that would also be a lie. Your expectation is that this read call either succeeds and reads data or fails with an error. Thus the read call has to fail and return -1 in that case, but what errno value shall the system set? All the other error values indicate a critical error with the file descriptor, yet there was no critical error and indicating such an error would also be a lie. That's why errno is set to EINTR, which means: "There was nothing wrong with the stream. Your read call just failed, because it was interrupted by a signal. If it wasn't interrupted, it may still have succeeded, so if you still care for the data, please try again."
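The "please try again" contract for blocking calls is usually implemented as a retry loop. Below is a minimal sketch in Python; retry_on_eintr and flaky_read are hypothetical names, and flaky_read only simulates a read that is interrupted twice. Note that since Python 3.5 the standard library already retries most system calls on EINTR by itself, so the loop is shown purely to illustrate the idea.

```python
import errno

def retry_on_eintr(call, *args):
    """Call `call(*args)` again for as long as it fails with EINTR."""
    while True:
        try:
            return call(*args)
        except InterruptedError:  # the OSError subclass used for errno == EINTR
            continue

# Stand-in for a blocking read that is interrupted twice before succeeding.
attempts = []

def flaky_read(nbytes):
    attempts.append(nbytes)
    if len(attempts) < 3:
        raise InterruptedError(errno.EINTR, "Interrupted system call")
    return b"x" * nbytes

data = retry_on_eintr(flaky_read, 4)
print(data, "after", len(attempts), "attempts")
```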
If you now switch to non-blocking I/O, the situation above never arises. The read call will never block, and if it cannot read data immediately, it will fail with the error EAGAIN or EWOULDBLOCK (POSIX allows the two names to share one value, and on Linux they do), which means: "There is no data available right now and thus your read call would have to block and wait for data arriving, but blocking is not allowed, so it failed instead." So there is an error for every situation that may arise.
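The non-blocking case can be observed directly. This is a small sketch using a local socket pair on a Unix-like system; Python surfaces EAGAIN/EWOULDBLOCK as BlockingIOError:

```python
import errno
import select
import socket

a, b = socket.socketpair()
a.setblocking(False)  # sets O_NONBLOCK on the descriptor

try:
    a.recv(1024)      # nothing has been sent yet, so this cannot complete
    err = None
except BlockingIOError as exc:
    err = exc.errno   # EAGAIN or EWOULDBLOCK
print("recv with no data failed with", errno.errorcode.get(err))

# Once data is available, the same non-blocking call succeeds.
b.sendall(b"hi")
select.select([a], [], [], 1.0)  # wait (up to 1 s) until `a` is readable
data = a.recv(1024)
print("recv with data returned", data)

a.close()
b.close()
```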
Of course, even with non-blocking I/O, the read call may be temporarily interrupted by a signal, but why would the system have to indicate that? Every function call, whether it is a system function or one written by the user, may be temporarily interrupted by a signal, really every single one, no exception. If the system had to inform the user whenever that happens, all system functions could possibly fail because of EINTR. However, even if there was a signal interruption, the functions usually perform their task all the way to the end, which is why the interruption is irrelevant. The error EINTR is used to tell the caller that the requested action was not performed because of a signal interruption, but in the case of non-blocking I/O there is no reason why the function should not perform the read or the write request, unless it cannot be performed right now, and then this can be indicated by an appropriate error.
To confirm my theory, I took a look at the kernel of MacOS (10.8), which is still largely based on the FreeBSD kernel and it seems to confirm the suspicion. If a read call is currently not possible, as no data are available, the kernel checks for the O_NONBLOCK flag in the file descriptor flags. If this flag is set, it fails immediately with EAGAIN. If it is not set, it puts the current thread to sleep by calling a function named msleep(). The function is documented here (as I said, OS X uses plenty of FreeBSD code in its kernel). This function causes the current thread to sleep until it is explicitly woken up (which is the case if data becomes ready for reading) or a timeout has been hit (e.g. you can set a receive timeout on sockets). Yet the thread is also woken up, if a signal is delivered, in which case msleep() itself returns EINTR and the next higher layer just passes this error through. So it is msleep() that produces the EINTR error, but if the O_NONBLOCK flag is set, msleep() is never called in the first place, hence this error cannot be returned.
Of course that was MacOS/FreeBSD, and other systems may be different. But since most systems try to keep at least a certain level of consistency among these APIs, if a system breaks the assumption that non-blocking I/O calls can never fail because of EINTR, this is probably not intentional and may even get fixed if you report it.
@Mecki Great explanation. To add to the accepted answer, the book "Unix Network Programming - Volume 1, Third Edition" (Stevens) makes a distinction between slow system calls and others in section 5.9, "Handling Interrupted System Calls". Quoting from the book:
We used the term "slow system call" to describe accept, and we use
this term for any system call that can block forever. That is, the
system call need never return.
In the next para of the same section -
The basic rule that applies here is that when a process is blocked in
a slow system call and the process catches a signal and the signal
handler returns, the system call can return an error of EINTR.
Going by this explanation, a read / write on a non-blocking socket is not a slow system call and hence should not return an error of EINTR.
Just to add some evidence to @Mecki's answer, I found this discussion about fixing a bug in Linux where a patch caused non-blocking recvmsg to return EINTR. It was stated:
EINTR always means that you asked for a blocking operation, and a
signal arrived meanwhile.
Once you invert the "blocking" part of that set of conditions, EINTR
becomes an impossible event.
Look at what we do for AF_INET. We handle this the proper way.
If we are 'interrupted' by a signal while sleeping in lock_sock(),
recvmsg() on a non blocking socket, we return -EAGAIN properly, not -EINTR.
Fact that we potentially sleep to get the socket lock is hidden for
the user, its an implementation detail of the kernel.
We never return -EINTR, as stated in manpage for non blocking sockets.
Source here:

EWOULDBLOCK equivalent errno under Windows Perl

G'day Stackoverflowers,
I'm the author of Perl's autodie pragma, which changes Perl's built-ins to throw exceptions on failure. It's similar to Fatal, but with lexical scope, an extensible exception model, more intelligent return checking, and much, much nicer error messages. It will be replacing the Fatal module in future releases of Perl (provisionally 5.10.1+), but can currently be downloaded from the CPAN for Perl 5.8.0 and above.
The next release of autodie will add special handling for calls to flock with the LOCK_NB (non-blocking) option. While a failed flock call would normally result in an exception under autodie, a failed call to flock using LOCK_NB will merely return false if the returned errno ($!) is EWOULDBLOCK.
The reason for this is so people can continue to write code like:
use Fcntl qw(:flock);
use autodie; # All perl built-ins now succeed or die.
open(my $fh, '<', 'some_file.txt');
my $lock = flock($fh, LOCK_EX | LOCK_NB); # Lock the file if we can.
if ($lock) {
    # Opportunistically do something with the locked file.
}
In the above code, a lock that fails because someone else has the file locked already (EWOULDBLOCK) is not considered to be a hard error, so autodying flock merely returns a false value. In the situation that we're working with a filesystem that doesn't support file-locks, or a network filesystem and the network just died, then autodying flock generates an appropriate exception when it sees that our errno is not EWOULDBLOCK.
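For comparison, the same policy ("false if the lock is merely held elsewhere, exception for anything else") can be sketched in Python on a Unix-like system. The helper name try_lock is made up, the fcntl module does not exist on Windows, and the sketch relies on flock treating two separate opens of one file as competing lock holders, which may not hold on some network filesystems:

```python
import errno
import fcntl
import tempfile

def try_lock(fileobj):
    """Return True if the lock was taken, False if someone else holds it.

    Any other failure is re-raised instead of being reported as False.
    """
    try:
        fcntl.flock(fileobj, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError as exc:
        if exc.errno in (errno.EWOULDBLOCK, errno.EAGAIN):
            return False  # soft failure: the file is already locked
        raise             # hard failure: e.g. locking not supported

with tempfile.NamedTemporaryFile() as first:
    # A second, independent open of the same file competes for the lock.
    with open(first.name, "rb") as second:
        got_first = try_lock(first)
        got_second = try_lock(second)

print(got_first, got_second)
```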
This works just fine in my dev version on Unix-flavoured systems, but it fails horribly under Windows. It appears that while Perl under Windows supports the LOCK_NB option, it doesn't define EWOULDBLOCK. Instead, the errno returned is 33 ("Domain error") when blocking would occur.
Obviously I can hard-code this as a constant into autodie, but that's not what I want to do here, because it means that I'm screwed if the errno ever changes (or has changed). I would love to compare it to the Windows equivalent of POSIX::EWOULDBLOCK, but I can't for the life of me find where such a thing would be defined. If you can help, let me know.
Answers I specifically don't want:
Suggestions to hard-code it as a constant (or worse still, leave a magic number floating about).
Not supporting LOCK_NB functionality at all under Windows.
Assuming that any failure from a LOCK_NB call to flock should return merely false.
Suggestions that I ask on p5p or perlmonks. I already know about them.
An explanation of how flock, or exceptions, or Fatal work. I already know. Intimately.
Under Win32 "native" Perl, note that $^E is more descriptive at 33, "The process cannot access the file because another process locked a portion of the file" which is ERROR_LOCK_VIOLATION (available from Win32::WinError).
For the Windows-specific error code, you want to use $^E. In this case, it's 33: "The process cannot access the file because another process has locked a portion of the file" (ERROR_LOCK_VIOLATION in winerror.h).
Unfortunately, I don't think Win32::WinError is in core. On the other hand, if Microsoft ever renumbered the Windows error codes, pretty much every Windows program ever written would stop working, so I don't think there'll be a problem with hardcoding it.
