Why does the shell use 0 as success? - shell

In C and many other languages, 0 means false and 1 (or non-zero)
means true. In the shell, a process status of 0 means success
and non-zero means an error. Shell if statements essentially use 0
for true. Why did the writer of the first shell decide to use 0 for
true?

The OP asks:
Why did the writer of the first shell decide to use 0 for true?
The short answer:
It isn't possible to answer the question in this form,
because the first shell didn't have any exit status.
EDIT
Because @John Kugelman asked for a summary, this answer is getting a bit long.
Neither the second, third, nor Nth shell had anything that could be called "the exit status". For an approximately correct answer, we need to go deeper into the history.
The shell was invented
by Louis Pouzin as the RUNCOM tool in
MIT's Compatible Time-Sharing System.
Read about CTSS,
where it is written:
Louis Pouzin also invented RUNCOM for CTSS. This facility, the
direct ancestor of the Unix shell script, allowed users to create
a file-system file of commands to be executed, with parameter
substitution. Louis also produced a design for the Multics shell,
ancestor of the Unix shell.
Read also this page.
The "next level" was in Multics, called
the Multics Command Language:
The command language interpreter, the Shell, is normally driven by
the Listener.
Its function is to listen for requests in the form of command lines
typed in at the user console. In the above command language
description, the listener reads in a line from the console, evaluates
the line as a command, and re-calls itself to repeat the function.
It didn't have anything that could be called an "exit status".
Programs called the terminate{file/process} procedure,
which stopped the program's execution.
And this is one of the most important points. We need to differentiate between:
exit as a system call
When you terminate a running (compiled) program (for example, in C using exit(N)), in reality you call
a library function that bridges to the underlying operating system and correctly terminates the program.
This means that exit(N) (from the point of view of the exit function) is OS implementation-dependent.
See Wikipedia.
the shell's exit status
The exit status of an executed command is the value returned by the waitpid system call or equivalent function.
(Wikipedia.) The waitpid-like functions usually return the exit status from the previous point, but this is OS implementation-dependent. (Of course, it is now standardized by POSIX.)
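As a quick illustration of that chain (a sketch, assuming a POSIX-ish shell): the status the invoking shell reports via $? is exactly what its wait()-family call handed back, and deaths by signal are reported differently from plain exits.

```shell
# The child calls exit(42); the parent shell receives 42 via waitpid().
sh -c 'exit 42'; echo "$?"          # prints 42

# A child killed by a signal has no exit(N) value at all; most shells
# report such deaths as 128+signo instead.
sh -c 'kill -TERM $$'; echo "$?"    # prints 143 (128+15) in bash, dash, ksh
```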
Back to the history:
After Multics, UNIX was developed. From this page,
the shell as we know it today has two predecessors:
the Thompson shell (version 1 up to version 6) - if someone is interested, a runnable version of v6 is maintained here.
and the Programmer's Workbench (PWB) shell
The first shell as we know it today was the Bourne shell in V7.
If interested, read through the manuals - links are here.
The first mention of the "exit code" is in the
manual of the PWB (aka Mashey) shell.
All shells "before" talk only about:
Termination Reporting
If a command (not followed by "&") terminates abnormally,
a message is printed. (All terminations other than exit
and interrupt are considered abnormal.)
So, the answer to the question is in the above lines - in line with what has already been said in the comments:
Because the exit code is read as an error code: 0 means no error, >0 means some error.
Probably because there's only one mode of success and many modes of failure.
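A minimal sketch of the convention as today's shells implement it: if treats status 0 as "true" and anything else as "false".

```shell
# `true` exits 0, `false` exits 1; `if` and `||` branch on that status.
if true; then echo "status $? is success"; fi    # prints: status 0 is success
false || echo "status $? means failure"          # prints: status 1 means failure
```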
A nice citation from the PWB shell manual:
$r is the exit status code of the preceding command.
``0'' is the normal return from most commands.
Notice the word most. :) I.e., it was not invented in "one moment" by someone, but evolved over time. And it depends very tightly on the implementation of the exit and wait-like calls in the underlying OS.
For example, the 1st version of UNIX knew nothing about exit(N) levels - it was a simple exit(), and wait didn't return a specific value. From the first edition of the Unix Programmer's Manual, dated November 3, 1971:
exit is the normal means of terminating a process. All files are
closed and the parent process is notified if it is executing a wait.
wait causes its caller to delay until one of its child processes terminates. If any child has already died, return is immediate; if
there are no children, return is immediate with the error bit set.
Also, read The Evolution of the Unix Time-sharing System - really worth it - about the evolution of fork, exec, exit and wait...
V7 standardized the exit status as:
The value of a simple command is its
exit status if it terminates normally or 200+status if it
terminates abnormally (see signal(2) for a list of status
values).
also:
DIAGNOSTICS
Errors detected by the shell, such as syntax errors cause
the shell to return a non-zero exit status. If the shell is
being used non interactively then execution of the shell
file is abandoned. Otherwise, the shell returns the exit
status of the last command executed (see also exit).
And finally - sorry folks - totally off the topic - but I can't resist adding the following image from history - not many current users know Microsoft XENIX. ;)

In most processors, the processor status register indicates whether the last instruction produced a zero value. One can branch if zero (or non-zero) without explicitly comparing against zero (saving instruction cycles). Making zero mean success leaves multiple ways to return failure (and saves instructions).
This is actually a very poor convention. The VMS operating system from the '70s remains more advanced than nearly every other system in use today. It had a centralized error system in which applications could add their own error messages and codes, which would be integrated with system error codes. In the VMS scheme, odd values were success and even values were failure.
Let's say you had a function that opens a file. You could have one success code that says it opened an existing file and another that says it created a new file.

Related

are there any benefits of ending a bash script with exit

I encountered a bash script ending with the exit line. Would anything change (apart from scaring users who source the script rather than calling it directly, since their terminal closes)?
Note that I am not particularly interested in the difference between exit and return. Here I am only interested in the differences between having exit without parameters at the end of a bash script and not having it (one difference being that it closes the console or process that sources the script rather than calling it).
Could it be to address some lesser-known shell dialects?
There are generally no benefits to doing this. There are only downsides, specifically the inability to source scripts, as you say.
You can construct scenarios where it matters, such as having a sourcing script rely on it for termination on errors, or having a self-extracting archive header avoid executing its payload, but these unusual cases should not be the basis for a general guideline.
The one significant advantage is that it gives you explicit control over the return code.
Otherwise the return code of the script is going to be the return code of whatever the last command it executed happened to be, which may or may not be indicative of the actual success or failure of the script as a whole.
A slightly less significant advantage is that if the last command's exit code is significant, following it up with "exit $?" tells the maintenance programmer coming along later that yes, you did consider what the exit code of the program should be, and they shouldn't monkey with it without understanding why.
Conversely, of course, I wouldn't recommend ending a bash script with an explicit call to exit unless you really mean "ignore all previous exit codes and use this one". That is what anyone else looking at your code is going to assume you wanted, and they're going to be annoyed that you wasted their time trying to figure out why, if you did it just by rote and not for a reason.
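A sketch of the behaviours described above, with one-liner scripts standing in for real ones:

```shell
# Without an explicit exit, the script's status is that of its last command:
sh -c 'false'; echo "implicit: $?"            # implicit: 1

# An explicit exit overrides whatever came before:
sh -c 'false; exit 0'; echo "explicit: $?"    # explicit: 0

# `exit $?` changes nothing status-wise, but documents intent:
sh -c 'false; exit $?'; echo "preserved: $?"  # preserved: 1
```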

Does bash support exception?

Is it possible to raise exceptions in bash? This can be useful, for example, when we want the script to exit when an error happens in a subcommand. Without exceptions, it seems the best we can do is to append || exit after each subcommand, which gives poor readability.
I didn't find descriptions about exceptions in bash manual. But I'm wondering whether there are ways to simulate them.
No, Bash does not have a notion of exceptions the way languages like Java do. The key unit of error reporting in Bash is the exit code; functions, commands, and scripts all return 0 on success and non-zero to report some sort of error condition. Many programs document specific exit codes for certain failure modes; for instance, grep uses 1 to mean no-match-found and 2 to report other errors.
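A quick illustration of those documented grep statuses (-q suppresses output, so only the status matters):

```shell
printf 'hello\n' | grep -q hello;    echo "$?"   # 0: a match was found
printf 'hello\n' | grep -q bye;      echo "$?"   # 1: no match
grep -q x /no/such/file 2>/dev/null; echo "$?"   # 2: an error occurred
```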
There are a number of useful debugging tricks you can take advantage of despite the lack of exceptions, including the caller command which enables some introspection of the current execution context.
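One common way to simulate a try/catch on top of exit codes: run the "try" block in a subshell with set -e, then branch on its status. The function names here (try, risky) are hypothetical, not a bash feature:

```shell
# "try": run commands in a subshell where any failure aborts immediately.
try() { ( set -e; "$@" ); }

risky() {
  echo "before the failure"
  false                      # simulated error; aborts the subshell
  echo "never reached"
}

try risky
status=$?
if [ "$status" -ne 0 ]; then
  echo "caught failure (status $status)"   # the "catch" branch
fi
```

Because the set -e lives in a subshell, it does not leak into the caller, and the caller can inspect the status like any other command's.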
Other resources:
How to debug a bash script?
Trace of executed programs called by bash script
Accessing function call stack in trap function

Exit command examples

I want to press a key at any point, causing the simulation to stop without losing the data collected until that point. I don't know how to use the exit command. Can you give me some examples?
I think WandMaker's comment tells only half of the story.
First, there is no general rule that Control-C will interrupt your program (see, for instance, here), but assume that this works in your case (since it will work in many cases):
If I understand you right, you want to somehow "process" the data collected up to this point. This means that you need to intercept the effect of Control-C (which, IF it works as expected, will make the controlling shell deliver a SIGINT), or that you need to intercept the "exit" (since the default behaviour upon receiving a SIGINT is to exit the program).
If you want to go along the first path, you need to catch the Interrupt exception; see for example here.
If you want to follow the second route, you need to install an exit handler. Note that it will also be called when the program exits in the normal way.
If you are unsure which way is better - and I see no general way to recommend one over the other - try the first one. There is less chance that you will accidentally ruin something.
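If the simulation happens to be a shell script, the second route looks like this sketch (the save step is hypothetical): an EXIT trap acts as the exit handler, and an INT trap turns Ctrl-C into an ordinary exit so the handler still runs.

```shell
#!/bin/sh
save_data() { echo "saving collected data"; }   # hypothetical save step

trap save_data EXIT      # exit handler: also runs on normal termination
trap 'exit 130' INT      # on Ctrl-C, exit cleanly (which fires the EXIT trap)

echo "simulation running"
# ... long-running collection loop would go here ...
```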

Why do some people exit -1 rather than exit 1 on error?

I'm writing a rather trivial bash script; if I detect an error (not with a bad exit status from some other process) I want to exit with an exit status indicating an error (without being too specific).
It seems like I should be doing exit 1 (e.g. as per the TLDP Advanced Bash Scripting Guide, and the C Standard Library's stdlib.h header); yet I notice many people exit -1. Why is that?
TLDP's ABS is of questionable validity (in that it often uses sub-par practices without comment), so I wouldn't take it as a particular bastion of correctness on this.
That said, valid command return codes are between 0 and 255, with 0 being "success". So yes, 1 is a perfectly valid (and common) error code.
Obviously I cannot say for certain why other people do that but I have two thoughts on the topic.
A failure to context switch (possibly combined with a lack of domain knowledge).
In many languages a return value of -1 from a function is a perfectly valid value and stands out from all the positive values that might (one assumes) normally be returned.
So attempting to extend that pattern (which the writer has picked up over time) to a shell script/etc. is a reasonable thing for them to do, especially if they don't have the domain knowledge to realize that valid return codes are between 0 and 255.
An attempt to have those error exit lines "stand out" from normal exit cases (which may or may not be successful exits themselves) in an attempt to visually distinguish a certain set of extremely unlikely or otherwise extraordinary exit cases.
An exit of -1 does actually work; it just doesn't get you a return code of -1, it gets you a return code of 255. (Try (exit -1); echo $? in your shell to see that.) So this isn't an entirely unreasonable thing to want to do (despite being confusing and complicit in perpetuating confusion about exit codes).
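A quick demonstration of the wraparound in bash (dash rejects the negative argument outright, which is one more reason to avoid it):

```shell
( exit -1 );  echo "$?"   # 255: -1 wrapped modulo 256
( exit 255 ); echo "$?"   # 255: the same status, stated honestly
( exit 256 ); echo "$?"   # 0:   wraps all the way around to "success"
```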

Is there a minimally POSIX.2 compliant shell?

Is there a minimally POSIX.2 compliant shell (let's call it mpcsh) in the following sense:
if mpcsh myscript.sh behaves correctly on my (compliant) system then xsh myscript.sh will behave identically for any POSIX.2 compliant shell xsh on any compliant system. ("Identically" up to less relevant things like the wording of error messages etc.)
Does dash qualify?
If not, is there any way to verify compliance of myscript.sh?
Edit (9 years later):
The accepted answer still stands, but have a look at this blog post and the checkbashisms command (source). Avoiding bashisms is not the same as writing a POSIX.2 compliant shell script, but it comes close.
The sad answer in advance:
It won't help you (not as much and as reliably as you would expect and want it to, anyway).
Here is why.
One big problem that cannot be addressed by a virtual "POSIX shell" is behavior that is ambiguously worded or simply not addressed in the standard, so that shells may implement it in different ways while still adhering to the standard.
Take these two examples regarding pipelines, the first of which is well known:
Example 1 - scoping
$ ksh -c 'printf "foo" | read s; echo "[${s}]"'
[foo]
$ bash -c 'printf "foo" | read s; echo "[${s}]"'
[]
ksh executes the last command of a pipe in the current shell, whereas bash executes all - including the last command - in a subshell. bash 4 introduced the lastpipe option which makes it behave like ksh:
$ bash -c 'shopt -s lastpipe; printf "foo" | read s; echo "[${s}]"'
[foo]
All of this is (debatably) according to the standard:
Additionally, each command of a multi-command pipeline is in a subshell environment; as an extension, however, any or all commands in a pipeline may be executed in the current environment.
I am not 100% certain what they meant by extension, but based on other examples in the document it does not mean that the shell has to provide a way to switch between behaviors, simply that it may, if it wishes, implement things in this "extended way". Other people read this differently and argue that the ksh behavior is non-standards-compliant, and I can see why. Not only is the wording unfortunate, it is not a good idea to allow this in the first place.
In practice it doesn't really matter which behavior is correct, since those are the "two big shells" and people assume that if you avoid their extensions and write only supposedly POSIX-compliant code it will work in either; but the truth is that if you rely on one or the other behavior mentioned above, your script can break in horrible ways.
Example 2 - redirection
This one I learnt about just a couple of days ago, see my answer here:
foo | bar 2>./qux | quux
Common sense and POLA tell me that when the next line of code is hit, both quux and bar should have finished running, meaning that the file ./qux is fully populated. Right? No.
POSIX states that
If the pipeline is not in the background (see Asynchronous Lists), the shell shall wait for the last command specified in the pipeline to complete, and may also wait for all commands to complete.
May (!) wait for all commands to complete! WTH!
bash waits:
The shell waits for all commands in the pipeline to terminate before returning a value.
but ksh doesn't:
Each command, except possibly the last, is run as a separate process; the shell waits for the last command to terminate.
So if you use redirection in the middle of a pipeline, make sure you know what you are doing, since this is treated differently across shells and can break horribly in edge cases, depending on your code.
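A defensive sketch: if downstream code needs the middle stage's redirected output to be complete, arrange for the stage that writes it to be the last stage of its pipeline, which every POSIX shell must wait for (the diagnostic text here is just an illustration):

```shell
tmp=$(mktemp)

# The command group writing "$tmp" is the LAST stage of this pipeline,
# so every POSIX shell waits for it before moving on:
printf 'data\n' | { cat >/dev/null; echo 'diagnostic' >&2; } 2>"$tmp"

# Only now is "$tmp" guaranteed complete, regardless of shell:
cat "$tmp"        # prints: diagnostic

rm -f "$tmp"
```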
I could give another example not related to pipelines, but I hope these two suffice.
Conclusion
Having a standard is good, continuously revising it is even better and adhering to it is great. But if the standard fails due to ambiguity or permissiveness things can still unexpectedly break practically rendering the usefulness of the standard void.
What this means in practice is that on top of writing "POSIX-compliant" code you still need to think and know what you are doing to prevent certain things from happening.
All that being said, one shell which has not yet been mentioned is posh, which is supposedly POSIX plus even fewer extensions than dash has (primarily echo -n and the local keyword), according to its manpage:
BUGS
Any bugs in posh should be reported via the Debian BTS.
Legitimate bugs are inconsistencies between manpage and behavior,
and inconsistencies between behavior and Debian policy
(currently SUSv3 compliance with the following exceptions:
echo -n, binary -a and -o to test, local scoping).
YMMV.
Probably the closest thing to a canonical shell is ash, which is maintained by The NetBSD Foundation, among other organizations.
A downstream variant of this shell, called dash, is better known.
Currently, there is no single role model for the POSIX shell.
Since the original Bourne shell, the POSIX shell has adopted a number of additional features.
All of the shells that I know that implement those features also have extensions that go beyond the feature set of the POSIX shell.
For instance, POSIX allows for arithmetic expressions in the format:
var=$(( expression ))
but it does not allow the equivalent:
(( var = expression ))
supported by bash and ksh93.
I know that bash has a set -o posix option, but that will not disable any extensions.
$ set -o posix
$ (( a = 1 + 1 ))
$ echo $a
2
To the best of my knowledge, ksh93 tries to conform to POSIX out of the box, but still allows extensions.
The POSIX developers spent years (not an exaggeration) wrestling with the question: "What does it mean for an application program to conform to the standard?" While the POSIX developers were able to define a conformance test suite for an implementation of the standards (POSIX.1 and POSIX.2), and could define the notion of a "strictly conforming application" as one which used no interface beyond the mandatory elements of the standard, they were unable to define a testing regime that would confirm that a particular application program was "strictly conforming" to POSIX.1, or that a shell script was "strictly conforming" to POSIX.2.
The original question seeks just that: a conformance test that verifies a script uses only elements of the standard which are fully specified. Alas, the standard is full of "weasel words" that loosen definitions of behavior, making such a test effectively impossible for a script of any significant level of usefulness. (This is true even setting aside the fact that shell scripts can generate and execute shell scripts, thus rendering the question of "strictly conforming" equivalent to the Halting Problem.)
(Full disclosure: I was a working member and committee leader within IEEE-CS TCOS, the creators of the POSIX family of standards, from 1988-1999.)
If not, is there any way to verify compliance of myscript.sh?
This is basically a case of Quality Assurance. Start with:
code review
unit tests (yes, I've done this)
functional tests
perform the test suite with as many different shell programs as you can find.
(ash, bash, dash, ksh93, mksh, zsh)
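The last step above can be sketched as a simple loop (tests.sh is your hypothetical test entry point):

```shell
# Run the same test suite under every shell that happens to be installed.
for sh in sh ash bash dash ksh93 mksh zsh; do
  command -v "$sh" >/dev/null 2>&1 || continue   # skip missing shells
  if "$sh" ./tests.sh; then
    echo "$sh: PASS"
  else
    echo "$sh: FAIL"
  fi
done
```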
Personally, I aim for the common set of extensions supported by both bash and ksh93. They're the oldest and most widely available interpreters of the shell language.
EDIT: Recently I happened upon rylnd/shpec - a testing framework for your shell code. You can describe features of your code in test cases and specify how they can be verified.
Disclosure: I helped make it work across bash, ksh, and dash.

Resources