I'm trying to find a way to emulate the behavior of set -e in a function, but only within the scope of that function.
Basically, I want a function where, if any simple command fails (as would trigger set -e), the function returns 1 to its caller. The goal is to isolate sets of risky jobs into functions so that I can gracefully handle them.
If you want any failing command to return 1, you can achieve that by following each command with || return 1.
For instance:
false || return 1 # This will always return 1
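Applied to a whole function, the pattern looks like this (a minimal sketch; risky_job and the commands inside it are just placeholders):
risky_job() {
    mkdir -p /tmp/workdir || return 1
    cp source.txt /tmp/workdir/ || return 1
    rm -r /tmp/workdir || return 1
}

if risky_job; then
    echo "risky_job succeeded"
else
    echo "risky_job failed, handling it gracefully"
fi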
I am a big fan of never letting any command fail without explicit handling. For my scripts, I use an exception-handling technique where errors are signalled by something other than return codes, and every error is trapped (with bash traps). Any command with a non-zero return code automatically means an improperly handled situation or a bug, and I prefer my scripts to fail as soon as such a situation occurs.
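A minimal sketch of that trap-based style, under the assumption that any untrapped non-zero status should kill the script (the message text here is mine, not a standard API):
#!/bin/bash
set -o errtrace   # make the ERR trap fire inside functions and subshells too
trap 'echo "Unhandled error (exit $?) near line $LINENO" >&2; exit 1' ERR

echo "before"
false             # stands in for any command with an unhandled non-zero status
echo "never reached"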
Caution: I highly advise against using the following technique (the second half of the example shows why). If you run the function in a subshell environment, you almost get the behavior you desire. Consider:
#!/bin/bash
foo() ( # use parentheses to run the function body in a subshell
    set -e # does not affect the main script
    echo This is executed
    false
    echo 'This should *not* be executed' # quoted so the asterisks cannot glob
)

foo # function call fails, returns 1
echo return: $?

# BUT: this is a good reason to avoid this technique
if foo; then # set -e is ignored when foo is called as a condition
    echo Foo returned 0!!
else
    echo fail
fi

false # demonstrates that set -e is not set in the main script
echo ok
It seems like you are looking for "nested exceptions", somewhat like what Java gives you. For your scoping requirement, how about doing a set -e at the beginning of the function and making sure to run set +e before returning from it?
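In sketch form (note the caveat: if a command fails before set +e runs, errexit stays enabled in the caller, so this scoping is only approximate):
my_function() {
    set -e       # abort on the first failing command below
    step_one     # placeholder commands
    step_two
    set +e       # restore normal behaviour before returning
}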
Another idea, which is neither efficient nor convenient, is to call your function in a subshell:
# some code
(set -e; my_function)
if [[ $? -ne 0 ]]; then
    # the function didn't succeed...
fi
# more code
In any case, please be aware that set -e is not the greatest way to handle errors in a shell script; it has far too many corner cases to be reliable. See these related posts:
What does set -e mean in a bash script?
Error handling in Bash
The approach I take for large scripts that need to exist for a long time in a production environment is:
create a library of functions to do all the standard stuff
the library will have a wrapper around each standard action (say, mv, cp, mkdir, ln, rm, etc.) that would validate the arguments carefully and also handle exceptions
upon exception, the wrapper exits with a clear error message
the exit itself could be a library function, somewhat like this:
# library of common functions
trap '_error_handler' ERR
trap '_exit_handler' EXIT
trap '_int_handler' SIGINT

_error_handler() {
    : # appropriate code goes here (a function body cannot be empty)
}
# other handlers go here...
#
exit_if_error() {
    local error_code=${1:-0}
    local error_message=${2:-"Unknown error"}
    [[ $error_code == 0 ]] && return 0 # it is all good
    # this can be enhanced to print out a "stack trace"
    >&2 printf "%s\n" "$error_message"
    # out of here
    my_exit "$error_code"
}

my_exit() {
    local exit_code=${1:-0}
    _global_graceful_exit=1 # this can be checked by the "EXIT" trap handler
    exit "$exit_code"
}

# simple wrapper for cp
my_cp() {
    # add code to check arguments more effectively
    cp "$1" "$2"
    exit_if_error $? "cp of '$1' to '$2' failed"
}
# main code
source /path/to/library.sh
...
my_cp file1 file2
# clutter-free code
This, along with effective use of trap to take action on ERR and EXIT events, would be a good way to write reliable shell scripts.
After doing more research, I found a solution I rather like in Google's Shell Style Guide. There are some seriously interesting suggestions there, but I think I'm going to go with this for readability:
if ! mv "${file_list}" "${dest_dir}/" ; then
echo "Unable to move ${file_list} to ${dest_dir}" >&2
exit "${E_BAD_MOVE}"
fi
While the Bash man page states that:
The ERR trap is not executed if the failed command is ... part of a command executed in a && or || list ...
I hoped that code in a subshell would be in a different context and would not be subject to the above restriction. The code below shows that even subshells are not immune from this restriction:
#!/bin/bash

main()
{
    local Arg="$1"
    (
        set -e
        echo "In main: $Arg"
        trap MyTrap ERR
        $(exit 1)
        echo "Should not get here"
    )
    return 1
}

MyTrap()
{
    echo "In MyTrap"
}

main 1
[[ $? -eq 0 ]] || echo "failed"
echo
main 2 || echo "failed"
The above code has the following output:
In main: 1
In MyTrap
failed
In main: 2
Should not get here
failed
My present workaround is to use file persistence to save error states in MyTrap and then inspect return codes back in the caller. For example:
MyTrap()
{
    echo "_ERROR_=$?" > "$HOME/.persist"
    echo "In MyTrap"
}

main 1
[[ -f $HOME/.persist ]] && . "$HOME/.persist" || _ERROR_=0
[[ $_ERROR_ -eq 0 ]] || echo "failed"
The output of the above is now:
In main: 1
In MyTrap
failed
So, the question is: Can an approach be found to do set -e/ERR subshell trapping that is immune to && and || restrictions that is simpler than the workaround above?
Notes: This question applies to:
Bash 4.2 and higher
Red Hat 7, CentOS 7, and related distros (that is, not Debian, etc.)
No third-party software should be used. Packages must be available, for example, via OS-provider packages in the CentOS-7-x86_64-DVD-1611.iso repository (and similarly for RHEL 7, Fedora, etc.).
This code is NOT a solution. It is a modified version that you can test to give you some ideas.
#!/bin/bash

main()
{
    local Arg="$1"
    (
        echo "In main: $Arg"
        $(exit 1)
        echo "Should not get here"
    )
    return 1
}

MyTrap()
{
    echo "In MyTrap"
    exit 1
}

set -o errtrace
set -o functrace
trap MyTrap ERR

main 1
[[ $? -eq 0 ]] || echo "failed"
echo
main 2 || echo "failed"
It will not do what you would like (I assume), because something important is still missing that is too involved to explain in a single post, but there are a few key points.
If you want to handle errors uniformly throughout your script, you will have to do the following.
Set the trap at the top level, and do NOT use set -e
Use the shell options that cause subshells and functions to inherit traps (shown in the example)
Create a framework where you differentiate between "expected" errors (explicitly handled as some kind of exception, not as return codes), and "unexpected" exceptions (bugs), which will be trapped.
Never use logical operators directly (except on test constructs or other simple statements you decide are safe enough)
Perform logical tests after collecting exceptions, not as tests on the return codes.
Building that framework is very tricky, but doable (I have it working flawlessly on dozens of very complex scripts). Once you have it, if you choose to blindly test a command (without explicitly handling exceptions) and it fails, you have the option to crash your script in the trap (exit) instead of continuing execution with the script in an unstable state.
This probably incurs a slight performance penalty (at least the way I did it, using a special try function preceding every statement for which exception handling is used), and it certainly requires a LOT of discipline in coding, but in my case it makes me far more productive at building complex scripts, with reasonable confidence that the location of any bug (not that I have any of those...) will be much easier to pinpoint.
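The answer doesn't include the framework itself, but the general shape of such a try prefix might be sketched like this (all names here are illustrative, not the author's actual code):
#!/bin/bash
set -o errtrace                      # ERR trap is inherited by functions and subshells
trap 'echo "BUG: unhandled failure near line $LINENO" >&2; exit 1' ERR

_last_error=0

# try: run a command whose failure is an *expected* exception; record its
# status instead of letting the ERR trap treat it as a bug
try() {
    _last_error=0
    "$@" || _last_error=$?           # the || list keeps the ERR trap quiet
}

try grep -q no_such_user /etc/passwd
if (( _last_error != 0 )); then
    echo "expected failure handled: grep returned $_last_error"
fi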
I want to write a robust bash script, and I use set -e to make sure the script stops running whenever something goes wrong. It works great, but there is still one hole: conditions. If I write if f; then ... fi and function f fails while executing some command, the script will not be terminated. Therefore I need to check the return code of every command in f, as well as in all the subroutines f invokes, recursively. This is annoying.
Is there something, e.g. some flag or option in bash, that makes it fail even inside a condition? The only exception is a return statement directly inside f. If f calls g and g returns 1, then that is still considered an error, unless g itself is also called as a condition, i.e. if g; then ... fi, in which case the return statement inside g is allowed. And so on.
Succinctly, No.
If you want the shell to exit on failure, don't test f; simply run it. The code in the then clause should simply follow the invocation of f because you'll only ever get to execute it if f succeeded.
Old code:
set -e
if f; then echo "f succeeded"; fi
New code:
set -e
f
echo "f succeeded"
You'll only see "f succeeded" if it does succeed; if it fails, the set -e ensures the script exits.
What you're asking to do is slightly odd. I can imagine two possibilities for why you're asking:
You want the script to exit if anything goes wrong inside the body of the function f, even if the function itself still returns success.
You want to distinguish between the function successfully computing what turns out to be a legitimate false value, and failing.
For case 1: don't do that. If anything goes wrong inside f, the function should absolutely return false (if not exit the whole script itself). And if it returns false, you're good; see @JonathanLeffler's answer.
For case 2, you're effectively trying to make $? serve two duties. It can be either an error indicator or a Boolean result; it can't be both.
If you want your script to exit if something goes wrong, you could just have the function itself call exit when that happens, so the caller doesn't have to worry about that case.
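In sketch form (something_went_wrong is a hypothetical check):
f() {
    if something_went_wrong; then
        echo "f: fatal error" >&2
        exit 1   # terminates the whole script, not just the function
    fi
    # otherwise compute and return the Boolean result as usual
}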
Alternatively, you could have the function echo its computed Boolean value instead of returning it, and reserve $? for success/failure. Something like this, maybe:
f() {
    if that thing is true; then
        echo 1
    else
        echo 0
    fi
    if something went wrong; then
        return 1
    else
        return 0
    fi
}

result=$(f) # we'll blow up and exit if f returns nonzero in `$?`
if (( result )); then
    ...
fi
You would have to distinguish between f being false and f encountering an error.
Traditionally both falsehood and error conditions are signaled in the same way: by setting $? to a non-zero value.
Some commands do make such a distinction. For example, the grep command sets $? to 0 if the pattern was found, 1 if it wasn't, and 2 if there was an error (such as a missing file). So for the grep command, you could do something like:
grep ...
case $? in
    0) # ok
        ;;
    1) # pattern not found
        ;;
    *) # error
        ;;
esac
But that's specific to grep. There is no universal convention for distinguishing between a command yielding a "false" result and failing, and in many cases there is no such distinction to make.
To do what you want, you'll have to define just what constitutes an error, and then determine for each command you might execute how to detect an error condition.
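For example, a wrapper can keep grep's "no match" as an ordinary false result while turning real errors (status 2 and up) into a hard stop. A sketch (the wrapper name is mine):
# returns 0 (found) or 1 (not found); aborts the script on a real grep error
word_in_file() {
    grep -q -- "$1" "$2"
    local status=$?
    if (( status > 1 )); then
        echo "grep failed on '$2' (status $status)" >&2
        exit "$status"
    fi
    return "$status"
}

if word_in_file root /etc/passwd; then
    echo "found"
fi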
Is there any commonly used (or unjustly uncommonly used) utility "library" of bash functions? Something like Apache commons-lang for Java. Bash is so ubiquitous that it seems oddly neglected in the area of extension libraries.
If not, how would I make one?
Libraries for bash are out there, but not common. One of the reasons bash libraries are scarce is the limitations of bash functions. I believe these limitations are best explained on Greg's Bash Wiki:
Functions. Bash's "functions" have several issues:
Code reusability: Bash functions don't return anything; they only produce output streams. Every reasonable method of capturing that stream and either assigning it to a variable or passing it as an argument requires a SubShell, which breaks all assignments to outer scopes. (See also BashFAQ/084 for tricks to retrieve results from a function.) Thus, libraries of reusable functions are not feasible, as you can't ask a function to store its results in a variable whose name is passed as an argument (except by performing eval backflips).
Scope: Bash has a simple system of local scope which roughly resembles "dynamic scope" (e.g. Javascript, elisp). Functions see the locals of their callers (like Python's "nonlocal" keyword), but can't access a caller's positional parameters (except through BASH_ARGV if extdebug is enabled). Reusable functions can't be guaranteed free of namespace collisions unless you resort to weird naming rules to make conflicts sufficiently unlikely. This is particularly a problem if implementing functions that expect to be acting upon variable names from frame n-3 which may have been overwritten by your reusable function at n-2. Ksh93 can use the more common lexical scope rules by declaring functions with the "function name { ... }" syntax (Bash can't, but supports this syntax anyway).
Closures: In Bash, functions themselves are always global (have "file scope"), so no closures. Function definitions may be nested, but these are not closures, though they look very much the same. Functions are not "passable" (first-class), and there are no anonymous functions (lambdas). In fact, nothing is "passable", especially not arrays. Bash uses strictly call-by-value semantics (magic alias hack excepted).
There are many more complications involving: subshells; exported functions; "function collapsing" (functions that define or redefine other functions or themselves); traps (and their inheritance); and the way functions interact with stdio. Don't bite the newbie for not understanding all this. Shell functions are totally f***ed.
Source: http://mywiki.wooledge.org/BashWeaknesses
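For what it's worth, Bash 4.3 and later soften the first point: namerefs let a function store its result in a variable whose name the caller passes, without a subshell or eval. A small sketch (beware of the nameref colliding with the caller's own variable name):
get_greeting() {
    local -n _out=$1    # nameref (bash 4.3+): _out aliases the caller's variable
    _out="Hello, $2"
}

get_greeting message World
echo "$message"         # prints: Hello, World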
One example of a shell "library" is /etc/rc.d/functions on Red Hat-based systems. This file contains functions commonly used in SysV init scripts.
I see some good info and bad info here. Let me share what I know, since bash is the primary language I use at work (and we build libraries...).
Google has a decent write-up on bash scripts in general that I thought was a good read: https://google.github.io/styleguide/shell.xml.
Let me start by saying you should not think of a bash library as you do libraries in other languages.
There are certain practices that must be enforced to keep a library in bash simple, organized, and most importantly, reusable.
There is no concept of returning anything from a bash function except for strings that it prints and the function's exit status (0-255).
There are expected limitations here and a learning curve especially if you're accustomed to functions of higher-level languages.
It can be weird at first, and if you find yourself in a situation where strings just aren't cutting it, you'll want to leverage an external tool such as jq.
If jq (or something like it) is available, you can start having your functions print formatted output to be parsed & utilized as you would an object, array, etc.
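As a sketch of that idea, assuming jq is installed (the JSON shape here is made up):
# the function "returns" a structured object by printing JSON
get_service_info() {
    printf '{"name":"%s","port":%d,"healthy":%s}\n' "web" 80 true
}

info=$(get_service_info)
port=$(jq -r '.port' <<<"$info")        # pull a field out of the "object"
healthy=$(jq -r '.healthy' <<<"$info")
echo "port=$port healthy=$healthy"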
Function Declarations
There are two ways to declare a function in bash.
One operates within your current shell; we'll call it Fx0.
And one spawns a subshell to operate in; we'll call that Fx1.
Here are examples of how they're declared:
Fx0(){ echo "Hello from $FUNCNAME"; }
Fx1()( echo "Hello from $FUNCNAME" )
These two functions perform the same operation, indeed.
However, there is a key difference: Fx1 cannot perform any action that alters the current shell.
That means modifying variables, changing shell options, and declaring other functions.
The latter is what can be exploited to prevent namespacing issues that can easily creep up on you.
# Fx0 changes the variable in the current shell; Fx1, running in a subshell, cannot
Fx0(){ Fx=0; }
Fx1()( Fx=1 )
Fx=foo; Fx0; echo $Fx
# 0
Fx=foo; Fx1; echo $Fx
# foo
That being said, the only time you should use an "Fx0" kind of function is when you want to redeclare something in the current shell.
Otherwise, always use "Fx1" functions, because they are safer and you don't have to worry about the naming of any functions declared within them.
As you can see below, the innocent function is overwritten inside of Fx1, however, it remains unscathed after the execution of Fx1.
innocent_function()(
    echo ":)"
)
Fx1()(
    innocent_function()( true )
    innocent_function
)
Fx1 #prints nothing, just returns true
innocent_function
# :)
This would have (likely) unintended consequences if you had used curly braces.
Examples of useful "Fx0" type functions would be specifically for changing the current shell, like so:
use_strict(){
    set -eEu -o pipefail
}
enable_debug(){
    set -Tx
}
disable_debug(){
    set +Tx
}
Regarding Declarations
The use of global variables, or at least those expected to have a value, is bad practice all the way around.
As you're building a library in bash, you don't ever want a function to rely on an external variable already being set.
Anything the function needs should be supplied to it via the positional parameters.
This is the main problem I see in libraries other folks try to build in bash.
Even if I find something cool, I can't use it because I don't know the names of the variables I need to have set ahead of time.
It leads to digging through all of the code and ultimately just picking out the useful pieces for myself.
By far, the best functions to create for a library are extremely small and don't utilize named variables at all, even locally.
Take the following for example:
serviceClient()(
    showUsage()(
        echo "This should be a help page"
    ) >&2
    isValidArg()(
        test "$(type -t "$1")" = "function"
    )
    isRunning()(
        nc -zw1 "$(getHostname)" "$(getPortNumber)"
    ) &>/dev/null
    getHostname()(
        echo localhost
    )
    getPortNumber()(
        echo 80
    )
    getStatus()(
        if isRunning
        then echo OK
        else echo DOWN
        fi
    )
    getErrorCount()(
        grep -c "ERROR" /var/log/apache2/error.log
    )
    printDetails()(
        echo "Service status: $(getStatus)"
        echo "Errors logged: $(getErrorCount)"
    )
    if isValidArg "$1"
    then "$1"
    else showUsage
    fi
)
Typically, what you would see near the top is local hostname=localhost and local port_number=80, which is fine but not necessary.
In my opinion, these things should be function-alized as you build, to prevent future pain when all of a sudden some logic needs to be introduced for getting a value, like: if isHttps; then echo 443; else echo 80; fi.
You don't want that kind of logic placed in your main function, or else you'll quickly make it ugly and unmanageable.
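For instance, the getPortNumber above could later grow that logic without touching the main function (isHttps would be a hypothetical sibling helper):
getPortNumber()(
    if isHttps
    then echo 443
    else echo 80
    fi
)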
Now, serviceClient has internal functions that get declared upon invocation which adds an unnoticeable amount of overhead to each run.
The benefit is now you can have service2Client with functions (or external functions) that are named the same as what serviceClient has with absolutely no conflicts.
Another important thing to keep in mind is that redirections can be applied to an entire function upon declaring it; see isRunning and showUsage above.
This gets as close to object-orientedness as I think you should bother trying in bash.
. serviceClient.sh
serviceClient
# This should be a help page
if serviceClient isRunning
then serviceClient printDetails
fi
# Service status: OK
# Errors logged: 0
I hope this helps my fellow bash hackers out there.
Here's a list of "worthy of your time" bash libraries that I found after spending an hour or so googling.
https://github.com/mietek/bashmenot/
bashmenot is a library that is used by Halcyon and Haskell on Heroku. The above link points to a complete list of available functions with examples -- impressive quality, quantity and documentation.
http://marcomaggi.github.io/docs/mbfl.html
MBFL offers a set of modules implementing common operations and a script template. A pretty mature project, and still active on GitHub.
https://github.com/javier-lopez/learn/blob/master/sh/lib
You need to look at the code for a brief description and examples. It has a few years of development behind it.
https://github.com/martinburger/bash-common-helpers
This has the fewest, most basic functions. For documentation you also have to look at the code.
Variables declared inside a function but without the local keyword are global.
It's good practice to declare variables only needed inside a function with local to avoid conflicts with other functions and globally (see foo() below).
Bash function libraries always need to be 'sourced'. I prefer using the 'source' synonym instead of the more common dot (.) so I can see it better during debugging.
The following technique works in at least bash 3.00.16 and 4.1.5...
#!/bin/bash
#
# TECHNIQUES
#
source ./TECHNIQUES.source

echo
echo "Send user prompts inside a function to stderr..."
foo() {
    echo " Function foo()..." >&2 # send user prompts to stderr
    echo " Echoing 'this is my data'..." >&2 # send user prompts to stderr
    echo "this is my data" # this will not be displayed yet
}
#
fnRESULT=$(foo) # prints: Function foo()...
echo " foo() returned '$fnRESULT'" # prints: foo() returned 'this is my data'

echo
echo "Passing global and local variables..."
#
GLOBALVAR="Reusing result of foo() which is '$fnRESULT'"
echo " Outside function: GLOBALVAR=$GLOBALVAR"
#
function fn()
{
    local LOCALVAR="declared inside fn() with 'local' keyword is only visible in fn()"
    GLOBALinFN="declared inside fn() without 'local' keyword is visible globally"
    echo
    echo " Inside function fn()..."
    echo " GLOBALVAR=$GLOBALVAR"
    echo " LOCALVAR=$LOCALVAR"
    echo " GLOBALinFN=$GLOBALinFN"
}
# call fn()...
fn
# call fnX()...
fnX

echo
echo " Outside function..."
echo " GLOBALVAR=$GLOBALVAR"
echo
echo " LOCALVAR=$LOCALVAR"
echo " GLOBALinFN=$GLOBALinFN"
echo
echo " LOCALVARx=$LOCALVARx"
echo " GLOBALinFNx=$GLOBALinFNx"
echo
The sourced function library is represented by...
#!/bin/bash
#
# TECHNIQUES.source
#
function fnX()
{
    local LOCALVARx="declared inside fnX() with 'local' keyword is only visible in fnX()"
    GLOBALinFNx="declared inside fnX() without 'local' keyword is visible globally"
    echo
    echo " Inside function fnX()..."
    echo " GLOBALVAR=$GLOBALVAR"
    echo " LOCALVARx=$LOCALVARx"
    echo " GLOBALinFNx=$GLOBALinFNx"
}
Running TECHNIQUES produces the following output...
Send user prompts inside a function to stderr...
Function foo()...
Echoing 'this is my data'...
foo() returned 'this is my data'
Passing global and local variables...
Outside function: GLOBALVAR=Reusing result of foo() which is 'this is my data'
Inside function fn()...
GLOBALVAR=Reusing result of foo() which is 'this is my data'
LOCALVAR=declared inside fn() with 'local' keyword is only visible in fn()
GLOBALinFN=declared inside fn() without 'local' keyword is visible globally
Inside function fnX()...
GLOBALVAR=Reusing result of foo() which is 'this is my data'
LOCALVARx=declared inside fnX() with 'local' keyword is only visible in fnX()
GLOBALinFNx=declared inside fnX() without 'local' keyword is visible globally
Outside function...
GLOBALVAR=Reusing result of foo() which is 'this is my data'
LOCALVAR=
GLOBALinFN=declared inside fn() without 'local' keyword is visible globally
LOCALVARx=
GLOBALinFNx=declared inside fnX() without 'local' keyword is visible globally
I found a good but old article here that gave a comprehensive list of utility libraries:
http://dberkholz.com/2011/04/07/bash-shell-scripting-libraries/
I can tell you that the lack of available function libraries has nothing to do with Bash's limitations, but rather with how Bash is used. Bash is a quick-and-dirty language made for automation, not development, so the need for a library is rare. There is also a fine line between a function worth sharing and a function worth converting into a full-fledged script to be called. That is from a coding perspective; what gets loaded by a shell is another matter, but that normally comes down to personal taste, not need. So... again, a lack of shared libraries.
Here are a few functions I use regularly
In my .bashrc
cd () {
    local pwd="${PWD}/"; # we need a slash at the end so we can check for it, too
    if [[ "$1" == "-e" ]]; then
        shift
        # start from the end
        [[ "$2" ]] && builtin cd "${pwd%/$1/*}/${2:-$1}/${pwd##*/$1/}" || builtin cd "$@"
    else
        # start from the beginning
        if [[ "$2" ]]; then
            builtin cd "${pwd/$1/$2}"
            pwd
        else
            builtin cd "$@"
        fi
    fi
}
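For example, a hypothetical session with the two-argument form, assuming both directories exist:
$ cd /home/user/projectA/src
$ cd projectA projectB    # substitute projectA with projectB in the current path
/home/user/projectB/src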
And a version of log()/err() exists in a function library at work for coders, mainly so we all use the same style.
log() {
    echo -e "$(date +%m.%d_%H:%M) $*" | tee -a "$OUTPUT_LOG"
}
err() {
    # same as log(), apart from also sending the message to stderr
    echo -e "$(date +%m.%d_%H:%M) $*" | tee -a "$OUTPUT_LOG" >&2
}
As you can see, the above utilities we use here are not that exciting to share. I have another library to do tricks around bash limitations, which I think is the best use for them, and I recommend creating your own.
Using bash 4.1.5:
#!/bin/bash
set -e

foo()
{
    false
    echo "argh, I don't want to get there! What about set -e?!"
}

foo && echo "ok"
This yields the following output:
argh, I don't want to get there! What about set -e?!
ok
This issue occurs whenever foo is called as a condition (i.e. inside if, while, &&, ||, etc.). foo behaves correctly when called as a simple command.
I find this behavior surprising and quite frankly dangerous, because this means that the behavior of a bash function changes depending on how it is called. For example, even something as simple as foo and foo && true will not yield the same results. This is very troubling! One can only imagine how much chaos this could cause if foo is doing sensitive operations...
Is there any workaround I could use to avoid this kind of situation?
Why don't you make foo() return a non-zero exit code if it fails?
foo(){
    return 1
    echo "argh, I don't want to get here! What about set -e?!"
}
The behavior you describe is expected, and quite necessary. Consider a function like:
word_is_in_file() {
    grep "$1" "$2" > /dev/null
}
Now, consider a script that uses this function (sorry, this example is a bit contrived since a real script would probably just invoke grep directly) to make a decision:
if word_is_in_file "$word" "$file"; then
    do_something
else
    do_something_else
fi
The definition of the function may be buried in a library of shell functions that the author never sees. The author does not consider the grep failure to be a failure, and would be very baffled if the script terminated because of it.
A way to get the semantics you desire is to do something like:
foo() {
    # This function will abort if errors are encountered, but
    # the script will continue
    sh -e -c '
        false
        echo not reached'
}
foo && echo not reached
echo reached
foo
echo not reached
The semantics of set -e are likewise defined not to abort the script in the foo && ... case, for the same reason: it allows branching.
What is the purpose of a command that does nothing, being little more than a comment leader, but is actually a shell builtin in and of itself?
Using it is about 40% slower per call than inserting a comment into your scripts, though that probably varies greatly depending on the size of the comment. The only possible reasons I can see for it are these:
# poor man's delay function
for ((x=0;x<100000;++x)) ; do : ; done
# inserting comments into string of commands
command ; command ; : we need a comment in here for some reason ; command
# an alias for `true'
while : ; do command ; done
I guess what I'm really looking for is what historical application it might have had.
Historically, Bourne shells didn't have true and false as built-in commands. true was instead simply aliased to :, and false to something like let 0.
: is slightly better than true for portability to ancient Bourne-derived shells. As a simple example, consider having neither the ! pipeline operator nor the || list operator (as was the case for some ancient Bourne shells). This leaves the else clause of the if statement as the only means for branching based on exit status:
if command; then :; else ...; fi
Since if requires a non-empty then clause and comments don't count as non-empty, : serves as a no-op.
Nowadays (that is: in a modern context) you can usually use either : or true. Both are specified by POSIX, and some find true easier to read. However there is one interesting difference: : is a so-called POSIX special built-in, whereas true is a regular built-in.
Special built-ins are required to be built into the shell; regular built-ins are only "typically" built in, and nothing strictly guarantees it. Correspondingly, there usually is no regular external program named : with the function of true in the PATH of most systems.
Probably the most crucial difference is that with special built-ins, any variable set by the built-in, even in the environment during simple command evaluation, persists after the command completes, as demonstrated here using ksh93:
$ unset x; ( x=hi :; echo "$x" )
hi
$ ( x=hi true; echo "$x" )
$
Note that Zsh ignores this requirement, as does GNU Bash except when operating in POSIX compatibility mode, but all other major "POSIX sh derived" shells observe it, including dash, ksh93, and mksh.
Another difference is that regular built-ins must be compatible with exec, demonstrated here using Bash:
$ ( exec : )
-bash: exec: :: not found
$ ( exec true )
$
POSIX also explicitly notes that : may be faster than true, though this is of course an implementation-specific detail.
I use it to easily enable/disable variable commands:
#!/bin/bash
if [[ "$VERBOSE" == "" || "$VERBOSE" == "0" ]]; then
vecho=":" # no "verbose echo"
else
vecho=echo # enable "verbose echo"
fi
$vecho "Verbose echo is ON"
Thus
$ ./vecho
$ VERBOSE=1 ./vecho
Verbose echo is ON
This makes for a clean script. This cannot be done with '#'.
Also,
: >afile
is one of the simplest ways to guarantee that 'afile' exists but is 0 length.
A useful application for : is if you're only interested in using parameter expansions for their side-effects rather than actually passing their result to a command.
In that case, you use the parameter expansion as an argument to either : or false depending upon whether you want an exit status of 0 or 1. An example might be
: "${var:=$1}"
Since : is a builtin, it should be pretty fast.
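Since : ignores its arguments, several defaults can be applied in one line (HOST, PORT and RETRIES here are purely illustrative):
: "${HOST:=localhost}" "${PORT:=8080}" "${RETRIES:=3}"
echo "connecting to $HOST:$PORT (retries: $RETRIES)"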
: can also be used for block comments (similar to /* */ in the C language). For example, if you want to skip a block of code in your script, you can do this:
: << 'SKIP'
your code block here
SKIP
Two more uses not mentioned in other answers:
Logging
Take this example script:
set -x
: Logging message here
example_command
The first line, set -x, makes the shell print out the command before running it. It's quite a useful construct. The downside is that the usual echo Log message type of statement now prints the message twice. The colon method gets around that. Note that you'll still have to escape special characters just as you would for echo.
Cron job titles
I've seen it being used in cron jobs, like this:
45 10 * * * : Backup for database ; /opt/backup.sh
This is a cron job that runs the script /opt/backup.sh every day at 10:45. The advantage of this technique is that it makes for better-looking email subjects when /opt/backup.sh prints some output.
It's similar to pass in Python.
One use would be to stub out a function until it gets written:
future_function () { :; }
If you'd like to truncate a file to zero bytes, useful for clearing logs, try this:
:> file.log
You could use it in conjunction with backticks (``) to execute a command without displaying its output, like this:
: `some_command`
Of course you could just do some_command > /dev/null, but the :-version is somewhat shorter.
That being said I wouldn't recommend actually doing that as it would just confuse people. It just came to mind as a possible use-case.
It's also useful for polyglot programs:
#!/usr/bin/env sh
':' //; exec "$(command -v node)" "$0" "$@"
~function(){ ... }
This is now both an executable shell-script and a JavaScript program: meaning ./filename.js, sh filename.js, and node filename.js all work.
(Definitely a little bit of a strange usage, but effective nonetheless.)
Some explication, as requested:
Shell scripts are evaluated line by line, and the exec command, when run, terminates the shell and replaces its process with the resultant command. This means that to the shell, the program looks like this:
#!/usr/bin/env sh
':' //; exec "$(command -v node)" "$0" "$@"
As long as no parameter expansion or aliasing is occurring in the word, any word in a shell script can be wrapped in quotes without changing its meaning; this means that ':' is equivalent to : (we've only wrapped it in quotes here to achieve the JavaScript semantics described below).
... and as described above, the first command on the first line is a no-op (it translates to : //, or if you prefer to quote the words, ':' '//'. Notice that the // carries no special meaning here, as it does in JavaScript; it's just a meaningless word that's being thrown away.)
Finally, the second command on the first line (after the semicolon), is the real meat of the program: it's the exec call which replaces the shell-script being invoked, with a Node.js process invoked to evaluate the rest of the script.
Meanwhile, the first line, in JavaScript, parses as a string-literal (':'), and then a comment, which is deleted; thus, to JavaScript, the program looks like this:
':'
~function(){ ... }
Since the string-literal is on a line by itself, it is a no-op statement, and is thus stripped from the program; that means that the entire line is removed, leaving only your program-code (in this example, the function(){ ... } body.)
Self-documenting functions
You can also use : to embed documentation in a function.
Assume you have a library script mylib.sh, providing a variety of functions. You could either source the library (. mylib.sh) and call the functions directly after that (lib_function1 arg1 arg2), or avoid cluttering your namespace and invoke the library with a function argument (mylib.sh lib_function1 arg1 arg2).
Wouldn't it be nice if you could also type mylib.sh --help and get a list of available functions and their usage, without having to manually maintain the function list in the help text?
#!/bin/bash

# all "public" functions must start with this prefix
LIB_PREFIX='lib_'

# "public" library functions
lib_function1() {
    : This function does something complicated with two arguments.
    :
    : Parameters:
    : ' arg1 - first argument ($1)'
    : ' arg2 - second argument'
    :
    : Result:
    : " it's complicated"

    # actual function code starts here
}

lib_function2() {
    : Function documentation
    # function code here
}

# help function
--help() {
    echo MyLib v0.0.1
    echo
    echo Usage: mylib.sh [function_name [args]]
    echo
    echo Available functions:
    declare -f | sed -n -e '/^'$LIB_PREFIX'/,/^}$/{/\(^'$LIB_PREFIX'\)\|\(^[ \t]*:\)/{
s/^\('$LIB_PREFIX'.*\) ()/\n=== \1 ===/;s/^[ \t]*: \?['\''"]\?/ /;s/['\''"]\?;\?$//;p}}'
}

# main code
if [ "${BASH_SOURCE[0]}" = "${0}" ]; then
    # the script was executed instead of sourced
    # invoke requested function or display help
    if [ "$(type -t -- "$1" 2>/dev/null)" = function ]; then
        "$@"
    else
        --help
    fi
fi
A few comments about the code:
All "public" functions have the same prefix. Only these are meant to be invoked by the user, and to be listed in the help text.
The self-documenting feature relies on the previous point, and uses declare -f to enumerate all available functions, then filters them through sed to only display functions with the appropriate prefix.
It is a good idea to enclose the documentation in single quotes, to prevent undesired expansion and whitespace removal. You'll also need to be careful when using apostrophes/quotes in the text.
You could write code to internalize the library prefix, i.e. the user only has to type mylib.sh function1 and it gets translated internally to lib_function1. This is an exercise left to the reader.
The help function is named "--help". This is a convenient (i.e. lazy) approach that uses the library invoke mechanism to display the help itself, without having to code an extra check for $1. At the same time, it will clutter your namespace if you source the library. If you don't like that, you can either change the name to something like lib_help or actually check the args for --help in the main code and invoke the help function manually.
I saw this usage in a script and thought it was a good substitute for invoking basename within a script.
oldIFS=$IFS
IFS=/
for basetool in $0 ; do : ; done
IFS=$oldIFS
...
This is a replacement for the code: basetool=$(basename $0)
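For reference, plain parameter expansion achieves the same result without touching IFS:
basetool=${0##*/}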
Another way, not yet mentioned here, is the initialisation of parameters in infinite while-loops. Below is not the cleanest example, but it serves its purpose.
#!/usr/bin/env bash

[ "$1" ] && foo=0 && bar="baz"

while : "${foo=2}" "${bar:=qux}"; do
    echo "$foo"
    (( foo == 3 )) && echo "$bar" && break
    (( foo=foo+1 ))
done