bash function name with dash is bad pratice? [duplicate] - bash

What are the syntax rules for identifiers, especially function and variable names, in Bash?
I wrote a Bash script and tested it on various versions of Bash on Ubuntu, Debian, Red Hat 5 and 6, and even an old Solaris 8 box. The script ran well, so it shipped.
Yet when a user tried it on SUSE machines, it gave a "not a valid identifier" error. Fortunately, my guess that there was an invalid character in the function name was right. The hyphens were messing it up.
The fact that a script that was at least somewhat tested would have completely different behaviour on another Bash or distro was disconcerting. How can I avoid this?

From the manual:
Shell Function Definitions
...
name () compound-command [redirection]
function name [()] compound-command [redirection]
name is defined elsewhere:
name A word consisting only of alphanumeric characters and under‐
scores, and beginning with an alphabetic character or an under‐
score. Also referred to as an identifier.
So hyphens are not valid. And yet, on my system, they do work...
$ bash --version
GNU bash, version 4.2.25(1)-release (x86_64-pc-linux-gnu)

The question was about "the rules", which has been answered two different ways, each correct in some sense, depending on what you want to call "the rules". Just to flesh out #rici's point that you can shove about any character in a function name, I wrote a small bash script to try to check every possible (0-255) character as a function name, as well as as the second character of a function name:
#!/bin/bash
ASCII=( nul soh stx etx eot enq ack bel bs tab nl vt np cr so si dle \
dc1 dc2 dc3 dc4 nak syn etb can em sub esc fs gs rs us sp )
for((i=33; i < 127; ++i)); do
printf -v Hex "%x" $i
printf -v Chr "\x$Hex"
ASCII[$i]="$Chr"
done
ASCII[127]=del
for((i=128; i < 256; ++i)); do
ASCII[$i]=$(printf "0X%x" $i)
done
# ASCII table is now defined
function Test(){
Illegal=""
for((i=1; i <= 255; ++i)); do
Name="$(printf \\$(printf '%03o' $i))"
eval "function $1$Name(){ return 0; }; $1$Name ;" 2>/dev/null
if [[ $? -ne 0 ]]; then
Illegal+=" ${ASCII[$i]}"
# echo Illegal: "${ASCII[$i]}"
fi
done
printf "Illegal: %s\n" "$Illegal"
}
echo "$BASH_VERSION"
Test
Test "x"
# can we really do funky crap like this?
function [}{(){
echo "Let me take you to, funkytown!"
}
[}{ # why yes, we can!
# though editor auto-indent modes may punish us
I actually skip NUL (0x00), as that's the one character bash may object to finding in the input stream. The output from this script was:
4.4.0(1)-release
Illegal: soh tab nl sp ! " # $ % & ' ( ) * 0 1 2 3 4 5 6 7 8 9 ; < > \ ` { | } ~ del
Illegal: soh " $ & ' ( ) ; < > [ \ ` | del
Let me take you to, funkytown!
Note that bash happily lets me name my function "[}{". Probably my code is not quite rigorous enough to provide the exact rules for legality-in-practice, but it should give a flavor of what manner of abuse is possible.
I wish I could mark this answer "For mature audiences only."

Command identifiers and variable names have different syntaxes. A variable name is restricted to alphanumeric characters and underscore, not starting with a digit. A command name, on the other hand, can be just about anything which doesn't contain bash metacharacters (and even then, they can be quoted).
In bash, function names can be command names, as long as they would be parsed as a WORD without quotes. (Except that, for some reason, they cannot be integers.) However, that is a bash extension. If the target machine is using some other shell (such as dash), it might not work, since the Posix standard shell grammar only allows "NAME" in the function definition form (and also prohibits the use of reserved words).

From 3.3 Shell Functions:
Shell functions are a way to group commands for later execution using a single name for the group. They are executed just like a "regular" command. When the name of a shell function is used as a simple command name, the list of commands associated with that function name is executed. Shell functions are executed in the current shell context; no new process is created to interpret them.
Functions are declared using this syntax:
name () compound-command [ redirections ]
or
function name [()] compound-command [ redirections ]
and from 2 Definitions:
name
A word consisting solely of letters, numbers, and underscores, and beginning with a letter or underscore. Names are used as shell variable and function names. Also referred to as an identifier.

Note The biggest correction here is that newline is never allowed in a function name.
My answer:
Bash --posix: [a-zA-Z_][0-9a-zA-Z_]*
Bash 3.0-4.4: [^#%0-9\0\1\9\10 "$&'();<>\`|\x7f][^\0\1\9\10 "$&'();<>\`|\x7f]*
Bash 5.0: [^#%0-9\0\9\10 "$&'();<>\`|][^\0\9\10 "$&'();<>\`|]*
\1 and \x7f works now
Bash 5.1: [^#%\0\9\10 "$&'();<>\`|][^\0\9\10 "$&'();<>\`|]*
Numbers can come first?! Yep!
Any bash 3-5: [^#%0-9\0\1\9\10 "$&'();<>\`|\x7f][^\0\1\9\10 "$&'();<>\`|\x7f]*
Same as 3.0-4.4
My suggestion (opinion): [^#%0-9\0-\f "$&'();<>\`|\x7f-\xff][^\0-\f "$&'();<>\`|\x7f-\xff]
Positive version: [!*+,-./:=?#A-Z\[\]^_a-z{}~][#%0-9!*+,-./:=?#A-Z\[\]^_a-z{}~]*
My version of the test:
for ((x=1; x<256; x++)); do
hex="$(printf "%02x" $x)"
name="$(printf \\x${hex})"
if [ "${x}" = "10" ]; then
name=$'\n'
fi
if [ "$(echo -n "${name}" | xxd | awk '{print $2}')" != "${hex}" ]; then
echo "$x failed first sanity check"
fi
(
eval "function ${name}(){ echo ${x};}" &>/dev/null
if test "$("${name}" 2>/dev/null)" != "${x}"; then
eval "function ok${name}doe(){ echo ${x};}" &>/dev/null
if test "$(type -t okdoe 2>/dev/null)" = "function"; then
echo "${x} failed second sanity test"
fi
if test "$("ok${name}doe" 2>/dev/null)" != "${x}"; then
echo "${x}(${name}) never works"
else
echo "${x}(${name}) cannot be first"
fi
else
# Just assume everything over 128 is hard, unless this says otherwise
if test "${x}" -gt 127; then
if declare -pF | grep -q "declare -f \x${hex}"; then
echo "${x} works, but is actually not difficult"
declare -pF | grep "declare -f \x${hex}" | xxd
fi
elif ! declare -pF | grep -q "declare -f \x${hex}"; then
echo "${x} works, but is difficult in bash"
fi
fi
)
done
Some additional notes:
Characters 1-31 are less than ideal, as they are more difficult to type.
Characters 128-255 are even less ideal in bash (except on bash 3.2 on macOS. It might be compiled differently?) because commands like declare -pF do not render the special characters, even though they are there in memory. This means any introspection code will incorrectly assume that these functions are not there. However, features like compgen still correctly render the characters.
Out of my testing scope, but some unicode does work too, although it's extra hard to paste/type on macOS over ssh.

This script tests all valid chars for
function names with 1 char.
It outputs 53 valid chars (a-zA-Z and underscore) using
a POSIX shell and 220 valid chars with BASH v4.4.12.
The Answer from Ron Burk is valid, but lacks the numbers.
#!/bin/sh
FILE='/tmp/FOO'
I=0
VALID=0
while [ $I -lt 256 ]; do {
NAME="$( printf \\$( printf '%03o' $I ))"
I=$(( I + 1 ))
>"$FILE"
( eval "$NAME(){ rm $FILE;}; $NAME" 2>/dev/null )
if [ -f "$FILE" ]; then
rm "$FILE"
else
VALID=$(( VALID + 1 ))
echo "$VALID/256 - OK: $NAME"
fi
} done

Related

How can I multiply a character or string in native POSIX shell script?

This is not a duplicate of the other question for two reasons. That question did not specify that the answer had to be POSIX. The marked answer for that question does not run correctly in a Dash shell. (I don't know how to tell if the answer is POSIX but it didn't run on the latest version of Dash.
To clarify, I would like to only use POSIX shell builtins. I should have made this clearer. As far as I know, seq is a GNU util, which means I would have to spawn an external program. I would like to not spawn any external programs.
I would like to know the most efficient way to multiply a string in a shell script. For example the script would perform as follows...
./multiplychar "*" "5"
*****
./multiplychar "*" "1"
*
./multiplychar "*" "0"
# It would not output anything
I cannot seem to find it anymore, but I found this question for bash a long time ago. All the answers, however, either used an external program like Perl e.x. perl -E "print '$' x $2" or used bashisms.
I would like to know how to accomplish this natively in a POSIX shell. I am using dash. If there are multiple ways, in case it's relevant for efficiency sake, I am dealing with small numbers less than 100. (It's a script for a strictly text based volume bar notification.)
Shell: dash
OS: Arch Linux
The shortest almost-way I found is this, no loops, no seq:
(EDITED based on MemReflect's comment)
$ printf "%9s" ' ' | tr ' ' x
xxxxxxxxx
So you could do:
#!/bin/sh
if [ $2 -eq 0 ]; then
exit
fi
printf "%${2}s\n" ' ' | tr ' ' "$1"
$ ./multiplychar "*" "5"
*****
$ ./multiplychar "*" "1"
*
$ ./multiplychar "*" "0"
$
Fully POSIX way of doing it would be using the following script. This script doesn't create any external process, only using the built-in functionality.
#!/bin/sh
_seq() (
i=0 buf=
while [ "$(( i += 1 ))" -le "$1" ]; do
buf="$buf $i "
done
printf '%s' "$buf"
)
buf=
for i in $(_seq "$2"); do
buf="$buf$1"
done
printf '%s\n' "$buf"
Calling this script ./multiplychar "*" 5 will print *****.
Here is how this script works,
The _seq() function runs in a subshell so that it doesn't affect any variables. Calling _seq 5 will output " 1 2 3 4 5 ". Whitespaces don't matter on the for loop.
In both the _seq() function and the main function, the output is stored in a variable named $buf so that we call printf only once.
This should be the most efficient and POSIX way to have this functionality on your shell.
This short bash script may be useful. It gives the output you want (tested on macOS using GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin18)). I have not tested it with dash (not really using it), but this does not appear to use any bashisms.
#!/usr/bin/env bash
str=''
for i in `seq 1 $2` ; do
str="${str}${1}"
done
echo "${str}"
If seq is not POSIX-compliant or dash-compliant, maybe this will do the trick (again, tested with bash only):
#!/bin/sh
i=1
str=''
while [ "$i" -le "$2" ]
do
i=`expr $i + 1`
str="${str}${1}"
done
echo "${str}"
Example:
$ ./test1.sh '*' 5
*****
$ ./test1.sh '*' 1
*
$ ./test1.sh '*' 0
The * of course has to be quoted, but the number does not.
A simple loopless POSIX function that builds a long enough (125 chars) string, and uses the shell's builtin printf to print some of it:
multiplychar() { n="$1"
n="$n$n$n$n$n"
n="$n$n$n$n$n"
n="$n$n$n$n$n"
printf %."$2"s\\n "$n" ; }
Test run:
$ multiplychar x 5
xxxxx
$ multiplychar x 0
$ multiplychar b 9
bbbbbbbbb
Note that the above code is really a multiply string function:
$ multiplychar yow! 12
yow!yow!yow!
...it prints 12 chars worth of the repeated string. If the first char of the string is all that's desired, changing the first line of code will truncate the string:
multiplychar() { n="${1%%${1#?}}"
n="$n$n$n$n$n"
n="$n$n$n$n$n"
n="$n$n$n$n$n"
printf %."$2"s\\n "$n" ; }
Test:
$ multiplychar yow! 12
yyyyyyyyyyyy

What does nested parenthese mean in bash? [duplicate]

I am confused by the usage of brackets, parentheses, curly braces in Bash, as well as the difference between their double or single forms. Is there a clear explanation?
In Bash, test and [ are shell builtins.
The double bracket, which is a shell keyword, enables additional functionality. For example, you can use && and || instead of -a and -o and there's a regular expression matching operator =~.
Also, in a simple test, double square brackets seem to evaluate quite a lot quicker than single ones.
$ time for ((i=0; i<10000000; i++)); do [[ "$i" = 1000 ]]; done
real 0m24.548s
user 0m24.337s
sys 0m0.036s
$ time for ((i=0; i<10000000; i++)); do [ "$i" = 1000 ]; done
real 0m33.478s
user 0m33.478s
sys 0m0.000s
The braces, in addition to delimiting a variable name are used for parameter expansion so you can do things like:
Truncate the contents of a variable
$ var="abcde"; echo ${var%d*}
abc
Make substitutions similar to sed
$ var="abcde"; echo ${var/de/12}
abc12
Use a default value
$ default="hello"; unset var; echo ${var:-$default}
hello
and several more
Also, brace expansions create lists of strings which are typically iterated over in loops:
$ echo f{oo,ee,a}d
food feed fad
$ mv error.log{,.OLD}
(error.log is renamed to error.log.OLD because the brace expression
expands to "mv error.log error.log.OLD")
$ for num in {000..2}; do echo "$num"; done
000
001
002
$ echo {00..8..2}
00 02 04 06 08
$ echo {D..T..4}
D H L P T
Note that the leading zero and increment features weren't available before Bash 4.
Thanks to gboffi for reminding me about brace expansions.
Double parentheses are used for arithmetic operations:
((a++))
((meaning = 42))
for ((i=0; i<10; i++))
echo $((a + b + (14 * c)))
and they enable you to omit the dollar signs on integer and array variables and include spaces around operators for readability.
Single brackets are also used for array indices:
array[4]="hello"
element=${array[index]}
Curly brace are required for (most/all?) array references on the right hand side.
ephemient's comment reminded me that parentheses are also used for subshells. And that they are used to create arrays.
array=(1 2 3)
echo ${array[1]}
2
A single bracket ([) usually actually calls a program named [; man test or man [ for more info. Example:
$ VARIABLE=abcdef
$ if [ $VARIABLE == abcdef ] ; then echo yes ; else echo no ; fi
yes
The double bracket ([[) does the same thing (basically) as a single bracket, but is a bash builtin.
$ VARIABLE=abcdef
$ if [[ $VARIABLE == 123456 ]] ; then echo yes ; else echo no ; fi
no
Parentheses (()) are used to create a subshell. For example:
$ pwd
/home/user
$ (cd /tmp; pwd)
/tmp
$ pwd
/home/user
As you can see, the subshell allowed you to perform operations without affecting the environment of the current shell.
(a) Braces ({}) are used to unambiguously identify variables. Example:
$ VARIABLE=abcdef
$ echo Variable: $VARIABLE
Variable: abcdef
$ echo Variable: $VARIABLE123456
Variable:
$ echo Variable: ${VARIABLE}123456
Variable: abcdef123456
(b) Braces are also used to execute a sequence of commands in the current shell context, e.g.
$ { date; top -b -n1 | head ; } >logfile
# 'date' and 'top' output are concatenated,
# could be useful sometimes to hunt for a top loader )
$ { date; make 2>&1; date; } | tee logfile
# now we can calculate the duration of a build from the logfile
There is a subtle syntactic difference with ( ), though (see bash reference) ; essentially, a semicolon ; after the last command within braces is a must, and the braces {, } must be surrounded by spaces.
Brackets
if [ CONDITION ] Test construct
if [[ CONDITION ]] Extended test construct
Array[1]=element1 Array initialization
[a-z] Range of characters within a Regular Expression
$[ expression ] A non-standard & obsolete version of $(( expression )) [1]
[1] http://wiki.bash-hackers.org/scripting/obsolete
Curly Braces
${variable} Parameter substitution
${!variable} Indirect variable reference
{ command1; command2; . . . commandN; } Block of code
{string1,string2,string3,...} Brace expansion
{a..z} Extended brace expansion
{} Text replacement, after find and xargs
Parentheses
( command1; command2 ) Command group executed within a subshell
Array=(element1 element2 element3) Array initialization
result=$(COMMAND) Command substitution, new style
>(COMMAND) Process substitution
<(COMMAND) Process substitution
Double Parentheses
(( var = 78 )) Integer arithmetic
var=$(( 20 + 5 )) Integer arithmetic, with variable assignment
(( var++ )) C-style variable increment
(( var-- )) C-style variable decrement
(( var0 = var1<98?9:21 )) C-style ternary operation
I just wanted to add these from TLDP:
~:$ echo $SHELL
/bin/bash
~:$ echo ${#SHELL}
9
~:$ ARRAY=(one two three)
~:$ echo ${#ARRAY}
3
~:$ echo ${TEST:-test}
test
~:$ echo $TEST
~:$ export TEST=a_string
~:$ echo ${TEST:-test}
a_string
~:$ echo ${TEST2:-$TEST}
a_string
~:$ echo $TEST2
~:$ echo ${TEST2:=$TEST}
a_string
~:$ echo $TEST2
a_string
~:$ export STRING="thisisaverylongname"
~:$ echo ${STRING:4}
isaverylongname
~:$ echo ${STRING:6:5}
avery
~:$ echo ${ARRAY[*]}
one two one three one four
~:$ echo ${ARRAY[*]#one}
two three four
~:$ echo ${ARRAY[*]#t}
one wo one hree one four
~:$ echo ${ARRAY[*]#t*}
one wo one hree one four
~:$ echo ${ARRAY[*]##t*}
one one one four
~:$ echo $STRING
thisisaverylongname
~:$ echo ${STRING%name}
thisisaverylong
~:$ echo ${STRING/name/string}
thisisaverylongstring
The difference between test, [ and [[ is explained in great details in the BashFAQ.
(Note: The link shows many examples for comparison)
To cut a long story short: test implements the old, portable syntax of
the command. In almost all shells (the oldest Bourne shells are the
exception), [ is a synonym for test (but requires a final argument of
]). Although all modern shells have built-in implementations of [,
there usually still is an external executable of that name, e.g.
/bin/[.
[[ is a new, improved version of it, and it is a keyword, not a program.
This has beneficial effects on the ease of use, as shown below. [[ is
understood by KornShell and BASH (e.g. 2.03), but not by the older
POSIX or BourneShell.
And the conclusion:
When should the new test command [[ be used, and when the old one [?
If portability/conformance to POSIX or the BourneShell is a concern, the old syntax should
be used. If on the other hand the script requires BASH, Zsh, or KornShell,
the new syntax is usually more flexible.
Parentheses in function definition
Parentheses () are being used in function definition:
function_name () { command1 ; command2 ; }
That is the reason you have to escape parentheses even in command parameters:
$ echo (
bash: syntax error near unexpected token `newline'
$ echo \(
(
$ echo () { command echo The command echo was redefined. ; }
$ echo anything
The command echo was redefined.
Some common and handy uses for brackets, parenthesis, and braces
As mentioned above, sometimes you want a message displayed without losing the return value. This is a handy snippet:
$ [ -f go.mod ] || { echo 'File not found' && false; }
This produced no output and a 0 (true) return value if the file go.mod exists in the current directory. Test the result:
$ echo $?
0
If the file does not exist, you get the message but also a return value of 1 (false), which can also be tested:
$ [ -f fake_file ] || { echo 'File not found'; false; }
File not found
$ echo $?
1
You can also simply create a function to check if a file exists:
fileexists() { [ -f "$1" ]; }
or if a file is readable (not corrupted, have permissions, etc.):
canread() { [ -r "$1" ]; }
or if it is a directory:
isdir() { [ -d "$1" ]; }
or is writable for the current user:
canwrite() { [ -w "$1" ]; }
or if a file exists and is not empty (like a log file with content...)
isempty() { [ -s "$1" ]; }
There are more details at: TLDP
You can also see if a program exists and is available on the path:
exists () { command -v $1 > /dev/null 2>&1; }
This is useful in scripts, for example:
# gitit does an autosave commit to the current
# if Git is installed and available.
# If git is not available, it will use brew
# (on macOS) to install it.
#
# The first argument passed, if any, is used as
# the commit message; otherwise the default is used.
gitit() {
$(exists git) && {
git add --all;
git commit -m "${1:-'GitBot: dev progress autosave'}";
git push;
} || brew install git;
}
Additional info about How to use parentheses to group and expand expressions:
(it is listed on the link syntax-brackets)
Some main points in there:
Group commands in a sub-shell: ( )
(list)
Group commands in the current shell: { }
{ list; }
Test - return the binary result of an expression: [[ ]]
[[ expression ]]
Arithmetic expansion
The format for Arithmetic expansion is:
$(( expression ))
The format for a simple Arithmetic Evaluation is:
(( expression ))
Combine multiple expressions
( expression )
(( expr1 && expr2 ))
Truncate the contents of a variable
$ var="abcde"; echo ${var%d*}
abc
Make substitutions similar to sed
$ var="abcde"; echo ${var/de/12}
abc12
Use a default value
$ default="hello"; unset var; echo ${var:-$default}
hello

BASH - never execute unless environment variable is defined and certain value [duplicate]

I've got a few Unix shell scripts where I need to check that certain environment variables are set before I start doing stuff, so I do this sort of thing:
if [ -z "$STATE" ]; then
echo "Need to set STATE"
exit 1
fi
if [ -z "$DEST" ]; then
echo "Need to set DEST"
exit 1
fi
which is a lot of typing. Is there a more elegant idiom for checking that a set of environment variables is set?
EDIT: I should mention that these variables have no meaningful default value - the script should error out if any are unset.
Parameter Expansion
The obvious answer is to use one of the special forms of parameter expansion:
: ${STATE?"Need to set STATE"}
: ${DEST:?"Need to set DEST non-empty"}
Or, better (see section on 'Position of double quotes' below):
: "${STATE?Need to set STATE}"
: "${DEST:?Need to set DEST non-empty}"
The first variant (using just ?) requires STATE to be set, but STATE="" (an empty string) is OK — not exactly what you want, but the alternative and older notation.
The second variant (using :?) requires DEST to be set and non-empty.
If you supply no message, the shell provides a default message.
The ${var?} construct is portable back to Version 7 UNIX and the Bourne Shell (1978 or thereabouts). The ${var:?} construct is slightly more recent: I think it was in System III UNIX circa 1981, but it may have been in PWB UNIX before that. It is therefore in the Korn Shell, and in the POSIX shells, including specifically Bash.
It is usually documented in the shell's man page in a section called Parameter Expansion. For example, the bash manual says:
${parameter:?word}
Display Error if Null or Unset. If parameter is null or unset, the expansion of word (or a message to that effect if word is not present) is written to the standard error and the shell, if it is not interactive, exits. Otherwise, the value of parameter is substituted.
The Colon Command
I should probably add that the colon command simply has its arguments evaluated and then succeeds. It is the original shell comment notation (before '#' to end of line). For a long time, Bourne shell scripts had a colon as the first character. The C Shell would read a script and use the first character to determine whether it was for the C Shell (a '#' hash) or the Bourne shell (a ':' colon). Then the kernel got in on the act and added support for '#!/path/to/program' and the Bourne shell got '#' comments, and the colon convention went by the wayside. But if you come across a script that starts with a colon, now you will know why.
Position of double quotes
blong asked in a comment:
Any thoughts on this discussion? https://github.com/koalaman/shellcheck/issues/380#issuecomment-145872749
The gist of the discussion is:
… However, when I shellcheck it (with version 0.4.1), I get this message:
In script.sh line 13:
: ${FOO:?"The environment variable 'FOO' must be set and non-empty"}
^-- SC2086: Double quote to prevent globbing and word splitting.
Any advice on what I should do in this case?
The short answer is "do as shellcheck suggests":
: "${STATE?Need to set STATE}"
: "${DEST:?Need to set DEST non-empty}"
To illustrate why, study the following. Note that the : command doesn't echo its arguments (but the shell does evaluate the arguments). We want to see the arguments, so the code below uses printf "%s\n" in place of :.
$ mkdir junk
$ cd junk
$ > abc
$ > def
$ > ghi
$
$ x="*"
$ printf "%s\n" ${x:?You must set x} # Careless; not recommended
abc
def
ghi
$ unset x
$ printf "%s\n" ${x:?You must set x} # Careless; not recommended
bash: x: You must set x
$ printf "%s\n" "${x:?You must set x}" # Careful: should be used
bash: x: You must set x
$ x="*"
$ printf "%s\n" "${x:?You must set x}" # Careful: should be used
*
$ printf "%s\n" ${x:?"You must set x"} # Not quite careful enough
abc
def
ghi
$ x=
$ printf "%s\n" ${x:?"You must set x"} # Not quite careful enough
bash: x: You must set x
$ unset x
$ printf "%s\n" ${x:?"You must set x"} # Not quite careful enough
bash: x: You must set x
$
Note how the value in $x is expanded to first * and then a list of file names when the overall expression is not in double quotes. This is what shellcheck is recommending should be fixed. I have not verified that it doesn't object to the form where the expression is enclosed in double quotes, but it is a reasonable assumption that it would be OK.
Try this:
[ -z "$STATE" ] && echo "Need to set STATE" && exit 1;
Your question is dependent on the shell that you are using.
Bourne shell leaves very little in the way of what you're after.
BUT...
It does work, just about everywhere.
Just try and stay away from csh. It was good for the bells and whistles it added, compared the Bourne shell, but it is really creaking now. If you don't believe me, just try and separate out STDERR in csh! (-:
There are two possibilities here. The example above, namely using:
${MyVariable:=SomeDefault}
for the first time you need to refer to $MyVariable. This takes the env. var MyVariable and, if it is currently not set, assigns the value of SomeDefault to the variable for later use.
You also have the possibility of:
${MyVariable:-SomeDefault}
which just substitutes SomeDefault for the variable where you are using this construct. It doesn't assign the value SomeDefault to the variable, and the value of MyVariable will still be null after this statement is encountered.
Surely the simplest approach is to add the -u switch to the shebang (the line at the top of your script), assuming you’re using bash:
#!/bin/sh -u
This will cause the script to exit if any unbound variables lurk within.
${MyVariable:=SomeDefault}
If MyVariable is set and not null, it will reset the variable value (= nothing happens).
Else, MyVariable is set to SomeDefault.
The above will attempt to execute ${MyVariable}, so if you just want to set the variable do:
MyVariable=${MyVariable:=SomeDefault}
In my opinion the simplest and most compatible check for #!/bin/sh is:
if [ "$MYVAR" = "" ]
then
echo "Does not exist"
else
echo "Exists"
fi
Again, this is for /bin/sh and is compatible also on old Solaris systems.
bash 4.2 introduced the -v operator which tests if a name is set to any value, even the empty string.
$ unset a
$ b=
$ c=
$ [[ -v a ]] && echo "a is set"
$ [[ -v b ]] && echo "b is set"
b is set
$ [[ -v c ]] && echo "c is set"
c is set
I always used:
if [ "x$STATE" == "x" ]; then echo "Need to set State"; exit 1; fi
Not that much more concise, I'm afraid.
Under CSH you have $?STATE.
For future people like me, I wanted to go a step forward and parameterize the var name, so I can loop over a variable sized list of variable names:
#!/bin/bash
declare -a vars=(NAME GITLAB_URL GITLAB_TOKEN)
for var_name in "${vars[#]}"
do
if [ -z "$(eval "echo \$$var_name")" ]; then
echo "Missing environment variable $var_name"
exit 1
fi
done
We can write a nice assertion to check a bunch of variables all at once:
#
# assert if variables are set (to a non-empty string)
# if any variable is not set, exit 1 (when -f option is set) or return 1 otherwise
#
# Usage: assert_var_not_null [-f] variable ...
#
function assert_var_not_null() {
local fatal var num_null=0
[[ "$1" = "-f" ]] && { shift; fatal=1; }
for var in "$#"; do
[[ -z "${!var}" ]] &&
printf '%s\n' "Variable '$var' not set" >&2 &&
((num_null++))
done
if ((num_null > 0)); then
[[ "$fatal" ]] && exit 1
return 1
fi
return 0
}
Sample invocation:
one=1 two=2
assert_var_not_null one two
echo test 1: return_code=$?
assert_var_not_null one two three
echo test 2: return_code=$?
assert_var_not_null -f one two three
echo test 3: return_code=$? # this code shouldn't execute
Output:
test 1: return_code=0
Variable 'three' not set
test 2: return_code=1
Variable 'three' not set
More such assertions here: https://github.com/codeforester/base/blob/master/lib/assertions.sh
This can be a way too:
if (set -u; : $HOME) 2> /dev/null
...
...
http://unstableme.blogspot.com/2007/02/checks-whether-envvar-is-set-or-not.html
None of the above solutions worked for my purposes, in part because I checking the environment for an open-ended list of variables that need to be set before starting a lengthy process. I ended up with this:
mapfile -t arr < variables.txt
EXITCODE=0
for i in "${arr[#]}"
do
ISSET=$(env | grep ^${i}= | wc -l)
if [ "${ISSET}" = "0" ];
then
EXITCODE=-1
echo "ENV variable $i is required."
fi
done
exit ${EXITCODE}
Rather than using external shell scripts I tend to load in functions in my login shell. I use something like this as a helper function to check for environment variables rather than any set variable:
is_this_an_env_variable ()
local var="$1"
if env |grep -q "^$var"; then
return 0
else
return 1
fi
}
The $? syntax is pretty neat:
if [ $?BLAH == 1 ]; then
echo "Exists";
else
echo "Does not exist";
fi

What's a concise way to check that environment variables are set in a Unix shell script?

I've got a few Unix shell scripts where I need to check that certain environment variables are set before I start doing stuff, so I do this sort of thing:
if [ -z "$STATE" ]; then
echo "Need to set STATE"
exit 1
fi
if [ -z "$DEST" ]; then
echo "Need to set DEST"
exit 1
fi
which is a lot of typing. Is there a more elegant idiom for checking that a set of environment variables is set?
EDIT: I should mention that these variables have no meaningful default value - the script should error out if any are unset.
Parameter Expansion
The obvious answer is to use one of the special forms of parameter expansion:
: ${STATE?"Need to set STATE"}
: ${DEST:?"Need to set DEST non-empty"}
Or, better (see section on 'Position of double quotes' below):
: "${STATE?Need to set STATE}"
: "${DEST:?Need to set DEST non-empty}"
The first variant (using just ?) requires STATE to be set, but STATE="" (an empty string) is OK — not exactly what you want, but the alternative and older notation.
The second variant (using :?) requires DEST to be set and non-empty.
If you supply no message, the shell provides a default message.
The ${var?} construct is portable back to Version 7 UNIX and the Bourne Shell (1978 or thereabouts). The ${var:?} construct is slightly more recent: I think it was in System III UNIX circa 1981, but it may have been in PWB UNIX before that. It is therefore in the Korn Shell, and in the POSIX shells, including specifically Bash.
It is usually documented in the shell's man page in a section called Parameter Expansion. For example, the bash manual says:
${parameter:?word}
Display Error if Null or Unset. If parameter is null or unset, the expansion of word (or a message to that effect if word is not present) is written to the standard error and the shell, if it is not interactive, exits. Otherwise, the value of parameter is substituted.
The Colon Command
I should probably add that the colon command simply has its arguments evaluated and then succeeds. It is the original shell comment notation (before '#' to end of line). For a long time, Bourne shell scripts had a colon as the first character. The C Shell would read a script and use the first character to determine whether it was for the C Shell (a '#' hash) or the Bourne shell (a ':' colon). Then the kernel got in on the act and added support for '#!/path/to/program' and the Bourne shell got '#' comments, and the colon convention went by the wayside. But if you come across a script that starts with a colon, now you will know why.
Position of double quotes
blong asked in a comment:
Any thoughts on this discussion? https://github.com/koalaman/shellcheck/issues/380#issuecomment-145872749
The gist of the discussion is:
… However, when I shellcheck it (with version 0.4.1), I get this message:
In script.sh line 13:
: ${FOO:?"The environment variable 'FOO' must be set and non-empty"}
^-- SC2086: Double quote to prevent globbing and word splitting.
Any advice on what I should do in this case?
The short answer is "do as shellcheck suggests":
: "${STATE?Need to set STATE}"
: "${DEST:?Need to set DEST non-empty}"
To illustrate why, study the following. Note that the : command doesn't echo its arguments (but the shell does evaluate the arguments). We want to see the arguments, so the code below uses printf "%s\n" in place of :.
$ mkdir junk
$ cd junk
$ > abc
$ > def
$ > ghi
$
$ x="*"
$ printf "%s\n" ${x:?You must set x} # Careless; not recommended
abc
def
ghi
$ unset x
$ printf "%s\n" ${x:?You must set x} # Careless; not recommended
bash: x: You must set x
$ printf "%s\n" "${x:?You must set x}" # Careful: should be used
bash: x: You must set x
$ x="*"
$ printf "%s\n" "${x:?You must set x}" # Careful: should be used
*
$ printf "%s\n" ${x:?"You must set x"} # Not quite careful enough
abc
def
ghi
$ x=
$ printf "%s\n" ${x:?"You must set x"} # Not quite careful enough
bash: x: You must set x
$ unset x
$ printf "%s\n" ${x:?"You must set x"} # Not quite careful enough
bash: x: You must set x
$
Note how the value in $x is expanded to first * and then a list of file names when the overall expression is not in double quotes. This is what shellcheck is recommending should be fixed. I have not verified that it doesn't object to the form where the expression is enclosed in double quotes, but it is a reasonable assumption that it would be OK.
Try this:
[ -z "$STATE" ] && echo "Need to set STATE" && exit 1;
Your question is dependent on the shell that you are using.
Bourne shell leaves very little in the way of what you're after.
BUT...
It does work, just about everywhere.
Just try and stay away from csh. It was good for the bells and whistles it added, compared the Bourne shell, but it is really creaking now. If you don't believe me, just try and separate out STDERR in csh! (-:
There are two possibilities here. The example above, namely using:
${MyVariable:=SomeDefault}
for the first time you need to refer to $MyVariable. This takes the env. var MyVariable and, if it is currently not set, assigns the value of SomeDefault to the variable for later use.
You also have the possibility of:
${MyVariable:-SomeDefault}
which just substitutes SomeDefault for the variable where you are using this construct. It doesn't assign the value SomeDefault to the variable, and the value of MyVariable will still be null after this statement is encountered.
Surely the simplest approach is to add the -u switch to the shebang (the line at the top of your script), assuming you’re using bash:
#!/bin/sh -u
This will cause the script to exit if any unbound variables lurk within.
${MyVariable:=SomeDefault}
If MyVariable is set and not null, it will reset the variable value (= nothing happens).
Else, MyVariable is set to SomeDefault.
The above will attempt to execute ${MyVariable}, so if you just want to set the variable do:
MyVariable=${MyVariable:=SomeDefault}
In my opinion the simplest and most compatible check for #!/bin/sh is:
if [ "$MYVAR" = "" ]
then
echo "Does not exist"
else
echo "Exists"
fi
Again, this is for /bin/sh and is compatible also on old Solaris systems.
bash 4.2 introduced the -v operator which tests if a name is set to any value, even the empty string.
$ unset a
$ b=
$ c=
$ [[ -v a ]] && echo "a is set"
$ [[ -v b ]] && echo "b is set"
b is set
$ [[ -v c ]] && echo "c is set"
c is set
I always used:
if [ "x$STATE" == "x" ]; then echo "Need to set State"; exit 1; fi
Not that much more concise, I'm afraid.
Under CSH you have $?STATE.
For future people like me, I wanted to go a step forward and parameterize the var name, so I can loop over a variable sized list of variable names:
#!/bin/bash
declare -a vars=(NAME GITLAB_URL GITLAB_TOKEN)
for var_name in "${vars[#]}"
do
if [ -z "$(eval "echo \$$var_name")" ]; then
echo "Missing environment variable $var_name"
exit 1
fi
done
We can write a nice assertion to check a bunch of variables all at once:
#
# assert if variables are set (to a non-empty string)
# if any variable is not set, exit 1 (when -f option is set) or return 1 otherwise
#
# Usage: assert_var_not_null [-f] variable ...
#
function assert_var_not_null() {
local fatal var num_null=0
[[ "$1" = "-f" ]] && { shift; fatal=1; }
for var in "$#"; do
[[ -z "${!var}" ]] &&
printf '%s\n' "Variable '$var' not set" >&2 &&
((num_null++))
done
if ((num_null > 0)); then
[[ "$fatal" ]] && exit 1
return 1
fi
return 0
}
Sample invocation:
one=1 two=2
assert_var_not_null one two
echo test 1: return_code=$?
assert_var_not_null one two three
echo test 2: return_code=$?
assert_var_not_null -f one two three
echo test 3: return_code=$? # this code shouldn't execute
Output:
test 1: return_code=0
Variable 'three' not set
test 2: return_code=1
Variable 'three' not set
More such assertions here: https://github.com/codeforester/base/blob/master/lib/assertions.sh
This can be a way too:
if (set -u; : $HOME) 2> /dev/null
...
...
http://unstableme.blogspot.com/2007/02/checks-whether-envvar-is-set-or-not.html
None of the above solutions worked for my purposes, in part because I checking the environment for an open-ended list of variables that need to be set before starting a lengthy process. I ended up with this:
mapfile -t arr < variables.txt
EXITCODE=0
for i in "${arr[#]}"
do
ISSET=$(env | grep ^${i}= | wc -l)
if [ "${ISSET}" = "0" ];
then
EXITCODE=-1
echo "ENV variable $i is required."
fi
done
exit ${EXITCODE}
Rather than using external shell scripts I tend to load in functions in my login shell. I use something like this as a helper function to check for environment variables rather than any set variable:
is_this_an_env_variable ()
local var="$1"
if env |grep -q "^$var"; then
return 0
else
return 1
fi
}
The $? syntax is pretty neat:
if [ $?BLAH == 1 ]; then
echo "Exists";
else
echo "Does not exist";
fi

number of tokens in bash variable

how can I know the number of tokens in a bash variable (whitespace-separated tokens) - or at least, wether it is one or there are more.
The $# expansion will tell you the number of elements in a variable / array. If you're working with a bash version greater than 2.05 or so you can:
VAR='some string with words'
VAR=( $VAR )
echo ${#VAR[#]}
This effectively splits the string into an array along whitespace (which is the default delimiter), and then counts the members of the array.
EDIT:
Of course, this recasts the variable as an array. If you don't want that, use a different variable name or recast the variable back into a string:
VAR="${VAR[*]}"
I can't understand why people are using those overcomplicated bashisms all the time. There's almost always a straight-forward, no-bashism solution.
howmany() { echo $#; }
myvar="I am your var"
howmany $myvar
This uses the tokenizer built-in to the shell, so there's no discrepancy.
Here's one related gotcha:
myvar='*'
echo $myvar
echo "$myvar"
set -f
echo $myvar
echo "$myvar"
Note that the solution from #guns using bash array has the same gotcha.
The following is a (supposedly) super-robust version to work around the gotcha:
howmany() ( set -f; set -- $1; echo $# )
If we want to avoid the subshell, things start to get ugly
howmany() {
case $- in *f*) set -- $1;; *) set -f; set -- $1; set +f;; esac
echo $#
}
These two must be used WITH quotes, e.g. howmany "one two three" returns 3
set VAR='hello world'
echo $VAR | wc -w
here is how you can check.
if [ `echo $VAR | wc -w` -gt 1 ]
then
echo "Hello"
fi
Simple method:
$ VAR="a b c d"
$ set $VAR
$ echo $#
4
To count:
sentence="This is a sentence, please count the words in me."
words="${sentence//[^\ ]} "
echo ${#words}
To check:
sentence1="Two words"
sentence2="One"
[[ "$sentence1" =~ [\ ] ]] && echo "sentence1 has more than one word"
[[ "$sentence2" =~ [\ ] ]] && echo "sentence2 has more than one word"
For a robust, portable sh solution, see #JoSo's functions using set -f.
(Simple bash-only solution for answering (only) the "Is there at least 1 whitespace?" question; note: will also match leading and trailing whitespace, unlike the awk solution below:
[[ $v =~ [[:space:]] ]] && echo "\$v has at least 1 whitespace char."
)
Here's a robust awk-based bash solution (less efficient due to invocation of an external utility, but probably won't matter in many real-world scenarios):
# Functions - pass in a quoted variable reference as the only argument.
# Takes advantage of `awk` splitting each input line into individual tokens by
# whitespace; `NF` represents the number of tokens.
# `-v RS=$'\3'` ensures that even multiline input is treated as a single input
# string.
countTokens() { awk -v RS=$'\3' '{print NF}' <<<"$1"; }
hasMultipleTokens() { awk -v RS=$'\3' '{if(NF>1) ec=0; else ec=1; exit ec}' <<<"$1"; }
# Example: Note the use of glob `*` to demonstrate that it is not
# accidentally expanded.
v='I am *'
echo "\$v has $(countTokens "$v") token(s)."
if hasMultipleTokens "$v"; then
echo "\$v has multiple tokens."
else
echo "\$v has just 1 token."
fi
Not sure if this is exactly what you meant but:
$# = Number of arguments passed to the bash script
Otherwise you might be looking for something like man wc

Resources