Is there file scope in bash programming? - bash

I want to make this variable
local to="$HOME/root_install/grunt"
be available to the entire file
makeGrunt(){
  # set paths
  local to="$HOME/root_install/grunt"
  cd "$to"
  sudo npm install -g grunt-init
  sudo git clone https://github.com/gruntjs/grunt-init-gruntfile.git ~/.grunt-init/gruntfile
  sudo grunt-init gruntfile
}

In POSIX-like shells - unless you use nonstandard constructs such as local, typeset, or declare - variables created implicitly through
assignment have global scope in the shell at hand.
Thus, to="$HOME/root_install/grunt" will make variable $to available anywhere in the current shell - unless you're inside a function and that variable was explicitly marked as local.
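A minimal illustration (f is just a hypothetical name):
f() { to="$HOME/root_install/grunt"; }   # no 'local': the assignment is global
f
echo "$to" # the variable is visible here, i.e. file-wide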
andlrc's helpful answer demonstrates the pitfalls associated with subshells: subshells are child processes that are clones of the original shell; they see the same state, but cannot modify the original shell's environment.

Bash uses dynamic scoping, which means that all variables are visible to all called functions, commands, etc. Consider this:
var=1
a() {
  local var=2
  b
}
b() {
  echo "$var"
}
a # 2
b # 1
a # 2
When using the local keyword, a variable is available within the function where it is defined and within all functions called from that function. The same applies to a variable created without the local keyword, except that it is also visible outside the function.
One more thing to be aware of: whenever a subshell is created, a variable cannot "leave" it, e.g. when a pipe is involved. Consider this:
sum=0
seq 3 | while read -r num; do
  sum=$((sum + num))
  echo "$sum" # will print 1, 3 and 6
done
echo "$sum" # 0 huh? 1 + 2 + 3 = 0?

Related

Is lazier (ksh-like) function autoloading possible in bash?

My project (a port from ksh) uses some directories of autoloadable functions.
In those directories, each file is named after the function it declares; sourcing that file declares (implements) the function. Each directory can be considered a 'package' that augments the bash builtin set with functions. I have about 20 packages, and the number of functions per package can be significant (up to 30 in some packages).
The bash documentation includes an example implementation of autoloading:
https://www.apt-browse.org/browse/ubuntu/trusty/main/all/bash-doc/4.3-6ubuntu1/file/usr/share/doc/bash/examples/functions/autoload.v2
However, that implementation requires the set of potentially-autoloadable functions to be known (and enumerated) at shell startup time.
Is an implementation that doesn't have that limitation possible?
Well, after asking, my turn to 'give back' :) to the SO community. I investigated this autoload feature I need and came up with 2 implementations; I provide one of them here, so some may suggest enhancements or point out bugs. I'll post the second one in a second post.
The 2 implementations will run some test cases, so before presenting the implementations I present the common test cases. We have 2 directories, a1/ and a2/, that host function definitions located in files with the same name as the function; each dir can be considered a 'package' dir containing the functions for that package. The functions in there are namespaced with the package name (dir name), with a few exceptions for the test's purposes.
./a1/ac_f3::
function ac_f3
{ echo "In a1 ac_f3() : args=$@"
}
./a1/a1_f1::
function a1_f1
{ echo "In a1_f1() : args=$@"
}
./a1/a1_f2::
function a1_f22
{ echo "In a1_f2() : args=$@"
}
./a2/ac_f3::
function ac_f3
{ echo "In a2 ac_f3() : args=$@"
}
./a2/a2_f1::
function a2_f1
{ echo "In a2_f1() : args=$@"
}
ac_f3 is a function that is not namespaced and is thus common to both dirs a1/ and a2/, yet with different implementations; this is to demonstrate the $FPATH precedence.
a1_f2 is a bogus one: its file doesn't implement the function a1_f2() (it defines a1_f22() instead), so we must fail gracefully.
a1_f1 and a2_f1 simply implement a1_f1() and a2_f1(); they must be found and executed.
command_not_found_handle implementation
Thanks Charles for bringing up the command_not_found_handle option: autoloadable functions are indeed tied to the fact that a 'command' has not been found, at which point we try to find an autoloadable file to load and execute.
But amazingly, the bash shell has an interesting "feature" here, i.e. some undocumented behavior.
The bash doc says:
If the search is unsuccessful, the shell searches for a defined
shell function named command_not_found_handle. If that
function exists, it is invoked with the original command and
the original command's arguments as its arguments, and the
function's exit status becomes the exit status of the shell.
If that function is not defined, the shell prints an error
message and returns an exit status of 127.
This is misleading, because here we are talking about the command_not_found_handle() function invocation, so we may infer 'invoked from the shell context', and this is not the case.
In the shell logic, we failed to find an alias, then failed to find a function, then failed to find an 'external to the shell' program, and the shell is already in subshell-creation mode, so command_not_found_handle() is invoked, but in a subshell, not in the shell context. This could be OK, but the 'funny feature' here is that the subprocess created is not clean: its $$ and $PPID are not set correctly (maybe this will be fixed one day). To exhibit this bash feature we can do:
function command_not_found_handle
{ echo $$ ; sh -c 'echo $PPID'
}
PW$ # In a shell context invocation
PW$ command_not_found_handle
2746
2746
PW$ # In a subshell invocation (via command not found)
PW$ qqq
2746
3090
Back to our autoload feature: this means we want to install more functions in a shell instance, which is nothing that can be done from a subshell, so basically command_not_found_handle() is of little help and can do nothing besides signal its parent that it got entered (and thus that a command was not found). We will exploit this in our implementation.
# autoload
# This file must be sourced
# - From your rc files if you need autoloadable functions from your
#   interactive shell
# - From any script that needs autoloadable functions.
#
# FPATH must be set to the list of dirs/ in which to look for a file
# whose name matches the function name, to source and execute.
#
# Note that if FPATH is exported, this is a way to export functions to
# script subshells
# Create a default command_not_found_handle if none exists
declare -F command_not_found_handle >/dev/null ||
function command_not_found_handle { ! echo bash: $1 command not found>&2; }
# Rename the current command_not_found_handle
_cnf_body=$(declare -f command_not_found_handle | tail -n +2)
eval "function _cnf_prev $_cnf_body"
# Change USR1 to your liking
CNF_SIG=USR1
function autoload
{ declare f=$1 ; shift
  declare d s
  for d in $(IFS=:; echo $FPATH)
  do s=$d/$f
     [ -f $s -a -r $s ] &&
     { . $s
       declare -F $f >/dev/null ||
       { echo "$s exist but don't define $f" >&2 ; return 127
       }
       $f "$@" ; return
     }
  done
  _cnf_prev $f "$@"
}
trap 'autoload ${BASH_COMMAND[@]}' $CNF_SIG
function command_not_found_handle
{ kill -$CNF_SIG $$
}
WARNING: if you ever use this 'autoload' file, be prepared for a bash fix; it may one day report the real $$ and $PPID, in which case you will need to change the above snippet to use $PPID instead of $$.
Results.
PW$ . /path/to/autoload
PW$ FPATH=a1:a2
PW$ a1_f1 11a 11b 11c
In a1_f1() : args=11a 11b 11c
PW$ a2_f1 21a 21b 21c
In a2_f1() : args=21a 21b 21c
PW$ a1_f2 12a 12b 12c
a1/a1_f2 exist but don't define a1_f2
PW$ ac_f3 c3a c3b c3c
In a1 ac_f3() : args=c3a c3b c3c
PW$ qqq
Command 'qqq' not found, did you mean:
command 'qrq' from snap qrq (0.3.1)
command 'qrq' from deb qrq
See 'snap info <snapname>' for additional versions.
What we got here is correct: a1_f1() and a2_f1() are found, loaded, executed.
a1_f2() is nowhere to be found, despite a file existing that could host it.
The qqq invocation displays the chaining of the handlers: the autoload function goes first, then the ubuntu command-not-found package (if installed), meaning we are not losing the command_not_found_handle() user experience.
Note there are no 'admin' functions here for adding/removing/reloading functions.
Adding is a matter of placing a file in one of the dirs present in $FPATH.
Removing is a matter of removing the source file and unset -f the function.
Reloading is a matter of editing the source file and unset -f the function.
Reloading functions can be pretty neat during development in an interactive shell, and all of this can be done with a simple unset -f funcname: you edit your source file, unset the function, then call it, and you get the latest version. The same can be done in a script daemon: one could send the daemon a signal whose trap handler simply unsets a set of functions, which would then be reloaded without stopping/restarting the daemon.
Another feature here is that shell 'packages' are possible, i.e. a source file may implement 'many' functions, some forming the external API, others internal to the package. Since everything is flat in the shell, functions are namespaced, and each external API function (albeit documented) can be hard-linked to the same file. The first external API function used will load all the package's functions.
In my project, the documentation is extracted from the package sources, and the hardlinks are inferred and built at that time.
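As a sketch, such a package file could look like this (hypothetical names; the internal helper is deliberately not exposed via a file name):
./pkg/pkg_api1::
function pkg_api1 { _pkg_helper api1 "$@" ; }
function pkg_api2 { _pkg_helper api2 "$@" ; }
function _pkg_helper { echo "pkg($1): ${*:2}" ; }   # internal, loaded along with the API
A hard link then exposes the second entry point: ln pkg/pkg_api1 pkg/pkg_api2. Whichever API function is called first loads the whole file, defining both.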
PROs and CONs
PROs
Here we have a light signature in the autoload sourcing, i.e. from scripts or from the bash rc file (interactive); the definition of autoload() is modest.
It is very dynamic, in the sense that loading and executing a function is deferred until really needed.
CONs
It grabs a signal number; that would not be necessary if command_not_found_handle() were a real function called from the shell context, which could happen one day.
It is built on a bash quirk (the wrong $$ and $PPID) that may move, and thus needs maintenance against a moving target.
Conclusion
This implementation is OK for me (I don't care about losing SIGUSR1). The ideal solution would be for command_not_found_handle() to be cleanly implemented and called in the shell context; then a similar implementation would be possible without any signal.
This is a second implementation, to avoid the signal usage seen in the previous implementation and the usage of command_not_found_handle(), which seems not completely stable.
autoload::
function autoload
{ local d="$1" && [ "$1" ] && shift && autoload "$@"
  local identifier='^[_a-zA-Z][_a-zA-Z0-9]*$'
  [ -d "$d" -a -x "$d" ] && cd "$d" &&
  { for f in *
    do [[ $f =~ $identifier ]] && alias $f=". $PWD/$f;unalias $f;$f"
    done
    cd - >/dev/null 2>&1
  }
}
autoload $@ $(IFS=:; echo $FPATH)
Here again we have to source this autoload file, either in an rc file or in scripts.
The usage of FPATH is not really needed (see Notes for more details on FPATH)
So basically the idea is to source the autoload file along with a set of directories to look for.
PW$ . /path/to/autoload a1 a2
PW$ alias | grep 'a[12c]_*'
alias a1_f1='. /home/phi/a1/a1_f1;unalias a1_f1;a1_f1'
alias a1_f2='. /home/phi/a1/a1_f2;unalias a1_f2;a1_f2'
alias a2_f1='. /home/phi/a2/a2_f1;unalias a2_f1;a2_f1'
alias ac_f3='. /home/phi/a1/ac_f3;unalias ac_f3;ac_f3'
PW$ declare -F | grep 'a[12c]_*'
After sourcing autoload, we have all the aliases defined and no functions.
This is a bit heavier than the previous implementation, yet still pretty lightweight; aliases are not costly to create in the shell, even with hundreds of them.
PW$ a1_f1 11a 11b 11c
In a1_f1() : args=11a 11b 11c
PW$ a2_f1 21a 21b 21c
In a2_f1() : args=21a 21b 21c
PW$ alias | grep 'a[12c]_*'
alias a1_f2='. /home/phi/a1/a1_f2;unalias a1_f2;a1_f2'
alias ac_f3='. /home/phi/a1/ac_f3;unalias ac_f3;ac_f3'
PW$ declare -F | grep 'a[12c]_*'
declare -f a1_f1
declare -f a2_f1
Here we see that a1_f1() and a2_f1() have been loaded and executed; they are removed from the alias list and added to the function list.
PW$ a1_f2 12a 12b 12c
a1_f2: command not found
PW$ ac_f3 c3a c3b c3c
In a1 ac_f3() : args=c3a c3b c3c
PW$ qqq
Command 'qqq' not found, did you mean:
command 'qrq' from snap qrq (0.3.1)
command 'qrq' from deb qrq
See 'snap info <snapname>' for additional versions.
Here we see that a1_f2() is not found, though not as well reported as in the previous implementation.
ac_f3() is the one from a1/, as expected.
qqq still provides the command-not-found distro package result if installed (expected, since we didn't mess with command_not_found_handle()).
PROs and CONs
PROs
Not sitting on a bash bug, i.e. could live for a while across bash updates.
Much simpler (well, maybe not simpler, but surely shorter) than the examples proposed in the bash documentation, and a bit lazier, i.e. functions are loaded only when necessary (not the aliases, though).
CONs
A little bit heavier than the previous implementation, yet acceptable.
Multi-function 'package' files with hardlinks for external API exposure perform less well, because each external API function (hardlink) triggers a reload of the file, unless the package file is well written and removes all the excess aliases after loading.

Why would I use declare / typeset in a shell script instead of just X=y?

I've recently come across a shell script that uses
declare -- FOO="" which apparently is spelled typeset -- FOO="" in non-bash shells.
Why might I want to do that instead of plain FOO="" or export FOO?
The most important purpose of using declare is to control scope, or to use array types that aren't otherwise accessible.
Using Function-Local Variables
To give you an example:
print_dashes() { for ((i=0; i<10; i++)); do printf '-'; done; echo; }
while read -p "Enter a number: " i; do
  print_dashes
  echo "You entered: $i"
done
You'd expect that to print the number the user entered, right? But instead, it'll always print the value of i that print_dashes leaves when it's complete.
Consider instead:
print_dashes() {
  declare i # 'local i' would also have the desired effect
  for ((i=0; i<10; i++)); do printf '-'; done; echo
}
...now i is local, so the newly-assigned value doesn't last beyond its invocation.
Declaring Explicitly Global Variables
Contrariwise, you sometimes want to declare a global variable, and make it clear to your code's readers that you're doing that by intent, or to do so while also declaring something as an array (or otherwise where declare would otherwise implicitly specify global state). You can do that too:
myfunc() {
  declare arg # make arg local
  declare -g -A myfunc_args_seen # make myfunc_args_seen a global associative array
  for arg; do
    myfunc_args_seen["$arg"]=1
  done
  echo "Across all invocations of myfunc, we have seen the following arguments:"
  printf ' - %q\n' "${!myfunc_args_seen[@]}"
}
Declaring Associative Arrays
Normal shell arrays can just be assigned: my_arr=( one two three )
However, that's not the case for associative arrays, which are keyed as strings. For those, you need to declare them:
declare -A my_arr=( ["one"]=1 ["two"]=2 ["three"]=3 )
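Once declared, keys and values can be iterated (reusing my_arr from above):
for key in "${!my_arr[@]}"; do
  echo "$key -> ${my_arr[$key]}"
done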
declare -i cnt=0
declares an integer-only variable, which is faster for math and always evaluates in arithmetic context.
declare -l lower="$1"
declares a variable that automatically lowercases anything put in it, without any special syntax on access.
declare -r unchangeable="$constant"
declares a variable read-only.
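A quick sketch of those three flags in action:
declare -i cnt=0
cnt+=5        # arithmetic add, not string append: cnt is 5
cnt='2 * 3'   # evaluated in arithmetic context: cnt is 6
declare -l lower="MiXeD Case"
echo "$lower" # mixed case
declare -r unchangeable="fixed"
unchangeable="new" 2>/dev/null || echo "assignment refused: read-only"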
Take a look at https://unix.stackexchange.com/questions/254367/in-bash-scripting-whats-the-different-between-declare-and-a-normal-variable for some useful discussion - you might not need these things often, but if you don't know what's available, you're likely to work harder than you need to.
A great reason to use declare, typeset, and/or readonly is code compartmentalization and reuse (i.e. encapsulation). You can write code in one script that can be sourced by others.
(Note declared/typeset/readonly constants/variables/functions lose their "readonly-ness" in a subshell, but they retain it when a child script sources their defining script since sourcing loads a script into the current shell, not a subshell.)
Since sourcing loads code from the script into the current shell though, the namespaces will overlap. To prevent a variable in a child script from being overwritten by its parent (or vice-versa, depending on where the script is sourced and the variable used), you can declare a variable readonly so it won't get overwritten.
You have to be careful with this, because once you declare something readonly you cannot unset it, so you do not want to declare something readonly that might naturally be redefined in another script. For example, if you're writing a library for general use that has logging functions, you might not want to use typeset -f on a function called warn, error, or info, since other scripts will likely create similar logging functions of their own with those names. In that case, it is actually standard practice to prefix the function, variable, and/or constant name with the name of the defining script and then make it readonly (e.g. my_script_warn, my_script_error, etc.). This preserves the values of the functions, variables, and/or constants used in the defining script's logic, so they don't get overwritten by sourcing scripts and fail accidentally.
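A minimal sketch of that prefix-and-readonly pattern (file and function names are hypothetical):
# my_script.sh - a sourceable library
my_script_warn() { printf 'my_script: WARN: %s\n' "$*" >&2; }
readonly -f my_script_warn       # protect the function from redefinition
readonly MY_SCRIPT_VERSION="1.0" # protect the constant
# consumer.sh
. ./my_script.sh # sourcing keeps the readonly attributes in this shell
my_script_warn "disk almost full"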

Bash: Hide global variable using local variable with same name

I'd like to use a global variable in a function but don't want the change to go outside the function. So I defined a local variable initialized to the value of the global variable. The global variable has a great name, so I want to use the same name on the local variable. This seems doable in Bash, but I'm not sure if this is undefined behavior.
#!/bin/bash
a=3
echo $a
foo() {
  local a=$a ## defined or undefined?
  a=4
  echo $a
}
foo
echo $a
Gives output:
3
4
3
Expansion happens before the assignment (early on), as the documentation states:
Expansion is performed on the command line after it has been split into words.
So the behavior should be predictable (and defined). In local a=$a, when $a is expanded it is still the global one; the command execution (the assignment/declaration) happens later, when $a has already been replaced by its value.
However, I am not sure it wouldn't get confusing to have essentially two different variables (scope-dependent) with the same name (i.e. appearing to be one and the same). So I'd rather question the wisdom of doing so on coding-practice / readability / ease-of-navigation grounds.
There is a new shell option in Bash 5.0, localvar_inherit, which makes a local variable inherit the value of the variable of the same name in the preceding scope:
#!/usr/bin/env bash
shopt -s localvar_inherit
myfunc() {
  local globalvar
  echo "In call: $globalvar"
  globalvar='local'
  echo "In call, after setting: $globalvar"
}
globalvar='global'
echo "Before call: $globalvar"
myfunc
echo "After call: $globalvar"
with the following output:
Before call: global
In call: global
In call, after setting: local
After call: global
If you don't have Bash 5.0, you have to set the value in the function yourself, as you did in your question, with the same result.
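That pre-5.0 fallback, spelled out against the same function:
myfunc() {
  local globalvar="$globalvar" # copy the outer value in explicitly
  echo "In call: $globalvar"
  globalvar='local'
  echo "In call, after setting: $globalvar"
}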

why bash throws unbound variable warning when I declare a local array in function whose name is shadowing a global one?

In this example, declaring a local variable with a different name from the one in the global scope produces no error, but when the name is the same as the global one I get:
line 5: !1: unbound variable
code:
set -u
function get_arr {
  local myArr2=("${!1}")
  echo ${myArr2[*]}
  local myArr=("${!1}")
  echo ${myArr[*]}
}
myArr=(one two three)
get_arr myArr[@]
Just to make sure we are on the same sheet of paper, here is the version working on Bash 3.2 (works fine quoted or unquoted). You must have an environment setting, stray characters in your file, or something else unrelated to your script causing issues:
#!/bin/bash
set -u
function get_arr {
  local myArr2=("${!1}")
  echo ${myArr2[*]}
  local myArr=("${!1}")
  echo ${myArr[*]}
}
myArr=(one two three)
get_arr "myArr[@]"
exit 0
Version
$ bash --version
GNU bash, version 3.2.39(1)-release (i586-suse-linux-gnu)
Copyright (C) 2007 Free Software Foundation, Inc.
Output
$ bash array_indirect_ref.sh
one two three
one two three
Execution
$ bash -x array_indirect_ref.sh
+ set -u
+ myArr=(one two three)
+ get_arr 'myArr[@]'
+ myArr2=("${!1}")
+ local myArr2
+ echo one two three
one two three
+ myArr=("${!1}")
+ local myArr
+ echo one two three
one two three
Update: it appears that how you declare the passed array inside your function affects whether or not shadowed names will work, even in new bash versions.
I have some bash code that used to work, as of last week, but now fails after I updated cygwin to its current code.
~~~~~~~~~~
My cygwin bash version is now 4.3.39:
$ bash --version
GNU bash, version 4.3.39(2)-release (i686-pc-cygwin)
which is the latest.
~~~~~~~~~~
Consider this bash code:
#!/bin/bash
set -e # exit on first failed command
set -u # error on use of an unset variable
testArrayArg1() {
  declare -a argArray=("${!1}")
  echo "testArrayArg1: ${argArray[@]}"
}
testArrayArg2() {
  declare -a anArray=("${!1}")
  echo "testArrayArg2: ${anArray[@]}"
}
anArray=("a" "b" "c")
testArrayArg1 anArray[@]
testArrayArg2 anArray[@]
Note that testArrayArg2 function uses an array name (anArray) which shadows the subsequent variable name in the script.
Also note that the way I pass the array to the function (anArray[@]) and the way that I declare the array in the function (declare -a anArray=("${!1}")) are taken from Ken Bertelson's answer here.
Both functions above used to always work.
Now, after my cygwin/bash update, testArrayArg1 still works but testArrayArg2 which uses a shadowed array name fails:
$ bash t.sh
testArrayArg1: a b c
t.sh: line 11: !1: unbound variable
Anyone know what changed recently in bash to cause this?
~~~~~~~~~~
I can fix this if I change how I declare the array inside the function from declare -a anArray=("${!1}") to your "local" idiom of local anArray=("${!1}").
So, this code
testArrayArg3() {
  local anArray=("${!1}")
  echo "testArrayArg3: ${anArray[@]}"
}
testArrayArg3 anArray[@]
works:
testArrayArg3: a b c
~~~~~~~~~~
OK, so the local anArray=("${!1}") function array arg declaration idiom seems to work.
This idiom is mentioned in that SO link above, in a comment under Ken Bertelson's answer (Mike Q's comment).
Is it every bit as good as the declare -a anArray=("${!1}") idiom, or does it have drawbacks of its own?
I have some critical code that depends on passing arrays to bash functions, so I really need to get this straight.

Why can a bash function recursively call itself without using local variables?

I am writing scripts in bash. It is said on this site (http://tldp.org/LDP/abs/html/recurnolocvar.html) that "A function may recursively call itself even without use of local variables.", but it was not explained why.
There is a sample function involving the Fibonacci sequence. The author commented in the code that idx doesn't need to be local and asked why, but did not answer. A part is shown below:
Fibonacci ()
{
  idx=$1 # Doesn't need to be local. Why not?
  if [ "$idx" -lt "$MINIDX" ]
  then
    echo "$idx" # First two terms are 0 1 ... see above.
  else
    (( --idx )) # j-1
    term1=$( Fibonacci $idx ) # Fibo(j-1)
    (( --idx )) # j-2
    term2=$( Fibonacci $idx ) # Fibo(j-2)
    echo $(( term1 + term2 ))
  fi
}
The Fibonacci function has an "idx" variable which could be modified by successive calls, because the successive "idx" assignments are not declared local; hence they should affect the outer invocations.
The previous topic on that site (http://tldp.org/LDP/abs/html/localvar.html) demonstrates that if a variable is not declared local (and therefore defaults to global), then changing it is reflected in the global scope.
Why can a bash function recursively call itself without using local variables?
Because the variables effectively are local.
The command in a command substitution ($()) is run by a subshell. Variable values don't propagate back from a subshell, so the recursive calls can't affect the parent call.
Commands run in a subshell are:
command substitution ($(command))
both sides of a pipeline (command1 | command2)
explicit subshells ((command))
background jobs (command&)
(bash-specific) process substitutions (>(command), <(command))
There is no way to propagate any variable values back from these.
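A small demonstration of the point (bump is just a hypothetical name):
n=1
bump() { n=$((n + 1)); echo "$n"; }
out=$(bump)    # the function body runs in a subshell
echo "$out $n" # 2 1: the parent's n is unchanged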
It's because he calls the function recursively through a command substitution (a subshell):
term1=$( Fibonacci $idx ) # Fibo(j-1)
I actually find that inefficient: for every level of recursion a process is forked, which can put load on the system. It's better to use local variables.
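For comparison, a sketch of a subshell-free variant, which does need local variables (returning the result through REPLY is just a convention, not a bash requirement):
fib() {
  local -i idx=$1 t1 t2 # local: each recursion level gets its own copies
  if (( idx < 2 )); then
    REPLY=$idx
    return
  fi
  fib $((idx - 1)); t1=$REPLY
  fib $((idx - 2)); t2=$REPLY
  REPLY=$((t1 + t2)) # hand the result back without forking
}
fib 10; echo "$REPLY" # 55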
