Variable assignment in nested function call unexpectedly changes local variable in the caller's scope - bash

Editor's note:
Perhaps the following, taken from the OP's own answer, better illustrates the surprising behavior:
f() { local b=1; g; echo $b; }; g() { b=2; }; f # -> '2'
I.e., g() was able to modify f()'s local $b variable.
In Zsh and Bash, if I have the function f() { a=1; g; echo $a; } and the function g() { a=2; }, then when I run f, I get the following output instead of the expected 1:
$ f
2
Is there any way to disable this variable bleed-through from function to function?
I'm working on a rather large and important bash/zsh script at work that uses a ton of variables across various functions, many of which depend on a larger master function. Because of this variable bleed-through, some rather unfortunate and unexpected behavior and bugs have come to the forefront, preventing me from confidently furthering development, since I'd like to address this strange issue first.
I've even tried using local to localize variables, but the effect still occurs.
EDIT: Note that my question isn't about how to use local variables to prevent variable bleed-through, or about how local variables work, how to set local variables, how to assign a new value to an already declared local variable, or any of that crap: it is about how to prevent variables from bleeding into the scope of caller/called functions.

Using local creates a variable that is not inherited from the parent scope.
There are useful things to add.
A local variable will be inherited (and can be modified) if the function that declares it calls another function. In other words, local protects the caller's variable of the same name from changes made in this function, but it does not protect this function's variable from changes made by the functions it calls. The local declaration must therefore be used at each level, unless of course you actually want to alter the value in the parent scope. This is counter to what most programming languages do (it is dynamic rather than lexical scoping), and it has advantages (quick and dirty data sharing) but creates difficult-to-debug failure modes.
A local variable can be exported with local -x to make it usable by sub-processes (quite useful), or made readonly upon creation with local -r.
One nice trick: you can initialise a variable with the value inherited from the parent scope at the time of creation:
local -r VAR="$VAR"
If, like me, you always use set -u to avoid silently using uninitialized variables, and cannot be sure the variable is already assigned, you can use this to initialize it with an empty value if it is not defined in the parent scope:
local -r VAR="${VAR-}"
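Putting those pieces together, here is a minimal sketch (function and variable names are illustrative, not from the question) of the declare-at-each-level discipline combined with the snapshot trick:
#!/usr/bin/env bash
set -u

middle() {
    # Snapshot the inherited value (or empty if unset) into a read-only local.
    local -r var="${var-}"
    echo "middle inherited: $var"
}

outer() {
    local var="from outer"
    middle
    echo "outer still sees: $var"
}

outer
# middle inherited: from outer
# outer still sees: from outer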

I feel like an idiot for not realizing this sooner; I'm going to go ahead and post this question & answer anyway, just in case other scrubs like me encounter the same issue: you have to declare both variables as local:
f() { local b=1; g; echo $b; }
g() { b=2; }
f
# output: 2
f() { local b=1; g; echo $b; }
g() { local b=2; }
f
# output: 1

Related

How do I share a group of variables with multiple functions in bash [closed]

I've been getting more familiar with bash scripting lately, and with its (to me) unusual variable scoping.
func1() {
    local file="/path/to/a/file.log"
    func2 "$file"
    func5 "$file"
}
func2() {
    local name=$(basename -- "$1")
    local ext="${1##*.}"
    # do stuff
}
I have different chains of functions that $file is passed to
func2 -> func3 -> func4
func5 -> func6 -> func7
and so on.
In those functions I may need different parts of the file path, the name, extension or just the path. So far I have just been extracting these parts in each function as needed but this has led to duplicate variables in some functions, which I don't like.
Usually I would use a class to share a group of variables with a set of functions, but bash doesn't have classes. What is the "normal" way to do this in bash?
Should I create all the variables in func1 and pass them to the functions as needed or should I just use them without passing them unless there is a specific need to pass a variable?
You can use global associative arrays to avoid polluting your global namespace
#! /bin/bash
setvars() {
    local -r arg="$1"
    declare -gA file
    file[path]=$(dirname -- "$arg")
    file[name]=$(basename -- "$arg")
    file[ext]=${arg##*.}
    readonly file
}
func1() {
    #local file="/path/to/a/file.log"
    func2 #"$file"
    func5 #"$file"
}
func2() {
    echo "${file[name]}"
    echo "${file[ext]}"
    #do stuff
}
func5() {
    echo "${file[path]}"
}
setvars "/path/to/a/file.log"
func1
# error: file is immutable
file[ext]='abc'
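For reference, running the script above should print something close to the following (the exact prefix and wording of the readonly error depend on your bash version and how the script is invoked):
file.log
log
/path/to/a
bash: file[ext]: readonly variable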
I'm getting inheritance, which is pretty flexible.
$: f2(){ echo "${1:-${x:-empty}}"; } # arg precedent with default
$: f1(){ local x=foo; f2; x=bar f2; f2 baz; unset x; f2; } # localize and try variations
$: x=test # set a global default
$: f2 # use global default
test
$: x=different f2 # set an ephemeral default
different
$: f2 override # pass an explicit value
override
$: x=different f2 override # explicit argument wins
override
$: f1 # test variations over local
foo
bar
baz
empty
$: echo $x # global default unchanged
test
In particular, note that when using local in f1, unset x only makes the local x unset; the echo at the end is still using the global definition of x.
So unless yours behaves differently, the func1 value of file likely exists in its called children, as would name and ext if it defined them, provided the values were not overridden in the child functions. Obviously, those child functions could also override the inherited value if needed, in several ways.
Effectively, this lets you establish values for a hierarchical context.
Responding to the question:
"Yes file is accessible to all child functions which is what I am not used to and it makes me wonder what is the point of ever passing variables in bash."
Your OP specifically asked how to share a group of variables with multiple functions in bash. Scoping can handle that, but is a very specific use case.
cf. the Bash Guide's comments:
Functions serve a few purposes in a script... The second is to allow a block of code to be reused with slightly different arguments.
A func1 that depends on globals is entangled and state-dependent, which might be fine in the context of a single app - sometimes you just need a way to repeat a block of code. x=foo func1 alters $x in that call, but requires knowledge of function internals. func1 foo does not.
If globals handle your needs with no side effects or namespace collisions, fine... but arguments make refactoring easier, especially if you store a useful function and source it in multiple scripts.
Some say if those things are a concern, use another language. It's a consideration...
TL;DR
func1 "$arg" is generally preferred as safer, isolates from entanglements, doesn't pollute the namespace and risk collisions, doesn't require knowledge of function internals, and is arguably easier to maintain, refactor, and understand without scrolling around to read the function code.
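As a tiny illustration of that difference (the function names here are made up for the example):
# Entangled: the caller has to know this function reads the global $x.
entangled() { echo "using $x"; }
x=foo entangled   # prints "using foo", but only if you know the internals

# Isolated: everything the function needs arrives as an argument.
isolated() { echo "using $1"; }
isolated foo      # prints "using foo", no hidden state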

Bash local variable scope best practice

I've seen that some people, when writing bash scripts, define local variables inside an if/else statement, like Example 1 below.
Example 1:
#!/bin/bash
function ok() {
    local animal
    if [ ${A} ]; then
        animal="zebra"
    fi
    echo "$animal"
}
A=true
ok
Here is another example that prints the same result:
Example 2:
#!/bin/bash
function ok() {
    if [ ${A} ]; then
        local animal
        animal="zebra"
    fi
    echo "$animal"
}
A=true
ok
So, the two examples above print the same result, but which one is the best practice to follow? I prefer Example 2, but I've seen a lot of people declaring local variables inside a function like Example 1. Would it be better to declare all local variables at the top, like below:
function ok() {
    # all local variable declarations must be here
    # next statement
}
the best practice to follow
Check your scripts with https://shellcheck.net .
Quote variable expansions. Don't $var, do "$var". https://mywiki.wooledge.org/Quotes
For script local variables, prefer to use lowercase variable names. For exported variables, use upper case and unique variable names.
Do not use function name(). Use name(). https://wiki.bash-hackers.org/scripting/obsolete
Document the usage of global variables like a=true, or declare the variable local before using it: local a; then a=true. https://google.github.io/styleguide/shellguide.html#s4.2-function-comments
scope best practice
Generally, use the smallest scope possible. Keep stuff close to each other. Put local close to the variable usage. (This is like the rule from C or C++, to define a variable close to its usage, but unlike in C or C++, in shell declaration and assignment should be on separate lines).
Note that your examples are not the same. In the case where variable A (or a) is an empty string, the first version will print an empty line (the local animal variable is empty), while the second version will print the value of the global variable animal (there was no local). Although the scope should be as small as possible, animal is used outside of the if, so local should also be outside.
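A minimal sketch of that difference, with the condition false and a global animal defined (ok_1 and ok_2 are condensed versions of Examples 1 and 2):
#!/bin/bash
animal="global lion"   # global fallback
A=""                   # condition is false, so the if body never runs

ok_1() { local animal; if [ ${A} ]; then animal="zebra"; fi; echo "$animal"; }
ok_2() { if [ ${A} ]; then local animal; animal="zebra"; fi; echo "$animal"; }

ok_1   # prints an empty line: the local shadows the global
ok_2   # prints "global lion": no local was ever declared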
The local command constrains the variables declared to the function scope.
With that said, declaring a variable local inside an if block has the same effect as declaring it outside of the block, as long as it's inside a function.

The scope of local variables in sh

I've got quite a lot of headaches trying to debug my recursive function. It turns out that Dash interprets local variables strangely. Consider the following snippet:
iteration=0
MyFunction()
{
    local my_variable
    iteration=$((iteration + 1))
    if [ $iteration -lt 2 ]; then
        my_variable="before recursion"
        MyFunction
    else
        echo "The value of my_variable during recursion: '$my_variable'"
    fi
}
MyFunction
In Bash, the result is:
The value of my_variable during recursion: ''
But in Dash, it is:
The value of my_variable during recursion: 'before recursion'
Looks like Dash makes local variables visible across recursive invocations of the same function. What is the point of this, and how can I avoid issues when I don't know when and which recursive iteration changed the value of a variable?
local is not part of the POSIX specification, so bash and dash are free to implement it any way they like.
dash does not allow assignments with local, so the variable is unset unless it inherits a value from a surrounding scope. (In this case, the surrounding scope of the second iteration is the first iteration.)
bash does allow assignments (e.g., local x=3), and it always creates a variable with a default empty value unless an assignment is made.
This is a consequence of your attempt to read the variable in the inner-most invocation without having set it in there explicitly. In that case, the variable is indeed local to the function, but it inherits its initial value from the outer context (where you have it set to "before recursion").
The local marker on a variable thus only affects the value of the variable in the caller after the function invocation returned. If you set a local variable in a called function, its value will not affect the value of the same variable in the caller.
To quote the dash man page:
Variables may be declared to be local to a function by using a local command. This should appear as the first statement of a function, and the syntax is
local [variable | -] ...
Local is implemented as a builtin command.
When a variable is made local, it inherits the initial value and exported and readonly flags from the variable with the same name in the surrounding scope, if there is one. Otherwise, the variable is initially unset. The shell uses dynamic scoping, so that if you make the variable x local to function f, which then calls function g, references to the variable x made inside g will refer to the variable x declared inside f, not to the global variable named x.
The only special parameter that can be made local is “-”. Making “-” local causes any shell options that are changed via the set command inside the function to be restored to their original values when the function returns.
To be sure about the value of a variable in a specific context, make sure to always set it explicitly in that context. Else, you rely on "fallback" behavior of the various shells which might be different across shells.
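A minimal sketch of that advice applied to the snippet above; with the explicit assignment (on its own line, since dash's local does not take one), bash and dash print the same thing:
iteration=0
MyFunction()
{
    local my_variable
    my_variable=""   # set explicitly, so no value is inherited in any shell
    iteration=$((iteration + 1))
    if [ $iteration -lt 2 ]; then
        my_variable="before recursion"
        MyFunction
    else
        echo "The value of my_variable during recursion: '$my_variable'"
    fi
}
MyFunction
# both shells: The value of my_variable during recursion: ''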

Why would I use declare / typeset in a shell script instead of just X=y?

I've recently come across a shell script that uses
declare -- FOO="" which apparently is spelled typeset -- FOO="" in non-bash shells.
Why might I want to do that instead of plain FOO="" or export FOO?
The most important purpose of using declare is to control scope, or to use array types that aren't otherwise accessible.
Using Function-Local Variables
To give you an example:
print_dashes() { for ((i=0; i<10; i++)); do printf '-'; done; echo; }
while read -p "Enter a number: " i; do
    print_dashes
    echo "You entered: $i"
done
You'd expect that to print the number the user entered, right? But instead, it'll always print the value of i that print_dashes leaves when it's complete.
Consider instead:
print_dashes() {
    declare i   # ''local i'' would also have the desired effect
    for ((i=0; i<10; i++)); do printf '-'; done
    echo
}
...now i is local, so the newly-assigned value doesn't last beyond its invocation.
Declaring Explicitly Global Variables
Contrariwise, you sometimes want to declare a global variable, and make it clear to your code's readers that you're doing that by intent, or to do so while also declaring something as an array (or otherwise where declare would otherwise implicitly specify global state). You can do that too:
myfunc() {
    declare arg                     # make arg local
    declare -g -A myfunc_args_seen  # make myfunc_args_seen a global associative array
    for arg; do
        myfunc_args_seen["$arg"]=1
    done
    echo "Across all invocations of myfunc, we have seen the following arguments:"
    printf ' - %q\n' "${!myfunc_args_seen[@]}"
}
Declaring Associative Arrays
Normal shell arrays can just be assigned: my_arr=( one two three )
However, that's not the case for associative arrays, which are keyed as strings. For those, you need to declare them:
declare -A my_arr=( ["one"]=1 ["two"]=2 ["three"]=3 )
declare -i cnt=0
declares an integer-only variable, which is faster for math and always evaluates in arithmetic context.
declare -l lower="$1"
declares a variable that automatically lowercases anything put in it, without any special syntax on access.
declare -r unchangeable="$constant"
declares a variable read-only.
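A quick interactive check of those three flags (the output shown is what a recent bash prints; exact error wording may vary):
$ declare -i cnt=0; cnt+=5; cnt+='2 * 3'; echo "$cnt"
11
$ declare -l lower="MiXeD CaSe"; echo "$lower"
mixed case
$ declare -r unchangeable="fixed"; unchangeable="new"
bash: unchangeable: readonly variable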
Take a look at https://unix.stackexchange.com/questions/254367/in-bash-scripting-whats-the-different-between-declare-and-a-normal-variable for some useful discussion - you might not need these things often, but if you don't know what's available you're likely to work harder than you should.
A great reason to use declare, typeset, and/or readonly is code compartmentalization and reuse (i.e. encapsulation). You can write code in one script that can be sourced by others.
(Note that declared/typeset/readonly constants, variables, and functions lose their "readonly-ness" when a script is executed as a separate child process, but they retain it when a child script sources their defining script, since sourcing loads a script into the current shell rather than starting a new one.)
Since sourcing loads code from the script into the current shell though, the namespaces will overlap. To prevent a variable in a child script from being overwritten by its parent (or vice-versa, depending on where the script is sourced and the variable used), you can declare a variable readonly so it won't get overwritten.
You have to be careful with this because once you declare something readonly, you cannot unset it, so you do not want to declare something readonly that might naturally be redefined in another script. For example, if you're writing a library for general use that has logging functions, you might not want to use typeset -f on a function called warn, error, or info, since it is likely other scripts will create similar logging functions of their own with that name. In this case, it is actually standard practice to prefix the function, variable, and/or constant name with the name of the defining script and then make it readonly (e.g. my_script_warn, my_script_error, etc.). This preserves the values of the functions, variables, and/or constants as used in the logic in the code in the defining script so they don't get overwritten by sourcing scripts and accidentally fail.
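A hedged sketch of that prefixing pattern (the file and function names here are invented for illustration):
# my_script.sh -- a sourceable library
my_script_warn() { printf 'WARN: %s\n' "$*" >&2; }
readonly -f my_script_warn   # freeze the function definition (bash)

my_script_VERSION="1.0"
readonly my_script_VERSION   # freeze the constant

# consumer.sh
. ./my_script.sh
my_script_warn "disk is nearly full"   # works
my_script_VERSION="2.0"                # error: my_script_VERSION: readonly variable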

Bash functions returning values meanwhile altering global variables

I'm just struggling with bash functions, and trying to return string values meanwhile some global variable is modified inside the function. An example:
MyGlobal="some value"
function ReturnAndAlter () {
    MyGlobal="changed global value"
    echo "Returned string"
}
str=$(ReturnAndAlter)
echo $str       # prints 'Returned string' as expected
echo $MyGlobal  # prints the initial value, not the changed one
This is because $(...) (and also `...` if used instead) causes the function to run in its own subshell environment, so the global variable in the parent shell is never affected.
I found a very dirty workaround by returning the value into another global variable and calling the function only using its name, but think that there should be a cleaner way to do it.
My dirty solution:
MyGlobal="some value"
ret_val=""
function ReturnAndAlter () {
    ret_val="Returned string"
    MyGlobal="changed value"
}
ReturnAndAlter  # call the bare function
str=$ret_val    # and assign using the auxiliary global ret_val
echo $str
echo $MyGlobal  # here both global variables are updated
Any new ideas? Some way of calling functions that I'm missing?
Setting global variables is the only way a function has of communicating directly with the shell that calls it. The practice of "returning" a value by capturing the standard output is a bit of a hack necessitated by the shell's semantics, which are geared towards making it easy to call other programs, not making it easy to do things in the shell itself.
So, don't worry; no, you aren't missing any cool tricks. You're doing what the shell allows you to do.
The $(…) (command substitution) is run in a sub-shell.
All changes inside the sub-shell are lost when the sub-shell closes.
It is usually a bad idea to both print a result and change a variable inside a function. Either communicate through variables only, or use just the one printed string.
There is no other solution.
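That said, if a fixed global like ret_val feels too rigid, one common refinement (not mentioned in the answers above; requires bash 4.3+) is to let the caller choose the output variable's name via a nameref:
ReturnAndAlter() {
    local -n out_ref=$1   # nameref: out_ref aliases the caller's variable
    MyGlobal="changed global value"
    out_ref="Returned string"
}

MyGlobal="some value"
ReturnAndAlter str   # no subshell, so both assignments survive
echo "$str"          # Returned string
echo "$MyGlobal"     # changed global value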
