Bash local variable scope best practice - bash

When writing bash scripts, some people declare a local variable at the top of the function and assign it inside an if/else statement, as in Example 1.
Example 1:
#!/bin/bash
function ok() {
local animal
if [ ${A} ]; then
animal="zebra"
fi
echo "$animal"
}
A=true
ok
Another version, which appears to do the same thing, puts the declaration inside the if block:
Example 2:
#!/bin/bash
function ok() {
if [ ${A} ]; then
local animal
animal="zebra"
fi
echo "$animal"
}
A=true
ok
Both examples print the same result, but which one is the best practice to follow? I prefer Example 2, but I've seen a lot of people declare local variables at the top of the function, as in Example 1. Would it be better to declare all local variables at the top, like below?
function ok() {
# all local variable declaration must be here
# Next statement
}

the best practice to follow
Check your scripts with https://shellcheck.net .
Quote variable expansions. Don't $var, do "$var". https://mywiki.wooledge.org/Quotes
For script local variables, prefer to use lowercase variable names. For exported variables, use upper case and unique variable names.
Do not use function name(). Use name(). https://wiki.bash-hackers.org/scripting/obsolete
Document the usage of global variables (for example, a=true). Otherwise, declare the variable local before using it: local a, then a=true. https://google.github.io/styleguide/shellguide.html#s4.2-function-comments
scope best practice
Generally, use the smallest scope possible. Keep stuff close to each other. Put local close to the variable usage. (This is like the rule from C or C++, to define a variable close to its usage, but unlike in C or C++, in shell declaration and assignment should be on separate lines).
Note that your examples are not the same. If variable A (or a) is an empty string, the first version prints an empty line (the local animal variable is empty), while the second version prints the value of the global variable animal (no local was declared). Although the scope should be as small as possible, animal is used outside of the if, so local should also be outside.
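The difference between the two versions can be sketched as follows. The function names ok_top and ok_inside and the global value "lion" are made up for illustration. The local is also emptied explicitly, so the result should not depend on how a particular shell initialises a fresh local.

```shell
#!/bin/bash
animal="lion"   # a global that happens to share the name
A=""            # empty, so the if branch is skipped

ok_top() {
    local animal            # always executed: hides the global
    animal=""               # explicit empty value
    if [ -n "${A}" ]; then
        animal="zebra"
    fi
    echo "top: '${animal}'"
}

ok_inside() {
    if [ -n "${A}" ]; then
        local animal        # never reached when A is empty
        animal="zebra"
    fi
    echo "inside: '${animal}'"   # falls through to the global
}

ok_top      # prints: top: ''
ok_inside   # prints: inside: 'lion'
```

With A=true both functions print zebra, which is why the two versions look identical at first glance.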

The local command constrains the declared variables to the function scope.
From that you can deduce that declaring a variable inside an if block has the same effect as declaring it outside the block, as long as it is inside a function and the local command actually runs.
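A minimal sketch of that point, using a hypothetical function show_scope: the variable declared inside the if block is still visible after fi, and it never reaches the global scope.

```shell
#!/bin/bash
show_scope() {
    if true; then
        local animal
        animal="zebra"
    fi
    # Still visible here: local scopes to the function, not to the if block.
    echo "after fi: ${animal}"
}

show_scope                           # prints: after fi: zebra
echo "global: ${animal:-<unset>}"    # prints: global: <unset>
```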

Related

how to differentiate between function arguments and script arguments

Let's say I have a script called hello:
$ cat hello
function1 () {
echo $1
}
function1 what
echo $1
and I call
$ sh hello chicken
what
chicken
How do I refer to the script parameters (chicken) inside the function? Would I have to rename all the script arguments or store them somewhere else? What's the best way to handle this?
This is a case of shadowing; you can find information about it here:
https://www.gnu.org/software/bash/manual/html_node/Shell-Functions.html
If you try to picture it, the inner scope variable casts a "shadow" over the outer scope variable and hides it from view. As soon as the inner scope variable is gone, the program can again "find" the outer scope variable.
It's pretty much another variation of a general rule in programming where things that are more specific or refer to an inner scope, override things that are more generic or part of an outer scope.
If you wrote
temp="hi"
phrase(){
echo "$temp"
temp="hello"
echo "$temp"
}
phrase
The result would be
hi
hello
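because the assignment inside the function changes the value that later reads of temp see. (Strictly speaking, temp is not declared local here, so the global itself is overwritten. A true shadow would need local temp="hello", which leaves the global at "hi" once the function returns.)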
That can be prevented by storing your script's $1 parameter using another name.
So, as you said, the best approach is to make sure all variables have different names by storing your script parameters inside distinctly named variables.
temp=$1
function1 () {
echo "$1"
echo "$temp"
}
function1 what
echo "$1"
Edit: I forgot to account for the fact that the script's positional parameters are not directly available inside functions, as @gordondavisson said, so even if you weren't passing the word "what" as a parameter to your function, $1 inside the function still wouldn't print the word "chicken".
So, in this case, the way to use the script parameter inside the function is to assign $1 to a variable before calling the function (or to pass it to the function as an argument).
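As a rough, self-contained sketch of that approach: set -- chicken stands in for running the script as sh hello chicken, and script_arg is a made-up variable name.

```shell
#!/bin/bash
# Simulate the script being invoked with one argument.
set -- chicken

script_arg=$1          # copy the script's $1 before any function runs

function1() {
    echo "function arg: $1"           # the function's own first argument
    echo "script arg:   $script_arg"  # the saved script argument
}

function1 what
echo "outside:      $1"
# prints:
# function arg: what
# script arg:   chicken
# outside:      chicken
```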

The scope of local variables in sh

I've got quite a lot of headaches trying to debug my recursive function. It turns out that Dash interprets local variables strangely. Consider the following snippet:
iteration=0;
MyFunction()
{
local my_variable;
iteration=$(($iteration + 1));
if [ $iteration -lt 2 ]; then
my_variable="before recursion";
MyFunction
else
echo "The value of my_variable during recursion: '$my_variable'";
fi
}
MyFunction
In Bash, the result is:
The value of my_variable during recursion: ''
But in Dash, it is:
The value of my_variable during recursion: 'before recursion'
Looks like Dash makes the local variables available across the same function name. What is the point of this and how can I avoid issues when I don't know when and which recursive iteration changed the value of a variable?
local is not part of the POSIX specification, so bash and dash are free to implement it any way they like.
dash does not allow assignments with local, so the variable is unset unless it inherits a value from a surrounding scope. (In this case, the surrounding scope of the second iteration is the first iteration.)
bash does allow assignments (e.g., local x=3), and it always creates a variable with a default empty value unless an assignment is made.
This is a consequence of your attempt to read the variable in the inner-most invocation without having set it in there explicitly. In that case, the variable is indeed local to the function, but it inherits its initial value from the outer context (where you have it set to "before recursion").
The local marker on a variable thus only affects the value of the variable in the caller after the function invocation returned. If you set a local variable in a called function, its value will not affect the value of the same variable in the caller.
To quote the dash man page:
Variables may be declared to be local to a function by using a local command. This should appear as the first statement of a function, and the syntax is
local [variable | -] ...
Local is implemented as a builtin command.
When a variable is made local, it inherits the initial value and exported and readonly flags from the variable with the same name in the surrounding scope, if there is one. Otherwise, the variable is initially unset. The shell uses dynamic scoping, so that if you make the variable x local to function f, which then calls function g, references to the variable x made inside g will refer to the variable x declared inside f, not to the global variable named x.
The only special parameter that can be made local is “-”. Making “-” local causes any shell options that are changed via the set command inside the function to be restored to their original values when the function returns.
To be sure about the value of a variable in a specific context, always set it explicitly in that context. Otherwise, you rely on "fallback" behavior that may differ across shells.
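Applied to the snippet from the question, that advice might look like the sketch below. The only change is the explicit my_variable="" right after the declaration. Given the inheritance rule quoted from the dash man page, this should give the same result in bash and dash.

```shell
#!/bin/bash
iteration=0
MyFunction() {
    local my_variable
    my_variable=""                      # explicit reset in every invocation
    iteration=$((iteration + 1))
    if [ "$iteration" -lt 2 ]; then
        my_variable="before recursion"
        MyFunction
    else
        echo "The value of my_variable during recursion: '$my_variable'"
    fi
}
MyFunction
# prints: The value of my_variable during recursion: ''
```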

Variable assignment in nested function call unexpectedly changes local variable in the caller's scope

Editor's note:
Perhaps the following, taken from the OP's own answer, better illustrates the surprising behavior:
f() { local b=1; g; echo $b; }; g() { b=2; }; f # -> '2'
I.e., g() was able to modify f()'s local $b variable.
In Zsh and Bash, if I have the function f() { a=1; g; echo $a; } and the function g() { a=2; }, then when I run f I get the following output instead of the expected 1:
$ f
2
Is there any way to disable this variable bleedthrough from function to function?
I'm working on a rather large and important bash/zsh script at work that uses a ton of variables in various functions; many of these functions depend upon a larger master function, however because of the variable bleed through some rather unfortunate and unexpected behavior and bugs have come to the forefront, preventing me from confidently furthering development, since I'd like to address this strange issue first.
I've even tried using local to localize variables, but the effect still occurs.
EDIT: Note that my question isn't about how to use local variables to prevent variable bleed through or about how local variables work, how to set local variables, how to assign a new value to an already declared local variable, or any of that crap: it is about how to prevent variables from bleeding into the scope of caller/called functions.
Using local creates a variable that is not inherited from the parent scope.
There are useful things to add.
A local variable is inherited (and can be modified) by any function that the declaring function calls. Therefore, local protects a same-named variable in the caller's scope from being changed, but it does not protect the local itself from functions called further down. The local declaration must therefore be used at each level, unless of course you actually want to alter the value in the parent scope. This runs counter to what most programming languages do; it has advantages (quick and dirty data sharing) but creates failure modes that are difficult to debug.
A local variable can be exported with local -x to make it usable by sub-processes (quite useful), or made readonly upon creation with local -r.
One nice trick is that you can initialise a variable with the value inherited from the parent scope at the time of creation:
local -r VAR="$VAR"
If, like me, you always use set -u to avoid silently using uninitialized variables, and cannot be sure the variable already is assigned, you can use this to initialize it with an empty value if it is not defined in the parent scope:
local -r VAR="${VAR-}"
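A small sketch of this trick in bash, with a hypothetical function show_mode and variable MODE. It relies on the expansion "${MODE-}" being evaluated before local creates the new variable, so it still sees the caller's value.

```shell
#!/bin/bash
set -u   # referencing an unset variable is now an error

show_mode() {
    # ${MODE-} expands to the caller's value, or to "" if MODE is unset,
    # so this declaration is safe under set -u.
    local -r MODE="${MODE-}"
    echo "mode='${MODE}'"
}

show_mode            # prints: mode=''
MODE="fast"
show_mode            # prints: mode='fast'
```

The readonly flag applies only to the local copy; once the function returns, the global MODE can be assigned again.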
I feel like an idiot for not realizing this sooner; I'm going to go ahead and post this question & answer anyway, just in case other scrubs like me encounter the same issue: you have to declare both variables as local:
f() { local b=1; g; echo $b; }
g() { b=2; }
f
# output: 2
f() { local b=1; g; echo $b; }
g() { local b=2; }
f
# output: 1

Bash functions returning values meanwhile altering global variables

I'm struggling with bash functions: I'm trying to return a string value while a global variable is modified inside the same function. An example:
MyGlobal="some value"
function ReturnAndAlter () {
MyGlobal="changed global value"
echo "Returned string"
}
str=$(ReturnAndAlter)
echo $str # prints out 'Returned string' as expected
echo $MyGlobal # prints out the initial value, not changed
This is because $(...) (and also `...` if used instead) cause the function to have its own environment, so the global variable is never affected.
I found a very dirty workaround by returning the value into another global variable and calling the function only using its name, but think that there should be a cleaner way to do it.
My dirty solution:
MyGlobal="some value"
ret_val=""
function ReturnAndAlter () {
ret_val="Returned string"
MyGlobal="changed value"
}
ReturnAndAlter # call the bare function
str=$ret_val # and assign using the auxiliary global ret_val
echo $str
echo $MyGlobal # Here both global variables are updated.
Any new ideas? Some way of calling functions that I'm missing?
Setting global variables is the only way a function has of communicating directly with the shell that calls it. The practice of "returning" a value by capturing the standard output is a bit of a hack necessitated by the shell's semantics, which are geared towards making it easy to call other programs, not making it easy to do things in the shell itself.
So, don't worry; no, you aren't missing any cool tricks. You're doing what the shell allows you to do.
The $(…) construct (command substitution) runs in a sub-shell.
All changes made inside the sub-shell are lost when the sub-shell exits.
It is usually a bad idea to both print a result and change a variable inside the same function. Either return everything through variables or return everything in one printed string.
There is no other solution.
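To see both behaviours side by side, here is a minimal sketch built from the question's own names (ReturnAndAlter, MyGlobal, ret_val); the function is changed so that it both prints and sets ret_val, purely for the comparison.

```shell
#!/bin/bash
MyGlobal="some value"
ret_val=""

ReturnAndAlter() {
    MyGlobal="changed global value"
    ret_val="Returned string"     # "return" via a variable
    echo "Returned string"        # "return" via standard output
}

# Command substitution runs the function in a sub-shell: the output is
# captured, but the assignment to MyGlobal is lost.
str=$(ReturnAndAlter)
echo "$str / $MyGlobal"           # prints: Returned string / some value

# A direct call runs in the current shell: the assignments persist.
ReturnAndAlter >/dev/null
echo "$ret_val / $MyGlobal"       # prints: Returned string / changed global value
```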

Local variable scope when one function calls another inside bash shell

#!/bin/sh
function pqr()
{
# This prints 10 even though a is local to abc
echo "Displaying value of var a $a"
}
function abc()
{
local a=10
# call function pqr and don't pass value of a
pqr
}
abc
Even though I don't pass variable a to the pqr() function, I get a=10 inside pqr(). My question is: are the scope and visibility of a the same inside pqr() as inside abc()? Is this because we are calling pqr() from function abc()? I was expecting that a new variable would be created inside pqr() and that it would display a blank value. (This is how variable scope and visibility work in modern languages, so I am curious how this works in bash.)
I understand that in the above example, if I redeclare a inside pqr(), a new variable will be created and a blank value will be displayed. Thanks in advance!
As mentioned in the comments (from man bash):
When local is used within a function, it causes the variable name to have a visible scope restricted to that function and its children.
So calling pqr from within abc means that the variable $a is visible inside both functions.
It's worth mentioning that since you're using bash-specific features such as local and the non-portable function syntax, you should change your shebang to #!/bin/bash.
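The visibility rule is easy to check by calling pqr both ways. This is only a sketch with the question's function names, rewritten in the name() style and with a bash shebang:

```shell
#!/bin/bash
pqr() {
    # ${a-} expands to an empty string when a is not set at all.
    echo "pqr sees a='${a-}'"
}

abc() {
    local a=10
    pqr          # called from abc: abc's local a is visible
}

abc              # prints: pqr sees a='10'
pqr              # prints: pqr sees a=''
```

When pqr is called directly, there is no a in any calling scope, so it prints an empty value.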
