The scope of local variables in sh - shell

I've got quite a lot of headaches trying to debug my recursive function. It turns out that Dash interprets local variables strangely. Consider the following snippet:
iteration=0;
MyFunction()
{
    local my_variable;
    iteration=$(($iteration + 1));
    if [ $iteration -lt 2 ]; then
        my_variable="before recursion";
        MyFunction
    else
        echo "The value of my_variable during recursion: '$my_variable'";
    fi
}
MyFunction
In Bash, the result is:
The value of my_variable during recursion: ''
But in Dash, it is:
The value of my_variable during recursion: 'before recursion'
It looks like Dash makes a local variable visible across recursive calls to the same function. What is the point of this, and how can I avoid problems when I don't know which recursive iteration changed the value of a variable?

local is not part of the POSIX specification, so bash and dash are free to implement it any way they like.
dash does not allow assignments with local, so the variable is unset unless it inherits a value from a surrounding scope. (In this case, the surrounding scope of the second iteration is the first iteration.)
bash does allow assignments (e.g., local x=3), and it always creates a variable with a default empty value unless an assignment is made.
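A quick way to see the difference (a minimal sketch with throwaway names x and f; run it with both shells):
x=global
f() {
    local x
    echo "f sees: '$x'"
}
f
# dash: f sees: 'global'   (the local inherits the value from the surrounding scope)
# bash: f sees: ''         (the fresh local has no value of its own)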

This is a consequence of your attempt to read the variable in the inner-most invocation without having set it in there explicitly. In that case, the variable is indeed local to the function, but it inherits its initial value from the outer context (where you have it set to "before recursion").
The local marker on a variable thus only affects what the caller sees after the function invocation returns: if you set a local variable in a called function, the assignment does not change the value of the same-named variable in the caller.
To quote the dash man page:
Variables may be declared to be local to a function by using a local command. This should appear as the first statement of a function, and the syntax is
local [variable | -] ...
Local is implemented as a builtin command.
When a variable is made local, it inherits the initial value and exported and readonly flags from the variable with the same name in the surrounding scope, if there is one. Otherwise, the variable is initially unset. The shell uses dynamic scoping, so that if you make the variable x local to function f, which then calls function g, references to the variable x made inside g will refer to the variable x declared inside f, not to the global variable named x.
The only special parameter that can be made local is “-”. Making “-” local causes any shell options that are changed via the set command inside the function to be restored to their original values when the function returns.
To be sure about the value of a variable in a specific context, always set it explicitly in that context. Otherwise you rely on "fallback" behavior, which may differ between shells.
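Applied to the snippet from the question, that advice boils down to initializing the local explicitly on a separate line (a sketch that behaves the same in bash and dash):
iteration=0
MyFunction()
{
    local my_variable
    my_variable=""    # explicit initialization: nothing is inherited in either shell
    iteration=$((iteration + 1))
    if [ "$iteration" -lt 2 ]; then
        my_variable="before recursion"
        MyFunction
    else
        echo "The value of my_variable during recursion: '$my_variable'"
    fi
}
MyFunction
With the explicit assignment, both shells print an empty value during the recursive call.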

Related

Bash local variable scope best practice

I've seen that some people, when writing bash scripts, define local variables inside an if/else statement, like example 1.
Example 1:
#!/bin/bash
function ok() {
    local animal
    if [ ${A} ]; then
        animal="zebra"
    fi
    echo "$animal"
}
A=true
ok
Here is another example that prints the same result:
Example 2:
#!/bin/bash
function ok() {
    if [ ${A} ]; then
        local animal
        animal="zebra"
    fi
    echo "$animal"
}
A=true
ok
So, the examples above print the same result, but which one is the best practice to follow? I prefer example 2, but I've seen a lot of people declare local variables inside a function like example 1. Would it be better to declare all local variables at the top, like below:
function ok() {
    # all local variable declarations go here
    # next statements
}
the best practice to follow
Check your scripts with https://shellcheck.net .
Quote variable expansions. Don't $var, do "$var". https://mywiki.wooledge.org/Quotes
For script local variables, prefer to use lowercase variable names. For exported variables, use upper case and unique variable names.
Do not use function name(). Use name(). https://wiki.bash-hackers.org/scripting/obsolete
Document the usage of global variables like a=true. Or add local before using the variable: local a, then a=true. https://google.github.io/styleguide/shellguide.html#s4.2-function-comments
scope best practice
Generally, use the smallest scope possible. Keep stuff close to each other. Put local close to the variable usage. (This is like the rule from C or C++, to define a variable close to its usage, but unlike in C or C++, in shell declaration and assignment should be on separate lines).
Note that your examples are not the same. In case the variable A (or a) is an empty string, the first version will print an empty line (the local animal variable is empty), while the second version will print the value of the global variable animal (there was no local). Although the scope should be as small as possible, animal is used outside the if, so local should also be outside.
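Here is a small sketch of that difference (the global animal and the empty A are added for illustration):
#!/bin/bash
animal="global giraffe"
ok1() {                          # Example 1: local declared outside the if
    local animal
    if [ "${A}" ]; then animal="zebra"; fi
    echo "ok1: '$animal'"
}
ok2() {                          # Example 2: local declared inside the if
    if [ "${A}" ]; then local animal; animal="zebra"; fi
    echo "ok2: '$animal'"
}
A=""
ok1    # prints: ok1: ''                (the empty local shadows the global)
ok2    # prints: ok2: 'global giraffe'  (no local was declared, so the global shows through)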
The local command constrains the variables declared to the function scope.
With that said, declaring it inside an if block has the same effect as declaring it outside of the block, as long as both are inside a function.

How does "FOO= myprogram" in bash make "if(getent("FOO"))" return true in C?

I recently ran into a C program that uses an environment variable as a flag to change the behavior of a certain part of the program:
if (getenv("FOO")) do_this_if_foo();
You'd then invoke the program with the environment variable prepended, but without actually setting it to anything:
FOO= mycommand myargs
Note that the intention of this was to trigger the flag: if you didn't want the added operation, you just wouldn't include the FOO=. However, I've never seen an environment variable set like this before. Every example I can find of prepended variables sets a value, FOO=bar mycommand myargs, rather than leaving it empty like that.
What exactly is happening here that allows this flag to work without being set to anything? And are there potential issues with implementing environment variables like this?
The bash manual says:
A variable may be assigned to by a statement of the form
name=[value]
If value is not given, the variable is assigned the null string.
Note that "null" (in the sense of e.g. JavaScript null) is not a thing in the shell. When the bash manual says "null string", it means an empty string (i.e. a string whose length is zero).
Also:
When a simple command is executed, the shell performs the following expansions, assignments, and redirections, from left to right.
[...]
If no command name results, the variable assignments affect the current shell environment. Otherwise, the variables are added to the environment of the executed command and do not affect the current shell environment.
So all FOO= mycommand does is set the environment variable FOO to the empty string while executing mycommand. This satisfies if (getenv("FOO")) because it only checks for the presence of the variable, not whether it has a (non-empty) value.
Of course, any other value would work as well: FOO=1 mycommand, FOO=asdf mycommand, etc.
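You can see the same distinction from the shell side with the ${FOO+set} test, which, like getenv(), only checks whether the variable exists, not whether it is non-empty (a sketch; the second line assumes FOO is not otherwise exported):
FOO= sh -c 'if [ "${FOO+set}" ]; then echo "FOO is set to \"$FOO\""; else echo "FOO is unset"; fi'
# prints: FOO is set to ""
sh -c 'if [ "${FOO+set}" ]; then echo "FOO is set to \"$FOO\""; else echo "FOO is unset"; fi'
# prints: FOO is unset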
FOO= is just setting the variable to null (to be precise, it's setting the variable to a zero-length string, so getenv() returns a pointer to a NUL terminator; thanks #CharlesDuffy). Given the code you posted, it could just as well be FOO='bananas' and produce the same behavior. It's very odd to write code that way, though. The common reason to set a variable on the command line is to pass a value for that variable into the script; setting debugging or logging level flags this way is extremely common, e.g. something like:
debug=1 logLevel=3 myscript
# where myscript does something like:
myscript() {
    if [ "$debug" = 1 ]; then
        if [ "$logLevel" -gt 0 ]; then
            printf 'Entering myscript()\n' >> log
            if [ "$logLevel" -gt 1 ]; then
                printf 'Arguments: %s\n' "$*" >> log
            fi
        fi
    fi
    do_stuff
}
Having just a "variable exists" test is a bit harder to work with, because then you have to explicitly unset the variable to clear the flag. With a value test you can just set FOO=1 when you want the behavior, and otherwise your script doesn't care whether FOO is null, 0, unset, or anything else.
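For comparison, a value-based flag test in shell (a hypothetical sketch; do_this_if_foo just mirrors the C function name) can be switched off without unsetting anything:
if [ "${FOO:-0}" = 1 ]; then
    do_this_if_foo    # runs for FOO=1; skipped for FOO=0, FOO="", or FOO unset
fi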

Bash functions returning values while altering global variables

I'm struggling with bash functions, trying to return a string value while a global variable is modified inside the function. An example:
MyGlobal="some value"
function ReturnAndAlter () {
    MyGlobal="changed global value"
    echo "Returned string"
}
str=$(ReturnAndAlter)
echo $str      # prints out 'Returned string' as expected
echo $MyGlobal # prints out the initial value, not changed
This is because $(...) (and also `...` if used instead) cause the function to have its own environment, so the global variable is never affected.
I found a very dirty workaround by returning the value into another global variable and calling the function only using its name, but think that there should be a cleaner way to do it.
My dirty solution:
MyGlobal="some value"
ret_val=""
function ReturnAndAlter () {
    ret_val="Returned string"
    MyGlobal="changed value"
}
ReturnAndAlter # call the bare function
str=$ret_val # and assign using the auxiliary global ret_val
echo $str
echo $MyGlobal # Here both global variables are updated.
Any new ideas? Some way of calling functions that I'm missing?
Setting global variables is the only way a function has of communicating directly with the shell that calls it. The practice of "returning" a value by capturing the standard output is a bit of a hack necessitated by the shell's semantics, which are geared towards making it easy to call other programs, not making it easy to do things in the shell itself.
So, don't worry; no, you aren't missing any cool tricks. You're doing what the shell allows you to do.
The $(…) (command substitution) is run in a subshell.
All changes made inside the subshell are lost when the subshell exits.
It is usually a bad idea to both print a result and change a variable inside a function: either communicate everything through variables, or just use the printed string.
There is no other solution.
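If the fixed ret_val helper feels too dirty, one variation of the same idea (a sketch assuming bash 4.3+ for declare -n; the name out_ref is arbitrary) is to pass the name of the destination variable as an argument, so the function still runs in the current shell and can update both:
MyGlobal="some value"
ReturnAndAlter() {
    declare -n out_ref=$1          # nameref: out_ref becomes an alias for the caller's variable
    out_ref="Returned string"
    MyGlobal="changed global value"
}
ReturnAndAlter str                 # no $(...), so no subshell
echo "$str"                        # Returned string
echo "$MyGlobal"                   # changed global value
This is still communicating through variables in the calling shell, just without hard-coding the result variable's name (and it breaks if the caller happens to pass the name out_ref itself).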

Local variable declaration in /etc/init.d/functions

On RHEL, the daemon() function in /etc/init.d/functions is defined as follows:
daemon() {
    # Test syntax.
    local gotbase= force= nicelevel corelimit
    local pid base= user= nice= bg= pid_file=
    local cgroup=
    nicelevel=0
    ... and so on ...
I'm trying to understand why some of the local variables are defined with an equals sign and some others not. What's happening here? Is this multiple declaration and assignment?
local varname
declares a local variable, but doesn't initialize it with any value.
local varname=value
declares a local variable, and also initializes it to value. You can initialize it to an empty string by providing an empty value, as in
local varname=
So in your example, pid is declared but not initialized, while base is declared and initialized to an empty string.
For most purposes there's not much difference between an unset variable and having an empty string as the value. But some of the parameter expansion operators can distinguish them. E.g.
${varname:-default}
will expand to default if varname is unset or empty, but
${varname-default}
will expand to default only if varname is unset. So if you use
${base-default}
it will expand to the empty string, not default.
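A small sketch of that difference, using the names from the daemon() snippet (demo is just a wrapper so that local is legal):
demo() {
    local pid base=
    echo "${pid-fallback}"     # prints: fallback   (pid is unset)
    echo "${pid:-fallback}"    # prints: fallback
    echo "${base-fallback}"    # prints an empty line (base is set, merely empty)
    echo "${base:-fallback}"   # prints: fallback   (:- also triggers on an empty value)
}
demo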

Global variables: Arrays not behaving like other variables

I have a PowerShell script that is made up of a main PS1 file that then loads a number of Modules. In one of those modules I define a variable $global:locationsXml and then proceed to add to it without the global flag, and it works great. I can reference it without the global flag from any other module.
However, I also define a $global:loadedDefinitions = @() array and add to it. But I have to refer to this variable with the global flag when adding to it with +=. I can reference it in any other module without the global flag, but in the module that creates it I need the flag. And that module is the same one where the xml variable works differently/correctly.
I also have a Hash Table that I define without the global flag, but in the top level script that loads all the modules, and that I can reference without the global flag from anywhere. Additionally I have tried initializing the problem array in the parent script, like the Hash Table, but still the array requires the global flag in the module that populates it. But NOT in a different module that just reads it.
All of this is currently being tested in Windows 7 and PS 2.0.
So, before I go tearing things apart, I wonder: is there a known bug where global arrays behave differently from other global variables, specifically when being written to in a module?
I guess including the global flag for writing to the few arrays I need won't be a big deal, but I would like to understand what is going on, especially if it is somehow intended behavior rather than a bug.
Edit: To clarify, this works
Script:
Define Hash Table without global specifier;
Load Module;
Call Function in Module;
Read and write Hash Table without global specifier;
And this works
Script:
Load Module;
Call Function in Module;
Initialize Array with global specifier;
Append to Array with global specifier;
Reference Array from anywhere else WITHOUT global specifier;
This doesn't
Script:
Load Module;
Call Function in Module;
Initialize Array WITH global specifier;
Append to Array without global specifier;
Reference Array from anywhere fails;
This approach of only initializing the variable with the global specifier and then referencing it without the specifier works for other variables, but not for arrays; that "seems" to be the behavior/bug I am seeing. It is doubly odd that the global specifier only needs to be used in the module where the Array is initialized, not in any other module. I have yet to verify whether it is also limited to the function where it is initialized, and/or just to writing to the array, not reading.
When you read a variable without a scope specifier, PowerShell first looks for the variable in the current scope, then, if it finds nothing, goes to the parent scope, and so on until it finds the variable or reaches the global scope. When you write to a variable without a scope specifier, PowerShell writes that variable in the current scope only.
Set-StrictMode -Version Latest #To produce VariableIsUndefined error.
&{
$global:a=1
$global:a #1
$local:a # Error VariableIsUndefined.
$a #1 Refers to the global $a, as there is no $a in the current scope.
$a=2 # Creates variable $a in the current scope.
$global:a #1 The global variable has the old value.
$local:a #2 The new local variable has the new value.
$a #2 Refers to the local $a.
}
Calling an object's methods, property accessors, and indexer accessors (including set accessors) only reads from the variable. Writing to an object is different from writing to a variable.
Set-StrictMode -Version Latest #To produce VariableIsUndefined error.
&{
$global:a=1..3
$global:a -join ',' #1,2,3
$local:a -join ',' # Error VariableIsUndefined.
$a -join ',' #1,2,3 Refers to the global $a, as there is no $a in the current scope.
$a[0]=4 # Writes to the object (the array), not to the variable; the variable is only read here.
$global:a -join ',' #4,2,3 The global variable has different content now.
$local:a -join ',' # And you still do not have a local one.
$a -join ',' #4,2,3 Refers to the global $a, as there is no $a in the current scope.
$a+=5 # In PowerShell V2 this is equivalent to $a=$a+5.
# There are two references to $a here.
# The first refers to the local $a, as it is a write to the variable.
# The second refers to the global $a, as there is no $a in the current scope.
# The $a+5 expression creates a new object, which is assigned to the local variable.
$global:a -join ',' #4,2,3 The global variable has the old value.
$local:a -join ',' #4,2,3,5 But now you have a local variable with the new value.
$a -join ',' #4,2,3,5 Refers to the local $a.
}
So if you want to write to a global variable from a non-global scope, you have to use the global scope specifier. But if you only want to read a global variable that is not hidden by a local variable with the same name, you may omit the scope specifier.
