Associative arrays are local by default - bash

Associative arrays seem to be local by default when declared inside a function body, where they should be global. The following code
#!/bin/bash
f() {
declare -A map
map[x]=a
map[y]=b
}
f
echo x: ${map[x]} y: ${map[y]}
produces the output:
x: y:
while this
#!/bin/bash
declare -A map
f() {
map[x]=a
map[y]=b
}
f
echo x: ${map[x]} y: ${map[y]}
produces the output:
x: a y: b
Is it possible to declare a global associative array within a function?
Or what work-around can be used?

From: Greg Wooledge
Sent: Tue, 23 Aug 2011 06:53:27 -0700
Subject: Re: YAQAGV (Yet Another Question About Global Variables)
bash 4.2 adds "declare -g" to create global variables from within a
function.
Thank you Greg! However Debian Squeeze still has Bash 4.1.5

Fine, 4.2 adds "declare -g" but it's buggy for associative arrays so it doesn't (yet) answer the question. Here's my bug report and Chet's confirmation that there's a fix scheduled for the next release.
http://lists.gnu.org/archive/html/bug-bash/2013-09/msg00025.html
But I've serendipitously found a workaround: instead of declaring the array and assigning an initial value to it at the same time, first declare the array and then do the assignment. That is, don't do this:
declare -gA a=([x]=1 [y]=2)
but this instead:
declare -gA a; a=([x]=1 [y]=2)
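As a minimal sketch of the two-step workaround used from inside a function (assuming Bash 4.2 or later for declare -g; the names init_map and settings are made up for illustration):

```bash
#!/usr/bin/env bash
# Hypothetical names: init_map and settings are illustrative only.
init_map() {
    # Step 1: declare the global associative array without an initializer.
    declare -gA settings
    # Step 2: assign the initial contents in a separate statement.
    settings=([x]=1 [y]=2)
}

init_map
echo "x: ${settings[x]} y: ${settings[y]}"
```

Because the array is global, the echo outside the function can read both keys.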

You have already answered your own question with declare -g. The workaround on bash versions < 4.2 is to declare the array outside of the function.
f() {
map[y]=foo
}
declare -A map
f
echo "${map[y]}"

This example declares a global associative array variable inside a function, in bash.
set -euf +x -o pipefail # There is no place for implicit errors in this script.
function init_arrays(){
# FYI. Multiple array declarations are not a problem. You can invoke it multiple times.
# The "-gA" switch is the trick for the global array declaration inside a function.
declare -gA my_var
}
function do_work(){
init_arrays
my_var[$1]=OPPA
}
do_work aa
echo ${my_var[aa]}
echo It is expected to get OPPA value printed above
Tested on GNU bash, version 4.4...
Important notes.
The declare -A command doesn't actually create an associative array immediately; it just sets an attribute on a variable name which allows you to assign to the name as an associative array. The array itself doesn't exist until the first assignment (!!!).
(I wanted to see a complete working example in this thread, sorry.)

For those who are stuck with a Bash version < 4.2 and are not comfortable with the proposed workarounds, I share my custom implementation of global associative arrays. It does not have the full power of bash associative arrays, and you need to be careful about special characters in the array index, but it gets the job done.
get_array(){
local arr_name="$1"
local arr_key="$2"
local arr_namekey_var="ASSOCARRAY__${arr_name}__${arr_key}"
echo "${!arr_namekey_var:=}"
}
set_array(){
local arr_name="$1"
local arr_key="$2"
local arr_value="$3"
local arr_namekey_var="ASSOCARRAY__${arr_name}__${arr_key}"
if [[ -z "${arr_value}" ]]; then
eval "${arr_namekey_var}="
else
# An explicit '%s' format stores any % or backslash in the value literally.
printf -v "${arr_namekey_var}" '%s' "${arr_value}"
fi
}
A few notes:
The array name and array key could be combined into a single value, but the split proved convenient in practice.
__ as a separator can be broken by malicious or careless use -- to be on the safe side, use only single-underscore values in the array name and key, on top of only using alphanumeric values. Of course, the composition of the internal variable (separators, prefix, suffix...) can be adjusted to application and developer needs.
The default value expansion guarantees that an undefined array key (and also an undefined array name!) will expand to the null string.
Once you move to a version of bash where you are comfortable with builtin associative arrays, these two procedures can be used as wrappers for actual associative arrays without having to refactor the whole code base.
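A short usage sketch of the same idea, with the helpers restated so that it runs on its own. The array name colors, its keys, and the fill function are invented for illustration, and names and keys are assumed to be alphanumeric:

```bash
#!/usr/bin/env bash
# Sketch: one ordinary global variable per (array, key) pair.
# Names and keys are assumed to contain only letters, digits, and single underscores.
set_array() {
    local var="ASSOCARRAY__${1}__${2}"
    if [[ -z "$3" ]]; then
        eval "${var}="
    else
        printf -v "$var" '%s' "$3"
    fi
}

get_array() {
    local var="ASSOCARRAY__${1}__${2}"
    # ${!var:-} expands to the empty string when the variable is unset.
    echo "${!var:-}"
}

# Values set inside a function stay visible afterwards, because the
# backing variables are plain globals.
fill() {
    set_array colors sky blue
    set_array colors grass green
}

fill
sky=$(get_array colors sky)
missing=$(get_array colors sea)
echo "sky=${sky} missing='${missing}'"
```

Since get_array prints its result, callers capture it with command substitution, which costs a subshell per lookup. That may matter in tight loops.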

Related

Why would I use declare / typeset in a shell script instead of just X=y?

I've recently come across a shell script that uses
declare -- FOO="" which apparently is spelled typeset -- FOO="" in non-bash shells.
Why might I want to do that instead of plain FOO="" or export FOO?
The most important purpose of using declare is to control scope, or to use array types that aren't otherwise accessible.
Using Function-Local Variables
To give you an example:
print_dashes() { for (( i=0; i<10; i++ )); do printf '-'; done; echo; }
while read -p "Enter a number: " i; do
print_dashes
echo "You entered: $i"
done
You'd expect that to print the number the user entered, right? But instead, it'll always print the value of i that print_dashes leaves when it's complete.
Consider instead:
print_dashes() {
declare i # 'local i' would also have the desired effect
for (( i=0; i<10; i++ )); do printf '-'; done; echo;
}
...now i is local, so the newly-assigned value doesn't last beyond its invocation.
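The difference can be checked without interactive input. The following is a small sketch; the function names leaky_dashes and tidy_dashes are illustrative and not from the original answer:

```bash
#!/usr/bin/env bash
# Illustrative names only: leaky_dashes and tidy_dashes.
leaky_dashes() {
    for (( i = 0; i < 10; i++ )); do printf '-'; done; echo
}

tidy_dashes() {
    local i   # 'declare i' inside a function has the same effect
    for (( i = 0; i < 10; i++ )); do printf '-'; done; echo
}

i=42
tidy_dashes
after_tidy=$i      # the caller's value is untouched

i=42
leaky_dashes
after_leaky=$i     # overwritten by the loop counter

echo "after tidy_dashes: $after_tidy, after leaky_dashes: $after_leaky"
```

After tidy_dashes the caller still sees 42, while leaky_dashes leaves the loop's final counter value (10) in the caller's i.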
Declaring Explicitly Global Variables
Contrariwise, you sometimes want to declare a global variable and make it clear to your code's readers that you're doing so by intent, or to do so while also declaring something as an array (or in any other case where declare would otherwise implicitly make the variable local). You can do that too:
myfunc() {
declare arg # make arg local
declare -g -A myfunc_args_seen # make myfunc_args_seen a global associative array
for arg; do
myfunc_args_seen["$arg"]=1
done
echo "Across all invocations of myfunc, we have seen the following arguments:"
printf ' - %q\n' "${!myfunc_args_seen[@]}"
}
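A compact sketch of the same pattern, assuming Bash 4.2 or later; count_word and word_counts are hypothetical names. It shows that the global array keeps its contents across calls, even though the declare statement runs on every call:

```bash
#!/usr/bin/env bash
# Hypothetical example: count_word and word_counts are illustrative names.
count_word() {
    local word
    declare -gA word_counts          # global, although declared inside the function
    for word in "$@"; do
        word_counts[$word]=$(( ${word_counts[$word]:-0} + 1 ))
    done
}

count_word apple banana
count_word apple

for key in "${!word_counts[@]}"; do
    printf '%s=%s\n' "$key" "${word_counts[$key]}"
done | sort
```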
Declaring Associative Arrays
Normal shell arrays can just be assigned: my_arr=( one two three )
However, that's not the case for associative arrays, which are keyed as strings. For those, you need to declare them:
declare -A my_arr=( ["one"]=1 ["two"]=2 ["three"]=3 )
declare -i cnt=0
declares an integer-only variable, which is faster for math and always evaluates in arithmetic context.
declare -l lower="$1"
declares a variable that automatically lowercases anything put in it, without any special syntax on access.
declare -r unchangeable="$constant"
declares a variable read-only.
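The three attributes above can be tried together in a short sketch. It assumes Bash 4.0 or later (for declare -l), and the variable names are made up for illustration:

```bash
#!/usr/bin/env bash
# Assumes Bash 4.0 or later (declare -l). Variable names are illustrative.
declare -i total=0
total+=5                 # arithmetic addition, not string concatenation
total+=5
total="2 * 3 + total"    # the right-hand side is evaluated as arithmetic

declare -l lower="MiXeD Case"

declare -r fixed="constant"
unset_result="allowed"
if ! unset fixed 2>/dev/null; then
    unset_result="refused"   # readonly variables cannot be unset
fi

echo "total=$total lower=$lower fixed=$fixed unset=$unset_result"
```

Here total ends up as 16, lower holds the lowercased string, and the attempt to unset fixed is refused.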
Take a look at https://unix.stackexchange.com/questions/254367/in-bash-scripting-whats-the-different-between-declare-and-a-normal-variable for some useful discussion - you might not need these things often, but if you don't know what's available you're likely to work harder than you should.
A great reason to use declare, typeset, and/or readonly is code compartmentalization and reuse (i.e. encapsulation). You can write code in one script that can be sourced by others.
(Note that declare/typeset/readonly constants, variables, and functions lose their "readonly-ness" in a child shell process, such as a script run with bash script.sh, but they retain it when a child script sources their defining script, since sourcing loads a script into the current shell rather than a new one.)
Since sourcing loads code from the script into the current shell though, the namespaces will overlap. To prevent a variable in a child script from being overwritten by its parent (or vice-versa, depending on where the script is sourced and the variable used), you can declare a variable readonly so it won't get overwritten.
You have to be careful with this because once you declare something readonly, you cannot unset it, so you do not want to declare something readonly that might naturally be redefined in another script. For example, if you're writing a library for general use that has logging functions, you might not want to use readonly -f on a function called warn, error, or info, since it is likely other scripts will create similar logging functions of their own with that name. In this case, a common practice is to prefix the function, variable, and/or constant name with the name of the defining script and then make it readonly (e.g. my_script_warn, my_script_error, etc.). This preserves the functions, variables, and constants that the defining script's logic relies on, so sourcing scripts cannot overwrite them and cause accidental failures.
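A minimal sketch of the prefix-and-readonly convention described above. The prefix mylib stands in for a hypothetical library script name, and the attempted redefinition is wrapped in eval only so that its failure can be observed:

```bash
#!/usr/bin/env bash
# Sketch of a sourced library; the "mylib" prefix is a hypothetical script name.
mylib_warn() {
    printf 'mylib: warning: %s\n' "$*" >&2
}
readonly -f mylib_warn

# A script that sources the library can no longer replace the function.
redefine_result="accepted"
if ! eval 'mylib_warn() { :; }' 2>/dev/null; then
    redefine_result="rejected"
fi

mylib_warn "disk almost full"
echo "redefinition was $redefine_result"
```

A generic name such as warn would not be protected this way, so that other scripts remain free to define their own.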

Why doesn't this custom sourcing function make my declared variable globally available?

I'm facing a very weird issue. I know I'm missing something basic but for the life of me I can't quite figure out what.
Consider these declarations in a file tmp.sh:
declare -A aa
aa[1]=hello
aa[2]=world
myfunc() {
echo exists
}
myvar=exists
I source the script as source tmp.sh and run:
myfunc
echo $myvar
echo ${aa[@]}
The output is:
exists
exists
hello world
Now I do the same thing but put the source statement in a function:
mysource() {
filename="$1"
source "$filename"
}
This time the output is:
exists
exists
What's going on here?
Add the -g option to declare. [1]
From the manual
-g create global variables when used in a shell function; otherwise ignored (by default, declare declares local scope variables when used in shell functions)
Also useful to mention from chepner's comment below
source works by executing the contents of the file exactly as if you replaced the source command with contents of the file. Even though the declare statements are not in a function in your file, they are part of the function that calls source.
[1] The -g option requires Bash 4.2 or above.
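The effect can be reproduced in a self-contained sketch, assuming Bash 4.2 or later. The temporary file and the names local_map and global_map are invented for the example:

```bash
#!/usr/bin/env bash
# Assumes Bash 4.2 or later. File contents and names are illustrative.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
declare -A local_map       # becomes local to whichever function sources the file
local_map[1]=hello
declare -gA global_map     # -g keeps it global even when sourced inside a function
global_map[1]=hello
EOF

mysource() {
    source "$1"
}

mysource "$tmpfile"
rm -f "$tmpfile"

echo "local_map: '${local_map[1]:-}' global_map: '${global_map[1]:-}'"
```

Only global_map survives the return from mysource. local_map was created as a local of that function and disappears with it.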
To complement 123's helpful answer:
By default, declare creates a local variable when used in a function (to put it differently: inside a function, declare by default behaves the same as local).
To create a global variable from inside a function:
Bash 4.2+:
use declare -g (e.g., declare -g foo='bar')
Older Bash versions, including 3.x:
simply assign a value to the variable (e.g., foo='bar'), do not use declare.
As an aside:
Your sample code uses declare -A to declare an associative array, which requires Bash 4.0.
Associative arrays are the only types of (non-environment) variables that strictly need a declare statement for their creation - you cannot create an associative array without declare -A, whereas you can create (non-integer-typed) scalars and arrays implicitly by simple assignment.
Thus, given that declare -g requires Bash 4.2, there is no solution to your problem if you happen to be stuck on 4.0 or 4.1.
3.x versions of Bash don't face this problem, because they don't support declare -A altogether.

Strange behaviour of declare/typeset with "options" array in zsh

I attempted to declare a variable named options in a script for zsh. Turned out that it's some reserved name and zsh stores an associative array under it.
function mcve() {
options='';
}
$ mcve
mcve:1: options: attempt to set slice of associative array
Tried to look at its contents with declare and encountered strange behaviour. The output is different before and after the first occurrence of the above error.
$ zsh
$ declare options
options
$ zsh
$ mcve
mcve:1: options: attempt to set slice of associative array
$ declare options
options=(autolist on printexitvalue off...<20 more lines>)
What's happening? Why is output different? Is the options array declared at the moment of first attempt to use it?
I've heard that typeset should be used instead of declare, but my man zshbuiltins says they're perfectly equal.
Also, this runs without failure:
function mcve() {
declare options;
options='';
echo ok;
}
$ zsh
$ mcve
ok
Why is this different?
The options associative array is documented in man zshmodules, under ZSH/PARAMETERS. I can't explain the behavior of declare options, but I will note that print $options[@] will output a list of on/off values even when declare options shows nothing.
In your last example, declare options inside a function definition always declares a new local variable, whether or not a global by the same name already exists.

why bash throws unbound variable warning when I declare a local array in function whose name is shadowing a global one?

In this example, declaring a local variable whose name differs from the global one produces no error, but when the name is the same as the global one I get:
line 5: !1: unbound variable
code:
set -u
function get_arr {
local myArr2=("${!1}")
echo ${myArr2[*]}
local myArr=("${!1}")
echo ${myArr[*]}
}
myArr=(one two three)
get_arr myArr[@]
Just to make sure we are on the same sheet of paper, here is the version working on Bash 3.2 (works fine quoted or unquoted). You must either have an environment setting or stray characters in your file, or something unrelated to your script causing issues:
#!/bin/bash
set -u
function get_arr {
local myArr2=("${!1}")
echo ${myArr2[*]}
local myArr=("${!1}")
echo ${myArr[*]}
}
myArr=(one two three)
get_arr "myArr[@]"
exit 0
Version
$ bash --version
GNU bash, version 3.2.39(1)-release (i586-suse-linux-gnu)
Copyright (C) 2007 Free Software Foundation, Inc.
Output
$ bash array_indirect_ref.sh
one two three
one two three
Execution
$ bash -x array_indirect_ref.sh
+ set -u
+ myArr=(one two three)
+ get_arr 'myArr[@]'
+ myArr2=("${!1}")
+ local myArr2
+ echo one two three
one two three
+ myArr=("${!1}")
+ local myArr
+ echo one two three
one two three
Update: it appears that how you declare the passed array inside your function affects whether or not shadowed names will work, even in new bash versions.
I have some bash code that used to work, as of last week, but now fails after I updated cygwin to its current code.
~~~~~~~~~~
My cygwin bash version is now 4.3.39:
$ bash --version
GNU bash, version 4.3.39(2)-release (i686-pc-cygwin)
which is the latest.
~~~~~~~~~~
Consider this bash code:
#!/bin/bash
set -e # exit on first failed command
set -u # exit if encounter never set variable
testArrayArg1() {
declare -a argArray=("${!1}")
echo "testArrayArg1: ${argArray[@]}"
}
testArrayArg2() {
declare -a anArray=("${!1}")
echo "testArrayArg2: ${anArray[@]}"
}
anArray=("a" "b" "c")
testArrayArg1 anArray[@]
testArrayArg2 anArray[@]
Note that testArrayArg2 function uses an array name (anArray) which shadows the subsequent variable name in the script.
Also note that the way I pass the array to the function (anArray[@]) and the way that I declare the array in the function (declare -a anArray=("${!1}")) are taken from Ken Bertelson's answer here.
Both functions above used to always work.
Now, after my cygwin/bash update, testArrayArg1 still works but testArrayArg2 which uses a shadowed array name fails:
$ bash t.sh
testArrayArg1: a b c
t.sh: line 11: !1: unbound variable
Anyone know what changed recently in bash to cause this?
~~~~~~~~~~
I can fix this if I change how I declare the array inside the function from declare -a anArray=("${!1}") to your "local" idiom of local anArray=("${!1}").
So, this code
testArrayArg3() {
local anArray=("${!1}")
echo "testArrayArg3: ${anArray[@]}"
}
testArrayArg3 anArray[@]
works:
testArrayArg3: a b c
~~~~~~~~~~
OK, so the local anArray=("${!1}") function array arg declaration idiom seems to work.
This idiom is mentioned in that SO link that I mentioned above in a hidden comment under Ken Bertelson's answer. To see it, click on the "show 3 more" link and check out Mike Q's comment.
Is it every bit as good as the declare -a anArray=("${!1}") idiom, or does it have drawbacks of its own?
I have some critical code that depends on passing arrays to bash functions, so I really need to get this straight.
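One way to avoid depending on how a shadowing declaration is treated is not to shadow at all. The sketch below gives the local copy a prefixed name that the caller is unlikely to use. The helper join_array and the variable joined are hypothetical, and this is only one possible approach under that assumption, not a statement about which idiom is preferable:

```bash
#!/usr/bin/env bash
set -u

# Hypothetical helper: join_array copies the caller's array via indirect expansion.
# The local name is prefixed so that it cannot shadow the caller's array name.
join_array() {
    local _join_array_copy=("${!1}")
    local IFS=,
    joined="${_join_array_copy[*]}"   # result is returned in the global 'joined'
}

anArray=("a" "b" "c")
join_array 'anArray[@]'
echo "$joined"
```

Quoting the argument as 'anArray[@]' also keeps the shell from treating the brackets as a glob pattern.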

bash command expansion

The following bash command substitution does not work as I thought.
echo $TMUX_$(echo 1)
only prints 1 and I am expecting the value of the variable $TMUX_1. I also tried:
echo ${TMUX_$(echo 1)}
-bash: ${TMUXPWD_$(echo 1)}: bad substitution
Any suggestions ?
If I understand correctly what you're looking for, you're trying to programmatically construct a variable name and then access the value of that variable. Doing this sort of thing normally requires an eval statement:
eval "echo \$TMUX_$(echo 1)"
Important features of this statement include the use of double-quotes, so that the $( ) gets properly interpreted as a command substitution, and the escaping of the first $ so that it doesn't get evaluated the first time through. Another way to achieve the same thing is
eval 'echo $TMUX_'"$(echo 1)"
where in this case I used two strings which automatically get concatenated. The first is single-quoted so that it's not evaluated at first.
There is one exception to the eval requirement: Bash has a method of indirect referencing, ${!name}, for when you want to use the contents of a variable as a variable name. You could use this as follows:
tmux_var="TMUX_$(echo 1)"
echo "${!tmux_var}"
I'm not sure if there's a way to do it in one statement, though, since you have to have a named variable for this to work.
P.S. I'm assuming that echo 1 is just a stand-in for some more complicated command ;-)
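The indirect-reference approach can be sketched end to end as follows. The values assigned to TMUX_1 and TMUX_2 are placeholders, and echo 2 stands in for whatever command produces the suffix:

```bash
#!/usr/bin/env bash
# Illustrative values; TMUX_1 and TMUX_2 stand in for the real variables.
TMUX_1="first pane"
TMUX_2="second pane"

suffix=$(echo 2)            # stand-in for a more complicated command
var_name="TMUX_${suffix}"   # build the variable name first
value=${!var_name}          # then dereference it; no eval needed

echo "$var_name -> $value"
```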
Are you looking for arrays? Bash has them. There are a number of ways to create and use arrays in bash, the section of the bash manpage on arrays is highly recommended. Here is a sample of code:
TMUX=( "zero" "one" "two" )
echo ${TMUX[2]}
The result in this case is, of course, two.
Here are a few short lines from the bash manpage:
Bash provides one-dimensional indexed and associative array variables. Any variable may be
used as an indexed array; the declare builtin will explicitly declare an array. There is
no maximum limit on the size of an array, nor any requirement that members be indexed or
assigned contiguously. Indexed arrays are referenced using integers (including arithmetic
expressions) and are zero-based; associative arrays are referenced using arbitrary
strings.
An indexed array is created automatically if any variable is assigned to using the syntax
name[subscript]=value. The subscript is treated as an arithmetic expression that must
evaluate to a number greater than or equal to zero. To explicitly declare an indexed
array, use declare -a name (see SHELL BUILTIN COMMANDS below). declare -a name[subscript]
is also accepted; the subscript is ignored.
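A brief sketch of the points in the manual excerpt, using an invented array named panes: the subscript is evaluated arithmetically, and indices do not have to be contiguous.

```bash
#!/usr/bin/env bash
# Illustrative array; the name panes is not from the original question.
panes=("zero" "one" "two")
index=$(echo 1)
chosen=${panes[index]}      # the subscript is evaluated as arithmetic

panes[10]="ten"             # indices need not be contiguous
indices="${!panes[*]}"      # list of indices that are actually set
count=${#panes[@]}          # number of elements, not the highest index

echo "chosen=$chosen indices=$indices count=$count"
```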
This works (tested):
eval echo \$TMUX_`echo 1`
Probably not very clear though. Pretty sure any solutions will require backticks around the echo to get that to work.

Resources