Strange behaviour of declare/typeset with "options" array in zsh - shell

I attempted to declare a variable named options in a zsh script. It turned out that this is a reserved name: zsh stores an associative array under it.
function mcve() {
options='';
}
$ mcve
mcve:1: options: attempt to set slice of associative array
I tried to look at its contents with declare and encountered a strange behaviour. The output is different before and after the first occurrence of the above error.
$ zsh
$ declare options
options
$ zsh
$ mcve
mcve:1: options: attempt to set slice of associative array
$ declare options
options=(autolist on printexitvalue off...<20 more lines>)
What's happening? Why is output different? Is the options array declared at the moment of first attempt to use it?
I've heard that typeset should be used instead of declare, but my man zshbuiltins says they're perfectly equal.
Also, this runs without failure:
function mcve() {
declare options;
options='';
echo ok;
}
$ zsh
$ mcve
ok
Why is this different?

The options associative array is documented in man zshmodules, under the zsh/parameter module. I can't explain the behavior of declare options, but I will note that print $options[@] will output a list of on/off values even when declare options shows nothing.
In your last example, declare options inside a function definition always declares a new local variable, whether or not a global by the same name already exists.
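For what it's worth, here is a minimal interactive sketch of both points (the options parameter is provided by the zsh/parameter module and is normally available without any extra setup; printexitvalue is just the option picked from your own output):
$ print $options[printexitvalue]     # prints off (or on, depending on your setup)
$ print ${(t)options}                # shows the parameter's type flags
$ mcve() { declare options; options=''; echo ok; }
$ mcve                               # declare creates a fresh local, so no clash with the special array
ok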

Related

Why would I use declare / typeset in a shell script instead of just X=y?

I've recently come across a shell script that uses
declare -- FOO="" which apparently is spelled typeset -- FOO="" in non-bash shells.
Why might I want to do that instead of plain FOO="" or export FOO?
The most important purpose of using declare is to control scope, or to use array types that aren't otherwise accessible.
Using Function-Local Variables
To give you an example:
print_dashes() { for ((i=0; i<10; i++)); do printf '-'; done; echo; }
while read -p "Enter a number: " i; do
print_dashes
echo "You entered: $i"
done
You'd expect that to print the number the user entered, right? But instead, it'll always print the value of i that print_dashes leaves behind when it finishes.
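You can see the clobbering directly in an interactive shell:
$ print_dashes
----------
$ echo "$i"   # i was silently set to 10 by the function's loop
10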
Consider instead:
print_dashes() {
declare i  # "local i" would also have the desired effect
for ((i=0; i<10; i++)); do printf '-'; done; echo
}
...now i is local, so the newly-assigned value doesn't last beyond its invocation.
Declaring Explicitly Global Variables
Contrariwise, you sometimes want to declare a global variable and make it clear to your code's readers that you're doing so by intent, or to do so while also declaring something as an array (or in other cases where declare inside a function would otherwise implicitly make the variable local). You can do that too:
myfunc() {
declare arg # make arg local
declare -g -A myfunc_args_seen # make myfunc_args_seen a global associative array
for arg; do
myfunc_args_seen["$arg"]=1
done
echo "Across all invocations of myfunc, we have seen the following arguments:"
printf ' - %q\n' "${!myfunc_args_seen[@]}"
}
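For instance, calling it a couple of times (the arguments here are made up) shows the global array accumulating keys across invocations:
myfunc --verbose input.txt
myfunc input.txt
# the second call lists both --verbose and input.txt, because myfunc_args_seen
# is global and keeps its keys between calls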
Declaring Associative Arrays
Normal shell arrays can just be assigned: my_arr=( one two three )
However, that's not the case for associative arrays, which are keyed as strings. For those, you need to declare them:
declare -A my_arr=( ["one"]=1 ["two"]=2 ["three"]=3 )
declare -i cnt=0
declares an integer-only variable, which is faster for math and always evaluates in arithmetic context.
declare -l lower="$1"
declares a variable that automatically lowercases anything put in it, without any special syntax on access.
declare -r unchangeable="$constant"
declares a variable read-only.
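A small, self-contained demonstration of those three attributes (variable names and values are arbitrary; declare -l needs Bash 4.0 or newer):
declare -i cnt=0
cnt+=5               # arithmetic append: cnt is now 5, not the string "05"
cnt='2 * 3'          # the right-hand side is evaluated as arithmetic: cnt is 6
declare -l lower="MiXeD Case"   # lower now holds "mixed case"
declare -r unchangeable="some constant"
unchangeable=other   # error: unchangeable: readonly variable
echo "$cnt $lower"   # prints: 6 mixed case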
Take a look at https://unix.stackexchange.com/questions/254367/in-bash-scripting-whats-the-different-between-declare-and-a-normal-variable for some useful discussion - you might not need these things often, but if you don't know what's available you're likely to work harder than you should.
A great reason to use declare, typeset, and/or readonly is code compartmentalization and reuse (i.e. encapsulation). You can write code in one script that can be sourced by others.
(Note declared/typeset/readonly constants/variables/functions lose their "readonly-ness" in a subshell, but they retain it when a child script sources their defining script since sourcing loads a script into the current shell, not a subshell.)
Since sourcing loads code from the script into the current shell though, the namespaces will overlap. To prevent a variable in a child script from being overwritten by its parent (or vice-versa, depending on where the script is sourced and the variable used), you can declare a variable readonly so it won't get overwritten.
You have to be careful with this, because once you declare something readonly you cannot unset it, so you do not want to declare something readonly that might naturally be redefined in another script. For example, if you're writing a library for general use that has logging functions, you might not want to use typeset -f on a function called warn, error, or info, since other scripts are likely to define similar logging functions of their own under those names. In that case, the standard practice is to prefix the function, variable, or constant name with the name of the defining script and then make it readonly (e.g. my_script_warn, my_script_error, etc.). This preserves the definitions that the defining script's own logic depends on, so they don't get overwritten by sourcing scripts and cause accidental failures.
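As a sketch of that prefixed, read-only convention (the file and function names here are invented for illustration):
# mylib.sh: meant to be sourced, not executed
readonly MYLIB_LOG_PREFIX="[mylib]"
mylib_warn() {
printf '%s WARN: %s\n' "$MYLIB_LOG_PREFIX" "$*" >&2
}
readonly -f mylib_warn   # freeze the function so a sourcing script cannot redefine it
# in a consuming script:
# . ./mylib.sh
# mylib_warn "disk almost full"   # prints: [mylib] WARN: disk almost full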

Why is this Bash script not inheriting all environment variables?

I'm trying something very straightforward:
PEOPLE=(
"nick"
"bob"
)
export PEOPLE="$(IFS=, ; echo "${PEOPLE[*]}")"
echo "$PEOPLE" # prints 'nick,bob'
./process-people.sh
For some reason, process-people.sh isn't seeing $PEOPLE. As in, if I echo "$PEOPLE" from inside process-people.sh, it prints an empty line.
From what I understand, the child process created by invoking ./process-people.sh should inherit all the parent process's environment variables, including $PEOPLE.
Yet, I've tried this on both Bash 3.2.57(1)-release and 4.2.46(2)-release and it doesn't work.
What's going on here?
That's a neat solution you have there for joining the elements of a Bash array into a string. Did you know that in Bash you cannot export array variables to the environment? And if a variable is not in the environment, then the child process will not see it.
Ah. But you aren't exporting an array, are you? You're converting the array into a string and then exporting that. So it should work.
But this is Bash! Where various accidents of history conspire to give you the finger.
As @PesaThe and @chepner pointed out in the comments below, you cannot actually convert a Bash array variable to a string variable. According to the Bash reference on arrays:
Referencing an array variable without a subscript is equivalent to referencing with a subscript of 0.
So when you call export PEOPLE=... where PEOPLE was previously assigned an array value, what you're actually doing is PEOPLE[0]=.... Here's a fuller example:
PEOPLE=(
"nick"
"bob"
)
export PEOPLE="super"
echo "$PEOPLE" # masks the fact that PEOPLE is still an array and just prints 'super'
echo "${PEOPLE[*]}" # prints 'super bob'
It's unfortunate that the export silently fails to export the array to the environment (it returns 0), and it's confusing that Bash equates ARRAY_VARIABLE to ARRAY_VARIABLE[0] in certain situations. We'll just have to chalk that up to a combination of history and backwards compatibility.
Here's a working solution to your problem:
PEOPLE_ARRAY=(
"nick"
"bob"
)
export PEOPLE="$(IFS=, ; echo "${PEOPLE_ARRAY[*]}")"
echo "$PEOPLE" # prints 'nick,bob'
./process-people.sh
The key here is to assign the array and derived string to different variables. Since PEOPLE is a proper string variable, it will export just fine and process-people.sh will work as expected.
It's not possible to directly change a Bash array variable into a string variable. Once a variable is assigned an array value, it becomes an array variable. The only way to change it back to a string variable is to destroy it with unset and recreate it.
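A minimal sketch of that unset-and-recreate route, if you really want to keep the PEOPLE name:
unset PEOPLE            # destroys the array variable entirely
PEOPLE="nick,bob"       # PEOPLE is now an ordinary string variable
export PEOPLE           # and it reaches child processes as expected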
Bash has a couple of handy commands for inspecting variables that are useful for investigating these kinds of issues:
printenv PEOPLE # prints 'nick,bob'
declare -p PEOPLE_ARRAY # prints: declare -ax PEOPLE_ARRAY='([0]="nick" [1]="bob")'
printenv will only return a value for actual environment variables, whereas echo will print a result whether or not the variable has been properly exported.
declare -p will show the full value of a variable, without the gotchas related to including or leaving out array index references (e.g. ARRAY_VARIABLE[*]).
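For example, with the variables from the working solution above:
echo "$PEOPLE_ARRAY"      # prints 'nick' (element 0), even though the array is not exported
printenv PEOPLE_ARRAY     # prints nothing: the array never made it into the environment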

Why doesn't this custom sourcing function make my declared variable globally available?

I'm facing a very weird issue. I know I'm missing something basic but for the life of me I can't quite figure out what.
Consider these declarations in a file tmp.sh:
declare -A aa
aa[1]=hello
aa[2]=world
myfunc() {
echo exists
}
myvar=exists
I source the script as source tmp.sh and run:
myfunc
echo $myvar
echo ${aa[@]}
The output is:
exists
exists
hello world
Now I do the same thing but put the source statement in a function:
mysource() {
filename="$1"
source "$filename"
}
This time the output is:
exists
exists
What's going on here?
Add the -g option to declare. [1]
From the manual
-g create global variables when used in a shell function; otherwise ignored (by default, declare declares local scope variables when used in shell functions)
Also useful to mention from chepner's comment below
source works by executing the contents of the file exactly as if you replaced the source command with contents of the file. Even though the declare statements are not in a function in your file, they are part of the function that calls source.
[1] The -g option requires Bash 4.2 or above.
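Applied to the tmp.sh from the question, a minimal sketch of the fix (Bash 4.2+):
declare -gA aa    # -g keeps aa global even when this file is sourced from inside mysource
aa[1]=hello
aa[2]=world
myfunc() {
echo exists
}
myvar=exists      # a plain assignment already ends up global, so only the declare needed -g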
To complement 123's helpful answer:
By default, declare creates a local variable when used in a function (to put it differently: inside a function, declare by default behaves the same as local).
To create a global variable from inside a function:
Bash 4.2+:
use declare -g (e.g., declare -g foo='bar')
Older Bash versions, including 3.x:
simply assign a value to the variable (e.g., foo='bar'), do not use declare.
As an aside:
Your sample code uses declare -A to declare an associative array, which requires Bash 4.0.
Associative arrays are the only types of (non-environment) variables that strictly need a declare statement for their creation - you cannot create an associative array without declare -A, whereas you can create (non-integer-typed) scalars and arrays implicitly by simple assignment.
Thus, given that declare -g requires Bash 4.2, there is no solution to your problem if you happen to be stuck on 4.0 or 4.1.
3.x versions of Bash don't face this problem, because they don't support declare -A at all.

why bash throws unbound variable warning when I declare a local array in function whose name is shadowing a global one?

In this example, declaring a local variable with a different name from the global one produces no error, but when the name is the same as the global's I get:
line 5: !1: unbound variable
code:
set -u
function get_arr {
local myArr2=("${!1}")
echo ${myArr2[*]}
local myArr=("${!1}")
echo ${myArr[*]}
}
myArr=(one two three)
get_arr myArr[@]
Just to make sure we are on the same sheet of paper, here is the version working on Bash 3.2 (works fine quoted or unquoted). You must have an environment setting, stray characters in your file, or something unrelated to your script causing issues:
#!/bin/bash
set -u
function get_arr {
local myArr2=("${!1}")
echo ${myArr2[*]}
local myArr=("${!1}")
echo ${myArr[*]}
}
myArr=(one two three)
get_arr "myArr[@]"
exit 0
Version
$ bash --version
GNU bash, version 3.2.39(1)-release (i586-suse-linux-gnu)
Copyright (C) 2007 Free Software Foundation, Inc.
Output
$ bash array_indirect_ref.sh
one two three
one two three
Execution
$ bash -x array_indirect_ref.sh
+ set -u
+ myArr=(one two three)
+ get_arr 'myArr[@]'
+ myArr2=("${!1}")
+ local myArr2
+ echo one two three
one two three
+ myArr=("${!1}")
+ local myArr
+ echo one two three
one two three
Update: it appears that how you declare the passed array inside your function affects whether or not shadowed names will work, even in new bash versions.
I have some bash code that used to work, as of last week, but now fails after I updated cygwin to its current code.
~~~~~~~~~~
My cygwin bash version is now 4.3.39:
$ bash --version
GNU bash, version 4.3.39(2)-release (i686-pc-cygwin)
which is the latest.
~~~~~~~~~~
Consider this bash code:
#!/bin/bash
set -e # exit on first failed command
set -u # exit if encounter never set variable
testArrayArg1() {
declare -a argArray=("${!1}")
echo "testArrayArg1: ${argArray[@]}"
}
testArrayArg2() {
declare -a anArray=("${!1}")
echo "testArrayArg2: ${anArray[@]}"
}
anArray=("a" "b" "c")
testArrayArg1 anArray[@]
testArrayArg2 anArray[@]
Note that testArrayArg2 function uses an array name (anArray) which shadows the subsequent variable name in the script.
Also note that the way I pass the array to the function (anArray[@]) and the way that I declare the array in the function (declare -a anArray=("${!1}")) are taken from Ken Bertelson's answer here.
Both functions above used to always work.
Now, after my cygwin/bash update, testArrayArg1 still works but testArrayArg2 which uses a shadowed array name fails:
$ bash t.sh
testArrayArg1: a b c
t.sh: line 11: !1: unbound variable
Anyone know what changed recently in bash to cause this?
~~~~~~~~~~
I can fix this if I change how I declare the array inside the function from declare -a anArray=("${!1}") to your "local" idiom of local anArray=("${!1}").
So, this code
testArrayArg3() {
local anArray=("${!1}")
echo "testArrayArg3: ${anArray[@]}"
}
testArrayArg3 anArray[@]
works:
testArrayArg3: a b c
~~~~~~~~~~
OK, so the local anArray=("${!1}") function array arg declaration idiom seems to work.
This idiom is mentioned in that SO link that I mentioned above in a hidden comment under Ken Bertelson's answer. To see it, click on the "show 3 more" link and check out Mike Q's comment.
Is it every bit as good as the declare -a anArray=("${!1}") idiom, or does it have drawbacks of its own?
I have some critical code that depends on passing arrays to bash functions, so I really need to get this straight.

Associative arrays are local by default

Associative arrays seem to be local by default when declared inside a function body, whereas I need them to be global. The following code
#!/bin/bash
f() {
declare -A map
map[x]=a
map[y]=b
}
f
echo x: ${map[x]} y: ${map[y]}
produces the output:
x: y:
while this
#!/bin/bash
declare -A map
f() {
map[x]=a
map[y]=b
}
f
echo x: ${map[x]} y: ${map[y]}
produces the output:
x: a y: b
Is it possible to declare a global associative array within a function?
Or what work-around can be used?
From: Greg Wooledge
Sent: Tue, 23 Aug 2011 06:53:27 -0700
Subject: Re: YAQAGV (Yet Another Question About Global Variables)
bash 4.2 adds "declare -g" to create global variables from within a
function.
Thank you Greg! However Debian Squeeze still has Bash 4.1.5
Fine, 4.2 adds "declare -g" but it's buggy for associative arrays so it doesn't (yet) answer the question. Here's my bug report and Chet's confirmation that there's a fix scheduled for the next release.
http://lists.gnu.org/archive/html/bug-bash/2013-09/msg00025.html
But I've serendipitously found a workaround: instead of declaring the array and assigning an initial value to it at the same time, first declare the array and then do the assignment. That is, don't do this:
declare -gA a=([x]=1 [y]=2)
but this instead:
declare -gA a; a=([x]=1 [y]=2)
You have already answered your own question with declare -g. The workaround on bash versions < 4.2 is to declare the array outside of the function.
f() {
map[y]=foo
}
declare -A map
f
echo "${map[y]}"   # prints: foo
This example declares a global associative array variable inside a function, in bash.
set -euf +x -o pipefail # There is no place for implicit errors in this script.
function init_arrays(){
# FYI. Multiple array declarations are not a problem. You can invoke it multiple times.
# The "-gA" switch is the trick for the global array declaration inside a function.
declare -gA my_var
}
function do_work(){
init_arrays
my_var[$1]=OPPA
}
do_work aa
echo ${my_var[aa]}
echo It is expected to get OPPA value printed above
Tested on GNU bash, version 4.4...
Important notes.
The declare -A command doesn't actually create an associative array immediately; it just sets an attribute on a variable name which allows you to assign to the name as an associative array. The array itself doesn't exist until the first assignment (!!!).
(I wanted to see a complete working example in this thread, sorry.)
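A quick way to see that distinction is declare -p (the exact output format varies a little between bash versions):
declare -A m
declare -p m    # shows only the attribute, e.g.: declare -A m
m[k]=v
declare -p m    # now shows an element as well, e.g.: declare -A m=([k]="v" )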
For those who are stuck with a Bash version < 4.2 and are not comfortable with the proposed workarounds, I share my custom implementation of global associative arrays. It does not have the full power of bash associative arrays, and you need to be careful about special characters in array keys, but it gets the job done.
get_array(){
local arr_name="$1"
local arr_key="$2"
local arr_namekey_var="ASSOCARRAY__${arr_name}__${arr_key}"
# Indirect expansion with := so an undefined name/key expands to an empty string.
echo "${!arr_namekey_var:=}"
}
set_array(){
local arr_name="$1"
local arr_key="$2"
local arr_value="$3"
local arr_namekey_var="ASSOCARRAY__${arr_name}__${arr_key}"
if [[ -z "${arr_value}" ]]; then
eval ${arr_namekey_var}=
else
# '%s' keeps values containing % or \ from being treated as a format string.
printf -v "${arr_namekey_var}" '%s' "${arr_value}"
fi
}
A few notes:
The array name and array key could be combined into a single value, but keeping them separate proved convenient in practice.
The __ separator can be defeated by malicious or careless use of double underscores; to be on the safe side, use only alphanumeric characters and single underscores in array names and keys. Of course, the composition of the internal variable name (separators, prefix, suffix...) can be adjusted to the application's and developer's needs.
The default-value expansion guarantees that an undefined array key (and also an undefined array name!) will expand to the null string.
Once you move to a version of bash where you are comfortable with the builtin associative arrays, these two procedures can be used as wrappers for actual associative arrays without having to refactor the whole code base.
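Usage then looks like this (the names and values are arbitrary):
set_array fruits apple red
set_array fruits banana yellow
get_array fruits apple     # prints: red
get_array fruits grape     # prints an empty line (undefined key expands to null)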
