Why is this Bash script not inheriting all environment variables? - bash

I'm trying something very straightforward:
PEOPLE=(
"nick"
"bob"
)
export PEOPLE="$(IFS=, ; echo "${PEOPLE[*]}")"
echo "$PEOPLE" # prints 'nick,bob'
./process-people.sh
For some reason, process-people.sh isn't seeing $PEOPLE. As in, if I echo "$PEOPLE" from inside process-people.sh, it prints an empty line.
From what I understand, the child process created by invoking ./process-people.sh should inherit all the parent process's environment variables, including $PEOPLE.
Yet, I've tried this on both Bash 3.2.57(1)-release and 4.2.46(2)-release and it doesn't work.
What's going on here?

That's a neat solution you have there for joining the elements of a Bash array into a string. Did you know that in Bash you cannot export array variables to the environment? And if a variable is not in the environment, then the child process will not see it.
Ah. But you aren't exporting an array, are you. You're converting the array into a string and then exporting that. So it should work.
But this is Bash! Where various accidents of history conspire to give you the finger.
As #PesaThe and #chepner pointed out in the comments below, you cannot actually convert a Bash array variable to a string variable. According to the Bash reference on arrays:
Referencing an array variable without a subscript is equivalent to referencing with a subscript of 0.
So when you call export PEOPLE=... where PEOPLE was previously assigned an array value, what you're actually doing is PEOPLE[0]=.... Here's a fuller example:
PEOPLE=(
"nick"
"bob"
)
export PEOPLE="super"
echo "$PEOPLE" # masks the fact that PEOPLE is still an array and just prints 'super'
echo "${PEOPLE[*]}" # prints 'super bob'
It's unfortunate that the export silently fails to export the array to the environment (it returns 0), and it's confusing that Bash equates ARRAY_VARIABLE to ARRAY_VARIABLE[0] in certain situations. We'll just have to chalk that up to a combination of history and backwards compatibility.
Here's a working solution to your problem:
PEOPLE_ARRAY=(
"nick"
"bob"
)
export PEOPLE="$(IFS=, ; echo "${PEOPLE_ARRAY[*]}")"
echo "$PEOPLE" # prints 'nick,bob'
./process-people.sh
The key here is to assign the array and derived string to different variables. Since PEOPLE is a proper string variable, it will export just fine and process-people.sh will work as expected.
It's not possible to directly change a Bash array variable into a string variable. Once a variable is assigned an array value, it becomes an array variable. The only way to change it back to a string variable is to destroy it with unset and recreate it.
Bash has a couple of handy commands for inspecting variables that are useful for investigating these kinds of issues:
printenv PEOPLE # prints 'nick,bob'
declare -p PEOPLE_ARRAY # prints: declare -ax PEOPLE_ARRAY='([0]="nick" [1]="bob")'
printenv will only return a value for environment variables, vs. echo, which will print a result whether the variable has been properly exported or not.
declare -p will show the full value of a variable, without the gotchas related to including or leaving out array index references (e.g. ARRAY_VARIABLE[*]).

Related

Why would I use declare / typeset in a shell script instead of just X=y?

I've recently come across a shell script that uses
declare -- FOO="" which apparently is spelled typeset -- FOO="" in non-bash shells.
Why might I want to do that instead of plain FOO="" or export FOO?
The most important purpose of using declare is to control scope, or to use array types that aren't otherwise accessible.
Using Function-Local Variables
To give you an example:
print_dashes() { for (( i=0; i<10; i++; do printf '-'; done; echo; }
while read -p "Enter a number: " i; do
print_dashes
echo "You entered: $i"
done
You'd expect that to print the number the user entered, right? But instead, it'll always print the value of i that print_dashes leaves when it's complete.
Consider instead:
print_dashes() {
declare i # ''local i'' would also have the desired effect
for (( i=0; i<10; i++; do printf '-'; done; echo;
}
...now i is local, so the newly-assigned value doesn't last beyond its invocation.
Declaring Explicitly Global Variables
Contrariwise, you sometimes want to declare a global variable, and make it clear to your code's readers that you're doing that by intent, or to do so while also declaring something as an array (or otherwise where declare would otherwise implicitly specify global state). You can do that too:
myfunc() {
declare arg # make arg local
declare -g -A myfunc_args_seen # make myfunc_args_seen a global associative array
for arg; do
myfunc_args_seen["$arg"]=1
done
echo "Across all invocations of myfunc, we have seen the following arguments:"
printf ' - %q\n' "${!myfunc_args_seen[#]}"
}
Declaring Associative Arrays
Normal shell arrays can just be assigned: my_arr=( one two three )
However, that's not the case for associative arrays, which are keyed as strings. For those, you need to declare them:
declare -A my_arr=( ["one"]=1 ["two"]=2 ["three"]=3 )
declare -i cnt=0
declares an integer-only variable, which is faster for math and always evaluates in arithmetic context.
declare -l lower="$1"
declares a variabl that automatically lowercases anything put in it, without any special syntax on access.
declare -r unchangeable="$constant"
declares a variable read-only.
Take a look at https://unix.stackexchange.com/questions/254367/in-bash-scripting-whats-the-different-between-declare-and-a-normal-variable for some useful discussion - you might not need these things often, but if you don't know what's available you're likely to work harder than you should.
A great reason to use declare, typeset, and/or readonly is code compartmentalization and reuse (i.e. encapsulation). You can write code in one script that can be sourced by others.
(Note declared/typeset/readonly constants/variables/functions lose their "readonly-ness" in a subshell, but they retain it when a child script sources their defining script since sourcing loads a script into the current shell, not a subshell.)
Since sourcing loads code from the script into the current shell though, the namespaces will overlap. To prevent a variable in a child script from being overwritten by its parent (or vice-versa, depending on where the script is sourced and the variable used), you can declare a variable readonly so it won't get overwritten.
You have to be careful with this because once you declare something readonly, you cannot unset it, so you do not want to declare something readonly that might naturally be redefined in another script. For example, if you're writing a library for general use that has logging functions, you might not want to use typeset -f on a function called warn, error, or info, since it is likely other scripts will create similar logging functions of their own with that name. In this case, it is actually standard practice to prefix the function, variable, and/or constant name with the name of the defining script and then make it readonly (e.g. my_script_warn, my_script_error, etc.). This preserves the values of the functions, variables, and/or constants as used in the logic in the code in the defining script so they don't get overwritten by sourcing scripts and accidentally fail.

How does "FOO= myprogram" in bash make "if(getent("FOO"))" return true in C?

I recently ran into a C program that makes use of an environmental variable as a flag to change the behavior of a certain part of the program:
if (getenv("FOO")) do_this_if_foo();
You'd then request the program by prepending the environment variable, but without actually setting it to anything:
FOO= mycommand myargs
Note that the intention of this was to trigger the flag - if you didn't want the added operation, you just wouldn't include the FOO=. However, I've never seen an environment variable set like this before. Every example I can find of prepended variables sets a value, FOO=bar mycommand myargs, rather than leaving it empty like that.
What exactly is happening here, that allows this flag to work without being set? And are there potential issues with implementing environmental variables like this?
The bash manual says:
A variable may be assigned to by a statement of the form
name=[value]
If value is not given, the variable is assigned the null string.
Note that "null" (in the sense of e.g. JavaScript null) is not a thing in the shell. When the bash manual says "null string", it means an empty string (i.e. a string whose length is zero).
Also:
When a simple command is executed, the shell performs the following expansions, assignments, and redirections, from left to right.
[...]
If no command name results, the variable assignments affect the current shell environment. Otherwise, the variables are added to the environment of the executed command and do not affect the current shell environment.
So all FOO= mycommand does is set the environment variable FOO to the empty string while executing mycommand. This satisfies if (getenv("FOO")) because it only checks for the presence of the variable, not whether it has a (non-empty) value.
Of course, any other value would work as well: FOO=1 mycommand, FOO=asdf mycommand, etc.
FOO= is just setting the variable to null (to be precise it's setting the variable to a zero-byte string, which thus returns a pointer to a NUL terminator - thanks #CharlesDuffy). Given the code you posted it could be FOO='bananas'and produce the same behavior. It's very odd to write code that way though. The common reason to set a variable on the command line is to pass a value for that variable into the script, e.g. to set debugging or logging level flags is extremely common, e.g. (pseudocode):
debug=1 logLevel=3 myscript
myscript() {
if (debug == 1) {
if (loglevel > 0) {
printf "Entering myscript()\n" >> log
if (logLevel > 1) {
printf "Arguments: %s\n" "$*" >> log
}
}
}
do_stuff()
}
Having just a "variable exists" test is a bit harder to work with because then you have to specifically unset the variable to clear the flag instead of just setting FOO=1 when you want to do something and otherwise your script doesn't care when FOO is null or 0 or unset or anything else.

Using a variable for associative array key in Bash

I'm trying to create associative arrays based on variables. So below is a super simplified version of what I'm trying to do (the ls command is not really what I want, just used here for illustrative purposes)...
I have a statically defined array (text-a,text-b). I then want to iterate through that array, and create associative arrays with those names and _AA appended to them (so associative arrays called text-a_AA and text-b_AA).
I don't really need the _AA appended, but was thinking it might be
necessary to avoid duplicate names since $NAME is already being used
in the loop.
I will need those defined and will be referencing them in later parts of the script, and not just within the for loop seen below where I'm trying to define them... I want to later, for example, be able to reference text-a_AA[NUM] (again, using variables for the text-a_AA part). Clearly what I have below doesn't work... and from what I can tell, I need to be using namerefs? I've tried to get the syntax right, and just can't seem to figure it out... any help would be greatly appreciated!
#!/usr/bin/env bash
NAMES=('text-a' 'text-b')
for NAME in "${NAMES[#]}"
do
NAME_AA="${NAME}_AA"
$NAME_AA[NUM]=$(cat $NAME | wc -l)
done
for NAME in "${NAMES[#]}"
do
echo "max: ${$NAME_AA[NUM]}"
done
You may want to use "NUM" as the name of the associative array and file name as the key. Then you can rewrite your code as:
NUM[${NAME}_AA]=$(wc -l < "$NAME")
Then rephrase your loop as:
for NAME in "${NAMES[#]}"
do
echo "max: ${NUM[${NAME}_AA]}"
done
Check your script at shellcheck.net
As an aside: all uppercase is not a good practice for naming normal shell variables. You may want to take a look at:
Correct Bash and shell script variable capitalization

Bash functions returning values meanwhile altering global variables

I'm just struggling with bash functions, and trying to return string values meanwhile some global variable is modified inside the function. An example:
MyGlobal="some value"
function ReturnAndAlter () {
MyGlobal="changed global value"
echo "Returned string"
}
str=$(ReturnAndAlter)
echo $str # prints out 'Returned value' as expected
echo $MyGlobal # prints out the initial value, not changed
This is because $(...) (and also `...` if used instead) cause the function to have its own environment, so the global variable is never affected.
I found a very dirty workaround by returning the value into another global variable and calling the function only using its name, but think that there should be a cleaner way to do it.
My dirty solution:
MyGlobal="some value"
ret_val=""
function ReturnAndAlter () {
ret_val="Returned string"
MyGlobal="changed value"
}
ReturnAndAlter # call the bare function
str=$ret_val # and assign using the auxiliary global ret_val
echo $str
echo $MyGlobal # Here both global variables are updated.
Any new ideas? Some way of calling functions that I'm missing?
Setting global variables is the only way a function has of communicating directly with the shell that calls it. The practice of "returning" a value by capturing the standard output is a bit of a hack necessitated by the shell's semantics, which are geared towards making it easy to call other programs, not making it easy to do things in the shell itself.
So, don't worry; no, you aren't missing any cool tricks. You're doing what the shell allows you to do.
The $(…) (command expansion) is run in a sub-shell.
All changes inside the sub-shell are lost when the sub-shell close.
It is usually a bad idea to use both printing a result and changing a variable inside a function. Either make all variables or just use one printed string.
There is no other solution.

bash command expansion

The following bash command substitution does not work as I thought.
echo $TMUX_$(echo 1)
only prints 1 and I am expecting the value of the variable $TMUX_1.I also tried:
echo ${TMUX_$(echo 1)}
-bash: ${TMUXPWD_$(echo 1)}: bad substitution
Any suggestions ?
If I understand correctly what you're looking for, you're trying to programatically construct a variable name and then access the value of that variable. Doing this sort of thing normally requires an eval statement:
eval "echo \$TMUX_$(echo 1)"
Important features of this statement include the use of double-quotes, so that the $( ) gets properly interpreted as a command substitution, and the escaping of the first $ so that it doesn't get evaluated the first time through. Another way to achieve the same thing is
eval 'echo $TMUX_'"$(echo 1)"
where in this case I used two strings which automatically get concatenated. The first is single-quoted so that it's not evaluated at first.
There is one exception to the eval requirement: Bash has a method of indirect referencing, ${!name}, for when you want to use the contents of a variable as a variable name. You could use this as follows:
tmux_var = "TMUX_$(echo 1)"
echo ${!tmux_var}
I'm not sure if there's a way to do it in one statement, though, since you have to have a named variable for this to work.
P.S. I'm assuming that echo 1 is just a stand-in for some more complicated command ;-)
Are you looking for arrays? Bash has them. There are a number of ways to create and use arrays in bash, the section of the bash manpage on arrays is highly recommended. Here is a sample of code:
TMUX=( "zero", "one", "two" )
echo ${TMUX[2]}
The result in this case is, of course, two.
Here are a few short lines from the bash manpage:
Bash provides one-dimensional indexed and associative array variables. Any variable may be
used as an indexed array; the declare builtin will explicitly declare an array. There is
no maximum limit on the size of an array, nor any requirement that members be indexed or
assigned contiguously. Indexed arrays are referenced using integers (including arithmetic
expressions) and are zero-based; associative arrays are referenced using arbitrary
strings.
An indexed array is created automatically if any variable is assigned to using the syntax
name[subscript]=value. The subscript is treated as an arithmetic expression that must
evaluate to a number greater than or equal to zero. To explicitly declare an indexed
array, use declare -a name (see SHELL BUILTIN COMMANDS below). declare -a name[subscript]
is also accepted; the subscript is ignored.
This works (tested):
eval echo \$TMUX_`echo 1`
Probably not very clear though. Pretty sure any solutions will require backticks around the echo to get that to work.

Resources