Interpolation rules when defining a function - bash

At a prompt, I can type:
$ e() { echo $1; }
and get a function which echoes its first argument. I do not understand why this works. Since $1 is undefined in the current environment, it seems that the above should be equivalent to:
$ e() { echo ; }
What is the relevant quoting/interpolation rule that allows this behavior? Note that this has nothing to do with $1 being special: if you use $FOO, you get a function that echoes the value of $FOO at the time the function is called rather than the value of $FOO when the function is defined.

Not sure how I missed this, since it's pretty clear in section 2.9.5:
When the function is declared, none of the expansions in wordexp shall be performed on the text in compound-command or io-redirect; all expansions shall be performed as normal each time the function is called. Similarly, the optional io-redirect redirections and any variable assignments within compound-command shall be performed during the execution of the function itself, not the function definition. See Consequences of Shell Errors for the consequences of failures of these operations on interactive and non-interactive shells.

Variables like $1 are special variables representing parameters passed from the command line. See the "Positional Parameters" section here: http://tldp.org/LDP/abs/html/internalvariables.html
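The rule quoted above is easy to observe directly. Here is a minimal sketch, assuming a made-up function name show_foo and made-up values; FOO is deliberately unset while the function is defined, and its value is only looked up when the function runs.

```bash
# FOO does not exist while the function body is being defined.
unset FOO
show_foo() { echo "FOO is: $FOO"; }

FOO=first
out1=$(show_foo)    # $FOO is expanded now, at call time

FOO=second
out2=$(show_foo)    # a later call sees the later value

echo "$out1"
echo "$out2"
```

If the expansion happened at definition time, both calls would print an empty value instead of first and second.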

Related

Why does function call from PS1 require escaping?

I'm setting my prompt inside of .bash_profile like this
export PS1="\w\$(getBranchName)\n ---->"
My getBranchName function exists, and this works fine.
My question is, why do I need to escape the call to getBranchName like this \$(getBranchName).
In other words, why doesn't this code work, instead?
export PS1="\w$(getBranchName)\n ---->"
If curious, this is what the getBranchName function looks like
esc="\033"
redf="${esc}[31m"
green="${esc}[32m"
purple="${esc}[35m"
cyanf="${esc}[36m"
reset="${esc}[0m"
getBranchName() {
if [[ "$(__git_ps1 '%s')" == "master" ]]
then
echo -e "${redf}$(__git_ps1)${reset}";
else
echo -e "${cyanf}$(__git_ps1)${reset}";
fi
}
export PS1="\w\$(getBranchName)\n ---->"
You need to escape the dollar because you want to store this exact text in your variable.
Try it by typing echo "$PS1". You should see the exact text: \w$(getBranchName)\n ---->
If you didn't escape it, the function would be evaluated only once, during the assignment.
The bottom line is that PS1 is a special variable: every time the shell displays a new prompt, the variable is evaluated again to produce the text to show.
The PS1 variable is basically a template string (which might contain function calls) which is evaluated each time the prompt is shown.
If you want to evaluate a function each time, so that each prompt shows the result of this new execution, you need to escape the call.
If you embedded the function call directly in the string, the function would be called once, immediately (most likely during login), and PS1 would contain the static result of that single call. The value would never be updated, because PS1 would no longer contain the function call, only its one-time result.
It's escaped because you want it to run when the shell evaluates $PS1 each time it's displayed, not just during the assignment.
The other expansions (which should be using tput unless you actually like random control codes all over your non-ANSI terminals) you want to be expanded just once, when you assign to PS1.
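To see the difference without touching a real prompt, here is a minimal sketch that stores both forms in ordinary variables. The function branch_name is a hypothetical stand-in for getBranchName, and the variable names eager and lazy are made up for illustration.

```bash
branch_name() { echo "main"; }   # hypothetical stand-in for getBranchName

eager="\w $(branch_name) >"      # substitution runs now; its result is frozen into the string
lazy="\w \$(branch_name) >"      # the literal text $(branch_name) is stored instead

printf '%s\n' "$eager"
printf '%s\n' "$lazy"
```

The first string already contains main, so it can never change. The second still contains the call itself, which is what bash needs if it is to run the function again each time it expands PS1.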

How does "FOO= myprogram" in bash make "if(getenv("FOO"))" return true in C?

I recently ran into a C program that makes use of an environmental variable as a flag to change the behavior of a certain part of the program:
if (getenv("FOO")) do_this_if_foo();
You'd then invoke the program with the environment variable prepended, but without actually setting it to anything:
FOO= mycommand myargs
Note that the intention of this was to trigger the flag - if you didn't want the added operation, you just wouldn't include the FOO=. However, I've never seen an environment variable set like this before. Every example I can find of prepended variables sets a value, FOO=bar mycommand myargs, rather than leaving it empty like that.
What exactly is happening here, that allows this flag to work without being set? And are there potential issues with implementing environmental variables like this?
The bash manual says:
A variable may be assigned to by a statement of the form
name=[value]
If value is not given, the variable is assigned the null string.
Note that "null" (in the sense of e.g. JavaScript null) is not a thing in the shell. When the bash manual says "null string", it means an empty string (i.e. a string whose length is zero).
Also:
When a simple command is executed, the shell performs the following expansions, assignments, and redirections, from left to right.
[...]
If no command name results, the variable assignments affect the current shell environment. Otherwise, the variables are added to the environment of the executed command and do not affect the current shell environment.
So all FOO= mycommand does is set the environment variable FOO to the empty string while executing mycommand. This satisfies if (getenv("FOO")) because it only checks for the presence of the variable, not whether it has a (non-empty) value.
Of course, any other value would work as well: FOO=1 mycommand, FOO=asdf mycommand, etc.
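The same presence-versus-value distinction can be checked from the shell itself. This is a minimal sketch, using ${FOO+set} as a rough shell analogue of the C getenv test (it expands to set whenever FOO exists, even if empty); the variable names with_empty, without and after are made up for illustration.

```bash
unset FOO

# Child process started with FOO present but empty.
with_empty=$(FOO= sh -c 'echo "${FOO+set}"')

# Child process started without FOO in its environment.
without=$(sh -c 'echo "${FOO+set}"')

# The assignment prefix only affected the child, not this shell.
after=${FOO+set}

echo "with FOO=: [$with_empty]  without: [$without]  current shell: [$after]"
```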
FOO= is just setting the variable to null (to be precise, it's setting the variable to a zero-byte string, for which getenv returns a pointer to a NUL terminator; thanks @CharlesDuffy). Given the code you posted, it could be FOO='bananas' and produce the same behavior. It's very odd to write code that way, though. The common reason to set a variable on the command line is to pass a value for that variable into the script. Setting debugging or logging level flags is extremely common, e.g. (sketched here as the contents of myscript):
debug=1 logLevel=3 myscript

# contents of myscript
if [ "${debug:-0}" -eq 1 ]; then
    if [ "${logLevel:-0}" -gt 0 ]; then
        printf 'Entering myscript\n' >> log
        if [ "$logLevel" -gt 1 ]; then
            printf 'Arguments: %s\n' "$*" >> log
        fi
    fi
fi
do_stuff
Having just a "variable exists" test is a bit harder to work with because then you have to specifically unset the variable to clear the flag instead of just setting FOO=1 when you want to do something and otherwise your script doesn't care when FOO is null or 0 or unset or anything else.
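As a rough illustration of that trade-off, here is a minimal sketch comparing the two styles of flag. The function names presence_flag and value_flag are hypothetical; the first mimics a "variable exists" test, the second only reacts to the value 1.

```bash
# "on" whenever FOO exists, whatever its value.
presence_flag() { if [ -n "${FOO+x}" ]; then echo on; else echo off; fi; }

# "on" only when FOO is exactly 1; unset, empty and 0 all count as off.
value_flag() { if [ "${FOO:-0}" = 1 ]; then echo on; else echo off; fi; }

FOO=0
p_zero=$(presence_flag)
v_zero=$(value_flag)

unset FOO
p_unset=$(presence_flag)
v_unset=$(value_flag)

echo "FOO=0: presence=$p_zero value=$v_zero"
echo "unset: presence=$p_unset value=$v_unset"
```

With the presence test, FOO=0 still switches the feature on, so the only way to turn it off is to unset the variable.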

Deferred evaluation of bash variables

I need to define a string (options) which contains a variable (group) that is going to be available later in the script.
This is what I came up with, using a literal string that gets evaluated later.
#!/bin/bash
options='--group="$group"' #$group is not available at this point
#
# Some code...
#
group='trekkie'
eval echo "$options" # the result is used elsewhere
It works, however it makes use of eval which I would like to avoid if not absolutely necessary (I don't want to risk potential problems because of unpredictable data).
I've asked for help in multiple places and I've got a couple of answers that were directing me to use indirect variables.
The problem is I simply fail to see how indirect variables might help me with my problem. As far as I understand they only offer a way of indirectly referencing other variables like this:
options="--group="$group""
a=options
group='trekkies'
echo "${!a}" # spits out --group=
I would also like to avoid using functions if possible because I don't want to make things more complicated than they need to be.
More Idiomatic: Using Parameter Expansion
Don't attempt to define the --group="$group" argument up-front when you don't yet know the group name; instead, set a flag that indicates whether the argument is needed, and honor that flag when forming your final argument list.
With the approach below, you avoid any need for "deferred evaluation":
#!/bin/bash
# initialize your flag as unset
unset needs_group
# depending on your application logic, optionally set that flag
if [[ $application_logic_here ]]; then
needs_group=1
fi
# ...so, the actual group can be defined later, when it's known...
group=trekkies
# and then check the flag to determine whether to pass the argument:
yourcommand ${needs_group+--group="$group"}
If you don't need the flag to be separate from the group variable, this is even easier:
# pass --group="$group" only if "$group" is a defined shell variable
yourcommand ${group+--group="$group"}
The relevant syntax is a parameter expansion: ${var+value} expands to value only if var is defined; and unlike most parameter expansions, its value can parse to multiple words with quoting applied.
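A minimal sketch of how that expansion behaves, assuming a hypothetical helper count_args that just reports how many arguments it received, and a made-up group name containing spaces:

```bash
count_args() { echo "$#"; }   # hypothetical helper: prints its argument count

unset group
n_unset=$(count_args ${group+--group="$group"})   # expands to nothing at all

group='star trek fans'
n_set=$(count_args ${group+--group="$group"})     # one word, despite the spaces
arg=$(printf '%s' ${group+--group="$group"})

echo "unset: $n_unset argument(s); set: $n_set argument(s): $arg"
```

The inner double quotes are what keep the value together as a single argument even though the outer expansion is unquoted.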
Alternately: One-Liner Function Shims
Here, you really are defining --group="$group" before the group is known:
#!/bin/bash
if [[ $application_logic_here ]]; then
    with_optional_group() { "$@" --group="$group"; }
else
    with_optional_group() { "$@"; }
fi
group=trekkies
with_optional_group yourcommand

Bash functions returning values meanwhile altering global variables

I'm struggling with bash functions: I'm trying to return a string value while also modifying a global variable inside the function. An example:
MyGlobal="some value"
function ReturnAndAlter () {
MyGlobal="changed global value"
echo "Returned string"
}
str=$(ReturnAndAlter)
echo $str # prints out 'Returned string' as expected
echo $MyGlobal # prints out the initial value, not changed
This is because $(...) (and also `...` if used instead) cause the function to have its own environment, so the global variable is never affected.
I found a very dirty workaround by returning the value into another global variable and calling the function only using its name, but think that there should be a cleaner way to do it.
My dirty solution:
MyGlobal="some value"
ret_val=""
function ReturnAndAlter () {
ret_val="Returned string"
MyGlobal="changed value"
}
ReturnAndAlter # call the bare function
str=$ret_val # and assign using the auxiliary global ret_val
echo $str
echo $MyGlobal # Here both global variables are updated.
Any new ideas? Some way of calling functions that I'm missing?
Setting global variables is the only way a function has of communicating directly with the shell that calls it. The practice of "returning" a value by capturing the standard output is a bit of a hack necessitated by the shell's semantics, which are geared towards making it easy to call other programs, not making it easy to do things in the shell itself.
So, don't worry; no, you aren't missing any cool tricks. You're doing what the shell allows you to do.
The $(…) construct (command substitution) runs in a subshell.
All changes made inside the subshell are lost when the subshell exits.
It is usually a bad idea for a function to both print a result and change a variable. Either return everything through variables or return a single printed string.
There is no other solution.
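The subshell effect can be shown with a minimal sketch. The function bump and the variable names are made up for illustration; the same function is called once inside $(…) and once directly.

```bash
counter=0
bump() { counter=$((counter + 1)); echo "bumped"; }

captured=$(bump)        # runs in a subshell; the increment is thrown away
after_capture=$counter

bump >/dev/null         # runs in the current shell; the increment persists
after_direct=$counter

echo "captured=$captured after_capture=$after_capture after_direct=$after_direct"
```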

Scope of variables in KSH

I have written a sample KornShell function to split a String, put it in an array and then print out the values.
The code is as below
#!/usr/bin/ksh
splitString() {
string="abc@hotmail.com;xyz@gmail.com;uvw@yahoo.com"
oIFS="$IFS";
IFS=';'
set -A str $string
IFS="$oIFS"
}
splitString
echo "strings count = ${#str[@]}"
echo "first : ${str[0]}";
echo "second: ${str[1]}";
echo "third : ${str[2]}";
Now the echo does not print out the values of the array, so I assume it has something to do with the scope of the array defined.
I am new to Shell scripting, can anybody help me out with understanding the scope of variables in the example above?
The default scope of a variable is the whole script.
However, when you declare a variable inside a function, the variable becomes local to the function that declares it. Ksh has dynamic scoping, so the variable is also accessible in functions that are invoked by the function that declares the variable. This is tersely documented in the section on functions in the manual. Note that in AT&T ksh (as opposed to pdksh and derivatives, and the similar features of bash and zsh), this only applies to functions defined with the function keyword, not to functions defined with the traditional f () { … } syntax. In AT&T ksh93, all variables declared in functions defined with the traditional syntax are global.
The main way of declaring a variable is with the typeset builtin. It always makes a variable local (in AT&T ksh, only in functions declared with function). If you assign to a variable without having declared it with typeset, it's global.
The ksh documentation does not specify whether set -A makes a variable local or global, and different versions make it either. Under ksh 93u, pdksh or mksh, the variable is global and your script does print out the value. You appear to have ksh88 or an older version of ksh where the scope is local. I think that initializing str outside the function would create a global variable, but I'm not sure.
Note that you should use a local variable to override the value of IFS: saving to another variable is not only clumsy, it's also brittle because it doesn't restore IFS properly if it was unset. Furthermore, you should turn off globbing, because otherwise if the string contains shell globbing characters ?*\[ and one of the words happens to match one or more files on your system, it will be expanded. For example, set -A str $string where string is a;* will result in str containing a followed by the names of the files in the current directory.
set -A str
function splitString {
    typeset IFS=';' globbing=1
    case $- in *f*) globbing=;; esac
    set -f
    set -A str $string
    if [ -n "$globbing" ]; then set +f; fi
}
splitString "$string"
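The globbing hazard can be reproduced in any POSIX shell. The sketch below is not the ksh code above: it uses the positional parameters instead of set -A, and the function name split_count is hypothetical. It runs the split in a subshell so that the IFS and set -f changes cannot leak out.

```bash
split_count() {
    # The parentheses start a subshell, so IFS and set -f stay contained.
    (
        IFS=';'
        set -f          # disable globbing so "*" stays literal
        set -- $1       # split the first argument on ";"
        echo "$#:$2"    # field count, then the second field
    )
}

result=$(split_count 'a;*;c')
echo "$result"
```

Without set -f, the second field would be replaced by whatever file names happen to match * in the current directory.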
Variables are normally global to the shell they're defined in from the time they're defined.
The typeset command can make them local to the function they're defined in, or alternatively make them automatically exported (even when they're updated).
Read up on "typeset" and "integer" in the man page, or in Korn's book.

Resources