Why does this Bash pathname expansion not take place? - bash

I'm struggling with Bash variable expansion. Please see the following code:
~/tmp 689$ a=~/Library/Application\ *; echo $a
/Users/foo/Library/Application *
~/tmp 690$ echo ~/Library/Application\ *
/Users/foo/Library/Application Scripts /Users/foo/Library/Application Support
As the order of expansion is brace->tilde->parameter->....->pathname,
why is pathname expansion not applied to $a in the same way that it is in the 2nd command?
[added]
Does whitespace escaping have hidden behaviour regarding the following output?
~/tmp 705$ a=~/Library/Application*; echo $a
/Users/foo/Library/Application Scripts /Users/foo/Library/Application Support

To do what you meant to do, you'd have to use the following:
a=(~/Library/Application\ *) # use an *array* to capture the pathname-expanded results
echo "${a[#]}" # output all array elements (without further expansion)
As for why your code didn't work:
In the context of variable assignment involving only literals or string interpolation (references to other variables), NO pathname expansion takes place, even with unquoted strings (e.g., a=*, a="*", and a='*' all assign literal *)[1].
(By contrast, pathname expansion is applied to unquoted strings inside an array definition (e.g., a=(*), or inside a command substitution (e..g, a=$(echo *)).)
Thus, the literal content of $a is /Users/foo/Library/Application *
Executing echo $a - i.e., NOT double-quoting the variable reference $a - then applies word splitting and does the following:
it prints literal '/Users/foo/Library/Application' (the 1st word - no expansion applied, due to its contents)
it prints the pathname expansion applied to * (the 2nd word - i.e., it expands to matching filenames in the current dir.)
The fact that the latter results in * in your case implies that you happen to be running the echo command from an empty directory (save for hidden files, assuming the default configuration).
[1]Whether the string is unquoted or not does, however, matter with respect to tilde expansion; e.g., a=~ expands ~ to the user's home directory, whereas a='~' or a="~" assign literal ~.

Related

How can I confirm whether whitespace or special characters are escaped in a wildcard pattern?

I know that when you use a for loop in Bash, the items that you loop through are separated using the $IFS variable.
However, if I run the following commands, I correctly show the two files I have created - even though they have spaces:
touch file\ {1..2}.txt
for file in *.txt; do
echo "Found: ${file}"
done
The output is:
Found: file 1.txt
Found: file 2.txt
I am assuming that this is because when the shell sees the wildcard pattern, it expands it and escapes any special characters or whitespace. This is in contrast to if I run:
touch file\ {1..2}.txt
files=$(ls *.txt)
for file in $files; do
echo "Found: ${file}"
done
This results in:
Found: file
Found: 1.txt
Found: file
Found: 2.txt
Which makes sense - by default $IFS contains whitespace, so the file names are split.
What I want to understand is:
Am I correct that wildcard expansion results in a set of strings that contain escaped special characters
Where is it documented that this is the case, if I am correct?
Is there any way to show that this is happening?
I was hoping I could use something like set -x to show what the wildcard expands to and actually see the escaped characters, because I really want to be able to understand what is going on here.
I am writing a long series of articles on effective shell usage (effective-shell.com) and I'm struggling to find a way to explain the differences of behaviour here, I'm assuming that the shell is escaping characters but I'd like to know if this is the case and how to see it if possible!
Thanks in advance.
done
Am I correct that wildcard expansion results in a set of strings that contain escaped special characters
No. There is no need for the shell to escape special characters at that point, because filename expansion is the last word expansion to be performed; strings resulting from it are not subjected to word splitting or any other expansion; they stay as-is. This is documented in the manual as follows:
The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion); word splitting; and filename expansion.

Bash parameter expansion with function/alias output

So I got the following alias:
alias current_dir="pwd | sed -e 's/ /\\ /'"
In another place, I first safe my returned string in order to use parameter expansion to lowercase the string, like so:
CURRENT_DIR=$(current_dir)
echo "${CURRENT_DIR,,}"
But I wonder if it is possible to directly use parameter expansion on a alias/function call? I tried the following possibilities but they all didn't work for me:
echo "${current_dir,,}" # just an empty echo
echo "${$(current_dir),,}" # bad substitution
echo "${"$(current_dir)",,}" # bad substitution
No, it's not possible. You have to save the output in an intermediate variable. It's unavoidable.
You could use
declare -l CURRENT_DIR=$(current_dir)
Although Shellcheck has some sage words about declare and command substitution on the same line
However, to get a properly shell-quoted/escaped version of the string, use one of
$ mkdir '/tmp/a dir "with quotes and spaces"'
$ cd !$
$ printf -v CURRENT_DIR "%q" "$PWD"
$ echo "$CURRENT_DIR"
/tmp/a\ dir\ \"with\ quotes\ and\ spaces\"
$ CURRENT_DIR=${PWD#Q}
$ echo "$CURRENT_DIR"
'/tmp/a dir "with quotes and spaces"'
Get out of the habit of using ALLCAPS variable names, leave those as
reserved by the shell. One day you'll write PATH=something and then
wonder why
your script is broken.
${var#operator} showed up in bash 4.4:
${parameter#operator}
Parameter transformation. The expansion is either a transformation of the
value of parameter or information about parameter itself, depending on the
value of operator. Each operator is a single letter:
Q The expansion is a string that is the value of parameter quoted in
a format that can be reused as input.
E The expansion is a string that is the value of parameter with back-
slash escape sequences expanded as with the $'...' quoting mechan-
sim.
P The expansion is a string that is the result of expanding the value
of parameter as if it were a prompt string (see PROMPTING below).
A The expansion is a string in the form of an assignment statement or
declare command that, if evaluated, will recreate parameter with
its attributes and value.
a The expansion is a string consisting of flag values representing
parameter's attributes.
If parameter is # or *, the operation is applied to each positional param-
eter in turn, and the expansion is the resultant list. If parameter is an
array variable subscripted with # or *, the case modification operation is
applied to each member of the array in turn, and the expansion is the
resultant list.

How can I create a bash environment variable that prefixes an environment variable before a command?

I seem to be able to create environment variables that execute commands; like this:
$ cat ./src
FOO="echo"
$ . ./src
$ echo $FOO
echo
$ $FOO hello
hello
$
Is there a way I can modify that environment variable so that it prefixes the setting of another environment variable before the command? I.e. is there a way to work around the following problem?
$ cat ./src
FOO="MY_DIR=/tmp echo"
$ . ./src
$ echo $FOO
MY_DIR=/tmp echo
$ $FOO hello
-bash: MY_DIR=/tmp: No such file or directory
$
I.e. what I'd like to happen is to have an environment variable that does the equivalent of the following manually typed in the shell:
$ MY_DIR=/tmp echo hello
hello
$
...similar to how sans envvar-prefix, $FOO effectively had the same effect as typing echo at the shell.
/tmp/ exists of course, btw:
$ ls -ld /tmp/
drwxrwxrwt. 25 root root 500 May 19 11:35 /tmp/
$
Update:
I have a constraint that "FOO" must be invoked like $FOO hello and not FOO hello. So unfortunately a function like in #John Kugelman's (current) answer can't be a solution, even if it's more proper.
It's best to put data into variables, code into functions. Functions are more natural, expressive, and flexible than variables holding code. They look just like any other command but can take arbitrary actions, including but not limited to prepending commands and variable assignments.
foo() {
MY_DIR=/tmp echo "$#"
}
foo hello
Here "$#" is a placeholder for the arguments passed to foo().
I have a constraint that "FOO" must be invoked like $FOO hello and not FOO hello.
That constraint is impossible, I'm afraid.
I am curious about the mechanics of what's going on here: i.e. why can you make an environment variable that's sort of "aliased" to a command (I know true aliasing is something else), but that mechanism doesn't accommodate the seemingly small change to prefix "stuff" to the command?
Bash expands commands in several passes in a fixed, prescribed order. Very early on it splits the command into words and then marks the variable assignments with invisible flags. It expands $variable references in a later pass. It doesn't look at the results to see if they look like additional variable expansions. The equal signs are effectively ignored.
If you want to know the nitty gritty details, open up the Bash man page. It's incredibly long and the details are scattered throughout. Let me pull out the key sections and some choice quotes to help you digest it:
Shell Grammar, Simple Commands
A simple command is a sequence of optional variable assignments followed by blank-separated words and redirections, and terminated by a control operator.
Simple Command Expansion
When a simple command is executed, the shell performs the following expansions, assignments, and redirections, from left to right.
The words that the parser has marked as variable assignments (those preceding the command name) and redirections are saved for later processing.
The words that are not variable assignments or redirections are expanded. If any words remain after expansion, the first word is taken to be the name of the command and the remaining words are the arguments.
...
If no command name results, the variable assignments affect the current shell environment. Otherwise, the variables are added to the environment of the executed command and do not affect the current shell environment.
Expansion
Expansion is performed on the command line after it has been split into words. There are seven kinds of expansion performed: brace expansion, tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, word splitting, and pathname expansion.
The order of expansions is: brace expansion, tilde expansion, parameter, variable and arithmetic expansion and command substitution (done in a left-to-right fashion), word splitting, and pathname expansion.
Expansion, Parameter Expansion
The $ character introduces parameter expansion, command substitution, or arithmetic expansion.
Assignments are marked in step 1 and variables (AKA parameters) are expanded in step 4.
The only things that happen after variable expansion are:
Word splitting. A variable can expand to multiple words if it contains whitespace. (Or to be more precise, if it contains any of the characters in the inter-field separator variable $IFS.)
Pathname expansion. Also known as globbing, or wildcards. If a variable contains *, ?, or [ they'll be expanded to the names of matching files, if there are any.
Quote removal. This pass happens after variable expansion, but it specifically does not apply to the results of any previous expansion step. So quotes the user typed are removed, but quotes that were the results of a substitution are retained.
Neither word splitting nor pathname expansion are what you need, so that's why it's not possible to store an assignment in a variable.

Bash: How do I properly quote the results of a command

My problem boils down to this:
echo $(echo '*')
That outputs the names of all the files in the current directory.
I do not want that. I want a literal asterisk (*).
How do I do this in a generic way?
My above example is simplified. The asterisk is not literally written in my bash script - it comes from the result of another command.
So this is perhaps closer to my real situation:
echo $(my-special-command)
I just want to get the literal output of my-special-command; I do not want any embedded asterisks (or other special characters) to be expanded.
How do I do this in a general-purpose way?
I suppose I could do set -f before running the command, but how can I be sure that covers everything? That turns off pathname expansion, but what about other kinds? I have zero control over what output might be produced by my-special-command, so must be able to handle everything properly.
Just enclose the Command substitution with double quotes:
echo "$(my-special-command)"
Its called globbing, you have multiply ways to prevent it:
echo * # will expand to files / directories
echo "*" # Will print *
echo '*' # Will also print *
In your example you can simple write:
echo "$(echo '*')"
You can also turn off globbing in your script by calling it with bash -f script.sh or inside your code:
#!/usr/bin/env bash
set -f
echo *
From the "Command Substitution" section of the man page:
If the [command] substitution appears within double quotes, word splitting and
pathname expansion are not performed on the results.
By quoting the command expansion, you prevent its result, *, from undergoing pathname expansion.
$ echo "$(echo "*")"

Quoting vs not quoting the variable on the RHS of a variable assignment

In shell scripting, what is the difference between these two when assigning one variable to another:
a=$b
and
a="$b"
and when should I use one over the other?
I think there is no big difference here. Yes, it is advisable to enclose a variable in double quotes when that variable is being referenced. However, $x does not seem to be referenced here in your question.
y=$x does not by itself affect how whitespaces will be handled. It is only when $y is actually used that quoting matters. For example:
$ x=" a b "
$ y=$x
$ echo $y
a b
$ echo "$y"
a b
From section 2.9.1 of the POSIX shell syntax specification:
Each variable assignment shall be expanded for tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal prior to assigning the value.
String-splitting and globbing (the steps which double quotes suppress) are not in this list.
Thus, the quotes are superfluous in all simple assignments (not speaking here to those implemented with arguments to declare, export or similar commands) except those where (1) the behavior of single-quoted, not double-quoted, strings are desired; or (2) whitespace or other content in the value would be otherwise parsed as syntactic rather than literal.
(Note that the decision on how to parse a command -- thus, whether it is an assignment, a simple command, a compound command, or something else -- takes place before parameter expansions; thus, var=$1 is determined to be an assignment before the value of $1 is ever considered! Were this untrue, such that data could silently become syntax, it would be far more difficult -- if not impossible -- to write secure code handling untrusted data in bash).
There are no (good) reasons to double quote the RHS of a variable assignment when used as a statement on its own.
The RHS of an assignment statement is not subject to word splitting (or brace expansion), etc. so cannot need quotes to assign correctly. All other expansions (as far as I'm aware) do occur on the RHS but also occur in double quotes so the quoting serves no purpose.
That being said there are reasons not to quote the RHS. Namely how to address error "bash: !d': event not found" in Bash command substitution (specifically see my answer and rici's answer).
Here are some other examples: ( having two files in the current directory t.sh and file)
a='$(ls)' # no command substitution
b="$(ls)" # command substitution, no word splitting
c='*' # no filename expansion
d="*" # no filename expansion
e=* # no filename expansion
f=$a # no expansions or splittings
g="$a" # no expansions or splittings
h=$d # no expansions or splittings
echo ---'$a'---
echo $a # no command substitution
echo ---'$b'---
echo $b # word splitting
echo ---'"$b"'---
echo "$b" # no word splitting
echo ---'$c'---
echo $c # filename expansion, word splitting
echo ---'"$c"'---
echo "$c" # no filename expansion, no word splitting
echo ---'$d'---
echo $d # filename expansion, word splitting
echo ---'"$d"'---
echo "$d" # no filename expansion, no word splitting
echo ---'"$e"'---
echo "$e" # no filename expansion, no word splitting
echo ---'$e'---
echo $e # filename expansion, word splitting
echo ---'"$f"'---
echo "$f" # no filename expansion, no word splitting
echo ---'"$g"'---
echo "$g" # no filename expansion, no word splitting
echo ---'$h'---
echo $h # filename expansion, word splitting
echo ---'"$h"'---
echo "$h" # no filename expansion, no word splitting
Output:
---$a---
$(ls)
---$b---
file t.sh
---"$b"---
file
t.sh
---$c---
file t.sh
---"$c"---
*
---$d---
file t.sh
---"$d"---
*
---"$e"---
*
---$e---
file t.sh
---"$f"---
$(ls)
---"$g"---
$(ls)
---$h---
file t.sh
---"$h"---
*
One interesting thing to notice is that command substitution occurs in variable assignments if they are in double quotes, and if the RHS is given explicitly as "$(ls)" and not implicitly as "$a"..
Advanced Bash-Scripting Guide: Chapter 5: Quoting
When referencing a variable, it is
generally advisable to enclose its
name in double quotes. This prevents
reinterpretation of all special
characters within the quoted string.
Use double quotes to prevent word
splitting. An argument enclosed in
double quotes presents itself as a
single word, even if it contains
whitespace separators.

Resources