Can the '-' character, as is, be an alias in bash? - bash

I am trying to make bash alias the '-' character, but this does not work, e.g.,
% alias "-"=date
bash: alias: -=: invalid option
Can this be done? And while we are at it, how about alias '='=date?

The behavior you want depends on shell-specific extensions; even when the POSIX standard does specify alias behavior (which is only the case for shells implementing both XSI and user portability extensions), the set of allowed names is not required to include either - or =:
3.10 Alias Name
In the shell command language, a word consisting solely of underscores, digits, and alphabetics from the portable character set and any of the following characters: '!', '%', ',', '@'.
Implementations may allow other characters within alias names as an extension.
That said, when defining an alias in bash, -- can be used to cause subsequent arguments not to be parsed as options (per POSIX syntax guidelines entry #10):
alias -- -=date
Another option available in practice with bash (tested on both 3.2.57(1) and 4.3.46(1), but not required by the POSIX standard to be supported with these names) is to define functions:
$ =() { date "$@"; }
$ -() { date "$@"; }
$ =
Sat Aug 13 18:12:37 CDT 2016
$ -
Sat Aug 13 18:12:08 CDT 2016
Again, this goes beyond the set of names required by POSIX:
2.9.5 Function Definition Command
The format of a function definition command is as follows:
fname() compound-command[io-redirect ...]
The function is named fname; the application shall ensure that it is a name (see the Base Definitions volume of IEEE Std 1003.1-2001, Section 3.230, Name). An implementation may allow other characters in a function name as an extension. The implementation shall maintain separate name spaces for functions and variables.
3.230 Name
In the shell command language, a word consisting solely of underscores, digits, and alphabetics from the portable character set. The first character of a name is not a digit.
...and, thus, being defined neither by POSIX nor by bash's own documentation, may be subject to change in future releases.

You can have an alias with the name - like this:
alias -- -=date
I'm not aware of any way of defining an alias named =.
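To confirm that the definition was stored under the name -, the alias can be printed back. This is only a minimal sketch, assuming bash; the exact formatting of the printed definition may differ between bash versions, so no output is shown.

```shell
#!/bin/bash
# "--" marks the end of options, so "-=date" is read as name=value
# instead of being parsed as an option string.
alias -- -=date

# Print the stored definition for the alias named "-".
# The same "--" is needed here, otherwise "-" is again taken as an option.
alias -- -
```

Note that in a script (as opposed to an interactive shell) bash only expands aliases when shopt -s expand_aliases is set, so defining the alias does not by itself make - runnable there.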

Related

Should shell aliases be substituted when parsed or when executed?

Consider the script:
alias al='echo A'
foo(){ echo $(al);}
alias al='echo B'
foo
bash, ksh, and zsh print B, while dash and yash print A.
Which is correct? Or both?
According to the POSIX standard (Vol. Shell & Utilities) 2.3.1 Alias Substitution,
After a token has been delimited, but before applying the grammatical rules in Shell Grammar, a resulting word that is identified to be the command name word of a simple command shall be examined to determine whether it is an unquoted, valid alias name.
It seems aliases should be resolved when the function is defined, as you don't know this is in a function before applying the grammatical rules. However, this doesn't make much sense to me, as you can't identify a word to be the command name word before applying the grammatical rules, either.

Why can't environment variables with dashes be accessed in bash 4.1.2?

On a CentOS 5 host (with bash 3.2.32), we use Ruby (1.8.7) to set an environment variable:
ENV['AWS_foo-bar_ACCESS_KEY'] = xxxxx
Then, using bash, we run a shell script that does:
BUCKET_NAME=$1
AWS_ACCESS_KEY_ID_VAR="AWS_${BUCKET_NAME}_ACCESS_KEY_ID"
AWS_ACCESS_KEY_ID="${!AWS_ACCESS_KEY_ID_VAR}"
export AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
This works fine on CentOS 5.
However, on CentOS 6 (with bash 4.1.2), we get the error
-bash: export: `AWS_foo-bar_ACCESS_KEY_ID=xxxxx': not a valid identifier
It is our understanding that this fails because - is not allowed in the variable name. But why does this work on bash 3.2 and not bash 4.1?
The "why" is almost irrelevant: The POSIX standard makes it very clear that export is only required to support arguments which are valid names, and anything with a dash is not a valid name. Thus, no POSIX shell is required to support exporting or expanding variable names with dashes, via indirect expansion or otherwise.
It's worth noting that ShellShock -- a major security bug caused by sloppy handling of environment contents -- is fixed in the bash 4.1 present in the current CentOS 6 updates repo; increased rigor in an area which spawned security bugs should be no surprise.
The remainder of this answer will focus on demonstrating that the new behavior of bash 4.1 is explicitly allowed, or even required, by POSIX -- and thus that the prior behavior was an undefined implementation artifact.
To quote POSIX on environment variables:
These strings have the form name=value; names shall not contain the character '='. For values to be portable across systems conforming to IEEE Std 1003.1-2001, the value shall be composed of characters from the portable character set (except NUL and as indicated below). There is no meaning associated with the order of strings in the environment. If more than one string in a process' environment has the same name, the consequences are undefined.
Environment variable names used by the utilities in the Shell and Utilities volume of IEEE Std 1003.1-2001 consist solely of uppercase letters, digits, and the '_' (underscore) from the characters defined in Portable Character Set and do not begin with a digit. Other characters may be permitted by an implementation; applications shall tolerate the presence of such names. Uppercase and lowercase letters shall retain their unique identities and shall not be folded together. The name space of environment variable names containing lowercase letters is reserved for applications. Applications can define any environment variables with names from this name space without modifying the behavior of the standard utilities.
Note: Other applications may have difficulty dealing with environment variable names that start with a digit. For this reason, use of such names is not recommended anywhere.
Thus:
Tools (including the shell) are required to fully support environment variable names with uppercase and lowercase letters, digits (except in the first position), and the underscore.
Tools (including the shell) may modify their behavior based on environment variables with names that comply with the above and additionally do not contain lowercase letters.
Tools (including the shell) should tolerate other names -- meaning they shouldn't crash or misbehave in their presence -- but are not required to support them.
Finally, shells are explicitly allowed to discard environment variable names which are not also shell variable names. From the relevant standard:
It is unspecified whether environment variables that were passed to the shell when it was invoked, but were not used to initialize shell variables (see Shell Variables) because they had invalid names, are included in the environment passed to execl() and (if execl() fails as described above) to the new shell.
Moreover, what defines a valid shell name is well-defined:
Name - In the shell command language, a word consisting solely of underscores, digits, and alphabetics from the portable character set. The first character of a name is not a digit.
Notably, only underscores (not dashes) are considered part of a valid name in a POSIX-compliant shell.
...and the POSIX specification for export explicitly uses the word "name" (which it defined in the text quoted above), and describes it as applying to "variables" (shell variables, the restrictions on names for which are also subject to restrictions quoted elsewhere in this document):
The shell shall give the export attribute to the variables corresponding to the specified names, which shall cause them to be in the environment of subsequently executed commands. If the name of a variable is followed by = word, then the value of that variable shall be set to word.
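The "Name" rule quoted above can be checked in the script itself before a string is handed to export. The following is a minimal sketch, assuming bash; is_valid_name is a hypothetical helper introduced here for illustration, not something bash or POSIX provides.

```shell
#!/bin/bash
# is_valid_name is a hypothetical helper, not part of bash or POSIX.
# It succeeds only for non-empty words made of ASCII letters, digits, and
# underscores that do not start with a digit (the POSIX "Name" rule).
is_valid_name() {
  local LC_ALL=C   # keep the bracket ranges limited to ASCII
  case $1 in
    '' | [0-9]* | *[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

for candidate in AWS_foo_bar_ACCESS_KEY_ID AWS_foo-bar_ACCESS_KEY_ID 1abc; do
  if is_valid_name "$candidate"; then
    echo "$candidate: valid name"
  else
    echo "$candidate: not a valid name"
  fi
done
```

Only the first candidate passes; the one containing a dash and the one starting with a digit are rejected, which mirrors the names export is required to accept.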
All the above being said -- if your operating system provides a /proc/self/environ which represents the state of your environment variables at process startup (before a shell has, as it's allowed to do, potentially discarded any variables which don't have valid names in shell), you can extract content with invalid names like so:
# using a lower-case name where possible is in line with POSIX guidelines, see above
aws_access_key_id_var="AWS_${BUCKET_NAME}_ACCESS_KEY_ID"
while IFS= read -r -d '' var; do
  [[ $var = "$aws_access_key_id_var"=* ]] || continue
  val=${var#"${aws_access_key_id_var}="}
  break
done </proc/self/environ
echo "Extracted value: $val"

How to create a variable with (-) hyphen as one of the character in it in shell scripting

I have a requirement to use a variable named abc-def, which I am passing as a parameter and want to use in the shell script.
ex:
#!/bin/bash
abc-def="xyz"
echo "$abc-def"
There is a hyphen in the variable name. I have to pass abc-def as a parameter, and the script needs to understand it wherever I use it.
You don't.
Variable names used by the utilities in the Shell and Utilities volume of IEEE Std 1003.1-2001 consist solely of upper and lowercase letters, digits, and the '_' (underscore) from the characters defined in Portable Character Set and do not begin with a digit. Other characters may be permitted by an implementation; applications shall tolerate the presence of such names.
With a suitably recent bash (4.0 or later), you can create an associative array, and use the "variable" name as an array key:
#!/bin/bash
declare -A vars
name="abc-def"
value=xyz
vars["$name"]=$value
echo "${vars["$name"]}"
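Since the question mentions passing the name in as a parameter, the same idea can be extended to name=value strings. This is a minimal sketch, assuming bash 4.0 or later; store_pair is a hypothetical helper name, and the hard-coded "abc-def=xyz" stands in for whatever argument the script actually receives.

```shell
#!/bin/bash
# Requires bash 4.0 or later for associative arrays.
declare -A vars

# store_pair is a hypothetical helper: split "name=value" at the first "="
# and keep the value under the (possibly hyphenated) name.
store_pair() {
  local pair=$1
  vars["${pair%%=*}"]=${pair#*=}
}

store_pair "abc-def=xyz"   # in a real script this might be: store_pair "$1"

key="abc-def"
echo "${vars[$key]}"
```

The hyphenated string is only ever used as data (an array key), never as a shell variable name, so the restriction on names does not come into play.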

How do leading key=value pairs on shell commands work? Why are they not an error?

(my_virtualenv)my_pc:~/path$ ASDF='asdf' python
...
>>> import os
>>> os.environ['ASDF']
'asdf'
So how does this work? Why doesn't the interpreter look for the ASDF command, and report an error if it doesn't find it?
Because leading var=value pairs are recognized as variable assignments that apply only to the environment of that one command, as part of the standard for shells compatible with POSIX sh.
From 2.10.2 ("Shell Grammar Rules") of the Shell Command Language specification:
7. [Assignment preceding command name]
7.a. [When the first word]
If the TOKEN does not contain the character '=', rule 1 is applied. Otherwise, 7b shall be applied.
7.b. [Not the first word]
If the TOKEN contains the equal sign character:
If it begins with '=', the token WORD shall be returned.
If all the characters preceding '=' form a valid name (see the Base Definitions volume of IEEE Std 1003.1-2001, Section 3.230, Name), the token ASSIGNMENT_WORD shall be returned. (Quoted characters cannot participate in forming a valid name.)
Otherwise, it is unspecified whether it is ASSIGNMENT_WORD or WORD that is returned.
Assignment to the NAME shall occur as specified in Simple Commands.
From 2.9.1 ("Simple Commands") of the Shell Command Language specification:
If no command name results, variable assignments shall affect the current execution environment. Otherwise, the variable assignments shall be exported for the execution environment of the command and shall not affect the current execution environment (except for special built-ins). [...]
Emphasis added.
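The effect of the rule can be observed directly: the assignment is visible to the command it precedes, but not to the shell that ran it. A minimal sketch, assuming sh is available on the PATH; child_value and parent_value are illustrative variable names.

```shell
#!/bin/bash
unset ASDF

# The assignment is exported only into the environment of this one command.
child_value=$(ASDF='asdf' sh -c 'printf %s "$ASDF"')

# Back in the current shell, ASDF was never set.
parent_value=${ASDF-unset}

echo "child saw: $child_value"
echo "parent has: $parent_value"
```

This prints "child saw: asdf" followed by "parent has: unset".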

Why would I not leave extglob enabled in bash?

I just found out about the bash extglob shell option here:-
How can I use inverse or negative wildcards when pattern matching in a unix/linux shell?
All the answers that used shopt -s extglob also mentioned shopt -u extglob to turn it off.
Why would I want to turn something so useful off? Indeed why isn't it on by default?
Presumably it has the potential for giving some nasty surprises.
What are they?
No nasty surprises -- default-off behavior is only there for compatibility with traditional, standards-compliant pattern syntax.
Which is to say: It's possible (albeit unlikely) that someone writing fo+(o).* actually intended the + and the parentheses to be treated as literal parts of the pattern matched by their code. For bash to interpret this expression in a different manner than what the POSIX sh specification calls for would be to break compatibility, which is right now done by default in very few cases (echo -e with xpg_echo unset being the only one that comes immediately to mind).
This is different from the usual case where bash extensions are extending behavior undefined by the POSIX standard -- cases where a baseline POSIX shell would typically throw an error, but bash instead offers some new and different explicitly documented behavior -- because the need to treat these characters as matching themselves is defined by POSIX.
To quote the relevant part of the specification, with emphasis added:
An ordinary character is a pattern that shall match itself. It can be any character in the supported character set except for NUL, those special shell characters in Quoting that require quoting, and the following three special pattern characters. Matching shall be based on the bit pattern used for encoding the character, not on the graphic representation of the character. If any character (ordinary, shell special, or pattern special) is quoted, that pattern shall match the character itself. The shell special characters always require quoting.
When unquoted and outside a bracket expression, the following three characters shall have special meaning in the specification of patterns:
? - A question-mark is a pattern that shall match any character.
* - An asterisk is a pattern that shall match multiple characters, as described in Patterns Matching Multiple Characters.
[ - The open bracket shall introduce a pattern bracket expression.
Thus, the standard explicitly requires any non-NUL character other than ?, * or [ or those listed elsewhere as requiring quoting to match themselves. Bash's behavior of having extglob off by default allows it to conform with this standard in its default configuration.
However, for your own scripts and your own interactive shell, unless you're making a habit of running code written for POSIX sh with unusual patterns included, enabling extglob is typically worth doing.
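The difference in how fo+(o).* is interpreted can be seen by matching the same pattern with extglob off and then on. This is a minimal sketch, assuming bash; matches is a hypothetical helper, and the pattern is kept in a variable so that it is interpreted at match time rather than when the script is parsed.

```shell
#!/bin/bash
# matches is a hypothetical helper: succeed if string $1 matches the
# glob pattern $2. The unquoted $2 is deliberately used as a pattern.
matches() {
  case $1 in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

pattern='fo+(o).*'

shopt -u extglob
matches 'fo+(o).txt' "$pattern" && echo "extglob off: fo+(o).txt matches (+ and parentheses are literal)"
matches 'fooo.txt' "$pattern"   || echo "extglob off: fooo.txt does not match"

shopt -s extglob
matches 'fooo.txt' "$pattern"   && echo "extglob on: fooo.txt matches (+(o) means one or more o)"
matches 'fo+(o).txt' "$pattern" || echo "extglob on: fo+(o).txt does not match"
shopt -u extglob
```

With the option off, the pattern follows the POSIX rule quoted above and matches only the literal characters; with it on, the same text selects a different set of names.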
Being a Kornshell person, I have extglob on in my .bashrc by default because that's the way it is in Kornshell, and I use it a lot.
For example:
$ find !(target) -name "*.xml"
In Kornshell, this is no problem. In BASH, I need to set extglob. I also set lithist and set -o vi. This allows me to use VI commands in using my shell history, and when I hit v, it shows my code as a bunch of lines.
Without lithist set:
for i in *; do echo "I see $i"; done
With lithist set:
for i in *
do
    echo "I see $i"
done
Now, only if BASH had the print statement, I'd be all set.
