Should awk expand escape sequences in command-line assigned variables? - bash

I've recently discovered that Awk's -v VAR=VAL syntax for initializing variables on the command line expands escape sequences in VAL. I previously thought that it was a good way to pass strings into Awk without needing to run an escaping function over them first.
For example, the following script:
awk -v VAR='x\tx' 'BEGIN{printf("%s\n", VAR);}'
I would expect to print
x\tx
but actually prints:
x x
An aside: I now use environment variables to pass strings in unmodified instead; this question isn't asking how to get the behaviour I previously expected.
Here's what the man page has to say on the matter:
-v var=val, --assign var=val Assign the value val to the variable var, before execution of the program begins. Such variable values are available to the
BEGIN block of an AWK program.
And further down:
String Constants
String constants in AWK are sequences of characters enclosed between double quotes (like "value"). Within strings, certain escape
sequences are recognized, as in C. These are:
... list of escape sequences ...
The escape sequences may also be used inside constant regular expressions (e.g., /[ \t\f\n\r\v]/ matches whitespace characters).
In compatibility mode, the characters represented by octal and hexadecimal escape sequences are treated literally when used in
regular expression constants. Thus, /a\52b/ is equivalent to /a*b/.
The way I read this, val in -v var=val is not a string constant, and there is no text to indicate that the string constant escaping rules apply.
My questions:
Is there a more authoritative source for the awk language than the man page, and if so what does it specify?
What does POSIX have to say about this, if anything?
Do all versions of Awk behave this way, i.e. can I rely on the expansion being done if I actually want it?

The assignment is a string constant.
The relevant sections from the standard are:
-v assignment
The application shall ensure that the assignment argument is in the same form as an assignment operand. The specified variable assignment shall occur prior to executing the awk program, including the actions associated with BEGIN patterns (if any). Multiple occurrences of this option can be specified.
and
An operand that begins with an underscore or alphabetic character from the portable character set (see the table in XBD Portable Character Set), followed by a sequence of underscores, digits, and alphabetics from the portable character set, followed by the '=' character, shall specify a variable assignment rather than a pathname. The characters before the '=' represent the name of an awk variable; if that name is an awk reserved word (see Grammar) the behavior is undefined. The characters following the <equals-sign> shall be interpreted as if they appeared in the awk program preceded and followed by a double-quote ( '"' ) character, as a STRING token (see Grammar), except that if the last character is an unescaped <backslash>, it shall be interpreted as a literal <backslash> rather than as the first character of the sequence "\"".
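A minimal sketch contrasting the two behaviours discussed in this thread, assuming a POSIX-conforming awk is on the PATH. The shell variable names expanded and literal are illustrative only.

```shell
# -v treats the value like the body of an awk string literal,
# so the two characters \t become a single tab.
expanded=$(awk -v VAR='x\tx' 'BEGIN { printf("%s\n", VAR) }')

# A value passed through the environment is read via ENVIRON
# and is not subject to escape processing.
literal=$(VAR='x\tx' awk 'BEGIN { printf("%s\n", ENVIRON["VAR"]) }')

printf 'via -v:      %s\n' "$expanded"
printf 'via ENVIRON: %s\n' "$literal"
```

The ENVIRON route is the one to reach for when the string must arrive unmodified; the -v route is the one to use when escape expansion is actually wanted.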

Related

Bash - Why does $VAR1=FOO or 'VAR=FOO' (with quotes) return command not found?

For each of two examples below I'll try to explain what result I expected and what I got instead. I'm hoping for you to help me understand why I was wrong.
1)
VAR1=VAR2
$VAR1=FOO
result: -bash: VAR2=FOO: command not found
In the second line, $VAR1 gets expanded to VAR2, but why does Bash interpret the resulting VAR2=FOO as a command name rather than a variable assignment?
2)
'VAR=FOO'
result: -bash: VAR=FOO: command not found
Why do the quotes make Bash treat the variable assignment as a command name?
Could you please describe, step by step, how Bash processes my two examples?
How best to indirectly assign variables is adequately answered in other Q&A entries in this knowledgebase. Among those:
Indirect variable assignment in bash
Saving function output into a variable named in an argument
If that's what you actually intend to ask, then this question should be closed as a duplicate. I'm going to make a contrary assumption and focus on the literal question -- why your other approaches failed -- below.
What does the POSIX sh language specify as a valid assignment? Why does $var1=foo or 'var=foo' fail?
Background: On the POSIX sh specification
The POSIX shell command language specification is very specific about what constitutes an assignment, as quoted below:
4.21 Variable Assignment
In the shell command language, a word consisting of the following parts:
varname=value
When used in a context where assignment is defined to occur and at no other time, the value (representing a word or field) shall be assigned as the value of the variable denoted by varname.
The varname and value parts shall meet the requirements for a name and a word, respectively, except that they are delimited by the embedded unquoted equals-sign, in addition to other delimiters.
Also, from section 2.9.1, on Simple Commands, with emphasis added:
1. The words that are recognized as variable assignments or redirections according to Shell Grammar Rules are saved for processing in steps 3 and 4.
2. The words that are not variable assignments or redirections shall be expanded. If any fields remain following their expansion, the first field shall be considered the command name and remaining fields are the arguments for the command.
3. Redirections shall be performed as described in Redirection.
4. Each variable assignment shall be expanded for tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal prior to assigning the value.
Also, from the grammar:
If all the characters preceding '=' form a valid name (see the Base Definitions volume of IEEE Std 1003.1-2001, Section 3.230, Name), the token ASSIGNMENT_WORD shall be returned. (Quoted characters cannot participate in forming a valid name.)
Note from this:
The command must be recognized as an assignment at the very beginning of the parsing sequence, before any expansions (or quote removal!) have taken place.
The name must be a valid name. Literal quotes are not part of a valid variable name.
The equals sign must be unquoted. In your second example, the entire string was quoted.
Assignments are recognized before tilde expansion, parameter expansion, command substitution, etc.
Why $var1=foo fails to act as an assignment
As given in the grammar, all characters before the = in an assignment must be valid characters within a variable name for an assignment to be recognized. $ is not a valid character in a name. Because assignments are recognized in step 1 of simple command processing, before expansion takes place, the literal text $var1, not the value of that variable, is used for this matching.
Why 'var=foo' fails to act as an assignment
First, all characters before the = must be valid in variable names, and ' is not valid in a variable name.
Second, an assignment is only recognized if the = is not quoted.
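The two failures can be observed from a script by capturing exit statuses. This is only a sketch: the subshells and the status_* variable names are illustrative, and it relies on the shell reporting status 127 for a command that cannot be found.

```shell
VAR1=VAR2

# Not recognized as an assignment: the text before '=' is "$VAR1",
# which is not a valid name. After expansion the shell looks for a
# command called VAR2=FOO; a missing command yields exit status 127.
status_expanded=0
( $VAR1=FOO ) 2>/dev/null || status_expanded=$?

# Quoting the word also prevents recognition as an assignment.
status_quoted=0
( 'VAR=FOO' ) 2>/dev/null || status_quoted=$?

# An unquoted valid name followed by an unquoted '=' is an assignment.
VAR=FOO

echo "expanded: $status_expanded, quoted: $status_quoted, VAR=$VAR"
```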
1)
VAR1=VAR2
$VAR1=FOO
You want to use a variable name contained in a variable for the assignment. Bash syntax does not allow this. However, there is an easy workaround :
VAR1=VAR2
declare "$VAR1"=FOO
It works with local and export too.
2)
By using single quotes (double quotes would yield the same result), you are telling Bash that what is inside is a string and to treat it as a single entity. Since it is the first item on the line, Bash tries to find an alias, or shell builtin, or an executable file in its PATH, that would be named VAR=FOO. Not finding it, it tells you there is no such command.
An assignment is not a normal command. To perform an assignment contained in a quote, you would need to use eval, like so :
eval "$VAR1=FOO" # But please don't do that in real life
Most experienced bash programmers would probably tell you to avoid eval, as it has serious drawbacks; I give it as an example only to recommend against its use. In the example above it involves no security risk or error potential, because the value of VAR1 is known and safe, but there are many cases where an arbitrary (i.e. user-supplied) value could cause a crash or unexpected behavior. Quoting inside an eval statement is also more difficult and reduces readability.
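If eval cannot be avoided (for example in a plain POSIX sh script, where declare is not available), one way to limit the risk described above is to validate the name first. The helper assign_indirect below is hypothetical, not a shell builtin, and is offered as a sketch rather than as the approach this answer recommends.

```shell
# assign_indirect NAME VALUE  (hypothetical helper, not a builtin)
assign_indirect() {
    case $1 in
        '' | [!A-Za-z_]* | *[!A-Za-z0-9_]*)
            echo "invalid variable name: $1" >&2
            return 1
            ;;
    esac
    # Only the validated name is expanded before eval runs;
    # \$2 is left for eval to expand, so the value is never parsed as code.
    eval "$1=\$2"
}

VAR1=VAR2
assign_indirect "$VAR1" 'FOO; echo not executed'
echo "$VAR2"
```

The case patterns reject an empty name, a name whose first character is not a letter or underscore, and a name containing any other character, which mirrors the definition of a valid name quoted from the standard.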
You declare VAR2 earlier in the program, right?
If you are trying to assign the value of VAR2 to VAR1, then you need to make sure and use $ in front of VAR2, like so:
VAR1=$VAR2
That will set VAR1 to the value of VAR2, because the $ means "the value that is stored in this variable". Without the $, VAR2 is not treated as a variable at all.
Basically, a word that doesn't have a $ in front of it is taken literally, and in command position it will be interpreted as a command. That's why we have the $ to clarify "hey this is a variable".

Bash - How to discriminate control operator from metacharacter?

According to bash manual:
control operator
A token that performs a control function. It is a newline or one of the following: ‘||’, ‘&&’, ‘&’, ‘;’, ‘;;’, ‘|’, ‘|&’, ‘(’, or ‘)’.
metacharacter
A character that, when unquoted, separates words. A metacharacter is a blank or one of the following characters: ‘|’, ‘&’, ‘;’, ‘(’, ‘)’, ‘<’, or ‘>’.
Many characters are both control operator and metacharacter.
So how could I know the syntax category of e.g. a ;?
Take if COND ; then CMD ; fi as an example.
; seems like a control operator in this context, for it can be substituted by a newline.
However, removing the spaces before and after ; still works.
Isn't it supposed to be separated by spaces if it's an operator?
According to the bash manual, an operator is:
A control operator or a redirection operator. See Redirections,
for a list of redirection operators. Operators contain at least one
unquoted metacharacter.
The metacharacter is basically any character that cannot be part of a word.
Definition of word:
A sequence of characters treated as a unit by the shell. Words may not include unquoted metacharacters.
There is no need for spaces around operators because they always contain metacharacters, which makes the parser know it is not part of the word.
An exception is redirection, where e.g.
ls 2>&1
requires a space prior to the redirection statement since the operator has a parameter 2, and requires the parameter to be next to the operator (otherwise it will be a parameter to ls).
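A short sketch of both points, using echo instead of ls so the output is predictable. The variable names are illustrative only.

```shell
# ';' is a metacharacter, so it ends the neighbouring words without blanks.
no_spaces=$(if true;then echo yes;fi)

# Digit adjacent to the operator: file descriptor 2 is redirected,
# and echo receives only the argument "a".
adjacent=$(echo a 2>/dev/null)

# Blank before the operator: 2 is an ordinary argument to echo.
separated=$(echo a 2 >&1)

printf '%s\n' "$no_spaces" "$adjacent" "$separated"
```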

Error in string Concatenation in Shell Scripting

I am a beginner to shell scripting.
I stored a value in a variable, A="MyScript", and in subsequent steps tried to concatenate a string to it with $A_new. To my surprise that didn't work, but $A.new did.
Could you please help me in understanding these details?
Thanks
Shell variable names are composed of alphabetic characters, numbers and underscores.
3.231 Name
In the shell command language, a word consisting solely of underscores, digits, and alphabetics from the portable character set. The first character of a name is not a digit.
So when you wrote $A_new the shell interpreted the underscore (and new) as part of the variable name and expanded the variable A_new.
A period is not valid in a variable name so when the shell parsed $A.new for a variable to expand it stopped at the period and expanded the A variable.
The ${A} syntax is designed to allow this to work as intended here.
You can use any of the following to have this work correctly (in rough order of preferability):
echo "${A}_new"
echo "$A"_new
echo $A\_new
The last is least desirable because you can't quote the whole string (or the \ doesn't get removed). Since you should basically always quote your variable expansions, you would probably end up doing echo "$A"\_new, but that's ultimately no different than point 2, so why bother.
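To make the difference visible, the sketch below gives A_new a value of its own. That second variable is an assumption added for illustration; in the question it was simply unset, so the expansion was empty.

```shell
A="MyScript"
A_new="something else"   # hypothetical second variable, to make the effect visible

wrong="$A_new"       # the name extends through "_new", so A_new is expanded
dotted="$A.new"      # '.' cannot be part of a name, so A is expanded
braces="${A}_new"    # braces mark where the name ends
quoted="$A"_new      # the closing quote ends the expansion

printf '%s\n' "$wrong" "$dotted" "$braces" "$quoted"
```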
This happens because the underscore is a valid character in variable names.
Try this way:
${A}_new or "$A"_new
The name of a variable can contain letters ( a to z or A to Z), numbers ( 0 to 9) or the underscore character ( _).
The shell does not require variables to be declared, unlike programming languages such as C, C++ or Java. So when you write $A_new, the shell treats A_new as a variable; since you have not assigned it a value, it expands to an empty string.
To achieve what you mentioned use as :
${A}_new
It's always good practice to enclose variable names in braces after the $ sign to avoid such situations.

Is there a method in shell to ignore escape characters?

In C#, there is a verbatim string so that,
string c = "hello \t world"; // hello world
string d = @"hello \t world"; // hello \t world
I am new to shell script, is there a similar method in shell?
Because I have many folders with names like "Apparel & Accessories > Clothing > Activewear", I want to know if there is an easy way to handle the special characters without writing so many backslashes.
test.sh
director="Apparel & Accessories > Clothing > Activewear"
# any action to escape spaces, &, > ???
hadoop fs -ls $director
For defining the specific string in your example, Apparel & Accessories > Clothing > Activewear, either double quotes or single quotes will work; referring to it later is a different story, however:
In the shell (any POSIX-compatible shell), how you refer to a variable is just as important as how you define it.
To safely refer to a previously defined variable without side-effects, enclose it in double quotes, e.g., "$directory".
To define [a variable as] a literal (verbatim) string:
(By contrast, to define a variable with embedded variable references or embedded command substitutions or embedded arithmetic expressions, use double quotes (").)
If your string contains NO single quotes:
Use a single-quoted string, e.g.:
directory='Apparel & Accessories > Clothing > Activewear'
A single-quoted string is not subject to any interpretation by the shell, so it's generally the safest option for defining a literal. Note that the string may span multiple lines; e.g.:
multiline='line 1
line 2'
If your string DOES contain single quotes (e.g., I'm here.) and you want a solution that works in all POSIX-compatible shells:
Break the string into multiple (single-quoted) parts and splice in single-quote characters:
Note: Sadly, single-quoted strings cannot contain single quotes, not even with escaping.
directory='I'\''m here.'
The string is broken into single-quoted I, followed by literal ' (escaped as an unquoted string as \'), followed by single-quoted m here.. By virtue of having NO spaces between the parts, the result is a single string containing a literal single quote after I.
Alternative: if you don't mind using a multiline statement, you can use a quoted here document, as described at the bottom.
If your string DOES contain single quotes (e.g., I'm here.) and you want a solution that works in bash, ksh, and zsh:
Use ANSI-C quoting:
directory=$'I\'m here.'
Note: As you can see, ANSI-C quoting allows for escaping single quotes as \', but note the additional implications: other \<char> sequences are subject to interpretation, too; e.g., \n is interpreted as a newline character - see http://www.gnu.org/software/bash/manual/bash.html#ANSI_002dC-Quoting
Tip of the hat to @chepner, who points out that the POSIX-compatible way of directly including a single quote in a string to be used verbatim is to use read -r with a here document using a quoted opening delimiter (the -r option ensures that \ characters in the string are treated as literals).
# *Any* form of quoting, not just single quotes, on the opening EOF will work.
# Note that $HOME will by design NOT be expanded.
# (If you didn't quote the opening EOF, it would.)
read -r directory <<'EOF'
I'm here at $HOME
EOF
Note that here documents create stdin input (which read reads in this case). Therefore, you cannot use this technique to directly pass the resulting string as an argument.
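The POSIX-compatible techniques above can be combined into one sketch. The helper count_args is hypothetical and exists only to show how many arguments a command would receive; it assumes the default IFS.

```shell
# count_args is a hypothetical helper: it prints how many arguments it got.
count_args() { echo "$#"; }

directory='Apparel & Accessories > Clothing > Activewear'

# Double-quoted reference: the value is passed as one argument.
quoted_count=$(count_args "$directory")
# Unquoted reference: the value is split on whitespace into several words.
unquoted_count=$(count_args $directory)

# Splicing a single quote between two single-quoted parts.
spliced='I'\''m here.'

# Quoted here-document delimiter: $HOME is not expanded.
read -r from_heredoc <<'EOF'
I'm here at $HOME
EOF

printf '%s\n' "$quoted_count" "$unquoted_count" "$spliced" "$from_heredoc"
```

The unquoted case is why hadoop fs -ls $director in the question misbehaves: the command sees a list of separate words rather than one path.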
Use strong (single) quotes, i.e. 'string', so that escape characters and special characters in the string are taken literally.
e.g. declare director='Apparel & Accessories > Clothing > Activewear'
Using declare is also good practice when declaring variables.

Sed using a variable for line number restriction

I want to do a search and replace on a line with specific line number. However, I want to be able to use a variable for the Line Number itself.
For instance, if I wanted to replace the number 4 with a number 5 on line 180. I would use the following code.
sed '180 s/4/5/' file
My Question is how do I use a variable for the line number?
sed '$variable s/4/5/' file
@gniourf_gniourf's comment contains the crucial pointer: use double quotes around your sed program in order to reference shell variables (the shell doesn't interpret (expand) single-quoted strings in any way).
Note that sed programs are their own world - they have NO concept of variables, so the only way to use variables is to use a double-quoted string evaluated by the shell containing references to shell variables.
As a result, you must \-escape characters that you want the shell to ignore and pass through to sed to see, notably $ as \$.
In your specific case, however, nothing needs escaping.
Thus, as @gniourf_gniourf states in his comment, use:
sed "$variable s/4/5/" file
Afterthought:
Alternatively, the core of your sed program can remain single-quoted, with only the shell-variable references spliced in as double-quoted strings; note that no spaces are allowed between the string components, as the entire expression must evaluate to a single string:
sed "$variable"' s/4/5/' file
While in this specific case you could get away without the double quotes around the variable reference, it's generally safer to use them, so as to avoid unwanted shell expansions (such as word splitting) that could alter or even break the command.
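A self-contained sketch of both quoting styles, run against four lines of sample input instead of a file. The sample data and the variable names are assumptions for illustration. The last command shows the \$ escaping mentioned earlier, where $ must reach sed as the last-line address.

```shell
variable=3
input=$(printf '4\n4\n4\n4')

# Whole program double-quoted: the shell expands $variable before sed runs.
double_quoted=$(printf '%s\n' "$input" | sed "$variable s/4/5/")

# Single-quoted program with the variable spliced in as a double-quoted part.
spliced=$(printf '%s\n' "$input" | sed "$variable"' s/4/5/')

# \$ keeps the shell from expanding $, so sed sees its last-line address.
last_line=$(printf '%s\n' "$input" | sed "\$ s/4/5/")

printf '%s\n---\n%s\n---\n%s\n' "$double_quoted" "$spliced" "$last_line"
```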
You could just leave the variable outside of the quotes
sed $variable's/4/5/' file
Note that there cannot be a space between the variable and beginning quote though
You can do it with awk
awk 'NR==l {sub(/4/,"5")}1' l="$variable" file
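The same idea can be tried on standard input. This sketch swaps the operand-style assignment for -v l=..., which is an assumption made here so that no file operand is needed; the sample input is illustrative.

```shell
variable=3

# NR == l selects the chosen line; the trailing 1 prints every line.
result=$(printf '4\n4\n4\n4\n' |
    awk -v l="$variable" 'NR == l { sub(/4/, "5") } 1')

printf '%s\n' "$result"
```

Because the value is a plain number, the escape-sequence processing that awk applies to -v values does not affect it.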

Resources