How can I prevent bash from tokenising a string variable expansion? - bash

I don't quite know the term(s) for this part of the bash shell. In one specific, yet critical, case my script falls afoul of this effect:
Objective
Perform this command chain within a script where the awk expression (-e) comes from a variable. This example works when the expression is given as a script argument.
echo "test string" | awk -e { print $0; }
Problem example
On the command line I am seeking to produce output of: "test string", viz.:
$ optE="-e "
$ argE="{ print \$0; }"
$ set -x; echo "test string" | awk $optE$argE ; set +x
+ awk -e '{' print '$0;' '}'
+ echo 'test string'
awk: cmd. line:1: {
awk: cmd. line:1: ^ unexpected newline or end of string
+ set +x
In a way I can see what's happened. Is there a good/best way to not have the $argE variable tokenised after it is expanded?
Typing that same command on the command line works as you know:
$ echo "test string" | awk -e '{ print $0; }'
test string
Because the expression is enclosed in single quotes. I haven't found a way to make that happen using a variable...
$ optE="-e "
$ argE="'{ print \$0; }'"
$ set -x; echo "test string" | awk $optE$argE ; set +x
+ echo 'test string'
+ awk -e ''\''{' print '$0;' '}'\'''
awk: cmd. line:1: '{
awk: cmd. line:1: ^ invalid char ''' in expression
+ set +x
Needless to say, I'm on Stack Overflow because the things I've tried and read in other questions, etc. don't give the desired result.
Solutions welcome

Word splitting is applied after an unquoted variable expansion (${var} or $var) is replaced by its value. Word splitting splits the result on whitespace, regardless of any quote characters inside the value. No matter what you put inside the string, if the variable expansion is unquoted, the result will be word-split. To affect word splitting, you have to change the way you expand the variable, not its content (i.e. change $var to "$var").
Is there a good/best way to not have the $argE variable tokenised after it is expanded?
Yes, quote the expansion. Rule of thumb: never $var, always "$var". Also, check your scripts with shellcheck.
In your case it's simple: just assign the variables the content you want to pass, and quote the expansions:
optE="-e"
argE='{ print $0; }'
echo "test string" | awk "$optE" "$argE"
^^^^^^^ - variable expansion inside quotes
For more complex cases, use a bash array, arr=(-e '{ print $0; }'), and expand it properly: awk "${arr[@]}".
Research: the bash manual on shell operation and expansions (https://www.gnu.org/software/bash/manual/html_node/Shell-Operation.html), https://mywiki.wooledge.org/BashFAQ/050 , https://mywiki.wooledge.org/Quotes , https://mywiki.wooledge.org/BashPitfalls , the question "When should I wrap quotes around a shell variable?", and https://github.com/koalaman/shellcheck/wiki/Sc2086 .
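The array approach above can be sketched as follows; the option and program text here are illustrative (a portable -F option is used instead of gawk's -e), not the asker's exact command:

```shell
# Build the awk arguments in a bash array; each element stays one word.
args=(-F, '{ print $1 }')
# "${args[@]}" expands each element as a separate, unsplit argument.
out=$(echo "test,string" | awk "${args[@]}")
printf '%s\n' "$out"
```

Because each array element is quoted on expansion, the spaces inside the awk program never trigger word splitting.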

Related

Bash, awk, two arguments for one column

Need 2 arguments in awk command for one column.
Script, name todo.
#!/bin/bash
folder_main="$( cd $( dirname "${BASH_SOURCE[0]}" ) >/dev/null 2>&1 && pwd )"
if [ $1 = 'e' ]; then
mcedit $folder_main/kb/todo.kb
else
awk -F ',' '$1=="'$1'"' $folder_main/kb/todo.kb
fi
The expectation is that when I run todo i, it will print the lines whose first column (comma-delimited) is i OR c.
I tried this.
awk -F ',' '$1=="{c|'$1'}"' $folder_main/kb/todo.kb
But nothing.
Thanks.
You should pass your shell variable to awk using -v and fix your awk syntax:
awk -F, -v a="$1" '$1 == "c" || $1 == a' "$folder_main/kb/todo.kb"
This sets the awk variable a to the value of the shell positional parameter $1, and prints the line if the first column is either "c" or whatever you passed as the first argument to the script.
You could also shorten the line slightly by using a regular expression match instead of two ==:
awk -F, -v a="$1" '$1 ~ "^(c|"a")$"' "$folder_main/kb/todo.kb"
I think the first option is easier to read, personally, and it is also safer to use: a character with special meaning inside a regular expression (such as *, [, ( or {) could cause the script to either fail or behave in an unexpected way.
You can't use shell variables directly in awk like this. Instead you pass them into your awk script using the -v flag:
awk -F ',' -v searchterm="$1" '$1==searchterm' "$folder_main/kb/todo.kb"
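A runnable sketch of the accepted -v approach, using made-up sample data in place of kb/todo.kb and a hard-coded key in place of the script's $1:

```shell
# Hypothetical stand-in for kb/todo.kb (first column is the category).
printf '%s\n' 'i,write report' 'c,buy milk' 'x,ignore me' > /tmp/todo_demo.kb
key=i   # stands in for the positional parameter $1
# Match the first column against "c" OR the supplied key.
out=$(awk -F, -v a="$key" '$1 == "c" || $1 == a' /tmp/todo_demo.kb)
printf '%s\n' "$out"
```

This prints the 'i' line and the 'c' line, and skips the rest.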

How to scape shell variable with spaces within AWK script

I have the path of "file1 Nov 2018.txt" stored in the variable "var". Then I use this shell variable inside the awk script to generate another script (this is a small example). The issue is that the path and the filename have spaces, and even though I put the variable between double quotes "" and, within awk, between single quotes '', it is not working. I get the error "No such file or directory".
How to handle this path that has spaces?
The script is like this:
var="/mydrive/d/data 2018/Documents Nov/file1 Nov 2018.txt"
z=$(awk -v a="$var" 'BEGIN{str = "cat " 'a' ; print str}')
eval "$z"
I get these errors:
$ eval "$z"
cat: /mydrive/d/data: No such file or directory
cat: 2018/Documents: No such file or directory
cat: Nov/file1: No such file or directory
cat: Nov: No such file or directory
cat: 2018.txt: No such file or directory
Thanks for any help.
The single-quote escape sequence comes in handy here. Note that 047 is the value in octal for the ASCII ' character, and awk allows you to use \nnn within a string to include any character using its octal value.
$ cat 'foo bar.txt'
a b c
1 2 3
$ var="foo bar.txt"
$ echo "$var"
foo bar.txt
$ z=$(awk -v a="$var" 'BEGIN{print "cat \047" a "\047"}')
$ eval "$z"
a b c
1 2 3
Maybe it's a bit nicer with printf:
$ awk -v a="$var" 'BEGIN{ printf "cat \047%s\047\n", a }'
cat 'foo bar.txt'
The problem is coming from the fact that the single quote has special meaning to the shell, so it's not surprising that there's a clash when single quotes are also being used in your awk program, when that program is on the command line.
This can be avoided by putting the awk program in its own file:
$ cat a.awk
BEGIN { printf "cat '%s'\n", a }
$ awk -v a="$var" -f a.awk
cat 'foo bar.txt'
Remove the single quotes around a and add escaped double quotes instead.
$ echo success > "a b"
$ var="a b"; z=$(awk -v a="$var" 'BEGIN{print "cat \"" a "\""}');
$ eval "${z}"
success
However, most likely you're making some task unnecessarily complex.
$ cat > path\ to/test
foo
$ z=$(awk -v a="$var" 'BEGIN{gsub(/ /,"\\ ",a); str = "cat " a ; print str}')
$ echo "$z"
cat path\ to/test
$ eval "$z"
foo
The key (in this solution) being: gsub(/ /,"\\ ",a) ie. escaping the spaces with a \ (\\ due to awk).
With bash's printf %q "$var" you can correctly escape any string for later use in eval - even line breaks are handled correctly. However, the resulting string may contain special symbols like \ that could be interpreted by awk when assigning variables with awk -v var="$var". Therefore, it is better to pass the variable via stdin:
path='/path/with spaces/and/special/symbols/like/*/?/\/...'
cmd=$(printf %q "$path" | awk '{print "cat "$0}')
eval "$cmd"
In this example the generated command $cmd is
cat /path/with\ spaces/and/special/symbols/like/\*/\?/\\/...
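A self-contained sketch of the printf %q technique, using an invented file name with a space (the original path with * and ? can't be created literally):

```shell
# A file whose name contains a space (name is made up for the demo).
printf 'hello\n' > '/tmp/demo file.txt'
f='/tmp/demo file.txt'
# printf %q escapes the space; awk only prepends "cat " to the quoted name.
cmd=$(printf %q "$f" | awk '{print "cat " $0}')
out=$(eval "$cmd")
printf '%s\n' "$out"
```

Here $cmd expands to cat /tmp/demo\ file.txt, so eval sees the file name as a single word.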

Bash scripting | awk with variables

Works:
repquota $HOME | awk "{if(\$3 > $MIN && \$3 < $MAX )print}"
But if I try to assign this to a variable it isn't working:
VARIABLE=`repquota $FULL_HOME | awk "{if(\$3 > $MIN && \$3 < $MAX )print}"`
awk: {if( > 1572864 && < 302118056)print}
awk: ^ syntax error
Your bash syntax is way off. You're not quoting variables, wrongly quoting an awk script, and using deprecated backticks. What you seem to be trying to do would be:
VARIABLE=$(repquota "$FULL_HOME" | awk -v min="$MIN" -v max="$MAX" '($3>min) && ($3<max)')
but since you didn't provide any sample input and expected output it's an untested guess and it's always hard to tell what you DO want from reading a script that doesn't do what you want.
Use the new command substitution syntax $(command):
VARIABLE=$(repquota $FULL_HOME | awk "{if(\$3 > $MIN && \$3 < $MAX )print}")
Explanation
From man bash:
When the old-style backquote form of substitution is used, backslash
retains its literal meaning except when followed by $, `, or \. The
first backquote not preceded by a backslash terminates the command sub‐
stitution. When using the $(command) form, all characters between the
parentheses make up the command; none are treated specially.
When using backquotes, the \$ inside the double-quoted string loses its backslash (backslash is special before $ inside backquotes), so the value of $var is substituted and awk does not see $3, as you expected.
You can see it with these commands:
var="I am a test string"
echo `echo "\$var"` # output: I am a test string
echo $(echo "\$var") # output: $var
Edit: As Ed Morton comments, you should not pass awk variables from shell that way, instead use the -v switch of awk:
VARIABLE=$(repquota $FULL_HOME | awk -v min="$MIN" -v max="$MAX" '{if($3 > min && $3 < max )print}')
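The -v pattern can be demonstrated without repquota; the three input lines below are invented stand-ins for its output (user, flags, blocks used):

```shell
# Made-up quota-like data; only the middle line falls between MIN and MAX.
MIN=10 MAX=100
out=$(printf '%s\n' 'u1 -- 5' 'u2 -- 50' 'u3 -- 500' |
      awk -v min="$MIN" -v max="$MAX" '($3 > min) && ($3 < max)')
printf '%s\n' "$out"
```

Because min and max arrive via -v, the awk program itself can stay in single quotes with no escaping.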

How can I print a newline as \n in Bash?

Basically, I want to achieve something like the inverse of echo -e.
I have a variable which stores a command output, but I want to print newlines as \n.
Here's my solution:
sed 's/$/\\n/' | tr -d '\n'
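The pipeline above, applied to three sample lines: sed appends a literal backslash-n to each line, then tr deletes the real newlines.

```shell
# Turn real newlines into the two characters \ and n.
out=$(printf 'a\nb\nc\n' | sed 's/$/\\n/' | tr -d '\n')
printf '%s\n' "$out"
```

The result is the single line a\nb\nc\n, with \n as literal text.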
If your input is already in a (Bash) shell variable, say $varWithNewlines:
echo "${varWithNewlines//$'\n'/\\n}"
It simply uses Bash parameter expansion to replace all newline ($'\n') instances with literal '\n' each.
If your input comes from a file, use AWK:
awk -v ORS='\\n' 1
In action, with sample input:
# Sample input with actual newlines created with ANSI C quoting ($'...'),
# which turns `\n` literals into actual newlines.
varWithNewlines=$'line 1\nline 2\nline 3'
# Translate newlines to '\n' literals.
# Note the use of `printf %s` to avoid adding an additional newline.
# By contrast, a here-string - <<<"$varWithNewlines" _always appends a newline_.
printf %s "$varWithNewlines" | awk -v ORS='\\n' 1
awk reads input line by line
by setting ORS (the output record separator) to literal '\n' (escaped with an additional \ so that awk doesn't interpret it as an escape sequence), the input lines are output with that separator
1 is just shorthand for {print}, i.e., all input lines are printed, terminated by ORS.
Note: The output will always end in literal '\n', even if your input does not end in a newline.
This is because AWK terminates every output line with ORS, whether or not the input line ended with a newline (the input record separator, RS).
Here's how to unconditionally strip the terminating literal '\n' from your output.
# Translate newlines to '\n' literals and capture in variable.
varEncoded=$(printf %s "$varWithNewlines" | awk -v ORS='\\n' 1)
# Strip terminating '\n' literal from the variable value
# using Bash parameter expansion.
echo "${varEncoded%\\n}"
By contrast, more work is needed if you want to make the presence of a terminating literal '\n' dependent on whether the input ends with a newline or not.
# Translate newlines to '\n' literals and capture in variable.
varEncoded=$(printf %s "$varWithNewlines" | awk -v ORS='\\n' 1)
# If the input does not end with a newline, strip the terminating '\n' literal.
if [[ $varWithNewlines != *$'\n' ]]; then
# Strip terminating '\n' literal from the variable value
# using Bash parameter expansion.
echo "${varEncoded%\\n}"
else
echo "$varEncoded"
fi
You can use printf "%q":
eol=$'\n'
printf "%q\n" "$eol"
$'\n'
A Bash solution
x=$'abcd\ne fg\nghi'
printf "%s\n" "$x"
abcd
e fg
ghi
y=$(IFS=$'\n'; set -f; printf '%s\\n' $x)
y=${y%??}
printf "%s\n" "$y"
abcd\ne fg\nghi

Set variable in current shell from awk

Is there a way to set a variable in my current shell from within awk?
I'd like to do some processing on a file and print out some data; since I'll read the whole file through, I'd like to save the number of lines -- in this case, FNR.
It happens, though, that I can't seem to find a way to set a shell variable to the FNR value; failing that, I'd have to read FNR from my output file to set, say, num_lines.
I've tried some combinations using awk 'END{system(...)}', but could not manage it to work. Any way around this?
Here's another way.
This is especially useful when you've got the values of your variables in a single variable and you want to split them up. For example, you have a list of values from a single row in a database that you want to create variables out of.
val="hello|beautiful|world" # assume this string comes from a database query
read -r a b c <<< "$( echo "${val}" | awk -F"|" '{print $1" "$2" "$3}' )"
echo $a #hello
echo $b #beautiful
echo $c #world
We need the 'here string', i.e. <<<, in this case because at the end of a pipeline read would run in a subshell, and the variables it sets would be lost when that subshell exits.
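As a side note, when the delimiter is a single character, read can do the splitting itself with no awk at all; a minimal sketch (variable names are illustrative):

```shell
# Split a "|"-delimited record directly with read, using a here string.
val="hello|beautiful|world"
IFS='|' read -r a b c <<< "$val"
echo "$b"
```

IFS is set only for the duration of the read command, so the rest of the shell is unaffected.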
$ echo "$var"
$ declare $( awk 'BEGIN{print "var=17"}' )
$ echo "$var"
17
Here's why you should use declare instead of eval:
$ eval $( awk 'BEGIN{print "echo \"removing all of your files, ha ha ha....\""}' )
removing all of your files, ha ha ha....
$ declare $( awk 'BEGIN{print "echo \"removing all of your files\""}' )
bash: declare: `"removing': not a valid identifier
bash: declare: `files"': not a valid identifier
Note in the first case that eval executes whatever string awk prints, which could accidentally be a very bad thing!
You can't export variables from a subshell to its parent shell. You have some other choices, though, including:
Make another pass of the file using AWK to count records, and use command substitution to capture the result. For example:
FNR=$(awk 'END {print FNR}' filename)
Print FNR in the subshell, and parse the output in your other process.
If FNR is the same as number of lines, you can call wc -l < filename to get your count.
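A runnable sketch of the wc -l alternative, with an invented sample file (redirecting from the file rather than naming it keeps the file name out of wc's output):

```shell
# Three-line demo file standing in for the real input.
printf 'one\ntwo\nthree\n' > /tmp/lines_demo.txt
num_lines=$(wc -l < /tmp/lines_demo.txt)
echo "$num_lines"
```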
A warning for anyone trying to use declare as suggested by several answers.
If the awk (or other) expression provided to declare results in an empty string, then declare will dump the current environment.
This is almost certainly not what you would want.
(eval does not have this problem.)
e.g. if your awk pattern doesn't match anything in the input, the print never runs, and you end up with unexpected behaviour.
An example of this....
unset var
var=99
declare $( echo "foobar" | awk '/fail/ {print "var=17"}' )
echo "var=$var"
var=99
The current environment as seen by declare is printed
and $var is not changed
A minor change to store the value to set in an awk variable and print it at the end solves this....
unset var
var=99
declare $( echo "foobar" | awk '/fail/ {tmp="17"} END {print "var="tmp}' )
echo "var=$var"
var=
This time $var is unset ie: set to the null string var=''
and there is no unwanted output.
To show this working with a matching pattern
unset var
var=99
declare $( echo "foobar" | awk '/foo/ {tmp="17"} END {print "var="tmp}' )
echo "var=$var"
var=17
This time $var is set to 17, as expected,
and there is still no unwanted output.
Make awk print out the assignment statement:
MYVAR=NewValue
Then in your shell script, eval the output of your awk script:
eval $(awk ....)
# then use $MYVAR
EDIT: people recommend using declare instead of eval, to be slightly less error-prone if something other than the assignment is printed by the inner script. It's bash-only, but it's okay when the shell is bash and the script has #!/bin/bash, correctly stating this dependency.
The eval $(...) variant is widely used, with existing programs generating output suitable for eval but not for declare (lesspipe is an example); that's why it's important to understand it, and the bash-only variant is "too localized".
To synthesize everything here so far, I'll share what I find useful for setting a shell environment variable from a script that reads a one-line file using awk. Obviously a /pattern/ could be used instead of NR==1 to find the needed variable.
# export a variable from a script (such as in a .dotfile)
declare $( awk 'NR==1 {tmp=$1} END {print "SHELL_VAR=" tmp}' /path/to/file )
export SHELL_VAR
This will avoid a massive output of variables if a declare command is issued with no argument, as well as the security risks of a blind eval.
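The same pattern as a self-contained run, with an invented one-line file (the file path and SHELL_VAR name are placeholders):

```shell
# Hypothetical one-line file; NR==1 captures its first field.
echo 'alpha beta' > /tmp/one_line_demo
# awk always prints the assignment, so declare never sees an empty string.
declare $( awk 'NR==1 {tmp=$1} END {print "SHELL_VAR=" tmp}' /tmp/one_line_demo )
export SHELL_VAR
echo "$SHELL_VAR"
```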
echo "First arg: $1"
for ((i=0 ; i < $1 ; i++)); do
    echo "inside"
    echo "Welcome $i times."
    cat man.xml | awk '{ x[NR] = $0 } END { for ( i=2 ; i<=NR ; i++ ) { if (x[i] ~ // ) { x[i+1] = " '$i'" } print x[i] } }' > $i.xml
done
echo "completed"