Shell variable inside awk without the -v option: it is working - shell

I noticed that a shell script variable can be used inside an awk script like this:
var="help"
awk 'BEGIN{print "'$var'" }'
Can anyone tell me how to change the value of var inside awk while retaining the value outside of awk?
Similarly, just as a shell script variable can be accessed inside awk, can we access a shell array inside awk? If so, how?

It is impossible; your only options are to:
use command substitution and write the output of awk to the variable;
write the data to a file and then read it from the outer shell;
produce shell output and then execute it with eval.
Examples.
Command substitution, one variable:
$ export A=10
$ A=$(awk 'END {print 2*ENVIRON["A"]}' < /dev/null)
$ echo $A
20
Here you multiply A by two and write the result of the multiplication back to A.
eval; two variables:
$ export A=10
$ export B=10
$ eval $(awk 'END {print "A="2*ENVIRON["A"]"; B="2*ENVIRON["B"]}' < /dev/null)
$ echo $A
20
$ echo $B
20
$ awk 'END {print "A="2*ENVIRON["A"]"; B="2*ENVIRON["B"]}' < /dev/null
A=40; B=40
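The second option in the list above (write the data to a file, then read it from the outer shell) has no example here. A minimal sketch, assuming bash and mktemp are available; the temporary file is whatever mktemp returns:

```bash
#!/usr/bin/env bash
# Option 2: awk writes its results to a file; the outer shell reads them back.
export A=10 B=10
tmp=$(mktemp)

# awk doubles both values and writes them, space-separated, to the file named by out.
awk -v out="$tmp" 'BEGIN { print 2*ENVIRON["A"], 2*ENVIRON["B"] > out }'

# read runs in the current shell, so A and B are updated here.
read -r A B < "$tmp"
rm -f "$tmp"
echo "A=$A B=$B"
```

Compared with command substitution this costs a temporary file, but it lets awk hand back several values without going through eval.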

It uses a file intermediary, but it does work:
var="hello world"
cat > /tmp/my_script.awk.$$ <<EOF
BEGIN { print "$var" }
EOF
awk -f /tmp/my_script.awk.$$
rm -f /tmp/my_script.awk.$$
This uses the shell's here-document feature; check your shell manual for the rules about interpolation within a here document.
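None of the answers above address the array part of the question. awk cannot see a bash array directly, but one common workaround is to join the elements into a single string and split() it inside awk. A minimal sketch, assuming no element contains a newline; the array name fruits and the variable name list are illustrative:

```bash
#!/usr/bin/env bash
fruits=("red apple" "banana" "kiwi")      # illustrative bash array

# Join the elements with newlines (command substitution drops the final one).
joined=$(printf '%s\n' "${fruits[@]}")

# Pass the string through the environment; ENVIRON avoids the backslash
# processing that -v applies to its value.
out=$(list="$joined" awk 'BEGIN {
    n = split(ENVIRON["list"], arr, "\n")   # arr[1..n] now holds the elements
    for (i = 1; i <= n; i++) print i ": " arr[i]
}')
echo "$out"
```

Going the other way (awk back into a bash array) follows the same idea: have awk print one element per line and collect the lines with mapfile -t in bash 4 or later.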

Related

Assign bash value from value in specific line

I have a file that looks like:
>ref_frame=1
TPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSD
>ref_frame=2
HQGLDISTMCFHRDGKDHQQYSKVA*QKS*SLLENKIQT*LSINTWMICM*DLT
>ref_frame=3
TRD*ISVQCASTGMERITSNIPK*HDKNLRAF*KTKSRHSYLSIHG*FVCRI*
>test_3_2960_3_frame=1
TPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPSRKQNPDIVIYQYMDDLYVGSD
I want to assign a bash variable so that echo $variable gives test_3_2960
The line I want to assign to the variable will always be line 7. How can I accomplish this using bash?
So far I have:
variable=`cat file.txt | awk 'NR==7'`
echo $variable    # gives >test_3_2960_3_frame=1
Using sed
$ variable=$(sed -En '7s/>(([^_]*_){2}[0-9]+).*/\1/p' input_file)
$ echo "$variable"
test_3_2960
No pipes needed here...
$: variable=$(awk -F'[>_]' 'NR==7{ OFS="_"; print $2, $3, $4; exit; }' file)
$: echo $variable
test_3_2960
-F is using either > or _ as field separators, so your data starts in field 2.
OFS="_" sets the output field separator; alternatively, you could concatenate "_" directly instead of using commas, e.g. print $2 "_" $3 "_" $4.
exit keeps it from wasting time bothering to read beyond line 7.
If you wish to continue with awk:
$ variable=$(awk 'NR==7' file.txt | awk -F "[>_]" '{print $2"_"$3"_"$4}')
$ echo $variable
test_3_2960
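If you'd rather avoid external commands entirely, a pure-bash sketch works too. This is a swapped-in technique, not one from the answers above: it assumes bash 4+ (for mapfile) and that the name you want is everything between the leading > and the last two _-separated fields. The sample file is recreated in a temporary file so the block runs on its own:

```bash
#!/usr/bin/env bash
# Recreate the sample input from the question in a temporary file.
file=$(mktemp)
cat > "$file" <<'EOF'
>ref_frame=1
TPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSD
>ref_frame=2
HQGLDISTMCFHRDGKDHQQYSKVA*QKS*SLLENKIQT*LSINTWMICM*DLT
>ref_frame=3
TRD*ISVQCASTGMERITSNIPK*HDKNLRAF*KTKSRHSYLSIHG*FVCRI*
>test_3_2960_3_frame=1
TPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPSRKQNPDIVIYQYMDDLYVGSD
EOF

# Skip 6 lines, then read exactly 1 line (line 7) into the array "lines".
mapfile -t -s 6 -n 1 lines < "$file"
rm -f "$file"

line=${lines[0]#>}            # drop the leading ">"
variable=${line%_*_*}         # drop the shortest suffix matching "_*_*" ("_3_frame=1")
echo "$variable"
```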

Expand matched strings in sed

Is it possible to expand the matched string in a sed command? I want to substitute variable names in a file with their values, this is my script at the moment:
#!/bin/bash
echo "Running the build script..."
VAR1="2005648"
VAR2="7445aa"
SERVER_NAME=$(hostname)
TIMESTAMP=$(date +%m-%d-%Y)
sed -i "s/{[A-Z_][A-Z_]*}/$&/g" my_file.txt #variable names in the file are written between { }
and this is a snapshot of my_file.txt:
Building finished at {TIMESTAMP}
{VAR1}:{VAR2}
On: {SERVER_NAME}
current working directory: {PWD}
But it doesn't work. Instead of substituting the variable name with its value, it inserts a dollar sign right before the curly bracket.
How do I resolve this?
You could use envsubst (part of GNU gettext) to substitute environment variables; otherwise you would need a bunch of sed commands to replace everything.
Change your template file to:
Building finished at ${TIMESTAMP}
${VAR1}:${VAR2}
On: ${SERVER_NAME}
current working directory: ${PWD}
And the script to:
#!/bin/bash
echo "Running the build script..."
export VAR1="2005648"
export VAR2="7445aa"
export SERVER_NAME=$(hostname)
export TIMESTAMP=$(date +%m-%d-%Y)
# only replace the defined variables
envsubst '$VAR1 $VAR2 $SERVER_NAME $TIMESTAMP' < my_file.txt > newfile
# replace all environment variables ($USER, $HOME, $HOSTNAME, etc.)
#envsubst < my_file.txt > newfile
The script replaces environment variables $VAR1, $VAR2, $SERVER_NAME and $TIMESTAMP in my_file.txt and saves the output to newfile.
You can see that ${PWD} doesn't get replaced, because I forgot to add it to the list.
In the second commented example all environment variables are replaced and non-existing variables are replaced by an empty string.
You can use the $VARNAME or ${VARNAME} syntax in the template.
I'd actually do it in a single pass this way using an awk that supports ENVIRON[], e.g. any POSIX awk:
$ cat tst.sh
#!/usr/bin/env bash
echo "Running the build script..."
VAR1=2005648 \
VAR2=7445aa \
SERVER_NAME=$(hostname) \
TIMESTAMP=$(date +%m-%d-%Y) \
awk '
{
while ( match($0,/{[[:alnum:]_]+}/) ) {
printf "%s", substr($0,1,RSTART-1) ENVIRON[substr($0,RSTART+1,RLENGTH-2)]
$0 = substr($0,RSTART+RLENGTH)
}
print
}
' file
$ ./tst.sh
Running the build script...
Building finished at 04-14-2020
2005648:7445aa
On: MyLogin
current working directory: /home/MyLogin
But if you really want to do multiple passes, calling sed inside a shell loop, then bash's indirect expansion ${!variable} is your friend. Here's a start:
$ cat tst.sh
#!/usr/bin/env bash
VAR1='2005648'
VAR2='7445aa'
SERVER_NAME='foo'
for var in VAR1 VAR2 SERVER_NAME; do
echo "var, $var, ${!var}"
done
$ ./tst.sh
var, VAR1, 2005648
var, VAR2, 7445aa
var, SERVER_NAME, foo
$ VAR1='stuff'
$ var='VAR1'; echo 'foo {VAR1} bar' | sed "s/{$var}/${!var}/"
foo stuff bar
The awk script is robust but YMMV using sed depending on the contents of the variables, e.g. it'd fail if they contain & or / or \1 or .... ENVIRON[] only has access to shell variables set on the awk command line or exported, hence the escape at the end of each line that sets a shell variable so it's part of the awk command line.
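One way to make the sed route tolerate such values is to escape the characters that are special in a sed replacement (\, & and the / delimiter) before substituting. A minimal sketch, assuming single-line values; the value of VAR1 is hypothetical and chosen to contain all three characters:

```bash
#!/usr/bin/env bash
VAR1='a&b/c\1'        # hypothetical value with &, / and a backslash
var='VAR1'

# Escape backslashes first, then & and /, so sed inserts the value literally.
escaped=$(printf '%s\n' "${!var}" | sed -e 's/\\/\\\\/g' -e 's/&/\\&/g' -e 's|/|\\/|g')

result=$(echo 'foo {VAR1} bar' | sed "s/{$var}/$escaped/")
echo "$result"
```

The order matters: backslashes are doubled first so the backslashes added by the later steps are not doubled again.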
You can try this.
#!/usr/bin/env bash
echo "Running the build script..."
VAR1="2005648"
VAR2="7445aa"
SERVER_NAME=$(hostname)
TIMESTAMP=$(date +%m-%d-%Y)
sed "s|{TIMESTAMP}|$TIMESTAMP|;s|{VAR1}|$VAR1|;s|{VAR2}|$VAR2|;s|{SERVER_NAME}|$SERVER_NAME|;s|{PWD}|$PWD|" file.txt
If you really need to keep the braces in the output, just add them around the variables in the replacement, e.g. {$TIMESTAMP}, and so on.
That should work unless there is something more that is not included in the question above.

How to escape a shell variable with spaces within an AWK script

I have the path of "file1 Nov 2018.txt" stored in the variable "var". I then use this shell variable inside an awk script
to generate another script (this is a small example). The issue is that the path and the filename contain spaces: even though I put the variable between double quotes ""
and, within awk, between single quotes '', it still doesn't work. I get the error "No such file or directory".
How do I handle a path that contains spaces?
The script is like this:
var="/mydrive/d/data 2018/Documents Nov/file1 Nov 2018.txt"
z=$(awk -v a="$var" 'BEGIN{str = "cat " 'a' ; print str}')
eval "$z"
I get these errors:
$ eval "$z"
cat: /mydrive/d/data: No such file or directory
cat: 2018/Documents: No such file or directory
cat: Nov/file1: No such file or directory
cat: Nov: No such file or directory
cat: 2018.txt: No such file or directory
Thanks for any help.
The single-quote escape sequence comes in handy here. Note that 047 is the value in octal for the ASCII ' character, and awk allows you to use \nnn within a string to include any character using its octal value.
$ cat 'foo bar.txt'
a b c
1 2 3
$ var="foo bar.txt"
$ echo "$var"
foo bar.txt
$ z=$(awk -v a="$var" 'BEGIN{print "cat \047" a "\047"}')
$ eval "$z"
a b c
1 2 3
Maybe it's a bit nicer with printf:
$ awk -v a="$var" 'BEGIN{ printf "cat \047%s\047\n", a }'
cat 'foo bar.txt'
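The \047 wrapping breaks if the path itself contains a single quote. A sketch of one way to handle that, assuming a POSIX awk: replace each ' in the value with '"'"' (close the quote, emit a double-quoted ', reopen) before wrapping it. The temporary directory and the file name are illustrative:

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
var="$dir/it's here.txt"              # hypothetical file name containing ' and a space
printf 'a b c\n' > "$var"

# Replace every ' in the value with '"'"', then wrap the whole value in single quotes.
z=$(awk -v a="$var" 'BEGIN {
    gsub("\047", "\047\"\047\"\047", a)
    printf "cat \047%s\047\n", a
}')
echo "$z"
out=$(eval "$z")
rm -rf "$dir"
echo "$out"
```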
The problem is coming from the fact that the single quote has special meaning to the shell, so it's not surprising that there's a clash when single quotes are also being used in your awk program, when that program is on the command line.
This can be avoided by putting the awk program in its own file:
$ cat a.awk
BEGIN { printf "cat '%s'\n", a }
$ awk -v a="$var" -f a.awk
cat 'foo bar.txt'
Remove the single quotes around a and use escaped double quotes instead:
$ echo success > "a b"
$ var="a b"; z=$(awk -v a="$var" 'BEGIN{print "cat \"" a "\""}');
$ eval "${z}"
success
However, you are most likely making this task unnecessarily complex.
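For example, if awk is only needed to work out which file to use, it can print just the path and leave the command to the shell, where normal double quoting handles the spaces with no eval. A minimal sketch; the temporary directory stands in for the question's /mydrive/... path:

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
var="$dir/file1 Nov 2018.txt"        # a path with spaces, as in the question
printf 'a b c\n1 2 3\n' > "$var"

# awk only produces data (the path); it does not build a command string.
path=$(awk -v a="$var" 'BEGIN { print a }')

# The shell runs the command itself; quoting "$path" keeps the spaces intact.
out=$(cat -- "$path")
echo "$out"
rm -rf "$dir"
```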
$ var="path to/test"
$ cat > path\ to/test
foo
$ z=$(awk -v a="$var" 'BEGIN{gsub(/ /,"\\ ",a); str = "cat " a ; print str}')
$ echo "$z"
cat path\ to/test
$ eval "$z"
foo
The key in this solution is gsub(/ /,"\\ ",a), i.e. escaping each space with a \ (written \\ because of awk's string escaping).
With bash's printf %q "$var" you can correctly escape any string for later use in eval; even line breaks are handled correctly. However, the resulting string may contain special characters like \ that awk would interpret when the variable is assigned with awk -v var="$var". It is therefore better to pass the value via stdin:
path='/path/with spaces/and/special/symbols/like/*/?/\/...'
cmd=$(printf %q "$path" | awk '{print "cat "$0}')
eval "$cmd"
In this example the generated command $cmd is
cat /path/with\ spaces/and/special/symbols/like/\*/\?/\\/...

Set variable in current shell from awk

Is there a way to set a variable in my current shell from within awk?
I'd like to do some processing on a file and print out some data; since I'll read the whole file through, I'd like to save the number of lines -- in this case, FNR.
However, I can't seem to find a way to set a shell variable to the FNR value; failing that, I'd have to read FNR back from my output file to set, say, num_lines.
I've tried some combinations using awk 'END{system(...)}', but couldn't get them to work. Any way around this?
Here's another way.
This is especially useful when you've got the values of your variables in a single variable and you want to split them up. For example, you have a list of values from a single row in a database that you want to create variables out of.
val="hello|beautiful|world" # assume this string comes from a database query
read a b c <<< $( echo ${val} | awk -F"|" '{print $1" "$2" "$3}' )
echo $a #hello
echo $b #beautiful
echo $c #world
We need the here string (<<<) in this case because if we piped into read instead, bash would run read in a subshell, and the variables it sets would be lost when that subshell exits.
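A variant of the same idea, sketched under the assumption that a field may contain spaces: have awk separate the fields with tabs, let read split on tabs only, and feed it through bash process substitution so read still runs in the current shell. The sample value is hypothetical. Note that tab counts as IFS whitespace, so an empty field would be collapsed:

```bash
#!/usr/bin/env bash
val="hello|beautiful world|again"   # hypothetical row; the middle field has a space

# awk re-joins the fields with tabs; read splits only on tabs.
IFS=$'\t' read -r a b c < <(printf '%s\n' "$val" | awk -F'|' -v OFS='\t' '{ print $1, $2, $3 }')

echo "$a"   # hello
echo "$b"   # beautiful world
echo "$c"   # again
```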
$ echo "$var"
$ declare $( awk 'BEGIN{print "var=17"}' )
$ echo "$var"
17
Here's why you should use declare instead of eval:
$ eval $( awk 'BEGIN{print "echo \"removing all of your files, ha ha ha....\""}' )
removing all of your files, ha ha ha....
$ declare $( awk 'BEGIN{print "echo \"removing all of your files\""}' )
bash: declare: `"removing': not a valid identifier
bash: declare: `files"': not a valid identifier
Note in the first case that eval executes whatever string awk prints, which could accidentally be a very bad thing!
You can't export variables from a child process such as awk to its parent shell. You have some other choices, though, including:
Make another pass of the file using AWK to count records, and use command substitution to capture the result. For example:
FNR=$(awk 'END {print FNR}' filename)
Print FNR in the subshell, and parse the output in your other process.
If FNR is the same as number of lines, you can call wc -l < filename to get your count.
A warning for anyone trying to use declare as suggested by several answers.
eval does not have this problem.
If the awk command (or other expression) passed to declare produces an empty string, declare runs with no arguments and prints all current shell variables.
This is almost certainly not what you would want.
E.g. if your awk pattern doesn't match anything in the input, awk never prints any output, so you end up with unexpected behaviour.
An example of this....
unset var
var=99
declare $( echo "foobar" | awk '/fail/ {print "var=17"}' )
echo "var=$var"
var=99
The current environment as seen by declare is printed
and $var is not changed
A minor change to store the value to set in an awk variable and print it at the end solves this....
unset var
var=99
declare $( echo "foobar" | awk '/fail/ {tmp="17"} END {print "var="tmp}' )
echo "var=$var"
var=
This time $var is set to the null string (var='')
and there is no unwanted output.
To show this working with a matching pattern
unset var
var=99
declare $( echo "foobar" | awk '/foo/ {tmp="17"} END {print "var="tmp}' )
echo "var=$var"
var=17
This time the pattern matches, so $var is set to 17
and there is no unwanted output.
Make awk print out the assignment statement:
MYVAR=NewValue
Then in your shell script, eval the output of your awk script:
eval $(awk ....)
# then use $MYVAR
EDIT: people recommend using declare instead of eval, to be slightly less error-prone if something other than the assignment is printed by the inner script. It's bash-only, but it's okay when the shell is bash and the script has #!/bin/bash, correctly stating this dependency.
The eval $(...) variant is widely used, with existing programs generating output suitable for eval but not for declare (lesspipe is an example); that's why it's important to understand it, and the bash-only variant is "too localized".
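If you want neither eval's code execution nor declare's quirks, bash's printf -v can assign a variable whose name is only known at run time. A minimal sketch, assuming awk prints NAME=VALUE lines; the names and values shown are hypothetical:

```bash
#!/usr/bin/env bash
# Assign each NAME=VALUE line printed by awk with printf -v; the value is
# stored as plain text and never executed as a command (unlike eval).
while IFS='=' read -r name value; do
    # Skip anything that is not a valid shell identifier.
    [[ $name =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || continue
    printf -v "$name" '%s' "$value"
done < <(awk 'BEGIN {
    print "MYVAR=NewValue"
    print "num_lines=42"
    print "SAFE=$(date)"
    print "bad name=ignored"
}')

echo "$MYVAR $num_lines $SAFE"
```

Because the loop reads from process substitution, it runs in the current shell and the assignments survive; if awk prints nothing, nothing is assigned and nothing is dumped.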
To synthesize everything so far, here is what I find useful for setting a shell environment variable from a script that reads a one-line file using awk. Obviously a /pattern/ could be used instead of NR==1 to find the needed variable.
# export a variable from a script (such as in a .dotfile)
declare $( awk 'NR==1 {tmp=$1} END {print "SHELL_VAR=" tmp}' /path/to/file )
export SHELL_VAR
This will avoid a massive output of variables if a declare command is issued with no argument, as well as the security risks of a blind eval.
echo "First arg: $1"
for ((i=0 ; i < $1 ; i++)); do
echo "inside"
echo "Welcome $i times."
cat man.xml | awk '{ x[NR] = $0 } END { for ( i=2 ; i<=NR ; i++ ) { if (x[i] ~ // ) {x[i+1]=" '$i'"}print x[i] }} ' > $i.xml
done
echo "completed"
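The loop above splices the shell counter into the awk program with '$i', which works only because $i is a plain number. A sketch of the same loop passing the counter with -v instead, so the value arrives as data; the inline input text and the /marker/ pattern are hypothetical stand-ins for man.xml and the original pattern, and the awk loop variable is renamed j so it doesn't clash with i:

```bash
#!/usr/bin/env bash
n=${1:-2}                 # number of files to generate (defaults to 2 here)
dir=$(mktemp -d)          # write output into a scratch directory
cd "$dir" || exit 1

for ((i = 0; i < n; i++)); do
    # Hypothetical input standing in for man.xml.
    printf 'header\nmarker\nold value\n' |
    awk -v i="$i" '
        { x[NR] = $0 }
        END {
            for (j = 2; j <= NR; j++) {
                if (x[j] ~ /marker/) x[j+1] = " " i   # replace the line after the match
                print x[j]
            }
        }' > "$i.xml"
done
echo "completed"
```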

Passing bash input variables to awk

I'm trying to pass a variable into awk from user input.
I have tried variations of awk -v, but get errors stating 'awk: invalid -v option', even though the option is listed in the man page.
#! /bin/bash
read -p "Enter ClassID:" CLASS
read -p "Enter FacultyName:" FACULTY
awk '/FacultyName/ {print}' data-new.csv > $FACULTY.csv
awk -vclass=${CLASS} '/class/ {print}' data-new.csv >> $FACULTY.csv
echo Class is $CLASS
echo Faculty member is $FACULTY
Some versions of awk require a space between the -v and the variable assignment. Also, you should put the bash variable in double-quotes to prevent unwanted parsing by the shell (e.g. word splitting, wildcard expansion) before it's passed to awk. Finally, in awk /.../ is a constant regular expression (i.e. /class/ will search for the string "class", not the value of the variable "class"). With all of this corrected, here's the awk command that I think will do what you want:
awk -v class="${CLASS}" '$0 ~ class {print}' data-new.csv >> "$FACULTY.csv"
Now, is there any reason you're using this instead of:
grep "$CLASS" data-new.csv >> "$FACULTY.csv"
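One caveat with $0 ~ class: the value is treated as a regular expression, so a ClassID containing characters such as . matches more than intended. awk's index() does a plain substring search instead (grep -F is the fixed-string counterpart for grep). A minimal sketch with hypothetical data:

```bash
#!/usr/bin/env bash
CLASS='CS.101'                                 # hypothetical ClassID with a "."
data=$(printf 'CS.101,Smith\nCSX101,Jones\n')  # hypothetical CSV rows

# Regex match: "." matches any character, so CSX101 is selected too.
regex_hits=$(printf '%s\n' "$data" | awk -v class="$CLASS" '$0 ~ class')

# Substring match: index() returns 0 when class does not occur literally.
literal_hits=$(printf '%s\n' "$data" | awk -v class="$CLASS" 'index($0, class)')

echo "$regex_hits"
echo "---"
echo "$literal_hits"
```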
Your script is not clear to me, but both of these work:
CLASS=ec123
echo | awk -vclass=$CLASS '{print class}'
echo | awk -vclass=${CLASS} '{print class}'
