Use array variable in awk? - bash

A=(aaa bbb ccc)
cat abc.txt | awk '{ print $1, ${A[$1]} }'
I want to index an array element based on the $1, but the code above is not correct in awk syntax. Could someone help?

You can't index a bash array using a value generated inside awk, even if you weren't using single quotes (thereby preventing bash from doing any substitution). You could pass the array in, though.
A=(aaa bbb ccc)
awk -v a="${A[*]}" 'BEGIN {split(a, A, / /)}
{print $1, A[$1] }' <abc.txt
Because of the split function inside awk, the elements of A may not contain spaces or newlines. If you need to do anything more interesting, set the array inside of awk.
awk 'BEGIN {a[1] = "foo bar" # sadly, there is no way to set an array all
a[2] = "baz" } # at once without abusing split() as above
{print $1, a[$1] }' <abc.txt
(Clarification: bash substitutes variables before invoking the program whose argument you're substituting, so by the time you have $1 in awk it's far too late to ask bash to use it to substitute a particular element of A.)
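One detail to watch with the split() approach above: awk's split() numbers elements from 1, while bash arrays start at 0, so A[1] in awk corresponds to ${A[0]} in bash. Below is a minimal sketch that shifts the indices back, assuming bash and using made-up input lines in place of abc.txt (the names tmp, arr and shifted are illustrative only):

```shell
#!/usr/bin/env bash
A=(aaa bbb ccc)

# "${A[*]}" joins the elements with a space (the first character of the default IFS).
# split() fills tmp[1..n]; copy into arr[0..n-1] so the indices match bash.
shifted=$(printf '%s\n' 0 1 2 |
  awk -v a="${A[*]}" '
    BEGIN { n = split(a, tmp, " "); for (i = 1; i <= n; i++) arr[i - 1] = tmp[i] }
    { print $1, arr[$1] }')

printf '%s\n' "$shifted"
```

The same restriction applies as before: elements containing spaces or newlines would be split apart.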

If you are going to be hard-coding the A array, you can just initialize it in awk
awk 'BEGIN{A[0]="aaa";A[1]="bbb"}{ print $1, A[$1] }' abc.txt

Your awk program within single quotes cannot see the shell environment variable A. In general, you can get a little shell substitution to work if you use double quotes instead of single quotes, but that is done by the shell, before awk is invoked. Overall, it is heavy sledding to try to combine shell and awk this way. If possible, I would take kurumi's approach of using an awk array.
Single quotes: an impenetrable veil.
Double quotes: generally too much travail.
So pick your poison: shell or awk.
Otherwise: your code may balk.
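To make the quoting trade-off concrete, here is a small sketch under the assumption of a one-line made-up input; the variable names word, single and double are illustrative only. Both commands print the same thing, but the double-quoted one needs extra escaping:

```shell
word=bbb

# Single-quoted program: the shell leaves it alone, so the value goes in through -v.
single=$(printf 'x y\n' | awk -v w="$word" '{ print $1, w }')

# Double-quoted program: the shell substitutes $word before awk starts,
# so the dollar sign and double quotes meant for awk have to be backslash-escaped.
double=$(printf 'x y\n' | awk "{ print \$1, \"$word\" }")

printf '%s\n%s\n' "$single" "$double"
```

The -v form is usually easier to read, and it keeps the awk program itself free of shell substitutions.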

You can also print each element of the array on a separate line with printf and pipe it to awk. This code simply prints the bash array (bash_arr) from awk (in no guaranteed order, since the order of awk's for-in loop is unspecified):
bash_arr=( 1 2 3 4 5 )
printf '%s\n' "${bash_arr[@]}" |
awk ' { awk_arr[NR] = $0 }
END {
for (key in awk_arr) {
print awk_arr[key]
}
}'
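The printf approach above can also be applied to the original lookup problem. The following is only a sketch, assuming bash, a temporary file standing in for abc.txt, and elements that contain no newlines; it reads the array from standard input first (the `-` argument) and the data file second:

```shell
#!/usr/bin/env bash
A=(aaa bbb ccc)

data=$(mktemp)                      # stand-in for abc.txt
printf '%s\n' 0 2 > "$data"

result=$(printf '%s\n' "${A[@]}" |
  awk 'NR == FNR { arr[FNR - 1] = $0; next }   # first input (stdin): load array, 0-based
       { print $1, arr[$1] }' - "$data")
rm -f "$data"

printf '%s\n' "$result"
```

NR == FNR is true only while the first input is being read, so the array is fully loaded before any line of the data file is processed. Unlike the split() variant, elements may contain spaces here.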

Related

Pass a variable to awk contained between special characters [duplicate]

I want to extract a substring where certain pattern exist from pipe separated file, thus I used below command,
awk -F ":" '/REWARD REQ. SERVER HEADERS/{print $1, $2, $3, $4}' sample_profile.txt
Here, 'REWARD REQ. SERVER HEADERS' is a pattern which is to be searched in the file, and print its first 4 parts on a colon separated line.
Now, I want to send bash variable to act as a pattern. thus I used below command, but it's not working.
awk -v pat="$pattern" -F ":" '/pat/{print $1, $2 , $3, $4 } sample_profile.txt
How can I use -v and -F in a single awk command?
If you want to provide the pattern through a variable, you need to use ~ to match against it:
awk -v pat="$pattern" '$0 ~ pat'
In your case, the problem does not have to do with -F.
The problem is the usage of /pat/ when you want pat to be a variable. If you say /pat/, awk understands it as a literal "pat", so it will try to match those lines containing the string "pat".
All together, your code should be:
awk -v pat="$pattern" -F ":" '$0~pat{print $1, $2, $3, $4 }' file
#                             ^^^^^^
See an example:
Given this file:
$ cat file
hello
this is a var
hello bye
Let's look for lines containing "hello":
$ awk '/hello/' file
hello
hello bye
Let's now try looking for "pat", contained in a variable, the way you were doing it:
$ awk -v pat="hello" '/pat/' file
$ # NO MATCHES!
Let's now use the $0 ~ pat expression:
$ awk -v pat="hello" '$0~pat' file
hello # WE MATCH!
hello bye
Of course, you can use such expressions to match just one field and say awk -v pat="$pattern" '$2 ~ pat' file and so on.
From GNU Awk User's Guide → 3.1 How to Use Regular Expressions:
When a regexp is enclosed in slashes, such as /foo/, we call it a regexp constant, much like 5.27 is a numeric constant and "foo" is a string constant.
And GNU Awk User's Guide → 3.6 Using Dynamic Regexps:
The righthand side of a ‘~’ or ‘!~’ operator need not be a regexp
constant (i.e., a string of characters between slashes). It may be any
expression. The expression is evaluated and converted to a string if
necessary; the contents of the string are then used as the regexp. A
regexp computed in this way is called a dynamic regexp or a computed
regexp:
BEGIN { digits_regexp = "[[:digit:]]+" }
$0 ~ digits_regexp { print }
This sets digits_regexp to a regexp that describes one or more digits,
and tests whether the input record matches this regexp.
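Because the variable is used as a dynamic regexp, characters such as the dot in 'REWARD REQ. SERVER HEADERS' keep their regexp meaning. If a fixed-string match is what you want, one possible sketch uses awk's index() function instead of ~ (the sample input and variable names below are made up for illustration):

```shell
pattern='REQ. SERVER'
input='a:REQ. SERVER:c:d:e
a:REQx SERVER:c:d:e'

# Regexp match: the dot matches any character, so both lines match.
regex_hits=$(printf '%s\n' "$input" |
  awk -v pat="$pattern" -F: '$0 ~ pat { print $1, $2 }')

# Fixed-string match: index() returns a non-zero position only for the literal text.
literal_hits=$(printf '%s\n' "$input" |
  awk -v pat="$pattern" -F: 'index($0, pat) { print $1, $2 }')

printf 'regex:\n%s\nliteral:\n%s\n' "$regex_hits" "$literal_hits"
```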
awk -v pat="$pattern" -F":" '$0 ~ pat { print $1, $2, $3, $4 }' sample_profile.txt
You can't use the variable inside the regex // notation (there's no way to distinguish it from searching for pat); you have to specify that the variable is a regex with the ~ (matching) operator.
This is kind of a hack but it makes things a little simpler for me.
cmd="awk '/$pattern/'"
eval $cmd
Building the command as a string first lets the shell substitute $pattern into the program before awk ever sees it.

Add string to columns in bash

I have a comma-delimited file to which I want to append a string in specific columns. I am trying to do something like this, but haven't managed it so far.
re1,1,a1e,a2e,AGT
re2,2,a1w,a2w,AGT
re3,3,a1t,a2t,ACGTCA
re12,4,b1e,b2e,ACGTACT
And I want to append 'some_string' to columns 3 and 4:
re1,1,some_stringa1e,some_stringa2e,AGT
re2,2,some_stringa1w,some_stringa2w,AGT
re3,3,some_stringa1t,some_stringa2t,ACGTCA
re12,4,some_stringb1e,some_stringb2e,ACGTACT
I was trying something similar to the suggested solution, but to no avail:
awk -v OFS=$'\,' '{ $3="some_string" $3; print}' $lookup_file
Also, I would like my string to be added to both columns. How would you do this with awk or bash?
Thanks a lot in advance
You can do that with (almost) what you have:
pax> echo 're1,1,a1e,a2e,AGT
re2,2,a1w,a2w,AGT
re3,3,a1t,a2t,ACGTCA
re12,4,b1e,b2e,ACGTACT' | awk 'BEGIN{FS=OFS=","}{$3 = "pre3:"$3; $4 = "pre4:"$4; print}'
re1,1,pre3:a1e,pre4:a2e,AGT
re2,2,pre3:a1w,pre4:a2w,AGT
re3,3,pre3:a1t,pre4:a2t,ACGTCA
re12,4,pre3:b1e,pre4:b2e,ACGTACT
The begin block sets the input and output field separators, the two assignments massage fields 3 and 4, and the print outputs the modified line.
You need to set FS to comma, not just OFS. There's a shortcut for setting FS: the -F option.
awk -F, -v OFS=',' '{ $3="some_string" $3; $4 = "some_string" $4; print}' "$lookup_file"
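If the prefix and the list of columns should not be hard-coded, a sketch along the following lines may help. The variable names prefix and cols are assumptions introduced here, not part of the question, and the input is two sample lines from it:

```shell
prefixed=$(printf '%s\n' 're1,1,a1e,a2e,AGT' 're2,2,a1w,a2w,AGT' |
  awk -F, -v OFS=, -v prefix=some_string -v cols=3,4 '
    BEGIN { n = split(cols, c, ",") }     # c[1] = 3, c[2] = 4
    { for (i = 1; i <= n; i++) $(c[i]) = prefix $(c[i]); print }')

printf '%s\n' "$prefixed"
```

Assigning to a field makes awk rebuild the record with OFS, which is why OFS has to be set to a comma as well.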
In awk, placing strings next to each other concatenates them, so "three" $3 is treated as one string. A pattern of 1 is always true, and with no {action} awk assumes "print". You can use Bash's Brace Expansion to assign multiple variables after the script: {O,}FS=, expands to OFS=, FS=,.
awk '{$3 = "three" $3; $4 = "four" $4} 1' {O,}FS=,

split the end of the path in shell script

I have the following string in my shell script.
/usr/java/jdk1.8.0_77/jre/bin/java
What is the best way to shorten it to /usr/java/jdk1.8.0_77/jre?
#! /bin/sh
path=/usr/java/jdk1.8.0_77/jre/bin/java
short_path="${path%/bin*}"
echo "$short_path"
More string manipulation examples here:
http://tldp.org/LDP/abs/html/string-manipulation.html
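If the last two components are not always bin/java, a more generic variant of the same idea is possible. This is a sketch using only standard parameter expansion and dirname; the variable names by_pattern and by_dirname are illustrative:

```shell
path=/usr/java/jdk1.8.0_77/jre/bin/java

# ${path%/*/*} removes the shortest suffix matching /*/*, i.e. the last two components.
by_pattern=${path%/*/*}

# dirname strips one trailing component per call.
by_dirname=$(dirname "$(dirname "$path")")

printf '%s\n%s\n' "$by_pattern" "$by_dirname"
```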
With awk, if you can set up the input and output separators correctly, the solution becomes intuitive:
echo /usr/java/jdk1.8.0_77/jre/bin/java | awk '{ NF -= 2 } 1' FS=/ OFS=/
Output:
/usr/java/jdk1.8.0_77/jre
Explanation
awk implicitly splits its input at the FS string (or pattern with some versions of awk). The number of fields is stored in the NF variable; subtracting two from NF results in leaving off the last two elements. The 1 at the end invokes the default code block: { print $0 }.
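The GNU Awk manual cautions that some awk versions do not rebuild $0 when NF is decremented. Where that is a concern, a sketch that builds the shortened path explicitly might look like this (the variable names out and trimmed are illustrative):

```shell
trimmed=$(echo /usr/java/jdk1.8.0_77/jre/bin/java |
  awk -F/ '{
    out = $1                                  # empty: the path starts with /
    for (i = 2; i <= NF - 2; i++) out = out "/" $i
    print out
  }')

printf '%s\n' "$trimmed"
```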
If you are looking for an awk solution, one alternative is (similar in sed)
$ echo /usr/java/jdk1.8.0_77/jre/bin/java |
awk '{sub("/[^/]+/[^/]+$","")}1'
/usr/java/jdk1.8.0_77/jre
Note that this is generic in the sense that it will chop off the last two levels of any path.

awk function printing..... -bash?

For some reason that I'm trying to figure out, I'm getting "-bash" printed by this script:
cat sample | awk -v al=$0 -F"|" '{n = split(al, a, "|")} {print a[1]}'
The 'sample' file contains PSV (pipe-separated values), like a|b|c|d|e|f|d.
My intention is to use an array.
The result of the above script is an array of length 1, and the only item it contains is "-bash", the name of the shell.
$0 by default points to the program that is currently running, but as far as I know, within an awk script $0 should refer to the entire line being read.
I'm new to bash/awk and would like to understand exactly where the problem is.
Can you point out which of the following steps is failing?
1- "concatenate" the sample file and pass it as input to the awk script
2- define a variable named 'al' whose value is each line contained in 'sample'
3- define a pipe "|" as the field separator
4- define an action: split the value of 'al' into an array named 'a', using a pipe as the separator
5- define another action, which in this case simply prints the first item in the array
Any advice? Thank you!
The $0 is expanded by the shell before it runs awk, and in the shell $0 is the name of the current program, which is bash. The - at the start is there because bash was run by login(1) (see the description of the exec builtin in man bash).
You need to quote the $0 so the shell doesn't expand it, and awk sees it:
awk -v 'al=$0' -F"|" '{n = split(al, a, "|")} {print a[1]}' sample
But variable assignments are processed before any data is read, so that sets the variable al to the string "$0" at the start of the program. It does not set al to the contents of each input record.
If you want the record, just say so instead of using a variable:
awk -F"|" '{n = split($0, a, "|")} {print a[1]}' sample
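The difference between the two commands above can be checked with a short sketch; the input lines and the variable names literal and per_record are made up for illustration:

```shell
# -v is processed once, before any input is read: al holds the two characters $0.
literal=$(printf 'a|b|c\n' | awk -v 'al=$0' '{ print al }')

# Inside the program, $0 is the current record, so split() sees each line in turn.
per_record=$(printf 'a|b|c\nd|e|f\n' |
  awk -F'|' '{ n = split($0, a, "|"); print n, a[1] }')

printf '%s\n%s\n' "$literal" "$per_record"
```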
With -v al=$0, you are setting al to the name of the current program, which is bash. See Arguments in man bash.
Err...
awk -F'|' '{ print $1 }' sample
