For some reason that I'm trying to figure out, I'm getting "-bash" printed by this script:
cat sample | awk -v al=$0 -F"|" '{n = split(al, a, "|")} {print a[1]}'
The 'sample' file contains PSV (pipe-separated values), like a|b|c|d|e|f|d.
My intention is to use an array.
The result of the above script is an array of length 1, and the only item it contains is "-bash", the name of the shell.
$0 by default points to the program that is currently running, but as far as I know, within an awk script, the $0 parameter 'should' point to the entire line being read.
I'm new to bash/awk and would like to understand exactly where the problem is.
Can you point out which of the following steps is failing?
1-"concatenate" the sample file and pass it as input for the awk script
2-define a variable named 'al' whose value is each line contained in 'sample'
3-define a pipe "|" as field separator
4-define an action, split the value of 'al' into an array named 'a' using a pipe as splitter
5-define another action, which in this case is simply printing the first item in the array
Any advice? thank you!
The $0 is expanded by the shell before it runs awk, and in the shell $0 is the name of the current program, which is bash. The - at the start is there because bash was run by login(1) (see the description of the exec builtin in man bash).
You need to quote the $0 so the shell doesn't expand it, and awk sees it:
awk -v 'al=$0' -F"|" '{n = split(al, a, "|")} {print a[1]}' sample
But variable assignments are processed before any data is read, so that sets the variable al to the string "$0" once, at the start of the program. It does not set al to the contents of each input record.
If you want the record, just say so instead of using a variable:
awk -F"|" '{n = split($0, a, "|")} {print a[1]}' sample
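To see the difference side by side, here is a minimal sketch. The one-line input is the example from the question; the shell variable names (sample, literal, first, count) are made up for illustration.

```shell
# Sample record from the question, held in a shell variable instead of a file.
sample='a|b|c|d|e|f|d'

# Quoted 'al=$0': the shell leaves it alone, so awk stores the literal
# two-character string "$0" in al once, before any input is read.
literal=$(printf '%s\n' "$sample" | awk -v 'al=$0' '{ print al }')

# $0 used inside the awk program refers to the current record.
first=$(printf '%s\n' "$sample" | awk '{ n = split($0, a, "|"); print a[1] }')
count=$(printf '%s\n' "$sample" | awk '{ n = split($0, a, "|"); print n }')

printf 'literal=%s first=%s count=%s\n' "$literal" "$first" "$count"
```

The first command shows that -v never sees the record, while the other two split the record itself and get the first element and the element count.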
By using -v al=$0, you are setting al to the name of the current programme, which is bash. See Arguments in man bash.
Err...
awk -F'|' '{ print $1 }' sample
Using bash, how do I find a string and update the string next to it? For example, given the value
my.site.com|test2.spin:80
proxy_pass.map
my.site2.com test2.spin:80
my.site.com test.spin:8080;
Expected output is to update proxy_pass.map with
my.site2.com test2.spin:80
my.site.com test2.spin:80;
I tried using awk
awk '{gsub(/^my\.site\.com\s+[A-Za-z0-9]+\.spin:8080;$/,"my.site2.comtest2.spin:80"); print}' proxy_pass.map
but it does not seem to work. Is there a better way to approach the problem?
One awk idea, assuming spacing needs to be maintained:
awk -v rep='my.site.com|test2.spin:80' '
BEGIN { split(rep,a,"|") # split "rep" variable and store in
site[a[1]]=a[2] # associative array
}
$1 in site { line=$0 # if 1st field is in site[] array then make copy of current line
match(line,$1) # find where 1st field starts (in case 1st field does not start in column #1)
newline=substr(line,1,RSTART+RLENGTH-1) # save current line up through matching 1st field
line=substr(line,RSTART+RLENGTH) # strip off 1st field
match(line,/[^[:space:];]+/) # look for string that does not contain spaces or ";" and perform replacement, making sure to save everything after the match (";" in this case)
newline=newline substr(line,1,RSTART-1) site[$1] substr(line,RSTART+RLENGTH)
$0=newline # replace current line with newline
}
1 # print current line
' proxy_pass.map
This generates:
my.site2.com test2.spin:80
my.site.com test2.spin:80;
If the input looks like:
$ cat proxy_pass.map
my.site2.com test2.spin:80
my.site.com test.spin:8080;
This awk script generates:
my.site2.com test2.spin:80
my.site.com test2.spin:80;
NOTES:
if multiple replacements need to be performed I'd suggest placing them in a file and having awk process said file first
the 2nd match() is hardcoded based on OP's example; depending on actual file contents it may be necessary to expand on the regex used in the 2nd match()
once satisfied with the result, the original input file can be updated in a couple of ways ... a) if using GNU awk then awk -i inplace -v rep.... or b) save the result to a temp file and then mv the temp file to proxy_pass.map
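As a rough sketch of the first note (keeping the replacements in a file that awk reads first), something like the following could work. The file name replacements.psv and the temporary directory are assumptions made for this example, and unlike the script above it rewrites $2, so runs of spaces between the columns collapse to a single space.

```shell
# Hypothetical working files; only the two-line proxy_pass.map sample comes from the question.
workdir=$(mktemp -d)
cat > "$workdir/replacements.psv" <<'EOF'
my.site.com|test2.spin:80
EOF
cat > "$workdir/proxy_pass.map" <<'EOF'
my.site2.com test2.spin:80
my.site.com test.spin:8080;
EOF

updated=$(awk '
    # First file: load site|target pairs into the site[] array.
    NR == FNR { split($0, a, "|"); site[a[1]] = a[2]; next }
    # Second file: swap in the new target, keeping a trailing ";" if there was one.
    $1 in site { $2 = site[$1] ($2 ~ /;$/ ? ";" : "") }
    1
' "$workdir/replacements.psv" "$workdir/proxy_pass.map")

printf '%s\n' "$updated"
rm -r "$workdir"
```

The NR == FNR test assumes the replacements file is not empty; with an empty first file the map itself would be treated as the list of pairs.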
If the number of spaces between the columns is not significant, a simple
proxyf=proxy_pass.map
tmpf=$$.txt
awk '$1 == "my.site.com" { $2 = "test2.spin:80;" } {print}' <"$proxyf" >"$tmpf" && mv "$tmpf" "$proxyf"
should do. If you need the columns to be lined up nicely, you can replace the print by a suitable printf .... statement.
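A possible printf variant, as a sketch only: the column width of 15 is an arbitrary choice for this sample, and the input is fed through a pipe rather than the real proxy_pass.map.

```shell
# Left-justify the first column in a 15-character field, then print the second column.
aligned=$(printf '%s\n' 'my.site2.com test2.spin:80' 'my.site.com test.spin:8080;' |
    awk '$1 == "my.site.com" { $2 = "test2.spin:80;" }
         { printf "%-15s %s\n", $1, $2 }')

printf '%s\n' "$aligned"
```

With this format both second columns start at the same position, whatever the length of the host name (up to the chosen width).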
With your shown samples and attempts, please try the following awk code. It creates a shell variable named var that stores the value my.site.com|test2.spin:80 and passes it to the awk program, where the awk variable var1 holds the value of the shell variable var.
In the BEGIN section, the split function splits the value of var1 on | into an array named arr, and num is the number of elements that split produced. A for loop then runs up to num in steps of 2 and builds an array named arr2, using the element at index i as the key and the element at index i+1 as its value.
In the main block, the program checks whether $1 is in arr2; if it is, it prints arr2's value for it, otherwise it prints $2, as required.
##Shell variable named var is being created here...
var="my.site.com|test2.spin:80"
awk -v var1="$var" '
BEGIN{
num=split(var1,arr,"|")
for(i=1;i<=num;i+=2){
arr2[arr[i]]=arr[i+1]
}
}
{
print $1,(($1 in arr2)?arr2[$1]:$2)
}
' Input_file
OR, in case you want to maintain the spaces between the 1st and 2nd fields, try the following code, a small tweak of the above. Written and tested with your shown samples only.
awk -v var1="$var" '
BEGIN{
num=split(var1,arr,"|")
for(i=1;i<=num;i+=2){
arr2[arr[i]]=arr[i+1]
}
}
{
match($0,/[[:space:]]+/)
print $1 substr($0,RSTART,RLENGTH) (($1 in arr2)?arr2[$1]:$2)
}
' Input_file
NOTE: This program can take multiple values separated by | in the shell variable passed to awk, but it assumes they are always in the format key|value|key|value...
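For instance, a run with two key|value pairs might look like the sketch below. The second pair (my.site2.com|test9.spin:81) is invented purely for illustration, and the input comes from a pipe instead of Input_file.

```shell
# Two key|value pairs in one variable; the second pair is hypothetical.
var='my.site.com|test2.spin:80|my.site2.com|test9.spin:81'

result=$(printf '%s\n' 'my.site2.com test2.spin:80' 'my.site.com test.spin:8080;' |
    awk -v var1="$var" '
    BEGIN {
        num = split(var1, arr, "|")
        for (i = 1; i <= num; i += 2)      # odd elements are keys, even ones values
            arr2[arr[i]] = arr[i+1]
    }
    { print $1, (($1 in arr2) ? arr2[$1] : $2) }')

printf '%s\n' "$result"
```

Note that the replacement value is used verbatim, so the trailing ; of the original second line is not carried over here.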
#!/bin/sh -x
f1=$(echo "my.site.com|test2.spin:80" | cut -d'|' -f1)
f2=$(echo "my.site.com|test2.spin:80" | cut -d'|' -f2)
echo "${f1}%${f2};" >> proxy_pass.map
tr '%' '\t' < proxy_pass.map >> p1
cat > ed1 <<EOF
$
-1
d
wq
EOF
ed -s p1 < ed1
mv -v p1 proxy_pass.map
rm -v ed1
This might work for you (GNU sed):
<<<'my.site.com|test2.spin:80' sed -E 's#\.#\\.#g;s#^(\S+)\|(\S+)#/^\1\\b/s/\\S+/\2/2#' |
sed -Ef - file
Build a sed script from the input arguments and apply it to the input file.
The input arguments are first prepared so that their metacharacters (in this case the .'s) are escaped.
Then the first argument is used to prepare a match command and the second is used as the value to be replaced in a substitution command.
The result is piped into a second sed invocation that takes the sed script and applies it to the input file.
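To make the intermediate step visible, the first sed invocation can be run on its own. This sketch assumes GNU sed (for -E with \S) and uses a made-up variable name, script, to hold the generated command.

```shell
# Run only the script-building stage and capture the sed command it generates.
script=$(printf '%s\n' 'my.site.com|test2.spin:80' |
    sed -E 's#\.#\\.#g;s#^(\S+)\|(\S+)#/^\1\\b/s/\\S+/\2/2#')

printf '%s\n' "$script"
```

The generated line is an address that matches the escaped host name at the start of a line, followed by a substitution that replaces the second run of non-space characters on that line.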
I have spent hours trying to solve this. There are a bunch of answers as to how to prepend to all lines or specific lines but not with a variable text and a variable number.
while [ $FirstVariable -lt $NextVariable ]; do
#sed -i "$FirstVariables/.*/$FirstVariableText/" "$PWD/Inprocess/$InprocessFile"
cat "$PWD/Inprocess/$InprocessFile" | awk 'NR==${FirstVariable}{print "$FirstVariableText"}1' > "$PWD/Inprocess/Temp$InprocessFile"
FirstVariable=$[$FirstVariable+1]
done
Essentially I am looking for a particular string delimiter and then figuring out where the next one is and appending the first result back into the following lines... Note that I already figured out the logic I am just having issues prepending the line with the variables.
Example:
This >
Line1:
1
2
3
Line2:
1
2
3
Would turn into >
Line1:
Line1:1
Line1:2
Line1:3
Line2:
Line2:1
Line2:2
Line2:3
You can do all that using the awk one-liner below.
Assuming your pattern starts with Line, then the below script can be used.
> awk '{if ($1 ~ /Line/ ){var=$1;print $0;}else{ if ($1 !="")print var $1}}' $PWD/Inprocess/$InprocessFile
Line1:
Line1:1
Line1:2
Line1:3
Line2:
Line2:1
Line2:2
Line2:3
Here is how the above script works:
If the first field of a record matches Line, it is copied into an awk variable var and the record is printed as is. For each following record that is not empty, var is prepended to the first field and the result is printed, producing the desired output.
If you need to pass variables dynamically from the shell to awk, you can use the -v option, like below:
awk -v var1="$FirstVariable" -v var2="$FirstVariableText" 'NR==var1{print var2}1' "$PWD/Inprocess/$InprocessFile" > "$PWD/Inprocess/Temp$InprocessFile"
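A small self-contained sketch of the -v technique, using three dummy input lines instead of the real file. The awk variable names n and text are arbitrary, and the values assigned to FirstVariable and FirstVariableText are assumptions for the example.

```shell
# Example values; in the real script these come from the surrounding loop.
FirstVariable=2
FirstVariableText='Line1:'

# Print the text before line number n, then every input line unchanged.
out=$(printf '%s\n' one two three |
    awk -v n="$FirstVariable" -v text="$FirstVariableText" 'NR == n { print text } 1')

printf '%s\n' "$out"
```

Because the awk program stays in single quotes, the shell never touches it; the values reach awk only through -v.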
The way you addressed the problem is by parsing everything both with bash and awk to process the file. You make use of bash to extract a line, and then use awk to manipulate this one line. The whole thing can actually be done with a single awk script:
awk '/^Line/{str=$1; print; next}{print (NF ? str $0 : "")}' inputfile > outputfile
or
awk 'BEGIN{RS="";ORS="\n\n";FS=OFS="\n"}{gsub(FS,OFS $1)}1' inputfile > outputfile
I have following string in my shell script.
/usr/java/jdk1.8.0_77/jre/bin/java
What is the best way to reduce it to /usr/java/jdk1.8.0_77/jre?
#! /bin/sh
path=/usr/java/jdk1.8.0_77/jre/bin/java
short_path="${path%/bin*}"
echo "$short_path"
More string manipulation examples here:
http://tldp.org/LDP/abs/html/string-manipulation.html
With awk, if you can set up the input and output separators correctly, the solution becomes intuitive:
echo /usr/java/jdk1.8.0_77/jre/bin/java | awk '{ NF -= 2 } 1' FS=/ OFS=/
Output:
/usr/java/jdk1.8.0_77/jre
Explanation
awk implicitly splits its input at the FS string (or pattern with some versions of awk). The number of fields is stored in the NF variable; subtracting two from NF results in leaving off the last two elements. The 1 at the end invokes the default code block: { print $0 }.
If you are looking for an awk solution, one alternative is (similar in sed)
$ echo /usr/java/jdk1.8.0_77/jre/bin/java |
awk '{sub("/[^/]+/[^/]+$","")}1'
/usr/java/jdk1.8.0_77/jre
Note that this is generic in the sense that it will chop off the last two levels of the path.
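The same generic "drop the last two levels" idea can also be sketched in plain shell, without awk or sed. The variable names by_expansion and by_dirname are made up for this example.

```shell
path=/usr/java/jdk1.8.0_77/jre/bin/java

# Parameter expansion: remove the shortest suffix that looks like /name/name.
by_expansion=${path%/*/*}

# Equivalent idea using dirname twice.
by_dirname=$(dirname "$(dirname "$path")")

printf '%s\n%s\n' "$by_expansion" "$by_dirname"
```

Unlike the ${path%/bin*} form, neither variant depends on the directory being called bin.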
I have a situation in awk where I need to convert an input format into another format and later use the number of records processed separately. Is there any way I can use a shell variable to get the value of NR in the END section? Something like:
cat file1 | awk 'some processing END{SHELL_VARIABLE=NR}' > file2
Then later use SHELL_VARIABLE outside awk.
I do not want to process the file and then do a wc -l separately as the files are huge.
One way: Use the redirection inside your awk command and print your result in the END block. And use command substitution to read the result in a shell variable:
my_var=$(awk '{ some processing; print "your output" >>file2 } END { print NR }' file1)
No subprocess can affect the parent's environment variables. What you can do is have awk write output to the file directly, then have it print the value you want to stdout and capture it. Or if you prefer, you could reverse that and have awk just print it to a file and read it back afterwards.
Incidentally, you have a UUOC (useless use of cat).
rows=$(awk '{ ...; print > "file2"} END {print NR}' file1)
Or
awk '... END{print NR > "rows"}' file1 >file2
rows=$(<rows)
rm rows
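A runnable sketch of the first variant, under some assumptions: the three-line input, the temporary directory, and the field swap standing in for "some processing" are all invented for the example.

```shell
# Hypothetical input file in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'a b' 'c d' 'e f' > "$workdir/file1"

# awk writes the converted records to file2 itself, so stdout is free
# to carry only the record count, which the shell captures.
rows=$(awk -v out="$workdir/file2" '{ print $2, $1 > out } END { print NR }' "$workdir/file1")

converted=$(cat "$workdir/file2")
printf 'rows=%s\n%s\n' "$rows" "$converted"
rm -r "$workdir"
```

The file is read only once, which is the point of avoiding a separate wc -l pass.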
A=(aaa bbb ccc)
cat abc.txt | awk '{ print $1, ${A[$1]} }'
I want to index an array element based on the $1, but the code above is not correct in awk syntax. Could someone help?
You can't index a bash array using a value generated inside awk, even if you weren't using single quotes (thereby preventing bash from doing any substitution). You could pass the array in, though.
A=(aaa bbb ccc)
awk -v a="${A[*]}" 'BEGIN {split(a, A, / /)}
{print $1, A[$1] }' <abc.txt
Because of the split function inside awk, the elements of A may not contain spaces or newlines. If you need to do anything more interesting, set the array inside of awk.
awk 'BEGIN {a[1] = "foo bar" # sadly, there is no way to set an array all
a[2] = "baz" } # at once without abusing split() as above
{print $1, a[$1] }' <abc.txt
(Clarification: bash substitutes variables before invoking the program whose argument you're substituting, so by the time you have $1 in awk it's far too late to ask bash to use it to substitute a particular element of A.)
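One detail worth checking when passing the array this way is indexing: bash arrays start at 0, while awk's split() fills its array starting at 1. The sketch below uses a plain string, joined, holding what "${A[*]}" would expand to for A=(aaa bbb ccc) with the default IFS, so that it runs in any POSIX shell; the variable names are assumptions.

```shell
# Stand-in for "${A[*]}" with A=(aaa bbb ccc).
joined='aaa bbb ccc'

# Input values used directly as awk (1-based) indices.
one_based=$(printf '%s\n' 1 2 3 |
    awk -v a="$joined" 'BEGIN { split(a, A, " ") } { print $1, A[$1] }')

# Input values treated as bash-style (0-based) indices: shift by one.
zero_based=$(printf '%s\n' 0 1 2 |
    awk -v a="$joined" 'BEGIN { split(a, A, " ") } { print $1, A[$1 + 1] }')

printf '%s\n%s\n' "$one_based" "$zero_based"
```

Which form is right depends on whether the first column of abc.txt is meant as a bash index or an awk index.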
If you are going to be hard-coding the A array, you can just initialize it in awk
awk 'BEGIN{A[0]="aaa";A[1]="bbb"}{ print $1, A[$1] }' abc.txt
Your awk program within single quotes cannot see the shell environment variable A. In general, you can get a little shell substitution to work if you use double quotes instead of single quotes, but that is done by the shell, before awk is invoked. Overall, it is heavy sledding to try to combine shell and awk this way. If possible, I would take kurumi's approach of using an awk array.
Single quotes: an impenetrable veil.
Double quotes: generally too much travail.
So pick your poison: shell or awk.
Otherwise: your code may balk.
You can also print each element of the array on a separate line with printf and pipe it to awk. This code simply prints the bash array (bash_arr) from awk:
bash_arr=( 1 2 3 4 5 )
printf '%s\n' "${bash_arr[@]}" |
awk ' { awk_arr[NR] = $0 }
END {
for (key in awk_arr) {
print awk_arr[key]
}
}'
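Since the order of a for (key in array) loop is not specified in awk, a variant that walks the indices from 1 to NR may be preferable when order matters. This is only a sketch: the three elements are listed directly in the printf call (standing in for the expanded bash array), and one of them contains a space to show that it survives this approach.

```shell
# One element per line in, indexed by record number, printed back in input order.
ordered=$(printf '%s\n' 'aaa' 'b b' 'ccc' |
    awk '{ awk_arr[NR] = $0 }
         END { for (i = 1; i <= NR; i++) print i ": " awk_arr[i] }')

printf '%s\n' "$ordered"
```

Elements containing newlines would still be split across records with this method.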