Unable to use associative array value in sed or awk - bash

I am trying to iteratively search and replace strings in a file using a variable search string and replacement string. I have tried both sed and awk, and it seems to be the associative array value that is giving me issues(?).
I am looking at an associative array like this:
declare -A speedReplaceValuePairsText
speedReplaceValuePairsText["20"]="xthirtyx"
speedReplaceValuePairsText["30"]="xfiftyx"
speedReplaceValuePairsText["40"]="xsixtyx"
speedReplaceValuePairsText["50"]="xeightyx"
speedReplaceValuePairsText["60"]="xhundredx"
and for ease I was declaring my replacement vars first:
for speedBeforeValue in "${!speedReplaceValuePairsText[@]}";
do
findValue=${speedBeforeValue}
replaceWithValue=${speedReplaceValuePairsText[$speedBeforeValue]}
#replaceWithValue="blah"
echo " Replacing $findValue with $replaceWithValue..."
awk -v srch="$findValue" -v repl="$replaceWithValue" '{gsub(srch,repl); print}' infile.txt > outfile.txt
#sed 's/'"$findValue"'/'"$replaceWithValue"'/g' infile.txt > outfile.txt
#sed "s/$findValue/$replaceWithValue/g" $scriptDir/$currentFileName > outfile.txt
done
The commented-out lines are alternate versions of what I have tried, along with similar in-between versions.
I have tried using just a normal string (the commented out "blah") and that works fine.
The weirdest part is that the echo statement displays the right value for both key and value.
I have tried so many combinations I am losing my mind. Please someone tell me I am doing something dumb here.
NOTE: This is nested inside another loop but I do not believe this to be an issue, let me know if I am wrong
EDIT: I have simplified the in and out files, and to clarify, if I try to use my associative array value, nothing gets replaced. But if I use a dummy string like "blah" it works.
BONUS: I have marked the answer below, but my search and replace values start and end in double quotes, and no matter what I try it replaces all instances of 60. How can I make it replace "60" with "xsixtyx"?
Thanks

I think you want to use >> instead of > inside your loop?
awk -v srch="$findValue" -v repl="$replaceWithValue" '{gsub(srch,repl); print}' $scriptDir/$currentFileName >> ./$outputFolderName/$currentFileName
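I ran your code and it works as expected apart from the > redirection, which truncates the output file on every iteration, so only the result of the last replacement survives.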
Or if you just want to see the replaced results
awk -v srch="$findValue" -v repl="$replaceWithValue" '{ if (gsub(srch,repl)) print}' $scriptDir/$currentFileName >> ./$outputFolderName/$currentFileName
For a file with
30
20
60
the output looks like
xthirtyx
xhundredx
xfiftyx
For the second case.
Here is the full bash script I tried
#!/bin/bash
declare -A speedReplaceValuePairsText
speedReplaceValuePairsText["20"]="xthirtyx"
speedReplaceValuePairsText["30"]="xfiftyx"
speedReplaceValuePairsText["40"]="xsixtyx"
speedReplaceValuePairsText["50"]="xeightyx"
speedReplaceValuePairsText["60"]="xhundredx"
for speedBeforeValue in "${!speedReplaceValuePairsText[@]}";
do
findValue=${speedBeforeValue}
replaceWithValue=${speedReplaceValuePairsText[$speedBeforeValue]}
echo " Replacing $findValue with $replaceWithValue..."
awk -v srch="$findValue" -v repl="$replaceWithValue" '{if (gsub(srch,repl)) print}' test.txt >> /tmp/test.txt
done
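As a hedged sketch of an alternative to appending with >>, each replacement can be applied cumulatively to a working copy, so the final file contains every substitution exactly once. It also shows one way to approach the BONUS question by including the double quotes in the search pattern. The sample input, the two-entry array named pairs, and the temporary file names are assumptions made for illustration, and the sketch assumes the replacement text should not keep the quotes.

```shell
#!/bin/bash
# Sketch: chain the replacements by rewriting a working copy on each pass.
declare -A pairs=( ["20"]="xthirtyx" ["60"]="xhundredx" )

workdir=$(mktemp -d)
printf '%s\n' 'speed "20"' 'speed "60"' 'id 2060' > "$workdir/infile.txt"
cp "$workdir/infile.txt" "$workdir/outfile.txt"

for key in "${!pairs[@]}"; do
    # The search pattern includes the surrounding double quotes,
    # so a bare 20 or 60 inside a longer number is left alone.
    awk -v srch="\"$key\"" -v repl="${pairs[$key]}" \
        '{ gsub(srch, repl); print }' "$workdir/outfile.txt" > "$workdir/tmp.txt" &&
        mv "$workdir/tmp.txt" "$workdir/outfile.txt"
done

result=$(cat "$workdir/outfile.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Because every pass reads the output of the previous one, the order in which the associative array yields its keys does not matter here.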


Parameter expansion not working when used inside Awk on one of the column entries

System: Linux. Bash 4.
I have the following file, which will be read into a script as a variable:
/path/sample_A.bam A 1
/path/sample_B.bam B 1
/path/sample_C1.bam C 1
/path/sample_C2.bam C 2
I want to append "_string" to the end of the filename in the first column, but before the extension (.bam). It's a bit trickier because the name begins with the path.
Desired output:
/path/sample_A_string.bam A 1
/path/sample_B_string.bam B 1
/path/sample_C1_string.bam C 1
/path/sample_C2_string.bam C 2
My attempt:
I wrote the following script (I ran: bash script.sh):
List=${1};
awk -F'\t' -vOFS='\t' '{ $1 = "${1%.bam}" "_string.bam" }1' < ${List} ;
And its output was:
${1%.bam}_string.bam
${1%.bam}_string.bam
${1%.bam}_string.bam
${1%.bam}_string.bam
Problem:
I followed the idea of using awk for this substitution as in this thread https://unix.stackexchange.com/questions/148114/how-to-add-words-to-an-existing-column , but the parameter expansion ${1%.bam} is clearly not being interpreted by AWK as I intended. Does someone know the correct syntax for that part of the code? It was meant to mean "the whole entry in the first column, except the trailing .bam". I used ${1%.bam} because it works in Bash, but AWK is another language, so the syntax probably differs. Thank you!
Note that the parameter expansion you applied on $1 won't apply inside awk, as the entire body of the awk command is passed in '..', which sends the content literally without any shell parsing. Hence the string "${1%.bam}" is passed as-is to the first column.
You can do this completely in Awk
awk -F'\t' 'BEGIN { OFS = FS }{ n=split($1, arr, "."); $1 = arr[1]"_string."arr[2] }1' file
The code basically splits the content of $1 on the delimiter . into an array arr in the context of Awk. So the part of the string up to the first . is stored in arr[1] and the subsequent split fields are stored in the next array indices. We reconstruct the filename of your choice by concatenating the array entries, with _string added to the filename part before the extension. Note that this assumes the path contains exactly one dot.
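For comparison, here is a minimal sketch of where ${...%.bam} does work: when the shell itself reads the file and the expansion is applied to a shell variable. The temporary file and the variable names path and rest are assumptions for illustration, and the input is assumed to be tab-delimited as in the question.

```shell
#!/bin/bash
# Sketch: parameter expansion applied by the shell, not by awk.
list=$(mktemp)
printf '/path/sample_A.bam\tA\t1\n/path/sample_C2.bam\tC\t2\n' > "$list"

result=$(
    # path receives the first column; rest keeps the remaining columns as-is.
    while IFS=$'\t' read -r path rest; do
        printf '%s\t%s\n' "${path%.bam}_string.bam" "$rest"
    done < "$list"
)
printf '%s\n' "$result"
rm -f "$list"
```

A shell loop like this is usually slower than awk on large files, so it is shown only to illustrate which program performs the expansion.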
If I understood your requirement correctly, could you please try the following.
val="_string"
awk -v value="$val" '{sub(/\.bam/,value"&")} 1' Input_file
Brief explanation: -v value="$val" passes the value of the shell variable val to the awk variable value. The sub function then substitutes the string .bam with value followed by .bam itself, which is what & denotes (the dot is escaped so that it matches a literal dot only). The trailing 1 prints every line, edited or not.
Why the OP's attempt didn't work: awk cannot see shell variables or shell expansions directly. Inside the single-quoted program, "${1%.bam}" is just an awk string literal, so it is printed as it is. Shell values have to be handed to awk explicitly, for example with -v as explained above.
NOTE: If a line has multiple occurrences of .bam, change sub to gsub in the above code. Also, if your Input_file is TAB-delimited, use awk -F'\t' in the above code.
sed -i 's/\.bam/_string\.bam/g' myfile.txt
It's a single line with sed. Just replace the .bam with _string.bam
You can try this way with awk :
awk -v a='_string' 'BEGIN{FS=OFS="."}{$1=$1 a}1' infile

Bash Script: Grabbing First Item Per Line, Throwing Into Array

I'm fairly new to the world of writing Bash scripts and need some guidance. I've begun writing a script for work, and so far so good. However, I'm now at a part that needs to collect database names. The names are actually stored in a file, and I can grep them.
The command I was given is cat /etc/oratab which produces something like this:
# This file is used by ORACLE utilities. It is created by root.sh
# and updated by the Database Configuration Assistant when creating
# a database.
# A colon, ':', is used as the field terminator. A new line terminates
# the entry. Lines beginning with a pound sign, '#', are comments.
#
# The first and second fields are the system identifier and home
# directory of the database respectively. The third filed indicates
# to the dbstart utility that the database should , "Y", or should not,
# "N", be brought up at system boot time.
#
OEM:/software/oracle/agent/agent12c/core/12.1.0.3.0:N
*:/software/oracle/agent/agent11g:N
dev068:/software/oracle/ora-10.02.00.04.11:Y
dev299:/software/oracle/ora-10.02.00.04.11:Y
xtst036:/software/oracle/ora-10.02.00.04.11:Y
xtst161:/software/oracle/ora-10.02.00.04.11:Y
dev360:/software/oracle/ora-11.02.00.04.02:Y
dev361:/software/oracle/ora-11.02.00.04.02:Y
xtst215:/software/oracle/ora-11.02.00.04.02:Y
xtst216:/software/oracle/ora-11.02.00.04.02:Y
dev298:/software/oracle/ora-11.02.00.04.03:Y
xtst160:/software/oracle/ora-11.02.00.04.03:Y
I then turned around and wrote grep ":/software/oracle/ora" /etc/oratab so it can grab everything I need, which is 10 databases. Not the most elegant way, but it gets what I need:
dev068:/software/oracle/ora-10.02.00.04.11:Y
dev299:/software/oracle/ora-10.02.00.04.11:Y
xtst036:/software/oracle/ora-10.02.00.04.11:Y
xtst161:/software/oracle/ora-10.02.00.04.11:Y
dev360:/software/oracle/ora-11.02.00.04.02:Y
dev361:/software/oracle/ora-11.02.00.04.02:Y
xtst215:/software/oracle/ora-11.02.00.04.02:Y
xtst216:/software/oracle/ora-11.02.00.04.02:Y
dev298:/software/oracle/ora-11.02.00.04.03:Y
xtst160:/software/oracle/ora-11.02.00.04.03:Y
So, if I want to grab just the name, such as dev068 or xtst161, how do I do that? I think what I need for this project moving forward is to store the names in an array. As mentioned in the file's comments, a colon is the field terminator. How could I whip this together so I have an array, something like:
dev068
dev299
xtst036
xtst161
dev360
dev361
xtst215
xtst216
dev298
xtst160
I feel like I may be asking for too much assistance here but I'm truly at a loss. I would be happy to clarify if need be.
It is much simpler using awk:
awk -F: -v key='/software/oracle/ora' '$2 ~ key{print $1}' /etc/oratab
dev068
dev299
xtst036
xtst161
dev360
dev361
xtst215
xtst216
dev298
xtst160
To populate a BASH array with above output use:
mapfile -t arr < <(awk -F: -v key='/software/oracle/ora' '$2 ~ key{print $1}' /etc/oratab)
To check output:
declare -p arr
declare -a arr='([0]="dev068" [1]="dev299" [2]="xtst036" [3]="xtst161" [4]="dev360" [5]="dev361" [6]="xtst215" [7]="xtst216" [8]="dev298" [9]="xtst160")'
We can pipe the output of grep to the cut utility to extract the first field, taking colon as the field separator.
Then, assuming there are no whitespace or glob characters in any of the names (which would be subject to word splitting and filename expansion), we can use a command substitution to run the pipeline, and capture the output in an array by assigning it within the parentheses.
names=($(grep ':/software/oracle/ora' /etc/oratab| cut -d: -f1;));
Note that the above command actually makes use of word splitting on the command substitution output to split the names into separate elements of the resulting array. That is why we must be sure that no whitespace occurs within any single database name, otherwise that name would be internally split into separate elements of the array. The only characters within the command substitution output that we want to be taken as word splitting delimiters are the line feeds that delimit each line of output coming off the cut utility.
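The word-splitting caveat above can be sidestepped with mapfile, which stores one line per element without splitting or filename expansion. The sketch below is an illustration under assumptions: it reads a small temporary stand-in for /etc/oratab, and it deliberately keeps the * entry to show that the asterisk is stored literally rather than expanded.

```shell
#!/bin/bash
# Sketch: mapfile keeps each output line as one element, with no globbing.
oratab=$(mktemp)
cat > "$oratab" <<'EOF'
# comment line
*:/software/oracle/agent/agent11g:N
dev068:/software/oracle/ora-10.02.00.04.11:Y
xtst160:/software/oracle/ora-11.02.00.04.03:Y
EOF

# Drop comment lines, keep the first colon-separated field of the rest.
mapfile -t names < <(grep -v '^#' "$oratab" | cut -d: -f1)
declare -p names
rm -f "$oratab"
```

With names=($(...)) the unquoted * would instead be subject to filename expansion in the current directory.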
You could also use awk for this:
awk -F: '!/^#/ && $2 ~ /^\/software\/oracle\/ora-/ {print $1}' /etc/oratab
The first pattern excludes any commented-out lines (starting with a #). The second pattern looks for your expected directory pattern in the second field. If both conditions are met it prints the first field, which is the Oracle SID. The -F: flag sets the field delimiter to a colon.
With your file that gets:
dev068
dev299
xtst036
xtst161
dev360
dev361
xtst215
xtst216
dev298
xtst160
Depending on what you're doing you could finesse it further and check that the last flag is set to Y; although that is really meant to indicate automatic start-up, it can sometimes be used to indicate that a database isn't active at all.
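A minimal sketch of that extra check, assuming the flag is always the third colon-separated field. The temporary file stands in for /etc/oratab, and the dev999 entry is a made-up line added only to show an N flag being filtered out.

```shell
#!/bin/bash
# Sketch: also require the third field (start-up flag) to be Y.
oratab=$(mktemp)
cat > "$oratab" <<'EOF'
# sample comment
OEM:/software/oracle/agent/agent12c/core/12.1.0.3.0:N
dev068:/software/oracle/ora-10.02.00.04.11:Y
dev999:/software/oracle/ora-11.02.00.04.02:N
xtst160:/software/oracle/ora-11.02.00.04.03:Y
EOF

sids=$(awk -F: '!/^#/ && $2 ~ /^\/software\/oracle\/ora-/ && $3 == "Y" {print $1}' "$oratab")
printf '%s\n' "$sids"
rm -f "$oratab"
```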
And you can put the results into an array with:
declare -a DBS=(`awk -F: -v key='/software/oracle/ora' '$2 ~ key{print $1}' /etc/oratab`)
and then refer to ${DBS[1]} (which evaluates to dev299) etc.
If you'd like them into a Bash array:
$ cat > toarr.bash
#!/bin/bash
while read -r line
do
if [[ $line =~ .*Y$ ]] # they seem to end in a "Y"
then
arr[$((i++))]=${line%%:*}
fi
done < file
echo ${arr[*]} # here we print the array arr
$ bash toarr.bash
dev068 dev299 xtst036 xtst161 dev360 dev361 xtst215 xtst216 dev298 xtst160

awk output is acting weird

cat TEXT | awk -v var=$i -v varB=$j '$1~var , $1~varB {print $1}' > PROBLEM HERE
I am passing two variables from an array to parse a very large text file by range. And it works, kind of.
If I use ">", the output file will ONLY contain the last three lines, as verified by cat and a text editor.
If I use ">>", the output file will include one complete read of TEXT, followed by the second read divided into the ranges I want.
If I let the output go through to the shell, I get the same problem as above.
Question:
It appears awk is reading every line and printing it. Then it goes back and selects the ranges from the TEXT file. It does not do this if I use constants in the range pattern search.
I understand awk must read all lines to find the ranges I request.
Why is it printing the entire document?
How can I get it to ONLY print the ranges selected?
This is the last hurdle in a big project and I am beating my head against the table.
Thanks!
Give this a try; you didn't assign varB the right way:
yours: awk -v var="$i" -varB="$j" ...
mine : awk -v var="$i" -v varB="$j" ...
                       ^^
Aside from the typo, you can't use variables inside //; instead you have to match with the regular ~ operator. Also quote your shell variables (it isn't strictly needed here, but it sets a good example). For example
seq 1 10 | awk -v b="3" -v e="5" '$0 ~ b, $0 ~ e'
should print 3..5 as expected.
It sounds like this is what you want:
awk -v var="foo" -v varB="bar" '$1~var{f=1} f{print $1} $1~varB{f=0}' file
e.g.
$ cat file
1
2
foo
3
4
bar
5
foo
6
bar
7
$ awk -v var="foo" -v varB="bar" '$1~var{f=1} f{print $1} $1~varB{f=0}' file
foo
3
4
bar
foo
6
bar
but without sample input and expected output it's just a guess and this would not address the SHELL behavior you are seeing wrt use of > vs >>.
Here's what happened. I used an array to input into my variables. I set the counter for what I thought was the total length of the array. When the final iteration of the array was reached, there was a null value returned to awk for the variable. This caused it to print EVERYTHING. Once I correctly had a counter with the correct number of array elements the printing oddity ended.
As far as the > vs >> goes, I don't know. It did stop, but I wasn't as careful in documenting it. I think what happened is that I used $1 in the print command to save time, and with each line it printed at the end it erased the whole file and left the last three identical matches. Something to ponder. Thanks Ed for the honest work. And no thank you to Robo responses.
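The empty-variable effect described above can be reproduced in a small sketch: an empty string used as a dynamic regex matches every line, so the range pattern covers the whole input. The guard in the BEGIN block is a hypothetical addition for illustration, not something from the original script.

```shell
#!/bin/bash
# Empty bounds: the empty regex matches every line, so everything prints.
all=$(seq 1 6 | awk -v var="" -v varB="" '$1 ~ var, $1 ~ varB {print $1}')

# Non-empty bounds, plus a guard that aborts if either bound is empty.
ranged=$(seq 1 6 | awk -v var="3" -v varB="5" '
    BEGIN { if (var == "" || varB == "") exit 1 }
    $1 ~ var, $1 ~ varB { print $1 }')

printf 'empty bounds:\n%s\nreal bounds:\n%s\n' "$all" "$ranged"
```

Checking the array length with ${#array[@]} before looping is another way to avoid passing an empty element in the first place.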

Save changes to a file AWK/SED

I have a huge text file delimited with comma.
19429,(Starbucks),390 Provan Walk,Glasgow,G34 9DL,-4.136909,55.872982
The first field is a unique id. I want the user to enter the id and a new value for one of the following 6 fields. I also ask them to enter a number from 2 to 7 to identify which field should be replaced.
So far I've done something like this: I check every line for the id the user entered and then replace the value.
awk -F ',' -v elem=$element -v id=$code -v value=$value '{if($1==id) {if(elem==2) { $2=value } etc }}' $path
Where $path = /root/clients.txt
Let's say the user enters "2" in order to replace the second field, and also enters "Whatever". Now I want "(Starbucks)" to be replaced with "Whatever". What I've done works fine but does not save the change to the file. I know that awk is not supposed to do so, but I don't know how to do it. I've searched a lot on Google but still no luck.
Can you tell me how I'm supposed to do this? I know that I can do it with sed but I don't know how.
Newer versions of GNU awk support inplace editing:
awk -i inplace -v elem="$element" -v id="$code" -v value="$value" '
BEGIN{ FS=OFS="," } $1==id{ $elem=value } 1
' "$path"
With other awks:
awk -v elem="$element" -v id="$code" -v value="$value" '
BEGIN{ FS=OFS="," } $1==id{ $elem=value } 1
' "$path" > /usr/tmp/tmp$$ &&
mv /usr/tmp/tmp$$ "$path"
NOTES:
Always quote your shell variables unless you have an explicit reason not to and fully understand all of the implications and caveats.
If you're creating a tmp file, use "&&" before replacing your original with it so you don't zap your original file if the tmp file creation fails for any reason.
I fully support replacing Starbucks with Whatever in Glasgow - I'd like to think they wouldn't have let it open in the first place back in my day (1986 Glasgow Uni Comp Sci alum) :-).
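A self-contained sketch of the temp-file pattern from the notes above, using mktemp instead of a hand-built temporary name. The second record and the values of code, element and value are made up for illustration, and the data file is itself a temporary stand-in for /root/clients.txt.

```shell
#!/bin/bash
# Sketch: edit one field of the matching record, then replace the file
# only if every previous step succeeded.
path=$(mktemp)
printf '%s\n' \
    '19429,(Starbucks),390 Provan Walk,Glasgow,G34 9DL,-4.136909,55.872982' \
    '20000,(Other),1 High St,Glasgow,G1 1AA,-4.25,55.86' > "$path"
code=19429 element=2 value=Whatever

tmp=$(mktemp) &&
awk -v elem="$element" -v id="$code" -v value="$value" '
    BEGIN { FS = OFS = "," } $1 == id { $elem = value } 1
' "$path" > "$tmp" &&
mv "$tmp" "$path"

result=$(cat "$path")
printf '%s\n' "$result"
rm -f "$path"
```

One thing to keep in mind is that mv replaces the original with the temporary file, so the result carries the temporary file's ownership and permissions.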
awk is much easier than sed for processing specific variable fields, but most awk implementations do not have in-place processing. Thus you might do the following:
#!/bin/bash
code=$1
element=$2
value=$3
echo "code is $code"
awk -F ',' -v elem="$element" -v id="$code" -v value="$value" 'BEGIN{OFS=","} /^'"$code"',/{$elem=value}1' mydb > /tmp/mydb.txt &&
mv /tmp/mydb.txt ./mydb
This finds a line starting with code followed by a comma (you could also use $1==id), then sets the elem-th field to value; finally it prints the output, using the comma as output field separator. If nothing matches, it just echoes the input line.
Everything is written to a temporary file, then overwrites the original.
Not very nice but it gets the job done.

how can I supply bash variables as fields for print in awk

I currently am trying to use awk to rearrange a .csv file that is similar to the following:
stack,over,flow,dot,com
and the output would be:
over,com,stack,flow,dot
(or any other order, just using this as an example)
and when it comes time to rearrange the csv file, I have been trying to use the following:
first='$2'
second='$5'
third='$1'
fourth='$3'
fifth='$4'
awk -v a=$first -v b=$second -v c=$third -v d=$fourth -v e=$fifth -F '^|,|$' '{print $a,$b,$c,$d,$e}' somefile.csv
with the intent of awk/print interpreting the $a,$b,$c,etc as field numbers, so it would come out to the following:
{print $2,$5,$1,$3,$4}
and print out the fields of the csv file in that order, but I have not been able to get this to work. I've tried several different methods, and this one seemed the most promising. Could anyone give any suggestions or point out my flaw? I am stumped at this point. Any help would be much appreciated, thanks!
Use simple numbers:
first='2'
second='5'
third='1'
fourth='3'
fifth='4'
awk -v a=$first -v b=$second -v c=$third -v d=$fourth -v e=$fifth -F '^|,|$' \
'{print $a, $b, $c, $d, $e}' somefile.csv
Another way with a shorter example:
aa='$2'
bb='$1'
cc='$3'
awk -F '^|,|$' "{print $aa,$bb,$cc}" somefile.csv
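The two approaches above work for different reasons, which the following sketch tries to make visible. In the first, awk receives a number in the variable a, and $a means "the field whose number is stored in a". In the second, the double quotes let the shell paste the text $2 into the awk program before awk ever runs. The plain -F, separator is a simplification assumed for this example.

```shell
#!/bin/bash
line='stack,over,flow,dot,com'

# awk variables hold field numbers; $a and $b are resolved by awk.
by_number=$(printf '%s\n' "$line" | awk -F, -v OFS=, -v a=2 -v b=5 '{print $a, $b}')

# The shell expands $aa inside double quotes, so awk sees {print $2}.
aa='$2'
by_text=$(printf '%s\n' "$line" | awk -F, "{print $aa}")

printf '%s\n%s\n' "$by_number" "$by_text"
```

Passing '$2' through -v does not work because awk then treats it as an ordinary string value, not as program text.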
You already got the answer to your specific question but have you considered just specifying the order as a string instead of each individual field? For example:
order="2 5 1 3 4"
awk -v order="$order" '
BEGIN{ FS=OFS=","; n=split(order,a," ") }
{ for (i=1;i<n;i++) printf "%s%s",$(a[i]),OFS; print $(a[i]) }
' somefile.csv
That way if you want to add/delete fields or change the order you just trivially rearrange the numbers in the first line instead of having to mess with a bunch of hard-coded variables, etc.
Note that I changed your FS as there was no need for it to be that complicated. Also, you don't need the shell variable "order"; you could just populate the awk variable of the same name explicitly. I only started with the shell variable because you had started with shell variables, so maybe you have a reason.
