shell: write integer division result to a variable and print floating number

I'm trying to write a shell script and plan to calculate a simple division using two variables inside the script. I couldn't get it to work. It's some kind of syntax error.
Here is part of my code, named test.sh
awk '{a+=$5} END {print a}' $variable1 > casenum
awk '{a+=$5} END {print a}' $variable2 > controlnum
score=$(echo "scale=4; $casenum/$controlnum" | bc)
printf "%s\t%s\t%.4f\n", $variable3 $variable4 $score
It's just the $score that doesn't work.
I tried to use either
sh test.sh
or
bash test.sh
but neither worked. The error message is:
(standard_in) 1: syntax error
Does anyone know how to make it work? Thanks so much!

You are outputting to files, not to vars. For this, you need var=$(command). Hence, this should make it:
casenum=$(awk '{a+=$5} END {print a}' "$variable1")
controlnum=$(awk '{a+=$5} END {print a}' "$variable2")
score=$(echo "scale=4; $casenum/$controlnum" | bc)
printf "%s\t%s\t%.4f\n" "$variable3" "$variable4" "$score"
Note that $variable1 and $variable2 must hold file names. If they hold something else, please say so in the question.

First, $variable1 and $variable2 must each expand to the name of an existing file. That is not a syntax error in itself, but the code is wrong otherwise, unless you really do mean to sum the fifth field of those files and write each total to a file. Since casenum and controlnum are never assigned (the awk results are written to files named casenum and controlnum, not stored in variables), your score computation expands to
score=$(echo "scale=4; /" | bc)
which is invalid; this is where the syntax error comes from.
Then there is the same question for $variable3 and $variable4: do they hold a value? Have you assigned them with something like
variable=...
? Otherwise they expand to empty strings. Fixing these points (including assigning casenum and controlnum) will fix everything, since the only syntax error comes from bc trying to interpret / without operands. (Also, the comma after the printf format string is not needed.)
The way you assign the output of execution of a command to a variable is
var=$(command)
or
var=`command`
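As a minimal sketch of the command substitution form, assuming two throwaway sample files created under mktemp (the file names and numbers are made up for illustration), the sums can be captured in variables and the divisor checked before dividing. The division here uses awk instead of bc only so that the sketch depends on a single tool; the bc pipeline from the question works the same way once both variables are set.

```shell
#!/bin/sh
# Hypothetical sample data: the 5th column of each file is summed.
tmpdir=$(mktemp -d)
printf 'a b c d 3\na b c d 5\n' > "$tmpdir/cases.txt"
printf 'a b c d 10\na b c d 6\n' > "$tmpdir/controls.txt"

# Capture each sum in a variable instead of redirecting it to a file.
# "a+0" forces a numeric result even if the file is empty.
casenum=$(awk '{a+=$5} END {print a+0}' "$tmpdir/cases.txt")
controlnum=$(awk '{a+=$5} END {print a+0}' "$tmpdir/controls.txt")

# An empty or zero divisor is what produces errors like the one from bc.
if [ -z "$controlnum" ] || [ "$controlnum" = 0 ]; then
    echo "controlnum is empty or zero; cannot divide" >&2
    score=NA
else
    score=$(awk -v a="$casenum" -v b="$controlnum" 'BEGIN {printf "%.4f", a/b}')
fi

printf '%s\t%s\t%s\n' "$casenum" "$controlnum" "$score"
rm -rf "$tmpdir"
```

Quoting the file name expansions also keeps the script working if a path contains spaces.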

If I understand your commands properly, you could combine calculation of score with a single awk statement as follows
score=$(awk 'NR==FNR {a+=$5; next} {b+=$5} END {printf "%.4f", a/b}' $variable1 $variable2)
This assumes that $variable1 and $variable2 are valid file names.
Refer to @fedorqui's solution if you want to stick to your approach of two awk calls and one bc call.
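A small runnable sketch of the single awk approach, assuming two made-up sample files (their names and contents are only for illustration) and adding a guard against a zero divisor that the one-liner above does not have:

```shell
#!/bin/sh
# Hypothetical sample files; column 5 holds the numbers to sum.
tmpdir=$(mktemp -d)
printf 'x x x x 1\nx x x x 2\n' > "$tmpdir/case.txt"
printf 'x x x x 4\nx x x x 8\n' > "$tmpdir/control.txt"

# NR==FNR is true only while awk reads the first file, so "a" sums the
# first file and "b" sums the second one.
score=$(awk 'NR==FNR {a+=$5; next}
             {b+=$5}
             END {if (b) printf "%.4f", a/b; else print "NA"}' \
        "$tmpdir/case.txt" "$tmpdir/control.txt")

echo "$score"
rm -rf "$tmpdir"
```

One caveat with the NR==FNR idiom: if the first file is empty, the lines of the second file satisfy NR==FNR too and are added to the wrong sum.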

Related

Unable to use associative array value in sed or awk

I am trying to iteratively search and replace strings in a file using a variable input and replacement string. I have tried using sed and awk and have seemed to determine that it is actually the associative array value that is giving me issues(?).
I am looking at an associative array like this:
declare -A speedReplaceValuePairsText
speedReplaceValuePairsText["20"]="xthirtyx"
speedReplaceValuePairsText["30"]="xfiftyx"
speedReplaceValuePairsText["40"]="xsixtyx"
speedReplaceValuePairsText["50"]="xeightyx"
speedReplaceValuePairsText["60"]="xhundredx"
and for ease I was declaring my replacement vars first:
for speedBeforeValue in "${!speedReplaceValuePairsText[@]}";
do
findValue=${speedBeforeValue}
replaceWithValue=${speedReplaceValuePairsText[$speedBeforeValue]}
#replaceWithValue="blah"
echo " Replacing $findValue with $replaceWithValue..."
awk -v srch="$findValue" -v repl="$replaceWithValue" '{gsub(srch,repl); print}' infile.txt > outfile.txt
#sed 's/'"$findValue"'/'"$replaceWithValue"'/g' infile.txt > outfile.txt
#sed "s/$findValue/$replaceWithValue/g" $scriptDir/$currentFileName > outfile.txt
done
The commented out lines are alternate versions of what I have tried with similar inbetween versions.
I have tried using just a normal string (the commented out "blah") and that works fine.
The weirdest part is that the echo statement displays the right value for both key and value.
I have tried so many combinations I am losing my mind. Please someone tell me I am doing something dumb here.
NOTE: This is nested inside another loop but I do not believe this to be an issue, let me know if I am wrong
EDIT: I have simplified the in and out files. To clarify: if I try to use my associative array value, nothing gets replaced, but if I use a dummy string like "blah" it works.
BONUS: I have marked the answer below, but my search and replace values start and end in double quotes, and no matter what I try it replaces all instances of 60. How can I make it replace "60" with "xsixtyx"?
Thanks
I think you want to use >> instead of > inside your loop?
awk -v srch="$findValue" -v repl="$replaceWithValue" '{gsub(srch,repl); print}' $scriptDir/$currentFileName >> ./$outputFolderName/$currentFileName
I ran your code and it works as expected apart from that >, which truncates the output file on every pass through the loop.
Or if you just want to see the replaced results
awk -v srch="$findValue" -v repl="$replaceWithValue" '{ if (gsub(srch,repl)) print}' $scriptDir/$currentFileName >> ./$outputFolderName/$currentFileName
For a file with
30
20
60
the output for the second case looks like
xthirtyx
xhundredx
xfiftyx
Here is the full bash script I tried
#!/bin/bash
declare -A speedReplaceValuePairsText
speedReplaceValuePairsText["20"]="xthirtyx"
speedReplaceValuePairsText["30"]="xfiftyx"
speedReplaceValuePairsText["40"]="xsixtyx"
speedReplaceValuePairsText["50"]="xeightyx"
speedReplaceValuePairsText["60"]="xhundredx"
for speedBeforeValue in "${!speedReplaceValuePairsText[@]}";
do
findValue=${speedBeforeValue}
replaceWithValue=${speedReplaceValuePairsText[$speedBeforeValue]}
echo " Replacing $findValue with $replaceWithValue..."
awk -v srch="$findValue" -v repl="$replaceWithValue" '{if (gsub(srch,repl)) print}' test.txt >> /tmp/test.txt
done
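For the case where every replacement should end up in one copy of the file, and where only the double-quoted values should match, one possible sketch is to run each awk pass on the output of the previous one. It makes several assumptions: the pairs are held in a plain key=value list rather than the associative array (to keep the sketch in POSIX sh), the sample input is made up, and the replacement text contains no & or backslash, which gsub would treat specially.

```shell
#!/bin/sh
# Hypothetical input: quoted speeds plus an unquoted 160 that must survive.
tmpdir=$(mktemp -d)
printf '"60" 160\n"20" "30"\n' > "$tmpdir/infile.txt"

# Start from a copy so each pass sees the result of the previous pass.
cp "$tmpdir/infile.txt" "$tmpdir/outfile.txt"

for pair in 20=xthirtyx 30=xfiftyx 60=xhundredx; do
    # Wrap both sides in literal double quotes so only "60" matches, not 160.
    findValue="\"${pair%%=*}\""
    replaceWithValue="\"${pair#*=}\""
    awk -v srch="$findValue" -v repl="$replaceWithValue" \
        '{gsub(srch, repl); print}' "$tmpdir/outfile.txt" > "$tmpdir/outfile.tmp" &&
        mv "$tmpdir/outfile.tmp" "$tmpdir/outfile.txt"
done

result=$(cat "$tmpdir/outfile.txt")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Writing to a temporary file and renaming it avoids reading and truncating the same file in one command.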

awk output is acting weird

cat TEXT | awk -v var=$i -v varB=$j '$1~var , $1~varB {print $1}' > PROBLEM HERE
I am passing two variables from an array to parse a very large text file by range. And it works, kind of.
If I use ">" the output to the file will ONLY be the last three lines as verified by cat and a text editor.
If I use ">>" the output to the file will include one complete read of TEXT and then it will divide the second read into the ranges I want.
If I let the output go through to the shell I get the same problem as above.
Question:
It appears awk is reading every line and printing it. Then it goes back and selects the ranges from the TEXT file. It does not do this if I use constants in the range pattern search.
I understand awk must read all lines to find the ranges I request.
Why is it printing the entire document?
How can I get it to ONLY print the ranges selected?
This is the last hurdle in a big project and I am beating my head against the table.
Thanks!
Give this a try; you didn't pass varB the right way:
yours: awk -v var="$i" -varB="$j" ...
mine : awk -v var="$i" -v varB="$j" ...
                       ^^
Aside from the typo, you can't use variables inside /.../; you have to use an explicit ~ match instead. Also quote your shell variables (not strictly needed here, but it sets a good example). For example
seq 1 10 | awk -v b="3" -v e="5" '$0 ~ b, $0 ~ e'
should print 3 through 5, as expected.
It sounds like this is what you want:
awk -v var="foo" -v varB="bar" '$1~var{f=1} f{print $1} $1~varB{f=0}' file
e.g.
$ cat file
1
2
foo
3
4
bar
5
foo
6
bar
7
$ awk -v var="foo" -v varB="bar" '$1~var{f=1} f{print $1} $1~varB{f=0}' file
foo
3
4
bar
foo
6
bar
but without sample input and expected output it's just a guess and this would not address the SHELL behavior you are seeing wrt use of > vs >>.
Here's what happened. I used an array to input into my variables. I set the counter for what I thought was the total length of the array. When the final iteration of the array was reached, there was a null value returned to awk for the variable. This caused it to print EVERYTHING. Once I correctly had a counter with the correct number of array elements the printing oddity ended.
As far as the > vs >> goes, I don't know. It did stop, but I wasn't as careful in documenting it. I think what happened is that I used $1 in the print command to save time, and with each line it printed at the end it erased the whole file and left the last three identical matches. Something to ponder. Thanks Ed for the honest work. And no thank you to Robo responses.
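The empty-variable effect described above follows from the fact that an empty regular expression matches every line, so a range whose patterns are empty prints everything. A minimal sketch of a guard, assuming a hypothetical helper named print_range that reads its input from stdin:

```shell
#!/bin/sh
# print_range is a hypothetical helper: $1 = start pattern, $2 = end pattern.
print_range() {
    awk -v var="$1" -v varB="$2" '
        # Refuse to run with an empty pattern, which would match every line.
        BEGIN { if (var == "" || varB == "") exit 1 }
        $1 ~ var, $1 ~ varB { print $1 }
    '
}

# Normal use: prints the lines from the first match of 3 to the next match of 5.
out=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | print_range 3 5)
printf '%s\n' "$out"

# An empty end pattern is rejected instead of printing the whole input.
rc=0
empty_out=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | print_range 3 "") || rc=$?
echo "exit status with empty pattern: $rc"
```

Checking the loop counter against the real array length, as the asker eventually did, removes the cause; the guard only makes the failure visible.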

Printing lines which have a field number greater than, in AWK

I am writing a script in bash which takes a parameter and stores it:
threshold = $1
I then have sample data that looks something like:
5 blargh
6 tree
2 dog
1 fox
9 fridge
I wish to print only the lines which have their number greater than the number which is entered as the parameter (threshold).
I am currently using:
awk '{print $1 > $threshold}' ./file
But nothing prints out, help would be appreciated.
You're close, but it needs to be more like this:
$ threshold=3
$ awk -v threshold="$threshold" '$1 > threshold' file
Creating a variable with -v avoids the ugliness of trying to expand shell variables within an awk script.
EDIT:
There are a few problems with the current code you've shown. The first is that your awk script is single quoted (good), which stops $threshold from expanding, and so the value is never inserted in your script. Second, your condition belongs outside the curly braces, which would make it:
$1 > threshold { print }
This works, but the print is not necessary (it's the default action), which is why I shortened it to
$1 > threshold
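Putting the pieces together as a small sketch, using the sample data from the question in a temporary file and a fixed threshold of 3 standing in for the script parameter. Note that a shell assignment takes no spaces around the = sign, so the line in the script would be threshold=$1.

```shell
#!/bin/sh
# In the real script this would be: threshold=$1   (no spaces around =)
threshold=3

# Sample data from the question, written to a temporary file.
tmpfile=$(mktemp)
printf '5 blargh\n6 tree\n2 dog\n1 fox\n9 fridge\n' > "$tmpfile"

# The condition sits outside the braces; printing the line is the default action.
matches=$(awk -v threshold="$threshold" '$1 > threshold' "$tmpfile")
printf '%s\n' "$matches"

rm -f "$tmpfile"
```

Inside the braces, print $1 > $threshold would be parsed by awk as output redirection rather than a comparison, which is why the original command printed nothing to the terminal.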

Different output for pipe in script vs. command line

I have a directory with files that I want to process one by one and for which each output looks like this:
==== S=721 I=47 D=654 N=2964 WER=47.976% (1422)
Then I want to calculate the average percentage (column 6) by piping the output to AWK. I would prefer to do this all in one script and wrote the following code:
for f in $dir; do
echo -ne "$f "
process $f
done | awk '{print $7}' | awk -F "=" '{sum+=$2}END{print sum/NR}'
When I run this several times, I often get different results although in my view nothing really changes. The result is almost always incorrect though.
However, if I only put the for loop in the script and pipe to AWK on the command line, the result is always the same and correct.
What is the difference and how can I change my script to achieve the correct result?
I'm guessing a little about what you're trying to do; without more details it's hard to say exactly what is going wrong.
for f in $dir; do
unset TEMPVAR
echo -ne "$f "
TEMPVAR=$(process $f | awk '{print $7}')
ARRAY+=($TEMPVAR)
done
I would append all your values to an array inside your for loop. Now all your percentages are in the array ARRAY (expand it with "${ARRAY[@]}"). It should be easy to calculate the average value, using whatever tool you like.
This will also help you troubleshoot. If you get too few elements in the array (${#ARRAY[@]}), then you will know where your loop is terminating early.
# To get the percentage of all files
Percs=$(sed -r 's/.*WER=([[:digit:].]*).*/\1/' *)
# The divisor
Lines=$(wc -l <<< "$Percs")
# To change new lines into spaces
P=$(echo $Percs)
# Run it once without the "| bc" to see the expression that gets evaluated
echo "scale=3; (${P// /+})/$Lines" | bc
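Another option, sketched here under assumptions, is to average in a single awk pass that looks for the WER= field wherever it appears, so the result does not depend on whether a file name was printed in front of the line. The first sample line is the one from the question with a made-up file name prefixed; the second line is invented for illustration.

```shell
#!/bin/sh
# Two sample result lines; the file names and the second line are hypothetical.
avg=$(printf '%s\n' \
        'a.txt ==== S=721 I=47 D=654 N=2964 WER=47.976% (1422)' \
        'b.txt ==== S=100 I=10 D=20 N=1000 WER=13.000% (130)' |
      awk '{
               # Find the WER= field by name instead of by column number.
               for (i = 1; i <= NF; i++)
                   if ($i ~ /^WER=/) {
                       v = $i
                       sub(/^WER=/, "", v)   # drop the label
                       sub(/%$/, "", v)      # drop the percent sign
                       sum += v
                       n++
                   }
           }
           END { if (n) printf "%.3f\n", sum / n }')
echo "$avg"
```

Dividing by the number of values actually found, rather than by NR, keeps stray or empty lines from skewing the average.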

Meaning of this shell script line with awk

I read this line of script in the book Linux Device Drivers. What does it do?
major=$(awk "\\$2= =\"$module\" {print \\$1}" /proc/devices)
as in context:
#!/bin/sh
module="scull"
device="scull"
mode="664"
# invoke insmod with all arguments we got
# and use a pathname, as newer modutils don't look in . by default
/sbin/insmod ./$module.ko $* || exit 1
# remove stale nodes
rm -f /dev/${device}[0-3]
major=$(awk "\\$2= =\"$module\" {print \\$1}" /proc/devices)
mknod /dev/${device}0 c $major 0
....
A better way to write this would be:
major=$(awk -v mod="$module" '$2==mod{print $1}' /proc/devices)
I read this too but that line was not working for me. I had to modify it to
major=$(awk "\$2 == \"$module\" {print \$1}" /proc/devices)
The first part \$2 == \"$module\" is the pattern. When this is satisfied, that is, the second column is equal to "scull", the command print \$1 is executed which prints the first column. This value is stored in the variable major.
Each $ needs to be escaped so that the shell passes it to awk unchanged instead of expanding it.
/proc/devices contains the currently configured character and block devices for each module.
Expanding a few variables in your context, and fixing the syntax error in the equality, the command looks like this:
awk '$2=="scull" {print $1}' /proc/devices
This means "if the value of the second column is scull, then output the first column."
This command is run inside a command substitution, $(...), and its output is assigned to the variable major.
The explanation of the purpose is in the book:
The script to load a module that has been assigned a dynamic number can, therefore, be written using a tool such as awk to retrieve information from /proc/devices in order to create the files in /dev.
Note that in the distributed examples, the line in scull_load matches Vivek's correction.
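To try the lookup without loading a module, here is a minimal sketch that runs the same awk program against a made-up stand-in for /proc/devices. The sample contents, including the major number 254, are hypothetical.

```shell
#!/bin/sh
module="scull"

# Hypothetical stand-in for /proc/devices with a scull entry.
tmpfile=$(mktemp)
printf 'Character devices:\n  1 mem\n  4 tty\n254 scull\n\nBlock devices:\n  8 sd\n' > "$tmpfile"

# Pass the module name with -v so no shell escaping is needed inside the
# single-quoted awk program.
major=$(awk -v mod="$module" '$2 == mod {print $1}' "$tmpfile")
echo "major number for $module: $major"

rm -f "$tmpfile"
```

On a real system the last argument would be /proc/devices, and major stays empty if the module is not loaded, so it is worth checking before calling mknod.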

Resources