Summing row in bash [duplicate] - bash

I am trying to read a file line by line and find the average of the numbers in each line. I am getting the error: expr: non-numeric argument
I have narrowed the problem down to sum=expr $sum + $i, but I'm not sure why the code doesn't work.
while read -a rows
do
for i in "${rows[#]}"
do
sum=`expr $sum + $i`
total=`expr $total + 1`
done
average=`expr $sum / $total`
done < $fileName
The file looks like this (the numbers are separated by tabs):
1 1 1 1 1
9 3 4 5 5
6 7 8 9 7
3 6 8 9 1
3 4 2 1 4
6 4 4 7 7

With some minor corrections, your code runs well:
while read -a rows
do
total=0
sum=0
for i in "${rows[#]}"
do
sum=`expr $sum + $i`
total=`expr $total + 1`
done
average=`expr $sum / $total`
echo $average
done <filename
With the sample input file, the output produced is:
1
5
7
5
2
5
Note that the answers are what they are because expr only does integer arithmetic.
Using sed to preprocess for expr
The above code could be rewritten as:
$ while read row; do expr '(' $(sed 's/ */ + /g' <<<"$row") ')' / $(wc -w<<<$row); done < filename
1
5
7
5
2
5
Using bash's builtin arithmetic capability
expr is archaic. In modern bash:
while read -a rows
do
total=0
sum=0
for i in "${rows[#]}"
do
((sum += $i))
((total++))
done
echo $((sum/total))
done <filename
Using awk for floating point math
Because awk does floating point math, it can provide more accurate results:
$ awk '{s=0; for (i=1;i<=NF;i++)s+=$i; print s/NF;}' filename
1
5.2
7.4
5.4
2.8
5.6

Some variations on the same trick of using the IFS variable.
#!/bin/bash
while read line; do
set -- $line
echo $(( ( $(IFS=+; echo "$*") ) / $# ))
done < rows
echo
while read -a line; do
echo $(( ( $(IFS=+; echo "${line[*]}") ) / ${#line[*]} ))
done < rows
echo
saved_ifs="$IFS"
while read -a line; do
IFS=+
echo $(( ( ${line[*]} ) / ${#line[*]} ))
IFS="$saved_ifs"
done < rows

Others have already pointed out that expr is integer-only, and recommended writing your script in awk instead of shell.
Your system may have a number of tools on it that support arbitrary-precision math, or floats. Two common calculators in shell are bc which follows standard "order of operations", and dc which uses "reverse polish notation".
Either one of these can easily be fed your data such that per-line averages can be produced. For example, using bc:
#!/bin/sh
while read line; do
set - ${line}
c=$#
string=""
for n in $*; do
string+="${string:++}$1"
shift
done
average=$(printf 'scale=4\n(%s) / %d\n' $string $c | bc)
printf "%s // avg=%s\n" "$line" "$average"
done
Of course, the only bc-specific part of this is the format for the notation and the bc itself in the third last line. The same basic thing using dc might look like like this:
#!/bin/sh
while read line; do
set - ${line}
c=$#
string="0"
for n in $*; do
string+=" $1 + "
shift
done
average=$(dc -e "4k $string $c / p")
printf "%s // %s\n" "$line" "$average"
done
Note that my shell supports appending to strings with +=. If yours does not, you can adjust this as you see fit.
In both of these examples, we're printing our output to four decimal places -- with scale=4 in bc, or 4k in dc. We are processing standard input, so if you named these scripts "calc", you might run them with command lines like:
$ ./calc < inputfile.txt
The set command at the beginning of the loop turns the $line variable into positional parameters, like $1, $2, etc. We then process each positional parameter in the for loop, appending everything to a string which will later get fed to the calculator.
Also, you can fake it.
That is, while bash doesn't support floating point numbers, it DOES support multiplication and string manipulation. The following uses NO external tools, yet appears to present decimal averages of your input.
#!/bin/bash
declare -i total
while read line; do
set - ${line}
c=$#
total=0
for n in $*; do
total+="$1"
shift
done
# Move the decimal point over prior to our division...
average=$(($total * 1000 / $c))
# Re-insert the decimal point via string manipulation
average="${average:0:$((${#average} - 3))}.${average:$((${#average} - 3))}"
printf "%s // %0.3f\n" "$line" "$average"
done
The important bits here are:
* declare which tells bash to add to $total with += rather than appending it as if it were a string,
* the two average= assignments, the first of which multiplies $total by 1000, and the second of which splits the result at the thousands column, and
* printf whose format enforces three decimal places of precision in its output.
Of course, input still needs to be integers.
YMMV. I'm not saying this is how you should solve this, just that it's an option. :)

This is a pretty old post, but came up at the top my Google search, so thought I'd share what I came up with:
while read line; do
# Convert each line to an array
ARR=( $line )
# Append each value in the array with a '+' and calculate the sum
# (this causes the last value to have a trailing '+', so it is added to '0')
ARR_SUM=$( echo "${ARR[#]/%/+} 0" | bc -l)
# Divide the sum by the total number of elements in the array
echo "$(( ${ARR_SUM} / ${#ARR[#]} ))"
done < "$filename"

Related

basic calculations in bash [duplicate]

I have this Bash script and I had a problem in line 16.
How can I take the previous result of line 15 and add
it to the variable in line 16?
#!/bin/bash
num=0
metab=0
for ((i=1; i<=2; i++)); do
for j in `ls output-$i-*`; do
echo "$j"
metab=$(cat $j|grep EndBuffer|awk '{sum+=$2} END { print sum/120}') (line15)
num= $num + $metab (line16)
done
echo "$num"
done
For integers:
Use arithmetic expansion: $((EXPR))
num=$((num1 + num2))
num=$(($num1 + $num2)) # Also works
num=$((num1 + 2 + 3)) # ...
num=$[num1+num2] # Old, deprecated arithmetic expression syntax
Using the external expr utility. Note that this is only needed for really old systems.
num=`expr $num1 + $num2` # Whitespace for expr is important
For floating point:
Bash doesn't directly support this, but there are a couple of external tools you can use:
num=$(awk "BEGIN {print $num1+$num2; exit}")
num=$(python -c "print $num1+$num2")
num=$(perl -e "print $num1+$num2")
num=$(echo $num1 + $num2 | bc) # Whitespace for echo is important
You can also use scientific notation (for example, 2.5e+2).
Common pitfalls:
When setting a variable, you cannot have whitespace on either side of =, otherwise it will force the shell to interpret the first word as the name of the application to run (for example, num= or num)
num= 1 num =2
bc and expr expect each number and operator as a separate argument, so whitespace is important. They cannot process arguments like 3+ +4.
num=`expr $num1+ $num2`
Use the $(( )) arithmetic expansion.
num=$(( $num + $metab ))
See Chapter 13. Arithmetic Expansion for more information.
There are a thousand and one ways to do it. Here's one using dc (a reverse Polish desk calculator which supports unlimited precision arithmetic):
dc <<<"$num1 $num2 + p"
But if that's too bash-y for you (or portability matters) you could say
echo $num1 $num2 + p | dc
But maybe you're one of those people who thinks RPN is icky and weird; don't worry! bc is here for you:
bc <<< "$num1 + $num2"
echo $num1 + $num2 | bc
That said, there are some unrelated improvements you could be making to your script:
#!/bin/bash
num=0
metab=0
for ((i=1; i<=2; i++)); do
for j in output-$i-* ; do # 'for' can glob directly, no need to ls
echo "$j"
# 'grep' can read files, no need to use 'cat'
metab=$(grep EndBuffer "$j" | awk '{sum+=$2} END { print sum/120}')
num=$(( $num + $metab ))
done
echo "$num"
done
As described in Bash FAQ 022, Bash does not natively support floating point numbers. If you need to sum floating point numbers the use of an external tool (like bc or dc) is required.
In this case the solution would be
num=$(dc <<<"$num $metab + p")
To add accumulate possibly-floating-point numbers into num.
In Bash,
num=5
x=6
(( num += x ))
echo $num # ==> 11
Note that Bash can only handle integer arithmetic, so if your AWK command returns a fraction, then you'll want to redesign: here's your code rewritten a bit to do all math in AWK.
num=0
for ((i=1; i<=2; i++)); do
for j in output-$i-*; do
echo "$j"
num=$(
awk -v n="$num" '
/EndBuffer/ {sum += $2}
END {print n + (sum/120)}
' "$j"
)
done
echo "$num"
done
I always forget the syntax so I come to Google Search, but then I never find the one I'm familiar with :P. This is the cleanest to me and more true to what I'd expect in other languages.
i=0
((i++))
echo $i;
I really like this method as well. There is less clutter:
count=$[count+1]
#!/bin/bash
read X
read Y
echo "$(($X+$Y))"
You should declare metab as integer and then use arithmetic evaluation
declare -i metab num
...
num+=metab
...
For more information, see 6.5 Shell Arithmetic.
Use the shell built-in let. It is similar to (( expr )):
A=1
B=1
let "C = $A + $B"
echo $C # C == 2
Source: Bash let builtin command
Another portable POSIX compliant way to do in Bash, which can be defined as a function in .bashrc for all the arithmetic operators of convenience.
addNumbers () {
local IFS='+'
printf "%s\n" "$(( $* ))"
}
and just call it in command-line as,
addNumbers 1 2 3 4 5 100
115
The idea is to use the Input-Field-Separator(IFS), a special variable in Bash used for word splitting after expansion and to split lines into words. The function changes the value locally to use word-splitting character as the sum operator +.
Remember the IFS is changed locally and does not take effect on the default IFS behaviour outside the function scope. An excerpt from the man bash page,
The shell treats each character of IFS as a delimiter, and splits the results of the other expansions into words on these characters. If IFS is unset, or its value is exactly , the default, then sequences of , , and at the beginning and end of the results of the previous expansions are ignored, and any sequence of IFS characters not at the beginning or end serves to delimit words.
The "$(( $* ))" represents the list of arguments passed to be split by + and later the sum value is output using the printf function. The function can be extended to add scope for other arithmetic operations also.
#!/usr/bin/bash
#integer numbers
#===============#
num1=30
num2=5
echo $(( num1 + num2 ))
echo $(( num1-num2 ))
echo $(( num1*num2 ))
echo $(( num1/num2 ))
echo $(( num1%num2 ))
read -p "Enter first number : " a
read -p "Enter second number : " b
# we can store the result
result=$(( a+b ))
echo sum of $a \& $b is $result # \ is used to espace &
#decimal numbers
#bash only support integers so we have to delegate to a tool such as bc
#==============#
num2=3.4
num1=534.3
echo $num1+$num2 | bc
echo $num1-$num2 | bc
echo $num1*$num2 |bc
echo "scale=20;$num1/$num2" | bc
echo $num1%$num2 | bc
# we can store the result
#result=$( ( echo $num1+$num2 ) | bc )
result=$( echo $num1+$num2 | bc )
echo result is $result
##Bonus##
#Calling built in methods of bc
num=27
echo "scale=2;sqrt($num)" | bc -l # bc provides support for calculating square root
echo "scale=2;$num^3" | bc -l # calculate power
#!/bin/bash
num=0
metab=0
for ((i=1; i<=2; i++)); do
for j in `ls output-$i-*`; do
echo "$j"
metab=$(cat $j|grep EndBuffer|awk '{sum+=$2} END { print sum/120}') (line15)
let num=num+metab (line 16)
done
echo "$num"
done
Works on MacOS. bc is a command line calculator
#!/bin/bash
sum=0
for (( i=1; i<=5; i++ )); do
sum=$(echo "$sum + 1.1" | bc) # bc: if you want to use decimal
done
echo "Total: $sum"

Average operation using numbers with decimals

I want to get average of numbers with decimals. I wrote this but it's getting me the following error message when i write decimals numbers:
./average.sh: line 10: 1.2: syntax error: invalid arithmetic operator (error token is ".2")
my average operation is:
i=1;
sum=0;
while [[ i -lt 4 ]]
do
read nr
echo "scale=2; $nr" | bc -l
sum=$((sum+nr))
i=$((i+1))
done
echo "scale=2; $sum / 4" | bc -l
How can i modify it in order to accept input with decimals? Thanks.
You can use bc as you do below:
sum=$( echo $sum + $nr | bc -l )
Your code runs fine on my system (bash 4.4.20), but you should modify line 1 of your script to i=0 if you want to read 4 numbers. It would help if you also provide your input along with your script.
However, a more easier way would be using datamash by:
datamash mean 1 # Calculate the mean of the first field
Edit
If you can't use datamash, awk would be good for you:
awk '{sum += $0} END {print sum / NR}'
As dibery says, you need to modify to i=0 if you want to calculate average for 4 numbers.
For reading input with decimals,
read -p 'Enter: ' var
nr="$( printf '%.2f' "$var" )"
not very elegant but could work...
For summing, I think you can't sum directly like integers in bash, so maybe try this
sum="$( echo "$sum+$nr" | bc)"
For printing the output, this is the format I prefer to use
bc -l <<< "$sum/4"
My server doesn't accept "datamash" or "bc" utilitaries and i want to sum numbers with decimals from keyboard, not from file.
I tried some examples, but its don't work:
i=1; sum=0; while [[ i -le 3 ]]; do read nr; awk '{sum+=$nr} END { print "media:"; sum/i}'; i=$((i+1));done

bash getting numbers from a file and make the average of them

Write a script that expects a file as its first argument. Some lines of the
file will consist of integers 0 - 1000.
The script should select the lines matching the previous criteria and print out their average to stdout (average of n integers is their sum divided by n).
And the file given looks like this:
22
78907
77 88 99 0000
need 11 gallons of water
0
roses are red
11
Example output:
11
Explanation: (22 + 11 + 0) / 3 = 11
I have tried already with this code:
#!/bin/bash
sum=0
ind=0
while IFS='' read -r line || [[ -n "$line" ]]; do
if [[ $line =~ ^[a-zA-Z\ ]+$ ]]
then
${sum}=${sum}+${#line}
${ind}=${ind}+1
echo ${sum}
fi
done < "$1"
value=${sum}/${ind}
echo ${value}
the print of this code is always 0/0 and some errors like:
./test1: line 9: 0=0+13: command not found
./test1: line 10: 0=0+1: command not found
Any ideas?
Part of the issue with your script is answered here.. Your variable assignments are incorrect. You only use the $ to refer to a variable that has already been assigned. The assignment process drops the dollar sign.
The other issue you're having is that your arithmetic is not being expressed within an arithmetic expression.
Note that you can use use arithmetic expansion to handle your variables:
if [[ $line =~ ^[a-zA-Z\ ]+$ ]]; then
(( sum += ${#line} ))
(( ind++ ))
printf '%s\n' "$sum"
fi
and later ...
value="$(( sum / ind ))"
printf '%s\n' "$value"
Beware that bash can only deal with integer math, floats are truncated. For more advanced math, consider using bc or dc (which are not built in to bash, they are separate tools that may need to be installed on your system) or another language like awk or perl which can do the same thing with better performance and more precise math.
That said, you can "fake" a couple of decimal places with a few extra lines of code and string manipulation, if you really need to:
$ sum=100; ind=7
$ printf -v x '%d' "$((${sum}00/${ind}))"
$ printf '%d.%d\n' "${x%??}" "${x:$((${#x}-2))}"
14.28
The first printf has division which multiplies the dividend by 100 (by adding two zeroes after it). The resultant quotient is then split with the second printf to insert the decimal point. This is a hack. Use tools that support real math.

Formatting Output From Shell Script [duplicate]

This question already has answers here:
Echo tab characters in bash script
(10 answers)
Closed 5 years ago.
I am working on a shell script that takes stdin or file as input and prints the averages and medians for rows or columns depending on the arguments.
When calculating the averages for the columns, the output needs to print out the following (tabbed):
My output currently looks like this (no spaces or tabs):
Averages:
92480654263
Medians:
6368974
Is there a way to echo out the averages and medians with tabs so each average and median set align left correctly? Here is a sample of how I am printing out the averages:
echo "Averages:"
while read i
do
sum=0
count=0
mean=0
#Cycle through the numbers in the rows
for num in $i
do
#Perform calculations necessary to determine the average and median
sum=$(($sum + $num))
count=`expr $count + 1`
mean=`expr $sum / $count`
done
echo -n "$mean"
done < $1
man echo:
-e enable interpretation of backslash escapes
If -e is in effect, the following sequences are recognized:
\t horizontal tab
I'd try echo -n -e "$mean\t", didn't test it though.
You should use printf. For instance, this will print a value followed by a tab
printf "%s\t" "$mean"
You can actually print several values separated by tabs if you want by adding arguments :
printf "%s\t" "$mean" "$count"
You can use an array expansion to print several values separated by tabs :
printf "%s\t" "${my_array[#]}"
Among advantages of printf over echo is the availability of flexible formatting strings, and the fact that implementations of printf vary less than those of echo among shells and operating systems.
You could try using column command but it does take additional steps:
echo "Averages:"
while read line
do
sum=0
count=0
mean=0
#Cycle through the numbers in the rows
for num in $line
do
#Perform calculations necessary to determine the average and median
(( sum += num ))
(( count++ ))
(( mean = sum / count ))
done
(( mean == 0 )) && out=$mean || out="$out|$mean"
done < $1
echo "$out" | column -s'|' -t
Above is untested as I do not have the original file, but you should get the idea. I would add that the division will also provide truncated values so not exactly accurate.

printing line numbers that are multiple of 5

Hi I am trying to print/echo line numbers that are multiple of 5. I am doing this in shell script. I am getting errors and unable to proceed. below is the script
#!/bin/bash
x=0
y=$wc -l $1
while [ $x -le $y ]
do
sed -n `$x`p $1
x=$(( $x + 5 ))
done
When executing above script i get below errors
#./echo5.sh sample.h
./echo5.sh: line 3: -l: command not found
./echo5.sh: line 4: [: 0: unary operator expected
Please help me with this issue.
For efficiency, you don't want to be invoking sed multiple times on your file just to select a particular line. You want to read through the file once, filtering out the lines you don't want.
#!/bin/bash
i=0
while IFS= read -r line; do
(( ++i % 5 == 0 )) && echo "$line"
done < "$1"
Demo:
$ i=0; while read line; do (( ++i % 5 == 0 )) && echo "$line"; done < <(seq 42)
5
10
15
20
25
30
35
40
A funny pure Bash possibility:
#!/bin/bash
mapfile ary < "$1"
printf "%.0s%.0s%.0s%.0s%s" "${ary[#]}"
This slurps the file into an array ary, which each line of the file in a field of the array. Then printf takes care of printing one every 5 lines: %.0s takes a field, but does nothing, and %s prints the field. Since mapfile is used without the -t option, the newlines are included in the array. Of course this really slurps the file into memory, so it might not be good for huge files. For large files you can use a callback with mapfile:
#!/bin/bash
callback() {
printf '%s' "$2"
ary=()
}
mapfile -c 5 -C callback ary < "$1"
We're removing all the elements of the array during the callback, so that the array doesn't grow too large, and the printing is done on the fly, as the file is read.
Another funny possibility, in the spirit of glenn jackmann's solution, yet without a counter (and still pure Bash):
#!/bin/bash
while read && read && read && read && IFS= read -r line; do
printf '%s\n' "$line"
done < "$1"
Use sed.
sed -n '0~5p' $1
This prints every fifth line in the file starting from 0
Also
y=$wc -l $1
wont work
y=$(wc -l < $1)
You need to create a subshell as bash will see the spaces as the end of the assignment, also if you just want the number its best to redirect the file into wc.
Dont know what you were trying to do with this ?
x=$(( $x + 5 ))
Guessing you were trying to use let, so id suggest looking up the syntax for that command. It would look more like
(( x = x + 5 ))
Hope this helps
There are cleaner ways to do it, but what you're looking for is this.
#!/bin/bash
x=5
y=`wc -l $1`
y=`echo $y | cut -f1 -d\ `
while [ "$y" -gt "$x" ]
do
sed -n "${x}p" "$1"
x=$(( $x + 5 ))
done
Initialize x to 5, since there is no "line zero" in your file $1.
Also, wc -l $1 will display the number of line counts, followed by the name of the file. Use cut to strip the file name out and keep just the first word.
In conditionals, a value of zero can be interpreted as "true" in Bash.
You should not have space between your $x and your p in your sed command. You can put them right next to each other using curly braces.
You can do this quite succinctly using awk:
awk 'NR % 5 == 0' "$1"
NR is the record number (line number in this case). Whenever it is a multiple of 5, the expression is true, so the line is printed.
You might also like the even shorter but slightly less readable:
awk '!(NR%5)' "$1"
which does the same thing.

Resources