Floating point results in Bash integer division - bash

I have a backup script on my server which does cron jobs of backups, and sends me a summary of files backed up, including the size of the new backup file. As part of the script, I'd like to divide the final size of the file by (1024^3) to get the file size in GB, from the file size in bytes.
Since bash does not have floating point calculation, I am trying to use pipes to bc to get the result, however I'm getting stumped on basic examples.
I tried computing an approximation of pi to a given scale. The following interactive session works:
~ #bc -l
bc 1.06.95
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
4/3
1.33333333333333333333
22/7
3.14285714285714285714
q
0
quit
A non-interactive version does not work:
#echo $(( 22/7 )) | bc
3
This works:
#echo '22/7' | bc -l
3.14285714285714285714
But I need to use variables, so it doesn't help that the following does not work:
#a=22 ; b=7
#echo $(( a/b )) | bc -l
3
I'm obviously missing something in the syntax for using variables in Bash, and could use some pointers on what I've misunderstood.
As DigitalRoss said, I can use the following:
#echo $a / $b | bc -l
3.14285714285714285714
However, I can't use complex expressions like:
#echo $a / (( $b-34 )) | bc -l
-bash: syntax error near unexpected token `('
#echo $a / (( b-34 )) | bc -l
-bash: syntax error near unexpected token `('
Can someone give me the correct syntax for getting floating point results with complicated arithmetic expressions?

Just double-quote (") the expression:
echo "$a / ( $b - 34 )" | bc -l
Then bash expands the $ variables and leaves everything else alone, so bc sees an expression with parentheses:
$ a=22
$ b=7
$ echo "$a / ( $b - 34 )"
22 / ( 7 - 34 )
$ echo "$a / ( $b - 34 )" | bc -l
-.81481481481481481481

Please note that your echo $(( 22/7 )) | bc -l actually makes bash calculate 22/7 and then send the result to bc. The integer output is therefore not the result of bc, but simply the input given to bc.
Try echo $(( 22/7 )) without piping it to bc, and you'll see.
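To make the truncation visible, and as a fallback when bc is not available, here is a minimal fixed-point sketch in pure bash. The function name fixed_div and the choice of three decimal places are assumptions made for illustration; it only handles non-negative integers and truncates rather than rounds.

```shell
# fixed_div A B: print A/B truncated to 3 decimals using only integer arithmetic.
# Hypothetical helper; assumes A >= 0 and B > 0.
fixed_div() {
    local a=$1 b=$2
    local whole=$(( a / b ))                 # integer part, as bash computes it
    local frac=$(( (a % b) * 1000 / b ))     # remainder scaled to thousandths
    printf '%d.%03d\n' "$whole" "$frac"
}

echo $(( 22 / 7 ))   # bash alone: 3
fixed_div 22 7       # 3.142
```

For anything beyond simple ratios, handing the whole quoted expression to bc or awk is usually less error-prone than scaling by hand.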

The scale variable determines the number of digits after the decimal point:
$ bc
scale=2
3/4
.75

I would prefer awk over bc: it does the same thing in one command and gives you more flexibility to pass in variables and format the output with printf:
# Define vars in the command
awk -v a=3 -v b=2 'BEGIN{print a/b}'
1.5
# Define shell vars earlier and use them to initialize awk vars
c=3
d=2
awk -v a="$c" -v b="$d" 'BEGIN{print a/b}'
1.5
# Use vars that are defined in script
a=3
b=2
awk 'BEGIN{print '$a'/'$b'}'
1.5
# Format your output using C printf syntax
awk -v a=3 -v b=2 'BEGIN{printf("%.3f\n", a/b)}'
1.500
Also, bc does not return a nonzero exit status when it divides by zero, so you can't check for the error:
echo 3/0 | bc -l
Runtime error (func=(main), adr=5): Divide by zero
# The exit status is zero, which normally means there was no error
echo $?
0
awk, on the other hand, exits with status 2:
awk -v a=3 -v b=0 'BEGIN{print a/b}'
awk: cmd. line:1: fatal: division by zero attempted
# awk exited with status 2, which indicates that something went wrong
echo $?
2
The exit status can be used to check for division by zero like this:
# Set your own vars
if output=$(awk -v a=3 -v b=0 'BEGIN{print a/b}' 2> /dev/null); then
    echo "$output"
else
    echo "error, division by zero"
fi
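Applied to the original problem of turning a byte count into gigabytes, the awk approach might look like the sketch below. The function name bytes_to_gib and the two-decimal format are assumptions for illustration, not part of the original script.

```shell
# bytes_to_gib BYTES: print BYTES / 1024^3 with two decimal places.
# Hypothetical helper name; the two-decimal format is an arbitrary choice.
bytes_to_gib() {
    awk -v bytes="$1" 'BEGIN { printf "%.2f\n", bytes / (1024 ^ 3) }'
}

size=1610612736          # example size in bytes (1.5 * 1024^3)
bytes_to_gib "$size"     # 1.50
```

In a backup script the byte count could come from something like wc -c < "$file", with the result stored via command substitution.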

You can handle the division-by-zero check directly in awk:
for a in 19 29 31; do
for b in 11 3 0; do
gawk -v PREC=512 -Mbe '$++NF= +(_=$NF) ? $(!!_)/_ : "div_by_zero"' \
\
CONVFMT='%.59g' OFS=' \t| ' <<< "${a} ${b}"; done; done
19 | 11 | 1.7272727272727272727272727272727272727272727272727272727273
19 | 3 | 6.3333333333333333333333333333333333333333333333333333333333
19 | 0 | div_by_zero
29 | 11 | 2.6363636363636363636363636363636363636363636363636363636364
29 | 3 | 9.6666666666666666666666666666666666666666666666666666666667
29 | 0 | div_by_zero
31 | 11 | 2.8181818181818181818181818181818181818181818181818181818182
31 | 3 | 10.333333333333333333333333333333333333333333333333333333333
31 | 0 | div_by_zero
If you don't need all that GMP precision, mawk will simply return infinity instead of aborting with a fatal error message:
for a in 19 29 31; do for b in 11 3 0; do
mawk '$++NF=$++_/$(_+_--)' CONVFMT='%.19g' OFS='\t' <<<"$a $b";done;done
19 11 1.727272727272727293
19 3 6.333333333333333037
19 0 inf
29 11 2.636363636363636243
29 3 9.666666666666666075
29 0 inf
31 11 2.818181818181818343
31 3 10.33333333333333393
31 0 inf
Or better yet, do it with a single call to awk instead of invoking it once per pair:
for a in 19 29 31; do for b in 11 3 0; do
echo "${a} ${b}"
done; done | mawk '$++NF = $(++_) / $(_+_--)' CONVFMT='%.19g' OFS='\t'
19 11 1.727272727272727293
19 3 6.333333333333333037
19 0 inf
29 11 2.636363636363636243
29 3 9.666666666666666075
29 0 inf
31 11 2.818181818181818343
31 3 10.33333333333333393
31 0 inf
Or, if you prefer, have mawk call gawk with GMP support indirectly:
printf '22 7\n22 4\n22 0\n' |
mawk '$++NF = substr(_=__="", (__="gawk -v PREC=65536 -Mbe"\
" \47BEGIN { printf(\"%.127f\","(+(_=$(NF-!_))\
? "("($!__)")/("(_)")" : (+(_=$!__)<-_?__:"-") \
"log(_<_)")") } \47" ) | getline _, close(__))_'
22 7 3.1428571428571428571428571428571428571428571428571428………
22 4 5.5000000000000000000000000000000000000000000000000000………
22 0 +inf
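The one-liners above are deliberately terse. A more readable sketch of the same idea, checking the divisor inside awk before dividing, could look like this. The function name safe_div and the four-decimal output format are assumptions for illustration.

```shell
# safe_div: read "a b" pairs on stdin, print a, b and a/b,
# or the text div_by_zero when b is 0. Hypothetical helper name.
safe_div() {
    awk '{
        if ($2 + 0 == 0)
            print $1, $2, "div_by_zero"
        else
            printf "%s %s %.4f\n", $1, $2, $1 / $2
    }'
}

printf '%s\n' '19 11' '19 3' '19 0' | safe_div
# 19 11 1.7273
# 19 3 6.3333
# 19 0 div_by_zero
```

This uses ordinary double-precision arithmetic, so it does not reproduce the arbitrary-precision output of gawk -M.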

Related

Sorting tab delimited numbers by column with pure bash script.

I'm stuck on some homework. The requirements of the assignment are to accept an input file and perform some statistics on the values. The user may specify whether to calculate the statistics by row or by value. The shell script must be a pure bash script, so I can't use awk, sed, perl, python etc.
sample input:
1 1 1 1 1 1 1
39 43 4 3225 5 2 2
6 57 8 9 7 3 4
3 36 8 9 14 4 3
3 4 2 1 4 5 5
6 4 4814 7 7 6 6
I can't figure out how to sort and process the data by column. My code for processing the rows works fine.
# CODE FOR ROWS
while read -r line
do
    echo $(printf "%d\n" $line | sort -n) | tr ' ' \\t > sorted.txt
    ....
    # I perform the stats calculations
    # for row line by working with the temp file sorted.txt
done
How could I process this data by column? I've never worked with shell scripts before, so I've been staring at this for hours.
If you want to analyze by column, you first need the number of columns. head -n 1 gives you the first row, and awk's NF is the number of fields in it, which is the number of columns:
cols=$(head -n 1 input.txt | awk '{print NF}')
Then you can use cut with the '\t' delimiter to grab every column from input.txt, and run it through sort -n, as you did in your original post.
$ for i in $(seq 1 "$cols"); do cut -f"$i" -d$'\t' input.txt; done | sort -n > output.txt
For rows, you can use the shell built-in printf with the format specifier %d for integers. The sort command works on lines of input, so we replace spaces ' ' with newlines \n using the tr command:
$ cat input.txt | while read line; do echo $(printf "%d\n" $line); done | tr ' ' '\n' | sort -n > output.txt
Now take the output file to gather our statistics:
Min: cat output.txt | head -n 1
Max: cat output.txt | tail -n 1
Sum: (courtesy of Dimitre Radoulov): cat output.txt | paste -sd+ - | bc
Mean: (courtesy of porges): cat output.txt | awk '{ total += $1 } END { print total/NR }'
Median: (courtesy of maxschlepzig): cat output.txt | awk ' { a[i++]=$1; } END { print a[int(i/2)]; }'
Histogram: cat output.txt | uniq -c
8 1
3 2
4 3
6 4
3 5
4 6
3 7
2 8
2 9
1 14
1 36
1 39
1 43
1 57
1 3225
1 4814
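Because the assignment rules out awk and similar tools, the column-wise access itself can also be sketched in pure bash with arrays. The function name col_sums is hypothetical, and the sketch only shows how to walk the data by column (here, summing each one); it is not the full set of statistics.

```shell
# col_sums: read whitespace-separated integer rows on stdin and print
# the sum of each column on one line. Hypothetical helper; pure bash.
col_sums() {
    local -a row sums=()
    local i
    while read -r -a row; do
        for i in "${!row[@]}"; do
            sums[i]=$(( ${sums[i]:-0} + row[i] ))
        done
    done
    echo "${sums[*]}"
}

printf '1 2 3\n4 5 6\n' | col_sums   # 5 7 9
```

The same loop structure could collect each column's values into its own list for sorting; only the body of the inner loop would change.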

Dividing one file into separate based on line numbers

I have the following test file:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
I want to separate it in a way that each file contains the last line of the previous file as the first line. The example would be:
file 1:
1
2
3
4
5
file2:
5
6
7
8
9
file3:
9
10
11
12
13
file4:
13
14
15
16
17
file5:
17
18
19
20
That would make 4 files with 5 lines and 1 file with 4 lines.
As a first step, I tested the following commands, which I wrote to produce only the first file, containing the first 5 lines. I can't figure out why the awk command in the if statement prints all 20 lines instead of the first 5.
d=$(wc test)
a=$(echo $d | cut -f1 -d " ")
lines=$(echo $a/5 | bc -l)
integer=$(echo $lines | cut -f1 -d ".")
for i in $(seq 1 $integer); do
start=$(echo $i*5 | bc -l)
var=$((var+=1))
echo start $start
echo $var
if [[ $var = 1 ]]; then
awk 'NR<=$start' test
fi
done
Thanks!
Why not just use the split utility from your POSIX toolkit? It has an option to split on a number of lines, which you can set to 5:
split -l 5 input-file
From the split man page:
-l, --lines=NUMBER
put NUMBER lines/records per output file
Note that -l is also POSIX-compliant.
$ ls
$
$ seq 20 | awk 'NR%4==1{ if (out) { print > out; close(out) } out="file"++c } {print > out}'
$
$ ls
file1 file2 file3 file4 file5
.
$ cat file1
1
2
3
4
5
$ cat file2
5
6
7
8
9
$ cat file3
9
10
11
12
13
$ cat file4
13
14
15
16
17
$ cat file5
17
18
19
20
If you're ever tempted to use a shell loop to manipulate text again, make sure to read https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice first to understand at least some of the reasons to use awk instead. To learn awk, get the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
Oh, and as for why your awk command awk 'NR<=$start' test didn't work: awk is not shell, and it has no more access to shell variables (or vice versa) than a C program does. To initialize an awk variable named awkstart with the value of a shell variable named start and then use that awk variable in your script, you'd do awk -v awkstart="$start" 'NR<=awkstart' test. The awk variable can also be named start or anything else sensible; it is completely unrelated to the name of the shell variable.
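As a small, self-contained sketch of the -v hand-off just described (the function name first_lines and the awk variable name limit are assumptions chosen for illustration):

```shell
# first_lines N: print the first N lines of stdin.
# The shell value N is handed to awk as the awk variable "limit".
first_lines() {
    awk -v limit="$1" 'NR <= limit'
}

start=5
printf '%s\n' {1..20} | first_lines "$start"   # prints lines 1 through 5
```

Inside the single-quoted awk program, $start would be read by awk as a field reference rather than expanded by the shell, which is why the original command printed every line.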
You could improve your code by removing the unnecessary echo, cut and bc calls and doing it like this:
#!/bin/bash
for i in $(seq $(wc -l < test) ); do
    (( i % 4 != 1 )) && continue
    tail -n +"$i" test | head -n 5 > "file$(( 1+i/4 ))"
done
But still, the awk solution is much better. Reading the file only once and acting on readily available information (like the line number) is the way to go. In shell you have to count the lines yourself; there is no way around it. awk gives you that (and a lot of other things) for free.
Use split:
$ seq 20 | split -l 5
$ for fn in x*; do echo "$fn"; cat "$fn"; done
xaa
1
2
3
4
5
xab
6
7
8
9
10
xac
11
12
13
14
15
xad
16
17
18
19
20
Or, if you have a file:
$ split -l 5 test_file

Setting Bash variable to last number in output

I have bash running a command from another program (AFNI). The command outputs two numbers, like this:
70.0 13.670712
I need to make a bash variable that holds whatever the last number is (in this case 13.670712). I've figured out how to print only the last number, but I'm having trouble assigning it to a variable. What is the best way to do this?
Here is the code that prints only 13.670712:
test="$(3dBrickStat -mask ../../template/ROIs.nii -mrange 41 41 -percentile 70 1 70 'stats.s1_ANTS+tlrc[25]')"; echo "${test}" | awk '{print $2}'
Just pipe (|) the command output to awk. In your example, awk reads the stdout of the previous command and prints the second column, with columns separated by whitespace (the default).
test="$(3dBrickStat -mask ../../template/ROIs.nii -mrange 41 41 -percentile 70 1 70 'stats.s1_ANTS+tlrc[25]' | awk '{print $2}')"
printf "%s\n" "$test"
13.670712
(or) using echo
echo "$test"
13.670712
This is the simplest way to do it. If you are looking for a bash-specific alternative, use the read command with process substitution:
read _ val2 < <(3dBrickStat -mask ../../template/ROIs.nii -mrange 41 41 -percentile 70 1 70 'stats.s1_ANTS+tlrc[25]')
printf "%s\n" "$val2"
13.670712
Another, more portable version uses set, which works in any POSIX shell:
set -- $(3dBrickStat -mask ../../template/ROIs.nii -mrange 41 41 -percentile 70 1 70 'stats.s1_ANTS+tlrc[25]');
printf "%s\n" "$2"
13.670712
You can use cut to print the second column:
$ echo "70.0 13.670712" | cut -d ' ' -f2
13.670712
And assign that to a variable with command substitution:
$ sc="$(echo '70.0 13.670712' | cut -d ' ' -f2)"
$ echo "$sc"
13.670712
Just replace echo '70.0 13.670712' with the command that is actually producing the two numbers.
If you want to grab the last value of some delimited field (or delimited output from a command), you can use parameter expansion. This is completely internal to Bash:
$ echo "$s"
1 2 3 4 5 6 7 8 9 10
$ echo ${s##*' '}
10
$ echo "$s2"
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$ echo ${s2##*' '}
20
And then just assign directly:
$ echo "$s2"
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$ lf=${s2##*' '}
$ echo "$lf"
20
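Applied to the two-number output from the question, the parameter-expansion approach can be sketched as follows. The variable names output and last are assumptions; in practice output would be filled with command substitution from the real command.

```shell
# Stand-in for the real command's output (the values are from the question).
output="70.0 13.670712"

# Strip the longest prefix ending in a space, leaving the last field.
last=${output##* }
echo "$last"    # 13.670712
```

One caveat: if the captured text ends with a space, this expansion yields an empty string, so it may be worth trimming the output first.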

Addition in awk failing

I am using the following code snippet, where I pass shell variables to awk:
half_buffer1=$((start_buffer/2))
half_buffer2=$((end_buffer/2))
echo $line | awk -v left="$half_buffer1" -v right="$half_buffer2" 'BEGIN {print $1"\t"$2-left"\t"$3+right"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8}'
However, at times the value of 'right' appears to be subtracted from $3 instead of added to it.
Observe that the following provides the "wrong" answers:
$ echo 1 2 3 4 5 | awk -v left=10 -v right=20 'BEGIN {print $1"\t"$2-left"\t"$3+right"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8}'
-10 20
To get the right answers, remove BEGIN:
$ echo 1 2 3 4 5 | awk -v left=10 -v right=20 '{print $1"\t"$2-left"\t"$3+right"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8}'
1 -8 23 4 5
The problem is that the BEGIN block is executed before any input is read. Consequently, the variables $1, $2, etc., do not yet have useful values.
If BEGIN is removed, the code is executed on each line read. This gives you the answers that you want.
Examples
Using real input lines from the comments:
$ echo ID1 14389398 14389507 109 + ABC 608 831 | awk -v left=10 -v right=20 '{print $1"\t"$2-left"\t"$3+right"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8}'
ID1 14389388 14389527 109 + ABC 608 831
$ echo ID1 14390340 14390409 69 + ABC 831 32 – | awk -v left=10 -v right=20 '{print $1"\t"$2-left"\t"$3+right"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8}'
ID1 14390330 14390429 69 + ABC 831 32
Also, this shell script:
start_buffer=10
end_buffer=100
half_buffer1=$((start_buffer/2))
half_buffer2=$((end_buffer/2))
echo ID1 14390340 14390409 69 + ABC 831 32 – | awk -v left="$half_buffer1" -v right="$half_buffer2" '{print $1"\t"$2-left"\t"$3+right"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8}'
produces this output:
ID1 14390335 14390459 69 + ABC 831 32
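The BEGIN behaviour described above can be checked directly, and the corrected command wrapped in a small function. This is only a sketch: the function name widen is hypothetical, and it prints just the first three fields to keep the example short.

```shell
# widen LEFT RIGHT: for each input line, subtract LEFT from field 2 and
# add RIGHT to field 3. Hypothetical helper wrapping the corrected command.
widen() {
    awk -v left="$1" -v right="$2" -v OFS='\t' '{ print $1, $2 - left, $3 + right }'
}

# BEGIN runs before any input is read, so NF is still 0 there.
echo 'ID1 100 200' | awk 'BEGIN { print "in BEGIN, NF =", NF } { print "in main, NF =", NF }'
# in BEGIN, NF = 0
# in main, NF = 3

echo 'ID1 100 200' | widen 10 20    # ID1<TAB>90<TAB>220
```

Setting OFS to a tab lets the fields be listed with commas instead of concatenating "\t" by hand.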

How do I split the input into chunks of six entries each using bash?

This is the script which I run to output the raw data of data_tripwire.sh:
#!/bin/sh
LOG=/var/log/syslog-ng/svrs/sec2tes1
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
CBS=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.41 |sort|uniq | wc -l`
echo $CBS >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
GFS=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.31 |sort|uniq | wc -l`
echo $GFS >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
HR1=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.10.1 |sort|uniq | wc -l `
echo $HR1 >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
HR2=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.21.12 |sort|uniq | wc -l`
echo $HR2 >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
PAYROLL=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.21.18 |sort|uniq | wc -l`
echo $PAYROLL >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
INCV=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.71 |sort|uniq | wc -l`
echo $INCV >> /home/secmgr/attmrms1/data_tripwire1.sh
done
data_tripwire.sh
91
58
54
108
52
18
8
81
103
110
129
137
84
15
14
18
11
17
12
6
1
28
6
14
8
8
0
0
28
24
25
23
21
13
9
4
18
17
18
30
13
3
I want to process the first 6 entries (91, 58, 54, 108, 52, 18) from the output above, then break out of the loop. After that, it should continue with the next 6 entries, then break out of the loop again, and so on.
The problem now is that it reads all 42 numbers without breaking out of the loop.
This is the output of the table
Tripwire
Month CBS GFS HR HR Payroll INCV
cb2db1 gfs2db1 hr2web1 hrm2db1 hrm2db1a incv2svr1
2013-07 85 76 12 28 26 4
2013-08 58 103 18 6 24 18
2013-09 54 110 11 14 25 17
2013-10 108 129 17 8 23 18
2013-11 52 137 12 8 21 30
2013-12 18 84 6 0 13 13
2014-01 8 16 1 0 9 3
The problem now is that it reads all 42 numbers, from 85 to 3.
I want a loop that runs from July to January for one server, then does the mean and standard deviation calculation, which is already written below.
Once that is done, it should continue with the next cycle of 6 numbers for the next server and do the same as in the initial cycle. I need help with the for loops, using break and continue or anything simpler.
This is my standard deviation calculation
count=0 # Number of data points; global.
SC=3 # Scale to be used by bc. three decimal places.
E_DATAFILE=90 # Data file error
## ----------------- Set data file ---------------------
if [ ! -z "$1" ] # Specify filename as cmd-line arg?
then
datafile="$1" # ASCII text file,
else #+ one (numerical) data point per line!
datafile=/home/secmgr/attmrms1/data_tripwire1.sh
fi # See example data file, below.
if [ ! -e "$datafile" ]
then
echo "\""$datafile"\" does not exist!"
exit $E_DATAFILE
fi
Calculate the mean
arith_mean ()
{
local rt=0 # Running total.
local am=0 # Arithmetic mean.
local ct=0 # Number of data points.
while read value # Read one data point at a time.
do
rt=$(echo "scale=$SC; $rt + $value" | bc)
(( ct++ ))
done
am=$(echo "scale=$SC; $rt / $ct" | bc)
echo $am; return $ct # This function "returns" TWO values!
# Caution: This little trick will not work if $ct > 255!
# To handle a larger number of data points,
#+ simply comment out the "return $ct" above.
} <"$datafile" # Feed in data file.
sd ()
{
mean1=$1 # Arithmetic mean (passed to function).
n=$2 # How many data points.
sum2=0 # Sum of squared differences ("variance").
avg2=0 # Average of $sum2.
sdev=0 # Standard Deviation.
while read value # Read one line at a time.
do
diff=$(echo "scale=$SC; $mean1 - $value" | bc)
# Difference between arith. mean and data point.
dif2=$(echo "scale=$SC; $diff * $diff" | bc) # Squared.
sum2=$(echo "scale=$SC; $sum2 + $dif2" | bc) # Sum of squares.
done
avg2=$(echo "scale=$SC; $sum2 / $n" | bc) # Avg. of sum of squares.
sdev=$(echo "scale=$SC; sqrt($avg2)" | bc) # Square root =
echo $sdev # Standard Deviation.
} <"$datafile" # Rewinds data file.
Showing the output
mean=$(arith_mean); count=$? # Two returns from function!
std_dev=$(sd $mean $count)
echo
echo "<tr><th>Servers</th><th>"Number of data points in \"$datafile"\"</th> <th>Arithmetic mean (average)</th><th>Standard Deviation</th></tr>" >> $HTML
echo "<tr><td>cb2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>gfs2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hr2web1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hrm2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hrm2db1a<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>incv21svr1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo
I want to split the input into chunks of six entries each with the arithmetic mean and the sd of the entries 1..6, then of the entries 7..12, then of 13..18 etc.
This is the output of the table I want.
Tripwire
Month CBS GFS HR HR Payroll INCV
cb2db1 gfs2db1 hr2web1 hrm2db1 hrm2db1a incv2svr1
2013-07 85 76 12 28 26 4
2013-08 58 103 18 6 24 18
2013-09 54 110 11 14 25 17
2013-10 108 129 17 8 23 18
2013-11 52 137 12 8 21 30
2013-12 18 84 6 0 13 13
2014-01 8 16 1 0 9 3
* Standard deviation (7mths) 31.172 35.559 5.248 8.935 5.799 8.580
* Mean (7mths) 54.428 94.285 11.142 9.142 20.285 14.714
paste - - - - - - < data_tripwire.sh | while read -a values; do
    # values is an array with 6 values
    # ${values[0]} .. ${values[5]}
    arith_mean "${values[@]}"
done
This means you have to rewrite your functions so they don't use read: change
while read value
to
for value in "$@"
@Matt, yes, change both functions to iterate over their arguments instead of reading from stdin. Then pass the data file (now called "data_tripwire1.sh"; a terrible file extension for data, use .txt or .dat) into paste to reformat the data so that the first 6 values form the first row. Read each line into the array values (using read -a values) and invoke the functions:
arith_mean () {
    local sum=$(IFS=+; echo "$*")
    echo "scale=$SC; ($sum)/$#" | bc
}
sd () {
    local mean=$1
    shift
    local sum2=0
    for i in "$@"; do
        sum2=$(echo "scale=$SC; $sum2 + ($mean-$i)^2" | bc)
    done
    echo "scale=$SC; sqrt($sum2/$#)" | bc
}
paste - - - - - - < data_tripwire1.sh | while read -a values; do
    mean=$(arith_mean "${values[@]}")
    sd=$(sd $mean "${values[@]}")
    echo "${values[@]} $mean $sd"
done | column -t
91 58 54 108 52 18 63.500 29.038
8 81 103 110 129 137 94.666 42.765
84 15 14 18 11 17 26.500 25.811
12 6 1 28 6 14 11.166 8.648
8 8 0 0 28 24 11.333 10.934
25 23 21 13 9 4 15.833 7.711
18 17 18 30 13 3 16.500 7.973
Note you don't need to return a fancy value from the functions: you know how many points you pass in.
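For comparison, the same chunked mean and standard deviation can be sketched with a single awk process instead of one bc call per value. The function name chunk_stats is hypothetical, and because awk's printf rounds while bc with scale=3 truncates, the last digit may differ from the bc output shown above.

```shell
# chunk_stats N: read one number per line; for every N values print the
# mean and the population standard deviation. Hypothetical helper name.
chunk_stats() {
    awk -v n="$1" '
        { v[NR % n] = $1 }                 # buffer the current chunk
        NR % n == 0 {
            sum = 0
            for (i = 0; i < n; i++) sum += v[i]
            mean = sum / n
            sq = 0
            for (i = 0; i < n; i++) sq += (v[i] - mean) ^ 2
            printf "%.3f %.3f\n", mean, sqrt(sq / n)
        }'
}

printf '%s\n' 1 2 3 4 5 6 | chunk_stats 6    # 3.500 1.708
```

A trailing chunk with fewer than N values is silently ignored in this sketch.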
Based on Glenn's answer, I propose this, which needs very few changes to the original:
paste - - - - - - < data_tripwire.sh | while read -a values
do
    for value in "${values[@]}"
    do
        echo "$value"
    done | arith_mean
    for value in "${values[@]}"
    do
        echo "$value"
    done | sd
done
You can type (or copy & paste) this code directly in an interactive shell. It should work out of the box. Of course, this is not feasible if you intend to use this often, so you can put that code into a text file, make that executable and call that text file as a shell script. In this case you should add #!/bin/bash as first line in that file.
Credit to Glenn Jackman for the use of paste - - - - - - which is the real solution I'd say.
The functions will now read only 6 items at a time from the data file.
arith_mean ()
{
local rt=0 # Running total.
local am=0 # Arithmetic mean.
local ct=0 # Number of data points.
while read value # Read one data point at a time.
do
rt=$(echo "scale=$SC; $rt + $value" | bc)
(( ct++ ))
done
am=$(echo "scale=$SC; $rt / $ct" | bc)
echo $am; return $ct # This function "returns" TWO values!
# Caution: This little trick will not work if $ct > 255!
# To handle a larger number of data points,
#+ simply comment out the "return $ct" above.
} < <(awk -v block=$i 'NR > (6 * (block - 1)) && NR < (6 * block + 1) {print}' "$datafile") # Feed in data file.
sd ()
{
mean1=$1 # Arithmetic mean (passed to function).
n=$2 # How many data points.
sum2=0 # Sum of squared differences ("variance").
avg2=0 # Average of $sum2.
sdev=0 # Standard Deviation.
while read value # Read one line at a time.
do
diff=$(echo "scale=$SC; $mean1 - $value" | bc)
# Difference between arith. mean and data point.
dif2=$(echo "scale=$SC; $diff * $diff" | bc) # Squared.
sum2=$(echo "scale=$SC; $sum2 + $dif2" | bc) # Sum of squares.
done
avg2=$(echo "scale=$SC; $sum2 / $n" | bc) # Avg. of sum of squares.
sdev=$(echo "scale=$SC; sqrt($avg2)" | bc) # Square root =
echo $sdev # Standard Deviation.
} < <(awk -v block=$i 'NR > (6 * (block - 1)) && NR < (6 * block + 1) {print}' "$datafile") # Rewinds data file.
From main, you will need to loop over the blocks to read:
for (( i = 1; i <= $(wc -l < "$datafile") / 6; i++ ))
do
    mean=$(arith_mean); count=$? # Two returns from function!
    std_dev=$(sd $mean $count)
done
Of course, it is better to move the wc -l outside of the loop for faster execution, but you get the idea.
The syntax error occurred between < and ( because of a space; there shouldn't be a space between them. Sorry for the typo.
cat <(awk -F: '{print $1}' /etc/passwd) works.
cat < (awk -F: '{print $1}' /etc/passwd) fails with: syntax error near unexpected token `('
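The two ways of using process substitution, as a file-name argument and as the source of a redirection, can be compared in a short bash sketch. The variable name count and the sample input are assumptions for illustration.

```shell
# Passed as an argument: <(...) expands to a file name that cat opens.
cat <(printf '%s\n' a b c)

# Used as a redirection: the first < redirects stdin, the second belongs
# to <(...). The loop runs in the current shell, so count survives it.
count=0
while read -r _; do
    count=$(( count + 1 ))
done < <(printf '%s\n' a b c)
echo "$count"    # 3
```

The space between the two < characters in the redirection form is required, while the <( of the substitution itself must stay joined.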

Resources