How to replace a non-existing value with 0? - bash

So I have a subject folder, located in /home/subject, which contains subjects such as:
Geography
Math
These subjects are files.
Each one of them contains the name of a student with his mark.
So, for example, it will be:
For Geography
Matthew 15
Elena 14
And Math:
Matthew 10
Elena 19
I also have a student folder, located in /home/student, which is empty for now.
And the purpose of this folder is to put inside of it:
The name of the student as the name of the file;
The marks this student received for all subjects.
So here is my code:
rm /home/student/*
for subjectFile in /home/subject/*; do
awk -v subject="$(basename "$subjectFile")" '{ print subject, $2 >>"/home/student/" $1 }' "$subjectFile"
done
This loop iterates over all the subject files, inside of the subject folder.
The $subjectFile value is something like:
/home/subject/Math
/home/subject/Geography
/home/subject/(MySubject)
Etc., depending on what subjects are available.
I then get the basename of each of these subject files:
Math
Geography
(...)
And then I print the second column (the mark) into a file named after the student, whose name I get from the first column of the subject file: so in this example, for the subject Geography, I'll get Matthew.
I also didn't want to append to previous results indefinitely, but to overwrite them each time I run this script, so I added rm /home/student/* to erase any student files before appending.
This works great.
But then I have a request,
How can I make it write 0 as the subject mark of a student when that mark is undefined?
That is, for a student who did not receive any mark for a specific subject while others did?
For example:
For Geography
Matthew 15
And Math:
Matthew 10
Elena 19
So for Elena,
This shall create an Elena file, inside of the student folder with :
Geography 0
Math 19
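Before the answers: a minimal end-to-end sketch of one way to do this (the sample data and mktemp directories stand in for /home/subject and /home/student; the two-pass approach, collecting all names first and then writing 0 for missing marks, is just one illustration):

```shell
#!/bin/bash
# Sketch: two passes over the subject files.
# Pass 1 collects every student name; pass 2 writes "subject mark" lines
# per student, substituting 0 when a student has no mark for a subject.
# mktemp directories stand in for /home/subject and /home/student.
set -eu
subjdir=$(mktemp -d)
studdir=$(mktemp -d)
printf 'Matthew 15\n'           > "$subjdir/Geography"
printf 'Matthew 10\nElena 19\n' > "$subjdir/Math"

# Pass 1: every student seen in any subject file
students=$(awk '{print $1}' "$subjdir"/* | sort -u)

# Pass 2: one line per subject per student, defaulting to 0
for subjectFile in "$subjdir"/*; do
    subject=$(basename "$subjectFile")
    for student in $students; do
        mark=$(awk -v s="$student" '$1 == s {print $2}' "$subjectFile")
        echo "$subject ${mark:-0}" >> "$studdir/$student"
    done
done

cat "$studdir/Elena"   # -> Geography 0, then Math 19
```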

Simple distribution using pure bash
Let's try: This will create a little tree in /tmp:
cd /tmp && tar -zxvf <(base64 -d <<eof
H4sIAFUFUFwAA+3XXU6EMBQF4D6ziu7A/l2uLsBHE7eAYwUMghlKnNm9EKfEEHUyBpg4nu+lJJDQ
5HBK226KpqmuxJJUj4mGUTOpz2MktHXG9RdEWiitrWUhadFZHXRtyLZSiidflbsfnjt2/49qP/Jv
u4dnvwnLfAcn5K9JuT5/w0zIfw2T/F+yUMz+jiHg1Lnv87c09j9l0+dvU3JCqtln8oV/nv9dFkLh
36RWyW3l60zqm+S+KCspNSfnnhwsbtJ/X+dV2c68BBzvfzr2nw33/XeWUvR/DWP/WcYFgOICcI0F
4OJN+p/7Jt9mr8V+znec8P8/7P8cK4v+r2HsP8X6u1h/Qv0vX+x/6B59ff7znyFNw/5fGZz/AQAA
AAAAAAAAAAB+7R1PsalnACgAAA==
eof
)
This will create (and print to the terminal, because of the -v flag in the tar command):
school/
school/subject/
school/subject/math
school/subject/english
school/subject/geography
school/student/
Quick overview:
cd /tmp/school/subject/
grep . * | sort -t: -k2
will render:
geography:Elena 14
english:Elena 15
math:Elena 19
math:Matthew 10
geography:Matthew 15
english:Matthew 17
geography:Phil 15
math:Phil 17
english:Phil 18
Building student stat files:
cd /tmp/school
rm student/*
for file in subject/*;do
subj=${file##*/}
while read student note ;do
echo >>student/${student,,} ${subj^} $note
done < $file
done
Nota: This uses ${VARNAME^} to upper-case the first character and ${VARNAME,,} to lower-case the whole string. So file names are lowercase and subjects become capitalized in the student files.
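A quick look at those two expansions in isolation (bash >= 4.0):

```shell
#!/bin/bash
# ${var^} upper-cases the first character; ${var,,} lower-cases the whole string.
subj="geography"
student="Matthew"
echo "${subj^}"       # -> Geography
echo "${student,,}"   # -> matthew
```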
Then now:
ls -l student
total 12
-rw-r--r-- 1 user user 32 jan 29 08:57 elena
-rw-r--r-- 1 user user 32 jan 29 08:57 matthew
-rw-r--r-- 1 user user 32 jan 29 08:57 phil
and
cat student/phil
English 18
Geography 15
Math 17
Then now: searching for missing notation
for file in student/*;do
for subj in subject/*;do
subj=${subj##*/}
grep -q ^${subj^}\ $file || echo ${subj^} 0 >> $file
done
done
This could be tested (this will randomly drop zero or one mark from each file):
for file in subject/*;do
((val=1+(RANDOM%4)))
((val<4)) && sed ${val}d -i $file
done
Then run:
cd /tmp/school
rm student/*
for file in subject/*;do
subj=${file##*/}
while read student note ;do
echo >>student/${student,,} ${subj^} $note
done < $file
done
for file in student/*;do
for subj in subject/*;do
subj=${subj##*/}
grep -q ^${subj^}\ $file || echo ${subj^} 0 >> $file
done
done
Ok, now:
grep ' 0$' student/*
student/matthew:Geography 0
Nota: As I've used $RANDOM, results may differ in your tests ;-)
Another approach: two steps again, but this time the first step builds the student list, then student files are written immediately with a 0 mark where needed:
cd /tmp/school
rm student/*
declare -A students
for file in subject/* ;do
while read student mark ;do
[ "$student" ] && students[$student]=
done <$file
done
for file in subject/*;do
class=(${!students[@]})
while read student mark ;do
subj=${file##*/}
echo >> student/${student,,} ${subj^} $mark
class=(${class[@]/$student})
done <$file
for student in ${class[@]};do
echo >> student/${student,,} ${subj^} 0
done
done
Statistic tool
For fun, with a lot of bashisms and without file creation, here is a little stats-dump tool:
#!/bin/bash
declare -A students
declare subjects=() sublen=0 stdlen=0
for file in subject/* ;do # read all subject files
subj=${file##*/}
subjects+=($subj) # Add subject to array
sublen=$(( ${#subj} > sublen ? ${#subj} : sublen )) # Max subject string len
declare -A mark_$subj # Create subject's associative array
while read student mark ;do
stdlen=$(( ${#student} > $stdlen ? ${#student} : stdlen ))
[ "$student" ] && { # Skip empty lines
((students[$student]++)) # Count student's marks
printf -v mark_$subj[$student] "%d" $mark # Store student's mark
}
done <$file
done
printf -v formatstr %${#subjects[@]}s; # prepare format string for all subjects
formatstr="%-${stdlen}s %2s ${formatstr// / %${sublen}s}"
printf -v headline "$formatstr" Student Qt "${subjects[@]}"
echo "$headline" # print head line
echo "${headline//[^ ]/-}" # underline head line
for student in ${!students[@]};do # Now one line by student...
marks=() # Clear marks
for subject in ${subjects[@]};do
eval "marks+=(\${mark_$subject[\$student]:-0})" # Add subject mark or 0
done
printf "$formatstr\n" $student ${students[$student]} ${marks[@]}
done
This may print out something like:
Student Qt english geography math
------- -- ------- --------- ----
Phil 2 18 15 0
Matthew 3 17 15 10
Elena 2 0 14 19
Nota
This script was built for bash v4.4.12 and tested under bash v5.0.
More
You could download a bigger demo script: scholl-averages-demo.sh (view in browser as text .txt).
Always pure bash without forks, but with:
average by student, average by subject and overall average, in pseudo float
subject and student sorted alphabetically
support for UTF-8 in student names
Student Qt art biology english geography history math Average
------- -- --- ------- ------- --------- ------- ---- -------
Elena 5 12 0 15 14 17 19 12.83
Iñacio 6 12 15 19 18 12 14 15.00
Matthew 5 19 18 17 15 17 0 14.33
Phil 5 15 19 18 0 13 17 13.67
Renée 6 14 19 18 17 18 15 16.83
Theresa 5 17 14 0 12 17 18 13.00
William 6 17 17 15 15 13 14 15.17
------- -- --- ------- ------- --------- ------- ---- -------
Avgs 7 15.14 14.57 14.57 13.00 15.28 13.86 14.40

I would do it something like this. First make empty files for every student:
cat /home/subject/* | cut -d' ' -f1 | sort -u | while read student_name; do > /home/student/$student_name ; done
Then I would go through each one and add the marks:
for student in `ls /home/student` ; do
for file in /home/subject/* ; do
subject="`basename $file`"
mark="`egrep "^$student [0-9]+" $file | cut -d' ' -f2`"
if [ -z "$mark" ]; then
echo "$subject 0" >> /home/student/$student
else
echo "$subject $mark" >> /home/student/$student
fi
done
done
something like that anyway

Related

Mean of execution time of a program

I have the following bash code (A.cpp, B.cpp and C.txt are filename in the current directory):
#!/bin/bash
g++ A.cpp -o A
g++ B.cpp -o B
Inputfiles=(X Y Z U V)
for j in "${Inputfiles[@]}"
do
echo $j.txt:
i=1
while [ $i -le 5 ]
do
./A $j.txt
./B C.txt
echo ""
i=`expr $i + 1`
done
echo ""
done
rm -f A B
One execution of ./A and ./B is one execution of my program. I run my program 5 times for each input file in the array Inputfiles. I want the average execution time of my program for each input file. How can I do that?
(Earlier, I tried to add time and clock functions within the A.cpp and B.cpp files, but I am not able to add the execution times of both files to get the execution time of a program.)
If I understand correctly what average you would like to calculate, I think the code below will serve your purpose.
Some explanations on the additions to your script:
Lines 6 - 14 declare a function that expects three arguments and updates the accumulated total time, in seconds
Line 26 initializes variable total_time.
Lines 31, 38 execute programs A and B respectively, using bash's time to collect the execution time. >/dev/null "discards" A's and B's outputs. 2>&1 redirects stderr to stdout so that grep can see time's output (a nice explanation can be found here). grep real keeps only the real line from time's output; you could refer to this post for an explanation of time's output and choose the specific time of your interest. awk '{print $2}' keeps only the numeric part of grep's output.
Lines 32, 39 store the minutes part to the corresponding variable
Lines 33-34, 40-41 trim the seconds part of real_time variable
Lines 35, 42 accumulate the total time by calling function accumulate_time
Line 46 calculates the average time by dividing with 5
Converted the while loop to a nested for loop and introduced the iterations variable, not necessarily part of the initial question but helps re-usability of the number of iterations
1 #!/bin/bash
2
3 # Function that receives three arguments (total time,
4 # minutes and seconds) and returns the accumulated time in
5 # seconds
6 function accumulate_time() {
7 total_time=$1
8 minutes=$2
9 seconds=$3
10
11 accumulated_time_secs=$(echo "$minutes * 60 + $seconds + $total_time" | bc )
12 echo "$accumulated_time_secs"
13
14 }
15
16 g++ A.cpp -o A
17 g++ B.cpp -o B
18 Inputfiles=(X Y Z U V)
19
20 iterations=5
21
22 for j in "${Inputfiles[@]}"
23 do
24 echo $j.txt:
25 # Initialize total_time
26 total_time=0.0
27
28 for i in $(seq 1 $iterations)
29 do
30 # Execute A and capture its real time
31 real_time=`{ time ./A $j.txt >/dev/null; } 2>&1 | grep real | awk '{print $2}'`
32 minutes=${real_time%m*}
33 seconds=${real_time#*m}
34 seconds=${seconds%s*}
35 total_time=$(accumulate_time "$total_time" "$minutes" "$seconds")
36
37 # Execute B and capture its real time
38 real_time=`{ time ./B C.txt >/dev/null; } 2>&1 | grep real | awk '{print $2}'`
39 minutes=${real_time%m*}
40 seconds=${real_time#*m}
41 seconds=${seconds%s*}
42 total_time=$(accumulate_time "$total_time" "$minutes" "$seconds")
43 echo ""
44 done
45
46 average_time=$(echo "scale=3; $total_time / $iterations" | bc)
47 echo "Average time for input file $j is: $average_time"
48 done
49
50 rm -f A B

Referencing one number to another number

I have a question which I think is fairly simply but I am new to Bash and can't find much info on this.
5 references 3
10 references 4
20 references 10
30 references 20
inputBeforeLookup = 5 #this the number which needs to look up 3 above^^^^
# 10 would lookup and return 4
#20 returns 10
start = 1
end = $start + $lookupNumberfromFile # 3 in this case, since input was 5
seq $start $end
1
2
3
4
I guess my question here is what is the proper way to create like a configuration file which references numbers to other numbers?
If there is a better way than the snippet of code I posted I am always open to suggestions; like I said, I am learning.
I am new to this so I am not sure if the syntax is 100% correct. I am more so looking for a solution on the best way to solve the problem.
Hope this sample helps you regarding variable expansion in bash:
Notice that the \ prevents the expansion of $$ (the current process id). For triple substitution you need a double eval, and so on.
#!/bin/bash
one=1
two=one
three=two
four=three
five=four
echo $one
eval echo \$$two
eval eval echo \\$\$$three
eval eval eval echo \\\\$\\$\$$four
eval eval eval eval echo \\\\\\\\$\\\\$\\$\$$five
Output:
1
1
1
1
1
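As another option in bash itself, indirect expansion ${!name} can replace the eval chains. A sketch, where resolve is a hypothetical helper that follows names until the value is no longer a variable name:

```shell
#!/bin/bash
# Sketch: follow a chain of variable names with indirect expansion ${!name}
# instead of nested eval. (No cycle detection; fine for a sketch.)
one=1
two=one
three=two
four=three
five=four

resolve() {
    local name=$1
    # While the current value is itself the name of a defined variable, follow it
    while declare -p "${!name}" >/dev/null 2>&1; do
        name=${!name}
    done
    echo "${!name}"
}

resolve five   # -> 1
resolve two    # -> 1
```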
Bonus:
In zsh you can use nested substitution much more easily:
#!/bin/zsh
one=1
two=one
three=two
four=three
five=four
echo $one
echo ${(P)two}
echo ${(P)${(P)three}}
...
http://zsh.sourceforge.net/Doc/Release/Expansion.html
Set up an associative array, then test it with numbers 1 to 30. Those numbers that don't reference other numbers are printed as is:
MYMAP=( [5]=3 [10]=4 [20]=10 [30]=20 )
seq 30 | while read x ; do echo ${MYMAP[$x]:-$x} ; done | paste - - - - -
That last | paste - - - - - isn't necessary, but 5 column output is easier to follow given that the input has several multiples of 5. Output:
1 2 3 4 3
6 7 8 9 4
11 12 13 14 15
16 17 18 19 10
21 22 23 24 25
26 27 28 29 20
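To answer the configuration-file part of the question: the "X references Y" lines can be loaded into such an associative array at run time. A sketch (the temp file stands in for your config file; the format is the one from the question):

```shell
#!/bin/bash
# Sketch: load "X references Y" lines into an associative array (bash >= 4.0).
conf=$(mktemp)
printf '%s\n' '5 references 3' '10 references 4' '20 references 10' '30 references 20' > "$conf"

declare -A MYMAP
while read -r key _ value; do
    MYMAP[$key]=$value
done < "$conf"

echo "${MYMAP[5]}"      # -> 3
echo "${MYMAP[7]:-7}"   # no entry: fall back to the number itself -> 7
```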

Dividing one file into separate based on line numbers

I have the following test file:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
I want to separate it in a way that each file contains the last line of the previous file as the first line. The example would be:
file 1:
1
2
3
4
5
file2:
5
6
7
8
9
file3:
9
10
11
12
13
file4:
13
14
15
16
17
file5:
17
18
19
20
That would make 4 files with 5 lines and 1 file with 4 lines.
As a first step, I tried to test the following commands, which I wrote to get only the first file, containing the first 5 lines. I can't figure out why the awk command in the if statement prints all 20 lines instead of the first 5.
d=$(wc test)
a=$(echo $d | cut -f1 -d " ")
lines=$(echo $a/5 | bc -l)
integer=$(echo $lines | cut -f1 -d ".")
for i in $(seq 1 $integer); do
start=$(echo $i*5 | bc -l)
var=$((var+=1))
echo start $start
echo $var
if [[ $var = 1 ]]; then
awk 'NR<=$start' test
fi
done
Thanks!
Why not just use the split utility available in your POSIX toolkit? It has an option to split on a number of lines, which you can set to 5:
split -l 5 input-file
From the split man page:
-l, --lines=NUMBER
put NUMBER lines/records per output file
Note that -l is POSIX-compliant as well.
$ ls
$
$ seq 20 | awk 'NR%4==1{ if (out) { print > out; close(out) } out="file"++c } {print > out}'
$
$ ls
file1 file2 file3 file4 file5
$ cat file1
1
2
3
4
5
$ cat file2
5
6
7
8
9
$ cat file3
9
10
11
12
13
$ cat file4
13
14
15
16
17
$ cat file5
17
18
19
20
If you're ever tempted to use a shell loop to manipulate text again, make sure to read https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice first to understand at least some of the reasons to use awk instead. To learn awk, get the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
oh. and wrt why your awk command awk 'NR<=$start' test didn't work - awk is not shell, it has no more access to shell variables (or vice-versa) than a C program does. To init an awk variable named awkstart with the value of a shell variable named start and then use that awk variable in your script you'd do awk -v awkstart="$start" 'NR<=awkstart' test. The awk variable can also be named start or anything else sensible - it is completely unrelated to the name of the shell variable.
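A quick runnable illustration of that fix (seq 20 stands in for the test file):

```shell
#!/bin/bash
# Pass the shell variable into awk with -v; NR <= awkstart keeps the first lines.
start=5
seq 20 | awk -v awkstart="$start" 'NR <= awkstart'
# -> prints 1 through 5, one per line
```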
You could improve your code by removing the unnecessary echo, cut and bc and doing it like this:
#!/bin/bash
for i in $(seq $(wc -l < test) ); do
(( i % 4 != 1 )) && continue
tail +$i test | head -5 > "file$(( 1+i/4 ))"
done
But still the awk solution is much better. Reading the file only once and taking actions based on readily available information (like the linenumber) is the way to go. In shell you have to count the lines, there is no way around it. awk will give you that (and a lot of other things) for free.
Use split:
$ seq 20 | split -l 5
$ for fn in x*; do echo "$fn"; cat "$fn"; done
xaa
1
2
3
4
5
xab
6
7
8
9
10
xac
11
12
13
14
15
xad
16
17
18
19
20
Or, if you have a file:
$ split -l 5 test_file

Setting Bash variable to last number in output

I have bash running a command from another program (AFNI). The command outputs two numbers, like this:
70.0 13.670712
I need to make a bash variable that will be whatever the last # is (in this case 13.670712). I've figured out how to make it print only the last number, but I'm having trouble setting it to be a variable. What is the best way to do this?
Here is the code that prints only 13.670712:
test="$(3dBrickStat -mask ../../template/ROIs.nii -mrange 41 41 -percentile 70 1 70 'stats.s1_ANTS+tlrc[25]')"; echo "${test}" | awk '{print $2}'
Just pipe (|) the command output to awk. Here awk reads from the stdout of your previous command and prints the 2nd column, delimited by the default single white-space character.
test="$(3dBrickStat -mask ../../template/ROIs.nii -mrange 41 41 -percentile 70 1 70 'stats.s1_ANTS+tlrc[25]' | awk '{print $2}')"
printf "%s\n" "$test"
13.670712
(or) using echo
echo "$test"
13.670712
This is the simplest of the ways to do this. If you are looking for other ways to do it with bashisms, use the read command with process substitution:
read _ val2 < <(3dBrickStat -mask ../../template/ROIs.nii -mrange 41 41 -percentile 70 1 70 'stats.s1_ANTS+tlrc[25]')
printf "%s\n" "$val2"
13.670712
Another more portable version using set, which will work irrespective of the shell available.
set -- $(3dBrickStat -mask ../../template/ROIs.nii -mrange 41 41 -percentile 70 1 70 'stats.s1_ANTS+tlrc[25]');
printf "%s\n" "$2"
13.670712
You can use cut to print to print the second column:
$ echo "70.0 13.670712" | cut -d ' ' -f2
13.670712
And assign that to a variable with command substitution:
$ sc="$(echo '70.0 13.670712' | cut -d ' ' -f2)"
$ echo "$sc"
13.670712
Just replace echo '70.0 13.670712' with the command that is actually producing the two numbers.
If you want to grab the last value of some delimited field (or delimited output from a command), you can use parameter expansion. This is completely internal to Bash:
$ echo "$s"
1 2 3 4 5 6 7 8 9 10
$ echo ${s##*' '}
10
$ echo "$s2"
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$ echo ${s2##*' '}
20
And then just assign directly:
$ echo "$s2"
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$ lf=${s2##*' '}
$ echo "$lf"
20

How do i split the input into chunks of six entries each using bash?

This is the script which I run to output the raw data of data_tripwire.sh:
#!/bin/sh
LOG=/var/log/syslog-ng/svrs/sec2tes1
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
CBS=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.41 |sort|uniq | wc -l`
echo $CBS >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
GFS=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.31 |sort|uniq | wc -l`
echo $GFS >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
HR1=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.10.1 |sort|uniq | wc -l `
echo $HR1 >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
HR2=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.21.12 |sort|uniq | wc -l`
echo $HR2 >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
PAYROLL=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.21.18 |sort|uniq | wc -l`
echo $PAYROLL >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
INCV=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.71 |sort|uniq | wc -l`
echo $INCV >> /home/secmgr/attmrms1/data_tripwire1.sh
done
data_tripwire.sh
91
58
54
108
52
18
8
81
103
110
129
137
84
15
14
18
11
17
12
6
1
28
6
14
8
8
0
0
28
24
25
23
21
13
9
4
18
17
18
30
13
3
I want to process the first 6 entries (91, 58, 54, 108, 52, 18) from the output above, then break out of the loop. After that it will continue with the next 6 entries, then break out of the loop again...
The problem now is that it reads all the 42 numbers without breaking out of the loop.
This is the output of the table
Tripwire
Month CBS GFS HR HR Payroll INCV
cb2db1 gfs2db1 hr2web1 hrm2db1 hrm2db1a incv2svr1
2013-07 85 76 12 28 26 4
2013-08 58 103 18 6 24 18
2013-09 54 110 11 14 25 17
2013-10 108 129 17 8 23 18
2013-11 52 137 12 8 21 30
2013-12 18 84 6 0 13 13
2014-01 8 16 1 0 9 3
The problem now is that it reads all 42 numbers, from 85 to 3.
I want to make a loop which runs from July till January for one server. Then it will do the mean and standard deviation calculation which is already done below.
After that is done, it will continue with the next cycle of 6 numbers for the next server and do the same as in the initial cycle. Assistance is required with the for loops (using break and continue), or anything simpler.
This is my standard deviation calculation
count=0 # Number of data points; global.
SC=3 # Scale to be used by bc. three decimal places.
E_DATAFILE=90 # Data file error
## ----------------- Set data file ---------------------
if [ ! -z "$1" ] # Specify filename as cmd-line arg?
then
datafile="$1" # ASCII text file,
else #+ one (numerical) data point per line!
datafile=/home/secmgr/attmrms1/data_tripwire1.sh
fi # See example data file, below.
if [ ! -e "$datafile" ]
then
echo "\""$datafile"\" does not exist!"
exit $E_DATAFILE
fi
Calculate the mean
arith_mean ()
{
local rt=0 # Running total.
local am=0 # Arithmetic mean.
local ct=0 # Number of data points.
while read value # Read one data point at a time.
do
rt=$(echo "scale=$SC; $rt + $value" | bc)
(( ct++ ))
done
am=$(echo "scale=$SC; $rt / $ct" | bc)
echo $am; return $ct # This function "returns" TWO values!
# Caution: This little trick will not work if $ct > 255!
# To handle a larger number of data points,
#+ simply comment out the "return $ct" above.
} <"$datafile" # Feed in data file.
sd ()
{
mean1=$1 # Arithmetic mean (passed to function).
n=$2 # How many data points.
sum2=0 # Sum of squared differences ("variance").
avg2=0 # Average of $sum2.
sdev=0 # Standard Deviation.
while read value # Read one line at a time.
do
diff=$(echo "scale=$SC; $mean1 - $value" | bc)
# Difference between arith. mean and data point.
dif2=$(echo "scale=$SC; $diff * $diff" | bc) # Squared.
sum2=$(echo "scale=$SC; $sum2 + $dif2" | bc) # Sum of squares.
done
avg2=$(echo "scale=$SC; $sum2 / $n" | bc) # Avg. of sum of squares.
sdev=$(echo "scale=$SC; sqrt($avg2)" | bc) # Square root =
echo $sdev # Standard Deviation.
} <"$datafile" # Rewinds data file.
Showing the output
mean=$(arith_mean); count=$? # Two returns from function!
std_dev=$(sd $mean $count)
echo
echo "<tr><th>Servers</th><th>"Number of data points in \"$datafile"\"</th> <th>Arithmetic mean (average)</th><th>Standard Deviation</th></tr>" >> $HTML
echo "<tr><td>cb2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>gfs2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hr2web1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hrm2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hrm2db1a<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>incv21svr1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo
I want to split the input into chunks of six entries each with the arithmetic mean and the sd of the entries 1..6, then of the entries 7..12, then of 13..18 etc.
This is the output of the table i want.
Tripwire
Month CBS GFS HR HR Payroll INCV
cb2db1 gfs2db1 hr2web1 hrm2db1 hrm2db1a incv2svr1
2013-07 85 76 12 28 26 4
2013-08 58 103 18 6 24 18
2013-09 54 110 11 14 25 17
2013-10 108 129 17 8 23 18
2013-11 52 137 12 8 21 30
2013-12 18 84 6 0 13 13
2014-01 8 16 1 0 9 3
*Standard
deviation
(7mths) 31.172 35.559 5.248 8.935 5.799 8.580
* Mean
(7mths) 54.428 94.285 11.142 9.142 20.285 14.714
paste - - - - - - < data_tripwire.sh | while read -a values; do
# values is an array with 6 values
# ${values[0]} .. ${values[5]}
arith_mean "${values[@]}"
done
This means you have to rewrite your function so they don't use read: change
while read value
to
for value in "$@"
#Matt, yes: change both functions to iterate over arguments instead of reading from stdin. Then pass the data file (now called "data_tripwire1.sh", a terrible file extension for data; use .txt or .dat) into paste to reformat the data so that the first 6 values form the first row. Read each line into the array values (using read -a values) and invoke the functions:
arith_mean () {
local sum=$(IFS=+; echo "$*")
echo "scale=$SC; ($sum)/$#" | bc
}
sd () {
local mean=$1
shift
local sum2=0
for i in "$@"; do
sum2=$(echo "scale=$SC; $sum2 + ($mean-$i)^2" | bc)
done
echo "scale=$SC; sqrt($sum2/$#)"|bc
}
paste - - - - - - < data_tripwire1.sh | while read -a values; do
mean=$(arith_mean "${values[@]}")
sd=$(sd $mean "${values[@]}")
echo "${values[@]} $mean $sd"
done | column -t
91 58 54 108 52 18 63.500 29.038
8 81 103 110 129 137 94.666 42.765
84 15 14 18 11 17 26.500 25.811
12 6 1 28 6 14 11.166 8.648
8 8 0 0 28 24 11.333 10.934
25 23 21 13 9 4 15.833 7.711
18 17 18 30 13 3 16.500 7.973
Note you don't need to return a fancy value from the functions: you know how many points you pass in.
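For comparison, the per-row mean and (population) standard deviation can also be computed in a single awk pass over the paste output. A sketch on the first row of the data above (it agrees with the bc version up to rounding of the last digit, since bc truncates at the given scale):

```shell
#!/bin/sh
# One-pass awk: mean and population standard deviation per 6-value row.
printf '%s\n' 91 58 54 108 52 18 |
paste - - - - - - |
awk '{
    sum = 0; for (i = 1; i <= NF; i++) sum += $i
    mean = sum / NF
    ss = 0; for (i = 1; i <= NF; i++) ss += ($i - mean)^2
    printf "mean=%.3f sd=%.3f\n", mean, sqrt(ss / NF)
}'
# -> mean=63.500 sd=29.039
```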
Based on Glenn's answer I propose this which needs very little changes to the original:
paste - - - - - - < data_tripwire.sh | while read -a values
do
for value in "${values[#]}"
do
echo "$value"
done | arith_mean
for value in "${values[#]}"
do
echo "$value"
done | sd
done
You can type (or copy & paste) this code directly in an interactive shell. It should work out of the box. Of course, this is not feasible if you intend to use this often, so you can put that code into a text file, make that executable and call that text file as a shell script. In this case you should add #!/bin/bash as first line in that file.
Credit to Glenn Jackman for the use of paste - - - - - - which is the real solution I'd say.
The functions will now read only 6 items from the data file at a time.
arith_mean ()
{
local rt=0 # Running total.
local am=0 # Arithmetic mean.
local ct=0 # Number of data points.
while read value # Read one data point at a time.
do
rt=$(echo "scale=$SC; $rt + $value" | bc)
(( ct++ ))
done
am=$(echo "scale=$SC; $rt / $ct" | bc)
echo $am; return $ct # This function "returns" TWO values!
# Caution: This little trick will not work if $ct > 255!
# To handle a larger number of data points,
#+ simply comment out the "return $ct" above.
} < <(awk -v block=$i 'NR > (6 * (block - 1)) && NR < (6 * block + 1) {print}' "$datafile") # Feed in data file.
sd ()
{
mean1=$1 # Arithmetic mean (passed to function).
n=$2 # How many data points.
sum2=0 # Sum of squared differences ("variance").
avg2=0 # Average of $sum2.
sdev=0 # Standard Deviation.
while read value # Read one line at a time.
do
diff=$(echo "scale=$SC; $mean1 - $value" | bc)
# Difference between arith. mean and data point.
dif2=$(echo "scale=$SC; $diff * $diff" | bc) # Squared.
sum2=$(echo "scale=$SC; $sum2 + $dif2" | bc) # Sum of squares.
done
avg2=$(echo "scale=$SC; $sum2 / $n" | bc) # Avg. of sum of squares.
sdev=$(echo "scale=$SC; sqrt($avg2)" | bc) # Square root =
echo $sdev # Standard Deviation.
} < <(awk -v block=$i 'NR > (6 * (block - 1)) && NR < (6 * block + 1) {print}' "$datafile") # Rewinds data file.
From main you will need to set your blocks to read.
for((i=1; i <= $(( $(wc -l $datafile | sed 's/[A-Za-z \/]*//g') / 6 )); i++))
do
mean=$(arith_mean); count=$? # Two returns from function!
std_dev=$(sd $mean $count)
done
Of course it is better to move the wc -l outside of the loop for faster execution. But you get the idea.
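Moving the count out of the loop could look like this (a sketch; seq 42 stands in for the data file, and reading the file on stdin makes wc print only the number, so no sed cleanup is needed):

```shell
#!/bin/bash
# Count lines once, outside the loop, then iterate over 6-line blocks.
datafile=$(mktemp)
seq 42 > "$datafile"                # stand-in data: 42 lines = 7 blocks
total=$(wc -l < "$datafile")        # reading stdin: wc prints only the number
blocks=$(( total / 6 ))
for (( i = 1; i <= blocks; i++ )); do
    echo "block $i: lines $(( 6*(i-1)+1 )) to $(( 6*i ))"
done
```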
The syntax error occurred because of a space between < and (. There must be no space inside the process substitution operator <(; to redirect a compound command's stdin from it, write < <(...), where the first < is the redirection operator.
cat <(awk -F: '{print $1}' /etc/passwd) works.
cat < (awk -F: '{print $1}' /etc/passwd) syntax error near unexpected token `('

Resources