Moving average with successive elements using awk - bash

I am trying to write a script in which each row shows the average of the next N rows (including itself). I already know how to do it with preceding rows, where the Nth row shows the average of the preceding N rows. Here is the script for that:
awk '
BEGIN {
    N = 5;
}
{
    x = $2;
    i = NR % N;
    aveg += (x - X[i]) / N;
    X[i] = x;
    print $1, $2, aveg;
}' < file > aveg.txt
where file looks like this
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 17
18 18
19 19
20 20
21 21
22 22
23 23
24 24
25 25
26 26
27 27
28 28
29 29
30 30
31 31
32 32
33 33
34 34
35 35
36 36
37 37
38 38
39 39
40 40
I want the first row to show the average of the next 5 elements, i.e.
(1+2+3+4+5)/5=3
second row (2+3+4+5+6)/5=4
third row (3+4+5+6+7)/5=5
and so on. The rows should look like
1 1 3
2 2 4
3 3 5
4 4 6 ...
Can it be done as simply as the script shown above? I was thinking of giving each row the value of the row N lines below it and then proceeding with the script above, but I am unable to assign a row a value from further down the file. Can someone help me write this script and find the moving average? I am open to other shell commands as well.

$ cat test.awk
BEGIN {
    N=5                             # the window size
}
{
    n[NR]=$1                        # store the value in an array
}
NR>=N {                             # for records where NR >= N
    x=0                             # reset the sum variable
    delete n[NR-N]                  # delete the value that fell out of the window
    for(i in n)                     # all remaining array elements
        x+=n[i]                     # ... must be summed
    print n[NR-(N-1)],x/N           # print the row at the start of the window
}                                   # and the related window average
Try it:
$ for i in {1..36}; do echo $i $i >> test.in ; done
$ awk -f test.awk test.in
1 3
2 4
3 5
...
30 32
31 33
32 34
It can also be done with a running sum: add the current value and subtract n[NR-N], like this:
BEGIN {
    N=5
}
{
    n[NR]=$1
    x+=$1-n[NR-N]
}
NR>=N {
    delete n[NR-N]
    print n[NR-(N-1)],x/N
}
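For comparison, the running-sum idea can be sketched outside awk. The following Python version is only an illustration under the same assumptions as the awk script (a window of N values, output starting once the first full window has been read); the function name `forward_moving_average` is hypothetical and not part of the original answer.

```python
def forward_moving_average(values, n=5):
    """Return (value, mean of that value and the next n-1 values) pairs."""
    out = []
    window_sum = 0
    for idx, x in enumerate(values):
        window_sum += x                    # add the newest value
        if idx >= n:
            window_sum -= values[idx - n]  # drop the value that left the window
        if idx >= n - 1:
            start = idx - (n - 1)          # first row of the current window
            out.append((values[start], window_sum / n))
    return out


print(forward_moving_average(list(range(1, 11)))[:3])
# prints [(1, 3.0), (2, 4.0), (3, 5.0)]
```

As in the awk script, the last N-1 input rows get no output line because their windows would run past the end of the data.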

Using an N-sized array:
BEGIN { N=5 }
{
    s+=array[i++]=$1
    if (i>=N) i=0
}
NR>=N {
    print array[i], s/N
    s-=array[i]
}
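The fixed-size array above works as a ring buffer: once the index wraps around, it points at the oldest stored value, which is both the row to print and the value to subtract from the sum. A minimal Python sketch of the same idea follows; the name `forward_average_ring` and the generator form are illustrative choices, not taken from the answer.

```python
def forward_average_ring(values, n=5):
    """Yield (oldest value in window, window mean) using only n stored values."""
    ring = [0] * n   # holds at most the last n values
    pos = 0          # next slot to overwrite
    total = 0        # sum of the values currently in the window
    seen = 0
    for x in values:
        ring[pos] = x
        total += x
        pos = (pos + 1) % n
        seen += 1
        if seen >= n:
            oldest = ring[pos]   # after the wrap, pos points at the oldest value
            yield oldest, total / n
            total -= oldest      # it leaves the window before the next row


print(list(forward_average_ring(range(1, 8))))
# prints [(1, 3.0), (2, 4.0), (3, 5.0)]
```

Memory use stays constant in the input length, which is the main reason to prefer this form over storing every row.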

$ cat tst.awk
BEGIN { OFS="\t"; range=5 }
{ recs[NR%range] = $0 }
NR >= range {
    sum = 0
    for (i in recs) {
        split(recs[i],flds)
        sum += flds[2]
    }
    print recs[(NR+1-range)%range], sum / range
}
$ awk -f tst.awk file
1 1 3
2 2 4
3 3 5
4 4 6
5 5 7
6 6 8
7 7 9
8 8 10
9 9 11
10 10 12
11 11 13
12 12 14
13 13 15
14 14 16
15 15 17
16 16 18
17 17 19
18 18 20
19 19 21
20 20 22
21 21 23
22 22 24
23 23 25
24 24 26
25 25 27
26 26 28
27 27 29
28 28 30
29 29 31
30 30 32
31 31 33
32 32 34
33 33 35
34 34 36
35 35 37
36 36 38

Related

Multiplication Table using nested for loops

I am trying to produce a square-formatted multiplication table using the code below; the output I expect is shown after it:
def multiplicationTable(maxValue):
    for i in range(1, maxvalue):
        for j in range(1, maxvalue):
            print(("{:6d}".format(i * j,)), end='')
        print()
print(multiplicationTable(1)
print(multiplicationTable(5))
print(multiplicationTable(10))
1
1 2 3 4 5
2 4 6 8 10
3 6 9 12 15
4 8 12 16 20
5 10 15 20 25
1 2 3 4 5 6 7 8 9 10
2 4 6 8 10 12 14 16 18 20
3 6 9 12 15 18 21 24 27 30
4 8 12 16 20 24 28 32 36 40
5 10 15 20 25 30 35 40 45 50
6 12 18 24 30 36 42 48 54 60
7 14 21 28 35 42 49 56 63 70
8 16 24 32 40 48 56 64 72 80
9 18 27 36 45 54 63 72 81 90
10 20 30 40 50 60 70 80 90 100
I get an error:
File "", line 7
print(multiplicationTable(5))
^
SyntaxError: invalid syntax
print(multiplicationTable(1) is missing a closing ).
You are using maxValue (with capital V) in your function definition while using maxvalue (with small v) in the function body.
Here is the new version:
def multiplicationTable(maxvalue): # maxvalue, not maxValue
    for i in range(1, maxvalue+1):
        for j in range(1, maxvalue+1):
            print(("{:6d}".format(i * j,)), end='')
        print()
multiplicationTable(1)
multiplicationTable(5)
multiplicationTable(10)
EDIT 1: Changed range(1, maxvalue) to range(1, maxvalue+1)
EDIT 2: Changed print(multiplicationTable(n)) to multiplicationTable(n)
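The second edit matters because a function that only prints returns None, so wrapping the call in print() adds a stray "None" line. If calling print(...) on the result is preferred, one possible variant builds the table as a string and returns it. This is a sketch, and the name `multiplication_table` is an illustrative choice rather than part of the answer.

```python
def multiplication_table(max_value):
    """Build the table as one string instead of printing it."""
    rows = []
    for i in range(1, max_value + 1):
        # Each product is right-aligned in a field six characters wide.
        cells = ["{:6d}".format(i * j) for j in range(1, max_value + 1)]
        rows.append("".join(cells))
    return "\n".join(rows)


print(multiplication_table(3))
```

Returning the string also makes the function easy to check in a test, since nothing is written to standard output as a side effect.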

What is this pseudocode intended to do?

// L is a list and n is its length //
// we assume that n= 4**k , for k≥1//
Alg1(L,n)
remove the smallest and largest element from L
if n-2 > (4**k)/2
call Alg1(L, n-2)
Not what it does but what is it intended to do? I don't understand what the question means by "intended" but I think the algorithm just removes the largest and smallest element of the list recursively until 4 or 3 elements remain.
Given a starting list of size 4^k, which appears to be implied by the definition given for n, alg1 reduces the size of the supplied list to ((4^k) / 2) + 2 for k >= 1. I agree with @Ctznkane525 that the algorithm is incompletely specified in that it doesn't tell us what the return value should be. But if we make the simple assumption that two elements should be removed from the end of the list each time n is decremented by 2, we can continue. Thus, consider the following implementation in Clojure:
(defn exp [x n]
  (reduce * (repeat n x)))
(def k 1)
(defn alg1 [l n]
  (println "k=" k " n=" n " l=" l)
  (if (> (- n 2) (/ (exp 4 k) 2))
    (recur (take (- n 2) l) (- n 2))
    l))
I've added code here to print the values of k, n, and l so we can watch what happens at each step.
Given the above we'll start a little testing. We'll invoke alg1 as (alg1 (take (exp 4 k) (iterate #(+ 1 %) 1)) (exp 4 k)), which simply creates a list of 4^k elements and passes it as the first argument to alg1, and passes 4^k for the second argument. So here goes:
user=> (def k 1)
#'user/k
user=> (alg1 (take (exp 4 k) (iterate #(+ 1 %) 1)) (exp 4 k))
k= 1 n= 4 l= (1 2 3 4)
(1 2 3 4)
So with k=1 and the list defined as (1 2 3 4) the function returns immediately, because n-2 = 2, and that's less than or equal to (4^k)/2, which is also 2.
Let's try with k=2:
user=> (def k 2)
#'user/k
user=> (alg1 (take (exp 4 k) (iterate #(+ 1 %) 1)) (exp 4 k))
k= 2 n= 16 l= (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16)
k= 2 n= 14 l= (1 2 3 4 5 6 7 8 9 10 11 12 13 14)
k= 2 n= 12 l= (1 2 3 4 5 6 7 8 9 10 11 12)
k= 2 n= 10 l= (1 2 3 4 5 6 7 8 9 10)
(1 2 3 4 5 6 7 8 9 10)
Ah, that's a bit more interesting. We start with n=16, which is of course 4^k = 4^2 = 16, and the beginning list is (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16). When these values are considered by alg1 it finds that n-2 (14) is greater than (4^2)/2 (8), so it trims two elements from the end of the list and recursively invokes itself. On the second iteration it finds that n-2 (12) is greater than 8 so it trims another two elements and recursively invokes itself. This continues until n=10, when alg1 finds that n-2 (8) is no longer greater than (4^2)/2 (8), so it returns the list (1 2 3 4 5 6 7 8 9 10).
What happens with k=3?
user=> (def k 3)
#'user/k
user=> (alg1 (take (exp 4 k) (iterate #(+ 1 %) 1)) (exp 4 k))
k= 3 n= 64 l= (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64)
k= 3 n= 62 l= (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62)
k= 3 n= 60 l= (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60)
k= 3 n= 58 l= (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58)
k= 3 n= 56 l= (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56)
k= 3 n= 54 l= (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54)
k= 3 n= 52 l= (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52)
k= 3 n= 50 l= (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50)
k= 3 n= 48 l= (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48)
k= 3 n= 46 l= (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46)
k= 3 n= 44 l= (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44)
k= 3 n= 42 l= (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42)
k= 3 n= 40 l= (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40)
k= 3 n= 38 l= (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38)
k= 3 n= 36 l= (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36)
k= 3 n= 34 l= (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34)
(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34)
Similar results to the above. At each iteration two elements are trimmed from the list until the condition specified in the algorithm is reached, at which point the algorithm exits.
You can continue bumping up the value of k, building the arguments, and watching the algorithm work, but in the end the results are always similar: the list is reduced in size to ((4^k) / 2) + 2.
Best of luck.
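The Clojure version trims from the end of an already ordered list and tests the condition before trimming. As a point of comparison, here is a minimal Python sketch of a more literal reading of the pseudocode. It assumes that the removal happens on every call before the test, that the smallest and largest elements are the ones removed, and that k is passed in explicitly; the function signature is illustrative and not from the original. Under that reading the list ends with (4^k)/2 elements, the middle half of the sorted values, rather than ((4^k)/2) + 2.

```python
def alg1(items, k):
    """Drop the smallest and largest element repeatedly, as in the pseudocode."""
    items = sorted(items)        # sorting makes min/max removal a simple slice
    n = len(items)               # assumed to equal 4**k
    threshold = (4 ** k) // 2
    while True:
        items = items[1:-1]      # remove the smallest and the largest element
        if n - 2 > threshold:
            n -= 2               # stands in for the recursive call Alg1(L, n-2)
        else:
            return items


print(alg1(range(1, 17), 2))
# prints [5, 6, 7, 8, 9, 10, 11, 12]
```

Whether the exercise intends the removal to happen before or after the test is not stated in the question, so both readings are shown only as possibilities.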

Get the average of the selected cells line by line in a file?

I have a single file with multiple columns. I want to select a few of them, compute a value from the selected cells on each line, and output the results as additional columns.
For example:
Month Low.temp Max.temp Pressure Wind Rain
JAN 17 36 120 5 0
FEB 10 34 110 15 3
MAR 13 30 115 25 5
APR 14 33 105 10 4
.......
How can I get the average temperature (Avg.temp) and humidity (Hum) as columns?
Avg.temp = (Low.temp+Max.temp)/2
Hum = Wind * Rain
The desired output is:
Month Low.temp Max.temp Pressure Wind Rain Avg.temp Hum
JAN 17 36 120 5 0 26.5 0
FEB 10 34 110 15 3 22 45
MAR 13 30 115 25 5 21.5 125
APR 14 33 105 10 4 23.5 40
.......
I don't want to do it in excel. Is there any simple shell command to do this?
I would use awk like this:
awk 'NR==1 {print $0, "Avg.temp", "Hum"; next} {$(NF+1)=($2+$3)/2; $(NF+1)=$5*$6}1' file
or:
awk 'NR==1 {print $0, "Avg.temp", "Hum"; next} {print $0, ($2+$3)/2, $5*$6}' file
Both versions do the calculations and append the results to the original fields.
Let's see it in action, piping to column -t for a nice output:
$ awk 'NR==1 {print $0, "Avg.temp", "Hum"; next} {$(NF+1)=($2+$3)/2; $(NF+1)=$5*$6}1' file | column -t
Month Low.temp Max.temp Pressure Wind Rain Avg.temp Hum
JAN 17 36 120 5 0 26.5 0
FEB 10 34 110 15 3 22 45
MAR 13 30 115 25 5 21.5 125
APR 14 33 105 10 4 23.5 40
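For readers who prefer a script over a one-liner, the same per-line calculation can be sketched in Python. The function name `add_derived_columns` is hypothetical, and the sketch assumes whitespace-separated fields in the column order shown in the question.

```python
def add_derived_columns(lines):
    """Append Avg.temp and Hum to whitespace-separated weather rows."""
    out = []
    for lineno, line in enumerate(lines):
        fields = line.split()
        if lineno == 0:                        # header row: add the new titles
            fields += ["Avg.temp", "Hum"]
        else:
            low, high = float(fields[1]), float(fields[2])
            wind, rain = float(fields[4]), float(fields[5])
            # ':g' drops trailing zeros, similar to awk's default number output
            fields += [f"{(low + high) / 2:g}", f"{wind * rain:g}"]
        out.append(" ".join(fields))
    return out


sample = [
    "Month Low.temp Max.temp Pressure Wind Rain",
    "JAN 17 36 120 5 0",
    "FEB 10 34 110 15 3",
]
print("\n".join(add_derived_columns(sample)))
```

The formula for Hum is taken from the question as given (Wind * Rain), not from any meteorological definition.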

How to print a 10*10 times table as a grid?

I am trying to print a 10x10 times table using for loops.
Here's my attempt:
for x in range (1, 11):
    for y in range (1, 11):
        print (x*y)
    print()
The output is a vertical line of numbers. I need it like the square table kind.
What you need to do is leverage the end argument:
for x in range (1, 11):
    for y in range (1, 11):
        print ('{:3}'.format(x*y), end=' ')
    print()
Also, note the way the row entries are formatted. By using '{:3}'.format(x*y), each value is padded with spaces to a width of three characters. For more details on formatting, consult the documentation.
Sample output:
1 2 3 4 5 6 7 8 9 10
2 4 6 8 10 12 14 16 18 20
3 6 9 12 15 18 21 24 27 30
4 8 12 16 20 24 28 32 36 40
5 10 15 20 25 30 35 40 45 50
6 12 18 24 30 36 42 48 54 60
7 14 21 28 35 42 49 56 63 70
8 16 24 32 40 48 56 64 72 80
9 18 27 36 45 54 63 72 81 90
10 20 30 40 50 60 70 80 90 100
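To see the padding in isolation, the rows can be built as strings and inspected. This is a small sketch with a hypothetical helper `times_table_rows`; it joins the cells with a single space, so unlike the end=' ' version it leaves no trailing space on each row.

```python
def times_table_rows(size=10, width=3):
    """Return the table as a list of strings, one per row."""
    rows = []
    for x in range(1, size + 1):
        # The nested {w} fills in the field width, e.g. '{:3}' for width=3.
        cells = ["{:{w}}".format(x * y, w=width) for y in range(1, size + 1)]
        rows.append(" ".join(cells))
    return rows


for row in times_table_rows(3):
    print(repr(row))
# prints '  1   2   3'
# prints '  2   4   6'
# prints '  3   6   9'
```

A width of three is enough here because the largest product in a 10x10 table, 100, has three digits.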
The print function adds a \n unless told otherwise. Try explicitly saying not to:
for x in range (1, 11):
    for y in range (1, 11):
        print (x*y, end=' ')
    print()
Note: I'm assuming you're either on Python 3 or have imported print_function, since you are using the print function rather than the print statement.
Edit: added a space in the end
One can also complicate things a bit and print the X and Y indices:
n = 11
m = 11
grid = [[x * y for x in range(1,n)] for y in range(1,m)]
print(' ', end='')
print(''.join([f'{j:5}' for j in range(1,n)]))
print(' ', end='')
print(''.join([f'{"_":>5}' for _ in range(1,n)]))
for i in range(n-1):
    print(f'{i+1:2}|', end=' ')
    print(' '.join(f'{x:4}' for x in grid[i]))
Results
1 2 3 4 5 6 7 8 9 10
_ _ _ _ _ _ _ _ _ _
1| 1 2 3 4 5 6 7 8 9 10
2| 2 4 6 8 10 12 14 16 18 20
3| 3 6 9 12 15 18 21 24 27 30
4| 4 8 12 16 20 24 28 32 36 40
5| 5 10 15 20 25 30 35 40 45 50
6| 6 12 18 24 30 36 42 48 54 60
7| 7 14 21 28 35 42 49 56 63 70
8| 8 16 24 32 40 48 56 64 72 80
9| 9 18 27 36 45 54 63 72 81 90
10| 10 20 30 40 50 60 70 80 90 100

How can I make a multiplication table using bash brace expansion? So far I have this: echo $[{1..10}*{1..10}]

I am trying to learn bash at a deeper level, and I decided to make a multiplication table. I have the functionality with the statement:
echo $[{1..10}*{1..10}]
but that gives me the following output:
1 2 3 4 5 6 7 8 9 10 2 4 6 8 10 12 14 16 18 20 3 6 9 12 15 18 21 24 27 30 4 8 12 16 20 24 28 32 36 40 5 10 15 20 25 30 35 40 45 50 6 12 18 24 30 36 42 48 54 60 7 14 21 28 35 42 49 56 63 70 8 16 24 32 40 48 56 64 72 80 9 18 27 36 45 54 63 72 81 90 10 20 30 40 50 60 70 80 90 100
Is there any way to format this output like the following using only one statement? (I can figure out how to do this with loops, but that's no fun :p )
1 2 3 4 5 6 7 8 9 10
2 4 6 8 10 12 14 16 18 20
3 6 9 12 15 18 21 24 27 30
4 8 12 16 20 24 28 32 36 40
5 10 15 20 25 30 35 40 45 50
6 12 18 24 30 36 42 48 54 60
7 14 21 28 35 42 49 56 63 70
8 16 24 32 40 48 56 64 72 80
9 18 27 36 45 54 63 72 81 90
10 20 30 40 50 60 70 80 90 100
Is it even possible to do in one statement, or would I have to loop?
Use this line for a nice output without using loops:
echo $[{1..10}*{1..10}] | xargs -n10 | column -t
Output:
1 2 3 4 5 6 7 8 9 10
2 4 6 8 10 12 14 16 18 20
3 6 9 12 15 18 21 24 27 30
4 8 12 16 20 24 28 32 36 40
5 10 15 20 25 30 35 40 45 50
6 12 18 24 30 36 42 48 54 60
7 14 21 28 35 42 49 56 63 70
8 16 24 32 40 48 56 64 72 80
9 18 27 36 45 54 63 72 81 90
10 20 30 40 50 60 70 80 90 100
Update
As a logical next step, I asked here if this multiplication table can have a variable range. With this help, my answer works with a variable ($boundary) range and stays quite readable:
boundary=4; eval echo $\[{1..$boundary}*{1..$boundary}\] | xargs -n$boundary | column -t
Output:
1 2 3 4
2 4 6 8
3 6 9 12
4 8 12 16
Also note that the $[..] arithmetic notation is deprecated and $((...)) should be used instead:
boundary=4; eval eval echo "$\(\({1..$boundary}*{1..$boundary}\)\)" | xargs -n$boundary | column -t
The printf built-in repeats its format as many times as necessary to print all arguments, so:
printf '%d %d %d %d %d %d %d %d %d %d\n' $[{1..10}*{1..10}]
If you want to avoid repeating the %d bit, it's trickier.
printf "$(echo %$[{1..10}*0]d)\\n" $[{1..10}*{1..10}]
In production code, use a loop.
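Both xargs -n10 and the repeated printf format do the same job: they cut a flat sequence of products into rows of a fixed length. As an illustration of that chunking step outside the shell, here is a short Python sketch; `rows_of` and the variable names are hypothetical.

```python
def rows_of(flat, n):
    """Split a flat list into consecutive rows of n items."""
    return [flat[i:i + n] for i in range(0, len(flat), n)]


n = 4
# Same order as the brace expansion: the left factor varies slowest.
products = [a * b for a in range(1, n + 1) for b in range(1, n + 1)]
for row in rows_of(products, n):
    print(" ".join(f"{v:2d}" for v in row))
```

With n = 4 this reproduces the 4x4 table shown for boundary=4, with each value right-aligned in two characters.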

Resources