Pytorch: Memory Efficient weighted sum with weights shared along channels

Inputs:
1) I = Tensor of dim (N, C, X) (Input)
2) W = Tensor of dim (N, X, Y) (Weight)
Output:
1) O = Tensor of dim (N, C, Y) (Output)
I want to compute:
I = I.view(N, C, X, 1)
W = W.view(N, 1, X, Y)
PROD = I*W
O = PROD.sum(dim=2)
return O
without incurring N * C * X * Y memory overhead.
Basically I want to calculate the weighted sum of a feature map wherein the weights are the same along the channel dimension, without incurring memory overhead per channel.
Maybe I could use
from itertools import product
O = torch.zeros(N, C, Y)
for n, x, y in product(range(N), range(X), range(Y)):
    O[n, :, y] += I[n, :, x]*W[n, x, y]
return O
but that would be slower (no broadcasting) and I'm not sure how much memory overhead would be incurred by saving variables for the backward pass.

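You can use torch.bmm (https://pytorch.org/docs/stable/torch.html#torch.bmm). The sum over X of I[n, c, x] * W[n, x, y] is a batched matrix multiplication, so torch.bmm(I, W) on the original (N, C, X) and (N, X, Y) tensors gives O directly.
To verify the result: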
import torch
N, C, X, Y= 100, 10, 9, 8
i = torch.rand(N,C,X)
w = torch.rand(N,X,Y)
o = torch.bmm(i,w)
# desired result code
I = i.view(N, C, X, 1)
W = w.view(N, 1, X, Y)
PROD = I*W
O = PROD.sum(dim=2)
print(torch.allclose(O,o)) # should output True if outputs are same.
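The same contraction can also be written with torch.einsum or the @ operator. The following is only a sketch under small, arbitrary sizes; the variable names (o_einsum, o_matmul, prod, reference) are made up for illustration and are not from the question. It also shows the element count of the broadcast intermediate that the question wants to avoid.

```python
import torch

# Small, arbitrary sizes chosen only for illustration
N, C, X, Y = 4, 3, 5, 2
i = torch.rand(N, C, X)
w = torch.rand(N, X, Y)

# Three equivalent ways to contract over X without building an (N, C, X, Y) tensor
o_bmm = torch.bmm(i, w)
o_einsum = torch.einsum("ncx,nxy->ncy", i, w)
o_matmul = i @ w

# The broadcast formulation from the question, kept only as a reference result
prod = i.view(N, C, X, 1) * w.view(N, 1, X, Y)
reference = prod.sum(dim=2)

print(torch.allclose(o_bmm, reference))
print(torch.allclose(o_einsum, reference))
print(torch.allclose(o_matmul, reference))

# The intermediate holds N*C*X*Y elements, the output only N*C*Y
print(prod.numel(), o_bmm.numel())
```

Which of the three spellings to use is mostly a matter of readability; the einsum string makes the shared batch index n and the summed index x explicit.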
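EDIT: I would expect PyTorch's internal matrix multiplication to be efficient. You can also get a rough measurement of memory usage with tracemalloc on CPU. Note that tracemalloc only traces allocations made through Python's own allocator, so tensor storage allocated by PyTorch's native code may not be fully reflected in the numbers. For GPU, see https://discuss.pytorch.org/t/measuring-peak-memory-usage-tracemalloc-for-pytorch/34067.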
import torch
import tracemalloc
tracemalloc.start()
N, C, X, Y= 100, 10, 9, 8
i = torch.rand(N,C,X)
w = torch.rand(N,X,Y)
o = torch.bmm(i,w)
# output is a tuple indicating current memory and peak memory
print(tracemalloc.get_traced_memory())
You can do the same with other code and see the bmm implementation is indeed efficient.
import torch
import tracemalloc
tracemalloc.start()
N, C, X, Y= 100, 10, 9, 8
i = torch.rand(N,C,X)
w = torch.rand(N,X,Y)
I = i.view(N, C, X, 1)
W = w.view(N, 1, X, Y)
PROD = I*W
O = PROD.sum(dim=2)
# output is a tuple indicating current memory and peak memory
print(tracemalloc.get_traced_memory())

Related

Iterating through and operating on Sympy Matrices

My modified block of code from here works for XOR'ing python lists via using functions (XOR and AND) of the Sympy library (first block of code below). However, I am stumped on how to iterate via sympy matrices (second block of code below).
The python lists code that works is:
from sympy import And, Xor
from sympy.logic import SOPform, simplify_logic
from sympy import symbols
def LogMatrixMult (A, B):
    rows_A = len(A)
    cols_A = len(A[0])
    rows_B = len(B)
    cols_B = len(B[0])
    if cols_A != rows_B:
        print ("Cannot multiply the two matrices. Incorrect dimensions.")
        return
    # Create the result matrix
    # Dimensions would be rows_A x cols_B
    C = [[0 for row in range(cols_B)] for col in range(rows_A)]
    for i in range(rows_A):
        for j in range(cols_B):
            for k in range(cols_A):
                # I can add Sympy's in simplify_logic(-)
                C[i][j] = Xor(C[i][j], And(A[i][k], B[k][j]))
    return C
b, c, d, e, f, w, x, y, z = symbols('b c d e f w x y z')
m1 = [[b,c,d,e]]
m2 = [[w,x],[x,z],[y,z],[z,w]]
result = simplify_logic(LogMatrixMult(m1, m2)[0][0])
print(result)
In the block below, which uses Sympy matrices, note that the i, j, k and C, A, B definitions come from my attempt to adapt the code to use an iterator. I don't know whether they are needed or correct.
from sympy import And, Xor
from sympy.matrices import Matrix
from sympy.logic import SOPform, simplify_logic
from sympy import symbols, IndexedBase, Idx
from sympy import symbols
def LogMatrixMultArr (A, B):
    rows_A = A.rows
    cols_A = A.cols
    rows_B = B.rows
    cols_B = B.cols
    i,j,k = symbols('i j k', cls=Idx)
    C = IndexedBase('C')
    A = IndexedBase('A')
    B = IndexedBase('B')
    if cols_A != rows_B:
        print ("Cannot multiply the two matrices. Incorrect dimensions.")
        return
    # Create the result matrix
    # Dimensions would be rows_A x cols_B
    C = [[0 for row in range(cols_B)] for col in range(rows_A)]
    for i in range(rows_A):
        for j in range(cols_B):
            for k in range(cols_A):
                # I can add Sympy's in simplify_logic(-)
                C[i,j] = Xor(C[i,j], And(A[i,k], B[k,j]))
                # C[i][j] = Xor(C[i][j],And(A[i][k],B[k][j]))
    return C
b, c, d, e, f, w, x, y, z = symbols('b c d e f w x y z')
P = Matrix([w,x]).reshape(1,2)
Q = Matrix([y,z])
print(LogMatrixMultArr(P,Q))
The error I get is: TypeError: list indices must be integers or slices, not tuple
C[i,j] = Xor(C[i,j], And(A[i,k], B[k,j]))
I believe I need to use some special sympy way of iterating, but I am stuck on how to get it to work in the code, if I even need this approach at all.
Also, if anyone knows how to do something such as the above using XOR and And (non-bitwise) instead of + and * operators in a faster way, please do share.
Thanks.

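I think the problem is with the IndexedBase objects. I'm not an expert on these, but it seems you are not using them correctly: reassigning A and B to IndexedBase discards the matrices that were passed in, and C is then rebuilt as a plain Python list, which cannot be indexed with a tuple as in C[i,j]. If you replace
i,j,k = symbols('i j k', cls=Idx)
C = IndexedBase('C')
A = IndexedBase('A')
B = IndexedBase('B')
by
C = zeros(rows_A, cols_B)
(with zeros imported from sympy) and remove C = [[0 for row in range(cols_B)] for col in range(rows_A)], then it works.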
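As a variation on the fix above, here is a minimal sketch that reads the inputs as Sympy matrices with A[i, k] but keeps the result as a plain nested list, exactly as in the working list version from the question. The function name log_matrix_mult is hypothetical, and raising ValueError instead of printing is my own choice, not something from the original code.

```python
from sympy import And, Xor, Matrix, symbols

def log_matrix_mult(A, B):
    """XOR/AND 'matrix product' of two Sympy matrices (hypothetical helper)."""
    if A.cols != B.rows:
        raise ValueError("Cannot multiply the two matrices. Incorrect dimensions.")
    # Result is a nested Python list, so it is indexed as C[i][j]
    C = [[0 for _ in range(B.cols)] for _ in range(A.rows)]
    for i in range(A.rows):
        for j in range(B.cols):
            for k in range(A.cols):
                # Sympy matrices are indexed with a tuple: A[i, k]
                C[i][j] = Xor(C[i][j], And(A[i, k], B[k, j]))
    return C

w, x, y, z = symbols('w x y z')
P = Matrix([w, x]).reshape(1, 2)  # 1 x 2 row
Q = Matrix([y, z])                # 2 x 1 column
result = log_matrix_mult(P, Q)
print(result[0][0])
```

Keeping the result as a list avoids storing Boolean expressions inside a Matrix; whether that matters depends on what you do with the result afterwards.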

Tensorflow debug or print statements

I am very new to TensorFlow and trying to learn it. I copied a program from a tutorial website. After I modified it, the program has issues and I need to debug it. I am looking for help understanding how to print certain values such as cost and optimizer, so that I can see the value being updated in each iteration. I understand that nodes cannot be printed, but I take it that cost and optimizer are inputs, which should be printable, right?
plt.ion()
n_observations = 100
xs = np.linspace(-3, 3, n_observations)
ys = np.sin(xs) + np.random.uniform(-0.5, 0.5, n_observations)
X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)
Y_pred = tf.Variable(tf.random_normal([1]), name='bias')
for pow_i in range(1, 5):
    W = tf.Variable(tf.random_normal([1]), name='weight_%d' % pow_i)
    Y_pred = tf.add(tf.multiply(tf.pow(X, pow_i), W), Y_pred)
cost = tf.reduce_sum(tf.pow(Y_pred - Y, 2)) / (n_observations - 1)
d = tf.Print(cost, [cost, 2.0], message="Value of cost id:")
learning_rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
n_epochs = 10
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    prev_training_cost = 0.0
    for epoch_i in range(n_epochs):
        for (x, y) in zip(xs, ys):
            print("Msg2 x, y ", x, y, cost);
            sess.run(optimizer, feed_dict={X: x, Y: y})
            sess.run(d)
            print("Msg3 x, y ttt ", x, y, optimizer);
        training_cost = sess.run(
            cost, feed_dict={X: xs, Y: ys})
        print(training_cost)
        print("Msg3 cost, xs ys", cost, xs, ys);
        if epoch_i % 100 == 0:
            ax.plot(xs, Y_pred.eval(
                feed_dict={X: xs}, session=sess),
                'k', alpha=epoch_i / n_epochs)
            fig.show()
            #plt.draw()
        # Allow the training to quit if we've reached a minimum
        if np.abs(prev_training_cost - training_cost) < 0.001:
            break
        prev_training_cost = training_cost
ax.set_ylim([-3, 3])
fig.show()
plt.waitforbuttonpress()

In your example, cost and optimizer refer to nodes in the graph, not inputs to your graph. They need to be fetched in a session.run call before you can print their Python values. For example, printing training_cost prints the value of cost. Note that optimizer is an operation rather than a tensor, so session.run(optimizer, ...) on its own returns None; to see the cost at each step, fetch both in one call, as in session.run([optimizer, cost], feed_dict=...). The tf.Print node d also depends on the placeholders, so it needs the same feed_dict when you run it.
If you are interested in debugging and printing values check out:
tfdbg
tf.Print
Hope that helps!

how to calculate a quadratic equation that best fits a set of given data

I have a vector X of 20 real numbers and a vector Y of 20 real numbers.
I want to model them as
y = ax^2+bx + c
How do I find the values of 'a', 'b' and 'c' that give the best-fit quadratic equation?
Given Values
X = (x1,x2,...,x20)
Y = (y1,y2,...,y20)
i need a formula or procedure to find following values
a = ???
b = ???
c = ???
Thanks in advance.

Everything @Bartoss said is right, +1. I figured I'd just add a practical implementation here, without QR decomposition. You want to find the values of a, b, c such that the distance between the measured and fitted data is minimal. As a measure you can pick
sum((ax^2+bx + c -y)^2)
where the sum is over the elements of vectors x,y.
Then, a minimum implies that the derivative of the quantity with respect to each of a,b,c is zero:
d (sum((ax^2+bx + c -y)^2)) /da =0
d (sum((ax^2+bx + c -y)^2)) /db =0
d (sum((ax^2+bx + c -y)^2)) /dc =0
these equations are
2*sum((ax^2+bx + c -y)*x^2) =0
2*sum((ax^2+bx + c -y)*x) =0
2*sum(ax^2+bx + c -y) =0
Dividing by 2, the above can be rewritten as
a*sum(x^4) +b*sum(x^3) + c*sum(x^2) =sum(y*x^2)
a*sum(x^3) +b*sum(x^2) + c*sum(x) =sum(y*x)
a*sum(x^2) +b*sum(x) + c*N =sum(y)
where N=20 in your case. A simple Python script showing how to do this follows.
from numpy import random, array
from scipy.linalg import solve
import matplotlib.pylab as plt
a, b, c = 6., 3., 4.
N = 20
x = random.rand((N))
y = a * x ** 2 + b * x + c
y += random.rand((N)) #add a bit of noise to make things more realistic
x4 = (x ** 4).sum()
x3 = (x ** 3).sum()
x2 = (x ** 2).sum()
M = array([[x4, x3, x2], [x3, x2, x.sum()], [x2, x.sum(), N]])
K = array([(y * x ** 2).sum(), (y * x).sum(), y.sum()])
A, B, C = solve(M, K)
print('exact values ', a, b, c)
print('calculated values', A, B, C)
fig, ax = plt.subplots()
ax.plot(x, y, 'b.', label='data')
ax.plot(x, A * x ** 2 + B * x + C, 'r.', label='estimate')
ax.legend()
plt.show()
A much quicker way to implement the solution is to use a nonlinear least squares algorithm. This is faster to write, but not faster to run. Using the one provided by scipy,
from scipy.optimize import leastsq
def f(arg):
    a,b,c=arg
    return a*x**2+b*x+c-y
(A,B,C),_=leastsq(f,[1,1,1])#you must provide a first guess to start with in this case.

That is a linear least squares problem. I think the easiest method that gives accurate results is QR decomposition using Householder reflections. It is not something that can be explained in a Stack Overflow answer, but I hope you will find all you need in these links.
If you have never heard of these before and don't know how they connect with your problem:
A = [[x1^2, x1, 1]; [x2^2, x2, 1]; ...]
Y = [y1; y2; ...]
Now you want to find v = [a; b; c] such that A*v is as close as possible to Y, which is exactly what least squares problem is all about.
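The matrix formulation above can be tried out directly. The sketch below builds A from x and hands the problem to NumPy's general least squares solver; it does not implement Householder QR by hand, it simply relies on whatever factorization np.linalg.lstsq uses internally. The data are synthetic and noise-free (my assumption, so the known coefficients are recovered), and the names true_a, true_b, true_c are made up for illustration.

```python
import numpy as np

# Synthetic, noise-free data from a known quadratic (illustration only)
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=20)
true_a, true_b, true_c = 6.0, 3.0, 4.0
y = true_a * x**2 + true_b * x + true_c

# Design matrix with rows [x_i^2, x_i, 1], as described above
A = np.column_stack([x**2, x, np.ones_like(x)])

# Solve min ||A v - y|| for v = [a, b, c]
(a, b, c), residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
print(a, b, c)

# np.polyfit solves the same problem for polynomials; highest power first
print(np.polyfit(x, y, 2))
```

With real, noisy measurements the recovered a, b, c will only approximate the underlying values, and the quality of the fit can be judged from the residuals.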

Simplifying recursive mean calculation

If we have
Ei = mean [abs (Hi - p) for p in Pi]
H = mean [H0, H1, ... Hi, ... Hn]
P = concat [P0, P1, ... Pi, ... Pn]
then does there exist a more efficient way to compute
E = mean [abs (H - p) for p in P]
in terms of H, P, and the Eis and His, given that H, E, and P go on to be used as Hi, Ei, and Pi for some i, at a higher recursive level?
If we store the length of Pi as Li at each stage, then we can let
L = sum [L0, L1, ... Li, ... Ln]
allowing us to perform the somewhat easier calculation
E = sum [abs (H - p) for p in P] / L
but the use of the abs function seems to severely restrict the kinds of algebraic manipulations we can use to simplify the numerator.

No. Imagine you have just two groups, where one group has H1 = 1 and the other has H2 = 2. Imagine that every p in P1 is either 0 or 2, and every p in P2 is either 1 or 3. Then you will always have E1 = 1 and E2 = 1, regardless of the actual values in P1 and P2. However, if all p in P1 are 2 and all p in P2 are 1, then E is minimized (specifically 0.5), because H = 1.5. Alternatively, all p in P1 could be 0 and all p in P2 could be 3, in which case E is maximized (specifically 1.5). You could get any value of E between 0.5 and 1.5, depending on the distribution of the p.
If you don't actually look at all the individual p, there is no way to tell which exact value of E between 0.5 and 1.5 you will get. So you can't do any better than O(n) time to compute E, where n is the total size of P, which is the same running time as computing E directly from its defining formula.
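The counterexample can be checked numerically. This is a small sketch using only the standard library; the helper names group_error and summarize, and the choice of three points per group, are mine and purely illustrative.

```python
from statistics import mean

def group_error(h, points):
    """Mean absolute deviation of the points from h."""
    return mean(abs(h - p) for p in points)

H1, H2 = 1, 2
H = mean([H1, H2])  # 1.5

def summarize(p1, p2):
    """Return (E1, E2, E) for two groups of points."""
    e1 = group_error(H1, p1)
    e2 = group_error(H2, p2)
    e = group_error(H, p1 + p2)
    return e1, e2, e

# Both cases have the same summaries E1 = 1 and E2 = 1 ...
low = summarize([2, 2, 2], [1, 1, 1])
high = summarize([0, 0, 0], [3, 3, 3])

# ... but the combined E differs: 0.5 in the first case, 1.5 in the second
print(low)
print(high)
```

Since the per-group summaries are identical while E differs, no formula that uses only the His, Eis and lengths can recover E.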

Finding major axis/image orientation of binary image in R

I have a high res binary image which looks something like:
I'm trying to compute the major axis, which should be slightly rotated to the right, and eventually get the axis of orientation of the object.
A post here (in MATLAB) suggests that one way of doing this is to compute the covariance matrix of the data points and find its eigenvalues/eigenvectors.
I am trying to implement something similar in R:
%% MATLAB CODE Calculate axis and draw
[M N] = size(Ibw);
[X Y] = meshgrid(1:N,1:M);
%Mass and mass center
m = sum(sum(Ibw));
x0 = sum(sum(Ibw.*X))/m;
y0 = sum(sum(Ibw.*Y))/m;
#R code
d = dim(im)
M = d[1]
N = d[2]
t = meshgrid(M,N)
X = t[[2]]
Y = t[[1]]
m = sum(im);
x0 = sum(im %*% X)/m;
y0 = sum(im %*% Y)/m;
meshgrid <-function(r,c){
  return(list(R=matrix(rep(1:r, r), r, byrow=T),
              C=matrix(rep(1:c, c), c)))
}
However, computing m , x0 and y0 takes too long in R.
Does anyone know of an implementation in R?

Computing the variance matrix directly, with var, takes 1/3 of a second.
# Sample data
M <- 2736
N <- 3648
im <- matrix( FALSE, M, N );
y <- as.vector(row(im))
x <- as.vector(col(im))
im[ abs( y - M/2 ) < M/3 & abs( x - N/2 ) < N/3 ] <- TRUE
#image(im)
theta <- runif(1, -pi/12, pi/12)
xy <- cbind(x+1-N/2,y+1-M/2) %*% matrix(c( cos(theta), sin(theta), -sin(theta), cos(theta) ), 2, 2)
#plot(xy[,1]+N/2-1, xy[,2]+M/2-1); abline(h=c(1,M),v=c(1,N))
f <- function(u, lower, upper) pmax(lower,pmin(round(u),upper))
im[] <- im[cbind( f(xy[,2] + M/2 - 1,1,M), f(xy[,1] + N/2 - 1,1,N) )]
image(1:N, 1:M, t(im), asp=1)
# Variance matrix of the points in the rectangle
i <- which(im)
V <- var(cbind( col(im)[i], row(im)[i] ))
# Their eigenvectors
u <- eigen(V)$vectors
abline( M/2-N/2*u[2,1]/u[1,1], u[2,1]/u[1,1], lwd=5 )
abline( M/2-N/2*u[2,2]/u[1,2], u[2,2]/u[1,2] )
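For comparison, the same covariance and eigenvector idea can be sketched in Python with NumPy. This is not a translation of the R code above, just an illustration of the technique on a tiny synthetic image; the function name major_axis_angle and the test image are assumptions of mine. The angle is measured in image coordinates, with x along columns and y along rows (so y points down).

```python
import numpy as np

def major_axis_angle(im):
    """Angle in degrees, in [0, 180), of the major axis of the True pixels."""
    rows, cols = np.nonzero(im)
    coords = np.stack([cols, rows]).astype(float)  # x = column, y = row
    cov = np.cov(coords)                           # 2 x 2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # symmetric eigenproblem
    vx, vy = eigvecs[:, np.argmax(eigvals)]        # direction of largest variance
    return np.degrees(np.arctan2(vy, vx)) % 180.0

# Synthetic test image: a thin diagonal band, two pixels wide
im = np.zeros((60, 60), dtype=bool)
for k in range(5, 55):
    im[k, k] = True
    im[k, k + 1] = True

angle = major_axis_angle(im)
print(angle)
```

Only the coordinates of the foreground pixels enter the covariance, so no full-size coordinate grids need to be multiplied with the image.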

Try replacing the default Rblas.dll with a suitable one from this link.
