Why do I have "OutOfMemoryError" in my KMeans CuPy code? - memory-management

I'm really new to GPU coding. I found this KMeans CuPy code, and my purpose is to work on a large data set of shape (n, 3), for example, to see the timing difference between GPU and CPU. I want to use a huge number of clusters, but I am getting a memory management error. Can someone point me to the route I should take to research and fix it? I have already done some research, but I don't have a clear starting point yet.
import contextlib
import time

import cupy
import matplotlib.pyplot as plt
import numpy


@contextlib.contextmanager
def timer(message):
    cupy.cuda.Stream.null.synchronize()
    start = time.time()
    yield
    cupy.cuda.Stream.null.synchronize()
    end = time.time()
    print('%s: %f sec' % (message, end - start))
var_kernel = cupy.ElementwiseKernel(
    'T x0, T x1, T c0, T c1', 'T out',
    'out = (x0 - c0) * (x0 - c0) + (x1 - c1) * (x1 - c1)',
    'var_kernel'
)
sum_kernel = cupy.ReductionKernel(
    'T x, S mask', 'T out',
    'mask ? x : 0',
    'a + b', 'out = a', '0',
    'sum_kernel'
)
count_kernel = cupy.ReductionKernel(
    'T mask', 'float32 out',
    'mask ? 1.0 : 0.0',
    'a + b', 'out = a', '0.0',
    'count_kernel'
)
def fit_xp(X, n_clusters, max_iter):
    assert X.ndim == 2
    # Get NumPy or CuPy module from the supplied array.
    xp = cupy.get_array_module(X)
    n_samples = len(X)

    # Make an array to store the labels indicating which cluster each sample is
    # contained.
    pred = xp.zeros(n_samples)

    # Choose the initial centroid for each cluster.
    initial_indexes = xp.random.choice(n_samples, n_clusters, replace=False)
    centers = X[initial_indexes]

    for _ in range(max_iter):
        # Compute the new label for each sample.
        distances = xp.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_pred = xp.argmin(distances, axis=1)

        # If the label is not changed for each sample, we suppose the
        # algorithm has converged and exit from the loop.
        if xp.all(new_pred == pred):
            break
        pred = new_pred

        # Compute the new centroid for each cluster.
        i = xp.arange(n_clusters)
        mask = pred == i[:, None]
        sums = xp.where(mask[:, :, None], X, 0).sum(axis=1)
        counts = xp.count_nonzero(mask, axis=1).reshape((n_clusters, 1))
        centers = sums / counts

    return centers, pred
def fit_custom(X, n_clusters, max_iter):
    assert X.ndim == 2
    n_samples = len(X)
    pred = cupy.zeros(n_samples, dtype='float32')
    initial_indexes = cupy.random.choice(n_samples, n_clusters, replace=False)
    centers = X[initial_indexes]

    for _ in range(max_iter):
        distances = var_kernel(X[:, None, 0], X[:, None, 1],
                               centers[None, :, 1], centers[None, :, 0])
        new_pred = cupy.argmin(distances, axis=1)
        if cupy.all(new_pred == pred):
            break
        pred = new_pred

        i = cupy.arange(n_clusters)
        mask = pred == i[:, None]
        sums = sum_kernel(X, mask[:, :, None], axis=1)
        counts = count_kernel(mask, axis=1).reshape((n_clusters, 1))
        centers = sums / counts

    return centers, pred
def draw(X, n_clusters, centers, pred, output):
    # Plot the samples and centroids of the fitted clusters into an image file.
    for i in range(n_clusters):
        labels = X[pred == i]
        plt.scatter(labels[:, 0], labels[:, 1], c=numpy.random.rand(3))
    plt.scatter(
        centers[:, 0], centers[:, 1], s=120, marker='s', facecolors='y',
        edgecolors='k')
    plt.savefig(output)
def run_cpu(gpuid, n_clusters, num, max_iter, use_custom_kernel):  ##, output
    samples = numpy.random.randn(num, 3)
    X_train = numpy.r_[samples + 1, samples - 1]
    with timer(' CPU '):
        centers, pred = fit_xp(X_train, n_clusters, max_iter)


def run_gpu(gpuid, n_clusters, num, max_iter, use_custom_kernel):  ##, output
    samples = numpy.random.randn(num, 3)
    X_train = numpy.r_[samples + 1, samples - 1]
    with cupy.cuda.Device(gpuid):
        X_train = cupy.asarray(X_train)
        with timer(' GPU '):
            if use_custom_kernel:
                centers, pred = fit_custom(X_train, n_clusters, max_iter)
            else:
                centers, pred = fit_xp(X_train, n_clusters, max_iter)
By the way, I am working in Colab Pro with 25 GB of RAM. The code works with n_clusters=200 and num=1000000, but if I use bigger numbers the error appears. I am running the code like this:
run_gpu(0, 200, 1000000, 10, True)
This is the error I get: an OutOfMemoryError from CuPy while allocating GPU memory.
Any suggestion will be welcome, thanks for your time.

Assuming that CuPy is clever enough not to create explicit copies of the broadcasted inputs of var_kernel, the output distances has to hold 2 * num * num_clusters elements, which is exactly the 6,400,000,000 bytes it is trying to allocate (with float64 data, num = 1,000,000 and n_clusters = 200 already need 2 × 1,000,000 × 200 × 8 bytes = 3.2 GB for distances, so doubling either parameter reaches that figure). You could get a much smaller memory footprint by never actually writing distances to memory, which means fusing var_kernel with the argmin. See the kernel fusion part of the CuPy docs.
If I understand the example there correctly, this should work:
@cupy.fuse(kernel_name='argmin_distance')
def argmin_distance(x1, y1, x2, y2):
    return cupy.argmin((x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2), axis=1)
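If that fused kernel works as hoped, the loop body of fit_custom could compute the labels directly, never materializing distances. Roughly (a sketch; note it also uses the centers[None, :, 0] / centers[None, :, 1] ordering discussed further down):
# inside the for-loop of fit_custom, replacing the var_kernel + cupy.argmin pair
new_pred = argmin_distance(X[:, None, 0], X[:, None, 1],
                           centers[None, :, 0], centers[None, :, 1])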
The next question would be where the other 13.7 GB come from. A big part of them might just be instances of distances from earlier iterations. I'm not a CuPy expert, but at least in Python/NumPy, your use of distances inside the loop would not reuse the same memory; it would allocate new memory each time you call var_kernel. The same problem is visible with pred, which is allocated before the loop. If CuPy does things the NumPy way, the solution would be to assign in place with [:], like
pred[:] = new_pred
or
distances[:, :] = var_kernel(X[:, None, 0], X[:, None, 1],
                             centers[None, :, 1], centers[None, :, 0])
For this to work, you would need to allocate distances before the loop as well. Also, none of this is needed anymore once you use kernel fusion, so just take it as an example. It may be best to allocate everything beforehand and then use this in-place syntax everywhere in the loop.
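A minimal, untested sketch of that pattern (hypothetical sizes; it reuses var_kernel from the question, assumes 2-D float64 data, and omits the centroid update and convergence check):
import cupy

n_samples, n_clusters = 1000, 20
X = cupy.random.randn(n_samples, 2)
centers = X[cupy.random.choice(n_samples, n_clusters, replace=False)]

# Allocate the large buffers once, before the loop ...
distances = cupy.empty((n_samples, n_clusters), dtype=X.dtype)
pred = cupy.zeros(n_samples, dtype=cupy.int64)

for _ in range(10):
    # ... and assign into them in place, so the names are not rebound to
    # freshly allocated arrays on every iteration.
    distances[...] = var_kernel(X[:, None, 0], X[:, None, 1],
                                centers[None, :, 0], centers[None, :, 1])
    pred[...] = cupy.argmin(distances, axis=1)
Note that a CuPy ElementwiseKernel can also take the preallocated output array as its last argument (var_kernel(..., distances)), which avoids even the temporary on the right-hand side.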
I don't know enough about CuPy to say why fit_xp doesn't have the same problem (or does it?). My guess would be that garbage collection of CuPy objects works differently there. If garbage collection were "quick enough" in fit_custom, it should work even without kernel fusion or reusing already allocated arrays.
Other problems or at least oddities with your code:
Why are you comparing the zeroth coordinate of centers with the first coordinate of X? Wouldn't it make more sense to call
distances = var_kernel(X[:, None, 0], X[:, None, 1],
                       centers[None, :, 0], centers[None, :, 1])
Why are you creating 3D data when only using the projection on the 2D plane? So why not
samples = numpy.random.randn(num, 2)
Why are you using floats for (the initial version of) pred? The argmin should give an integer type result.
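A minimal fix for that (a sketch; cupy.argmin returns an integer array, int64 by default) would be to allocate pred as:
pred = cupy.zeros(n_samples, dtype=cupy.int64)  # match the dtype returned by cupy.argmin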

Related

LMFIT - How to get best fit with only positive values

I would like to force the best fit to be always positive.
from lmfit.models import ExpressionModel
from lmfit.models import StepModel
import matplotlib.pyplot as plt

# x, y: the data arrays to be fitted (defined elsewhere)
step_mod = StepModel(form='linear', prefix='step_')
gmod = ExpressionModel("1-exp_amp*exp(-x/exp_decay)")
mod = gmod * step_mod
pars = gmod.make_params(exp_amp=1, exp_decay=30)
pars += step_mod.guess(y, x=x, amplitude=1, center=23, sigma=0)
out = mod.fit(y, pars, x=x)
print(out.fit_report())

plt.plot(x, y, 'o')
# plt.plot(x, out.init_fit, '--', label='initial fit')
plt.plot(x, out.best_fit, '-', label='best fit')
plt.legend()
plt.show()
And this is the result (linked plot: "Always positive"), with the following fit report:
[[Model]]
    (Model(_eval) * Model(step, prefix='step_', form='linear'))
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 37
    # data points      = 370
    # variables        = 5
    chi-square         = 2.53597470
    reduced chi-square = 0.00694788
    Akaike info crit   = -1833.68223
    Bayesian info crit = -1814.11472
##  Warning: uncertainties could not be estimated:
    step_center:     at initial value
    step_sigma:      at boundary
[[Variables]]
    exp_amp:        1.80889259 (init = 1)
    exp_decay:      40.0313587 (init = 30)
    step_amplitude: 1.02997887 (init = 1)
    step_center:    23.0000000 (init = 23)
    step_sigma:     0.00000000 (init = 0)
However, it's still not perfect (for example, step_center and step_sigma are the same as at the beginning), and if I manually change the center of the step function, I get negative values (linked plot: "Negative values").
Is there a way to make the fit return only positive values? Thanks!
You can put bounds on parameter values so that step_amplitude is always positive and exp_decay is always positive. That should ensure that your function is always > 0. You should also investigate why parameters are stuck at the boundaries; for example, guessing a step_sigma of 0 is probably unwise.
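For instance (a minimal sketch using lmfit's Parameter bounds; the parameter names are taken from the fit report above):
# Constrain the decay and the step amplitude to be non-negative before fitting.
pars['exp_decay'].set(min=0)
pars['step_amplitude'].set(min=0)
out = mod.fit(y, pars, x=x)
print(out.fit_report())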

Is there an "easy" way of animating scatterplots in Python?

I've been looking at a couple of discussions on this today, but I don't know how to apply the suggested solutions to my problem. Hopefully someone can give me a working example or a hint.
In the current code I basically start with the vertices of a triangle and plot a starting point at coordinates (2, 2). In the for loop I generate a random number between 1 and 6 and, depending on its value, the point moves halfway toward one of the vertices, which corresponds to plotting a new point. The process is then iterated with that new point n = 5 times (in the code below).
I would like to watch the graph/plot as the new points get plotted over some number of iterations n, to see what picture emerges. How can I do this? Is there some easy approach?
from matplotlib import pyplot as plt
import random as r

x = [0, 6, 3]
y = [0, 0, 8]
x0 = x[0]
y0 = y[0]
x1 = x[1]
y1 = y[1]
x2 = x[2]
y2 = y[2]

pt = [2, 2]
plt.scatter(pt[0], pt[1], color='blue')
plt.scatter(x, y, color='red')

for i in range(1, 5):
    randNum = r.randint(1, 6)
    if randNum == 1 or randNum == 2:
        plt.scatter((pt[0] + x0) / 2, (pt[1] + y0) / 2, color='black')
        pt = [(pt[0] + x0) / 2, (pt[1] + y0) / 2]
    elif randNum == 3 or randNum == 4:
        plt.scatter((pt[0] + x1) / 2, (pt[1] + y1) / 2, color='black')
        pt = [(pt[0] + x1) / 2, (pt[1] + y1) / 2]
    elif randNum == 5 or randNum == 6:
        plt.scatter((pt[0] + x2) / 2, (pt[1] + y2) / 2, color='black')
        pt = [(pt[0] + x2) / 2, (pt[1] + y2) / 2]

plt.show()
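One possible approach, sketched here as an illustration (not taken from the original thread): matplotlib.animation.FuncAnimation calls an update function once per frame, so each frame can perform one step of the iteration above and scatter the new point. The vertex coordinates come from the code above; the frame count and interval are arbitrary choices.
from matplotlib import pyplot as plt
from matplotlib.animation import FuncAnimation
import random as r

vertices = [(0, 0), (6, 0), (3, 8)]  # same triangle as above
pt = [2, 2]

fig, ax = plt.subplots()
ax.scatter([v[0] for v in vertices], [v[1] for v in vertices], color='red')
ax.scatter(pt[0], pt[1], color='blue')

def update(frame):
    # One step: move halfway toward a randomly chosen vertex and plot it.
    global pt
    vx, vy = vertices[r.randint(0, 2)]
    pt = [(pt[0] + vx) / 2, (pt[1] + vy) / 2]
    ax.scatter(pt[0], pt[1], color='black', s=5)

anim = FuncAnimation(fig, update, frames=500, interval=20, repeat=False)
plt.show()
Keeping a reference to the FuncAnimation object (anim) matters; otherwise it may be garbage-collected before the animation runs.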

tf.boolean_mask, mask_dimension must be specified?

When using tf.boolean_mask(), a ValueError is raised. It reads: "Number of mask dimensions must be specified, even if some dimensions are None. E.g. shape=[None] is ok, but shape=None is not."
I suspect that something is going wrong when I create my boolean mask s, because when I create a boolean mask by hand, everything works fine. However, I've checked the shape and the dtype of s so far and couldn't notice anything suspicious; both seemed to be identical to the shape and type of the boolean mask I created by hand.
Please see a screenshot of the problem.
The following should allow you to reproduce the error on your machine. You need tensorflow, numpy and scipy.
with tf.Session() as sess:
    # receive five embedded vectors
    v0 = tf.constant([[3.0, 1.0, 2., 4., 2.]])
    v1 = tf.constant([[4.0, 0, 1.0, 4, 1.]])
    v2 = tf.constant([[1.0, 1.0, 0.0, 4., 8.]])
    v3 = tf.constant([[1., 4, 2., 5., 2.]])
    v4 = tf.constant([[3., 2., 3., 2., 5.]])

    # concatenate the five embedded vectors into a matrix
    VT = tf.concat([v0, v1, v2, v3, v4], axis=0)

    # perform SVD on the concatenated matrix
    s, u1, u2 = tf.svd(VT)
    e = tf.square(s)  # list of eigenvalues
    v = u1            # eigenvectors as column vectors

    # sample a set
    s = tf.py_func(sample_dpp_bin, [e, v], tf.bool)
    X = tf.boolean_mask(VT, s)
    print(X.eval())
This is the code to generate s. s is a sample from a determinantal point process (for the mathematically interested).
Note that I'm using tf.py_func to wrap this python function:
import tensorflow as tf
import numpy as np
from scipy.linalg import orth

def sample_dpp_bin(e_val, e_vec):
    # e_val = np.array of eigenvalues
    # e_vec = array of eigenvectors (= column vectors)
    eps = 0.01

    # sample a set of eigenvectors
    ind = (np.random.rand(len(e_val)) <= (e_val) / (1 + e_val))
    k = sum(ind)
    if k == e_val.size:
        return np.ones(e_val.size, dtype=bool)  # check for full set
    if k == 0:
        return np.zeros(e_val.size, dtype=bool)
    V = e_vec[:, np.array(ind)]

    # sample a set of k items
    sample = np.zeros(e_val.size, dtype=bool)
    for l in range(k - 1, -1, -1):
        p = np.sum(V**2, axis=1)
        p = np.cumsum(p / np.sum(p))  # item cumulative probabilities
        i = int((np.random.rand() <= p).argmax())  # choose random item
        sample[i] = True
        j = (np.abs(V[i, :]) > eps).argmax()  # pick an eigenvector not orthogonal to e_i
        Vj = V[:, j]
        V = orth(V - (np.outer(Vj, (V[i, :] / Vj[i]))))

    return sample
The output if I print s and tf.shape(s) is
[False True True True True]
[5]
The output if I print VT and tf.shape(VT) is
[[ 3. 1. 2. 4. 2.]
[ 4. 0. 1. 4. 1.]
[ 1. 1. 0. 4. 8.]
[ 1. 4. 2. 5. 2.]
[ 3. 2. 3. 2. 5.]]
[5 5]
Any help much appreciated.
The following example works for me.
import tensorflow as tf
import numpy as np
tensor = [[1, 2], [3, 4], [5, 6]]
mask = np.array([True, False, True])
t_m = tf.boolean_mask(tensor, mask)
sess = tf.Session()
print(sess.run(t_m))
Output:
[[1 2]
[5 6]]
Provide your runnable code snippet to reproduce the error. I think you might be doing something wrong in s.
Update:
s = tf.py_func(sample_dpp_bin,[e,v],tf.bool)
s_v = (s.eval())
X = tf.boolean_mask(VT,s_v)
print(X.eval())
The mask should be a NumPy array, not a TF tensor. You don't have to use tf.py_func.
The error message states that the shape of the mask is not defined. What do you get if you print tf.shape(s)? I'd bet the problem with your code is that the shape of s is completely unknown, and you could fix that with a simple call like s.set_shape((None,)) (to specify that s is a 1-dimensional tensor). Consider this code snippet:
X = np.random.randint(0, 2, (100, 100, 3))
with tf.Session() as sess:
    X_tf = tf.placeholder(tf.int8)
    # X_tf.set_shape((None, None, None))
    y = tf.greater(tf.reduce_max(X_tf, axis=(0, 1)), 0)
    print(tf.shape(y))
    z = tf.boolean_mask(X_tf, y, axis=2)
    print(sess.run(z, feed_dict={X_tf: X}))
This prints a shape of Tensor("Shape_3:0", shape=(?,), dtype=int32) (i.e., even the number of dimensions of y is unknown) and returns the same error as you have. However, if you uncomment the set_shape line, then X_tf is known to be 3-dimensional and so y is 1-dimensional. The code then works. So, I think all you need to do is add an s.set_shape((None,)) call after the py_func call.
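Applied to the code in the question, that suggestion would look roughly like this (an untested sketch):
s = tf.py_func(sample_dpp_bin, [e, v], tf.bool)
s.set_shape((None,))  # declare s as a 1-D boolean tensor of unknown length
X = tf.boolean_mask(VT, s)
print(X.eval())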

Pressure-Impulse-at-One-End, Wave Equation

I am trying to solve the captioned problem numerically using Mathematica, to no avail. Imagine a rod of length L. The speed of sound in the rod is c. A pressure impulse of Gaussian shape, whose width is comparable to L/c, is applied at one end. I would like to solve for the particle displacement function u(t, x) inside the rod. The Mathematica code is as follows:
c = 1.0 (*speed of wave*)
L = 1.0 (*length of medium*)
Subscript[P, 0] = 0.0 (*pressure of reservoir at one end*)
Subscript[t, 0] = 5.0*c/L; (*mean time of pressure impulse*)
\[Delta]t = 2.0*c/L; (*std of pressure impulse*)
K = 1.0; (*proportionality constant, stress-strain*)
Subscript[P, max] = 1.0; (*max. magnitude of pressure impulse*)

Subscript[P, 1][t_] :=
  Subscript[P, max] PDF[NormalDistribution[Subscript[t, 0], \[Delta]t], t];

PDE = D[func[t, x], t, t] == c^2 D[func[t, x], x, x]
BC1 = -K func[t, 0] == Subscript[P, 1][t]
BC2 = -K func[t, L] == Subscript[P, 0]
IC1 = func[0, x] == (-Subscript[P, 1][0]/K) (x/L) + (-Subscript[P, 0]/K) (1 - x/L)
IC2 = Derivative[1, 0][func][0, x] == 0.0

sol = NDSolve[{PDE, BC1, BC2, IC1, IC2},
  func, {t, 0, 2 Subscript[t, 0]}, {x, 0, L}]
The problem is that the program keeps running for minutes without giving any output. Given the simplicity of the problem (i.e. that an analytical solution exists), I think there should be a quicker way to arrive at a numerical solution. Would someone please give me some suggestions?
Following George's advice, the equation was solved.
BC1 and BC2 given in the question should be modified as follows
BC1 = -kk Derivative[0, 1][func][t, 0] == p1[t]
BC2 = -kk Derivative[0, 1][func][t, ll] == p0
Also, t0 and \[Delta]t have been modified:
t0 = 2.0*c/ll (*mean time of pressure impulse*)
\[Delta]t = 0.5*c/ll (*Std of pressure impulse*)
The problem can be solved to within the accuracy requirement for the time interval 0 < t < 2 t0. I solved the problem for a longer time interval 0 < t < 4 t0 in order to look for something interesting.
Here is a 3D plot of the pressure (versus x and t).
Here is a plot of the pressure at the end of the bar where the impulse is applied. The pressure is a Gaussian, as expected.
Here is a plot of the pressure in the middle of the bar. Note that although the applied pressure is a Gaussian, and the pressure at the other end is held at P0 = 0, the pressure becomes negative for some time tc.

Solving systems of second order differential equations

I'm working on a script in Mathematica that will simulate a string held at either end and plucked, by solving the wave equation via numerical methods (http://en.wikipedia.org/wiki/Wave_equation#Investigation_by_numerical_methods).
n = 5; (*The number of discrete elements to be used*)
L = 1.0; (*The length of the string that is vibrating*)
a = 1.0/3.0; (*The distance from the left side that the string is plucked at*)
T = 1; (*The tension in the string*)
\[Rho] = 1; (*The length density of the string*)
y0 = 0.1; (*The vertical distance of the string pluck*)
\[CapitalDelta]x = L/n; (*The length of each discrete element*)
m = (\[Rho]*L)/n; (*The mass of each individual node*)
c = Sqrt[T/\[Rho]]; (*The speed at which waves in the string propagate*)
I set all my variables
Y[t] = Array[f[t], {n - 1, 1}];
MatrixForm (*Creates a vector size n-1 by 1 of functions representing each node*)
I define my Vector of nodal position functions
K = MatrixForm[
  SparseArray[{Band[{1, 1}] -> -2, Band[{2, 1}] -> 1,
    Band[{1, 2}] -> 1}, {n - 1, n - 1}]] (*Creates a matrix size n by n governing the coupling between each node*)
I create the stiffness matrix relating all the nodal functions to one another
Y0 = MatrixForm[
  Table[Piecewise[{{(((i*L)/n)*y0)/a, 0 < ((i*L)/n) < a},
    {(-((i*L)/n)*y0)/(L - a) + (y0*L)/(L - a), a < ((i*L)/n) < L}}], {i, 1, n - 1}]]
I define the initial positions of each node using a piecewise function
NDSolve[{Y''[t] == (c/\[CapitalDelta]x)^2 Y[t].K, Y[0] == Y0, Y'[0] == 0},
  Y, {t, 0, 10}]; (*Numerically solves the system of second order DE's*)
Finally, This should solve for the values of the individual nodes, but it returns an error:
"NDSolve::ndinnt : Initial condition [Y0 table] is not a number or a rectangular array"
So, it would seem that I don't have a firm grasp on how matrices work in Mathematica. I would greatly appreciate it if anyone could help me get this last line of code to run properly.
Thank you,
Brad
I don't think you should use MatrixForm when defining the matrices. MatrixForm is used to format a list of lists as a matrix, usually when you display it. Try removing it and see if it works.
