im really new for gpu coding i found this Kmeans cupy code my propouse is work with a large data base (n,3) for example to realize about the timing difference on gpu and cpu , i wanna have a huge number of clusters but i am getting a memory management error. Can someone give me the route I should take to research and fix it, i already research but i have not a clear start yet.
import contextlib
import time
import cupy
import matplotlib.pyplot as plt
import numpy
#contextlib.contextmanager
def timer(message):
cupy.cuda.Stream.null.synchronize()
start = time.time()
yield
cupy.cuda.Stream.null.synchronize()
end = time.time()
print('%s: %f sec' % (message, end - start))
var_kernel = cupy.ElementwiseKernel(
'T x0, T x1, T c0, T c1', 'T out',
'out = (x0 - c0) * (x0 - c0) + (x1 - c1) * (x1 - c1)',
'var_kernel'
)
sum_kernel = cupy.ReductionKernel(
'T x, S mask', 'T out',
'mask ? x : 0',
'a + b', 'out = a', '0',
'sum_kernel'
)
count_kernel = cupy.ReductionKernel(
'T mask', 'float32 out',
'mask ? 1.0 : 0.0',
'a + b', 'out = a', '0.0',
'count_kernel'
)
def fit_xp(X, n_clusters, max_iter):
assert X.ndim == 2
# Get NumPy or CuPy module from the supplied array.
xp = cupy.get_array_module(X)
n_samples = len(X)
# Make an array to store the labels indicating which cluster each sample is
# contained.
pred = xp.zeros(n_samples)
# Choose the initial centroid for each cluster.
initial_indexes = xp.random.choice(n_samples, n_clusters, replace=False)
centers = X[initial_indexes]
for _ in range(max_iter):
# Compute the new label for each sample.
distances = xp.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
new_pred = xp.argmin(distances, axis=1)
# If the label is not changed for each sample, we suppose the
# algorithm has converged and exit from the loop.
if xp.all(new_pred == pred):
break
pred = new_pred
# Compute the new centroid for each cluster.
i = xp.arange(n_clusters)
mask = pred == i[:, None]
sums = xp.where(mask[:, :, None], X, 0).sum(axis=1)
counts = xp.count_nonzero(mask, axis=1).reshape((n_clusters, 1))
centers = sums / counts
return centers, pred
def fit_custom(X, n_clusters, max_iter):
assert X.ndim == 2
n_samples = len(X)
pred = cupy.zeros(n_samples,dtype='float32')
initial_indexes = cupy.random.choice(n_samples, n_clusters, replace=False)
centers = X[initial_indexes]
for _ in range(max_iter):
distances = var_kernel(X[:, None, 0], X[:, None, 1],
centers[None, :, 1], centers[None, :, 0])
new_pred = cupy.argmin(distances, axis=1)
if cupy.all(new_pred == pred):
break
pred = new_pred
i = cupy.arange(n_clusters)
mask = pred == i[:, None]
sums = sum_kernel(X, mask[:, :, None], axis=1)
counts = count_kernel(mask, axis=1).reshape((n_clusters, 1))
centers = sums / counts
return centers, pred
def draw(X, n_clusters, centers, pred, output):
# Plot the samples and centroids of the fitted clusters into an image file.
for i in range(n_clusters):
labels = X[pred == i]
plt.scatter(labels[:, 0], labels[:, 1], c=numpy.random.rand(3))
plt.scatter(
centers[:, 0], centers[:, 1], s=120, marker='s', facecolors='y',
edgecolors='k')
plt.savefig(output)
def run_cpu(gpuid, n_clusters, num, max_iter, use_custom_kernel):##, output
samples = numpy.random.randn(num, 3)
X_train = numpy.r_[samples + 1, samples - 1]
with timer(' CPU '):
centers, pred = fit_xp(X_train, n_clusters, max_iter)
def run_gpu(gpuid, n_clusters, num, max_iter, use_custom_kernel):##, output
samples = numpy.random.randn(num, 3)
X_train = numpy.r_[samples + 1, samples - 1]
with cupy.cuda.Device(gpuid):
X_train = cupy.asarray(X_train)
with timer(' GPU '):
if use_custom_kernel:
centers, pred = fit_custom(X_train, n_clusters, max_iter)
else:
centers, pred = fit_xp(X_train, n_clusters, max_iter)
btw i am working in colab pro 25GB(RAM), the code is working with n_clusters=200 and num= 1000000 but if i use bigger numbers the error appear, i am running the code like this:
run_gpu(0,200,1000000,10,True)
This is the error that i have
Any suggestion will be welcome, thanks for your time.
Assuming that CuPy is clever enough not to create explicit copies of the broadcasted input of var_kernel, the output distances has to have a size of 2 * num * num_clusters which are exactly the 6,400,000,000 Bytes it is trying to allocate. You could have a way smaller memory footprint by never actually writing the distances to memory which means fusing the var_kernel with argmin. See this part of the docs.
If I understand the example there correctly, this should work:
#cupy.fuse(kernel_name='argmin_distance')
def argmin_distance(x1, y1, x2, y2):
return cupy.argmin((x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2), axis = 1)
The next question would be where the other 13.7GB come from. A big part of them might just be the instances of distances from earlier iterations. I'm not a CuPy expert, but at least in Python/Numpy your use of distances inside the loop would not reuse the same memory, but allocate more memory each time you call the var_kernel. The same problem is visible with pred which is allocated before the loop. If CuPy does things the Numpy way, the solution would be to just put [:] in there like
pred[:] = new_pred
or
distances[:,:,:] = var_kernel(X[:, None, 0], X[:, None, 1],
centers[None, :, 1], centers[None, :, 0])
For this to work, you need to allocate distances before the loop as well. Also this isn't needed anymore when using kernel fusion, so just take it as an example. It may be best to allocate everything beforehand and then use this syntax everywhere in the loop.
I don't know enough about CuPy to answer why fit_xp doesn't have the same problem (or does it?). But my guess would be that garbage collection with CuPy objects works differently there. If garbage collection were "quick enough" in fit_custom it should work even without kernel fusion or reusing already allocated arrays.
Other problems or at least oddities with your code:
Why are you comparing the zeroth coordinate of centers with the first coordinate of X? Wouldn't it make more sense to call
distances = var_kernel(X[:, None, 0], X[:, None, 1],
centers[None, :, 0], centers[None, :, 1])
Why are you creating 3D data when only using the projection on the 2D plane? So why not
samples = numpy.random.randn(num, 2)
Why are you using floats for (the initial version of) pred? The argmin should give an integer type result.
How can I select a random point on one image, then find its corresponding point on another image using cross-correlation?
So basically I have image1, I want to select a point on it (automatically) then find its corresponding/similar point on image2.
Here are some example images:
Full image:
Patch:
Result of cross correlation:
Well, xcorr2 can essentially be seen as analyzing all possible shifts in both positive and negative direction and giving a measure for how well they fit with each shift. Therefore for images of size N x N the result must have size (2*N-1) x (2*N-1), where the correlation at index [N, N] would be maximal if the two images where equal or not shifted. If they were shifted by 10 pixels, the maximum correlation would be at [N-10, N] and so on. Therefore you will need to subtract N to get the absolute shift.
With your actual code it would probably be easier to help. But let's look at an example:
(A) We read an image and select two different sub-images with offsets da and db
Orig = imread('rice.png');
N = 200; range = 1:N;
da = [0 20];
db = [30 30];
A=Orig(da(1) + range, da(2) + range);
B=Orig(db(1) + range, db(2) + range);
(b) Calculate cross-correlation and find maximum
X = normxcorr2(A, B);
m = max(X(:));
[i,j] = find(X == m);
(C) Patch them together using recovered shift
R = zeros(2*N, 2*N);
R(N + range, N + range) = B;
R(i + range, j + range) = A;
(D) Illustrate things
figure
subplot(2,2,1), imagesc(A)
subplot(2,2,2), imagesc(B)
subplot(2,2,3), imagesc(X)
rectangle('Position', [j-1 i-1 2 2]), line([N j], [N i])
subplot(2,2,4), imagesc(R);
(E) Compare intentional shift with recovered shift
delta_orig = da - db
%--> [30 10]
delta_recovered = [i - N, j - N]
%--> [30 10]
As you see in (E) we get exactly the shift we intenionally introduced in (A).
Or adjusted to your case:
full=rgb2gray(imread('a.jpg'));
template=rgb2gray(imread('b.jpg'));
S_full = size(full);
S_temp = size(template);
X=normxcorr2(template, full);
m=max(X(:));
[i,j]=find(X==m);
figure, colormap gray
subplot(2,2,1), title('full'), imagesc(full)
subplot(2,2,2), title('template'), imagesc(template),
subplot(2,2,3), imagesc(X), rectangle('Position', [j-20 i-20 40 40])
R = zeros(S_temp);
shift_a = [0 0];
shift_b = [i j] - S_temp;
R((1:S_full(1))+shift_a(1), (1:S_full(2))+shift_a(2)) = full;
R((1:S_temp(1))+shift_b(1), (1:S_temp(2))+shift_b(2)) = template;
subplot(2,2,4), imagesc(R);
However, for this method to work properly the patch (template) and the full image should be scaled to the same resolution.
A more detailed example can also be found here.
How can I select a random point on one image, then find its corresponding point on another image using cross-correlation?
So basically I have image1, I want to select a point on it (automatically) then find its corresponding/similar point on image2.
Here are some example images:
Full image:
Patch:
Result of cross correlation:
Well, xcorr2 can essentially be seen as analyzing all possible shifts in both positive and negative direction and giving a measure for how well they fit with each shift. Therefore for images of size N x N the result must have size (2*N-1) x (2*N-1), where the correlation at index [N, N] would be maximal if the two images where equal or not shifted. If they were shifted by 10 pixels, the maximum correlation would be at [N-10, N] and so on. Therefore you will need to subtract N to get the absolute shift.
With your actual code it would probably be easier to help. But let's look at an example:
(A) We read an image and select two different sub-images with offsets da and db
Orig = imread('rice.png');
N = 200; range = 1:N;
da = [0 20];
db = [30 30];
A=Orig(da(1) + range, da(2) + range);
B=Orig(db(1) + range, db(2) + range);
(b) Calculate cross-correlation and find maximum
X = normxcorr2(A, B);
m = max(X(:));
[i,j] = find(X == m);
(C) Patch them together using recovered shift
R = zeros(2*N, 2*N);
R(N + range, N + range) = B;
R(i + range, j + range) = A;
(D) Illustrate things
figure
subplot(2,2,1), imagesc(A)
subplot(2,2,2), imagesc(B)
subplot(2,2,3), imagesc(X)
rectangle('Position', [j-1 i-1 2 2]), line([N j], [N i])
subplot(2,2,4), imagesc(R);
(E) Compare intentional shift with recovered shift
delta_orig = da - db
%--> [30 10]
delta_recovered = [i - N, j - N]
%--> [30 10]
As you see in (E) we get exactly the shift we intenionally introduced in (A).
Or adjusted to your case:
full=rgb2gray(imread('a.jpg'));
template=rgb2gray(imread('b.jpg'));
S_full = size(full);
S_temp = size(template);
X=normxcorr2(template, full);
m=max(X(:));
[i,j]=find(X==m);
figure, colormap gray
subplot(2,2,1), title('full'), imagesc(full)
subplot(2,2,2), title('template'), imagesc(template),
subplot(2,2,3), imagesc(X), rectangle('Position', [j-20 i-20 40 40])
R = zeros(S_temp);
shift_a = [0 0];
shift_b = [i j] - S_temp;
R((1:S_full(1))+shift_a(1), (1:S_full(2))+shift_a(2)) = full;
R((1:S_temp(1))+shift_b(1), (1:S_temp(2))+shift_b(2)) = template;
subplot(2,2,4), imagesc(R);
However, for this method to work properly the patch (template) and the full image should be scaled to the same resolution.
A more detailed example can also be found here.
I am having trouble fully understanding the K-Means++ algorithm. I am interested exactly how the first k centroids are picked, namely the initialization as the rest is like in the original K-Means algorithm.
Is the probability function used based on distance or Gaussian?
In the same time the most long distant point (From the other centroids) is picked for a new centroid.
I will appreciate a step by step explanation and an example. The one in Wikipedia is not clear enough. Also a very well commented source code would also help. If you are using 6 arrays then please tell us which one is for what.
Interesting question. Thank you for bringing this paper to my attention - K-Means++: The Advantages of Careful Seeding
In simple terms, cluster centers are initially chosen at random from the set of input observation vectors, where the probability of choosing vector x is high if x is not near any previously chosen centers.
Here is a one-dimensional example. Our observations are [0, 1, 2, 3, 4]. Let the first center, c1, be 0. The probability that the next cluster center, c2, is x is proportional to ||c1-x||^2. So, P(c2 = 1) = 1a, P(c2 = 2) = 4a, P(c2 = 3) = 9a, P(c2 = 4) = 16a, where a = 1/(1+4+9+16).
Suppose c2=4. Then, P(c3 = 1) = 1a, P(c3 = 2) = 4a, P(c3 = 3) = 1a, where a = 1/(1+4+1).
I've coded the initialization procedure in Python; I don't know if this helps you.
def initialize(X, K):
C = [X[0]]
for k in range(1, K):
D2 = scipy.array([min([scipy.inner(c-x,c-x) for c in C]) for x in X])
probs = D2/D2.sum()
cumprobs = probs.cumsum()
r = scipy.rand()
for j,p in enumerate(cumprobs):
if r < p:
i = j
break
C.append(X[i])
return C
EDIT with clarification: The output of cumsum gives us boundaries to partition the interval [0,1]. These partitions have length equal to the probability of the corresponding point being chosen as a center. So then, since r is uniformly chosen between [0,1], it will fall into exactly one of these intervals (because of break). The for loop checks to see which partition r is in.
Example:
probs = [0.1, 0.2, 0.3, 0.4]
cumprobs = [0.1, 0.3, 0.6, 1.0]
if r < cumprobs[0]:
# this event has probability 0.1
i = 0
elif r < cumprobs[1]:
# this event has probability 0.2
i = 1
elif r < cumprobs[2]:
# this event has probability 0.3
i = 2
elif r < cumprobs[3]:
# this event has probability 0.4
i = 3
One Liner.
Say we need to select 2 cluster centers, instead of selecting them all randomly{like we do in simple k means}, we will select the first one randomly, then find the points that are farthest to the first center{These points most probably do not belong to the first cluster center as they are far from it} and assign the second cluster center nearby those far points.
I have prepared a full source implementation of k-means++ based on the book "Collective Intelligence" by Toby Segaran and the k-menas++ initialization provided here.
Indeed there are two distance functions here. For the initial centroids a standard one is used based numpy.inner and then for the centroids fixation the Pearson one is used. Maybe the Pearson one can be also be used for the initial centroids. They say it is better.
from __future__ import division
def readfile(filename):
lines=[line for line in file(filename)]
rownames=[]
data=[]
for line in lines:
p=line.strip().split(' ') #single space as separator
#print p
# First column in each row is the rowname
rownames.append(p[0])
# The data for this row is the remainder of the row
data.append([float(x) for x in p[1:]])
#print [float(x) for x in p[1:]]
return rownames,data
from math import sqrt
def pearson(v1,v2):
# Simple sums
sum1=sum(v1)
sum2=sum(v2)
# Sums of the squares
sum1Sq=sum([pow(v,2) for v in v1])
sum2Sq=sum([pow(v,2) for v in v2])
# Sum of the products
pSum=sum([v1[i]*v2[i] for i in range(len(v1))])
# Calculate r (Pearson score)
num=pSum-(sum1*sum2/len(v1))
den=sqrt((sum1Sq-pow(sum1,2)/len(v1))*(sum2Sq-pow(sum2,2)/len(v1)))
if den==0: return 0
return 1.0-num/den
import numpy
from numpy.random import *
def initialize(X, K):
C = [X[0]]
for _ in range(1, K):
#D2 = numpy.array([min([numpy.inner(c-x,c-x) for c in C]) for x in X])
D2 = numpy.array([min([numpy.inner(numpy.array(c)-numpy.array(x),numpy.array(c)-numpy.array(x)) for c in C]) for x in X])
probs = D2/D2.sum()
cumprobs = probs.cumsum()
#print "cumprobs=",cumprobs
r = rand()
#print "r=",r
i=-1
for j,p in enumerate(cumprobs):
if r 0:
for rowid in bestmatches[i]:
for m in range(len(rows[rowid])):
avgs[m]+=rows[rowid][m]
for j in range(len(avgs)):
avgs[j]/=len(bestmatches[i])
clusters[i]=avgs
return bestmatches
rows,data=readfile('/home/toncho/Desktop/data.txt')
kclust = kcluster(data,k=4)
print "Result:"
for c in kclust:
out = ""
for r in c:
out+=rows[r] +' '
print "["+out[:-1]+"]"
print 'done'
data.txt:
p1 1 5 6
p2 9 4 3
p3 2 3 1
p4 4 5 6
p5 7 8 9
p6 4 5 4
p7 2 5 6
p8 3 4 5
p9 6 7 8
I'm dealing with an image processing problem that I've simplified as follows. I have three 10x10 matrices, each with the values 1 or -1 in each cell. Each matrix has an irregular object located somewhere, and there is some noise in the matrix. I'd like to figure out how to find the optimal alignment of the matrices that would let me line up the objects so I can get their average.
With the 1/-1 coding, I know that the product of two matrices (using element-wise multiplication, not matrix multiplication) will yield 1 if there is a match between two multiplied cells and -1 if there is a mismatch, thus the sum of the products yields a measure of overlap. With this, I know I can try out all possible alignments of two matrices to find that which yields the optimal overlap, but I'm not sure how to do this with 3 matrices (or more - I really have 20+ in my actual data set).
To help clarify the problem, here is some code, written in R, that sets up the sort of matricies I'm dealing with:
#set up the 3 matricies
m1 = c(-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1)
m1 = matrix(m1,10)
m2 = c(-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1)
m2 = matrix(m2,10)
m3 = c(-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1)
m3 = matrix(m3,10)
#show the matricies
image(m1)
image(m2)
image(m3)
#notice there's a "+" shaped object in each
#create noise
set.seed(1)
n1 = sample(c(1,-1),100,replace=T,prob=c(.95,.05))
n1 = matrix(n1,10)
n2 = sample(c(1,-1),100,replace=T,prob=c(.95,.05))
n2 = matrix(n2,10)
n3 = sample(c(1,-1),100,replace=T,prob=c(.95,.05))
n3 = matrix(n3,10)
#add noise to the matricies
mn1 = m1*n1
mn2 = m2*n2
mn3 = m3*n3
#show the noisy matricies
image(mn1)
image(mn2)
image(mn3)
Here is a program in Mathematica that does what you want (I think).
I may explain it in more detail, if you need.
(*define temp tables*)
r = m = Table[{}, {100}];
(*define noise function*)
noise := Partition[RandomVariate[BinomialDistribution[1, .05], 100],
10];
For[i = 1, i <= 100, i++,
(*generate 100 10x10 matrices with the random cross and noise added*)
w = RandomInteger[6]; h = w = RandomInteger[6];
m[[i]] = (ArrayPad[CrossMatrix[4, 4], {{w, 6 - w}, {h, 6 - h}}] +
noise) /. 2 -> 1;
(*Select connected components in each matrix and keep only the biggest*)
id = Last#
Commonest[
Flatten#(mf =
MorphologicalComponents[m[[i]], CornerNeighbors -> False]), 2];
d = mf /. {id -> x, x_Integer -> 0} /. {x -> 1};
{minX, maxX, minY, maxY} =
{Min#Thread[g[#]] /. g -> First,
Max#Thread[g[#]] /. g -> First,
Min#Thread[g[#]] /. g -> Last,
Max#Thread[g[#]] /. g -> Last} &#Position[d, 1];
(*Trim the image of the biggest component *)
r[[i]] = d[[minX ;; maxX, minY ;; maxY]];
]
(*As the noise is low, the more repeated component is the image*)
MatrixPlot ## Commonest#r
Result: