tf.boolean_mask, mask dimensions must be specified?

When using tf.boolean_mask(), a ValueError is raised: "Number of mask dimensions must be specified, even if some dimensions are None. E.g. shape=[None] is ok, but shape=None is not."
I suspect that something goes wrong when I create my boolean mask s, because when I create a boolean mask by hand, everything works fine. However, I've checked the shape and dtype of s and couldn't spot anything suspicious; both seemed identical to the shape and dtype of the mask I created by hand.
The following should allow you to reproduce the error on your machine. You need tensorflow, numpy and scipy.
with tf.Session() as sess:
    # receive five embedded vectors
    v0 = tf.constant([[3.0, 1.0, 2., 4., 2.]])
    v1 = tf.constant([[4.0, 0, 1.0, 4, 1.]])
    v2 = tf.constant([[1.0, 1.0, 0.0, 4., 8.]])
    v3 = tf.constant([[1., 4, 2., 5., 2.]])
    v4 = tf.constant([[3., 2., 3., 2., 5.]])
    # concatenate the five embedded vectors into a matrix
    VT = tf.concat([v0, v1, v2, v3, v4], axis=0)
    # perform SVD on the concatenated matrix
    s, u1, u2 = tf.svd(VT)
    e = tf.square(s)  # list of eigenvalues
    v = u1            # eigenvectors as column vectors
    # sample a set
    s = tf.py_func(sample_dpp_bin, [e, v], tf.bool)
    X = tf.boolean_mask(VT, s)
    print(X.eval())
This is the code to generate s. s is a sample from a determinantal point process (for the mathematically interested).
Note that I'm using tf.py_func to wrap this python function:
import tensorflow as tf
import numpy as np
from scipy.linalg import orth

def sample_dpp_bin(e_val, e_vec):
    # e_val = np.array of eigenvalues
    # e_vec = array of eigenvectors (= column vectors)
    eps = 0.01
    # sample a set of eigenvectors
    ind = (np.random.rand(len(e_val)) <= (e_val) / (1 + e_val))
    k = sum(ind)
    if k == e_val.size:
        return np.ones(e_val.size, dtype=bool)  # check for full set
    if k == 0:
        return np.zeros(e_val.size, dtype=bool)
    V = e_vec[:, np.array(ind)]
    # sample a set of k items
    sample = np.zeros(e_val.size, dtype=bool)
    for l in range(k - 1, -1, -1):
        p = np.sum(V**2, axis=1)
        p = np.cumsum(p / np.sum(p))  # item cumulative probabilities
        i = int((np.random.rand() <= p).argmax())  # choose random item
        sample[i] = True
        j = (np.abs(V[i, :]) > eps).argmax()  # pick an eigenvector not orthogonal to e_i
        Vj = V[:, j]
        V = orth(V - (np.outer(Vj, (V[i, :] / Vj[i]))))
    return sample
The output if I print s and tf.shape(s) is
[False True True True True]
[5]
The output if I print VT and tf.shape(VT) is
[[ 3. 1. 2. 4. 2.]
[ 4. 0. 1. 4. 1.]
[ 1. 1. 0. 4. 8.]
[ 1. 4. 2. 5. 2.]
[ 3. 2. 3. 2. 5.]]
[5 5]
Any help much appreciated.

The following example works for me.
import tensorflow as tf
import numpy as np
tensor = [[1, 2], [3, 4], [5, 6]]
mask = np.array([True, False, True])
t_m = tf.boolean_mask(tensor, mask)
sess = tf.Session()
print(sess.run(t_m))
Output:
[[1 2]
[5 6]]
Provide your runnable code snippet to reproduce the error. I think you might be doing something wrong in s.
Update:
s = tf.py_func(sample_dpp_bin,[e,v],tf.bool)
s_v = (s.eval())
X = tf.boolean_mask(VT,s_v)
print(X.eval())
The mask should be a NumPy array, not a TF tensor. You don't have to use tf.py_func.

The error message states that the shape of the mask is not defined. What do you get if you print tf.shape(s)? I'd bet the problem with your code is that the shape of s is completely unknown, and you could fix that with a simple call like s.set_shape((None,)) (to simply specify that s is a 1-dimensional tensor). Consider this code snippet:
X = np.random.randint(0, 2, (100, 100, 3))
with tf.Session() as sess:
    X_tf = tf.placeholder(tf.int8)
    # X_tf.set_shape((None, None, None))
    y = tf.greater(tf.reduce_max(X_tf, axis=(0, 1)), 0)
    print(tf.shape(y))
    z = tf.boolean_mask(X_tf, y, axis=2)
    print(sess.run(z, feed_dict={X_tf: X}))
This prints a shape of Tensor("Shape_3:0", shape=(?,), dtype=int32) (i.e., even the number of dimensions of y is unknown) and returns the same error as you have. However, if you uncomment the set_shape line, then X_tf is known to be 3-dimensional and so y is 1-dimensional. The code then works. So, I think all you need to do is add an s.set_shape((None,)) call after the py_func call.
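Applied to the question's code, the fix would look roughly like this (a minimal sketch of the suggestion above):
s = tf.py_func(sample_dpp_bin, [e, v], tf.bool)
s.set_shape((None,))  # declare s as a rank-1 tensor of unknown length
X = tf.boolean_mask(VT, s)
print(X.eval())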


Why do I have "OutOfMemoryError" in my Kmeans CuPy code?

I'm really new to GPU coding. I found this k-means CuPy code; my purpose is to work with a large data set of shape (n, 3), for example, to see the timing difference between GPU and CPU. I want to use a huge number of clusters, but I am getting a memory management error. Can someone point me to what I should research to fix it? I have already looked around, but I don't have a clear starting point yet.
import contextlib
import time
import cupy
import matplotlib.pyplot as plt
import numpy

@contextlib.contextmanager
def timer(message):
    cupy.cuda.Stream.null.synchronize()
    start = time.time()
    yield
    cupy.cuda.Stream.null.synchronize()
    end = time.time()
    print('%s: %f sec' % (message, end - start))

var_kernel = cupy.ElementwiseKernel(
    'T x0, T x1, T c0, T c1', 'T out',
    'out = (x0 - c0) * (x0 - c0) + (x1 - c1) * (x1 - c1)',
    'var_kernel'
)
sum_kernel = cupy.ReductionKernel(
    'T x, S mask', 'T out',
    'mask ? x : 0',
    'a + b', 'out = a', '0',
    'sum_kernel'
)
count_kernel = cupy.ReductionKernel(
    'T mask', 'float32 out',
    'mask ? 1.0 : 0.0',
    'a + b', 'out = a', '0.0',
    'count_kernel'
)

def fit_xp(X, n_clusters, max_iter):
    assert X.ndim == 2
    # Get NumPy or CuPy module from the supplied array.
    xp = cupy.get_array_module(X)
    n_samples = len(X)
    # Make an array to store the labels indicating which cluster each sample is
    # contained in.
    pred = xp.zeros(n_samples)
    # Choose the initial centroid for each cluster.
    initial_indexes = xp.random.choice(n_samples, n_clusters, replace=False)
    centers = X[initial_indexes]
    for _ in range(max_iter):
        # Compute the new label for each sample.
        distances = xp.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_pred = xp.argmin(distances, axis=1)
        # If the label is not changed for each sample, we suppose the
        # algorithm has converged and exit from the loop.
        if xp.all(new_pred == pred):
            break
        pred = new_pred
        # Compute the new centroid for each cluster.
        i = xp.arange(n_clusters)
        mask = pred == i[:, None]
        sums = xp.where(mask[:, :, None], X, 0).sum(axis=1)
        counts = xp.count_nonzero(mask, axis=1).reshape((n_clusters, 1))
        centers = sums / counts
    return centers, pred

def fit_custom(X, n_clusters, max_iter):
    assert X.ndim == 2
    n_samples = len(X)
    pred = cupy.zeros(n_samples, dtype='float32')
    initial_indexes = cupy.random.choice(n_samples, n_clusters, replace=False)
    centers = X[initial_indexes]
    for _ in range(max_iter):
        distances = var_kernel(X[:, None, 0], X[:, None, 1],
                               centers[None, :, 1], centers[None, :, 0])
        new_pred = cupy.argmin(distances, axis=1)
        if cupy.all(new_pred == pred):
            break
        pred = new_pred
        i = cupy.arange(n_clusters)
        mask = pred == i[:, None]
        sums = sum_kernel(X, mask[:, :, None], axis=1)
        counts = count_kernel(mask, axis=1).reshape((n_clusters, 1))
        centers = sums / counts
    return centers, pred

def draw(X, n_clusters, centers, pred, output):
    # Plot the samples and centroids of the fitted clusters into an image file.
    for i in range(n_clusters):
        labels = X[pred == i]
        plt.scatter(labels[:, 0], labels[:, 1], c=numpy.random.rand(3))
    plt.scatter(
        centers[:, 0], centers[:, 1], s=120, marker='s', facecolors='y',
        edgecolors='k')
    plt.savefig(output)

def run_cpu(gpuid, n_clusters, num, max_iter, use_custom_kernel):  ##, output
    samples = numpy.random.randn(num, 3)
    X_train = numpy.r_[samples + 1, samples - 1]
    with timer(' CPU '):
        centers, pred = fit_xp(X_train, n_clusters, max_iter)

def run_gpu(gpuid, n_clusters, num, max_iter, use_custom_kernel):  ##, output
    samples = numpy.random.randn(num, 3)
    X_train = numpy.r_[samples + 1, samples - 1]
    with cupy.cuda.Device(gpuid):
        X_train = cupy.asarray(X_train)
        with timer(' GPU '):
            if use_custom_kernel:
                centers, pred = fit_custom(X_train, n_clusters, max_iter)
            else:
                centers, pred = fit_xp(X_train, n_clusters, max_iter)
By the way, I am working in Colab Pro with 25 GB of RAM. The code works with n_clusters=200 and num=1000000, but with bigger numbers the error appears. I am running the code like this:
run_gpu(0,200,1000000,10,True)
This is the error I get (a CuPy OutOfMemoryError).
Any suggestions are welcome, thanks for your time.
Assuming that CuPy is clever enough not to create explicit copies of the broadcast input of var_kernel, the output distances has to have a size of 2 * num * n_clusters, which is exactly the 6,400,000,000 bytes it is trying to allocate. You could have a much smaller memory footprint by never actually writing the distances to memory, which means fusing the var_kernel with the argmin. See this part of the docs.
If I understand the example there correctly, this should work:
@cupy.fuse(kernel_name='argmin_distance')
def argmin_distance(x1, y1, x2, y2):
    return cupy.argmin((x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2), axis=1)
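Inside the loop, this fused kernel would then presumably replace the var_kernel call and the separate argmin, something like this (untested sketch):
new_pred = argmin_distance(X[:, None, 0], X[:, None, 1],
                           centers[None, :, 0], centers[None, :, 1])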
The next question would be where the other 13.7 GB come from. A big part of them might just be the instances of distances from earlier iterations. I'm not a CuPy expert, but at least in Python/NumPy, your use of distances inside the loop would not reuse the same memory, but allocate new memory each time you call the var_kernel. The same problem is visible with pred, which is allocated before the loop. If CuPy does things the NumPy way, the solution would be to just put [:] in there, like
pred[:] = new_pred
or
distances[:, :] = var_kernel(X[:, None, 0], X[:, None, 1],
                             centers[None, :, 1], centers[None, :, 0])
For this to work, you need to allocate distances before the loop as well. Also this isn't needed anymore when using kernel fusion, so just take it as an example. It may be best to allocate everything beforehand and then use this syntax everywhere in the loop.
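A rough sketch of that idea without kernel fusion, reusing the names and shapes from fit_custom (untested):
# preallocate the buffers once, before the loop
distances = cupy.empty((n_samples, n_clusters), dtype=X.dtype)
pred = cupy.zeros(n_samples, dtype=cupy.int32)
for _ in range(max_iter):
    # write into the existing buffer instead of rebinding the name
    distances[:, :] = var_kernel(X[:, None, 0], X[:, None, 1],
                                 centers[None, :, 0], centers[None, :, 1])
    new_pred = cupy.argmin(distances, axis=1)
    # ... rest of the loop body as in fit_custom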
I don't know enough about CuPy to answer why fit_xp doesn't have the same problem (or does it?). But my guess would be that garbage collection with CuPy objects works differently there. If garbage collection were "quick enough" in fit_custom it should work even without kernel fusion or reusing already allocated arrays.
Other problems or at least oddities with your code:
Why are you comparing the zeroth coordinate of centers with the first coordinate of X? Wouldn't it make more sense to call
distances = var_kernel(X[:, None, 0], X[:, None, 1],
                       centers[None, :, 0], centers[None, :, 1])
Why are you creating 3D data when only using the projection on the 2D plane? So why not
samples = numpy.random.randn(num, 2)
Why are you using floats for (the initial version of) pred? The argmin should give an integer type result.
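For example, an integer initialization could look like this (the exact dtype is just a guess, chosen to match what argmin typically returns):
pred = cupy.zeros(n_samples, dtype=cupy.int64)  # argmin yields an integer dtype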

Loop until matrix is full?

I have a conditional statement that adds a row of binary values from matrix A to matrix B. I want to put this in a loop so that it continues to add rows from matrix A until matrix B is full. Currently matrix B is initialized as a 10 by 10 matrix of zeros. Do I need to initialize matrix B differently in order to create this condition, or is there a way of doing it as is?
Below is roughly how my code looks so far
from random import sample
import numpy as np
matrixA = np.random.randint(2, size=(10,10))
matrixB = np.zeros((10,10))
x, y = sample(range(1, 10), k=2)
if someCondition:
    matrixB = np.append(matrixB, [matrixA[x]], axis=0)
else:
    matrixB = np.append(matrixB, [matrixA[y]], axis=0)
You don't need a loop for this; it is easy to do with smart indexing. For example:
import numpy as np
A = np.random.randint(0, 10, size=(20,10))
B = np.empty((10, 10))
print(A)
# Copy till the row that satisfies your conditions. Here I assume it to be 10
B = A[:10, :]
print(B)
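If the rows really do have to be picked one at a time based on a condition, a plain while loop that stops once B has the desired number of rows also works. A minimal sketch, where someCondition is replaced by a hypothetical stand-in since the question does not define it:
from random import sample
import numpy as np

matrixA = np.random.randint(2, size=(10, 10))
rows = []                          # collect the chosen rows, build B once at the end
while len(rows) < 10:              # "full" taken to mean 10 rows
    x, y = sample(range(1, 10), k=2)
    someCondition = np.random.rand() > 0.5   # hypothetical placeholder for the real condition
    rows.append(matrixA[x] if someCondition else matrixA[y])
matrixB = np.array(rows)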

Pairwise Cosine Similarity using TensorFlow

How can we efficiently calculate pairwise cosine distances in a matrix using TensorFlow? Given an MxN matrix, the result should be an MxM matrix, where the element at position [i][j] is the cosine distance between i-th and j-th rows/vectors in the input matrix.
This can be done with Scikit-Learn fairly easily as follows:
from sklearn.metrics.pairwise import pairwise_distances
pairwise_distances(input_matrix, metric='cosine')
Is there an equivalent method in TensorFlow?
There is an answer for getting a single cosine distance here: https://stackoverflow.com/a/46057597/288875, which is based on tf.losses.cosine_distance.
Here is a solution which does this for matrices:
import tensorflow as tf
import numpy as np

with tf.Session() as sess:
    M = 3
    # input
    input = tf.placeholder(tf.float32, shape=(M, M))
    # normalize each row
    normalized = tf.nn.l2_normalize(input, dim=1)
    # multiply row i with row j using transpose
    # element wise product
    prod = tf.matmul(normalized, normalized,
                     adjoint_b=True  # transpose second matrix
                     )
    dist = 1 - prod
    input_matrix = np.array(
        [[1, 1, 1],
         [0, 1, 1],
         [0, 0, 1],
         ],
        dtype='float32')
    print("input_matrix:")
    print(input_matrix)
    from sklearn.metrics.pairwise import pairwise_distances
    print("sklearn:")
    print(pairwise_distances(input_matrix, metric='cosine'))
    print("tensorflow:")
    print(sess.run(dist, feed_dict={input: input_matrix}))
which gives me:
input_matrix:
[[ 1. 1. 1.]
[ 0. 1. 1.]
[ 0. 0. 1.]]
sklearn:
[[ 0. 0.18350345 0.42264974]
[ 0.18350345 0. 0.29289323]
[ 0.42264974 0.29289323 0. ]]
tensorflow:
[[ 5.96046448e-08 1.83503449e-01 4.22649741e-01]
[ 1.83503449e-01 5.96046448e-08 2.92893231e-01]
[ 4.22649741e-01 2.92893231e-01 0.00000000e+00]]
Note that this solution may not be the optimal one, as it calculates all entries of the (symmetric) result matrix, i.e. it does almost twice the necessary work. This is likely not a problem for small matrices; for large matrices a combination of loops may be faster.
Note also that this does not have a minibatch dimension, so it works for a single matrix only.
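If a minibatch dimension is needed, one option is to normalize along the last axis and let the batched matmul take care of the leading dimension; a sketch, assuming an input of shape [batch, M, N]:
def batched_cosine_distances(x):
    # x has shape [batch, M, N]; the result has shape [batch, M, M]
    normalized = tf.nn.l2_normalize(x, axis=-1)
    return 1 - tf.matmul(normalized, normalized, transpose_b=True)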
Elegant solution (output is the same as from scikit-learn pairwise_distances function):
def compute_cosine_distances(a, b):
    # a shape is n_a * dim
    # b shape is n_b * dim
    # result shape is n_a * n_b
    normalize_a = tf.nn.l2_normalize(a, 1)
    normalize_b = tf.nn.l2_normalize(b, 1)
    distance = 1 - tf.matmul(normalize_a, normalize_b, transpose_b=True)
    return distance
test
input_matrix = np.array([[1, 1, 1],
                         [0, 1, 1],
                         [0, 0, 1]], dtype='float32')
compute_cosine_distances(input_matrix, input_matrix)
output:
<tf.Tensor: id=442, shape=(3, 3), dtype=float32, numpy=
array([[5.9604645e-08, 1.8350345e-01, 4.2264974e-01],
[1.8350345e-01, 5.9604645e-08, 2.9289323e-01],
[4.2264974e-01, 2.9289323e-01, 0.0000000e+00]], dtype=float32)>

Enumerate through variable (porting PyMC to PyMC3)

I'm starting out with PyMC3 by translating this code from PyMC to PyMC3.
I'm not sure how to translate this segment:
v = pymc.Beta('v', alpha=1, beta=alpha, size=N_dp)

@pymc.deterministic
def p(v=v):
    """ Calculate Dirichlet probabilities """
    # Probabilities from betas
    # this line creates the error:
    value = [u * np.prod(1 - v[:i]) for i, u in enumerate(v)]
    # Enforce sum to unity constraint
    value[-1] = 1 - sum(value[:-1])
    return value

z = pymc.Categorical('z', p, size=len(set(counties)))
I assume I have to replace p in the last line with p(v) and remove the @pymc.deterministic decorator, but the problem seems to be that I cannot enumerate through v: ValueError: length not known: ViewOp [id A] 'v'.
Can someone show me how to do the translation or link me to the relevant bit in the documentation? Thanks.
The Dirichlet distribution is actually built into pymc3, so that whole code block can be replaced by:
with pm.Model():
    ...
    v = pm.Beta('v', alpha=1, beta=alpha, shape=N_dp)
    p = pm.Dirichlet('p', a=v, shape=N_dp)
    ...
    trace = pm.sample(20000)
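For the z line from the original model, the PyMC3 equivalent would presumably just swap size for shape, mirroring the change made for v above:
z = pm.Categorical('z', p=p, shape=len(set(counties)))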

In Julia, How can I column-normalize a sparse matrix?

If I have constructed a sparse matrix using the sparse(i, j, k) constructor, how can I then normalize the columns of the matrix (so that each column sums to 1)? I cannot efficiently normalize the entries before I create the matrix, so any help is appreciated. Thanks!
The easiest way would be a broadcasting division by the sum of the columns:
julia> A = sprand(4,5,.5)
A./sum(A,1)
4x5 Array{Float64,2}:
0.0 0.0989976 0.0 0.0 0.0795486
0.420754 0.458653 0.0986313 0.0 0.0
0.0785525 0.442349 0.0 0.856136 0.920451
0.500693 0.0 0.901369 0.143864 0.0
… but it looks like that hasn't been optimized for sparse matrices yet, and falls back to a full matrix. So a simple loop to iterate over the columns does the trick:
julia> for (col,s) in enumerate(sum(A,1))
           s == 0 && continue  # What does a "normalized" column with a sum of zero look like?
           A[:,col] = A[:,col]/s
       end
A
4x5 sparse matrix with 12 Float64 entries:
[2, 1] = 0.420754
[3, 1] = 0.0785525
[4, 1] = 0.500693
[1, 2] = 0.0989976
[2, 2] = 0.458653
[3, 2] = 0.442349
[2, 3] = 0.0986313
[4, 3] = 0.901369
[3, 4] = 0.856136
[4, 4] = 0.143864
[1, 5] = 0.0795486
[3, 5] = 0.920451
julia> sum(A,1)
1x5 Array{Float64,2}:
1.0 1.0 1.0 1.0 1.0
This works entirely within sparse matrices and is done in-place (although it is still allocating new sparse matrices for each column slice).
Given a matrix A (it does not matter whether or not it is sparse), you can normalize along either dimension with
A ./ sum(A,1) or A ./ sum(A,2)
to show that it works:
A = sprand(10,10,0.3)
println(sum(A,1))
println(A ./ sum(A,1))
The only caveat:
A[:,1] = 0
println(A ./ sum(A,1))
As you can see, column 1 now contains only NaNs because we divide by zero. Also, we end up with a dense Matrix and not a sparse Matrix.
On the other hand, one can quickly come up with an efficient specialized solution for your problem.
function normalize_columns(A :: SparseMatrixCSC)
    sums = sum(A,1)
    I,J,V = findnz(A)
    for idx in 1:length(V)
        V[idx] /= sums[J[idx]]
    end
    sparse(I,J,V)
end
@Matt B came up with a very similar answer while I was typing this up :)
Remember that sparse matrices in Julia are in compressed column form. So you can access the data directly:
for col = 1 : size(A, 2)
    i = A.colptr[col]
    k = A.colptr[col+1] - 1
    n = i <= k ? norm(A.nzval[i:k]) : 0.0  # or whatever you like
    n > 0.0 && (A.nzval[i:k] ./= n)
end
# get the column sums of A
S = vec(sum(A,1))
# get the nonzero entries in A. ei is row index, ej is col index, ev is the value in A
ei,ej,ev = findnz(A)
# get the number of rows and columns in A
m,n = size(A)
# create a new normalized matrix. For each nonzero index (ei,ej), its new value will be
# the old value divided by the sum of that column, which can be obtained by S[ej]
A_normalized = sparse(ei,ej,ev./S[ej],m,n)
the following gives what you want:
A = sprand(4,5,0.5)
B = A./sparse(sum(A,1))
The problem is that sum(A,1) gives a 1x5 dense array so combining with the sparse matrix A through the ./ operator gives a dense array. So you need to force it to be of sparse type. Or you can type
sparse(A ./ sum(A,1)).
