Loop until matrix is full? - matrix

I have a conditional statement that adds a row of binary values from matrix A to matrix B. I want to put this in a loop so that it continues to add rows from matrix A until matrix B is full. Currently matrix B is initialized as a 10 by 10 matrix of zeros. Do I need to initialize matrix B differently in order to create this condition, or is there a way of doing it as is?
Below is roughly how my code looks so far:
from random import sample
import numpy as np
matrixA = np.random.randint(2, size=(10,10))
matrixB = np.zeros((10,10))
x, y = sample(range(1, 10), k=2)
if someCondition:
    matrixB = np.append(matrixB, [matrixA[x]], axis=0)
else:
    matrixB = np.append(matrixB, [matrixA[y]], axis=0)

You don't need a loop for this. It is easy to do with slice indexing. For example:
import numpy as np
A = np.random.randint(0, 10, size=(20,10))
B = np.empty((10, 10))
print(A)
# Copy up to the row that satisfies your condition. Here I assume it to be 10
B[:] = A[:10, :]  # copies into the preallocated B; a plain B = A[:10, :] would only rebind B to a view of A
print(B)
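If a loop is still preferred (for example because the condition is evaluated once per row), one possible sketch is to keep B preallocated and track how many rows have been written, rather than growing it with np.append. The condition used below is a hypothetical stand-in for someCondition, and the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
matrix_a = rng.integers(0, 2, size=(10, 10))
matrix_b = np.zeros((10, 10), dtype=matrix_a.dtype)

filled = 0  # number of rows of matrix_b written so far
while filled < matrix_b.shape[0]:
    # Pick two distinct candidate rows of matrix_a.
    x, y = rng.choice(matrix_a.shape[0], size=2, replace=False)
    # Hypothetical condition: prefer the row with more ones.
    some_condition = matrix_a[x].sum() >= matrix_a[y].sum()
    matrix_b[filled] = matrix_a[x] if some_condition else matrix_a[y]
    filled += 1

print(matrix_b)
```

With this layout "full" simply means that the row counter has reached the number of rows in B, so the zero initialization can stay as it is.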

Related

Generate matrix in Keras based on row and column

How could I create a layer in Keras that outputs a matrix of given dimensions (e.g. m, n), with each cell's value based on its row and column?
Here is the formula:
A[i, 2j] = i / (10**(2*j))
A[i, 2j+1] = i / (10**(2*j))
I tried looking at the lambda function, but it seems Keras passes only the cell values and not the indices. Are there any other options (not a loop)?
You could do the following:
from keras.layers import Input
import keras.backend as K
import numpy as np
def CustomConstantInput(m, n):
    x = np.arange(m)
    y = 10 ** (2 * (np.arange(n) // 2))
    matrix = x[:, None] / y[None, :]
    print(matrix)
    fixed_input = K.constant(matrix)
    return Input(tensor=fixed_input)
t = CustomConstantInput(3, 4)
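The broadcasting step can be checked without Keras. The sketch below rebuilds the same matrix in plain NumPy and compares it against the formula from the question written as explicit loops; the function name index_matrix is made up for this illustration.

```python
import numpy as np

def index_matrix(m, n):
    """Return A with A[i, 2j] = A[i, 2j+1] = i / 10**(2*j)."""
    rows = np.arange(m)
    # Columns 2j and 2j+1 share the same divisor 10**(2*j).
    scale = 10.0 ** (2 * (np.arange(n) // 2))
    return rows[:, None] / scale[None, :]

A = index_matrix(3, 4)

# Reference built directly from the formula, one cell at a time.
expected = np.array([[i / 10.0 ** (2 * (col // 2)) for col in range(4)]
                     for i in range(3)])
print(A)
```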

tf.boolean_mask, mask_dimension must be specified?

When using tf.boolean_mask(), a ValueError is raised. It reads: "Number of mask dimensions must be specified, even if some dimensions are None. E.g. shape=[None] is ok, but shape=None is not."
I suspect that something is going wrong when I create my boolean mask s, because when I just create a boolean mask by hand, all works fine. However, I've checked the shape and the dtype of s so far, and couldn't notice anything suspicious. Both seemed to be identical to the shape and type of the boolean mask I created by hand.
Please see a screenshot of the problem.
The following should allow you to reproduce the error on your machine. You need tensorflow, numpy and scipy.
with tf.Session() as sess:
    # receive five embedded vectors
    v0 = tf.constant([[3.0,1.0,2.,4.,2.]])
    v1 = tf.constant([[4.0,0,1.0,4,1.]])
    v2 = tf.constant([[1.0,1.0,0.0,4.,8.]])
    v3 = tf.constant([[1.,4,2.,5.,2.]])
    v4 = tf.constant([[3.,2.,3.,2.,5.]])
    # concatenate the five embedded vectors into a matrix
    VT = tf.concat([v0,v1,v2,v3,v4],axis=0)
    # perform SVD on the concatenated matrix
    s, u1, u2 = tf.svd(VT)
    e = tf.square(s) # list of eigenvalues
    v = u1 # eigenvectors as column vectors
    # sample a set
    s = tf.py_func(sample_dpp_bin,[e,v],tf.bool)
    X = tf.boolean_mask(VT,s)
    print(X.eval())
This is the code to generate s. s is a sample from a determinantal point process (for the mathematically interested).
Note that I'm using tf.py_func to wrap this python function:
import tensorflow as tf
import numpy as np
from scipy.linalg import orth
def sample_dpp_bin(e_val,e_vec):
    # e_val = np.array of eigenvalues
    # e_vec = array of eigenvectors (= column vectors)
    eps = 0.01
    # sample a set of eigenvectors
    ind = (np.random.rand(len(e_val)) <= (e_val)/(1+e_val))
    k = sum(ind)
    if k == e_val.size:
        return np.ones(e_val.size,dtype=bool) # check for full set
    if k == 0:
        return np.zeros(e_val.size,dtype=bool)
    V = e_vec[:,np.array(ind)]
    # sample a set of k items
    sample = np.zeros(e_val.size,dtype=bool)
    for l in range(k-1,-1,-1):
        p = np.sum(V**2,axis=1)
        p = np.cumsum(p / np.sum(p)) # item cumulative probabilities
        i = int((np.random.rand() <= p).argmax()) # choose random item
        sample[i] = True
        j = (np.abs(V[i,:])>eps).argmax() # pick an eigenvector not orthogonal to e_i
        Vj = V[:,j]
        V = orth(V - (np.outer(Vj,(V[i,:]/Vj[i]))))
    return sample
The output if I print s and tf.shape(s) is
[False True True True True]
[5]
The output if I print VT and tf.shape(VT) is
[[ 3. 1. 2. 4. 2.]
 [ 4. 0. 1. 4. 1.]
 [ 1. 1. 0. 4. 8.]
 [ 1. 4. 2. 5. 2.]
 [ 3. 2. 3. 2. 5.]]
[5 5]
Any help much appreciated.
The following example works for me.
import tensorflow as tf
import numpy as np
tensor = [[1, 2], [3, 4], [5, 6]]
mask = np.array([True, False, True])
t_m = tf.boolean_mask(tensor, mask)
sess = tf.Session()
print(sess.run(t_m))
Output:
[[1 2]
[5 6]]
Provide your runnable code snippet to reproduce the error. I think you might be doing something wrong in s.
Update:
s = tf.py_func(sample_dpp_bin,[e,v],tf.bool)
s_v = (s.eval())
X = tf.boolean_mask(VT,s_v)
print(X.eval())
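Here the mask passed to tf.boolean_mask is a NumPy array rather than the TF tensor returned by tf.py_func, so its shape is known when the op is built. You don't have to use tf.py_func.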
The error message states that the shape of the mask is not defined. What do you get if you print tf.shape(s)? I'd bet the problem with your code is that the shape of s is completely unknown, and you could fix that with a simple call like s.set_shape([None]) (to specify that s is a 1-dimensional tensor of unknown length; note that s.set_shape((None)) is the same as s.set_shape(None) and leaves the rank unknown). Consider this code snippet:
X = np.random.randint(0, 2, (100, 100, 3))
with tf.Session() as sess:
    X_tf = tf.placeholder(tf.int8)
    # X_tf.set_shape((None, None, None))
    y = tf.greater(tf.reduce_max(X_tf, axis=(0, 1)), 0)
    print(tf.shape(y))
    z = tf.boolean_mask(X_tf, y, axis=2)
    print(sess.run(z, feed_dict={X_tf: X}))
This prints Tensor("Shape_3:0", shape=(?,), dtype=int32) (i.e., even the number of dimensions of y is unknown) and raises the same error as you have. However, if you uncomment the set_shape line, then X_tf is known to be 3-dimensional and so y is 1-dimensional. The code then works. So, I think all you need to do is add an s.set_shape([None]) call after the py_func call.
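As a point of comparison outside TensorFlow, the row selection that tf.boolean_mask performs along the first axis corresponds to NumPy boolean indexing. The sketch below uses the values of VT and s printed in the question. It only illustrates the expected result and the rank requirement on the mask; it does not reproduce the graph-time shape inference.

```python
import numpy as np

VT = np.array([[3., 1., 2., 4., 2.],
               [4., 0., 1., 4., 1.],
               [1., 1., 0., 4., 8.],
               [1., 4., 2., 5., 2.],
               [3., 2., 3., 2., 5.]])
s = np.array([False, True, True, True, True])

# The mask must be 1-dimensional and match the first axis of VT.
assert s.ndim == 1 and s.shape[0] == VT.shape[0]

X = VT[s]  # keeps the rows where the mask is True
print(X)
```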

Tensorflow tf.nn.in_top_k Error targets[0] is out of range

I have a tensorflow program with four output labels. I trained the model and am now evaluating separate data with it.
The issue is that I get the error in the title when I run the following code:
import tensorflow as tf
import main
import Process
import Input
eval_dir = "/Users/Zanhuang/Desktop/NNP/model.ckpt-30"
checkpoint_dir = "/Users/Zanhuang/Desktop/NNP/checkpoint"
def evaluate():
    with tf.Graph().as_default() as g:
        images, labels = Process.eval_inputs()
        forward_propgation_results = Process.forward_propagation(images)
        init_op = tf.initialize_all_variables()
        saver = tf.train.Saver()
        top_k_op = tf.nn.in_top_k(forward_propgation_results, labels, 1)
        with tf.Session(graph=g) as sess:
            sess.run(init_op)
            saver.restore(sess, eval_dir)
            tf.train.start_queue_runners(sess=sess)
            print(sess.run(top_k_op))
def main(argv=None):
    evaluate()
if __name__ == '__main__':
    tf.app.run()
In total, I only have one class.
My code for the error rate, where I introduce the labels in a one hot matrix is here:
def error(forward_propagation_results, labels):
    labels = tf.one_hot(labels, 4)
    tf.transpose(labels)
    labels = tf.cast(labels, tf.float32)
    mean_squared_error = tf.square(tf.sub(labels, forward_propagation_results))
    cost = tf.reduce_mean(mean_squared_error)
    train = tf.train.GradientDescentOptimizer(learning_rate = 0.05).minimize(cost)
    tf.histogram_summary('accuracy', mean_squared_error)
    tf.add_to_collection('losses', cost)
    tf.scalar_summary('LOSS', cost)
    return train, cost
The problem is invalid data in your labels tensor. From your comment, the labels tensor is a vector containing a single value: [40]. The value 40 is larger than the number of columns in the forward_propagation_result (which is 4).
The tf.nn.in_top_k(predictions, targets, k) op has the following behavior:
For each row predictions[i, :]:
result[i] is true if predictions[i, targets[i]] is one of the k largest elements in that row; otherwise it is false.
There is no value predictions[0, 40], because (as your comment shows) that argument is a 1 x 4 matrix. Therefore TensorFlow gives you an out of range error. This suggests that either your evaluation data are wrong, or you should be using a different evaluation function.
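To make the rule above concrete, here is a small NumPy sketch of the same check. It is not TensorFlow's implementation, only an illustration of the behavior described, with an explicit range check added so that a label such as 40 fails visibly for a 4-column prediction matrix.

```python
import numpy as np

def in_top_k(predictions, targets, k):
    """NumPy illustration of the rule described above (not the TF op)."""
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets)
    n_classes = predictions.shape[1]
    if np.any((targets < 0) | (targets >= n_classes)):
        raise ValueError("targets must lie in [0, %d)" % n_classes)
    # Score that each row assigns to its target class.
    target_scores = predictions[np.arange(len(targets)), targets]
    # The target is in the top k if fewer than k entries beat it strictly.
    n_better = (predictions > target_scores[:, None]).sum(axis=1)
    return n_better < k

preds = [[0.1, 0.6, 0.2, 0.1]]
print(in_top_k(preds, [1], 1))   # the target is the largest entry
print(in_top_k(preds, [0], 1))   # the target is not the largest entry
```

Calling in_top_k(preds, [40], 1) raises the ValueError, which mirrors the out-of-range error from the question.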

Getting element-wise equations of matrix multiplication in sympy

I've got two matrices, the first of which is sparse with integer coefficients.
import sympy
A = sympy.eye(2)
A.row_op(1, lambda v, j: v + 2*A[0, j])
The 2nd is symbolic, and I perform an operation between them:
M = sympy.MatrixSymbol('M', 2, 1)
X = A * M + A.col(1)
Now, what I'd like is to get the element-wise equations:
X_{0,0} = M_{0,0}
X_{1,0} = 2*M_{0,0} + M_{1,0} + 1
One way to do this is specifying a matrix in sympy with each element being an individual symbol:
# name (str) and shape (tuple) are defined elsewhere
rows = []
for i in range(shape[0]):
    col = []
    for j in range(shape[1]):
        col.append(sympy.Symbol('%s_{%s,%d}' % (name,i,j)))
    rows.append(col)
M = sympy.Matrix(rows)
Is there a way to do it with the MatrixSymbol above, and then get the resulting element-wise equations?
Turns out, this question has a very obvious answer:
MatrixSymbols in sympy can be indexed like a matrix, i.e.:
X[i,j]
gives the element-wise equations.
If one wants to subset more than one element, the MatrixSymbol must first be converted to a sympy.Matrix class:
X = sympy.Matrix(X)
X # lists all indices as `X[i, j]`
X[3:4,2] # arbitrary subsets are supported
Note that this does not allow all operations of a numpy array/matrix (such as indexing with a boolean equivalent), so you might be better off creating a numpy array with sympy symbols:
ijstr = lambda i,j: sympy.Symbol(name+"_{"+str(int(i))+","+str(int(j))+"}")
matrix = np.matrix(np.fromfunction(np.vectorize(ijstr), shape))
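Putting the pieces together for the matrices from the question, a minimal sketch could look like the following. It expands the MatrixSymbol with as_explicit() before multiplying, which is one way to get an ordinary matrix of M[i, j] entries; the names X_{i,j} on the left-hand sides are illustrative symbols, not something sympy generates.

```python
import sympy

A = sympy.eye(2)
A.row_op(1, lambda v, j: v + 2 * A[0, j])   # A is now [[1, 0], [2, 1]]

M = sympy.MatrixSymbol('M', 2, 1)
# Expand M into an explicit 2x1 matrix whose entries are M[0, 0] and M[1, 0].
X = A * M.as_explicit() + A.col(1)

# One equation per element, with an illustrative symbol on the left-hand side.
equations = [sympy.Eq(sympy.Symbol('X_{%d,%d}' % (i, j)), X[i, j])
             for i in range(X.rows) for j in range(X.cols)]
for eq in equations:
    print(eq)
```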

numpy: evaluating function in matrix, using previous array as argument in calculating the next

I have an m x n array: a, where the integers m > 1E6, and n <= 5.
I have functions F and G, which are composed like this: F( u, G ( u, t)). u is a 1 x n array, t is a scalar, and F and G returns 1 x n arrays.
I need to evaluate each row of a in F, and use previously evaluated row as the u-array for the next evaluation. I need to make m such evaluations.
This has to be really fast. I was previously impressed by scitools.std StringFunction evaluation for a whole array, but this problem requires using the previously calculated array as an argument in calculating the next. I don't know if StringFunction can do this.
For example:
a = zeros((1000000, 4))
a[0] = asarray([1.,69.,3.,4.1])
# A is a float defined elsewhere, h is a function which accepts a float as its argument and returns an arbitrary float. h is defined elsewhere.
def G(u, t):
    return asarray([u[0], u[1]*A, cos(u[2]), t*h(u[3])])
def F(u, t):
    return u + G(u, t)
dt = 1E-6
for i in range(1, 1000000):
    a[i] = F(a[i-1], i*dt)
The problem with the above code is that it is slow as hell. I need these calculations done in milliseconds with numpy.
How can I do what I want?
Thank you for your time.
Kind regards,
Marius
This sort of thing is very difficult to do in numpy. If we look at this by column we see a few simpler solutions.
a[:,0] is very easy:
col0 = np.ones((1000))*2
col0[0] = 1 #Or whatever start value.
np.cumprod(col0, out=col0)
np.allclose(col0, a[:1000,0])
True
As mentioned earlier this will overflow very quickly. a[:,1] can be done much along the same lines.
I do not believe there is a way to do the next two columns inside numpy alone quickly. We can turn to numba for this:
from numba import autojit
def python_loop(start, count):
    out = np.zeros((count), dtype=np.double)
    out[0] = start
    for x in range(count-1):
        # each value depends on the previous one: u_next = u + cos(u)
        out[x+1] = out[x] + np.cos(out[x])
    return out
numba_loop = autojit(python_loop)
np.allclose(numba_loop(3,1000),a[:1000,2])
True
%timeit python_loop(3,1000000)
1 loops, best of 3: 4.14 s per loop
%timeit numba_loop(3,1000000)
1 loops, best of 3: 42.5 ms per loop
Although it's worth pointing out that this converges to pi/2 very quickly, and there is little point in calculating this recursion past ~20 values for any start value. This returns the exact same answer to double precision. I didn't bother finding the cutoff, but it is much less than 50:
%timeit tmp = np.empty((1000000));
tmp[:50] = numba_loop(3,50);
tmp[50:] = np.pi/2
100 loops, best of 3: 2.25 ms per loop
You can do something similar with the fourth column. Of course you can autojit all of the functions, but this gives you several different options to try out depending on numba usage:
Use cumprod for the first two columns
Use an approximation for column 3 (and possibly 4) where only the first few iterations are calculated
Implement columns 3 and 4 in numba using autojit
Wrap everything inside of an autojit loop (the best option)
The way you have presented this, all rows past ~200 will either be np.inf or np.pi/2. Exploit this.
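The first two options can be sketched in plain NumPy as follows. This is only an illustration under stated assumptions: the fourth column is left out because h is not given, growth stands in for the constant A from the question, and the cutoff n_exact=50 assumes the u + cos(u) recursion has already settled by then (which holds for the start value 3 used here).

```python
import numpy as np

def reference_loop(start, growth, n_rows):
    """Row-by-row recursion for the first three columns (slow baseline)."""
    out = np.zeros((n_rows, 3))
    out[0] = start
    for i in range(1, n_rows):
        u = out[i - 1]
        out[i] = u + np.array([u[0], u[1] * growth, np.cos(u[2])])
    return out

def columnwise(start, growth, n_rows, n_exact=50):
    """Same values, computed one column at a time."""
    out = np.empty((n_rows, 3))
    # Column 0: u -> 2u, i.e. a cumulative product of 2s.
    factors = np.full(n_rows, 2.0)
    factors[0] = start[0]
    out[:, 0] = np.cumprod(factors)
    # Column 1: u -> (1 + growth) * u.
    factors = np.full(n_rows, 1.0 + growth)
    factors[0] = start[1]
    out[:, 1] = np.cumprod(factors)
    # Column 2: iterate u -> u + cos(u) for a few steps, then reuse the
    # settled value for every remaining row.
    n_head = min(n_exact, n_rows)
    u = start[2]
    for i in range(n_head):
        out[i, 2] = u
        u = u + np.cos(u)
    out[n_head:, 2] = u
    return out

start = np.array([1.0, 69.0, 3.0])
slow = reference_loop(start, 0.5, 200)
fast = columnwise(start, 0.5, 200)
print(np.allclose(slow, fast))
```

The row count is kept at 200 here so that the first two columns stay finite; for much longer arrays they overflow to np.inf, as noted above.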
Slightly faster. Your first column is basically 2^n. Calculating 2^n for n up to 1000000 is going to overflow, and the second column is even worse.
def calc(arr, t0=1E-6):
    u = arr[0]
    dt = 1E-6
    h = lambda x: np.random.random()*50.0  # placeholder for the real h
    def firstColGen(uStart):
        u = uStart
        while True:
            u += u
            yield u
    def secondColGen(uStart, A):
        u = uStart
        while True:
            u += u*A
            yield u
    def thirdColGen(uStart):
        u = uStart
        while True:
            u += np.cos(u)
            yield u
    def fourthColGen(uStart, h, t0, dt):
        u = uStart
        t = t0
        while True:
            u += t * h(u)  # G's fourth component is t*h(u[3])
            t += dt
            yield u
    first = firstColGen(u[0])
    second = secondColGen(u[1], A)  # A is the float defined elsewhere
    third = thirdColGen(u[2])
    fourth = fourthColGen(u[3], h, t0, dt)
    for i in range(1, len(arr)):
        arr[i] = [next(first), next(second), next(third), next(fourth)]

Resources