Why are these two Numpy arrays of different shapes? - numpy-ndarray

Consider,
import numpy as np

Wb = np.array([
    np.empty(3, dtype=np.ndarray),
    np.empty(3, dtype=np.ndarray)
], dtype=np.ndarray)
print(Wb.shape)  # (2, 3)

Wb = np.empty(2, dtype=np.ndarray)
Wb[0] = np.empty(3, dtype=np.ndarray)
Wb[1] = np.empty(3, dtype=np.ndarray)
print(Wb.shape)  # (2,)
In the first version, I initialise Wb with a list of the same elements that I assign manually in the second version. Should the shapes of the final Wb be different? Why?
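For reference, printing a bit more detail from the two constructions makes the difference visible (this only re-runs the code above with some extra inspection):

import numpy as np

# First construction: np.array is given two equal-length sequences, so it
# infers a 2-D layout and builds a (2, 3) object array whose elements are
# the individual (uninitialised) entries.
Wb1 = np.array([np.empty(3, dtype=np.ndarray),
                np.empty(3, dtype=np.ndarray)], dtype=np.ndarray)
print(Wb1.shape, Wb1[0, 0])   # (2, 3) None

# Second construction: the (2,) container is created first, and each slot
# is then filled with a whole length-3 array as a single object, so the
# outer shape stays (2,).
Wb2 = np.empty(2, dtype=np.ndarray)
Wb2[0] = np.empty(3, dtype=np.ndarray)
Wb2[1] = np.empty(3, dtype=np.ndarray)
print(Wb2.shape, Wb2[0].shape)   # (2,) (3,)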

Related

same output (different probability) from keras sequential binary image classification model

I am trying to build an image binary classification model using Keras. Unfortunately, I get the same output every time: the probability for each test sample is different, but they all favor one label.
The datasets are balanced, label L (n=250) vs. label E (n=250): 300 for training, 100 for validation, 100 for testing. There is no sample overlap among those groups.
After failing to predict the test dataset, I also used the training dataset for prediction, which means the model made predictions for the samples it had just been trained on. I know that does not make much sense, but it also produced the same output: Counter({0: 300}).
from keras.layers.core import Dense, Flatten, Dropout
from keras.layers.convolutional import Conv2D, MaxPooling2D, SeparableConv2D
import keras
from keras import layers
from skimage.transform import resize
import math
import os, random
import cv2
import numpy as np
import pandas as pd
from keras.models import Sequential
from sklearn.model_selection import KFold
from sklearn.metrics import confusion_matrix
from collections import Counter
import matplotlib.pyplot as plt

class DataGenerator(keras.utils.Sequence):
    def __init__(self, datas, batch_size=32, shuffle=True):
        self.batch_size = batch_size
        self.datas = datas
        self.indexes = np.arange(len(self.datas))
        self.shuffle = shuffle

    def __len__(self):
        return math.ceil(len(self.datas) / float(self.batch_size))

    def __getitem__(self, index):
        batch_indexs = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        batch_datas = [self.datas[k] for k in batch_indexs]
        X, y = self.data_generation(batch_datas)
        return X, y

    def on_epoch_end(self):
        if self.shuffle == True:
            np.random.shuffle(self.indexes)

    def data_generation(self, batch_datas):
        images = []
        labels = []
        for i, data in enumerate(batch_datas):
            image = resize((cv2.imread(data)/255), (128, 128))
            image = list(image)
            images.append(image)
            # the label is taken from the parent directory name in the path
            right = data.rfind("\\", 0)
            left = data.rfind("\\", 0, right) + 1
            class_name = data[left:right]
            if class_name == "e":
                labels.append(0)
            else:
                labels.append(1)
        return np.array(images), np.array(labels)

def create_model():
    model = Sequential()
    model.add(Conv2D(8, kernel_size=(3, 3),
                     input_shape=(128, 128, 3),
                     activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(16, kernel_size=(3, 3),
                     padding="same",
                     activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(32, kernel_size=(3, 3),
                     activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(units=64, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(units=1, activation='sigmoid'))
    model.compile(optimizer='sgd',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

e_train = []
e_test = []
l_test = []
l_train = []
for file in os.listdir('\e\train'):
    e_train.append(os.path.join('\e\train', file))
for file in os.listdir('\e\test'):
    e_test.append(os.path.join('\e\test', file))
for file in os.listdir('\l\train'):
    l_train.append(os.path.join('\l\train', file))
for file in os.listdir('\l\test'):
    l_test.append(os.path.join('\l\test', file))

data_tr = e_train + l_train
data_te = e_test + l_test
g_te = DataGenerator(data_te)

kf = KFold(n_splits=4, shuffle=True, random_state=seed)
fold = 1
for train, test in kf.split(data_tr):
    model = create_model()
    g_tr = DataGenerator(data_tr[train])
    g_v = DataGenerator(data_tr[test])
    H = model.fit_generator(generator=g_tr, epochs=10,
                            validation_data=g_v, shuffle=False,
                            max_queue_size=10, workers=1)
    pred = model.predict(g_te, max_queue_size=10, workers=1, verbose=1)
    print(pred)
    # the probabilities were different, but the right column was always bigger
    # [[0.49817565 0.5018243 ]
    #  [0.4872172  0.5127828 ]
    #  [0.48092505 0.519075  ]
    predicted_class_indices = [np.argmax(probas) for probas in pred]
    print(Counter(predicted_class_indices))
    # the output was the same
    # Counter({0: 100})
    fold = fold + 1
Any thoughts would be appreciated.
Solution:
Instead of solving a binary classification problem, convert it into a multi-class problem with two classes. The output layer then uses a softmax activation, which provides a probability distribution over the classes. Refer to this tutorial to understand the changes that need to be made.
Explanation
You should not use a sigmoid activation in the output layer of the model while using relu activations in the intermediate layers. The vanilla ReLU (Rectified Linear Unit) activation is defined as
relu(x) = max(0, x)
Hence, the range of this activation function is [0, infinity). On the other hand, the sigmoid activation is defined as
sigmoid(x) = 1 / (1 + exp(-x))
The range of the sigmoid function is (0, 1). So, if a large signal (> 0) is passed through the sigmoid function, the output will be very close to 1, i.e. a fully saturated firing. The output of the relu function can provide such a large signal from the intermediate layers, producing fully saturated firings (values near 1) at the output layer where the sigmoid activation is applied.
If the logits are [ 5.6 , 1.2 , 3.2 , 4.8 ], the output of the sigmoid function is,
[0.9963157 , 0.76852477, 0.96083426, 0.99183744]
and that of softmax is,
[0.6441953 , 0.00790901, 0.05844009, 0.2894557 ]
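A minimal sketch of the suggested change (an illustration, not the tutorial's exact code; it assumes the convolutional base stays as in create_model() above): replace the single sigmoid unit with a 2-unit softmax head and a categorical loss, so the integer labels 0/1 produced by the generator can be used directly.

from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Dropout

def create_model_softmax():
    model = Sequential()
    # convolutional base kept deliberately small for the sketch
    model.add(Conv2D(8, kernel_size=(3, 3), input_shape=(128, 128, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.5))
    # two output units with softmax instead of one sigmoid unit
    model.add(Dense(2, activation='softmax'))
    model.compile(optimizer='sgd',
                  loss='sparse_categorical_crossentropy',  # integer labels 0/1
                  metrics=['accuracy'])
    return model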

Loop until matrix is full?

I have a conditional statement that adds a row of binary values from matrix A to matrix B. I want to put this in a loop so that it continues to add rows from matrix A until matrix B is full. Currently, matrix B is initialized as a 10 by 10 matrix of zeros. Do I need to initialize matrix B differently in order to create this condition, or is there a way of doing it as is?
Below is roughly how my code looks so far:
from random import sample
import numpy as np

matrixA = np.random.randint(2, size=(10, 10))
matrixB = np.zeros((10, 10))
x, y = sample(range(1, 10), k=2)

if someCondition:
    matrixB = np.append(matrixB, [matrixA[x]], axis=0)
else:
    matrixB = np.append(matrixB, [matrixA[y]], axis=0)
You don't need a loop for this; it is easy to do with slicing (smart indexing). For example:
import numpy as np
A = np.random.randint(0, 10, size=(20,10))
B = np.empty((10, 10))
print(A)
# Copy till the row that satisfies your conditions. Here I assume it to be 10
B = A[:10, :]
print(B)
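If you do want the explicit "add rows until B is full" behaviour from the question, a rough sketch looks like this (it assumes "full" means B has collected 10 rows, and uses a random placeholder in place of someCondition):

import numpy as np
from random import sample

matrixA = np.random.randint(2, size=(10, 10))
# start B with zero rows so its row count tracks how full it is
matrixB = np.empty((0, 10), dtype=matrixA.dtype)

while matrixB.shape[0] < 10:
    x, y = sample(range(10), k=2)
    someCondition = np.random.rand() < 0.5          # placeholder for the real condition
    row = matrixA[x] if someCondition else matrixA[y]
    matrixB = np.vstack([matrixB, row])

print(matrixB.shape)   # (10, 10)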

Tensorflow: Efficient multinomial sampling (Theano x50 faster?)

I want to be able to sample from a multinomial distribution very efficiently and apparently my TensorFlow code is very... very slow...
The idea is that, I have:
A vector: counts = [40, 50, 26, ..., 19] for example
A matrix of probabilities: probs = [[0.1, ..., 0.5], ... [0.3, ..., 0.02]] such that np.sum(probs, axis=1) = 1
Let's say len(counts) = N and probs.shape = (N, 50). What I want to do is (in our example):
sample 40 times from the first probability vector of the matrix probs
sample 50 times from the second probability vector of the matrix probs
...
sample 19 times from the Nth probability vector of the matrix probs
such that my final matrix looks like (for example):
A = [[22, ... 13], ..., [12, ..., 3]] where np.sum(A, axis=1) == counts
(i.e the sum over each row = the number in the corresponding row of counts vector)
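For reference, a plain NumPy version of this operation would be a slow Python loop (shown only to pin down the desired output; the sizes and values here are made up):

import numpy as np

N, K = 100, 50
counts = np.random.randint(10, 60, size=N)
probs = np.random.uniform(size=(N, K))
probs /= probs.sum(axis=1, keepdims=True)

# row i of A is one draw of size counts[i] from the categorical distribution probs[i]
A = np.stack([np.random.multinomial(n, p) for n, p in zip(counts, probs)])
assert (A.sum(axis=1) == counts).all()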
Here is my TensorFlow code sample:
import numpy as np
import tensorflow as tf
import tensorflow.contrib.distributions as ds
import time

nb_distribution = 100  # number of probability distributions

counts = np.random.randint(2000, 3500, size=nb_distribution)  # number of counts (vector of size 100 with ints in [2000, 3500))
# print(counts[:40]) # should be the same as the output of print(np.sum(res, 1)[:40]) in the tf.Session()

# probsn is a matrix of probabilities:
# each row of probsn contains a vector of size 30 that sums to 1
probsn = np.random.uniform(size=(nb_distribution, 30))
probsn /= np.sum(probsn, axis=1)[:, None]

counts = tf.Variable(counts, dtype=tf.float32)
probs = tf.Variable(tf.convert_to_tensor(probsn.astype(np.float32)))

# sample from the multinomial
dist = ds.Multinomial(total_count=counts, probs=probs)
out = dist.sample()

start = time.time()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    res = sess.run(out)
    # print(np.sum(res, 1)[:40])
print(time.time() - start)
elapsed time: 0.12 seconds
My equivalent code in Theano:
import numpy as np
import theano
import time
from theano.tensor import _shared
from theano.tensor.shared_randomstreams import RandomStreams

nb_distribution = 100  # number of probability distributions

counts = np.random.randint(2000, 3500, size=nb_distribution)
# print(counts[:40]) # should be the same as the output of print(np.sum(v_sample(), 1)[:40])
counts = _shared(counts)  # number of counts (vector of size 100 with ints in [2000, 3500))

# probsn is a matrix of probabilities:
# each row of probsn contains a vector that sums to 1
probsn = np.random.uniform(size=(nb_distribution, 30))
probsn /= np.sum(probsn, axis=1)[:, None]
probsn = _shared(probsn)

np_rng = np.random.RandomState(12345)
theano_rng = RandomStreams(np_rng.randint(2 ** 30))

v_sample = theano.function(inputs=[], outputs=theano_rng.multinomial(n=counts, pvals=probsn))

start_t = time.time()
out = np.sum(v_sample(), 1)[:40]
# print(out)
print(time.time() - start_t)
elapsed time: 0.0025 seconds
Theano is like 100x faster... Is there something wrong with my TensorFlow code? How can I sample from a multinomial distribution efficiently in TensorFlow?
The problem is that the TensorFlow Multinomial sample() method actually calls _sample_n(). This method is defined here. As we can see in that code, to sample from the multinomial it produces a one_hot matrix for each row and then reduces that matrix to a vector by summing over the rows:
math_ops.reduce_sum(array_ops.one_hot(x, depth=k), axis=-2)
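In NumPy terms, that reduction does something like the following (illustration only): the one-hot encoding materialises an (n_draws, k) intermediate before it is summed away.

import numpy as np

x = np.array([2, 0, 2, 1])        # category drawn on each of 4 individual draws, k = 3 categories
k = 3
one_hot = np.eye(k)[x]            # shape (4, 3) -- this is the extra intermediate memory
counts = one_hot.sum(axis=0)      # array([1., 1., 2.]) -- the multinomial sample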
It is inefficient because it uses extra memory. To avoid this I have used the
tf.scatter_nd function. Here is a fully runnable example:
import tensorflow as tf
import numpy as np
import tensorflow.contrib.distributions as ds
import time

tf.reset_default_graph()

nb_distribution = 100  # number of probability distributions

u = np.random.randint(2000, 3500, size=nb_distribution)  # number of counts (vector of size 100 with ints in [2000, 3500))

# probsn is a matrix of probabilities:
# each row of probsn contains a vector of size 30 that sums to 1
probsn = np.random.uniform(size=(nb_distribution, 30))
probsn /= np.sum(probsn, axis=1)[:, None]

counts = tf.Variable(u, dtype=tf.float32)
probs = tf.Variable(tf.convert_to_tensor(probsn.astype(np.float32)))

# sample from the multinomial
dist = ds.Multinomial(total_count=counts, probs=probs)
out = dist.sample()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    res = sess.run(out)  # warm-up run; if you remove this line the timing below is slower...
    start = time.time()
    res = sess.run(out)
    print(time.time() - start)
    print(np.all(u == np.sum(res, axis=1)))
This code took 0.05 seconds to compute. Here is the scatter_nd-based version:
def vmultinomial_sampling(counts, pvals, seed=None):
    k = tf.shape(pvals)[1]
    logits = tf.expand_dims(tf.log(pvals), 1)

    def sample_single(args):
        logits_, n_draw_ = args[0], args[1]
        x = tf.multinomial(logits_, n_draw_, seed)
        indices = tf.cast(tf.reshape(x, [-1, 1]), tf.int32)
        updates = tf.ones(n_draw_)  # tf.shape(indices)[0]
        return tf.scatter_nd(indices, updates, [k])

    x = tf.map_fn(sample_single, [logits, counts], dtype=tf.float32)
    return x

xx = vmultinomial_sampling(u, probsn)
# check = tf.expand_dims(counts, 1) * probs

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    res = sess.run(xx)  # warm-up run; if you remove this line the timing below is slower...
    start_t = time.time()
    res = sess.run(xx)
    print(time.time() - start_t)
    # print(np.sum(res, axis=1))
    print(np.all(u == np.sum(res, axis=1)))
This code took 0.016 seconds
The drawback is that my code doesn't actually parallelize the computation (even though the parallel_iterations parameter of map_fn defaults to 10, setting it to 1 doesn't change anything...).
Maybe someone will find something better, because it is still very slow compared to Theano's implementation (it doesn't take advantage of parallelization, and yet parallelization makes sense here because sampling one row is independent from sampling another).

PyMC for Model Averaging

I am interested in applying PyMC to model averaging. My goal is to estimate many linear models and average estimates across them, weighting by their posterior model probabilities. I am currently using the Bayesian Information Criterion (BIC) to approximate the likelihood of my data (therefore, my analysis is not fully Bayesian). I have successfully simulated a Markov Chain of models using one of my own scripts but I want to use PyMC because it seems like a great tool.
In my attempts thus far, I have not been forming the Markov chain correctly: I am not visiting models with higher posterior weights more often than others. I will include the example code below. Please also see the IPython notebook on GitHub for the math markup and code together.
import numpy as np
from pymc import stochastic, DiscreteMetropolis, MCMC
import statsmodels.api as sm
import pandas as pd
import random

def pack(alist, rank):
    binary = [str(1) if i in alist else str(0) for i in xrange(0, rank)]
    string = '0b1' + ''.join(binary)
    return int(string, 2)

def unpack(integer):
    string = bin(integer)[3:]
    return [int(i) for i in xrange(len(string)) if string[i] == '1']

def make_bma():
    # Simulating Data
    size = 100
    rank = 20
    X = 10*np.random.randn(size, rank)
    error = 30*np.random.randn(size, 1)
    coefficients = np.array([10, 2, 2, 2, 2, 2]).reshape((6, 1))
    y = np.dot(sm.add_constant(X[:, :5], prepend=True), coefficients) + error

    # Number of allowable regressors
    predictors = [3, 4, 5, 6, 7]

    @stochastic(dtype=int)
    def regression_model():
        def logp(value):
            columns = unpack(value)
            x = sm.add_constant(X[:, columns], prepend=True)
            corr = np.corrcoef(x[:, 1:], rowvar=0)
            prior = np.linalg.det(corr)
            ols = sm.OLS(y, x).fit()
            posterior = np.exp(-0.5*ols.bic)*prior
            return np.log(posterior)

        def random():
            k = np.random.choice(predictors)
            columns = sorted(np.random.choice(xrange(0, rank), size=k, replace=False))
            return pack(columns, rank)

    class ModelMetropolis(DiscreteMetropolis):
        def __init__(self, stochastic):
            DiscreteMetropolis.__init__(self, stochastic)

        def propose(self):
            '''considers a neighborhood around the previous model,
            defined as having one regressor removed or added, provided
            the total number of regressors coincides with predictors
            '''
            # Building set of neighboring models
            last = unpack(self.stochastic.value)
            last_indicator = np.zeros(rank)
            last_indicator[last] = 1
            last_indicator = last_indicator.reshape((-1, 1))
            neighbors = abs(np.diag(np.ones(rank)) - last_indicator)
            neighbors = neighbors[:, np.any([neighbors.sum(axis=0) == i
                                             for i in predictors], axis=0)]
            neighbors = pd.DataFrame(neighbors)

            # Drawing one model at random from the neighborhood
            draw = random.choice(xrange(neighbors.shape[1]))
            self.stochastic.value = pack(list(neighbors[draw][neighbors[draw] == 1].index), rank)

        # def step(self):
        #     logp_p = self.stochastic.logp
        #     self.propose()
        #     logp = self.stochastic.logp
        #     if np.log(random.random()) > logp_p - logp:
        #         self.reject()

    return locals()
if __name__ == '__main__':
    model = make_bma()
    M = MCMC(model)
    M.use_step_method(model['ModelMetropolis'], model['regression_model'])
    M.sample(iter=5000, burn=1000, thin=1)
    model_chain = M.trace("regression_model")[:]

    from collections import Counter
    counts = Counter(model_chain).items()
    counts.sort(reverse=True, key=lambda x: x[1])

    for f in counts[:10]:
        columns = unpack(f[0])
        print('Visits:', f[1])
        print(np.array([1. if i in columns else 0 for i in range(0, M.rank)]))
        print(M.coefficients.flatten())
        X = sm.add_constant(M.X[:, columns], prepend=True)
        corr = np.corrcoef(X[:, 1:], rowvar=0)
        prior = np.linalg.det(corr)
        fit = sm.OLS(model['y'], X).fit()
        posterior = np.exp(-0.5*fit.bic)*prior
        print(fit.params)
        print('R-squared:', fit.rsquared)
        print('BIC', fit.bic)
        print('Prior', prior)
        print('Posterior', posterior)
        print(" ")
It sounds like you are trying to do something akin to reversible jump MCMC, where you are sampling from the model space in addition to the parameter space(s). PyMC does not currently do rjMCMC, though it probably ought to. The trick is to account for the change in dimension when moving among models. If you do have a modest number of models, you can use an indicator function to select from the models, all of which are fit simultaneously.
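As a side note on the BIC weighting the question describes, the approximate posterior model probabilities can be computed directly from the fitted models; here is a minimal sketch with made-up data (plain NumPy/statsmodels, not PyMC):

import numpy as np
import statsmodels.api as sm

# Approximate posterior model probability ~ exp(-0.5 * BIC_m), normalised over models.
np.random.seed(0)
X = np.random.randn(100, 4)
y = X[:, 0] * 2.0 + np.random.randn(100)

candidate_columns = [[0], [0, 1], [0, 1, 2], [0, 1, 2, 3]]   # hypothetical model space
bics, fits = [], []
for cols in candidate_columns:
    fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
    fits.append(fit)
    bics.append(fit.bic)

bics = np.array(bics)
weights = np.exp(-0.5 * (bics - bics.min()))   # subtract the minimum for numerical stability
weights /= weights.sum()
print(weights)                                 # approximate posterior model probabilities

# Model-averaged estimate of the coefficient on column 0
# (the first slope after the constant in every candidate model)
avg_beta0 = sum(w * f.params[1] for w, f in zip(weights, fits))
print(avg_beta0)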

numpy - calculate value from matrix indexes

I need to create a new (n*m) x 4 matrix (e.g. named b) from an n x m matrix (e.g. named a), but I don't want to use nested loops, for speed reasons. Here is how I would do it with a nested loop:
for j in xrange(1, m+1):
    for i in xrange(1, n+1):
        index = (j-1)*n + i
        b[index, 1] = a[i, j]
        b[index, 2] = index
        b[index, 3] = s1*i + s2*j + s3
        b[index, 4] = s4*i + s5*j + s6
The question is, therefore, how to create a new matrix with values derived from the original matrix's indexes? Thanks
If you can use numpy, you could try
import numpy as np
# Create an empty array
b = np.empty((np.multiply(*a.shape), 4), dtype=a.dtype)
# Get two (n x m) arrays of indices
(irows, icols) = np.indices(a)
# Fill the b array
b[...,0] = a.flat
b[...,1] = np.arange(a.size)
b[...,2] = (s1*irows + s2*icols + s3).flat
b[...,3] = (s4*irows + s5*icols + s6).flat
Some minor corrections (I could not post this as a comment :/ ) for those who have a similar question:
import numpy as np
# Create an empty array
b = np.empty((a.size, 4), dtype=a.dtype)
# Get two (n x m) arrays of indices (starting from 1)
(irows, icols) = np.indices(a.shape) + 1
# Fill the b array
b[...,0] = a.flat
b[...,1] = np.arange(a.size) + 1
b[...,2] = (s1*irows + s2*icols + s3).flat
b[...,3] = (s4*irows + s5*icols + s6).flat
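A quick demonstration of the corrected version, with made-up values for a and the s coefficients (placeholders, not from the original question):

import numpy as np

n, m = 3, 2
a = np.arange(n * m).reshape(n, m)
s1, s2, s3, s4, s5, s6 = 1, 2, 3, 4, 5, 6

b = np.empty((a.size, 4), dtype=a.dtype)
(irows, icols) = np.indices(a.shape) + 1      # 1-based row/column indices
b[..., 0] = a.flat                            # original values
b[..., 1] = np.arange(a.size) + 1             # 1-based flat index
b[..., 2] = (s1*irows + s2*icols + s3).flat   # first derived value
b[..., 3] = (s4*irows + s5*icols + s6).flat   # second derived value
print(b)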
