Quickly calculating centroid of binary numpy array - performance

I have a numpy array of 0's and 1's (512 x 512). I'd like to calculate the centroid of the shape of 1's (they're all connected in one circular blob in the middle of the array).
x_center = 0.0
y_center = 0.0
for i in range(array.shape[0]):
    for j in range(array.shape[1]):
        if array[i, j] == 1:
            x_center += i
            y_center += j
count = (array == 1).sum()
x_center /= count
y_center /= count
Is there a way to speed up my calculation above? Could I use numpy.where() or something? Are there any python functions for doing this in parallel?

You can replace the two nested loops to get the coordinates of the center point, like so -
x_center, y_center = np.argwhere(array==1).sum(0)/count
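As a self-contained sketch of the same idea (the helper name `binary_centroid` and the small test mask are made up for illustration), the coordinates of the nonzero pixels can be averaged directly, which avoids needing a separately computed count:

```python
import numpy as np

def binary_centroid(mask):
    """Return the (row, column) centroid of the nonzero pixels of a 2D array."""
    rows, cols = np.nonzero(mask)          # coordinates of every nonzero pixel
    if rows.size == 0:
        raise ValueError("mask contains no nonzero pixels")
    return rows.mean(), cols.mean()        # mean coordinate along each axis

# Hypothetical 5 x 5 test mask with a 3 x 3 block of ones
mask = np.zeros((5, 5), dtype=int)
mask[1:4, 2:5] = 1
row_c, col_c = binary_centroid(mask)
print(row_c, col_c)
```

Note that this returns the centroid in (row, column) order, matching the loop in the question, where x_center accumulates the row index.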

You could also use scipy.ndimage.center_of_mass (older SciPy releases also expose it as scipy.ndimage.measurements.center_of_mass).

import numpy as np

x_c = 0
y_c = 0
area = array.sum()
it = np.nditer(array, flags=['multi_index'])
for i in it:
    # multi_index is (row, column); x runs along columns, y along rows
    x_c = i * it.multi_index[1] + x_c
    y_c = i * it.multi_index[0] + y_c
centroid = int(x_c/area), int(y_c/area)
That should give the centroid of a binary image using numpy.nditer().

Related

Finding min and max of a linear system

So I'm trying to make sense of a scenario in my class exercise, which is to find the max and min value of a function. I have two vectors of weights, w and v, each of which sums to 1: w = [0.6, 0.2, 0.2]^T and v = [0.8, -0.2, 0.4]^T.
These vectors form a linear combination of weights M = Aw + Bv, and A and B must sum to 1.
The function we are then optimizing is r = [0.1, 0.2, 0.1] • M
The constraints are as follows: 0 ≤ (0.6A + 0.8B) ≤ 1, 0 ≤ (0.2A - 0.2B) ≤ 1, 0 ≤ (0.2A + 0.4B) ≤ 1
The answers we should get are A = B = 0.5 for the minimum value of r, which is 0.1, and A = 2, B = -1 for the maximum, with r = 0.16. But the values I'm getting for the max are A = 3.5714286, B = -1.4285714, and for the min I'm getting A = B = 0.
Below is the code.
import pulp as p
from pulp import *
problem = LpProblem('Car Factory', LpMaximize)
A = LpVariable('Amound of w', cat=LpContinuous)
B = LpVariable('Amount of v', cat=LpContinuous)
#Objective Function
problem += (0.1)*(0.6*A + 0.8*B) + (0.2)*(0.2*A - 0.2*B) + (0.1)*(0.2*A + 0.4*B) , 'Objective Function'
#Constraints
problem += (0.6*A + 0.8*B) <= 1 , 'A'
problem += (0.6*A + 0.8*B) >= 0 , 'AL'
problem += (0.2*A - 0.2*B) <= 1, 'B'
problem += (0.2*A - 0.2*B) >= 0, 'BL'
problem += (0.2*A + 0.4*B) <= 1, 'C'
problem += (0.2*A + 0.4*B) >= 0, 'CL'
problem.solve()
print("Amount of w: ", A.varValue)
print("Amount of v: ", B.varValue)
print("total: ", value(problem.objective))
I'm sure it has to do with the setup, which I'm just not seeing. Also, is there a more efficient way to put this together?

I think you are missing a constraint, which would explain your deviation from the expected result. Where is your constraint that:
A + B == 1
Also, you are importing pulp twice, which may cause some confusion in the namespace of your code. Do one or the other, not both.
On expressing the problem more efficiently...? Nahh. You could treat your two column vectors as arrays of length 3 and do the math in your objective a bit differently, but it probably isn't worth it and your variables are just scalars, so I'd write it as you did. Now if the vectors were much larger, or if the variables were vectors, sure, I'd do something else.
pulp doesn't naturally handle vectors (like numpy arrays) to my knowledge. If you are going to be doing a lot of optimization in vector-matrix format and you are comfortable with the linear algebra, you might look at cvxpy which handles them naturally. If you're in a class that uses pulp, it's just fine to learn the basics.
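To see why the missing A + B == 1 constraint accounts for the difference, here is a minimal sketch in plain Python (not pulp) that substitutes B = 1 - A and works out the feasible range of A by hand. The function names are made up for illustration, and the sketch assumes every component of w - v is nonzero, which holds for the vectors in the question.

```python
W = [0.6, 0.2, 0.2]
V = [0.8, -0.2, 0.4]
R = [0.1, 0.2, 0.1]

def objective(a):
    """r . (a*w + (1 - a)*v): the objective with B = 1 - A substituted."""
    b = 1.0 - a
    return sum(r * (a * w + b * v) for r, w, v in zip(R, W, V))

def feasible_interval():
    """Range of A for which every component of M = A*w + (1 - A)*v lies in [0, 1]."""
    lo, hi = float("-inf"), float("inf")
    for w, v in zip(W, V):
        slope = w - v                       # component = v + slope * A
        ends = sorted([(0.0 - v) / slope, (1.0 - v) / slope])
        lo, hi = max(lo, ends[0]), min(hi, ends[1])
    return lo, hi

a_lo, a_hi = feasible_interval()
# The objective is linear in A, so its extremes sit at the interval endpoints.
candidates = [(objective(a), a, 1.0 - a) for a in (a_lo, a_hi)]
r_min, a_at_min, b_at_min = min(candidates)
r_max, a_at_max, b_at_max = max(candidates)
print(r_min, a_at_min, b_at_min)
print(r_max, a_at_max, b_at_max)
```

Up to floating-point rounding, this reproduces the expected answers from the question: the minimum at A = B = 0.5 and the maximum at A = 2, B = -1.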

Efficient way of computing multivariate gaussian varying the mean - Matlab

Is there an efficient way to compute a multivariate Gaussian (as below) that returns matrix p, that is, one that makes use of some sort of vectorization? I am aware that matrix p is symmetric, but for a matrix of size 40000x3, for example, this will still take quite a long time.
Matlab code example:
DataMatrix = [3 1 4; 1 2 3; 1 5 7; 3 4 7; 5 5 1; 2 3 1; 4 4 4];
[rows, cols ] = size(DataMatrix);
I = eye(cols);
p = zeros(rows);
for k = 1:rows
    p(k,:) = mvnpdf(DataMatrix(:,:),DataMatrix(k,:),I);
end

Stage 1: Hack into source code
Iteratively we are performing mvnpdf(DataMatrix(:,:),DataMatrix(k,:),I)
The syntax is : mvnpdf(X,Mu,Sigma).
Thus, the correspondence with our input becomes :
X = DataMatrix(:,:);
Mu = DataMatrix(k,:);
Sigma = I
For the sizes relevant to our situation, the source code mvnpdf.m reduces to -
%// Store size parameters of X
[n,d] = size(X);
%// Get vector mean, and use it to center data
X0 = bsxfun(@minus,X,Mu);
%// Make sure Sigma is a valid covariance matrix
[R,err] = cholcov(Sigma,0);
%// Create array of standardized data, and compute log(sqrt(det(Sigma)))
xRinv = X0 / R;
logSqrtDetSigma = sum(log(diag(R)));
%// Finally get the quadratic form and thus, the final output
quadform = sum(xRinv.^2, 2);
p_out = exp(-0.5*quadform - logSqrtDetSigma - d*log(2*pi)/2)
Now, if the Sigma is always an identity matrix, we would have R as an identity matrix too. Therefore, X0 / R would be same as X0, which is saved as xRinv. So, essentially quadform = sum(X0.^2, 2);
Thus, the original code -
for k = 1:rows
    p(k,:) = mvnpdf(DataMatrix(:,:),DataMatrix(k,:),I);
end
reduces to -
[n,d] = size(DataMatrix);
[R,err] = cholcov(I,0);
p_out = zeros(n);
K = sum(log(diag(R))) + d*log(2*pi)/2;
for k = 1:n
    X0 = bsxfun(@minus,DataMatrix,DataMatrix(k,:));
    quadform = sum(X0.^2, 2);
    p_out(k,:) = exp(-0.5*quadform - K);
end
Now, if the input matrix is of size 40000x3, you might want to stop here. But with system resources permitting, you can vectorize everything as discussed next.
Stage 2: Vectorize everything
Now that we see what's actually going on and that the computations look parallelizable, it's time to step up to using bsxfun in 3D with its good friend permute for a vectorized solution, like so -
%// Get size params and R
[n,d] = size(DataMatrix);
[R,err] = cholcov(I,0);
%// Calculate constants : "logSqrtDetSigma" and "d*log(2*pi)/2"
K1 = sum(log(diag(R)));
K2 = d*log(2*pi)/2;
%// Major thing happening here as we calculate "X0" for all iterations
%// in one go with permute and bsxfun
diffs = bsxfun(@minus,DataMatrix,permute(DataMatrix,[3 2 1]));
%// "Sigma" is an identity matrix, so "/R" plays no role in "xRinv = X0 / R".
%// Perform elementwise squaring and summing rows to get vectorized "quadform"
quadform1 = squeeze(sum(diffs.^2,2))
%// Finally use "quadform1" and get vectorized output as a 2D array
p_out = exp(-0.5*quadform1 - K1 - K2)
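For readers working outside MATLAB, the same Stage 2 idea can be sketched in NumPy, where broadcasting takes the place of bsxfun and permute. This is an illustrative translation under the same assumption as above (Sigma is the identity matrix); the function name `pairwise_gaussian` is made up.

```python
import numpy as np

def pairwise_gaussian(data):
    """p[k, i] = N(data[i]; mean=data[k], Sigma=I) for every pair of rows."""
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    # Shape (n, n, d): difference between every pair of rows
    diffs = data[:, None, :] - data[None, :, :]
    # With Sigma = I the quadratic form is just the squared Euclidean distance
    quadform = (diffs ** 2).sum(axis=2)
    # log(sqrt(det(I))) is zero, so only the d*log(2*pi)/2 constant remains
    return np.exp(-0.5 * quadform - 0.5 * d * np.log(2 * np.pi))

data_matrix = [[3, 1, 4], [1, 2, 3], [1, 5, 7], [3, 4, 7],
               [5, 5, 1], [2, 3, 1], [4, 4, 4]]
p = pairwise_gaussian(data_matrix)
print(p.shape)
```

As with the MATLAB version, memory is the limiting factor: for 40000 rows the n x n output alone holds 1.6e9 doubles (about 12.8 GB), and the intermediate n x n x d array is d times larger.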

MATLAB vectorization: computing a neighborhood matrix

Given two vectors X and Y of length n, representing points on the plane, and a neighborhood radius rad, is there a vectorized way to compute the neighborhood matrix of the points?
In other words, can the following (painfully slow for large n) loop be vectorized:
neighborhood_mat = zeros(n, n);
for i = 1 : n
    for j = 1 : i - 1
        dist = norm([X(j) - X(i), Y(j) - Y(i)]);
        if (dist < radius)
            neighborhood_mat(i, j) = 1;
            neighborhood_mat(j, i) = 1;
        end
    end
end

Approach #1
bsxfun based approach -
out = bsxfun(@minus,X,X').^2 + bsxfun(@minus,Y,Y').^2 < radius^2
out(1:n+1:end)= 0
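The same broadcasting idea, sketched in NumPy for comparison (the function name `neighborhood_matrix` and the sample points are made up for illustration):

```python
import numpy as np

def neighborhood_matrix(x, y, radius):
    """Boolean n x n matrix: True where two distinct points are closer than radius."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Squared pairwise distances via broadcasting (column vector minus row vector)
    d2 = (x[:, None] - x[None, :]) ** 2 + (y[:, None] - y[None, :]) ** 2
    out = d2 < radius ** 2          # compare squared values, no sqrt needed
    np.fill_diagonal(out, False)    # a point is not its own neighbor
    return out

# Three hypothetical points: the first two are close, the third is far away
nb = neighborhood_matrix([0, 1, 5], [0, 0, 5], radius=1.5)
print(nb.astype(int))
```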
Approach #2
Distance matrix calculation using matrix-multiplication based approach (possibly faster) -
A = [X(:) Y(:)]
A_t = A.'; %//'
out = [-2*A A.^2 ones(n,2)]*[A_t ; ones(2,n) ; A_t.^2] < radius^2
out(1:n+1:end)= 0
Approach #3
With pdist and squareform -
A = [X(:) Y(:)]
out = squareform(pdist(A))<radius
out(1:n+1:end)= 0
Approach #4
You can use pdist as with the previous approach, but avoid squareform with some logical indexing to get the final output of neighbourhood matrix as shown below -
A = [X(:) Y(:)]
dists = pdist(A)< radius
mask_lower = bsxfun(@gt,[1:n]',1:n) %//'
%// OR tril(true(n),-1)
mask_upper = bsxfun(@lt,[1:n]',1:n) %//'
%// OR mask_upper = triu(true(n),1)
%// OR mask_upper = ~mask_lower; mask_upper(1:n+1:end) = false;
out = zeros(n)
out(mask_lower) = dists
out_t = out' %//'
out(mask_upper) = out_t(mask_upper)
Note: As one can see, all of the above-mentioned approaches use pre-allocation for the output. A fast way to pre-allocate is out(n,n) = 0, which is based upon this wonderful blog on undocumented MATLAB. This should really speed up those approaches!

The following approach is great if the number of points in your neighborhoods is small or you run low on memory using the brute-force approach:
If you have the statistics toolbox installed, you can have a look at the rangesearch method.
(Free alternatives include the k-d tree implementations of a range search on the File Exchange.)
The usage of rangesearch is straightforward:
P = [X,Y];
[idx,D] = rangesearch(P, P, rad);
It returns a cell-array idx of the indices of nodes within reach and their distances D.
Depending on the size of your data, this could be beneficial in terms of speed and memory.
Instead of computing all pairwise distances and then filtering out those that are large, this algorithm builds a data structure called a k-d tree to more efficiently search close points.
You can then use this to build a sparse matrix:
I = cell2mat(idx.').';
J = runLengthDecode(cellfun(@numel,idx));
n = size(P,1);
S = sparse(I,J,1,n,n)-speye(n);
(This uses the runLengthDecode function from this answer.)
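To make the index bookkeeping concrete, here is a NumPy sketch of turning per-point neighbor lists into a neighborhood matrix. It assumes that runLengthDecode repeats each point's index once per neighbor in its list, which `np.repeat` does here. The helper name and the hand-written neighbor lists (zero-based, each including the point itself, as a self-query range search would return) are made up for illustration, and a dense boolean matrix stands in for the sparse one.

```python
import numpy as np

def neighbor_lists_to_matrix(idx):
    """Build an n x n boolean neighborhood matrix from per-point neighbor lists."""
    n = len(idx)
    counts = [len(neighbors) for neighbors in idx]
    # Repeat each query index once per neighbor (the run-length decoding step)
    cols = np.repeat(np.arange(n), counts)
    # Concatenate all neighbor indices in the same order
    rows = np.concatenate([np.asarray(nb, dtype=int) for nb in idx])
    out = np.zeros((n, n), dtype=bool)
    out[rows, cols] = True
    np.fill_diagonal(out, False)    # drop self-matches, like subtracting speye(n)
    return out

# Hypothetical result of a range search on three points
idx = [[0, 1], [0, 1], [2]]
S = neighbor_lists_to_matrix(idx)
print(S.astype(int))
```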
You can also have a look at the KDTreeSearcher class if your data points don't change and you want to query your data lots of times.

Code to calculate 1D Median Filter

I wonder if anyone knows of some Python or Java code to calculate a 1D median filter.
I have a comma-delimited file with two fields: Date and Signal.
Something like this:
2014-06-01 11:22:12, 23.8
2014-06-01 11:23:12, 25.9
2014-06-01 11:24:12, 45.7
I would like to read this file and apply the 1D Median Filter with size 23
for the field Signal and save it in another file to remove the noise.
Thanks in advance.
Alexandre.
In case someone stumbles on this later.
To extract the data you can use a regex, while for the custom median filter you can have a look here.
I will leave a copy down here in case it is removed:
import numpy as np

def medfilt(x, k):
    """Apply a length-k median filter to a 1D array x.
    Boundaries are extended by repeating endpoints.
    """
    assert k % 2 == 1, "Median filter length must be odd."
    assert x.ndim == 1, "Input must be one-dimensional."
    k2 = (k - 1) // 2
    y = np.zeros((len(x), k), dtype=x.dtype)
    y[:, k2] = x
    for i in range(k2):
        j = k2 - i
        y[j:, i] = x[:-j]
        y[:j, i] = x[0]
        y[:-j, -(i+1)] = x[j:]
        y[-j:, -(i+1)] = x[-1]
    return np.median(y, axis=1)

scipy.signal.medfilt accepts 1D kernels (note that it zero-pads the ends of the signal rather than repeating the endpoints):
import pandas as pd
import scipy.signal

def median_filter(file_name, new_file_name, kernel_size):
    with open(file_name, 'r') as f:
        df = pd.read_csv(f, header=None)
    signal = df.iloc[:, 1].values
    median = scipy.signal.medfilt(signal, kernel_size)
    df = df.drop(df.columns[1], axis=1)
    df[1] = median
    df.to_csv(new_file_name, sep=',', index=None, header=None)

if __name__ == '__main__':
    median_filter('old_signal.csv', 'new_signal.csv', 23)
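If neither NumPy nor SciPy is available, a rough standard-library sketch is also possible. It uses the csv module instead of a regex and repeats the endpoints at the boundaries, like the custom medfilt above. The function names are made up for illustration, and the input is passed as a string here only to keep the example self-contained.

```python
import csv
import io
from statistics import median

def median_filter_1d(values, k):
    """Length-k sliding median; the ends are padded by repeating the endpoints."""
    if k % 2 != 1:
        raise ValueError("Median filter length must be odd.")
    half = k // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [median(padded[i:i + k]) for i in range(len(values))]

def filter_csv_text(text, k):
    """Read 'date, signal' rows, filter the signal column, return new CSV text."""
    rows = [(row[0], float(row[1])) for row in csv.reader(io.StringIO(text)) if row]
    smoothed = median_filter_1d([signal for _, signal in rows], k)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for (date, _), value in zip(rows, smoothed):
        writer.writerow([date, value])
    return out.getvalue()

sample = ("2014-06-01 11:22:12, 23.8\n"
          "2014-06-01 11:23:12, 25.9\n"
          "2014-06-01 11:24:12, 45.7\n")
filtered = filter_csv_text(sample, 3)
print(filtered)
```

For the window size of 23 from the question, call `filter_csv_text(text, 23)` on the contents of the real file. This version recomputes the median of every window from scratch, so it is simple rather than fast.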

Algorithm to express elements of a matrix as a vector

Statement of Problem:
I have an array M with m rows and n columns. The array M is filled with non-zero elements.
I also have a vector t with n elements, and a vector omega
with m elements.
The elements of t correspond to the columns of matrix M.
The elements of omega correspond to the rows of matrix M.
Goal of Algorithm:
Define chi as the multiplication of vector t and omega. I need to obtain a 1D vector a, where each element of a is a function of chi.
Each element of chi is unique (i.e. every element is different).
Using mathematics notation, this can be expressed as a(chi)
Each element of vector a corresponds to an element or elements of M.
Matlab code:
Here is a code snippet showing how the vectors t and omega are generated. The matrix M is pre-existing.
[m,n] = size(M);
t = linspace(0,5,n);
omega = linspace(0,628,m);
Conceptual Diagram:
This appears to be a type of integration (if this is the right word for it) along constant chi.
Reference:
Link to reference
The algorithm is not explicitly stated in the reference. I only wish that this algorithm was described in a manner reminiscent of computer science textbooks!
Looking at Figure 11.5, the matrix M is Figure 11.5(a). The goal is to find an algorithm to convert Figure 11.5(a) into 11.5(b).
It appears that the algorithm is a type of integration (averaging, perhaps?) along constant chi.
It appears to me that reshape is the matlab function you need to use. As noted in the link:
B = reshape(A,siz) returns an n-dimensional array with the same elements as A, but reshaped to siz, a vector representing the dimensions of the reshaped array.
That is, create a vector siz with the number m*n in it, and say A = reshape(P,siz), where P is the product of vectors t and ω; or perhaps say something like A = reshape(t*ω,[m*n]). (I don't have matlab here, or would run a test to see if I have the product the right way around.) Note, the link does not show an example with one number (instead of several) after the matrix parameter to reshape, but I would expect from the description that A = reshape(t*ω,m*n) might also work.

You should add pseudocode or a link to the algorithm you want to implement. From what I could understand, I have developed the following code anyway:
M = [1 2 3 4; 5 6 7 8; 9 10 11 12]' % easy test M matrix
a = reshape(M, prod(size(M)), 1) % convert M to vector 'a' with reshape command
[m,n] = size(M); % Your sample code
t = linspace(0,5,n); % Your sample code
omega = linspace(0,628,m); % Your sample code
for i=1:length(t)
    for j=1:length(omega) % Access a(chi) in the desired order
        chi = length(omega)*(i-1)+j;
        t(i) % related t value
        omega(j) % related omega value
        a(chi) % related a(chi) value
    end
end
As you can see, I also think that the reshape() function is the solution to your problem. I hope that this code helps.
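The index arithmetic in the loop above relies on MATLAB's column-major reshape. As an illustrative NumPy sketch (with zero-based indices, so the formula becomes m*i + j, and with the made-up helper name `chi_index`), the same correspondence between a(chi) and M can be checked like this:

```python
import numpy as np

# Same test matrix as above: 4 rows (omega) by 3 columns (t) after transposing
M = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]).T
m, n = M.shape

# order="F" flattens column by column, like MATLAB's reshape
a = M.reshape(-1, order="F")

def chi_index(i, j, m):
    """Zero-based position in a of the element at row j (omega), column i (t)."""
    return m * i + j

for i in range(n):
    for j in range(m):
        assert a[chi_index(i, j, m)] == M[j, i]
print(a)
```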
The basic idea is to use two separate loops. The outer loop is over the chi variable values, whereas the inner loop is over the i variable values. Referring to the above diagram in the original question, the i variable corresponds to the x-axis (time), and the j variable corresponds to the y-axis (frequency). Assuming that the chi, i, and j variables can take on any real number, bilinear interpolation is then used to find an amplitude corresponding to an element in matrix M. The integration is just an averaging over elements of M.
The following code snippet provides an overview of the basic algorithm to express elements of a matrix as a vector using the spectral collapsing from 2D to 1D. I can't find any reference for this, but it is a solution that works for me.
% Amp = amplitude vector corresponding to Figure 11.5(b) in book reference
% M = matrix corresponding to the absolute value of the complex Gabor transform
% matrix in Figure 11.5(a) in book reference
% Nchi = number of chi in chi vector
% prod = product of timestep and frequency step
% dt = time step
% domega = frequency step
% omega_max = maximum angular frequency
% i = time array element along x-axis
% j = frequency array element along y-axis
% current_i = current time array element in loop
% current_j = current frequency array element in loop
% Nivar = number of i variables
% ivar = i variable vector
% calculate for chi = 0, which only occurs when
% t = 0 and omega = 0, at i = 1
av0 = mean( M(1,:) );
av1 = mean( M(2:end,1) );
av2 = mean( [av0 av1] );
Amp(1) = av2;
% av_val_sum holds the running sum of the values being averaged
av_val_sum = 0;
% loop for rest of chi
for ccnt = 2:Nchi
    av_val_sum = 0; % reset av_val_sum
    current_chi = chi( ccnt ); % current value of chi
    % loop over i vector
    for icnt = 1:Nivar
        current_i = ivar( icnt );
        % if current_i is 1 this divides by zero and yields Inf,
        % which the range check below then skips
        current_j = (current_chi / (prod * (current_i - 1))) + 1;
        current_t = dt * (current_i - 1);
        current_omega = domega * (current_j - 1);
        % values out of range
        if(current_omega > omega_max)
            continue;
        end
        % use bilinear interpolation to find an amplitude
        % at current_t and current_omega from matrix M
        % f_x_y is the bilinear interpolated amplitude
        % Insert bilinear interpolation code here
        % add to running sum
        av_val_sum = av_val_sum + f_x_y;
    end % icnt loop
    % compute the average over all i
    av = av_val_sum / Nivar;
    % assign the average to Amp
    Amp(ccnt) = av;
end % ccnt loop
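The snippet above leaves the bilinear interpolation step as a placeholder. A minimal Python sketch of that step follows, assuming M is given as a list of rows and that the fractional indices are zero-based and already inside the matrix. The function name `bilinear` is made up, and this is one common formulation rather than necessarily the one used in the book.

```python
import math

def bilinear(M, row, col):
    """Interpolate M at fractional, zero-based (row, col) indices."""
    r0 = int(math.floor(row))
    c0 = int(math.floor(col))
    # Clamp the upper neighbors so the last row and column can be queried exactly
    r1 = min(r0 + 1, len(M) - 1)
    c1 = min(c0 + 1, len(M[0]) - 1)
    fr = row - r0                   # fractional distance from the upper row
    fc = col - c0                   # fractional distance from the left column
    top = M[r0][c0] * (1 - fc) + M[r0][c1] * fc
    bottom = M[r1][c0] * (1 - fc) + M[r1][c1] * fc
    return top * (1 - fr) + bottom * fr

# Hypothetical 2 x 2 amplitude matrix
M = [[0.0, 10.0],
     [20.0, 30.0]]
print(bilinear(M, 0.5, 0.5))
```

In the MATLAB loop, the one-based current_j and current_i would map to the zero-based row and column as current_j - 1 and current_i - 1.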
