Finding center point given distance matrix

I have a matrix (really a loaded image) in which every element is an L2 distance from some unknown center point.
Here is a trivial example:
A = [1.4142 1.0000 1.4142 2.2361]
[1.0000 0.0000 1.0000 2.0000]
[1.4142 1.0000 1.4142 2.2361]
In this case, the center is obviously at coordinate (1,1) (index A[1,1] in a 0-indexed matrix or 2D array).
However, in the case where my centers are not constrained to be integer indices, it's no longer as obvious. For example, given this matrix B, where is my center coordinate?
B = [3.0292 1.9612 2.8932 5.8252]
[1.2292 0.1612 1.0932 4.0252]
[1.4292 0.3612 1.2932 4.2252]
How would you find that the answer in this case is at row 1.034 and column 1.4?
I am aware of the trilateration solution (having provided MATLAB code to visualize that in 3D previously), but is there a more efficient way (e.g. one without a matrix inversion)?
This question is sort of language agnostic, as I am looking more for algorithmic help. If you could stick to MATLAB, Python, or C++ though in a solution, that would be great ;-).

While I have no experience with similar tasks, I read some material and also tried something.
When you are unfamiliar with this topic it seems hard to grasp, and all the resources I found are a bit chaotic.
Still unclear to me regarding theory:
is the problem as stated above a convex optimization problem (local minimum = global minimum; that would mean access to powerful solvers!)?
there are many more resources about more generic problems (Sensor Network Localization), which are non-convex and for which extremely complex methods have been developed
is your trilateration approach able to exploit more than 3 points (trilateration vs. multilateration)? At least this code does not seem to, which means bad performance with noise!
Here is some example code with two approaches:
A: Convex optimization: SOCP relaxation
Not impressive performance, but it should be powerful as an approximation for big data
Guaranteed global optimum for this relaxation!
Implemented with cvxpy
B: Nonlinear programming optimization
Implemented using scipy.optimize
Pretty much perfect in my synthetic experiments; even good results in the noisy case, despite the fact that we are using numerical differentiation (automatic differentiation is hard to use here)
Some additional remarks:
In my opinion your example B surely has some (pretty bad) noise or some other problem, as my approaches are completely off on it, while approach B in particular shines on my synthetic data (at least that's my impression).
import numpy as np
import cvxpy as cvx
from scipy.spatial.distance import cdist
from scipy.optimize import minimize

""" Create noise-free (not anymore!) fake-problem """
NOISE_DISTS = 0.1  # noise level; not given in the original post, 0.1 is an assumption
real_x = np.random.random(size=2) * 3
M, N = 5, 10
pos = np.array([(i, j) for i in range(M) for j in range(N)])  # ugly -> tile/repeat/stack
Y = cdist(pos, real_x[np.newaxis])
Y += np.random.normal(size=Y.shape) * NOISE_DISTS  # Let's add some noise!
print('real x: ', real_x)
print('dist mat: ', np.round(Y, 3).T)

""" Helper """
def cost(x, Y, pos):
    # residuals between the distances implied by x and the measured distances
    res = np.linalg.norm(pos - x, ord=2, axis=1) - Y.ravel()
    return np.linalg.norm(res, 2)

print('cost with real_x (check vs. noisy): ', cost(real_x, Y, pos))

""" SOLVER SOCP relaxation (cvxpy 1.x syntax) """
def solve_socp_relax(pos, Y):
    K = pos.shape[0]
    x = cvx.Variable(2)
    y = cvx.Variable(K)
    objective = cvx.norm(y - Y.ravel(), 2)
    # K x 2 expression whose rows are all equal to x
    x_stacked = np.ones((K, 1)) @ cvx.reshape(x, (1, 2), order="C")
    # relaxation: ||pos_k - x|| <= y_k instead of ==
    constraints = [cvx.norm(pos - x_stacked, 2, axis=1) <= y]
    problem = cvx.Problem(cvx.Minimize(objective), constraints)
    problem.solve()  # any SOCP-capable solver, e.g. solver=cvx.ECOS if installed
    return x.value

""" SOLVER NLP """
def solve_nlp(pos, Y):
    sol = minimize(cost, np.zeros(pos.shape[1]), args=(Y, pos), method='BFGS')
    # print(sol)
    return sol.x

""" TEST """
socp_relax_sol = solve_socp_relax(pos, Y)
print('SOCP RELAX SOL: ', socp_relax_sol)
nlp_sol = solve_nlp(pos, Y)
print('NLP SOL: ', nlp_sol)
real x: [ 1.25106601 2.16097348]
dist mat: [[ 2.444 1.599 1.348 1.276 2.399 3.026 4.07 4.973 6.118 6.746
2.143 1.149 0.412 0.766 1.839 2.762 3.851 4.904 5.734 6.958
2.377 1.432 0.856 1.056 1.973 2.843 3.885 4.95 5.818 6.84
2.711 2.015 1.689 1.939 2.426 3.358 4.385 5.22 6.076 6.97
3.422 3.153 2.759 2.81 3.326 4.162 4.734 5.627 6.484 7.336]]
cost with real_x (check vs. noisy): 0.665125233772
SOCP RELAX SOL: [[ 1.95749275 2.00607253]]
NLP SOL: [ 1.23560791 2.16756168]
Edit: A further speedup can be achieved (especially at large scale) by using nonlinear least squares instead of the more general NLP approach! My results are still the same (as expected if the problem were convex). Timings for NLP vs. NLS can look like 9 vs. 0.5 seconds!
This is my recommended method!
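from scipy.optimize import least_squares

def solve_nls(pos, Y):
    # one residual per grid point: implied distance minus measured distance
    def res(x, Y, pos):
        return np.linalg.norm(pos - x, ord=2, axis=1) - Y.ravel()
    sol = least_squares(res, np.zeros(pos.shape[1]), args=(Y, pos), method='lm')
    # print(sol)
    return sol.x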
The second approach (NLP) in particular also runs for much bigger instances (cvxpy's modeling overhead hurts; that's not a weakness of the SOCP solver itself, which should scale much better!).
Here some output for M, N = 500, 1000 with some more noise:
real x: [ 12.51066014 21.6097348 ]
dist mat: [[ 24.706 23.573 23.693 ..., 1090.29 1091.216
cost with real_x (check vs. noisy): 353.354267797
NLP SOL: [ 12.51082419 21.60911561]
used: 5.9552763315495625 # SECONDS
So in my experiments it works, but I won't give any global-convergence or reconstruction guarantees (I'm still missing some theory).
At first I thought about using the global optimum of the relaxed SOCP problem as the initial point for the NLP solver, but I did not find any example where this is needed!
Some just-for-fun visuals using:
M, N = 20, 30
import matplotlib.pyplot as plt
plt.imshow(Y.reshape(M, N), cmap='viridis', interpolation='none')
plt.scatter(nlp_sol[1], nlp_sol[0], color='red', s=20)
plt.xlim((0, N))
plt.ylim((0, M))
And some super noisy case (nice performance!):
M, N = 50, 100
real x: [ 12.51066014 21.6097348 ]
dist mat: [[ 22.329 18.745 27.588 ..., 94.967 80.034 91.206]]
cost with real_x (check vs. noisy): 354.527196716
NLP SOL: [ 12.44158986 21.50164637]
used: 0.01050068340320306

If I understand correctly, you have a matrix A, where A[i,j] holds the distance from (i,j) to some unknown point (y,x). You could find (y,x) like this:
Square each element of A to get a matrix, say B.
We then want to find (y,x) so
(y-i)*(y-i) + (x-j)*(x-j) = B[i,j]
Subtracting each equation from the 0,0 equation and rearranging:
2*i*y + 2*j*x = B[0,0] + i*i + j*j - B[i,j]
This can be solved by linear least squares. Note that since there are only 2 unknowns, the matrix inversion (or, better, factorisation) involved is on a 2x2 matrix and so is not time consuming. Indeed, given just the dimensions of A, you could work out the required matrix and its inverse analytically.
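A minimal NumPy sketch of this linear least-squares idea, assuming (as above) that A[i, j] is the distance from grid point (i, j) to the unknown center (y, x). The function name center_from_distances and its squared flag are illustrative, not part of the original answer.

```python
import numpy as np

def center_from_distances(A, squared=False):
    """Estimate the (row, col) center from a matrix of Euclidean distances.

    A[i, j] is assumed to be the distance from grid point (i, j) to the unknown
    center (y, x). Pass squared=True if A already holds squared distances.
    """
    A = np.asarray(A, dtype=float)
    B = A if squared else A ** 2
    i, j = np.indices(B.shape)           # row and column index of every element
    i, j = i.ravel(), j.ravel()
    # 2*i*y + 2*j*x = B[0,0] + i^2 + j^2 - B[i,j]  (the (0,0) row is just 0 = 0)
    rhs = B[0, 0] + i ** 2 + j ** 2 - B.ravel()
    design = np.column_stack([2 * i, 2 * j])
    (y, x), *_ = np.linalg.lstsq(design, rhs, rcond=None)
    return y, x

A = np.array([[1.4142, 1.0000, 1.4142, 2.2361],
              [1.0000, 0.0000, 1.0000, 2.0000],
              [1.4142, 1.0000, 1.4142, 2.2361]])
print(center_from_distances(A))  # close to (1, 1)
```

Because the equations are linear in (y, x), every element of A is used at once (multilateration rather than trilateration), and lstsq only works on a 2-column design matrix. Run on the question's B with squared=True, this sketch returns approximately row 1.4, column 1.034 (the question lists the same two numbers in the opposite order): B's entries match squared distances to that point, e.g. B[1,1] = 0.1612 ≈ 0.4² + 0.034², which may be why the distance-based fits in the previous answer looked off for B.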


Julia: isposdef() fails for large matrices?

I have a positive definite covariance matrix C of size 3n x 3n constructed from n^2 blocks of size 3x3.
Running MvNormal (with e.g. a zero mean vector) on this matrix to draw Gaussian random vectors, I am getting the error
PosDefException: matrix is not positive definite; Cholesky factorization failed.
and indeed isposdef(C) returns false when n becomes too large. However, my matrix should be positive definite for any n, so it seems that there is some kind of numerical instability (perhaps due to the determinant becoming too small or too large for machine precision).
The reproducible code I am using to generate C is below:
using LinearAlgebra   # I, issymmetric, det, isposdef
using PyPlot          # plt.imshow / plt.title

# inputs
l_sq = 1
xmax = 2
grid_size = 10

# kernel function used to construct covariance matrix C
function corr(x, y, l_sq)
    d_sq = sum(abs2, x .- y)   # squared Euclidean distance between the two grid points
    n = d_sq/(2l_sq)
    return exp(-n)*(Matrix{Float64}(I, length(x), length(x)))
end

nb_grid_points = grid_size^3
gaussian_vector_dim = 3*nb_grid_points
oneD_grid = LinRange(-xmax, xmax, grid_size)
# get input set X which indexes grid points
threeD_grid = collect.(Iterators.product(oneD_grid, oneD_grid, oneD_grid))
grid_points = vec(reshape(threeD_grid,:,1))
# build C by blocks
C = Array{Float64}(undef, gaussian_vector_dim, gaussian_vector_dim)
for i in 1:nb_grid_points
    for j in 1:nb_grid_points
        # block covariance matrix C consists of DxD correlation-function matrices K_i,j for i,j=1,...,nb_grid_points
        C[3*(i-1)+1:(3*i),(3*(j-1)+1):(3*j)] = corr(grid_points[i], grid_points[j],l_sq)
    end
end
# plot covariance matrix
plt.imshow(C,cmap="Blues", interpolation="none")
plt.title("Covariance matrix")
print("C is symmetric:",issymmetric(C))
print("\ndet C=",det(C))
print("\nC is positive definite=",isposdef(C))
Keeping l_sq = 1 and xmax = 2, the code above gives isposdef(C) = false when grid_size = 10, but isposdef(C) = true if grid_size is 9 or less.
Why is this failure occurring and how can I fix it? Perhaps I can help Julia by indicating that the covariance matrix is sparse?
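One way to look at the suspected numerical problem, sketched in Python/NumPy rather than Julia and not taken from the question: on a regular grid the squared-exponential kernel factorizes per coordinate, so (up to a reordering of rows and columns) the matrix built above is a Kronecker product K1 ⊗ K1 ⊗ K1 ⊗ I3 of 1-D kernel matrices. Its condition number is therefore cond(K1)³, which grows quickly as the grid gets denser until rounding makes the smallest eigenvalues indistinguishable from zero. A common workaround is to add a small "jitter" multiple of the identity before factorizing; in Julia that would mean passing something like C + jitter*I to MvNormal. The helper names and the jitter schedule below are illustrative assumptions.

```python
import numpy as np

def se_kernel_1d(grid_size, xmax=2.0, l_sq=1.0):
    """1-D squared-exponential kernel exp(-(x_i - x_j)^2 / (2*l_sq)) on a regular grid."""
    x = np.linspace(-xmax, xmax, grid_size)
    return np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * l_sq))

def grid_kernel_condition_number(grid_size, xmax=2.0, l_sq=1.0):
    """cond of the 3-D grid kernel via cond(K1 (x) K1 (x) K1) = cond(K1)**3.

    Only meaningful while K1 itself is still well conditioned (small grid_size);
    the extra (x) I3 factor does not change the condition number.
    """
    eig = np.linalg.eigvalsh(se_kernel_1d(grid_size, xmax, l_sq))
    return (eig.max() / eig.min()) ** 3

def cholesky_with_jitter(K, jitter=1e-10, max_tries=8):
    """Cholesky factor of K + jitter*I, increasing jitter tenfold until it succeeds."""
    n = K.shape[0]
    for _ in range(max_tries):
        try:
            return np.linalg.cholesky(K + jitter * np.eye(n)), jitter
        except np.linalg.LinAlgError:
            jitter *= 10
    raise np.linalg.LinAlgError("not positive definite even after adding jitter")

for g in (5, 8, 9, 10):
    print(g, grid_kernel_condition_number(g))
```

The jitter slightly changes the covariance being sampled, so it should stay small compared with the kernel's variance (1 here).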

Facility Location - Algorithm to Minimize facilities serving customers with distance constraint

For example, I have 1000 customers located in Europe with different latitudes and longitudes. I want to find the minimal number of facilities that can serve all customers, subject to the constraint that each customer must be served within 24 hours. Here I use a maximum allowed transportation distance from a facility to a customer as the constraint ensuring 24-hour delivery; the distance is the straight-line (Euclidean) distance between the two locations.
So, with each warehouse able to serve only the customers within a certain distance, e.g. 600 km, what algorithms can help me find the minimal number of facilities needed to serve all customers, and their respective latitudes and longitudes? An example is shown in the attached picture below.
Figure: example of finding the minimal number of warehouses and their locations
This falls in the category of facility location problems. There is quite a rich literature about these problems. The p-center problem is close to what you want.
Some notes:
Besides solving a formal mathematical optimization model, often heuristics (and meta-heuristics) are used.
The distances are a rough approximation of real travel time. That also means approximate solutions are probably good enough.
Besides just finding the minimum number of facilities needed to service all customers, we can refine the locations by minimizing the distances.
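To illustrate the heuristic route mentioned in these notes (this is not the MIQCP model of this answer), here is a minimal greedy set-cover sketch in Python/NumPy. It assumes that candidate facility sites are restricted to the customer locations themselves and uses the same straight-line distance as the question; it gives a feasible answer quickly but with no optimality guarantee. The function name greedy_cover is hypothetical.

```python
import numpy as np

def greedy_cover(points, max_dist):
    """Greedy set-cover heuristic for the 'minimum number of facilities' question.

    Candidate facility sites are restricted to the customer locations themselves
    (an assumption; an exact model may place facilities elsewhere). Returns the
    indices of the chosen sites; every customer ends up within max_dist of one.
    """
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    # covers[s, c] is True when candidate site s is within max_dist of customer c
    covers = np.sqrt((diff ** 2).sum(axis=-1)) <= max_dist
    uncovered = np.ones(len(points), dtype=bool)
    chosen = []
    while uncovered.any():
        # open the site that covers the most customers that are still uncovered
        gain = (covers & uncovered).sum(axis=1)
        best = int(np.argmax(gain))
        chosen.append(best)
        uncovered &= ~covers[best]
    return chosen

rng = np.random.default_rng(0)
customers = rng.uniform(0, 100, size=(1000, 2))
sites = greedy_cover(customers, max_dist=30.0)
print(len(sites), "facilities")
```

For latitude/longitude input, the coordinates would first need to be converted so that distances come out in km before comparing against a limit such as 600 km.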
A math programming model for the pure "minimize number of facilities" can be formulated as a Mixed Integer Quadratically Constrained problem (MIQCP). This can be solved with standard solvers (e.g. Cplex and Gurobi). Below is an example I cobbled together:
With 1000 random customer locations, I can find a proven optimal solution:
---- 57 VARIABLE n.L = 4.000 number of open facilties
---- 57 VARIABLE isopen.L use facility
facility1 1.000, facility2 1.000, facility3 1.000, facility4 1.000
---- 60 PARAMETER locations
x y
facility1 26.707 31.796
facility2 68.739 68.980
facility3 28.044 67.880
facility4 76.921 34.929
See here for more details.
Basically we solve two models:
Model 1 finds the number of warehouses needed (minimize number subject to maximum distance constraint)
Model 2 finds the optimal placement of the warehouses (minimize total distance)
After solving model 1 we see (for a 50 customer random problem):
We need three warehouses. Although no link exceeds the maximum distance constraint, this is not an optimal placement.
After solving model 2 we see:
This now optimally places the three warehouses by minimizing the sum of length of the links. To be precise I minimized the sum of the squared lengths. Getting rid of the square root allowed me to use a quadratic solver.
Both models are of the convex Mixed Integer Quadratically Constrained Problem type (MIQCP). I used a readily available solver to solve these models.
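For reference, the two models implemented in the Python code below can be written as follows. The notation is chosen here, not by the answer: customer i at (a_i, b_i), warehouse j at (x_j, y_j), binaries o_j (warehouse open) and z_ij (assignment), maximum distance D, big-M constant M, required number of covered customers C. Each quadratic constraint has a convex quadratic on the left and an affine right-hand side, which is what makes both models convex MIQCPs.

```latex
\begin{aligned}
&\textbf{Model 1:}\quad \min \sum_{j} o_j \quad \text{s.t.}\\
&(a_i - x_j)^2 + (b_i - y_j)^2 \le D^2 + M\,(1 - z_{ij}) && \forall i, j\\
&\textstyle\sum_j z_{ij} \le 1 && \forall i\\
&z_{ij} \le o_j && \forall i, j\\
&o_j \ge o_{j+1}, \qquad \textstyle\sum_{i,j} z_{ij} \ge C\\[4pt]
&\textbf{Model 2 } (n^\ast \text{ warehouses}):\quad \min \sum_{i,j} d_{ij} \quad \text{s.t.}\\
&(a_i - x_j)^2 + (b_i - y_j)^2 - M\,(1 - z_{ij}) \le d_{ij} \le D^2 && \forall i, j\\
&\textstyle\sum_j z_{ij} \le 1 \ \ \forall i, \qquad \textstyle\sum_{i,j} z_{ij} \ge C
\end{aligned}
```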
Python code with Gurobi as the solver:
from gurobipy import *
import re
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

# Problem data: these values are not given in the original post and are placeholders
minxy, maxxy = 0, 100             # customers are drawn uniformly in [minxy, maxxy]^2
customer_num = 50                 # number of customers
dc_num = 10                       # maximum number of candidate warehouses
max_dist = 30                     # maximum allowed customer-warehouse distance
covered_customers = customer_num  # every customer must be served
M = 2 * (maxxy - minxy) ** 2      # big-M: squared diagonal of the region
customer = np.random.uniform(minxy, maxxy, [customer_num, 2])

# Model 1: Minimize number of warehouses
m = Model()
dc, x, y, assign = {}, {}, {}, {}
for j in range(dc_num):
    dc[j] = m.addVar(lb=0, ub=1, vtype=GRB.BINARY, name="DC%d" % j)
    x[j] = m.addVar(lb=0, ub=maxxy, vtype=GRB.CONTINUOUS, name="x%d" % j)
    y[j] = m.addVar(lb=0, ub=maxxy, vtype=GRB.CONTINUOUS, name="y%d" % j)
for i in range(len(customer)):
    for j in range(len(dc)):
        assign[(i, j)] = m.addVar(lb=0, ub=1, vtype=GRB.BINARY, name="Cu%d from DC%d" % (i, j))
# an assigned customer must lie within max_dist of its warehouse (big-M relaxes it otherwise)
for i in range(len(customer)):
    for j in range(len(dc)):
        m.addConstr((customer[i][0] - x[j]) * (customer[i][0] - x[j]) +
                    (customer[i][1] - y[j]) * (customer[i][1] - y[j])
                    <= max_dist * max_dist + M * (1 - assign[(i, j)]))
# each customer is served by at most one warehouse
for i in range(len(customer)):
    m.addConstr(quicksum(assign[(i, j)] for j in range(len(dc))) <= 1)
# only open warehouses can serve customers
for i in range(len(customer)):
    for j in range(len(dc)):
        m.addConstr(assign[(i, j)] <= dc[j])
# symmetry breaking: open warehouses in index order
for j in range(dc_num - 1):
    m.addConstr(dc[j] >= dc[j + 1])
m.addConstr(quicksum(assign[(i, j)] for i in range(len(customer)) for j in range(len(dc))) >= covered_customers)
# objective: number of open warehouses
m.setObjective(quicksum(dc[j] for j in dc), GRB.MINIMIZE)
m.optimize()
print('\nOptimal Solution is: %g' % m.objVal)
for v in m.getVars():
    print('%s %g' % (v.varName, v.x))
optimal_n = int(round(m.objVal))

# Model 2: Optimal location of warehouses
m2 = Model()  # create Model 2
x, y, assign, d = {}, {}, {}, {}
for j in range(optimal_n):
    x[j] = m2.addVar(lb=0, ub=maxxy, vtype=GRB.CONTINUOUS, name="x%d" % j)
    y[j] = m2.addVar(lb=0, ub=maxxy, vtype=GRB.CONTINUOUS, name="y%d" % j)
for i in range(len(customer)):
    for j in range(optimal_n):
        assign[(i, j)] = m2.addVar(lb=0, ub=1, vtype=GRB.BINARY, name="Cu%d from DC%d" % (i, j))
for i in range(len(customer)):
    for j in range(optimal_n):
        d[(i, j)] = m2.addVar(lb=0, ub=max_dist * max_dist, vtype=GRB.CONTINUOUS, name="d%d,%d" % (i, j))
# d[i,j] >= squared distance whenever customer i is assigned to warehouse j
for i in range(len(customer)):
    for j in range(optimal_n):
        m2.addConstr((customer[i][0] - x[j]) * (customer[i][0] - x[j]) +
                     (customer[i][1] - y[j]) * (customer[i][1] - y[j])
                     - M * (1 - assign[(i, j)]) <= d[(i, j)])
        m2.addConstr(d[(i, j)] <= max_dist * max_dist)
for i in range(len(customer)):
    m2.addConstr(quicksum(assign[(i, j)] for j in range(optimal_n)) <= 1)
m2.addConstr(quicksum(assign[(i, j)] for i in range(len(customer)) for j in range(optimal_n)) >= covered_customers)
L = quicksum(d[(i, j)] for i in range(len(customer)) for j in range(optimal_n))
m2.setObjective(L, GRB.MINIMIZE)  # minimize the sum of squared link lengths
m2.optimize()
#########Print Optimization Result
print('\nOptimal Solution is: %g' % m2.objVal)
for v in m2.getVars():
    print('%s %g' % (v.varName, v.x))
if v.varName.startswith("x"):
if v.varName.startswith("y"):
if v.varName.startswith("Cu") and v.x == 1:
print([int(s) for s in re.findall(r"\d+", v.varName)])
temp=[int(s) for s in re.findall(r"\d+", v.varName)]
g_list.append(temp[1]+len(customer)) #new id mapping to j_list
if v.varName.startswith("Cu") and v.x == 0:
temp=[int(s) for s in re.findall(r"\d+", v.varName)]
if v.varName.startswith("d") and v.x > 0.00001:
#########Draw Network
# prepare data
for i,k in enumerate(dc_cor):
for i in dc_list:
dc_customer.append(df[df['DC_drawID'] == i]['Customer'].tolist())
print('\n', dc_customer)
G = nx.DiGraph()
e = []
node = []
o_node = []
for c, k in enumerate(dc_list):
G.add_node(k, pos=(dc_cor[c][0], dc_cor[c][1]))
v = dc_customer[c]
for n, i in enumerate(v):
G.add_node(i, pos=(customer[i][0], customer[i][1]))
u = (k, v[n])
G.add_edge(k, v[n])
for m,x in enumerate(omit_i_list):
G.add_node(x, pos=(customer[x][0], customer[x][1]))
nx.draw_networkx_nodes(G, dc_cor, nodelist=d_node, with_labels=True, width=2, style='dashed', font_color='w', font_size=10, font_family='sans-serif', node_shape='^',
nx.draw_networkx_nodes(G, customer, nodelist=o_node, with_labels=True, width=2, style='dashed', font_color='w', font_size=10, font_family='sans-serif', node_color='purple',
nx.draw(G, nx.get_node_attributes(G, 'pos'), nodelist=node, edgelist=e, with_labels=True,
width=2, style='dashed', font_color='w', font_size=10, font_family='sans-serif', node_color='purple')
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('Optimization_Result.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Close the Pandas Excel writer and write the Excel file.
writer.close()

Integration of orbits with solar system gravity fields from Skyfield - speed issues

In the time tests shown below, I found that Skyfield takes several hundred microseconds up to a millisecond to return for a single time value in jd, but the incremental cost for longer JulianDate objects (a list of points in time) is only about one microsecond per point. I see similar speeds using Jplephem and with two different ephemerides.
My question here is: if I want to random-access points in time, for example as a slave to an external Runge-Kutta routine which uses its own variable stepsize, is there a way I can do this faster within python (without having to learn to compile code)?
I understand this is not at all the typical way Skyfield is intended to be used. Normally we'd load a JulianDate object with a long list of time points and then calculate them at once, and probably do that a few times, not thousands of times (or more), the way an orbit integrator might do.
Workaround: I can imagine building my own NumPy database by running Skyfield once using a JulianDate object with fine time granularity, then writing my own Runge-Kutta routine which changes step sizes up and down by discrete amounts such that the timesteps always correspond directly to the striding of the NumPy array.
Or I could even try re-interpolating. I am not doing highly precise calculations, so a simple 2nd-order interpolation in NumPy or SciPy might be fine.
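A minimal sketch of the "tabulate once, then interpolate" workaround, assuming positions are sampled on a regular time grid with one vectorized ephemeris call and then queried at arbitrary times by the integrator. To keep it self-contained, a circular stand-in orbit (fake_orbit, hypothetical) replaces the Skyfield/jplephem call, and SciPy's CubicSpline is used instead of the 2nd-order interpolation mentioned above; the achievable accuracy depends on the grid step relative to the body's orbital period.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def build_position_table(position_fn, t_start, t_stop, step):
    """Sample position_fn once on a regular time grid and return a cubic-spline interpolant.

    position_fn stands in for one vectorized ephemeris evaluation (e.g. a single
    Skyfield or jplephem call over a long array of times): it must take a 1-D
    array of times and return an array of shape (len(t), 3).
    """
    n = int(round((t_stop - t_start) / step)) + 1
    t = np.linspace(t_start, t_stop, n)
    return CubicSpline(t, position_fn(t), axis=0)

def fake_orbit(t):
    """Hypothetical stand-in ephemeris: circular orbit, radius in km, time in days."""
    radius, period = 384_400.0, 27.32
    phase = 2 * np.pi * np.asarray(t) / period
    return np.column_stack([radius * np.cos(phase), radius * np.sin(phase),
                            np.zeros_like(phase)])

# Tabulate 100 days at half-day spacing once, then query arbitrary times cheaply
spline = build_position_table(fake_orbit, 0.0, 100.0, 0.5)
position = spline(12.3456)   # one random-access lookup, shape (3,)
```

Each lookup only evaluates the precomputed spline, so an adaptive Runge-Kutta routine can request any time inside the tabulated span; CubicSpline extrapolates outside it by default, which should be avoided.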
Ultimately I'd like to try integrating the path of objects under the influence of the gravity field of the solar system (e.g. deep-space satellites, comets, asteroids). When looking for an orbit solution one might try millions of starting state vectors in 6D phase space. I know I should be using things like method because gravity travels at the speed of light like everything else. This seems to cost substantial time since (I'm guessing) it's an iterative calculation ("Let's see... where would Jupiter have been such that I feel its gravity right NOW"). But let's peel the cosmic onion one layer at a time.
Figure 1. Skyfield and JPLephem performance on my laptop for different length JulianDate objects, for de405 and de421. They are all about the same - (very) roughly about a half-millisecond for the first point and a microsecond for each additional point. Also the very first point to be calculated when the script runs (Earth (blue) with len(jd) = 1) has an additional millisecond artifact.
Earth and Moon are slower because it is a two-step calculation internally (the Earth-Moon Barycenter plus the individual orbits about the Barycenter). Mercury may be slower because it moves so fast compared to the ephemeris time steps that it requires more coefficients in the (costly) Chebyshev interpolation?
SCRIPT FOR SKYFIELD DATA (the JPLephem script is farther down)
import numpy as np
import matplotlib.pyplot as plt
from skyfield.api import load, JulianDate
import time
ephem = 'de421.bsp'
ephem = 'de405.bsp'
de = load(ephem)
earth = de['earth']
moon = de['moon']
earth_barycenter = de['earth barycenter']
mercury = de['mercury']
jupiter = de['jupiter barycenter']
pluto = de['pluto barycenter']
things = [ earth, moon, earth_barycenter, mercury, jupiter, pluto ]
names = ['earth', 'moon', 'earth barycenter', 'mercury', 'jupiter', 'pluto']
ntimes = [i*10**n for n in range(5) for i in [1, 2, 5]]
years = [np.zeros(1)] + [np.linspace(0, 100, n) for n in ntimes[1:]] # 100 years
microsecs = []
for y in years:
    jd = JulianDate(utc=(1900 + y, 1, 1))
    mics = []
    for thing in things:
        tstart = time.perf_counter()  # high-resolution timer
        answer =
        mics.append(1E+06 * (time.perf_counter() - tstart))
    microsecs.append(mics)
microsecs = np.array(microsecs).T
many = [len(y) for y in years]
fig = plt.figure()
ax = plt.subplot(111, xlabel='length of JD object',
title='time for with ' + ephem )
for item in ([ax.title, ax.xaxis.label, ax.yaxis.label] +
             ax.get_xticklabels() + ax.get_yticklabels()):
    item.set_fontsize(item.get_fontsize() + 4)
for name, mics in zip(names, microsecs):
    ax.plot(many, mics, lw=2, label=name)
plt.legend(loc='upper left', shadow=False, fontsize='x-large')
plt.savefig("skyfield speed test " + ephem.split('.')[0])
SCRIPT FOR JPLEPHEM DATA (the Skyfield script is above)
import numpy as np
import matplotlib.pyplot as plt
from jplephem.spk import SPK
import time
ephem = 'de421.bsp'
ephem = 'de405.bsp'
kernel =
jd_1900_01_01 = 2415020.5004882407
ntimes = [i*10**n for n in range(5) for i in [1, 2, 5]]
years = [np.zeros(1)] + [np.linspace(0, 100, n) for n in ntimes[1:]] # 100 years
barytup = (0, 3)
earthtup = (3, 399)
# moontup = (3, 301)
microsecs = []
for y in years:
    mics = []
    #for thing in things:
    jd = jd_1900_01_01 + y * 365.25  # roughly, it doesn't matter here
    tstart = time.perf_counter()  # high-resolution timer
    answer = kernel[earthtup].compute(jd) + kernel[barytup].compute(jd)
    mics.append(1E+06 * (time.perf_counter() - tstart))
    microsecs.append(mics)
microsecs = np.array(microsecs)
many = [len(y) for y in years]
fig = plt.figure()
ax = plt.subplot(111, xlabel='length of JD object',
title='time for jplephem [0,3] and [3,399] with ' + ephem )
# from here:
for item in ([ax.title, ax.xaxis.label, ax.yaxis.label] +
             ax.get_xticklabels() + ax.get_yticklabels()):
    item.set_fontsize(item.get_fontsize() + 4)
#for name, mics in zip(names, microsecs):
ax.plot(many, microsecs, lw=2, label='earth')
plt.legend(loc='upper left', shadow=False, fontsize='x-large')
plt.ylim(1E+02, 1E+06)
plt.savefig("jplephem speed test " + ephem.split('.')[0])

Dirichlet process in PyMC 3

I would like to implement the Dirichlet process example referenced in Implementing Dirichlet processes for Bayesian semi-parametric models (source: here) in PyMC 3.
In the example, the stick-breaking probabilities are computed using the pymc.deterministic decorator:
v = pymc.Beta('v', alpha=1, beta=alpha, size=N_dp)

@pymc.deterministic
def p(v=v):
    """ Calculate Dirichlet probabilities """
    # Probabilities from betas
    value = [u*np.prod(1-v[:i]) for i,u in enumerate(v)]
    # Enforce sum to unity constraint
    value[-1] = 1-sum(value[:-1])
    return value

z = pymc.Categorical('z', p, size=len(set(counties)))
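The list comprehension above computes the stick-breaking weights p_i = v_i ∏_{j<i} (1 − v_j). As a sketch (plain NumPy, not PyMC code), the same weights can be computed without a Python loop or theano.scan by using a shifted cumulative product; the same expression should map directly onto theano.tensor.concatenate and theano.tensor.extra_ops.cumprod. Both function names are hypothetical.

```python
import numpy as np

def stick_breaking(v):
    """Stick-breaking weights p_i = v_i * prod_{j<i} (1 - v_j), without a loop.

    As in the model above, the last weight is set so that the weights sum to one.
    """
    v = np.asarray(v, dtype=float)
    # stick remaining before break i: 1, (1-v_0), (1-v_0)(1-v_1), ...
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])
    p = v * remaining
    p[-1] = 1.0 - p[:-1].sum()
    return p

def stick_breaking_loop(v):
    """Direct translation of the list comprehension above, for comparison."""
    value = [u * np.prod(1 - v[:i]) for i, u in enumerate(v)]
    value[-1] = 1 - sum(value[:-1])
    return np.array(value)

print(stick_breaking([0.5, 0.5, 0.5]))  # [0.5  0.25 0.25]
```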
How would you implement this in PyMC 3 which is using Theano for the gradient computation?
I tried the following solution using the theano.scan method:
import pymc3 as pm
import theano
import theano.tensor as t
from pymc3 import Uniform, Beta, Categorical, Normal

with pm.Model() as mod:
    conc = Uniform('concentration', lower=0.5, upper=10)
    v = Beta('v', alpha=1, beta=conc, shape=n_dp)
    # stick-breaking: p_i = v_i * prod_{j<i} (1 - v_j)
    p, updates = theano.scan(fn=lambda stick, idx: stick * t.prod(1 - v[:idx]),
                             sequences=[v, t.arange(n_dp)])
    # set_subtensor returns a new variable, so assign it back to p
    p = t.set_subtensor(p[-1], 1 - t.sum(p[:-1]))
    category = Categorical('category', p, shape=n_algs)
    sd = Uniform('precs', lower=0, upper=20, shape=n_dp)
    means = Normal('means', mu=0, sd=100, shape=n_dp)
    points = Normal('obs',
    step1 = pm.Slice([conc, v, sd, means])
    step3 = pm.ElemwiseCategoricalStep(var=category, values=range(n_dp))
    trace = pm.sample(2000, step=[step1, step3], progressbar=True)
Which sadly is really slow and does not obtain the original parameters of the synthetic data.
Is there a better solution and is this even correct?
Not sure I have a good answer, but perhaps this could be sped up by using a Theano black-box op instead, which allows you to write a distribution (or deterministic) in Python code. E.g.:

Vectorizing three for loops

I'm quite new to Matlab and I need help speeding up part of my code. I am writing a Matlab application that performs 3D matrix convolution, but unlike standard convolution, the kernel is not constant: it needs to be calculated for each pixel of an image.
So far, I have ended up with working code, but it is incredibly slow:
function result = calculateFilteredImages(images, T)
% images - matrix [480,360,10] of 10 grayscale images of height=480 and width=360
%          represented as values in the range [0..1]
%          i.e. images(10,20,5) = 0.1231;
% T - some matrix [480,360,10, 3,3] of double values, calculated earlier
kerN = 5;            % kernel size
mid = floor(kerN/2); % half the kernel size
offset = mid+1;      % kernel offset
[h,w,n] = size(images);
% add padding so as not to get IndexOutOfBoundsEx during summation:
% [i.e. changes [1 2 3...10] to [0 0 1 2 ... 10 0 0]]
images = padarray(images,[mid, mid, mid]);
result(h,w,n) = 0;          % preallocate, faster than zeros(h,w,n)
kernel(kerN,kerN,kerN) = 0; % preallocate
% the three parameters below are not important in this problem
% (are used to calculate sigma in x,y,z direction inside the loop)
d = 3;
for a=1:n
    for b=1:w
        for c=1:h
            M(:,:)=T(c,b,a,:,:); % M is now a 3x3 matrix
            [R, D] = eig(M);     % get eigenvectors and eigenvalues - R and D are now 3x3 matrices
            % eigenvalues
            l1 = D(1,1);
            l2 = D(2,2);
            l3 = D(3,3);
            sig1=sig( l1 , sigMin, sigMax, d);
            sig2=sig( l2 , sigMin, sigMax, d);
            sig3=sig( l3 , sigMin, sigMax, d);
            % calculate kernel
            for i=-mid:mid
                for j=-mid:mid
                    for k=-mid:mid
                        x_new = [i,j,k] * R; % calculate new [i,j,k]
                        kernel(offset+i, offset+j, offset+k) = exp(- (((x_new(1))^2 )/(sig1^2) + ((x_new(2))^2)/(sig2^2) + ((x_new(3))^2)/(sig3^2)) /2);
                    end
                end
            end
            % normalize
            kernel = kernel / sum(kernel(:));
            % perform summation
            xm_sum = 0;
            for i=-mid:mid
                for j=-mid:mid
                    for k=-mid:mid
                        xm_sum = xm_sum + kernel(offset+i, offset+j, offset+k) * images(c+mid+i, b+mid+j, a+mid+k);
                    end
                end
            end
            result(c,b,a) = xm_sum;
        end
    end
end
I tried replacing the "calculating kernel" part with
sigma=[sig1 sig2 sig3]
[x,y,z] = ndgrid(-mid:mid,-mid:mid,-mid:mid);
k2 = arrayfun(@(x, y, z) exp(-(norm([x,y,z]*R./sigma)^2)/2), x,y,z);
but it turned out to be even slower than the loop. I went through several articles and tutorials on vectorization but I'm quite stuck with this one.
Can it be vectorized, or somehow sped up using something else?
I'm new to Matlab; maybe there are some built-in functions that could help in this case?
The profiling result:
Sample data which was used during profiling:
As Dennis noted, this is a lot of code; cutting it down to the minimal slow part identified by the profiler will help. I'm not sure my code is equivalent to yours; can you try it and profile it? The 'trick' to Matlab vectorization is using .* and .^, which operate element-by-element instead of requiring loops.
Take your rewritten part:
sigma=[sig1 sig2 sig3]
[x,y,z] = ndgrid(-mid:mid,-mid:mid,-mid:mid);
k2 = arrayfun(@(x, y, z) exp(-(norm([x,y,z]*R./sigma)^2)/2), x,y,z);
And just pick one sigma for now. Looping over 3 different sigmas isn't a performance problem if you can vectorize the underlying k2 formula.
EDIT: Changed the matrix_to_norm code to be x(:), and no commas. See Generate all possible combinations of the elements of some vectors (Cartesian product)
Then try:
% R, mid & sig1 are my test variables
R = [1 2 3; 4 5 6; 7 8 9];
mid = 5;
sig1 = 1;
[x,y,z] = ndgrid(-mid:mid,-mid:mid,-mid:mid);
% meshgrid is also a possibility, check that you are getting the order you want
% Going to break the equation apart for now for clarity
% Matrix operation, should already be fast: one row per kernel element
matrix_to_norm = [x(:) y(:) z(:)]*R/sig1;
% Row-wise Euclidean norm (norm() of a whole matrix would return a single number)
matrix_normed = sqrt(sum(matrix_to_norm.^2, 2));
% Note the .^ - I believe you want element-by-element exponentiation, this will
% vectorize it.
k2 = exp(-0.5*(matrix_normed.^2));
% reshape back to the (2*mid+1)^3 kernel
k2 = reshape(k2, size(x));
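For the full formula with one sigma per axis, the division can be applied column by column after the matrix product, so no loop (or arrayfun) is needed at all. The sketch below is in Python/NumPy rather than Matlab, to keep the added examples in one language, and the function names are hypothetical; in Matlab the same steps would be [x(:) y(:) z(:)]*R ./ sigma (implicit expansion, R2016b or later), sum(... .^2, 2) and reshape(..., size(x)).

```python
import numpy as np

def anisotropic_gaussian_kernel(R, sigma, mid):
    """Vectorized version of the question's 'calculate kernel' triple loop.

    Returns K with K[i+mid, j+mid, k+mid] = exp(-0.5 * sum_m (([i, j, k] @ R)[m] / sigma[m])**2)
    for i, j, k in -mid..mid (no normalization applied).
    """
    r = np.arange(-mid, mid + 1)
    I, J, K = np.meshgrid(r, r, r, indexing="ij")                # same ordering as ndgrid
    offsets = np.column_stack([I.ravel(), J.ravel(), K.ravel()])  # one row per [i, j, k]
    scaled = (offsets @ R) / np.asarray(sigma, dtype=float)       # column m divided by sigma[m]
    kernel = np.exp(-0.5 * np.sum(scaled ** 2, axis=1))
    return kernel.reshape(I.shape)

def kernel_loop(R, sigma, mid):
    """Triple loop mirroring the question's Matlab code, for comparison."""
    n = 2 * mid + 1
    out = np.zeros((n, n, n))
    for i in range(-mid, mid + 1):
        for j in range(-mid, mid + 1):
            for k in range(-mid, mid + 1):
                x_new = np.array([i, j, k]) @ R
                out[i + mid, j + mid, k + mid] = np.exp(
                    -((x_new[0] / sigma[0]) ** 2 + (x_new[1] / sigma[1]) ** 2
                      + (x_new[2] / sigma[2]) ** 2) / 2)
    return out
```

Normalizing (kernel / kernel.sum()) and the per-pixel weighted sum over an image patch, (kernel * patch).sum(), vectorize the same way.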
