Jacobi method converging then diverging - algorithm

I am working to solve Poisson's equation (in 2D axisymmetric cylindrical coordinates) using the Jacobi method. The L2 norm decreases very slowly from ~1E3 on the first iteration (my initial guess is quite bad) to ~0.2, and then it begins to increase again over many iterations.
My geometry is parallel plates with sharp points at r = 0 on both plates (if that matters).
Is there some error in my code? Do I need to switch to a different algorithm? (I have a not-yet-working DADI algorithm.)
Here is my Jacobi method subroutine; it is simply wrapped in a while loop.
subroutine Jacobi(PoissonRHS, V, resid)
  implicit none
  real, dimension(0:,0:) :: PoissonRHS, V
  real :: resid
  integer :: i, j, lb, ub
  real, dimension(0:size(V,1)-1, 0:size(V,2)-1) :: oldV
  real :: dr = delta(1)
  real :: dz = delta(2)
  real :: dr2 = (delta(1))**(-2)
  real :: dz2 = (delta(2))**(-2)
  integer :: M = cells(1)
  integer :: N = cells(2)

  oldV = V

  !Note: All of the equations are second order accurate

  !If at r = 0 and in the computational domain
  ! This is the smoothness condition, dV(r=0)/dr = 0
  V(0,:) = (4.0*oldV(1,:)-oldV(2,:))/3.0

  !If at r = rMax and in the computational domain
  ! This is an approximation and should be fixed to improve accuracy, it should be
  ! lim r->inf V' = 0, while this is V'(r = R) = 0
  V(M, 1:N-1) = 0.5 / (dr2 + dz2) * ( &
      (2.0*dr2)*oldV(M-1,1:N-1) + &
      dz2 * (oldV(M,2:N) + oldV(M,0:N-2)) &
      - PoissonRHS(M,1:N-1))

  do i = 1, M-1
    lb = max(0, nint(lowerBoundary(i * dr) / dz)) + 1
    ub = min(N, nint(upperBoundary(i * dr) / dz)) - 1
    V(i,lb:ub) = 0.5 / (dr2 + dz2) * ( &
        ((1.0 - 0.5/dble(i))*dr2)*oldV(i-1,lb:ub) + &
        ((1.0 + 0.5/dble(i))*dr2)*oldV(i+1,lb:ub) + &
        dz2 * (oldV(i,lb+1:ub+1) + oldV(i,lb-1:ub-1)) &
        - PoissonRHS(i,lb:ub))
    V(i, 0:lb-1) = V0
    V(i, ub+1:N) = VL
  enddo

  !compare to old V values to check for convergence
  resid = sqrt(sum((oldV-V)**2))
  return
end subroutine Jacobi

Based on additional reading, it seems this was a precision problem. For example, I had the expression
V(i,lb:ub) = 0.5 / (dr2 + dz2) * ( &
    ((1.0 - 0.5/dble(i))*dr2)*oldV(i-1,lb:ub) + &
    ((1.0 + 0.5/dble(i))*dr2)*oldV(i+1,lb:ub) + &
    dz2 * (oldV(i,lb+1:ub+1) + oldV(i,lb-1:ub-1)) &
    - PoissonRHS(i,lb:ub))
where dr2 and dz2 are very large. By distributing these factors into the individual terms, I got quantities that were ~1 and the code converges (slowly, but that is a property of the mathematics).
So my new code is
subroutine Preconditioned_Jacobi(PoissonRHS, V, resid)
  implicit none
  real, dimension(0:,0:) :: PoissonRHS, V
  real :: resid
  integer :: i, j, lb, ub
  real, dimension(0:size(V,1)-1, 0:size(V,2)-1) :: oldV
  real :: dr = delta(1)
  real :: dz = delta(2)
  real :: dr2 = (delta(1))**(-2)
  real :: dz2 = (delta(2))**(-2)
  real :: b, c, d
  integer :: M = cells(1)
  integer :: N = cells(2)

  b = 0.5*(dr**2)/((dr**2) + (dz**2))
  c = 0.5*(dz**2)/((dr**2) + (dz**2))
  d = -0.5 / (dr2 + dz2)

  oldV = V

  !Note: All of the equations are second order accurate

  !If at r = 0 and in the computational domain
  ! This is the smoothness condition, dV(r=0)/dr = 0
  V(0,:) = (4.0*oldV(1,:)-oldV(2,:))/3.0 !same as: oldV(0,:) - 2.0/3.0 * (1.5 * oldV(0,:) - 2.0 * oldV(1,:) + 0.5 * oldV(2,:) - 0)

  !If at r = rMax and in the computational domain
  ! This is an approximation and should be fixed to improve accuracy, it should be
  ! lim r->inf V' = 0, while this is V'(r = R) = 0
  V(M,1:N-1) = d*PoissonRHS(M,1:N-1) &
      + 2.0*c * oldV(M-1,1:N-1) &
      + b * ( oldV(M,0:N-2) + oldV(M,2:N) )  ! 0:N-2 so the slice shapes match the left-hand side

  do i = 1, M-1
    lb = max(0, nint(lowerBoundary(i * dr) / dz)) + 1
    ub = min(N, nint(upperBoundary(i * dr) / dz)) - 1
    V(i,lb:ub) = d*PoissonRHS(i,lb:ub) &
        + (c * (1.0-0.5/dble(i)) * oldV(i-1,lb:ub)) &
        + (c * (1.0+0.5/dble(i)) * oldV(i+1,lb:ub)) &
        + b * (oldV(i,lb-1:ub-1) + oldV(i,lb+1:ub+1))
    V(i, 0:lb-1) = V0
    V(i, ub+1:N) = VL
  enddo

  !compare to old V values to check for convergence
  resid = sum(abs(oldV-V))
  return
end subroutine Preconditioned_Jacobi
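For reference, the same interior update can be written in any array language once the coefficients are precomputed, so every term in the sum is O(1). Below is a minimal NumPy sketch of that idea; the grid spacing, right-hand side, and boundary handling are placeholders, so it is a sketch of the update rather than a drop-in replacement for the subroutine above.
import numpy as np

def jacobi_step(V, rhs, dr, dz):
    # One Jacobi sweep of the axisymmetric (r, z) Poisson interior update.
    # Sketch only: the r = 0, r = rMax and electrode boundary handling from
    # the Fortran code is omitted.
    M, N = V.shape[0] - 1, V.shape[1] - 1
    b = 0.5 * dr**2 / (dr**2 + dz**2)            # weight of the z-neighbours
    c = 0.5 * dz**2 / (dr**2 + dz**2)            # weight of the r-neighbours
    d = -0.5 * dr**2 * dz**2 / (dr**2 + dz**2)   # multiplies the right-hand side
    old = V.copy()
    new = old.copy()
    i = np.arange(1, M)[:, None]                 # radial index, broadcast over z
    new[1:M, 1:N] = (d * rhs[1:M, 1:N]
                     + c * (1.0 - 0.5 / i) * old[0:M-1, 1:N]
                     + c * (1.0 + 0.5 / i) * old[2:M+1, 1:N]
                     + b * (old[1:M, 0:N-1] + old[1:M, 2:N+1]))
    resid = np.abs(new - old).sum()
    return new, resid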

Related

Why does the code terminate with a "Solution Not Found" error and "EXIT: Converged to a point of local infeasibility. Problem may be infeasible"?

I cannot seem to figure out why IPOPT cannot find a solution to this. Initially I thought the problem was totally infeasible, but when I reduce the value of col_total to any number below 161000, or comment out the last constraint equation that contains col_total, it solves and exits with "Optimal Solution Found" and a final objective value of -161775.256826753. I have solved the same maximization problem using Artificial Bee Colony and Particle Swarm Optimization techniques, and they return optimal objective values of at least 225000 and 226000, respectively. Could it be that another solver is required? I have also tried APOPT, BPOPT, and IPOPT and have tinkered with the tolerance values, but no combination seems to work just yet. The code is posted below. Any guidance will be hugely appreciated.
from gekko import GEKKO
import numpy as np

distances = np.array([[[0, 0],[0,0],[0,0],[0,0]],\
                      [[155,0],[0,0],[0,0],[0,0]],\
                      [[310,0],[155,0],[0,0],[0,0]],\
                      [[465,0],[310,0],[155,0],[0,0]],\
                      [[620,0],[465,0],[310,0],[155,0]]])

alpha = 0.5 / np.log(30/0.075)
diam = 31
free = 7
rho = 1.2253
area = np.pi * (diam / 2)**2
min_v = 5.5
axi_max = 0.32485226746
col_total = 176542.96546512868
rat = 14
nn = 5
u_hub_lowerbound = 5.777777777777778
c_pow = 0.59230249
p_max = 0.5 * rho * area * c_pow * free**3

# Initialize Model
m = GEKKO(remote=True)

#initialize variables, Set lower and upper bounds
x = [m.Var(value = 0.03902278, lb = 0, ub = axi_max) \
     for i in range(nn)]

# i = 0
b = 1
c = 0
v_s = list()
for i in range(nn-1): # Loop runs for nn-1 times
    # print(i)
    # print(i,b,c)
    squared_defs = list()
    while i < b:
        d = distances[b][c][0]
        r = distances[b][c][1]
        ss = (2 * (alpha * d) / diam)
        tt = r / ((diam/2) + (alpha * d))
        squared_defs.append((2 * x[i] / (1 + ss**2)) * np.exp(-(tt**2)) ** 2)
        i+=1
        c+=1

    #Equations
    m.Equation((free * (1 - (sum(squared_defs))**0.5)) - rat <= 0)
    m.Equation((free * (1 - (sum(squared_defs))**0.5)) - u_hub_lowerbound >= 0)

    v_s.append(free * (1 - (sum(squared_defs))**0.5))

    squared_defs.clear()
    b+=1
    c=0

# Inserts free as the first item on the v_s list to
# increase len(v_s) to nn, so that 'v_s' and 'x'
# are of same length
v_s.insert(0, free)

gamma = list()
for i in range(len(x)):
    bet = (4*x[i]*((1-x[i])**2) * rho * area) / 2
    gam = bet * v_s[i]**3
    gamma.append(gam)

    #Equations
    m.Equation(x[i] - axi_max <= 0)
    m.Equation((((4*x[i]*((1-x[i])**2) * rho * area) / 2) \
                * v_s[i]**3) - p_max <= 0)
    m.Equation((((4*x[i]*((1-x[i])**2) * rho * area) / 2) * \
                v_s[i]**3) > 0)

#Equation
m.Equation(col_total - sum(gamma) <= 0)

#Objective
y = sum(gamma)
m.Maximize(y) # Maximize

#Set global options
m.options.IMODE = 3 #steady state optimization

#Solve simulation
m.options.SOLVER = 3
m.solver_options = ['linear_solver ma27','mu_strategy adaptive','max_iter 2500', 'tol 1.0e-5' ]
m.solve()
Build the equations without .value in the expressions. The x[i].value is only needed at the end, to view the solution after the solve is complete, or to initialize the value of x[i]. The expression m.Maximize(y) is more readable than m.Obj(-y), although they are equivalent.
from gekko import GEKKO
import numpy as np

distances = np.array([[[0, 0],[0,0],[0,0],[0,0]],\
                      [[155,0],[0,0],[0,0],[0,0]],\
                      [[310,0],[155,0],[0,0],[0,0]],\
                      [[465,0],[310,0],[155,0],[0,0]],\
                      [[620,0],[465,0],[310,0],[155,0]]])

alpha = 0.5 / np.log(30/0.075)
diam = 31
free = 7
rho = 1.2253
area = np.pi * (diam / 2)**2
min_v = 5.5
axi_max = 0.069262150781
col_total = 20000
p_max = 4000
rat = 14
nn = 5

# Initialize Model
m = GEKKO(remote=True)

#initialize variables, Set lower and upper bounds
x = [m.Var(value = 0.03902278, lb = 0, ub = axi_max) \
     for i in range(nn)]

i = 0
b = 1
c = 0
v_s = list()
for turbs in range(nn-1): # Loop runs for nn-1 times
    squared_defs = list()
    while i < b:
        d = distances[b][c][0]
        r = distances[b][c][1]
        ss = (2 * (alpha * d) / diam)
        tt = r / ((diam/2) + (alpha * d))
        squared_defs.append((2 * x[i] / (1 + ss**2)) \
                            * m.exp(-(tt**2)) ** 2)
        i+=1
        c+=1

    #Equations
    m.Equation((free * (1 - (sum(squared_defs))**0.5)) - rat <= 0)
    m.Equation(min_v - (free * (1 - (sum(squared_defs))**0.5)) <= 0 )

    v_s.append(free * (1 - (sum(squared_defs))**0.5))

    squared_defs.clear()
    b+=1
    a=0
    c=0

# Inserts free as the first item on the v_s list to
# increase len(v_s) to nn, so that 'v_s' and 'x'
# are of same length
v_s.insert(0, free)

beta = list()
gamma = list()
for i in range(len(x)):
    bet = (4*x[i]*((1-x[i])**2) * rho * area) / 2
    gam = bet * v_s[i]**3

    #Equations
    m.Equation((((4*x[i]*((1-x[i])**2) * rho * area) / 2) \
                * v_s[i]**3) - p_max <= 0)
    m.Equation((((4*x[i]*((1-x[i])**2) * rho * area) / 2) \
                * v_s[i]**3) > 0)

    gamma.append(gam)

#Equation
m.Equation(col_total - sum(gamma) <= 0)

#Objective
y = sum(gamma)
m.Maximize(y) # Maximize

#Set global options
m.options.IMODE = 3 #steady state optimization

#Solve simulation
m.options.SOLVER = 3
m.solve()
This gives a successful solution with maximized objective 20,000:
Number of Iterations....: 12
(scaled) (unscaled)
Objective...............: -4.7394814741924645e+00 -1.9999999999929641e+04
Dual infeasibility......: 4.4698510326511536e-07 1.8862194343304290e-03
Constraint violation....: 3.8275766582203308e-11 1.2941979026166479e-07
Complementarity.........: 2.1543608536533588e-09 9.0911246952931704e-06
Overall NLP error.......: 4.6245685940749926e-10 1.8862194343304290e-03
Number of objective function evaluations = 80
Number of objective gradient evaluations = 13
Number of equality constraint evaluations = 80
Number of inequality constraint evaluations = 0
Number of equality constraint Jacobian evaluations = 13
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations = 12
Total CPU secs in IPOPT (w/o function evaluations) = 0.010
Total CPU secs in NLP function evaluations = 0.011
EXIT: Optimal Solution Found.
The solution was found.
The final value of the objective function is -19999.9999999296
---------------------------------------------------
Solver : IPOPT (v3.12)
Solution time : 3.210000000399305E-002 sec
Objective : -19999.9999999296
Successful solution
---------------------------------------------------

How to keep the distance between $n$ particles within a certain range?

I am working on a problem in molecular dynamics and need to randomly generate a position array for np particles within a box of size [-L,L] x [-L,L]. In fact, I need to generate the x-array for the x-coordinates with x(1) = 0 and the y-array for the y-coordinates with y(1) = y(2) = 0. I need the particles to be placed such that the distances between neighboring particles are within some range (e.g. 0.9 <= r <= 1.1), as in the reference picture.
However, my code gives me configurations where some neighboring distances (the red lines in my plot) are larger than what I want.
My code is
REAL, DIMENSION(np) :: x, y
REAL :: w1, w2, minv, maxv, xij, yij, rij
INTEGER :: i, j

!Generating random coordinates for the particles
x(1) = 0.0d0
y(1) = 0.0d0
y(2) = 0.0d0
!-------------------------------------------------------------------------
! Translation and rotation of the whole system are frozen (saving 4 degrees
! of freedom):
! x(1) = 0.0d0; y(1) = 0.0d0 fix one particle at the origin
! y(2) = 0.0d0 fixes the second particle on the x-axis
!-------------------------------------------------------------------------
rmatrix = 100.0
minv = 0.0
maxv = 10
iter0 = 0

101 DO WHILE(maxv >= 1.1 .OR. minv <= 0.9)
  iter0 = iter0 + 1
  PRINT *, iter0
  CALL init_random_seed()
  DO i = 2, np
    CALL RANDOM_NUMBER(w1)
    x(i) = 10 * w1 - 5
  END DO
  DO i = 3, np
    CALL RANDOM_NUMBER(w2)
    y(i) = 10 * w2 - 5
  END DO
  ! rmatrix contains the distances between all particles
  DO i = 1, np
    DO j = 1, np
      IF(j .NE. i) THEN
        xij = x(i) - x(j)
        yij = y(i) - y(j)
        rij = SQRT(xij * xij + yij * yij)
        rmatrix(i,j) = rij
      END IF
    END DO
  END DO
  minv = MINVAL(rmatrix)  ! This is the minimum distance between any two
                          ! particles (distance cannot be smaller),
                          ! which is the left endpoint of the range interval
  DO i = 1, np            ! Here is my attempt to control the right endpoint of
    DO j = 1, np          ! the range interval. (This needs to be edited)
      IF(j .NE. i) THEN
        maxv = MIN(maxv, rmatrix(i,j))
      END IF
    END DO
    IF(maxv >= 1.1) THEN
      GOTO 101
    END IF
  END DO
END DO

CONTAINS

SUBROUTINE init_random_seed()
  INTEGER :: i, n, clock
  INTEGER, DIMENSION(:), ALLOCATABLE :: seed
  CALL RANDOM_SEED(size = n)
  ALLOCATE(seed(n))
  CALL SYSTEM_CLOCK(COUNT=clock)
  seed = clock + 37 * (/ (i - 1, i = 1, n) /)
  CALL RANDOM_SEED(PUT = seed)
END SUBROUTINE init_random_seed
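For what it is worth, here is a small Python sketch of the same overall strategy the Fortran above attempts: redraw all coordinates, build the full distance matrix, and accept the configuration only if the neighbour distances fall in the target range. The acceptance test and the names are illustrative assumptions, and plain rejection sampling like this becomes impractically slow beyond a handful of particles.
import numpy as np

def random_configuration(n_particles, L=5.0, rmin=0.9, rmax=1.1, max_tries=100000):
    # Rejection sampling: keep redrawing until no pair is closer than rmin
    # and every particle has a nearest neighbour no farther than rmax.
    rng = np.random.default_rng()
    for _ in range(max_tries):
        pos = rng.uniform(-L, L, size=(n_particles, 2))
        pos[0] = (0.0, 0.0)          # x(1) = y(1) = 0
        pos[1, 1] = 0.0              # y(2) = 0
        diff = pos[:, None, :] - pos[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        np.fill_diagonal(dist, np.inf)
        nearest = dist.min(axis=1)   # nearest-neighbour distance of each particle
        if dist.min() >= rmin and nearest.max() <= rmax:
            return pos
    raise RuntimeError("no acceptable configuration found")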

How to pick a number based on probability?

I want to select a random number from 0, 1, 2, ..., n, but I want the chance of selecting k (0 < k < n) to be lower than the chance of selecting k - 1 by a multiplicative factor x (for example x = (k - 1)/k). The bigger the number, the smaller the chance of picking it.
As an answer, I want to see an implementation of the following method:
int pickANumber(n,x)
This is for a game that I am developing. I saw these questions as related, but not exactly the same:
How to pick an item by its probability
C Function for picking from a list where each element has a distinct probability
For n items, you need to solve the system:
p1 + p2 + ... + pn = 1
p1 = p2 * x
p2 = p3 * x
...
p_n-1 = pn * x
Solving this gives you:
p1 + p2 + ... + pn = 1
(p2 * x) + (p3 * x) + ... + (pn * x) + pn = 1
((p3*x) * x) + ((p4*x) * x) + ... + ((p_n-1*x) * x) + pn = 1
....
pn* (x^(n-1) + x^(n-2) + ... +x^1 + x^0) = 1
pn*(1-x^n)/(1-x) = 1
pn = (1-x)/(1-x^n)
This gives you the probability you need to assign to pn, and from it you can calculate the probabilities for all the other p1, p2, ..., p_n-1.
Now, you can use a "black box" RNG that chooses a number with a given distribution, like those in the threads you mentioned.
A simple approach is to build an auxiliary array:
aux[i] = p1 + p2 + ... + pi
Now, draw a uniform random number between 0 and aux[n], and use binary search (the aux array is sorted) to find the first index whose value in aux is greater than the number you drew.
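A small Python sketch of this recipe (the signature pickANumber(n, x) follows the question; everything else, including the choice of weights x**k for k = 0..n, is an illustrative assumption consistent with the geometric-series derivation above):
import bisect
import random

def pickANumber(n, x):
    # P(k) is proportional to x**k for k = 0..n, i.e. each number is x times
    # as likely as the one before it.
    weights = [x ** k for k in range(n + 1)]
    total = sum(weights)                # equals (1 - x**(n+1)) / (1 - x) when x != 1
    aux = []                            # aux[i] = cumulative weight up to index i
    running = 0.0
    for w in weights:
        running += w
        aux.append(running)
    u = random.random() * total         # uniform draw in [0, total)
    return bisect.bisect_right(aux, u)  # first index whose cumulative weight exceeds u
With x < 1 larger numbers become progressively less likely; with x > 1 the bias runs the other way.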
Original answer, for subtraction (before the question was edited):
For n items, you need to solve the equation:
p1 + p2 + ... + pn = 1
p1 = p2 + x
p2 = p3 + x
...
p_n-1 = pn + x
Solving this gives you:
p1 + p2 + ... + pn = 1
(p2 + x) + (p3 + x) + ... + (pn + x) + pn = 1
((p3+x) + x) + ((p4+x) + x) + ... + ((p_n-1+x) + x) + pn = 1
....
pn*n + x * ((n-1) + (n-2) + ... + 1 + 0) = 1
pn*n + x * n(n-1)/2 = 1
pn = (1 - x*n(n-1)/2) / n
This gives you the probability you need to assign to pn, and from it you can calculate the probabilities for all the other p1, p2, ..., p_n-1.
Now, you can use a "black box" RNG that chooses a number with a given distribution, like those in the threads you mentioned.
Be advised, this does not guarantee a solution with 0 < p_i < 1 for all i; your requirements cannot guarantee one either, since whether it exists depends on the values of n and x.
Edit: This answer was for the OP's original question, which differed in that each probability was supposed to be lower than the previous one by a fixed amount.
Well, let's see what the constraints say. You want to have P(k) = P(k - 1) - x. So we have:
P(0)
P(1) = P(0) - x
P(2) = P(0) - 2x
...
In addition, sum_k P(k) = 1. Summing, we get:
1 = (n + 1)P(0) - x * n(n + 1)/2
This gives you an easy constraint between x and P(0). Solve for one in terms of the other.
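A hedged sketch of that constraint in Python (assuming numbers 0..n; the function name is mine):
def arithmetic_probs(n, x):
    # P(k) = P(0) - k*x for k = 0..n, with the probabilities summing to 1.
    # From 1 = (n + 1)*P(0) - x*n*(n + 1)/2 it follows that P(0) = 1/(n + 1) + x*n/2.
    p0 = 1.0 / (n + 1) + x * n / 2.0
    probs = [p0 - k * x for k in range(n + 1)]
    if min(probs) < 0:
        raise ValueError("x is too large: some probabilities would be negative")
    return probs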
For this I would use the Mersenne Twister algorithm for a uniform distribution, which Boost provides, then use a mapping function to map the results of that uniform distribution to the actual number selected.
Here's a quick example of a potential implementation, although I left out the quadratic-equation solver since it is well known:
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_int.hpp>
#include <boost/random/variate_generator.hpp>
#include <cstdlib>
#include <ctime>

// Quadratic-equation solver left out, as noted above; assumed to return the
// positive root of a*i^2 + b*i + c = 0.
int quadratic_equation_for_positive_value(double a, double b, double c);

int f_of_xib(int x, int i, int b)
{
    return x * i * i / 2 + b * i;
}

int b_of_x(int x)
{
    // From the derivation below: r = x when i = 1, so b = r - r / 2.
    return x - x / 2;
}

int pickANumber(boost::mt19937& gen, int n, int x)
{
    // First, determine the range r required where the probability equals i * x,
    // since the probability of each increasing integer is x higher of occurring.
    // Let f(i) = r; given f'(i) = x * i, then r = ( x * i^2 ) / 2 + b * i
    // where b = ( r - ( x * i^2 ) / 2 ) / i. Since r = x when i = 1 from the problem
    // definition, this reduces to b = r - r / 2. Therefore, to find r_max simply
    // plug x in to find b, then plug n in for i (together with x and b), since
    // r_max occurs when i == n.
    int b = b_of_x(x);
    int r_max = f_of_xib(x, n, b);

    boost::uniform_int<> range(0, r_max);
    boost::variate_generator<boost::mt19937&, boost::uniform_int<> > next(gen, range);

    // Now, to map the random number back to the desired number, find the positive
    // value of i when r is the returned random number, which boils down to finding
    // the non-negative root of 0 = ( x * i^2 ) / 2 + b * i - r.
    int random_number = next();
    return quadratic_equation_for_positive_value(x / 2.0, b, -random_number);
}

int main(int argc, char** argv)
{
    boost::mt19937 gen;
    gen.seed(static_cast<unsigned int>(time(0)));
    pickANumber(gen, 10, 1);
    system("pause");
}

vectorize/optimize this code in MATLAB?

I am building my first large-scale MATLAB program, and I've managed to write original vectorized code for everything so far, until I came to trying to create an image representing vector density in stereographic projection. After a couple of failed attempts I went to the MathWorks File Exchange site and found an open-source program which fits my needs, courtesy of Malcolm Mclean. With a test matrix his function produces something like this:
And while this is almost exactly what I wanted, his code relies on a triply nested for-loop. On my workstation a test data matrix of size 25000x2 took 65 seconds in this section of code. This is unacceptable since I will be scaling up to data matrices of size 500000x2 in my project.
So far I've been able to vectorize the innermost loop (which was the longest/worst loop), but I would like to continue and be rid of the loops entirely if possible. Here is Malcolm's original code that I need to vectorize:
dmap = zeros(height, width);                        % height, width: scalar with default value = 32
for ii = 0: height - 1                              % 32 iterations of this loop
    yi = limits(3) + ii * deltay + deltay/2;        % limits(3) & deltay: scalars
    for jj = 0 : width - 1                          % 32 iterations of this loop
        xi = limits(1) + jj * deltax + deltax/2;    % limits(1) & deltax: scalars
        dd = 0;
        for kk = 1: length(x)                       % up to 500,000 iterations in this loop
            dist2 = (x(kk) - xi)^2 + (y(kk) - yi)^2;
            dd = dd + 1 / ( dist2 + fudge);         % fudge is a scalar
        end
        dmap(ii+1,jj+1) = dd;
    end
end
And here it is with the changes I've already made to the innermost loop (which was the biggest drain on efficiency). This cuts the time from 65 seconds down to 12 seconds on my machine for the same test matrix, which is better but still far slower than I would like.
dmap = zeros(height, width);
for ii = 0: height - 1
    yi = limits(3) + ii * deltay + deltay/2;
    for jj = 0 : width - 1
        xi = limits(1) + jj * deltax + deltax/2;
        dist2 = (x - xi) .^ 2 + (y - yi) .^ 2;
        dmap(ii + 1, jj + 1) = sum(1 ./ (dist2 + fudge));
    end
end
So my main question, are there any further changes I can make to optimize this code? Or even an alternative method to approach the problem? I've considered using C++ or F# instead of MATLAB for this section of the program, and I may do so if I cannot get to a reasonable efficiency level with the MATLAB code.
Please also note that at this point I don't have ANY additional toolboxes; if I did, I know this would be trivial (using hist3 from the Statistics Toolbox, for example).
Memory-consuming solution:
yi = limits(3) + deltay * ( 1:height ) - .5 * deltay;
xi = limits(1) + deltax * ( 1:width ) - .5 * deltax;
dx = bsxfun( @minus, x(:), xi ) .^ 2;
dy = bsxfun( @minus, y(:), yi ) .^ 2;
dist2 = bsxfun( @plus, permute( dy, [2 3 1] ), permute( dx, [3 2 1] ) );
dmap = sum( 1./(dist2 + fudge ) , 3 );
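For comparison, here is the same broadcasting idea expressed in NumPy (a sketch under the assumption that x and y are 1-D arrays and that limits, deltax, deltay, fudge mirror the MATLAB variables; note MATLAB's limits(1)/limits(3) become limits[0]/limits[2]):
import numpy as np

def density_map(x, y, limits, deltax, deltay, width, height, fudge):
    # For every grid-cell centre (xi, yi), sum 1 / (dist^2 + fudge) over all points.
    yi = limits[2] + deltay * (np.arange(1, height + 1) - 0.5)  # cell centres in y
    xi = limits[0] + deltax * (np.arange(1, width + 1) - 0.5)   # cell centres in x
    dy2 = (y[:, None] - yi[None, :]) ** 2       # shape (npts, height)
    dx2 = (x[:, None] - xi[None, :]) ** 2       # shape (npts, width)
    dist2 = dy2[:, :, None] + dx2[:, None, :]   # shape (npts, height, width)
    return (1.0 / (dist2 + fudge)).sum(axis=0)  # shape (height, width)
Like the bsxfun version, this builds an (npts x height x width) array, so the same blocking trick below applies for very large point sets.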
EDIT
handling extremely large x and y by breaking the operation into blocks:
blockSize = 50000; % process up to XX elements at once
dmap = 0;
yi = limits(3) + deltay * ( 1:height ) - .5 * deltay;
xi = limits(1) + deltax * ( 1:width ) - .5 * deltax;
bi = 1;
while bi <= numel(x)
    % take a block of x and y
    bx = x( bi:min(end, bi + blockSize - 1) );
    by = y( bi:min(end, bi + blockSize - 1) );
    dx = bsxfun( @minus, bx(:), xi ) .^ 2;
    dy = bsxfun( @minus, by(:), yi ) .^ 2;
    dist2 = bsxfun( @plus, permute( dy, [2 3 1] ), permute( dx, [3 2 1] ) );
    dmap = dmap + sum( 1./(dist2 + fudge ) , 3 );
    bi = bi + blockSize;
end
This is a good example of why starting a loop from 1 matters. The only reason ii and jj are initialized at 0 is to simplify the ii * deltay and jj * deltax terms, but that introduces offset indexing into dmap, which gets in the way of parallelizing the loops.
Now, by rewriting the loops you could use parfor() after opening a matlabpool:
dmap = zeros(height, width);
yi = limits(3) + deltay*(1:height) - .5*deltay;
matlabpool 8
parfor ii = 1: height
    for jj = 1: width
        xi = limits(1) + (jj-1) * deltax + deltax/2;
        dist2 = (x - xi) .^ 2 + (y - yi(ii)) .^ 2;
        dmap(ii, jj) = sum(1 ./ (dist2 + fudge));
    end
end
matlabpool close
Keep in mind that opening and closing the pool has significant overhead (10 seconds on my Intel Core Duo T9300, vista 32 Matlab 2013a).
PS. I am not sure whether the inner loop instead of the outer one can be meaningfully parallelized. You can try to switch the parfor to the inner one and compare speeds (I would recommend going for the big matrix immediately since you are already running in 12 seconds and the overhead is almost as big).
Alternatively, this problem can be solved using kernel density estimation techniques. This is part of the Statistics Toolbox, or there's this KDE implementation by Zdravko Botev (no toolboxes required).
For the example code below, I get 0.3 seconds for N = 500000, or 0.7 seconds for N = 1000000.
N = 500000;
data = [randn(N,2); rand(N,1)+3.5, randn(N,1);]; % 2 overlaid distrib
tic; [bandwidth,density,X,Y] = kde2d(data); toc;
imagesc(density);
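If you ever do this part in Python instead, scipy.stats.gaussian_kde gives a comparable (though not identical, and slower for very large N) estimate; here is a rough sketch with synthetic data shaped like the MATLAB example:
import numpy as np
from scipy.stats import gaussian_kde

N = 20000                    # evaluation cost grows with N * number of grid cells
data = np.vstack([np.random.randn(N, 2),
                  np.column_stack([np.random.rand(N) + 3.5, np.random.randn(N)])])
kde = gaussian_kde(data.T)   # expects shape (ndim, npoints)
gx, gy = np.meshgrid(np.linspace(data[:, 0].min(), data[:, 0].max(), 64),
                     np.linspace(data[:, 1].min(), data[:, 1].max(), 64))
density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)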

Trilateration and locating the point (x,y,z)

I want to find the coordinates of an unknown node that lies somewhere in space at known distances from 3 or more nodes, all of which have known coordinates.
This problem is exactly trilateration, as described here: Trilateration.
However, I don't understand the part about "Preliminary and final computations" (refer to the Wikipedia page). Where do I find P1, P2 and P3 so that I can put them into those equations?
Thanks
Trilateration is the process of finding the center of the area of intersection of three spheres. The center point and radius of each of the three spheres must be known.
Let's consider your three example centerpoints P1 [-1,1], P2 [1,1], and P3 [-1,-1]. The first requirement is that P1' be at the origin, so let us adjust the points accordingly by adding an offset vector V [1,-1] to all three:
P1' = P1 + V = [0, 0]
P2' = P2 + V = [2, 0]
P3' = P3 + V = [0,-2]
Note: Adjusted points are denoted by the ' (prime) annotation.
P2' must also lie on the x-axis. In this case it already does, so no adjustment is necessary.
We will assume the radius of each sphere to be 2.
Now we have 3 equations (given) and 3 unknowns (X, Y, Z of center-of-intersection point).
Solve for P4'x:
x = (r1^2 - r2^2 + d^2) / (2d)      //(d,0) are the coords of P2'
x = (2^2 - 2^2 + 2^2) / (2*2)
x = 1
Solve for P4'y:
y = (r1^2 - r3^2 + i^2 + j^2) / (2j) - (i/j)x      //(i,j) are the coords of P3'
y = (2^2 - 2^2 + 0^2 + (-2)^2) / (2*(-2)) - 0
y = -1
Ignore z for 2D problems.
P4' = [1,-1]
Now we translate back to original coordinate space by subtracting the offset vector V:
P4 = P4' - V = [0,0]
The solution point, P4, lies at the origin as expected.
The second half of the article is describing a method of representing a set of points where P1 is not at the origin or P2 is not on the x-axis such that they fit those constraints. I prefer to think of it instead as a translation, but both methods will result in the same solution.
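To make the preliminary and final computations concrete, here is a hedged Python sketch of the Wikipedia procedure for three 3-D sphere centres and radii. It builds the local frame directly with unit vectors instead of translating and rotating by hand; the names and structure are mine, not part of the answer above.
import numpy as np

def trilaterate(P1, P2, P3, r1, r2, r3):
    # Build a frame (ex, ey, ez) with P1 at the origin and P2 on the local x-axis,
    # solve for x, y, z in that frame, then map back to world coordinates.
    P1, P2, P3 = (np.asarray(p, dtype=float) for p in (P1, P2, P3))
    ex = (P2 - P1) / np.linalg.norm(P2 - P1)
    i = ex.dot(P3 - P1)
    ey = P3 - P1 - i * ex
    ey = ey / np.linalg.norm(ey)
    ez = np.cross(ex, ey)
    d = np.linalg.norm(P2 - P1)
    j = ey.dot(P3 - P1)
    x = (r1**2 - r2**2 + d**2) / (2 * d)
    y = (r1**2 - r3**2 + i**2 + j**2) / (2 * j) - (i / j) * x
    z2 = r1**2 - x**2 - y**2
    if z2 < 0:
        raise ValueError("the spheres do not intersect")
    z = np.sqrt(z2)
    base = P1 + x * ex + y * ey
    return base + z * ez, base - z * ez   # the two candidate points
For the 2-D example in this answer, the in-plane part (base) reproduces P4 = [0,0].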
Edit: Rotating P2' to the x-axis
If P2' does not lie on the x-axis after translating P1 to the origin, we must perform a rotation on the view.
First, let's create some new vectors to use as an example:
P1 = [2,3]
P2 = [3,4]
P3 = [5,2]
Remember, we must first translate P1 to the origin. As always, the offset vector, V, is -P1. In this case, V = [-2,-3]
P1' = P1 + V = [2,3] + [-2,-3] = [0, 0]
P2' = P2 + V = [3,4] + [-2,-3] = [1, 1]
P3' = P3 + V = [5,2] + [-2,-3] = [3,-1]
To determine the angle of rotation, we must find the angle between P2' and [1,0] (the x-axis).
We can use the dot product equality:
A dot B = ||A|| ||B|| cos(theta)
When B is [1,0], this can be simplified: A dot B is always just the X component of A, and ||B|| (the magnitude of B) is always a multiplication by 1, and can therefore be ignored.
We now have Ax = ||A|| cos(theta), which we can rearrange to our final equation:
theta = acos(Ax / ||A||)
or in our case:
theta = acos(P2'x / ||P2'||)
We calculate the magnitude of P2' using ||A|| = sqrt(Ax^2 + Ay^2 + Az^2)
||P2'|| = sqrt(1 + 1 + 0) = sqrt(2)
Plugging that in we can solve for theta
theta = acos(1 / sqrt(2)) = 45 degrees
Now let's use the rotation matrix to rotate the scene by -45 degrees.
Since P2'y is positive, and the rotation matrix rotates counter-clockwise, we'll use a negative rotation to align P2 to the x-axis (if P2'y is negative, don't negate theta).
R(theta) = [cos(theta) -sin(theta)]
[sin(theta) cos(theta)]
R(-45) = [cos(-45) -sin(-45)]
[sin(-45) cos(-45)]
We'll use double prime notation, '', to denote vectors which have been both translated and rotated.
P1'' = [0,0] (no need to calculate this one)
P2'' = [1 cos(-45) - 1 sin(-45)] = [sqrt(2)] = [1.414]
[1 sin(-45) + 1 cos(-45)] = [0] = [0]
P3'' = [3 cos(-45) - (-1) sin(-45)] = [sqrt(2)] = [ 1.414]
[3 sin(-45) + (-1) cos(-45)] = [-2*sqrt(2)] = [-2.828]
Now you can use P1'', P2'', and P3'' to solve for P4''. Apply the reverse rotation to P4'' to get P4', then the reverse translation to get P4, your center point.
To undo the rotation, multiply P4'' by R(-theta), in this case R(45). To undo the translation, subtract the offset vector V, which is the same as adding P1 (assuming you used -P1 as your V originally).
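A quick numerical check of the rotation step, using the example vectors above (NumPy assumed):
import numpy as np

P2p = np.array([1.0, 1.0])    # P2'
P3p = np.array([3.0, -1.0])   # P3'
theta = np.arccos(P2p[0] / np.linalg.norm(P2p))   # 45 degrees, in radians
if P2p[1] > 0:
    theta = -theta            # rotate clockwise so P2'' lands on the x-axis
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(R @ P2p)   # ~[1.414,  0.   ]  = [sqrt(2), 0]
print(R @ P3p)   # ~[1.414, -2.828]  = [sqrt(2), -2*sqrt(2)]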
This is the algorithm I use in a 3D printer firmware. It avoids rotating the coordinate system, but it may not be the best.
There are 2 solutions to the trilateration problem. To get the second one, replace "- sqrtf" by "+ sqrtf" in the quadratic equation solution.
Obviously you can use doubles instead of floats if you have enough processor power and memory.
// Primary parameters
float anchorA[3], anchorB[3], anchorC[3]; // XYZ coordinates of the anchors
// Derived parameters
float Da2, Db2, Dc2;
float Xab, Xbc, Xca;
float Yab, Ybc, Yca;
float Zab, Zbc, Zca;
float P, Q, R, P2, U, A;
...
inline float fsquare(float f) { return f * f; }
...
// Precompute the derived parameters - they don't change unless the anchor positions change.
Da2 = fsquare(anchorA[0]) + fsquare(anchorA[1]) + fsquare(anchorA[2]);
Db2 = fsquare(anchorB[0]) + fsquare(anchorB[1]) + fsquare(anchorB[2]);
Dc2 = fsquare(anchorC[0]) + fsquare(anchorC[1]) + fsquare(anchorC[2]);
Xab = anchorA[0] - anchorB[0];
Xbc = anchorB[0] - anchorC[0];
Xca = anchorC[0] - anchorA[0];
Yab = anchorA[1] - anchorB[1];
Ybc = anchorB[1] - anchorC[1];
Yca = anchorC[1] - anchorA[1];
Zab = anchorA[2] - anchorB[2];
Zbc = anchorB[2] - anchorC[2];
Zca = anchorC[2] - anchorA[2];
P = (  anchorB[0] * Yca
     - anchorA[0] * anchorC[1]
     + anchorA[1] * anchorC[0]
     - anchorB[1] * Xca
    ) * 2;
P2 = fsquare(P);
Q = (  anchorB[1] * Zca
     - anchorA[1] * anchorC[2]
     + anchorA[2] * anchorC[1]
     - anchorB[2] * Yca
    ) * 2;
R = - (  anchorB[0] * Zca
       + anchorA[0] * anchorC[2]
       + anchorA[2] * anchorC[0]
       - anchorB[2] * Xca
      ) * 2;
U = (anchorA[2] * P2) + (anchorA[0] * Q * P) + (anchorA[1] * R * P);
A = (P2 + fsquare(Q) + fsquare(R)) * 2;
...
// Calculate Cartesian coordinates given the distances to the anchors (La, Lb and Lc)
// First calculate PQRST such that x = (Qz + S)/P, y = (Rz + T)/P.
// P, Q and R depend only on the anchor positions, so they are pre-computed
const float S = - Yab * (fsquare(Lc) - Dc2)
                - Yca * (fsquare(Lb) - Db2)
                - Ybc * (fsquare(La) - Da2);
const float T = - Xab * (fsquare(Lc) - Dc2)
                + Xca * (fsquare(Lb) - Db2)
                + Xbc * (fsquare(La) - Da2);
// Calculate quadratic equation coefficients
const float halfB = (S * Q) - (R * T) - U;
const float C = fsquare(S) + fsquare(T) + (anchorA[1] * T - anchorA[0] * S) * P * 2 + (Da2 - fsquare(La)) * P2;
// Solve the quadratic equation for z
float z = (- halfB - sqrtf(fsquare(halfB) - A * C))/A;
// Substitute back for X and Y
float x = (Q * z + S)/P;
float y = (R * z + T)/P;
Here are the Wikipedia calculations, presented as an OpenSCAD script, which I think helps in understanding the problem in a visual way and provides an easy way to check that the results are correct.
// Trilateration example
// from Wikipedia
//
// pA, pB and pC are the centres of the spheres
// If necessary the spheres must be translated
// and rotated so that:
// -- all z values are 0
// -- pA is at the origin
pA = [0,0,0];
// -- pB is on the x axis
pB = [10,0,0];
pC = [9,7,0];
// rA , rB and rC are the radii of the spheres
rA = 9;
rB = 5;
rC = 7;
if ( pA != [0,0,0]){
    echo ("ERROR: pA must be at the origin");
    assert(false);
}
if ( (pB[2] !=0 ) || pC[2] !=0){
    echo("ERROR: all sphere centers must be in z = 0 plane");
    assert(false);
}
if (pB[1] != 0){
    echo("pB centre must be on the x axis");
    assert(false);
}
// show the spheres
module spheres(){
    translate (pA){
        sphere(r = rA, $fn = rA * 10);
    }
    translate(pB){
        sphere(r = rB, $fn = rB * 10);
    }
    translate(pC){
        sphere(r = rC, $fn = rC * 10);
    }
}
function unit_vector( v) = v / norm(v);
ex = unit_vector(pB - pA) ;
echo(ex = ex);
i = ex * ( pC - pA);
echo (i = i);
ey = unit_vector(pC - pA - i * ex);
echo (ey = ey);
d = norm(pB - pA);
echo (d = d);
j = ey * ( pC - pA);
echo (j = j);
x = (pow(rA,2) - pow(rB,2) + pow(d,2)) / (2 * d);
echo( x = x);
// size of the cube to subtract to show
// the intersection of the spheres
cube_size = [10,10,10];
if ( ((d - rA) >= rB) || ( rB >= ( d + rA)) ){
    echo ("Error Y not solvable");
}else{
    y = (( pow(rA,2) - pow(rC,2) + pow(i,2) + pow(j,2)) / (2 * j))
        - ( i / j) * x;
    echo(y = y);
    zpow2 = pow(rA,2) - pow(x,2) - pow(y,2);
    if ( zpow2 < 0){
        echo ("z not solvable");
    }else{
        z = sqrt(zpow2);
        echo (z = z);
        // subtract a cube with one of its corners
        // at the point where the spheres intersect
        difference(){
            spheres();
            translate ([x,y - cube_size[1],z]){
                cube(cube_size);
            }
        }
        translate ([x,y - cube_size[1],z]){
            %cube(cube_size);
        }
    }
}
