What is most efficient way of finding N nearest turtles? - performance

In Netlogo (5.3.* or 6.*) I need an efficient way of one turtle to find the agentset of the N nearest (in terms of Cartesian distance).
The obvious code:
min-n-of N turtles [distance myself]
involves NetLogo calculating the distance to all other turtles, so O(N).
if one knows enough turtles are on the local patch one can do
min-n-of N turtles-here [distance myself]
but testing if there are enough on the patch takes time

Related

find best k-means from a list of candidates

I have an array of size n of points called A, and a candidate array of size O(k)>k called S. I want to find k points in S such that the sum of squared distances from the points of A to their closest point from the k points would be minimized. One way to do it would be to check the cost of any possible k points in S and take the minimum, but that would take O(k^k*n) time, is there any more efficient way to do it?
I need either an optimal solution or a constant approximation.
The reason I need this is that I'm trying to find a constant approximation for the k-means as fast as possible and later use this for a coreset construction (coreset=data minimization while still keeping the cost of any query approximately the same). I was able to show that if we assume that in the optimal clustering each cluster has omega(n/k) points we can create pretty fast a list of size O(k) canidates that contains inside of them a 3-approximation for the k-means, so I was wondering if we can find those k points or a constant approximation for their costs in time which is faster than exhaustive search.
Example for k=2
In this example S is the green dots and A is the red dots. The algorithm should return the 2 circled points from S since they minimize the sum of squared distances from the points of A to their closest point of the 2.
I have an array of size n of points called A, and a candidate array of size O(k)>k called S. I want to find k points in S such that the sum of squared distances from the points of A to their closest point from the k points would be minimized.
It sounds like this could be solved simply by checking the N points against the K points to find the k points in N with the smallest squared distance.
Therefore, I'm now fairly sure this is actually finding the k-nearest neighbors (K-NN as a computational geometry problem, not the pattern recognition definition) in the N points for each point in the K points and not actually k-means.
For higher dimensionality, it is often useful to also consider the dimensionality, D in the algorithm.
The algorithm mentioned is indeed O(NDk^2) then when considering K-NN instead. That can be improved to O(NDk) by using Quickselect algorithm on the distances. This allows for checking the list of N points against each of the K points in O(N) to find the nearest k points.
https://en.wikipedia.org/wiki/Quickselect
Edit:
Seems there is some confusion on quickselect and if it can be used. Here is a O(DkNlogN) solution that uses a standard sort O(NlogN) instead of quickselect O(N). Though this might be faster in practice and as you can see in most languages it's pretty easy to implement.
results = {}
for y in F:
def distanceSquared(x):
distance(x,y) # Custom distance for each y
# First k sorted by distanceSquared
results[y] = S.sort(key=distanceSquared)[:k]
return results
Update for new visual
# Build up distance sums O(A*N*D)
results = {}
for y in F:
def distanceSquared(x):
distance(x,y) # Custom distance for each y
# Sum of distance squared from y for all points in S
results[y] = sum(map(distanceSquared, S))
def results_key_value(key):
results[key]
# First k results sorted by key O(D*AlogA)
results.keys().sort(key=results_key_value)[:k]
You could approximate by only considering Z random points chosen from the S points. Alternatively, you could merge points in S if they are close enough together. This could reduce S to a much smaller size as long S remains about F^2 or larger in size, it shouldn't affect which points in F are chosen too much. Though you would also need to adjust the weight of the points to handle that better. IE: the square distance of a point that represents 10 points is multiplied by 10 to account for it acting as 10 points instead of just 1.

Hill climbing in an n-dimensional space: finding the neighbor

In hill climbing for 1 dimension, I try two neighbors - a small delta to the left and one to the right of my current point, and then keep the one that gives a higher value of the objective function. How do I extend it to an n-dimensional space? How does one define a neighbor for an n-dimensional space? Do I have to try 2^n neighbors (a delta applied to each of the dimension)?
You don't need to compare each pair of neighbors, you need to compute a set of neighbors, e.g. on a circle (sphere/ hypersphere in a higher dimensions) with a radius of delta, and then take the one with the highest values to "climb up". In any case you will discretize the neighborhood of your current solution and compute the score function for each neighbor. When you can differentiate your function, than, Gradient ascent/descent based algorithms may solve your problem:
1) Compute the gradient (direction of steepest ascent)
2) Go a small step into the direction of the gradient
3) Stop if solution does not change
A common problem with those algorithms is, that you often only find local maxima / minima. You can find a great overview on gradient descent/ascent algorithms here: http://sebastianruder.com/optimizing-gradient-descent/
If you are using IEEE-754 floating point numbers then the obvious answer is something like (2^52*(log_2(delta)+1023))^(n-1)+1 if delta>=2^(-1022) (more or less depending on your search space...) as that is the only way you can be certain that there are no more neighboring solutions with a distance of delta.
Even assuming you instead take a random fixed size sample of all points within a given distance of delta, lets say delta=.1, you would still have the problem that if the distance from the local optimum was .0001 the probability of finding an improvement in just 1 dimension would be less than .0001/.1/2=0.05% so you would need to take more and more random samples as you get closer to the local optimum (of which you don't know the value...).
Obviously hill climbing is not intended for the real number space or theoretical graph spaces with infinite degree. You should instead be using a global search algorithm.
One example of a multidimensional search algorithm which needs only O(n) neighbours instead of O(2^n) neighbours is the Torczon simplex method described in Multidirectional search: A direct search algorithm for parallel machines (1989). I chose this over the more widely known Nelder-Mead method because the Torczon simplex method has a convergence proof (convergence to a local optimum given some reasonable conditions).

Optimisation to minimse distance between nearest neighbours Matlab

I have identified nearest neighbours amongst a population. I wish to assign a vector of weights to the population such that the difference in weights between nearest neighbours is minimised via an optimisation. I have built:
fun=#(x)sum(nthroot(x-logicalmatrix*x),2)
A=ones(1,height(Population));
b=1;
Aeq=A;
beq=1; % Solution should sum to 1
lb=zeros(height(Population),1); % Lower Bounds
ub=ones(height(Population),1); % UpperBounds
[opt_combinedfun,~,residualCombfun]=fmincon(fun,lb,A,b,Aeq,beq,lb,ub,[],options);
However, although sometimes it returns a solution within bounds it does not appear optimal. The 'logicalmatrix' is a n x n logical matrix identifying the nearest neighbours. The problem is that logicalmatrix is singular which causes the optimisation to return:
Warning: Matrix is close to singular or badly scaled.
Results may be inaccurate.
is fmincon the wrong function to use? Or is there a way to get around the singularity? A more robust way to achieve this optimisation?

find a point closest to other points

Given N points(in 2D) with x and y coordinates. You have to find a point P (in N given points) such that the sum of distances from other(N-1) points to P is minimum.
for ex. N points given p1(x1,y1),p2(x2,y2) ...... pN(xN,yN).
we have find a point P among p1 , p2 .... PN whose sum of distances from all other points is minimum.
I used brute force approach , but I need a better approach. I also tried by finding median, mean etc. but it is not working for all cases.
then I came up with an idea that I would treat X as a vertices of a polygon and find centroid of this polygon, and then I will choose a point from Y nearest to the centroid. But I'm not sure whether centroid minimizes sum of its distances to the vertices of polygon, so I'm not sure whether this is a good way? Is there any algorithm for solving this problem?
If your points are nicely distributed and if there are so many of them that brute force (calculating the total distance from each point to every other point) is unappealing the following might give you a good enough answer. By 'nicely distributed' I mean (approximately) uniformly or (approximately) randomly and without marked clustering in multiple locations.
Create a uniform k*k grid, where k is an odd integer, across your space. If your points are nicely distributed the one which you are looking for is (probably) in the central cell of this grid. For all the other cells in the grid count the number of points in each cell and approximate the average position of the points in each cell (either use the cell centre or calculate the average (x,y) for points in the cell).
For each point in the central cell, compute the distance to every other point in the central cell, and the weighted average distance to the points in the other cells. This will, of course, be the distance from the point to the 'average' position of points in the other cells, weighted by the number of points in the other cells.
You'll have to juggle the increased accuracy of higher values for k against the increased computational load and figure out what works best for your points. If the distribution of points across cells is far from uniform then this approach may not be suitable.
This sort of approach is quite widely used in large-scale simulations where points have properties, such as gravity and charge, which operate over distances. Whether it suits your needs, I don't know.
The point in consideration is known as the Geometric Median
The centroid or center of mass, defined similarly to the geometric median as minimizing the sum of the squares of the distances to each sample, can be found by a simple formula — its coordinates are the averages of the coordinates of the samples but no such formula is known for the geometric median, and it has been shown that no explicit formula, nor an exact algorithm involving only arithmetic operations and kth roots can exist in general.
I'm not sure if I understand your question but when you calculate the minimum spanning tree the sum from any point to any other point from the tree is minimum.

algorithm to find a point among n points in plane to minimize the sum of distances

I have an algorithm problem here. It is different from the normal Fermat Point problem.
Given a set of n points in the plane, I need to find which one can minimize the sum of distances to the rest of n-1 points.
Is there any algorithm you know of run less than O(n^2)?
Thank you.
One solution is to assume median is close to the mean and for a subset of points close to the mean exhaustively calculate sum of distances. You can choose klog(n) points closest to the mean, where k is an arbitrarily chosen constant (complexity nlog(n)).
Another possible solution is Delaunay Triangulation. This triangulation is possible in O(nlogn) time. The triangulation results in a graph with one vertex for each point and edges to satisfy delauney triangulation.
Once you have the triangulation, you can start at any point and compare sum-of-distances of that point to its neighbors and keep moving iteratively. You can stop when the current point has the minimum sum-of-distance compared to its neighbors. Intuitively, this will halt at the global optimal point.
I think the underlying assumption here is that you have a dataset of points which you can easily bound, as many algorithms which would be "good enough" in practice may not be rigorous enough for theory and/or may not scale well for arbitrarily large solutions.
A very simple solution which is probably "good enough" is to sort the coordinates on the Y ordinate, then do a stable sort on the X ordinate.
Take the rectangle defined by the min(X,Y) and max(X,Y) values, complexity O(1) as the values will be at known locations in the sorted dataset.
Now, working from the center of your sorted dataset, find coordinate values as close as possible to {Xctr = Xmin + (Xmax - Xmin) / 2, Yctr = Ymin + (Ymax - Ymin) / 2} -- complexity O(N) bounded by your minimization criteria, distance being the familiar radius from {Xctr,Yctr}.
The worst case complexity would be comparing your centroid to every other point, but once you get away from the middle points you will not be improving the global optimal and should terminate the search.

Resources