I have a lot of points on the surface of the sphere.
How can I calculate the area/spot of the sphere that has the largest point density?
I need this to be done very fast. If this was a square for example I guess I could create a grid and then let the points vote which part of the grid is the best.
I have tried with transforming the points to spherical coordinates and then do a grid, both this did not work well since points around north pole are close on the sphere but distant after the transform.
Thanks
There is in fact no real reason to partition the sphere into a regular non-overlapping mesh, try this:
partition your sphere into semi-overlapping circles
see here for generating uniformly distributed points (your circle centers)
Dispersing n points uniformly on a sphere
you can identify the points in each circle very fast by a simple dot product..it really doesn't matter if some points are double counted, the circle with the most points still represents the highest density
mathematica implementation
this takes 12 seconds to analyze 5000 points. (and took about 10 minutes to write )
testcircles = { RandomReal[ {0, 1}, {3}] // Normalize};
Do[While[ (test = RandomReal[ {-1, 1}, {3}] // Normalize ;
Select[testcircles , #.test > .9 & , 1] ) == {} ];
AppendTo[testcircles, test];, {2000}];
vmax = testcircles[[First#
Ordering[-Table[
Count[ (testcircles[[i]].#) & /# points , x_ /; x > .98 ] ,
{i, Length[testcircles]}], 1]]];
To add some other, alternative schemes to the mix: it's possible to define a number of (almost) regular grids on sphere-like geometries by refining an inscribed polyhedron.
The first option is called an icosahedral grid, which is a triangulation of the spherical surface. By joining the centres of the triangles about each vertex, you can also create a dual hexagonal grid based on the underlying triangulation:
Another option, if you dislike triangles (and/or hexagons) is the cubed-sphere grid, formed by subdividing the faces of an inscribed cube and projecting the result onto the spherical surface:
In either case, the important point is that the resulting grids are almost regular -- so to evaluate the region of highest density on the sphere you can simply perform a histogram-style analysis, counting the number of samples per grid cell.
As a number of commenters have pointed out, to account for the slight irregularity in the grid it's possible to normalise the histogram counts by dividing through by the area of each grid cell. The resulting density is then given as a "per unit area" measure. To calculate the area of each grid cell there are two options: (i) you could calculate the "flat" area of each cell, by assuming that the edges are straight lines -- such an approximation is probably pretty good when the grid is sufficiently dense, or (ii) you can calculate the "true" surface areas by evaluating the necessary surface integrals.
If you are interested in performing the requisite "point-in-cell" queries efficiently, one approach is to construct the grid as a quadtree -- starting with a coarse inscribed polyhedron and refining it's faces into a tree of sub-faces. To locate the enclosing cell you can simply traverse the tree from the root, which is typically an O(log(n)) operation.
You can get some additional information regarding these grid types here.
Treating points on a sphere as 3D points might not be so bad.
Try either:
Select k, do approximate k-NN search in 3D for each point in the data or selected point of interest, then weight the result by their distance to the query point. Complexity may vary for different approximate k-NN algorithms.
Build a space-partitioning data structure like k-d Tree, then do approximate (or exact) range counting query with a ball range centered at each point in the data or selected point of interest. Complexity is O(log(n) + epsilon^(-3)) or O(epsilon^(-3)*log(n)) for each approximate range query with state of the art algorithms, where epsilon is the range error threshold w.r.t. the size of the querying ball. For exact range query, the complexity is O(n^(2/3)) for each query.
Partition the sphere into equal-area regions (bounded by parallels and meridians) as described in my answer there and count the points in each region.
The aspect ratio of the regions will not be uniform (the equatorial regions will be more "squarish" when N~M, while the polar regions will be more elongated).
This is not a problem because the diameters of the regions go to 0 as N and M increase.
The computational simplicity of this method trumps the better uniformity of domains in the other excellent answers which contain beautiful pictures.
One simple modification would be to add two "polar cap" regions to the N*M regions described in the linked answer to improve the numeric stability (when the point is very close to a pole, its longitude is not well defined). This way the aspect ratio of the regions is bounded.
You can use the Peters projection, which preserves the areas.
This will allow you to efficiently count the points in a grid, but also in a sliding window (box Parzen window) by using the integral image trick.
If I understand correctly, you are trying to find the densepoint on sphere.
if points are denser at some point
Consider Cartesian coordinates and find the mean X,Y,Z of points
Find closest point to mean X,Y,Z that is on sphere (you may consider using spherical coordinates, just extend the radius to original radius).
Constraints
If distance between mean X,Y,Z and the center is less than r/2, then this algorithm may not work as desired.
I am not master of mathematics but may be it can solve by analytical way as:
1.Short the coordinate
2.R=(Σ(n=0. n=max)(Σ(m=0. M=n)(1/A^diff_in_consecative))*angle)/Σangle
A=may any constant
This is really just an inverse of this answer of mine
just invert the equations of equidistant sphere surface vertexes to surface cell index. Don't even try to visualize the cell different then circle or you go mad. But if someone actually do it then please post the result here (and let me now)
Now just create 2D cell map and do the density computation in O(N) (like histograms are done) similar to what Darren Engwirda propose in his answer
This is how the code looks like in C++
//---------------------------------------------------------------------------
const int na=16; // sphere slices
int nb[na]; // cells per slice
const int na2=na<<1;
int map[na][na2]; // surface cells
const double da=M_PI/double(na-1); // latitude angle step
double db[na]; // longitude angle step per slice
// sherical -> orthonormal
void abr2xyz(double &x,double &y,double &z,double a,double b,double R)
{
double r;
r=R*cos(a);
z=R*sin(a);
y=r*sin(b);
x=r*cos(b);
}
// sherical -> surface cell
void ab2ij(int &i,int &j,double a,double b)
{
i=double(((a+(0.5*M_PI))/da)+0.5);
if (i>=na) i=na-1;
if (i< 0) i=0;
j=double(( b /db[i])+0.5);
while (j< 0) j+=nb[i];
while (j>=nb[i]) j-=nb[i];
}
// sherical <- surface cell
void ij2ab(double &a,double &b,int i,int j)
{
if (i>=na) i=na-1;
if (i< 0) i=0;
a=-(0.5*M_PI)+(double(i)*da);
b= double(j)*db[i];
}
// init variables and clear map
void ij_init()
{
int i,j;
double a;
for (a=-0.5*M_PI,i=0;i<na;i++,a+=da)
{
nb[i]=ceil(2.0*M_PI*cos(a)/da); // compute actual circle cell count
if (nb[i]<=0) nb[i]=1;
db[i]=2.0*M_PI/double(nb[i]); // longitude angle step
if ((i==0)||(i==na-1)) { nb[i]=1; db[i]=1.0; }
for (j=0;j<na2;j++) map[i][j]=0; // clear cell map
}
}
//---------------------------------------------------------------------------
// this just draws circle from point x0,y0,z0 with normal nx,ny,nz and radius r
// need some vector stuff of mine so i did not copy the body here (it is not important)
void glCircle3D(double x0,double y0,double z0,double nx,double ny,double nz,double r,bool _fill);
//---------------------------------------------------------------------------
void analyse()
{
// n is number of points and r is just visual radius of sphere for rendering
int i,j,ii,jj,n=1000;
double x,y,z,a,b,c,cm=1.0/10.0,r=1.0;
// init
ij_init(); // init variables and map[][]
RandSeed=10; // just to have the same random points generated every frame (do not need to store them)
// generate draw and process some random surface points
for (i=0;i<n;i++)
{
a=M_PI*(Random()-0.5);
b=M_PI* Random()*2.0 ;
ab2ij(ii,jj,a,b); // cell corrds
abr2xyz(x,y,z,a,b,r); // 3D orthonormal coords
map[ii][jj]++; // update cell density
// this just draw the point (x,y,z) as line in OpenGL so you can ignore this
double w=1.1; // w-1.0 is rendered line size factor
glBegin(GL_LINES);
glColor3f(1.0,1.0,1.0); glVertex3d(x,y,z);
glColor3f(0.0,0.0,0.0); glVertex3d(w*x,w*y,w*z);
glEnd();
}
// draw cell grid (color is function of density)
for (i=0;i<na;i++)
for (j=0;j<nb[i];j++)
{
ij2ab(a,b,i,j); abr2xyz(x,y,z,a,b,r);
c=map[i][j]; c=0.1+(c*cm); if (c>1.0) c=1.0;
glColor3f(0.2,0.2,0.2); glCircle3D(x,y,z,x,y,z,0.45*da,0); // outline
glColor3f(0.1,0.1,c ); glCircle3D(x,y,z,x,y,z,0.45*da,1); // filled by bluish color the more dense the cell the more bright it is
}
}
//---------------------------------------------------------------------------
The result looks like this:
so now just see what is in the map[][] array you can find the global/local min/max of density or whatever you need... Just do not forget that the size is map[na][nb[i]] where i is the first index in array. The grid size is controlled by na constant and cm is just density to color scale ...
[edit1] got the Quad grid which is far more accurate representation of used mapping
this is with na=16 the worst rounding errors are on poles. If you want to be precise then you can weight density by cell surface size. For all non pole cells it is simple quad. For poles its triangle fan (regular polygon)
This is the grid draw code:
// draw cell quad grid (color is function of density)
int i,j,ii,jj;
double x,y,z,a,b,c,cm=1.0/10.0,mm=0.49,r=1.0;
double dx=mm*da,dy;
for (i=1;i<na-1;i++) // ignore poles
for (j=0;j<nb[i];j++)
{
dy=mm*db[i];
ij2ab(a,b,i,j);
c=map[i][j]; c=0.1+(c*cm); if (c>1.0) c=1.0;
glColor3f(0.2,0.2,0.2);
glBegin(GL_LINE_LOOP);
abr2xyz(x,y,z,a-dx,b-dy,r); glVertex3d(x,y,z);
abr2xyz(x,y,z,a-dx,b+dy,r); glVertex3d(x,y,z);
abr2xyz(x,y,z,a+dx,b+dy,r); glVertex3d(x,y,z);
abr2xyz(x,y,z,a+dx,b-dy,r); glVertex3d(x,y,z);
glEnd();
glColor3f(0.1,0.1,c );
glBegin(GL_QUADS);
abr2xyz(x,y,z,a-dx,b-dy,r); glVertex3d(x,y,z);
abr2xyz(x,y,z,a-dx,b+dy,r); glVertex3d(x,y,z);
abr2xyz(x,y,z,a+dx,b+dy,r); glVertex3d(x,y,z);
abr2xyz(x,y,z,a+dx,b-dy,r); glVertex3d(x,y,z);
glEnd();
}
i=0; j=0; ii=i+1; dy=mm*db[ii];
ij2ab(a,b,i,j); c=map[i][j]; c=0.1+(c*cm); if (c>1.0) c=1.0;
glColor3f(0.2,0.2,0.2);
glBegin(GL_LINE_LOOP);
for (j=0;j<nb[ii];j++) { ij2ab(a,b,ii,j); abr2xyz(x,y,z,a-dx,b-dy,r); glVertex3d(x,y,z); }
glEnd();
glColor3f(0.1,0.1,c );
glBegin(GL_TRIANGLE_FAN); abr2xyz(x,y,z,a ,b ,r); glVertex3d(x,y,z);
for (j=0;j<nb[ii];j++) { ij2ab(a,b,ii,j); abr2xyz(x,y,z,a-dx,b-dy,r); glVertex3d(x,y,z); }
glEnd();
i=na-1; j=0; ii=i-1; dy=mm*db[ii];
ij2ab(a,b,i,j); c=map[i][j]; c=0.1+(c*cm); if (c>1.0) c=1.0;
glColor3f(0.2,0.2,0.2);
glBegin(GL_LINE_LOOP);
for (j=0;j<nb[ii];j++) { ij2ab(a,b,ii,j); abr2xyz(x,y,z,a-dx,b+dy,r); glVertex3d(x,y,z); }
glEnd();
glColor3f(0.1,0.1,c );
glBegin(GL_TRIANGLE_FAN); abr2xyz(x,y,z,a ,b ,r); glVertex3d(x,y,z);
for (j=0;j<nb[ii];j++) { ij2ab(a,b,ii,j); abr2xyz(x,y,z,a-dx,b+dy,r); glVertex3d(x,y,z); }
glEnd();
the mm is the grid cell size mm=0.5 is full cell size , less creates a space between cells
If you want a radial region of the greatest density, this is the robust disk covering problem with k = 1 and dist(a, b) = great circle distance (a, b) (see https://en.wikipedia.org/wiki/Great-circle_distance)
https://www4.comp.polyu.edu.hk/~csbxiao/paper/2003%20and%20before/PDCS2003.pdf
Consider using a geographic method to solve this. GIS tools, geography data types in SQL, etc. all handle curvature of a spheroid. You might have to find a coordinate system that uses a pure sphere instead of an earthlike spheroid if you are not actually modelling something on Earth.
For speed, if you have large numbers of points and want the densest location of them, a raster heatmap type solution might work well. You could create low resolution rasters, then zoom to areas of high density and create higher resolution only cells that you care about.
I have upto 10,000 randomly positioned points in a space and i need to be able to tell which the cursor is closest to at any given time. To add some context, the points are in the form of a vector drawing, so they can be constantly and quickly added and removed by the user and also potentially be unbalanced across the canvas space..
I am therefore trying to find the most efficient data structure for storing and querying these points. I would like to keep this question language agnostic if possible.
After the Update to the Question
Use two Red-Black Tree or Skip_list maps. Both are compact self-balancing data structures giving you O(log n) time for search, insert and delete operations. One map will use X-coordinate for every point as a key and the point itself as a value and the other will use Y-coordinate as a key and the point itself as a value.
As a trade-off I suggest to initially restrict the search area around the cursor by a square. For perfect match the square side should equal to diameter of your "sensitivity circle” around the cursor. I.e. if you’re interested only in a nearest neighbour within 10 pixel radius from the cursor then the square side needs to be 20px. As an alternative, if you’re after nearest neighbour regardless of proximity you might try finding the boundary dynamically by evaluating floor and ceiling relative to cursor.
Then retrieve two subsets of points from the maps that are within the boundaries, merge to include only the points within both sub sets.
Loop through the result, calculate proximity to each point (dx^2+dy^2, avoid square root since you're not interested in the actual distance, just proximity), find the nearest neighbour.
Take root square from the proximity figure to measure the distance to the nearest neighbour, see if it’s greater than the radius of the “sensitivity circle”, if it is it means there is no points within the circle.
I suggest doing some benchmarks every approach; it’s two easy to go over the top with optimisations. On my modest hardware (Duo Core 2) naïve single-threaded search of a nearest neighbour within 10K points repeated a thousand times takes 350 milliseconds in Java. As long as the overall UI re-action time is under 100 milliseconds it will seem instant to a user, keeping that in mind even naïve search might give you sufficiently fast response.
Generic Solution
The most efficient data structure depends on the algorithm you’re planning to use, time-space trade off and the expected relative distribution of points:
If space is not an issue the most efficient way may be to pre-calculate the nearest neighbour for each point on the screen and then store nearest neighbour unique id in a two-dimensional array representing the screen.
If time is not an issue storing 10K points in a simple 2D array and doing naïve search every time, i.e. looping through each point and calculating the distance may be a good and simple easy to maintain option.
For a number of trade-offs between the two, here is a good presentation on various Nearest Neighbour Search options available: http://dimacs.rutgers.edu/Workshops/MiningTutorial/pindyk-slides.ppt
A bunch of good detailed materials for various Nearest Neighbour Search algorithms: http://simsearch.yury.name/tutorial.html, just pick one that suits your needs best.
So it's really impossible to evaluate the data structure is isolation from algorithm which in turn is hard to evaluate without good idea of task constraints and priorities.
Sample Java Implementation
import java.util.*;
import java.util.concurrent.ConcurrentSkipListMap;
class Test
{
public static void main (String[] args)
{
Drawing naive = new NaiveDrawing();
Drawing skip = new SkipListDrawing();
long start;
start = System.currentTimeMillis();
testInsert(naive);
System.out.println("Naive insert: "+(System.currentTimeMillis() - start)+"ms");
start = System.currentTimeMillis();
testSearch(naive);
System.out.println("Naive search: "+(System.currentTimeMillis() - start)+"ms");
start = System.currentTimeMillis();
testInsert(skip);
System.out.println("Skip List insert: "+(System.currentTimeMillis() - start)+"ms");
start = System.currentTimeMillis();
testSearch(skip);
System.out.println("Skip List search: "+(System.currentTimeMillis() - start)+"ms");
}
public static void testInsert(Drawing d)
{
Random r = new Random();
for (int i=0;i<100000;i++)
d.addPoint(new Point(r.nextInt(4096),r.nextInt(2048)));
}
public static void testSearch(Drawing d)
{
Point cursor;
Random r = new Random();
for (int i=0;i<1000;i++)
{
cursor = new Point(r.nextInt(4096),r.nextInt(2048));
d.getNearestFrom(cursor,10);
}
}
}
// A simple point class
class Point
{
public Point (int x, int y)
{
this.x = x;
this.y = y;
}
public final int x,y;
public String toString()
{
return "["+x+","+y+"]";
}
}
// Interface will make the benchmarking easier
interface Drawing
{
void addPoint (Point p);
Set<Point> getNearestFrom (Point source,int radius);
}
class SkipListDrawing implements Drawing
{
// Helper class to store an index of point by a single coordinate
// Unlike standard Map it's capable of storing several points against the same coordinate, i.e.
// [10,15] [10,40] [10,49] all can be stored against X-coordinate and retrieved later
// This is achieved by storing a list of points against the key, as opposed to storing just a point.
private class Index
{
final private NavigableMap<Integer,List<Point>> index = new ConcurrentSkipListMap <Integer,List<Point>> ();
void add (Point p,int indexKey)
{
List<Point> list = index.get(indexKey);
if (list==null)
{
list = new ArrayList<Point>();
index.put(indexKey,list);
}
list.add(p);
}
HashSet<Point> get (int fromKey,int toKey)
{
final HashSet<Point> result = new HashSet<Point> ();
// Use NavigableMap.subMap to quickly retrieve all entries matching
// search boundaries, then flatten resulting lists of points into
// a single HashSet of points.
for (List<Point> s: index.subMap(fromKey,true,toKey,true).values())
for (Point p: s)
result.add(p);
return result;
}
}
// Store each point index by it's X and Y coordinate in two separate indices
final private Index xIndex = new Index();
final private Index yIndex = new Index();
public void addPoint (Point p)
{
xIndex.add(p,p.x);
yIndex.add(p,p.y);
}
public Set<Point> getNearestFrom (Point origin,int radius)
{
final Set<Point> searchSpace;
// search space is going to contain only the points that are within
// "sensitivity square". First get all points where X coordinate
// is within the given range.
searchSpace = xIndex.get(origin.x-radius,origin.x+radius);
// Then get all points where Y is within the range, and store
// within searchSpace the intersection of two sets, i.e. only
// points where both X and Y are within the range.
searchSpace.retainAll(yIndex.get(origin.y-radius,origin.y+radius));
// Loop through search space, calculate proximity to each point
// Don't take square root as it's expensive and really unneccessary
// at this stage.
//
// Keep track of nearest points list if there are several
// at the same distance.
int dist,dx,dy, minDist = Integer.MAX_VALUE;
Set<Point> nearest = new HashSet<Point>();
for (Point p: searchSpace)
{
dx=p.x-origin.x;
dy=p.y-origin.y;
dist=dx*dx+dy*dy;
if (dist<minDist)
{
minDist=dist;
nearest.clear();
nearest.add(p);
}
else if (dist==minDist)
{
nearest.add(p);
}
}
// Ok, now we have the list of nearest points, it might be empty.
// But let's check if they are still beyond the sensitivity radius:
// we search area we have evaluated was square with an side to
// the diameter of the actual circle. If points we've found are
// in the corners of the square area they might be outside the circle.
// Let's see what the distance is and if it greater than the radius
// then we don't have a single point within proximity boundaries.
if (Math.sqrt(minDist) > radius) nearest.clear();
return nearest;
}
}
// Naive approach: just loop through every point and see if it's nearest.
class NaiveDrawing implements Drawing
{
final private List<Point> points = new ArrayList<Point> ();
public void addPoint (Point p)
{
points.add(p);
}
public Set<Point> getNearestFrom (Point origin,int radius)
{
int prevDist = Integer.MAX_VALUE;
int dist;
Set<Point> nearest = Collections.emptySet();
for (Point p: points)
{
int dx = p.x-origin.x;
int dy = p.y-origin.y;
dist = dx * dx + dy * dy;
if (dist < prevDist)
{
prevDist = dist;
nearest = new HashSet<Point>();
nearest.add(p);
}
else if (dist==prevDist) nearest.add(p);
}
if (Math.sqrt(prevDist) > radius) nearest = Collections.emptySet();
return nearest;
}
}
I would like to suggest creating a Voronoi Diagram and a Trapezoidal Map (Basically the same answer as I gave to this question). The Voronoi Diagram will partition the space in polygons. Every point will have a polygon describing all points that are closest to it.
Now when you get a query of a point, you need to find in which polygon it lies. This problem is called Point Location and can be solved by constructing a Trapezoidal Map.
The Voronoi Diagram can be created using Fortune's algorithm which takes O(n log n) computational steps and costs O(n) space.
This website shows you how to make a trapezoidal map and how to query it. You can also find some bounds there:
Expected creation time: O(n log n)
Expected space complexity: O(n) But
most importantly, expected query
time: O(log n).
(This is (theoretically) better than O(√n) of the kD-tree.)
Updating will be linear (O(n)) I think.
My source(other than the links above) is: Computational Geometry: algorithms and applications, chapters six and seven.
There you will find detailed information about the two data structures (including detailed proofs). The Google books version only has a part of what you need, but the other links should be sufficient for your purpose. Just buy the book if you are interested in that sort of thing (it's a good book).
The most efficient data structure would be a kd-tree link text
Are the points uniformly distributed?
You could build a quad-tree up to a certain depth, say, 8. At the top you have a tree node that divides the screen into four quadrants. Store at each node:
The top left and the bottom right coordinate
Pointers to four child nodes, which divide the node into four quadrants
Build the tree up to a depth of 8, say, and at the leaf nodes, store a list of points associated with that region. That list you can search linearly.
If you need more granularity, build the quad-tree to a greater depth.
It depends on the frequency of updates and query. For fast query, slow updates, a Quadtree (which is a form of jd-tree for 2-D) would probably be best. Quadtree are very good for non-uniform point too.
If you have a low resolution you could consider using a raw array of width x height of pre-computed values.
If you have very few points or fast update, a simple array is enough, or may be a simple partitioning (which goes toward the quadtree).
So the answer depends on parameters of you dynamics. Also I would add that nowadays the algo isn't everything; making it use multiple processors or CUDA can give a huge boost.
You haven't specified the dimensions of you points, but if it's a 2D line drawing then a bitmap bucket - a 2D array of lists of points in a region, where you scan the buckets corresponding to and near to a cursor can perform very well. Most systems will happily handle bitmap buckets of the 100x100 to 1000x1000 order, the small end of which would put a mean of one point per bucket. Although asymptotic performance is O(N), real-world performance is typically very good. Moving individual points between buckets can be fast; moving objects around can also be made fast if you put the objects into the buckets rather than the points ( so a polygon of 12 points would be referenced by 12 buckets; moving it becomes 12 times the insertion and removal cost of the bucket list; looking up the bucket is constant time in the 2D array ). The major cost is reorganising everything if the canvas size grows in many small jumps.
If it is in 2D, you can create a virtual grid covering the whole space (width and height are up to your actual points space) and find all the 2D points which belong to every cell. After that a cell will be a bucket in a hashtable.