Radius Nearest Neighbors - get number of neighbors in radius - knn

I have coordinates. Each coordinate represents a building and gives its number of residents. I would like to count the neighbors within a radius of 0.5 km of any point. Each such point represents a base station, and I want to calculate how many subscribers the base station will cover. So what I need is the number of neighbors within the radius. In my research I found the RadiusNearestNeighborsRegression technique, but this technique never gives me the number of neighbors. It only gives the average value over all neighbors in the radius. I need either the sum of the neighbors' outputs or the number of neighbors. With the number of neighbors I could multiply it by the average output, which gives me the total output for all neighbors.
I hope I explained this clearly.

You could partition the residences into regions such that, if the minimum distance between any 2 regions is too large, there would be no need to make comparisons between any pair of residences split between the 2 regions.
Also, when you have found that A and B are close enough, A counts towards B's neighbors and B counts towards A's, without having to repeat that calculation.
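A minimal sketch of the partitioning idea, applied to the base-station question. It assumes the coordinates are planar and already expressed in kilometres (latitude/longitude would need to be projected first), and that each building is an `(x, y, residents)` tuple. The names `build_grid` and `coverage` are made up for this illustration. Buildings are bucketed into square cells, so a query only inspects the cells that can intersect the circle instead of every building.

```python
import math
from collections import defaultdict

def build_grid(buildings, cell):
    """Bucket (x, y, residents) tuples into square cells of side `cell`."""
    grid = defaultdict(list)
    for x, y, residents in buildings:
        key = (math.floor(x / cell), math.floor(y / cell))
        grid[key].append((x, y, residents))
    return grid

def coverage(grid, cell, station, radius):
    """Return (number of buildings, total residents) within `radius` of `station`."""
    sx, sy = station
    reach = math.ceil(radius / cell)  # how many cells the circle can span
    cx, cy = math.floor(sx / cell), math.floor(sy / cell)
    count = total = 0
    for i in range(cx - reach, cx + reach + 1):
        for j in range(cy - reach, cy + reach + 1):
            # Cells farther away than `reach` cannot contain a neighbor.
            for x, y, residents in grid.get((i, j), ()):
                if (x - sx) ** 2 + (y - sy) ** 2 <= radius ** 2:
                    count += 1
                    total += residents
    return count, total

# Hypothetical data: three buildings near the origin, one far away.
buildings = [(0.0, 0.0, 10), (0.3, 0.0, 5), (0.0, 0.6, 7), (2.0, 2.0, 100)]
grid = build_grid(buildings, cell=0.5)
print(coverage(grid, 0.5, station=(0.0, 0.0), radius=0.5))
```

Choosing the cell side equal to the radius keeps each query to at most a 3 x 3 block of cells.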

Related

Finding the closest, "farthest" point from a set of points in 3-dimensions

Suppose I have some n points (in my case, 4 points) in 3 dimensions. I want to determine both the point a which minimizes the squared distance to each of these n points, as well as the largest difference that can exist between the distance from an arbitrary point b and any two of these n points (i.e. the two "farthest points").
How can this be most efficiently accomplished? I know that, in 2 dimensions and with 3 points, the solution to the point that minimized distance is the centroid of the triangle formed by the 3 points, and the solution to the largest difference can be found by taking a point located precisely at one (any?) of the 3 points. It seems that the same should be true in 3 dimensions, although I am unsure.
I want to determine both the point that minimizes distance from each of these n points
The centroid minimizes the sum of the squared distances to every point in the set, but it will not minimize the maximum distance (the distance to the farthest point).
I suspect that you are interested in computing the center and radius of the minimal sphere containing every point in the set. This is a classic problem in computational geometry that can be solved in linear time quite easily in an approximate way, or exactly if you implement the algorithm proposed by Emmerich Welzl.
If the number of points is as small as 4, an approximate solution is to search for the pair of points with maximum distance (there are 6 possible pairs) and take the midpoint as the center and half the distance as the radius. Then check that the other two points are also inside the sphere, and make it grow if necessary.
See more information at
https://en.wikipedia.org/wiki/Bounding_sphere
https://en.wikipedia.org/wiki/Smallest-circle_problem
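The approximate procedure described above could be sketched as follows. This is only an illustration, not the exact Welzl algorithm: the function name `approx_bounding_sphere` is invented here, and the growing step (move the center towards the outlying point and enlarge the radius just enough to keep the old sphere and add the new point) is one common way to "make it grow", chosen as an assumption. It needs Python 3.8+ for `math.dist`.

```python
import math
from itertools import combinations

def approx_bounding_sphere(points):
    """Approximate enclosing sphere: start from the farthest pair, then grow."""
    # Farthest pair by brute force; fine for a handful of points.
    a, b = max(combinations(points, 2), key=lambda pair: math.dist(*pair))
    center = tuple((p + q) / 2 for p, q in zip(a, b))
    radius = math.dist(a, b) / 2
    for p in points:
        d = math.dist(center, p)
        if d > radius:
            # Smallest sphere containing the current sphere and p:
            # shift the center towards p and enlarge the radius.
            shift = (d - radius) / (2 * d)
            center = tuple(c + (q - c) * shift for c, q in zip(center, p))
            radius = (radius + d) / 2
    return center, radius

pts = [(0, 0, 0), (2, 0, 0), (1, 1.5, 0), (1, 0, 0)]
center, radius = approx_bounding_sphere(pts)
print(center, radius)
```

The result always contains every input point (up to floating-point rounding), but it is not guaranteed to be the minimal sphere.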
The largest difference between the distances of a point to two given points is achieved when the three points are aligned and the unknown point is "outside" (there are infinitely many solutions). In this configuration, the difference is just the distance between the two given points.
If you mean to maximize all differences simultaneously (or rather the sum of differences), you must go to infinity in some direction. That direction maximizes the sum of the lengths of the projections of all edges.

Car Racing Through Grid Algorithm Puzzle

I'm working through a problem in an algorithms book offered by a nearby university. The following problem is about graph algorithms (Kruskal's Algorithm, BFS, DFS, Prim's Algorithm). I've been working on it for a few days now and I'm stuck.
The problem is as follows:
Consider a computer game about racing cars in 2D. Your car is a pixel and the course is encoded as a set of valid pixels on an n × n screen: you’re given a 2D array where you can look up any pixel to see if it’s valid. Your objective is to get from a given start position to a given end position as fast as possible.
Here are the rules:
Time is measured in unit steps
As mentioned, you begin at some start position with zero velocity
At every time step you can modify your horizontal velocity by ±1, or keep it the same. The same holds independently for vertical velocity. So if at a particular time you are at pixel (x, y) and already have velocity (vx, vy), then at the next time step you will be at position (x + vx, y + vy), after which each component of your velocity may change by ±1 if you wish.
You’re not allowed to shoot through the end position with arbitrary velocity. You must stop there to pick up your trophy.
At every time step your car must be at a valid pixel, but also between steps you must not drive over invalid pixels. To help with this last part, you’re given a table, T, of pixel pairs, where for each pair there is a bit letting you know if it’s legal to travel from one pixel to the other in a straight line. So in constant time you can look up any pair to see if moving directly from one to the other is OK.
Formulate this game as a graph problem and describe how to find the optimal route for any given race course. What is the time complexity of your algorithm?
What I've got so far is to represent nodes in the graph as velocities (n^2 of them) and to represent edges between nodes as changes in velocity. Also, as there exist invalid locations, I was thinking there must be some way to weight the edges so as to run Kruskal's Algorithm on them.
I would suggest defining the graph as an unweighted graph, with the following definition of vertices and edges:
Its vertices are the pairs of valid pixels that represent legal straight-line travels, as defined in table T. The number of vertices thus corresponds to the size of table T, i.e. O(n^4). The vertices would include zero-distance travels as well, i.e. where the pixel pair is a repetition of the same pixel.
However, we can reduce the number of vertices by noting that there is a maximum velocity, in either direction, that can be attained when starting with a velocity of 0 and will not lead to car-crashes on the grid boundary. In one dimension the maximum velocity can only evolve from 0 to 1, to 2, to 3, in one step after the other. As the car cannot go further than n pixels in one direction, and should be able to also stop in time, we can find that the velocity in one direction should never be greater than sqrt(n).
For example, accelerating and then braking like this gives the following sum of velocities: 1+2+3+4+5+4+3+2+1 = 25. If n = 25, then the car would have traversed the whole grid (in one dimension), with a maximum velocity of 5. Any greater velocity would have led to a crash.
So the interesting pixel pairs have horizontal and vertical distances between them that are both at most sqrt(n) in absolute value. This means that for a given pixel there are O(sqrt(n)*sqrt(n)), i.e. O(n), other pixels to combine it with. As there are n^2 pixels, we have O(n^3) pairs (vertices) to consider.
Its edges connect vertices (i.e. pixel pairs) that can be travelled consecutively. As a consequence:
two connected vertices represent two pixel pairs that have (at least) one pixel in common, i.e. the pixel where the first travel unit ends and the second one starts;
These vertices have at most 9 neighbors, as during the drive the x-velocity can change by either -1, 0, or +1 (3 options), and the same holds for the y-velocity, giving 9 possibilities.
There are two special vertices in that graph:
The one that corresponds to the (start, start) pixel pair, as this represents the initial state (0 velocity at the start pixel)
The one that corresponds to the (target, target) pixel pair, as this represents the target state (0 velocity at the target pixel)
The problem can then be formulated as finding the shortest path between those two vertices.
The time complexity of the single-source shortest path problem in an unweighted graph is O(E+V), which in this particular problem is O(9V+V) = O(V) = O(n^3). This algorithm is a BFS in which you track the paths of length 1, then extend all these to length 2 (pruning where you get to an already visited vertex), etc., until you hit the target vertex. The path with which you hit it is a shortest path.
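A rough sketch of this BFS, under a few assumptions that are not in the problem statement: the course is a square list of lists `valid[y][x]`, the table T is modelled as a callable `legal(a, b)`, and each state is stored as (current pixel, last displacement), which carries the same information as a pixel pair. The name `shortest_race` is invented for the example.

```python
from collections import deque

def shortest_race(valid, legal, start, end):
    """Fewest unit travels from (start, start) to (end, end), or None."""
    n = len(valid)

    def ok(p):
        x, y = p
        return 0 <= x < n and 0 <= y < n and valid[y][x]

    if not (ok(start) and ok(end)):
        return None
    source = (start, (0, 0))   # at the start pixel with zero velocity
    target = (end, (0, 0))     # stopped at the end pixel
    dist = {source: 0}
    queue = deque([source])
    while queue:
        state = queue.popleft()
        if state == target:
            return dist[state]
        (x, y), (vx, vy) = state
        # Each velocity component may change by -1, 0 or +1: 9 neighbors.
        for dvx in (-1, 0, 1):
            for dvy in (-1, 0, 1):
                nv = (vx + dvx, vy + dvy)
                npos = (x + nv[0], y + nv[1])
                nxt = (npos, nv)
                if nxt in dist or not ok(npos) or not legal((x, y), npos):
                    continue
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return None

# Hypothetical 5 x 5 open course where every straight move is legal.
course = [[True] * 5 for _ in range(5)]
print(shortest_race(course, lambda a, b: True, (0, 0), (2, 0)))
```

The returned number counts edges in the pixel-pair graph, including the final zero-length travel that brings the car to rest on the end pixel.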

set of points with max distance

I have a set of 6000 vectors (n-dimensional). I calculated the distance matrix to collect all the distances. Suppose I want to find the 10 points that are most distant from each other, i.e. the subset whose sum of pairwise distances is the highest among all possible subsets of that size.
If I also want to apply a threshold so that no two selected points are too close, what would be the fastest algorithm to do that?
cheers
Daniele

Two salesmen - one always visits the nearest neighbour, the other the farthest

Consider this question relative to graph theory:
Let G be a complete (every vertex is connected to all the other vertices) undirected graph of size N x N. Two "salesmen" travel this way: the first always visits the nearest unvisited vertex, the second the farthest, until they have both visited all the vertices. We must generate a matrix of distances and the starting points for the two salesmen (they can be different) such that:
All the distances are unique (Edit: and positive integers)
The distance from a vertex to itself is always 0.
The difference between the total distance covered by the two salesmen must be a specific number, D.
The distance from A to B is equal to the distance from B to A
What efficient algorithms can help me here? I can only think of backtracking, but I don't see any way to reduce the work to be done by the program.
Geometry is helpful.
Using the distances of points on a circle seems like it would work. It seems you could adjust D by making the circle radius larger or smaller.
Alternatively, really any 2D shape where the distances are all different could probably be used as well. In this case you would scale the shape up or down to obtain the correct D.
Edit: Now that I think about it, the simplest solution may be to simply pick N random 2D points, say 32 bit integer coordinates to lower the chances of any distances being too close to equal. If two distances are too close, just pick a different point for one of them until it's valid.
Ideally, you'd then just need to work out a formula to determine the relationship between D and the scaling factor, which I'm not sure of offhand. If nothing else, you could also just use binary search or interpolation search or something to search for scaling factor to obtain the required D, but that's a slower method.
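To test a candidate matrix while searching for the scaling factor, both salesmen can be simulated directly. The sketch below assumes a symmetric matrix `dist` given as a list of lists, and that the tours do not return to their starting vertex; `tour_length` and `salesmen_difference` are names made up for this example. Since multiplying every distance by the same positive factor does not change which vertex is nearest or farthest, both tours stay the same and the difference simply scales by that factor.

```python
def tour_length(dist, start, choose):
    """Length of a greedy tour; `choose` is min (nearest) or max (farthest)."""
    unvisited = set(range(len(dist))) - {start}
    current, total = start, 0
    while unvisited:
        nxt = choose(unvisited, key=lambda v: dist[current][v])
        total += dist[current][nxt]
        unvisited.remove(nxt)
        current = nxt
    return total

def salesmen_difference(dist, start_near, start_far):
    """Farthest-first tour length minus nearest-first tour length."""
    return tour_length(dist, start_far, max) - tour_length(dist, start_near, min)

# Hypothetical 3-vertex matrix with unique positive integer distances.
dist = [
    [0, 1, 3],
    [1, 0, 2],
    [3, 2, 0],
]
print(salesmen_difference(dist, 0, 0))
```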

find a point closest to other points

Given N points (in 2D) with x and y coordinates, you have to find a point P (among the N given points) such that the sum of distances from the other (N-1) points to P is minimum.
For example, N points are given: p1(x1,y1), p2(x2,y2), ..., pN(xN,yN).
We have to find a point P among p1, p2, ..., pN whose sum of distances from all other points is minimum.
I used a brute-force approach, but I need a better one. I also tried finding the median, mean, etc., but that does not work in all cases.
Then I came up with the idea of treating X as the vertices of a polygon, finding the centroid of this polygon, and then choosing the point from Y nearest to the centroid. But I'm not sure whether the centroid minimizes the sum of its distances to the vertices of the polygon, so I'm not sure whether this is a good way. Is there any algorithm for solving this problem?
If your points are nicely distributed and if there are so many of them that brute force (calculating the total distance from each point to every other point) is unappealing the following might give you a good enough answer. By 'nicely distributed' I mean (approximately) uniformly or (approximately) randomly and without marked clustering in multiple locations.
Create a uniform k*k grid, where k is an odd integer, across your space. If your points are nicely distributed the one which you are looking for is (probably) in the central cell of this grid. For all the other cells in the grid count the number of points in each cell and approximate the average position of the points in each cell (either use the cell centre or calculate the average (x,y) for points in the cell).
For each point in the central cell, compute the distance to every other point in the central cell, and the weighted average distance to the points in the other cells. This will, of course, be the distance from the point to the 'average' position of points in the other cells, weighted by the number of points in the other cells.
You'll have to juggle the increased accuracy of higher values for k against the increased computational load and figure out what works best for your points. If the distribution of points across cells is far from uniform then this approach may not be suitable.
This sort of approach is quite widely used in large-scale simulations where points have properties, such as gravity and charge, which operate over distances. Whether it suits your needs, I don't know.
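A small sketch of the grid approximation described in this answer, under some assumptions of my own: points are `(x, y)` tuples, the grid spans their bounding box, each non-central cell is summarized by its point count and mean position, and if the central cell happens to be empty every point becomes a candidate. The function name `approx_min_sum_point` is hypothetical. It ranks candidates by the weighted sum of distances rather than the weighted average, which selects the same point. It needs Python 3.8+ for `math.dist`.

```python
import math
from collections import defaultdict

def approx_min_sum_point(points, k=3):
    """Pick the central-cell point with the smallest approximate distance sum."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    w = (max(xs) - x0) / k or 1.0   # cell width (guard against zero extent)
    h = (max(ys) - y0) / k or 1.0   # cell height

    def cell_of(p):
        return (min(k - 1, int((p[0] - x0) / w)),
                min(k - 1, int((p[1] - y0) / h)))

    cells = defaultdict(list)
    for p in points:
        cells[cell_of(p)].append(p)

    centre = (k // 2, k // 2)
    inner = cells.get(centre, [])
    candidates = inner or list(points)  # fallback is an assumption of this sketch

    # Summarize every other cell as (count, mean x, mean y).
    summaries = []
    for key, members in cells.items():
        if key != centre:
            m = len(members)
            summaries.append((m,
                              sum(p[0] for p in members) / m,
                              sum(p[1] for p in members) / m))

    def cost(p):
        near = sum(math.dist(p, q) for q in inner)  # exact, inside the cell
        far = sum(m * math.dist(p, (mx, my)) for m, mx, my in summaries)
        return near + far

    return min(candidates, key=cost)

pts = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5), (4, 5)]
print(approx_min_sum_point(pts, k=3))
```

For clustered or strongly non-uniform data the true answer may lie outside the central cell, in which case the brute-force comparison is the safer baseline.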
The point in consideration is known as the Geometric Median
The centroid or center of mass, defined similarly to the geometric median as minimizing the sum of the squares of the distances to each sample, can be found by a simple formula — its coordinates are the averages of the coordinates of the samples but no such formula is known for the geometric median, and it has been shown that no explicit formula, nor an exact algorithm involving only arithmetic operations and kth roots can exist in general.
I'm not sure if I understand your question but when you calculate the minimum spanning tree the sum from any point to any other point from the tree is minimum.
