Matching massive amounts of coordinates from two lists - algorithm

I'm searching for a more efficient algorithm to match coordinates between two lists.
Given are two lists of lat/long values. My goal is to find, for every coordinate in the first list, all matching coordinates from the other list within a given radius, for example 500 meters.
Right now it's brute forced with two nested for loops: for every pair of coordinates, calculate the distance and check whether it's within my radius. But that gives me a complexity of O(n²).
To improve this, my first idea would be to do something similar to a hash map:
Classify the first list into bigger "fields" by cutting off some decimals at the end. An example would be:
lat: 44.7261 long: 8.2831 -> lat: 44.72 long: 8.28
lat: 43.8102 long: 9.7612 -> lat: 43.81 long: 9.76
lat: 44.7281 long: 8.2899 -> lat: 44.72 long: 8.28
This creates "groups" of coordinates.
Now I only need to iterate once over the second list, look up which group a specific coordinate lies in, and do the calculation with all coordinates in that group.
Visually, you could describe the idea as creating squares on the map that act as my hashes: first look up which square the current coordinate lies in, then compare all coordinates in that square with the current one.
This way I can reduce the complexity from O(n²) to O(n + m * average_group_size).
If a coordinate is at the border of a group, I'll need to check the neighbours of that group too.
But somehow I believe there is a more efficient way to match these two lists. I was looking for algorithms that treat this kind of problem, but my Google searches weren't successful.
Thank you very much :)

Your algorithm is pretty good, but the best size for your groups is smaller than you seem to be guessing, and that means you're doing too many comparisons.
Instead of just cutting off a few decimal places, you should divide the points into squares that are the same size as your radius.
Then each point is compared only with the points in its own group and the 8 neighboring groups.
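A minimal sketch of that scheme in Java, assuming the points have already been projected to planar meters (the Point record, the projection, and all names here are illustrative, not from the answer):

```java
import java.util.*;

// Sketch of the grid idea: the cell size equals the search radius, so every
// match for a query point lies in its own cell or one of the 8 neighbours.
// Assumes the points are already projected to planar meters; the Point
// record and that projection are placeholders, not part of the answer.
class GridMatcher {
    record Point(double x, double y) {}

    private final double cell;                          // cell side = radius
    private final Map<Long, List<Point>> buckets = new HashMap<>();

    GridMatcher(double radiusMeters, List<Point> firstList) {
        this.cell = radiusMeters;
        for (Point p : firstList) {
            buckets.computeIfAbsent(key(cellX(p), cellY(p)), k -> new ArrayList<>()).add(p);
        }
    }

    private long cellX(Point p) { return (long) Math.floor(p.x() / cell); }
    private long cellY(Point p) { return (long) Math.floor(p.y() / cell); }

    // Pack both cell indices into one long to use as a HashMap key.
    private static long key(long cx, long cy) { return (cx << 32) | (cy & 0xffffffffL); }

    // All registered points within `cell` meters of q: scan the 3x3 block
    // of cells around q and keep only true hits.
    List<Point> matches(Point q) {
        List<Point> result = new ArrayList<>();
        for (long dx = -1; dx <= 1; dx++) {
            for (long dy = -1; dy <= 1; dy++) {
                List<Point> b = buckets.get(key(cellX(q) + dx, cellY(q) + dy));
                if (b == null) continue;
                for (Point p : b) {
                    double ex = p.x() - q.x(), ey = p.y() - q.y();
                    if (ex * ex + ey * ey <= cell * cell) result.add(p);
                }
            }
        }
        return result;
    }
}
```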

A common optimization for this kind of thing is to pre-process your array of points and create a two-dimensional array of "buckets", with each bucket holding a list of points. One dimension is the latitude and the other is the longitude. If you want a granularity of 500 meters, then each bucket represents a 500x500 meter square.
You'll need a way to map a lat/lon value to an x/y value for your matrix. You decide what lat/lon will correspond to your 0,0 matrix square. Then, to compute the bucket for any point, you subtract the offset (the lat/lon of 0,0) and convert the latitude and longitude differences to meters. Then divide each by 500 and put the point in the resulting bucket.
This gets a little tricky, of course, because the distance between degrees of longitude depends on the latitude, as described in https://gis.stackexchange.com/questions/142326/calculating-longitude-length-in-miles.
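As a rough sketch of that mapping (the ~111 km-per-degree constant is a common approximation, and the helper below is an assumption, not code from the answer):

```java
// Rough sketch of the lat/lon -> bucket mapping (an assumed helper, not code
// from the answer). Uses the approximation that one degree of latitude is
// about 111,320 m, and one degree of longitude is that times cos(latitude).
static int[] toBucket(double lat, double lon,
                      double originLat, double originLon, double bucketMeters) {
    double metersNorth = (lat - originLat) * 111_320.0;
    double metersEast  = (lon - originLon) * 111_320.0 * Math.cos(Math.toRadians(lat));
    int row = (int) Math.floor(metersNorth / bucketMeters);
    int col = (int) Math.floor(metersEast  / bucketMeters);
    return new int[] { row, col };
}
```

Because of the cos(latitude) factor, the buckets are only approximately square and drift slightly across a large latitude range; for city- or country-sized data this is usually close enough.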
Now, when somebody says "Give me all the points within 500 meters of Austin", you can get the lat/lon of Austin, convert to bucket coordinates as described above, and then compare that with all the points from that bucket and the 8 surrounding buckets.
The size of the array is the range of latitude, converted to meters and divided by 500, multiplied by the range of longitude, also converted to meters and divided by 500.
The Earth's circumference of approximately 40,100 km gives you an estimated maximum size for this array: 80,200 x 80,200, or about 6.4 billion buckets if you want your buckets to be 500 meters. If you want to cover that large a range, you'll probably want to use a sparse matrix representation.

Related

how to count (ONLY ONCE) a common latitude or longitude lying on the edge between two blocks of a grid

I built a grid that covers the US map. The grid consists of latitudes and longitudes that define small rectangles inside the US map. A small rectangle contains many latitudes and longitudes.
From a data set that contains the population, I could place each piece of data on the US grid based on its latitude and longitude.
My goal is to let the user enter a queryBorders (inside the grid) to get an accurate population count.
The problem is that some points' (or population's) latitudes and longitudes lie on the border between two neighboring blocks, which causes these points to be counted more than once. This gives inaccurate results.
In the illustration above, how do I get the accurate result (excluding the repeated points) for the queryBorders (a,b,c,d)?
(getting help from "How to scale down a range of numbers with a known min and max value" to get space for each block!)
Thanks in advance.
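One common convention that addresses exactly this (an assumption here, not something stated in the question) is to make every grid cell half-open, so a point on a shared border belongs to exactly one cell:

```java
// Sketch of half-open cell assignment (names are illustrative): each cell
// covers [k * cellSize, (k + 1) * cellSize) per dimension, so a point on a
// shared edge falls into exactly one cell and is counted exactly once.
static int cellIndex(double coordinate, double gridMin, double cellSize) {
    return (int) Math.floor((coordinate - gridMin) / cellSize);
}
```

Applying the same half-open rule to the query borders themselves (count a point (x, y) for borders (a, b, c, d) only if a <= x < c and b <= y < d) keeps adjacent queries from counting a border point twice.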

How to know if two sets of points have the same pattern

I have 2 sets of points in 3D with the same count, and I want to know if they have the same pattern. I thought I might project them onto the XZ, XY and YZ planes and then compare the projections in each plane, but I am not sure how to do this. I thought the convex hull might help, but it won't be accurate.
Is there an easy algorithm to do that? Complexity is not a big issue so far, as the point count will be tiny. I'm implementing in Java.
Can I solve this directly in 3D with the same algorithm?
The attached image shows an example of what I mean.
Edit:
No guarantee for order.
No scale, there are rotation and translation only.
I would gather some information about each point: information that only depends on "shape", not on the actual translation/rotation. For instance, it could be the sum of all the distances between the point and any other point of the shape. Or it could be the largest angle between any two points, as seen from the point under consideration. Choose whatever metric brings the most diversity.
Then sort the points by that metric.
Do the above for both groups of points.
As a first step you can compare both groups by their sorted list of metrics. Allow for a little error margin, since you will be dealing with floating point precision limitations. If they cannot be mapped to each other, abort the algorithm: they are different shapes.
Now translate the point set so that the first point in the ordered list is mapped to the origin (0, 0, 0), i.e. subtract the first point from all points in the group.
Now rotate the point set around the Y axis, so that the second point in the ordered list lies in the XY plane. Then rotate the point set around the Z axis, so that that point lies on the X axis: it should map to (d, 0, 0), where d is the distance between the first and second points in the sorted list.
Finally, rotate the point set around the X axis, so that the third point in the ordered list lies in the XY plane. If that point is collinear with the previous points, continue with the next point(s) until you have rotated a non-collinear point into the plane.
Do this with both groups of points. Then compare the so-transformed coordinates of both lists.
This is the main algorithm, but I have omitted the cases where the metric value is the same for two points, so the sorted list could be permuted without breaking the sort order:
In that case you need to perform the above transformations with the different permutations of those equally valued points at the start of the sorted list, for as long as there is no fit.
Also, while checking the fit, take into account that a matching point may not sit at the exact same position in the other group's sorted list; you should check the neighbouring points that have the same metric as well.
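A sketch of just the first step in Java (the sum-of-distances metric and sorted comparison; the tolerance EPS is an arbitrary placeholder):

```java
import java.util.Arrays;

// Sketch of the first step: a translation/rotation-invariant metric per point
// (the sum of its distances to all other points), sorted and compared with a
// tolerance. This is a necessary, not sufficient, condition for a match.
class ShapeSignature {
    static final double EPS = 1e-9;   // arbitrary placeholder tolerance

    // points[i] = {x, y, z}
    static double[] sortedMetrics(double[][] points) {
        double[] metric = new double[points.length];
        for (int i = 0; i < points.length; i++) {
            for (double[] other : points) {
                double dx = points[i][0] - other[0];
                double dy = points[i][1] - other[1];
                double dz = points[i][2] - other[2];
                metric[i] += Math.sqrt(dx * dx + dy * dy + dz * dz);
            }
        }
        Arrays.sort(metric);
        return metric;
    }

    static boolean metricsMatch(double[][] a, double[][] b) {
        if (a.length != b.length) return false;
        double[] ma = sortedMetrics(a), mb = sortedMetrics(b);
        for (int i = 0; i < ma.length; i++) {
            if (Math.abs(ma[i] - mb[i]) > EPS) return false;
        }
        return true;
    }
}
```

If this check passes, you proceed with the translation and rotation steps described above.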
If you have a fixed object in different poses (rotations and translations), pair-wise or multi-matching can be a helpful solution for you. For example, see this paper. This method can be extended to higher dimensions as well.
If you have two sets of points that come from different objects and you want to find the similarity between them, one solution is to compute the discrete Fréchet distance for both sets of points and then compare the values.
The other related concept is shape reconstruction. You can combine the result of a proper shape-reconstruction algorithm with the two previous methods to compute the similarity.
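For the discrete Fréchet suggestion, a sketch of the standard dynamic-programming formulation (Eiter and Mannila) is below. Note that it compares ordered sequences, so for unordered sets you would first need to fix an ordering, for instance with the metric sort from the previous answer:

```java
// Sketch of the discrete Fréchet distance between two 3D point sequences,
// using the standard dynamic-programming recurrence. Points are double[3].
static double discreteFrechet(double[][] p, double[][] q) {
    double[][] ca = new double[p.length][q.length];
    for (int i = 0; i < p.length; i++) {
        for (int j = 0; j < q.length; j++) {
            double d = dist(p[i], q[j]);
            if (i == 0 && j == 0)      ca[i][j] = d;
            else if (i == 0)           ca[i][j] = Math.max(ca[0][j - 1], d);
            else if (j == 0)           ca[i][j] = Math.max(ca[i - 1][0], d);
            else                       ca[i][j] = Math.max(d, Math.min(
                Math.min(ca[i - 1][j], ca[i][j - 1]), ca[i - 1][j - 1]));
        }
    }
    return ca[p.length - 1][q.length - 1];
}

static double dist(double[] a, double[] b) {
    double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
    return Math.sqrt(dx * dx + dy * dy + dz * dz);
}
```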

Finding Closest Point to a Set of Points (in Lat/Long)

I am given a random set of points (not told how many) with a latitude and longitude and need to sort them.
I will then be given another random point (lat/long) and need to find the point closest to it (from the set above).
When I implement a linear search (put the set in a list and then calculate all the distances), it takes approximately 7 times longer than permitted.
So, I need a much more efficient sorting algorithm, but I am unsure how I could do this, especially since I'm given points that don't exist on a flat plane.
If your points are moderately well distributed then geometric hashing is a simple method to speed up nearest neighbor searches in practice. The idea is simply to register your objects in grid cells and do your search cell-wise so you can restrict your search to a local neighborhood.
This little Python demo applied the idea to circles in the plane.
So in your case you can choose some fixed N, split the longitude coordinates in [0, 2pi] into N equal parts, and split the latitude coordinates in [0, pi] into N parts. This gives you N^2 cells on the sphere. You register all your initial points in these cells.
When you are given the query point p, you start searching in the cell that p hits, and then in a large enough neighborhood that you cannot miss the closest point.
If you initially register n points, then you could choose N to be something like sqrt(n)/4 or so.
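A sketch of the registration step in Java (here with degrees in [-90, 90] and [-180, 180] instead of radians; the names and the clamping are illustrative choices):

```java
import java.util.*;

// Sketch of geometric hashing on the sphere: lat/lon space is cut into
// N x N cells and every point is registered in its cell. Here lat is in
// [-90, 90] and lon in [-180, 180] degrees; each point is a double[]{lat, lon}.
class SphereGrid {
    final int n;                                        // cells per axis
    final Map<Integer, List<double[]>> cells = new HashMap<>();

    SphereGrid(List<double[]> points) {
        // N ~ sqrt(n)/4, as suggested above (clamped to at least 1).
        this.n = Math.max(1, (int) (Math.sqrt(points.size()) / 4));
        for (double[] p : points) {
            cells.computeIfAbsent(cellOf(p[0], p[1]), k -> new ArrayList<>()).add(p);
        }
    }

    int cellOf(double lat, double lon) {
        int row = Math.min(n - 1, (int) ((lat +  90.0) / 180.0 * n));
        int col = Math.min(n - 1, (int) ((lon + 180.0) / 360.0 * n));
        return row * n + col;
    }
}
```

The query then starts at cellOf(p) and expands ring by ring through neighboring cells, stopping once no unvisited cell can contain a closer point than the best one found so far.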

Using a list of scattered points in a 2D tilemap, how would I find the center of those points?

Have a look at this image for example (just a random image of scattered points from image search):
(reference)
You'll see the locations with blue points. Let's say the blue represents what I'm looking for, and I want to find the coordinates where there is the most blue, meaning the most dense area or the center of most points (in the picture, that would be approximately [.5, .5]).
If I have an ArrayList of each and every blue point (x,y coordinates), how do I use those points to find the center/most dense area of those points?
There are several options, depending on what precisely you need. The simplest would be the mean, the average of all points: you sum all the points up and divide by their number.
Getting the most dense area is more complicated, because first you have to come up with a definition of "dense". One option would be: for each point P, find its 7 nearest neighbors N_P1...N_P7. The point P whose 7th neighbor has the smallest distance |P - N_P7| is the point with the highest density around it, and you pick that P as the center. You can replace the 7 with any number that works for you; you could even derive it from your data set, say 1/3 of the total number of points.
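A sketch of both options in Java (k = 7 follows the suggestion above; the names are illustrative):

```java
import java.util.*;

// Sketch of both options: the mean of all points, and the point whose k-th
// nearest neighbor is closest, a simple density proxy. Each point is {x, y}.
class ClusterCenter {
    static double[] mean(List<double[]> pts) {
        double sx = 0, sy = 0;
        for (double[] p : pts) { sx += p[0]; sy += p[1]; }
        return new double[] { sx / pts.size(), sy / pts.size() };
    }

    // Requires pts.size() > k. O(n^2 log n), fine for small sets.
    static double[] densestPoint(List<double[]> pts, int k) {
        double best = Double.MAX_VALUE;
        double[] center = null;
        for (double[] p : pts) {
            double[] d = new double[pts.size()];
            for (int i = 0; i < pts.size(); i++) {
                double dx = p[0] - pts.get(i)[0], dy = p[1] - pts.get(i)[1];
                d[i] = dx * dx + dy * dy;          // squared distance suffices
            }
            Arrays.sort(d);                        // d[0] == 0 is p itself
            if (d[k] < best) { best = d[k]; center = p; }
        }
        return center;
    }
}
```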

Binary search algorithm for 2-dimensional approximate data

Here is my specific problem. I need to write an algorithm that:
1) Takes these 2 arrays:
a) An array of about 3000 postcodes (or zip codes if you're in the US), with the longitude and latitude of the center point of the areas they cover (that is, 3 numbers per array element)
b) An array of about 120,000 locations, consisting of longitude and latitude
2) Converts each location to the postcode whose center point is closest to the given longitude and latitude
Note that the longitudes and latitudes of the locations are very unlikely to precisely match those in the postcodes array. That's why I'm looking for the shortest distance to the center point of the area covered by the postcode.
I know how to calculate the distance between two longitude/latitude pairs. I also appreciate that being closest to the center point of an area covered by a postcode doesn't necessarily mean you are in the area covered by that postcode - if you're in a very big postcode area but close to the border, you may be closer to the center point of a neighbouring postcode area. However, in this case I don't have to take this into account - shortest distance to the center point is enough.
A very simple way to solve this problem would be to visit each of the 120,000 locations, and find the postcode with the closest centerpoint by calculating the distance to each of the 3000 postcode centerpoints. That would mean 3000 x 120,000 = 360,000,000 distance calculations though.
If postcodes and locations were in a one-dimensional space (that is, identified by 1 number instead of 2), I could simply sort the postcode array by its one-dimensional centerpoint and then do a binary search in the postcode array for each location.
So I guess what I'm looking for is a way to sort the two-dimensional space of longitudes and latitudes of the postcode center points, so I can perform a two-dimensional binary search for each location. I've seen solutions to this problem, but those only work for exact matches, while I'm looking for the center point closest to a given location.
I am considering caching solutions, but if there is a fast two-dimensional binary search that I could use, that would make the solution much simpler.
This will be part of a batch program, so I'm not counting milliseconds, but it can't take days either. It will run once a month without manual intervention.
You can use a space-filling curve and a quadkey instead of a quadtree or a spatial index. There are some very interesting space-filling curves, like the Hilbert curve and the Moore curve, with useful locality-preserving patterns.
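A sketch of the simplest such key, a Z-order (Morton) code, is below (Z-order rather than Hilbert because it is the easiest to compute; the 16-bit quantization is an arbitrary choice). Sorting the ~3000 postcode centers by this key keeps nearby centers mostly adjacent, so each location can binary-search its own key and then check a window of candidates around it, with an exact distance comparison at the end because the curve has seams:

```java
// Sketch of a Z-order (Morton) key: quantize lat/lon to 16 bits each and
// interleave the bits. The quantization resolution is an arbitrary choice.
static long mortonKey(double lat, double lon) {
    int x = (int) ((lon + 180.0) / 360.0 * 0xFFFF);
    int y = (int) ((lat +  90.0) / 180.0 * 0xFFFF);
    long key = 0;
    for (int i = 0; i < 16; i++) {
        key |= (long) ((x >> i) & 1) << (2 * i);      // even bits: longitude
        key |= (long) ((y >> i) & 1) << (2 * i + 1);  // odd bits: latitude
    }
    return key;
}
```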
