Given a large list of GPS coordinates and their weights, is there open source software (a database or search engine) that can return the top N values inside a bounding box or a circle?
SELECT * FROM list WHERE IS_IN_BBOX(coords, bbox) ORDER BY weight DESC LIMIT 10;
I expect the list to contain tens of millions of items. The bounding box might be very large (whole world) or very small (zoom 18), but the search should still be reasonably fast. Also, could we use Elasticsearch for that? I saw that it has a distance-based search, but not a weight-based search. How about PostGIS?
You can use ST_MakeEnvelope and the && operator:
SELECT *
FROM list
WHERE list.coords && ST_MakeEnvelope(left, bottom, right, top, srid)
ORDER BY list.weight DESC
LIMIT 10;
Replace left, bottom, right, top and srid with the values for your search. A GiST index on coords lets the && bounding-box filter use an index scan.
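As a way to sanity-check what the query should return, here is a minimal brute-force sketch of the same "top N by weight inside a bounding box" semantics. The function name top_n_in_bbox and the (lon, lat, weight) tuple layout are assumptions made for illustration; it scans every item, so it is only a reference for small samples, and it does not handle boxes that cross the antimeridian.

```python
import heapq

def top_n_in_bbox(items, bbox, n=10):
    """Return the n heaviest items inside bbox.

    items: iterable of (lon, lat, weight) tuples (assumed layout).
    bbox:  (left, bottom, right, top) in the same coordinate system.
    """
    left, bottom, right, top = bbox
    inside = (
        it for it in items
        if left <= it[0] <= right and bottom <= it[1] <= top
    )
    # nlargest keeps only n candidates in memory and returns them
    # sorted by weight, heaviest first.
    return heapq.nlargest(n, inside, key=lambda it: it[2])
```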
I am trying to find an optimal way to place a set of ranges in a larger range. I like to think of it as flat boxes that can move left or right. Some things to consider:
There are N boxes, each of them with a center point Ci.
There are N attractor points (one per box), we can call them Pi. Each box is attracted to one attractor point with a force proportional to the distance.
The order of the boxes is fixed. The order of the attractor points and of the boxes is the same. So C1 is attracted to P1, C2 to P2, etc.
The boxes cannot overlap.
I made a diagram that may make it easier to understand:
The question is, what algorithm can I use to move the boxes around so that each Ci is as close as possible to its respective Pi? In other words, how do I find the locations for the Ci points that minimize the distance (Li) between all Ci-Pi pairs?
It'd also be helpful if you could point me to some material to read, as I'm not very familiar with this type of problem. My guess is that some sort of force-directed algorithm would work, but I'm not sure how to implement those.
Since "each box is attracted to one attractor point with a force proportional to the distance", you are describing a system where the boxes are attached to the attractor points by springs (see Hooke's law), and you want to determine the state of the system at rest (the state of minimum potential energy).
Because the forces are proportional to the distances, what you want is to minimize the sum of the squared distances, that is, the sum of Li^2 from i=1 to i=n. Here is an algorithm to do that.
The idea is to group boxes that need to touch by the end and figure out their position as a group based on their corresponding attractor points.
The first step is not to find these groups, because we can actually start with one big group and cut it later if necessary. For simplicity, let's treat all Li as signed distances. So Li = Ci-Pi. Let's also name the sizes of the boxes, though it will be easier to handle half-sizes. So let Si be half the size of the i-th box. Finally, let's write the sum of Xi from i=a to i=b like sum[a,b](Xi).
Here is how to compute the position of a group of boxes, assuming each one touches the next. Li is a function of the position of the group: if x is that position, Li(x) = Ci(x) - Pi (where Ci(x) is just x plus some constant). x can be any reference point of the group of boxes, for example the left edge of the first box.
We also know that sum[a,b](Li(x)^2) must be minimal. This means the derivative of that sum must be zero: sum[a,b](2*Li(x)) = 0. So:
sum[a,b](2*Li) = 0
sum[a,b](Li) = 0
sum[a,b](Ci - Pi) = 0
sum[a,b](Ci) = sum[a,b](Pi)
Computing sum[a,b](Pi) is trivial, and sum[a,b](Ci) can be expressed in terms of Ca (center of the first box), since C[i+1] = Ci + Si + S[i+1].
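The group-position step above can be sketched as follows. This is a minimal illustration, assuming the group is passed as two lists (attractor points P and half-sizes S, in order); the function name group_first_center is hypothetical. It solves sum(Ci) = sum(Pi) for the center of the first box, using C[i+1] = Ci + Si + S[i+1].

```python
def group_first_center(P, S):
    """Center of the first box of a group of touching boxes.

    P: attractor points of the boxes in the group, in order.
    S: half-sizes of the same boxes.
    Returns Ca such that sum(Ci) == sum(Pi), which minimizes sum(Li^2).
    """
    n = len(P)
    # offsets[i] is Ci - Ca when every box touches its neighbor.
    offsets = [0.0]
    for i in range(1, n):
        offsets.append(offsets[-1] + S[i - 1] + S[i])
    # sum(Ci) = n * Ca + sum(offsets) = sum(Pi)
    return (sum(P) - sum(offsets)) / n
```

For example, two boxes of half-size 1 that are both attracted to 0 end up centered at -1 and 1.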
Now that you can compute the position of a group of boxes, do it first with a group made of all boxes, and then remove boxes from that group as follows.
Starting from the left, consider all boxes with Li > 0 and compute Q = sum(Li) for all corresponding i. Similarly, starting from the right, consider all boxes with Li < 0 and compute R = -sum(Li) for all corresponding i (note the negative sign, because we want the absolute value). Now, if Q > R, remove the boxes on the left and make a new group with them; otherwise remove the boxes on the right and make a new group with them.
You cannot make these two new groups at the same time, because removing boxes from one end can change the position of the original group, where boxes you would have removed from the other end should not be removed.
If you made a new group, repeat: compute the position of each separate group of boxes (they will never overlap at this point), and remove boxes if necessary. Otherwise, you have your solution.
It seems the objective is a quadratic function and all the constraints are linear. So I think you can solve it by standard quadratic programming solvers.
If we let S_i be the half-size of the i-th box, and the P_i's are given, then:
Minimize y
with respect to C_1, C_2, ...C_n
subject to
y = sum_i (P_i - C_i)^2
C_i + S_i + S_{i+1} <= C_{i+1} for each i = 1, ... n-1
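A general QP solver is not strictly needed for this particular program. Subtracting from each C_i the offset it would have if all boxes touched turns the constraints into a plain ordering constraint, and the problem becomes an isotonic regression, which the pool-adjacent-violators algorithm solves directly. The sketch below uses that swapped-in technique rather than a QP solver; the name place_boxes is hypothetical, and the limits of the larger range are ignored.

```python
def place_boxes(P, S):
    """Minimize sum((P_i - C_i)^2) subject to C_i + S_i + S_{i+1} <= C_{i+1}.

    P: attractor points, S: half-sizes, both in box order.
    Uses pool-adjacent-violators on shifted targets; the outer range
    limits are NOT enforced here (assumption of this sketch).
    """
    n = len(P)
    # offset[i] = C_i - C_0 when all boxes are packed tightly.
    offset = [0.0] * n
    for i in range(1, n):
        offset[i] = offset[i - 1] + S[i - 1] + S[i]
    # With D_i = C_i - offset[i], the constraint is D_i <= D_{i+1}.
    targets = [p - o for p, o in zip(P, offset)]

    blocks = []  # each block is [sum of targets, count]
    for t in targets:
        blocks.append([t, 1])
        # Merge while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c

    D = []
    for s, c in blocks:
        D.extend([s / c] * c)  # every box in a block shares the block mean
    return [d + o for d, o in zip(D, offset)]
```

Each merged block corresponds to a group of touching boxes, so the result agrees with the "sum of C_i equals sum of P_i per group" condition.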
Edit: this is a crude solution to minimize the sum of all Li, which is no longer the question.
Let's name the boxes B, so Bi has center Ci. Let n be the number of boxes and points.
Assuming all the boxes can fit into the larger range, here is how I would do it:
Let Q(a, b) be the average of Pi from i=a to i=b.
Place all the boxes next to each other (in order) to form a superbox, so that the center of this superbox is at Q(1, n).
If it goes over one end of the larger range, move it so that it sits at the limit.
Then, for each Bi, move it as close to Pi as possible without moving other boxes (and while still being inside the larger range). Repeat until you can't move any more boxes.
Now, the only way to minimize the sum of all Li is as follows.
Let G be a group of boxes that touch. Let F(G) be the predicate: if the center boxes of a series are Bi and Bj (if there are an odd number of boxes in the series, i=j), then Ci != Pi and Cj != Pj.
Find a G such that F(G) is true, and move the corresponding boxes so that F(G) becomes false. If the group of boxes hit another box while moving, add that box to the group and repeat. Of course, don't move any box outside the larger range.
Once there is no G for which F(G) is true or for which you would need to move outside the larger range, you have your solution (one of potentially an infinite number).
For completeness, I found a (probably suboptimal) solution that works pretty well and is very easy to implement.
1. Place all boxes with their Ci's at their Pi's.
2. Go over all boxes from left to right and do the following:
   a. Check if box i overlaps with the box to its left. If it is the first box, check if it overlaps with the range minimum.
   b. If there is overlap, move the box to the right so that there is no left overlap.
3. Repeat step 2 but from right to left, checking right overlaps (or the range maximum for the last box).
4. Repeat steps 2-3 until no more overlaps remain or a maximum number of repetitions is reached.
It's quite efficient for my relatively small dataset and I get good results with 10 repetitions of steps 2-3 (5 left to right checks, 5 right to left checks).
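The sweeps described above might look like the following sketch. The function name relax_boxes and its parameters (half-sizes S, range limits lo and hi) are assumptions for illustration. As the answer says, the result is not guaranteed to be optimal, and if the boxes do not fit in the range the loop simply stops after max_rounds.

```python
def relax_boxes(P, S, lo, hi, max_rounds=10):
    """Push overlapping boxes apart with alternating sweeps.

    P: attractor points, S: half-sizes, [lo, hi]: the larger range.
    Returns the box centers after at most max_rounds sweep pairs.
    """
    C = list(P)  # start with every center on its attractor
    n = len(C)
    for _ in range(max_rounds):
        moved = False
        # Left-to-right sweep: fix overlaps with the left neighbor.
        for i in range(n):
            limit = lo + S[i] if i == 0 else C[i - 1] + S[i - 1] + S[i]
            if C[i] < limit:
                C[i] = limit
                moved = True
        # Right-to-left sweep: fix overlaps with the right neighbor.
        for i in reversed(range(n)):
            limit = hi - S[i] if i == n - 1 else C[i + 1] - S[i + 1] - S[i]
            if C[i] > limit:
                C[i] = limit
                moved = True
        if not moved:
            break  # nothing overlapped in a full round
    return C
```

With two boxes of half-size 1 both attracted to 0, this returns centers 0 and 2, whereas the least-squares optimum is -1 and 1, which illustrates the suboptimality.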
Suppose you've got a large number of boxes drawn, and the user can draw a rectangular area over them.
While I'll be implementing it inside a browser, let's abstract it away and say we've got the coordinates of every point of every rectangle.
What are the most efficient data structures and algorithms here, given I want to check which boxes a) intersect b) are contained by the selection?
My current idea is to:
Sort all boxes by x
Via binary search, check which boxes overlap x-wise with the selection area, then, for every x-wise overlapping box, check whether it overlaps y-wise as well.
or
Sort all boxes by x and y, each in separate array
Via binary search, first find all x-overlapping boxes, then all y-overlapping boxes, then check which boxes are in both sets,
... though I'm pretty sure there's some well-known algorithm for such a problem.
I suppose by selected via some rectangle you mean either intersects some rectangle or is contained in some rectangle. If the "drawn boxes" are of fixed position, one approach which comes to mind is binary space partition. Roughly speaking, an (ideally balanced) binary space partition tree could be generated for the "drawn boxes". If the selection rectangle is positioned, the positions of its corners would be matched against the binary space partition tree, and large halfspaces could be excluded from explicit checking for intersection.
I am working on an interactive web application, and I'm currently working on implementing a multi-select feature similar to the way windows allows you to select multiple desktop icons by dragging a rectangle.
Due to limitations of the library I'm required to use, implementing this has already become quite resource intensive:
On initial click, store the position of the mouse cursor.
On each pixel that the mouse cursor moves, perform the following:
Destroy the previous selection rectangle, if it exists, so it doesn't appear on the screen anymore.
Calculate the width and height of the new selection rectangle using the original cursor position and the current cursor position.
Create a new selection rectangle using the original cursor position, the width and the height
Display this rectangle on the screen
As you can see, there are quite a few things happening every time the cursor moves a single pixel. I've looked into this as much as I can and there's no way I can make it any more efficient or any faster.
My next step is actually selecting the objects on the screen when the selection rectangle moves over them. I need to implement this algorithm myself so I have freedom to make it as efficient/fast as possible. What I need to do is iterate through the objects on the screen and check each one to see if it lies in the rectangle. So the loop here is going to consume more resources. So, I need the checking to be done as efficiently as possible.
Each object that can be selected can be represented by a single point, P(x, y).
How can I check if P(x, y) is within the rectangles I create in the fastest/most efficient way?
Here's the relevant information:
There can be an arbitrary number of objects that can be selected on the screen at any one time
The selection rectangles will always be axis-aligned
The information I have about the rectangles is their original point, their height, and their width.
How can I achieve what I need to do as fast as possible?
Checking whether point P lies inside rectangle R is simple and fast
(in coordinate system with origin in the top left corner)
(P.X >= R.Left) and (P.X <= R.Right) and (P.Y >= R.Top) and (P.Y <= R.Bottom)
(precalculate Right and Bottom coordinates of rectangle)
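A minimal sketch of that test, assuming screen coordinates with the origin in the top left corner. The names Rect, rect_from_drag and contains are hypothetical. Building the rectangle from the two cursor positions (rather than from origin, width and height) is a choice made here so that dragging in any direction still gives Left <= Right and Top <= Bottom.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float

def rect_from_drag(x0, y0, x1, y1):
    """Build a normalized rectangle from the initial and current cursor."""
    return Rect(min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))

def contains(r, x, y):
    """True if point (x, y) lies inside r, edges included."""
    return r.left <= x <= r.right and r.top <= y <= r.bottom
```

Right and Bottom are computed once per mouse move, so each point costs four comparisons.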
Perhaps you could speed up the overall algorithm if the objects satisfy some conditions that let you avoid checking every object at every step.
Example: sort the object list by X coordinate and check only those objects that lie in the Left..Right range.
More advanced approach: organize the objects in a space-partitioning data structure such as a kd-tree and run range searches against it.
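The "sort by X" idea can be sketched with the standard bisect module. The function names build_index and select are hypothetical, and points are assumed to be (x, y) tuples in top-left-origin screen coordinates that do not move between queries.

```python
import bisect

def build_index(points):
    """Sort the points by x once; returns (xs, pts) for later queries."""
    pts = sorted(points)
    xs = [p[0] for p in pts]
    return xs, pts

def select(index, left, top, right, bottom):
    """Points inside the rectangle, testing only those in the x-range."""
    xs, pts = index
    lo = bisect.bisect_left(xs, left)    # first point with x >= left
    hi = bisect.bisect_right(xs, right)  # one past last point with x <= right
    return [p for p in pts[lo:hi] if top <= p[1] <= bottom]
```

The two binary searches cost O(log n); the remaining work is proportional to the number of points in the vertical strip, which helps most when the selection is narrow.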
You can iterate through every object on screen and check whether it lies in the rectangle in a Cartesian coordinate system using the following condition:
p.x >= rect.left && p.x <= rect.right && p.y <= rect.top && p.y >= rect.bottom
If you are going to have no more than 1000 points on screen, just use the naive O(n) method by iterating through each point. If you are completely sure that you need to optimize this further, read on.
Depending on the frequency of updating the points and number of points being updated each frame, you may want to use a different method potentially involving a data structure like Range Trees, or settle for the naive O(n) method.
If the points aren't going to move around much and are sparse (i.e. far apart from each other), you can use a Range Tree or similar for range queries that take roughly O(log n + k) time, where k is the number of points reported. Bear in mind though that updating such a spatial partitioning structure is resource intensive, and if you have a lot of points that are going to be moving around quite a bit, you may want to look at something else.
If a few points are going to be moving around over large distances, you may want to look at partitioning the screen into a grid of "buckets", and check only those buckets that are contained by the rectangle. Whenever a point moves from one bucket to another, the grid will have to update the affected buckets.
If memory is a constraint, you may want to look at using a modified Quad Tree which is limited by tree depth instead of bucket size, if the grid approach is not efficient enough.
If you have a lot of points moving around a lot every frame, I think you may be better off with the grid approach or just with the naive O(n) approach. Experiment and choose the approach that best suits your problem.
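The grid-of-buckets approach could be sketched as below. The class name PointGrid, the cell size and the use of point ids are assumptions for illustration; coordinates follow the top-left-origin screen convention (top <= bottom).

```python
from collections import defaultdict

class PointGrid:
    """Uniform grid of buckets holding point ids."""

    def __init__(self, cell=64):
        self.cell = cell
        self.buckets = defaultdict(set)  # (col, row) -> set of point ids
        self.pos = {}                    # point id -> (x, y)

    def _key(self, x, y):
        return (int(x // self.cell), int(y // self.cell))

    def move(self, pid, x, y):
        """Insert a point, or move it and update the affected buckets."""
        old = self.pos.get(pid)
        if old is not None and self._key(*old) != self._key(x, y):
            self.buckets[self._key(*old)].discard(pid)
        self.buckets[self._key(x, y)].add(pid)
        self.pos[pid] = (x, y)

    def query(self, left, top, right, bottom):
        """Ids of points inside the rectangle, edges included."""
        col0, row0 = self._key(left, top)
        col1, row1 = self._key(right, bottom)
        hits = set()
        for col in range(col0, col1 + 1):
            for row in range(row0, row1 + 1):
                for pid in self.buckets.get((col, row), ()):
                    x, y = self.pos[pid]
                    # Buckets on the border are only partly covered,
                    # so each point still gets the exact test.
                    if left <= x <= right and top <= y <= bottom:
                        hits.add(pid)
        return hits
```

A point that stays inside its cell costs only a dictionary update, which is why this layout tolerates frequent movement better than a tree.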
I have an array of box objects, defined by their (x,y,width,height) properties like so:
Box Q is anchored at corner point C. How can I programmatically expand box Q to take up all the available space it has, while maintaining its aspect ratio?
I have had some luck by expanding the box to be very large (from the top right corner) and then aligning to the top edge of the furthest box (in this case 5). If at that point other boxes overlap with Q, I remove the furthest box (5) and repeat (align to the top edge of 4), until no boxes overlap. The problem with this approach is that a box may overlap with Q (box 2 in the next image), but when I scale to meet its top edge, it is no longer contained, like this:
Any thoughts on an approach would be much appreciated,
Josh
"but when I scale to meet its top edge, it is no longer contained"
Instead, scale to meet each of its edges in turn: the top edge, the bottom edge, the left edge and the right edge.
Then, see which scaling is valid (the box is contained after scaling) and results in the biggest box.
I can see two approaches here.
First is to iterate over all other boxes. For each box B, see how much (by what factor) you can expand your given box Q so that it will touch box B; after that take the minimal of all such factors. However, finding this factor for a given B is a non-trivial task, though definitely solvable.
At the same time, if you already have code that checks for overlaps for a given factor, then you can apply binary search to find the maximal factor that does not lead to overlap.
So you know that if you expand it a lot (say by x times), it does overlap. If you do not expand it (that is, expand by 1 times), it does not overlap. So you have a segment [1,x] where to search for an answer. Try the middle --- expand by (x+1)/2 times and see whether it overlaps. If it overlaps, continue with segment [1, (x+1)/2], otherwise with segment [(x+1)/2, x]. Take the middle of the new segment, and so on until the end values of your segment are close enough.
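The binary search described above might be sketched as follows. Boxes are assumed to be (x, y, width, height) tuples, with Q anchored at its (x, y) corner and growing toward larger x and y; the names overlaps, scaled and max_scale are hypothetical. Because the scaled boxes are nested, overlap is monotone in the factor, which is what makes bisection valid here.

```python
def overlaps(a, b):
    """True if boxes a and b share interior area (touching is allowed)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def scaled(q, f):
    """Box q scaled by factor f about its anchored (x, y) corner."""
    x, y, w, h = q
    return (x, y, w * f, h * f)

def max_scale(q, others, upper, tol=1e-6):
    """Largest factor in [1, upper] for which q overlaps no other box.

    Assumes q does not overlap anything at factor 1.
    """
    def hits(f):
        return any(overlaps(scaled(q, f), b) for b in others)

    if not hits(upper):
        return upper
    lo, hi = 1.0, upper  # invariant: no overlap at lo, overlap at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if hits(mid):
            hi = mid
        else:
            lo = mid
    return lo
```

The upper bound can be chosen as the factor at which Q would reach the edge of the available area, so the container limit is handled by the search interval itself.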
Create a function that will take in a scaling factor as a parameter, and have it return true or false depending on if there is overlap found or not. It seems like you already have something like this function written.
Then use bisection search (https://en.wikipedia.org/wiki/Bisection_method) to find your scaling factor to within a tolerance that you find satisfactory.
Let's say I have n equally sized and equally rotated square boxes inside a limited area in a 2D coordinate system (floating point coordinates). The boxes should not overlap.
Now I want to find a free space for one more box. I need some tips for an algorithm to solve this. Any ideas?
There ought to be a scan line algorithm for this. You say the boxes are equally rotated, so you should be able to rotate the coordinate system, if necessary, so that the edges of the boxes are parallel to the x and y axes. I would then sort the boxes in order of y coordinate.
Now try placing a box in the lowest possible position. Read from the sorted boxes to find all the boxes low enough to interfere with your placement and create an ordered set (e.g. red-black tree or similar container class) of these boxes. Now scan along this set of boxes and see if there is a gap big enough to place a box. If not, use the original sorted list of boxes to find and remove the lowest box, so you can consider putting the new box in just above that lowest box, so it cannot interfere with this. Add more boxes from the sorted list to cover all boxes high enough to interfere with this new possible height of box. Keep track of where you have removed boxes from the list and check there to see if a gap big enough to hold a box has opened up. If not, repeat the exercise until you find a gap or run out of space at the top of the possible area.
This looks like cost N log N for the initial sort, and then a cost of at most log N per box to insert and delete boxes from the ordered set. Checking for gaps is no more expensive than this, because you only check for a gap in a location where you have just removed a box. So I think the total cost is N log N.