I am now looking for an elegant algorithm to recursively find neighbors of neighbors with the geohashing algorithm (http://www.geohash.org).
Basically take a central geohash, and then get the first 'ring' of same-size hashes around it (8 elements), then, in the next step, get the next ring around the first etc. etc.
Have you heard of an elegant way to do so?
Brute force could be to take each neighbor and get their neighbors, simply ignoring the massive overlap. Neighbors around one central geohash have been solved many times (e.g. here in Ruby: http://github.com/masuidrive/pr_geohash/blob/master/lib/pr_geohash.rb)
Edit for clarification:
Current solution, with passing in a center key and a direction, like this (with corresponding lookup-tables):
def adjacent(geohash, dir)
  # split off the last character; the rest is the parent cell
  base, lastChr = geohash[0..-2], geohash[-1,1]
  type = (geohash.length % 2) == 1 ? :odd : :even
  # crossing the edge of the parent cell: recurse to fix up the prefix
  if BORDERS[dir][type].include?(lastChr)
    base = adjacent(base, dir)
  end
  base + BASE32[NEIGHBORS[dir][type].index(lastChr), 1]
end
(extract from Yuichiro MASUI's lib)
I say this approach will get ugly soon, because the directions get messy once we are in ring two or three. Ideally the algorithm would simply take two parameters: the center geohash and the ring distance, with 0 being the center geohash only (["u0m"]), 1 being the first ring of 8 same-size geohashes around it (=> [["u0t", "u0w"], ["u0q", "u0n"], ["u0j", "u0h"], ["u0k", "u0s"]]), 2 being the second ring of 16 areas around the first ring, and so on.
Do you see any way to deduce the 'rings' from the bits in an elegant way?
That depends on what you mean by Neighbor. I'm assuming this is being used in the context of a proximity search. In that case I would think that your best bet would be to search from the outermost ring inward.
Assume you can find the outermost set (longest proximity) in the searchable universe. Store that as the new universe, then find the next inner set in that universe, subtracting the inner set from it. Store the old universe (the outermost ring) and repeat this process until you hit the center. Each search after the first one reduces your search area and gives you a ring.
Start by constructing the sides immediately around the central geohash, i.e. top, right, bottom and left; initially each of these comprises a single geohash plus a corner. Then recursively iterate the sides using the adjacent function with the direction corresponding to that side (i.e. expand left for the left side), while maintaining a result set and the sides for the next iteration. You also need to handle the diagonal/corner geohash for each side (e.g. left-top for left, top-right for top, if using a clockwise association). For an example of the procedure see this implementation I did in Lua or Javascript (but with additional functionality), which starts with a call to Grid().
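For illustration, a sketch of the whole ring construction in Python: the BASE32/NEIGHBORS/BORDERS tables are the standard lookup tables from the geohash-js / pr_geohash libraries, and ring() walks from the ring's top-left corner clockwise. Wrap-around at the poles/date line is not handled here.

```python
# Same-size geohash neighbors plus ring construction. Tables are the
# standard geohash-js / pr_geohash lookup tables.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

NEIGHBORS = {
    'right':  {'even': "bc01fg45238967deuvhjyznpkmstqrwx",
               'odd':  "p0r21436x8zb9dcf5h7kjnmqesgutwvy"},
    'left':   {'even': "238967debc01fg45kmstqrwxuvhjyznp",
               'odd':  "14365h7k9dcfesgujnmqp0r2twvyx8zb"},
    'top':    {'even': "p0r21436x8zb9dcf5h7kjnmqesgutwvy",
               'odd':  "bc01fg45238967deuvhjyznpkmstqrwx"},
    'bottom': {'even': "14365h7k9dcfesgujnmqp0r2twvyx8zb",
               'odd':  "238967debc01fg45kmstqrwxuvhjyznp"},
}
BORDERS = {
    'right':  {'even': "bcfguvyz", 'odd': "prxz"},
    'left':   {'even': "0145hjnp", 'odd': "028b"},
    'top':    {'even': "prxz",     'odd': "bcfguvyz"},
    'bottom': {'even': "028b",     'odd': "0145hjnp"},
}

def adjacent(geohash, direction):
    """Same-size neighbor of `geohash` in `direction` (port of the Ruby above)."""
    base, last = geohash[:-1], geohash[-1]
    parity = 'odd' if len(geohash) % 2 == 1 else 'even'
    if last in BORDERS[direction][parity] and base:
        base = adjacent(base, direction)   # crossed a parent-cell border
    return base + BASE32[NEIGHBORS[direction][parity].index(last)]

def ring(center, n):
    """The n-th ring of same-size geohashes around `center` (n=0 -> just center)."""
    if n == 0:
        return [center]
    cell = center
    for _ in range(n):                     # step to the NW corner of the ring
        cell = adjacent(adjacent(cell, 'top'), 'left')
    cells = []
    for side in ('right', 'bottom', 'left', 'top'):   # walk the ring clockwise
        for _ in range(2 * n):
            cells.append(cell)
            cell = adjacent(cell, side)
    return cells
```

With this, ring("u0m", 1) yields exactly the 8 cells listed in the question, ring("u0m", 2) the 16 cells of the second ring, and so on; ring n always contains 8n cells.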
Related
The Problem
I have a bit array which represents a 2-dimensional "map" of "tiles". This image provides a graphical example of the bits in the bit array:
I need to determine how many contiguous "areas" of bits exist in the array. In the example above, there are two such contiguous "areas", as illustrated here:
Tiles must be located directly N, S, E or W of a tile to be considered "contiguous". Diagonally-touching tiles do not count.
What I've Got So Far
Because these bit arrays can become relatively large (several MB in size), I have intentionally avoided using any sort of recursion in my algorithm.
The pseudo-code is as follows:
LET S BE SOURCE DATA ARRAY
LET C BE ARRAY OF IDENTICAL LENGTH TO SOURCE DATA USED TO TRACK "CHECKED" BITS

FOREACH INDEX I IN S
    IF C[I] THEN
        CONTINUE
    IF S[I] THEN
        EXTRACT_AREA(S, C, I)
    ELSE
        SET C[I]

EXTRACT_AREA(S, C, I):
    LET T BE TARGET DATA ARRAY FOR STORING BITS OF THE AREA WE'RE EXTRACTING
    LET F BE STACK OF TILES TO SEARCH NEXT

    PUSH I ONTO F

    WHILE F IS NOT EMPTY
        LET X = POP FROM F
        IF C[X] THEN
            CONTINUE
        SET C[X]
        IF S[X] THEN
            PUSH TILE NORTH OF X ONTO F
            PUSH TILE SOUTH OF X ONTO F
            PUSH TILE WEST OF X ONTO F
            PUSH TILE EAST OF X ONTO F
            SET T[X]
    RETURN T
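For reference, a direct Python rendering of the pseudo-code (iterative, stack-based, no recursion), assuming the map is given as a list of rows of 0/1:

```python
# Iterative (non-recursive) extraction of contiguous 4-connected areas.
# `grid` is a list of rows of 0/1; returns one set of (x, y) tiles per area.
def find_areas(grid):
    h, w = len(grid), len(grid[0])
    checked = [[False] * w for _ in range(h)]
    areas = []
    for y in range(h):
        for x in range(w):
            if checked[y][x]:
                continue
            if grid[y][x]:
                areas.append(extract_area(grid, checked, x, y))
            else:
                checked[y][x] = True
    return areas

def extract_area(grid, checked, x, y):
    h, w = len(grid), len(grid[0])
    area = set()
    stack = [(x, y)]                      # tiles to search next
    while stack:
        cx, cy = stack.pop()
        if checked[cy][cx]:
            continue                      # duplicates on the stack are skipped here
        checked[cy][cx] = True
        if grid[cy][cx]:
            area.add((cx, cy))
            for nx, ny in ((cx, cy - 1), (cx, cy + 1), (cx - 1, cy), (cx + 1, cy)):
                if 0 <= nx < w and 0 <= ny < h:
                    stack.append((nx, ny))
    return area
```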
What I Don't Like About My Solution
Just to run, it requires two times the memory of the bitmap array it's processing.
While extracting an "area", it uses three times the memory of the bitmap array.
Duplicates exist in the "tiles to check" stack - which seems ugly, but not worth avoiding the way I have things now.
What I'd Like To See
Better memory profile
Faster handling of large areas
Solution / Follow-Up
I re-wrote the solution to explore the edges only (per hatchet's suggestion).
This was very simple to implement - and eliminated the need to keep track of "visited tiles" completely.
Based on three simple rules, I can traverse the edges, track min/max x & y values, and complete when I've arrived at the start again.
Here's the demo with the three rules I used:
One approach would be a perimeter walk.
Given a starting point anywhere along the edge of the shape, remember that point.
Start the bounding box as just that point.
Walk the perimeter using a clockwise rule set - if the point used to get to the current point was above, then first look right, then down, then left to find the next point on the shape perimeter. This is kind of like the simple strategy of solving a maze where you continuously follow a wall and always bear to the right.
Each time you visit a new perimeter point, expand the bounding box if the new point is outside it (i.e. keep track of the min and max x and y).
Continue until the starting point is reached.
Cons: if the shape has lots of single pixel 'filaments', you'll be revisiting them as the walk comes back.
Pros: if the shape has large expanses of internal occupied space, you never have to visit them or record them like you would if you were recording visited pixels in a flood fill.
So, conserves space, but in some cases at the expense of time.
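A sketch of the perimeter walk in Python (my own rendering, not the poster's code): a 4-connected clockwise wall follower over a set of occupied (x, y) pixels. It stops when a (position, heading) state repeats, which also handles the filament revisits mentioned above.

```python
# Perimeter walk: follow the boundary of a 4-connected shape, growing the
# bounding box as we go. Storage is O(perimeter), not O(area).
# States are (position, heading); we stop when a state repeats.
def perimeter_bbox(filled):
    start = min(filled, key=lambda p: (p[1], p[0]))   # topmost, then leftmost
    dirs = [(0, -1), (1, 0), (0, 1), (-1, 0)]         # N, E, S, W (y grows down)
    x, y = start
    heading = 1                                        # start walking east
    minx = maxx = x
    miny = maxy = y
    seen = set()
    while (x, y, heading) not in seen:
        seen.add((x, y, heading))
        for turn in (3, 0, 1, 2):                      # left, straight, right, back
            h = (heading + turn) % 4
            nx, ny = x + dirs[h][0], y + dirs[h][1]
            if (nx, ny) in filled:
                x, y, heading = nx, ny, h
                break
        else:
            break                                      # isolated single pixel
        minx, maxx = min(minx, x), max(maxx, x)
        miny, maxy = min(miny, y), max(maxy, y)
    return minx, miny, maxx, maxy
```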
Edit
As is so often the case, this problem is known, named, and has multiple algorithmic solutions. The problem you described is called Minimum Bounding Rectangle. One way to solve this is using Contour Tracing. The method I described above is in that class, and is called Moore-Neighbor Tracing or Radial Sweep. The links I've included for them discuss them in detail and point out a problem I hadn't caught: sometimes you'll revisit the start point before you have traversed the entire perimeter. If your start point is, for instance, somewhere along a single pixel 'filament', you will revisit it before you're done, and unless you consider this possibility, you'll stop prematurely. The website I linked to talks about ways to address this stopping problem. Other pages at that website also talk about two other algorithms: Square Tracing, and Theo Pavlidis's Algorithm. One thing to note is that these consider diagonals as contiguous, whereas you don't, but that should just be something that can be handled with minor modifications to the basic algorithms.
An alternative approach to your problem is Connected-component labeling. For your needs though, this may be a more time expensive solution than you require.
Additional resource:
Moore Neighbor Contour Tracing Algorithm in C++
I actually got a question like this in an interview once.
You can pretend the array is a graph whose adjacent marked nodes are connected. My algorithm involves going one to the right until you find a marked node. When you find one, do a breadth-first search, which runs in O(n) and avoids recursion. When the BFS returns, keep searching from where you left off; if a node has already been marked by one of the previous BFS's, you obviously don't need to search it again. I wasn't sure if you wanted to actually return the number of objects found, but it's easy to keep track by just incrementing a counter when you hit the first marked square of a new search.
Generally when you do a flood fill type algorithm you are placed in a spot and asked to fill. Since this is finding all the filled regions one way you would want to optimize it is to avoid rechecking the already marked nodes from previous BFS's, unfortunately at the moment I cannot think of a way to do that.
One hacky way to reduce memory consumption would be to store a short[][] instead of a boolean[][]. Then use this scheme to avoid making a whole second 2-D array:
unmarked = 0, marked = 1, checked and unmarked = 2, checked and marked = 3
This way you can check the status of an entry by its value and avoid making a second array.
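A sketch of that single-array idea in Python, using plain ints for the four states (the encoding 0/1 unvisited, 2/3 checked follows the corrected scheme above):

```python
# Count 4-connected areas using one int grid for both data and "checked"
# state: 0 = unmarked, 1 = marked, 2 = checked+unmarked, 3 = checked+marked.
from collections import deque

def count_areas_inplace(grid):
    h, w = len(grid), len(grid[0])
    count = 0
    for y in range(h):
        for x in range(w):
            if grid[y][x] != 1:            # already checked, or unmarked
                if grid[y][x] == 0:
                    grid[y][x] = 2         # checked + unmarked
                continue
            count += 1                     # found a new area: BFS it
            queue = deque([(x, y)])
            grid[y][x] = 3                 # checked + marked
            while queue:
                cx, cy = queue.popleft()
                for nx, ny in ((cx, cy - 1), (cx, cy + 1), (cx - 1, cy), (cx + 1, cy)):
                    if 0 <= nx < w and 0 <= ny < h and grid[ny][nx] == 1:
                        grid[ny][nx] = 3
                        queue.append((nx, ny))
    return count
```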
I am trying to develop an online map editing program for a game that I play.
The data for the map is a little large. A medium size map's data is close to 1 mb if I send the data for every square.
What I thought I could do was find the boundaries on the map and create polygons based off of that.
Currently I:
Find the northwestern most boundary and start there. For my sample map, it's (3,2)
Then, check North, East, South, then West and go to the first unvisited location that does not have 1 as its data.
If there are no unvisited locations, go to the location that is the furthest back in the history.
Steps Taken
This works fine for northern areas. However, when I get to a southern area, it checks north, finds an unvisited location, and goes to it. The coordinates where it messes up are (13,11).
Obviously, this doesn't give me the boundary that I want and it doesn't walk the entire map. So, something needs to change.
I considered adding a boundary check in the same order of operations as before(NESW). However, it is possible to mess that up as well.
At (13,11) it would check to the north and see that it's an unvisited location. And this time, there is a boundary there, so it would think it's ok to go there.
What should I do to walk the entire outer boundary?
I did take a look at the convex hull algorithm that is mentioned here, but I don't think it would be what I need. I might be incorrect, but this is what I would expect the result of a convex hull to look like.
While that does reduce the size of the map by a little bit, there is still a lot of data I don't need. And when I need to get the internal borders of items in the map, the size reduction is lost because they would be irregular as well.
So, how would I ensure that I'm actually walking the outer border?
Here is the expansion of the answer I suggested in the comment.
Imagine you're in a dark labyrinth. What do you do to make sure you
traverse the whole labyrinth? Simple: feel the wall to your left;
turn left when possible; turn right when forced to.
Ok, more precisely:
Find a starting point on the boundary of the map, which I think you already know how.
Make sure to represent the current facing of your agent. (up,down,left,right)
Prioritize the relative direction of movement like this: (left, forwards, right, backwards)
Move in the prioritized direction if possible.
While walking, note down the visited position as part of the answer. And also check if you have come back to a visited place or not in order to terminate the program.
EDIT: corrected the priority: left before forwards, not the other way round.
Not sure if I got the problem right, but it looks like the outer border can be represented as an array of orthogonal sections. Imagine a grid in which map squares belong to cells. Let's mark our sections' corners starting from 0 at the top left. In that notation the beginning of the outer border for pic. 1 would be ((3, 2), (4, 2)), ((4, 2), (5, 2)) and so on.
Every section that borders two cells, exactly one of which is a map square "1", is a border section. You can traverse the grid and simply collect them.
Then you have to sort them into cycles. That's easy: if two sections share a common coordinate, they belong to the same cycle. If two cycles share a common coordinate among their sections, they are one cycle; you just concatenate one's data to the other's.
When you have a set of definitely different cycles, the longest one, the one with the most sections, is your outer border.
If of course the map is in one piece.
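The collecting-and-stitching steps above might look like this in Python (my own sketch). It assumes regions without diagonally-touching cells, so that each vertex has a unique successor; sections are emitted as directed unit edges with the filled cell kept on a consistent side, which makes cycle recovery a matter of following successors:

```python
# Collect the unit-length border sections of a set of filled grid cells and
# stitch them into cycles; the longest cycle is the outer border.
def border_cycles(cells):
    nxt = {}                                  # section start vertex -> end vertex
    for (x, y) in cells:
        if (x, y - 1) not in cells:           # exposed top side
            nxt[(x, y)] = (x + 1, y)
        if (x + 1, y) not in cells:           # exposed right side
            nxt[(x + 1, y)] = (x + 1, y + 1)
        if (x, y + 1) not in cells:           # exposed bottom side
            nxt[(x + 1, y + 1)] = (x, y + 1)
        if (x - 1, y) not in cells:           # exposed left side
            nxt[(x, y + 1)] = (x, y)
    cycles = []
    while nxt:
        start = next(iter(nxt))
        cycle, v = [], start
        while True:                            # follow successors until closed
            cycle.append(v)
            v = nxt.pop(v)
            if v == start:
                break
        cycles.append(cycle)
    return cycles

def outer_border(cells):
    return max(border_cycles(cells), key=len)
```

On a 3x3 block with the center cell empty this yields two cycles (the 12-section outer rim and the 4-section hole), and outer_border picks the rim.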
I have a fairly large set of 2D points (~20000) in a set, and for each point in the x-y plane want to determine which point from the set is closest. (Actually, the points are of different types, and I just want to know which type is closest. And the x-y plane is a bitmap, say 640x480.)
From this answer to the question "All k nearest neighbors in 2D, C++" I got the idea to make a grid. I created n*m C++ vectors and put each point into the vector of the bin it falls into. The idea is that you only have to check the distances to the points in the bin, instead of to all points. If there is no point in the bin, you continue with the adjacent bins in a spiralling manner.
Unfortunately, I only read Oli Charlesworth's comment afterwards:
Not just adjacent, unfortunately (consider that points in the cell two
to the east may be closer than points in the cell directly north-east,
for instance; this problem gets much worse in higher dimensions).
Also, what if the neighbouring cells happen to have less than 10
points in them? In practice, you will need to "spiral out".
Fortunately, I already had the spiraling code figured out (a nice C++ version here, and there are other versions in the same question). But I'm still left with the problem:
If I find a hit in a cell, there could be a closer hit in an adjacent cell (yellow is my probe, red is the wrong choice, green the actual closest point):
If I find a hit in an adjacent cell, there could be a hit in a cell 2 steps away, as Oli Charlesworth remarked:
But even worse: if I find a hit in a cell two steps away, there could still be a closer hit in a cell three steps away! That means I'd have to consider all cells with dx,dy = -3...3, or 49 cells!
Now, in practice this won't happen often, because I can choose my bin size so the cells are filled enough. Still, I'd like to have a correct result, without iterating over all points.
So how do I find out when to stop "spiralling" or searching? I heard there is an approach with multiple overlapping grids, but I didn't quite understand it. Is it possible to salvage this grid technique?
Since the dimensions of your bitmap are not large and you want to calculate the closest point for every (x,y), you can use dynamic programming.
Let V[i][j] be the distance from (i,j) to the closest point in the set, but considering only the points in the set that are in the "rectangle" [(1, 1), (i, j)].
Then V[i][j] = 0 if there is a point in (i, j), or V[i][j] = min(V[i'][j'] + dist((i, j), (i', j'))) where (i', j') is one of the three neighbours of (i,j):
i.e.
(i - 1, j)
(i, j - 1)
(i - 1, j - 1)
This gives you the minimum distance, but only for the "upper left" rectangle. We do the same for the "upper right", "lower left", and "lower right" orientations, and then take the minimum.
The complexity is O(size of the plane), which is optimal.
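A sketch of the four-orientation DP in Python (my own rendering). Note that with orthogonal/diagonal steps of cost 1 and sqrt(2), the recurrence above yields the chamfer (monotone staircase) distance, which slightly overestimates the true Euclidean distance for intermediate angles:

```python
# Four-orientation DP distance map. Each pass allows only steps from one
# "quadrant" (cost 1 orthogonal, sqrt(2) diagonal); the final map is the
# cell-wise minimum over the four passes.
import math

def distance_map(h, w, points):
    INF = float('inf')
    SQ2 = math.sqrt(2)
    passes = [
        (range(h), range(w), (-1, -1)),                          # from upper-left
        (range(h), range(w - 1, -1, -1), (-1, +1)),              # from upper-right
        (range(h - 1, -1, -1), range(w), (+1, -1)),              # from lower-left
        (range(h - 1, -1, -1), range(w - 1, -1, -1), (+1, +1)),  # from lower-right
    ]
    best = [[INF] * w for _ in range(h)]
    for rows, cols, (di, dj) in passes:
        v = [[0.0 if (i, j) in points else INF for j in range(w)] for i in range(h)]
        for i in rows:
            for j in cols:
                pi, pj = i + di, j + dj
                if 0 <= pi < h:
                    v[i][j] = min(v[i][j], v[pi][j] + 1.0)
                if 0 <= pj < w:
                    v[i][j] = min(v[i][j], v[i][pj] + 1.0)
                if 0 <= pi < h and 0 <= pj < w:
                    v[i][j] = min(v[i][j], v[pi][pj] + SQ2)
        for i in range(h):
            for j in range(w):
                best[i][j] = min(best[i][j], v[i][j])
    return best
```

For a single source the result matches the closed-form chamfer distance sqrt(2)*min(|di|,|dj|) + ||di|-|dj||; to attribute each cell to its nearest point's type, propagate the source's identity alongside the distance.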
For your task a point quadtree is usually used, especially when the points are not evenly distributed.
To save main memory you can also use a PM or PMR quadtree, which uses buckets.
You search in your cell and, in the worst case, in all quad cells surrounding it.
You can also use a k-d tree.
A solution I'm trying
First make a grid such that you have an average of, say, 1 point per box (more if you want a larger scan).
Select the center box. Continue selecting rings of neighbor boxes until you find at least one point; at this point you may have 1, 9, 25, ... boxes selected.
Select one more layer of adjacent boxes.
Now you have a fairly small list of points, usually not more than 10, which you can feed into the distance formula to find the nearest neighbor.
Since you have on average 1 point per box, you will mostly be selecting 9 boxes and comparing 9 distances. Adjust the grid size according to your dataset's properties to achieve better results.
Also, if your data has a lot of variance, you can try 2 (or even more) levels of grids: if a query selects more than, say, 50 points, restart the search with a grid 1/10th the size ...
One solution would be to construct multiple partitionings with different grid sizes.
Assume you create partitions at levels 1,2,4,8,..
Now, search for a point in the grid of size 1 (you are basically searching in 9 squares). If there is a point in the search area and its distance to you is less than 1 (the cell size), stop. Otherwise move on to the next grid size.
The number of grids you need to construct is about twice as compared to creating just one level of partitioning.
Given:
(X,Y) coordinate, which is the position of a vehicle.
Array of (X,Y)'s, which are vertices in a polyline. Note that the polyline consists of straight segments only, no arcs.
What I want:
To calculate whether the vehicle is to the left or to the right of the polyline (or on top of it, of course).
My approach:
Iterate over all line-segments, and compute the distance to each segment. Then for the closest segment you do a simple left-of test (as explained here for instance).
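The left-of test itself is just the sign of a 2-D cross product; a minimal sketch:

```python
# Left-of test: sign of the cross product of (b - a) and (p - a).
# Returns 1 if p is left of the directed segment a->b, -1 if right,
# 0 if collinear (with y pointing up).
def side(a, b, p):
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross > 0) - (cross < 0)
```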
Possible issues:
When three points form an angle smaller than 90 degrees (such as shown in the image below), a more complicated scenario arises. When the vehicle is in the red segment as shown below, the closest segment can be either one of the two. However, the left-of test will yield right if the first segment is chosen as the closest segment, and left otherwise. We can easily see (at least, I hope) that the correct result should be that the vehicle is left of the polyline.
My question:
How can I elegantly, but mostly efficiently take care of this specific situation?
My fix so far:
Compute for both segments a point on that segment, starting from the shared vertex.
Compute the distance from the vehicle to both of these points, using Euclidean distance.
Keep the segment whose computed point is closest.
I am not very happy with this fix, because I feel like I am missing a far more elegant solution; it feels rather "hacky". Efficiency is key though, because it is used on a real-time embedded system.
Existing codebase is in C++, so if you want to write in a specific language, C++ has my preference.
Thanks!
[edit]
I changed my fix, from a perpendicular point to a parallel point, as I think it is easier to follow the line segment than compute the outward normal.
This topic has been inactive for so long that I believe it's dead. I have a solution though.
However, the left-of test will yield right if the first segment is
chosen as the closest segment, and left otherwise.
You've used slightly ambiguous language, so I'm going to use "segments" to speak of the line segments in the polyline and "quadrants" to speak of the areas delimited by them. In your case, you'd have a red quadrant which seems to be on the right of one segment and on the left of the other.
If the left-of test yields different answers for different segments, you should redo the test on the segments themselves. In your case, you'd have:
The quadrant is on the RIGHT of the first segment
The quadrant is on the LEFT of the second segment
Both segments disagree on where the quadrant lies, so you do two further disambiguation tests:
The second segment is on the RIGHT of the first segment
The first segment is on the RIGHT of the second segment
This allows us to conclude that the second segment is in between the first segment and the quadrant, since each of those two lies on a different side of the second segment. Therefore, the second segment is "closer" to the quadrant than the first, and its answer to the left-right test should be used as the correct one.
(I'm almost sure you can do with only one of the two disambiguation tests, I've put both in for clarity)
For the sake of completeness: I believe this solution also accounts for your demands of efficiency and elegance, since it uses the same method you've been using from the start (the left-of test), so it meets all the conditions specified: it's elegant, it's efficient and it takes care of the problem.
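A sketch of this disambiguation in Python (function names are mine, not the poster's): resolve takes the two segments a->b and b->c around the shared vertex b and the query point p, and reuses the same cross-product left-of test throughout.

```python
# side(): 1 if p is left of the directed segment a->b, -1 if right, 0 on it.
def side(a, b, p):
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross > 0) - (cross < 0)

def resolve(a, b, c, p):
    """Left/right of the polyline a->b->c for a point p near the vertex b."""
    s1 = side(a, b, p)            # verdict of the first segment
    s2 = side(b, c, p)            # verdict of the second segment
    if s1 == s2:
        return s1                 # no ambiguity
    # Disambiguation: if a and p lie on different sides of b->c, the second
    # segment is between the first segment and p, so its verdict wins.
    return s2 if side(b, c, a) != s2 else s1
```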
Let infinity = M, where M is big enough that everything fits in the square [-M,M]x[-M,M]. Split that square with your polyline and you now have two polygons. Checking whether the car is in a given polygon can then be done very simply with angles.
I assume your first and last points have M in their coordinates. You may need to add some of the corner points (-M,-M), (M,-M), (M,M) and (-M,M) to close each polygon.
Once you have the polygon for the left side of the polyline, sum the signed angles PiĈPi+1, where C is the car and Pi, Pi+1 are consecutive vertices of the polygon. If the sum is 0, the car is outside the polygon; if it is ±2π, the car is inside.
Just a quick idea: would it be possible to connect the last and first vertices of your polyline, so that it becomes a polygon? You could then do a simple inside/outside check to determine whether the vehicle is left/right of the line (this of course depends on the orientation of the polygon).
However, this method does assume that the polygon is not self-intersecting after connecting the last and first vertices.
This is a standard sort of problem from computational geometry. Since you're looking to test whether a point (x0, y0) is left-of a given surface (your polyline), you need to identify which segment to test against by its height. One easy way to do this would be to build a tree of the lower point of each segment, and search in that for the test point's predecessor. Once you have that segment, you can do your left-of test directly: if it's left of both endpoints, or between them on the appropriate side, then you return true.
I am assuming here that you guarantee that the vertical extent of your polyline is greater than where you might find your query point, and that the line doesn't overlap itself vertically. The latter assumption might be quite poor.
Expansion in response to OP's comment:
Note that the angle example in the question contradicts the first assumption - the polyline does not reach the height of the search point.
One way to conceptualize my method is by sorting your segments vertically, and then iterating through them, comparing your point's y-coordinate against the segments until your point is above the lower endpoint and below the higher endpoint. Then, use the endpoints of the segment to figure out the x-intercept at the given y. If the point has a lower x-coordinate, it's to the left; if it has a greater x-coordinate, it's to the right.
There are two ways to improve on this explanation in a real implementation, one of which I mentioned about:
Use a balanced search tree to find the right segment instead of iterating through a sorted list, to bring the time from O(n) to O(log n)
Reconceptualize the search as finding the intersection of the polyline and the horizontal line y = y0 through the search point. Then just compare the x value of the intersection against the search point.
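That second formulation can be sketched directly. As stated earlier, it assumes the polyline spans the query point's vertical extent and does not overlap itself vertically; a linear scan is shown for clarity, and a balanced tree over the segments' y-ranges would bring it to O(log n):

```python
# Left/right of a polyline via the horizontal line y = y0: find the segment
# whose y-extent contains y0 and compare x0 with the x-intercept there.
def left_of_polyline(polyline, p):
    x0, y0 = p
    for (x1, y1), (x2, y2) in zip(polyline, polyline[1:]):
        if y1 == y2:                       # horizontal segment: no unique intercept
            continue
        if min(y1, y2) <= y0 <= max(y1, y2):
            t = (y0 - y1) / (y2 - y1)
            xi = x1 + t * (x2 - x1)        # x-intercept at height y0
            return 'left' if x0 < xi else ('right' if x0 > xi else 'on')
    return None                            # y0 outside the polyline's extent
```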
There is a square grid with obstacles. On that grid, there are two members of a class Person. They face a certain direction (up, right, left or down). Each person has a certain amount of energy. Making the persons turn or making them move consumes energy (turning consumes 1 energy unit, moving consumes 5 energy units).
My goal is to make them move as close as possible to each other (measured by Manhattan distance), consuming the least amount of energy possible. Keep in mind there are obstacles on the grid.
How would I do this?
I would use a breadth-first search, keeping a minimal energy value to reach each square; it would terminate when the players meet or there is no more energy left.
I'll make assumptions and remove them later.
Assuming the grid is smaller than 1000x1000 and that you can't run out of energy:
Assuming they can't reach each other:
For Person1 and Person2, find their respective sets of reachable points, R1 and R2
(use breadth-first search, for example).
Sort R1 and R2 by x value.
Now go through R1 and R2 to find the pair of points that are closest together.
Hint: we sorted the two arrays, so we know when points are close in terms of their x coordinate. We never have to look further apart in x than the current minimum found.
Assuming they can reach each other: Use BFS from person1 until you find person2 and record the path
If the path found using BFS required no turns, then that is the solution,
Otherwise:
Create 4 copies of the grid (call them right-grid, left-grid, up-grid, down-grid).
the rule is, you can only be in the left grid if you are moving left, you can only be in the right grid if you are moving right, etc. To turn, you must move from one grid to the other (which uses energy).
Create this structure and then use BFS.
Example:
Now the left-grid assumes you are moving left, so create a graph from the left-grid where every point is connected to the point on its left, with edge weight equal to the energy needed to move forwards.
The only other options when in the left-grid are to move to the up-grid or the down-grid (which costs 1 energy), so connect the corresponding grid points of the up-grid, down-grid and left-grid, etc.
Now you have built your graph; simply run a shortest-path search on it. Note that because moves cost 5 and turns cost 1, the edges are weighted, so use Dijkstra's algorithm rather than plain breadth-first search.
I suggest you use Python's NetworkX; it will only be about 20 lines of code.
Make sure you don't connect squares if there is an obstacle in the way.
good luck.
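Because the layered graph has weighted edges (5 to move, 1 to turn), here is a stdlib-only Dijkstra sketch of the same four-copy idea, with states (x, y, facing); NetworkX's weighted shortest-path routines would work equally well. The grid encoding and function names are my own, and this solves the single-person-to-target subproblem (a meeting point combines two such searches):

```python
# Minimum-energy path on a grid with facing: states are (x, y, dir); turning
# left/right costs 1, moving forwards costs 5. This is exactly the four-copy
# grid described above, searched with Dijkstra (heapq).
import heapq

DIRS = {'up': (0, -1), 'right': (1, 0), 'down': (0, 1), 'left': (-1, 0)}
ORDER = ['up', 'right', 'down', 'left']

def min_energy(free_cells, start, goal):
    """start = (x, y, facing); goal = (x, y). Returns energy, or None if unreachable."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, (x, y, facing) = heapq.heappop(heap)
        if d > dist[(x, y, facing)]:
            continue                                   # stale heap entry
        if (x, y) == goal:
            return d
        i = ORDER.index(facing)
        moves = [((x, y, ORDER[(i - 1) % 4]), 1),      # turn left
                 ((x, y, ORDER[(i + 1) % 4]), 1)]      # turn right
        dx, dy = DIRS[facing]
        if (x + dx, y + dy) in free_cells:             # no obstacle ahead
            moves.append(((x + dx, y + dy, facing), 5))
        for state, cost in moves:
            if d + cost < dist.get(state, float('inf')):
                dist[state] = d + cost
                heapq.heappush(heap, (d + cost, state))
    return None
```

For example, on a free 3x1 strip, starting at (0,0) facing right and heading for (2,0) costs 10 (two moves), while starting faced up costs 11 (one turn plus two moves).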