Can anybody give me a hint about how to approach the following task from Codility: https://codility.com/programmers/task/hilbert_maze/
I would be able to find the shortest path by generating the maze and searching it with BFS, but since the expected worst-case time complexity is O(N), I don't think this is the right way to go. The time complexity of BFS is O(|V| + |E|), where V is the number of vertices and E the number of edges. For example, if N = 3, we have a grid of size 17x17 and it's intuitively obvious that we can't find the path in only 3 steps.
So, either the indicated time complexity is wrong and should be something like M^2, or there is a quick trick to calculate the distance between two points without using graph algorithms. I found some algorithms for calculating the Hilbert distance for 2 given points (if that's what is needed here), which use bit manipulation etc., but I couldn't understand them at all. Moreover, I think the goal of the task is to work out on your own how to calculate the distance rather than to use an existing formula.
Here is the solution I came up with:
Every point's location can be defined by an array of quadrants and their orientations (it will have N elements), each element representing the orientation in the previous quadrant. The whole maze has an upward orientation.
You need to define this array for both points. For example: if N = 2 and the point is in the lower left quadrant, then it will have the orientation to the left. We take this quadrant and rotate our coordinate system so that it has the same orientation. This way we define the next quadrant and orientation pair in our new system. So if our point is again in the lower left quadrant, it will have orientation to the left, but as this is relative to our previous orientation (which was also to the left), it becomes an upward orientation.
At this point we have all the quadrants and orientations down to the smallest maze that contains our point. Working backwards (from the smallest maze), we need to solve them. Every maze can be solved by the following rules:
if our point in our current quadrant is on any of the extremes (meaning that any of the coordinate's components is either the lowest or the highest of the quadrant) we leave it where it was, otherwise check the next points
if our point is downwards or at the middle of the current quadrant then we move it to the quadrant's lowest middle point (this is relative to the previously defined orientation, i.e.: if our orientation is upwards then we will move our point to the topmost middle point)
if our point is upwards (in the relative direction) we will have to move it to the topmost middle point
Storing these moves, we check whether the two arrays belonging to the two points have any common elements:
if not, we calculate the distance between the two endpoints and then add up all the distances from the two move lists (every distance in these lists can be calculated from coordinate differences, i.e.: abs(x1-x2) + abs(y1-y2))
if we have common elements, then we delete every move after them, including the common elements, and calculate the distance as described in the previous point
This solution can be optimised; it is just meant to present an idea to start with.
Edit: Here is my implementation of the above presented solution in Swift3: https://codility.com/demo/results/training9WWFXU-EWC/


Algorithm to find best fitting point on a plane

I am working on a path finding system for my game that uses A* and I need to position the nodes in such a way that they keep a minimal distance from other points.
I wonder if there is an algorithm that would allow me to find the best fitting point on a plane or a line (between neighboring points), as close as possible to the specified position, while maintaining a minimal distance to the neighbors.
Basically I need an algorithm that, given the input (in pseudocode) min distance = 2, original position = 1, 1 and a set of existing points, would do this:
In the example the shape is a triangle and the point can be calculated using the Pythagorean theorem, but I need it to work for any shape.
Your problem does not seem easy. If you draw the "forbidden areas", they form a complex geometry made of the union of disks.
Then there are two cases:
if the new point belongs to the allowed area, you are done;
otherwise you need to find the nearest allowed point.
It is easy to see if a point is allowed, by computing all distances. But finding the nearest allowed point seems more challenging. (By the way, this point could be very far.)
If the target point lies inside a circle, the nearest candidate location might be the orthogonal projection on a circle, or the intersection between two circles. Compute all these points and check if they are allowed. Then keep the nearest candidate.
In red, the allowed candidates. In black the forbidden candidates.
For N points, this is an O(N³) process. This can probably be reduced by a factor N by means of computational geometry techniques, but at the price of high complexity.
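The candidate enumeration described above can be sketched roughly as follows. This is only an illustration under assumptions: the function name nearest_allowed_point, the eps tolerance, and the use of one shared radius min_dist for every forbidden disk are choices made here, not part of the answer.

```python
import math
from itertools import combinations


def nearest_allowed_point(target, points, min_dist, eps=1e-9):
    """Return the allowed point closest to `target`, or None if no candidate works.

    A point is allowed when it is at least `min_dist` away from every
    existing point (eps absorbs rounding on the circle boundaries).
    """
    def allowed(p):
        return all(math.dist(p, q) >= min_dist - eps for q in points)

    if allowed(target):
        return target

    candidates = []
    # Projection of the target onto each forbidden circle.
    for cx, cy in points:
        dx, dy = target[0] - cx, target[1] - cy
        d = math.hypot(dx, dy)
        if d == 0:
            dx, dy, d = 1.0, 0.0, 1.0  # target sits on the centre: pick any direction
        candidates.append((cx + dx / d * min_dist, cy + dy / d * min_dist))

    # Intersections of every pair of circles (all have radius min_dist).
    for (ax, ay), (bx, by) in combinations(points, 2):
        d = math.hypot(bx - ax, by - ay)
        if d == 0 or d > 2 * min_dist:
            continue  # identical centres or circles too far apart
        mx, my = (ax + bx) / 2, (ay + by) / 2
        h = math.sqrt(max(min_dist ** 2 - (d / 2) ** 2, 0.0))
        ux, uy = (bx - ax) / d, (by - ay) / d
        candidates.append((mx - uy * h, my + ux * h))
        candidates.append((mx + uy * h, my - ux * h))

    # O(N^2) candidates, each checked against N points: O(N^3) overall.
    valid = [c for c in candidates if allowed(c)]
    return min(valid, key=lambda c: math.dist(c, target)) if valid else None
```

With a single existing point at (0, 0) and min_dist = 2, a target at (1, 0) is pushed out to (2, 0); with two points at (0, 0) and (2, 0), a target between them ends up on one of the two circle intersections.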

Most efficient way to select point with the most surrounding points

N.B: there's a major edit at the bottom of the question - check it out
Say I have a set of points:
I want to find the point with the most points surrounding it, within radius R (i.e. a circle) or within ±R (i.e. a square) of the point for 2 dimensions. I'll refer to it as the densest point function.
For the diagrams in this question, I'll represent the surrounding region as circles. In the image above, the middle point's surrounding region is shown in green. This middle point has the most surrounding points of all the points within radius R and would be returned by the densest point function.
What I've tried
A viable way to solve this problem would be to use a range searching solution; this answer explains further and states that it has "O(log n + k) worst-case time". Using this, I could get the number of points surrounding each point and choose the point with the largest surrounding point count.
However, if the points were extremely densely packed (in the order of a million), as such:
then each of these million points would need to have a range search performed. The worst-case time O(log n + k), where k is the number of points returned in the range, is true for the following point tree types:
kd-trees of two dimensions (which are actually slightly worse, at O(sqrt(n) + k)),
2d-range trees,
Quadtrees, which have a worst-case time of O(n).
So, for a group of n points within radius R of all points within the group, it gives a complexity of O(log n + n) for each point. This yields over a trillion operations!
Any ideas on a more efficient, precise way of achieving this, so that I could find the point with the most surrounding points for a group of points in a reasonable time (preferably O(n log n) or less)?
Turns out that the method above is correct! I just need help implementing it.
If I use a 2d-range tree:
A range reporting query costs O(log^d n + k), for k returned points,
For a range tree with fractional cascading (also known as layered range trees) the complexity is O(log^(d-1) n + k),
For 2 dimensions, that is O(log n + k),
Furthermore, if I perform a range counting query (i.e., I do not report each point), then it costs O(log n).
I'd perform this on every point - yielding the O(n log n) complexity I desired!
However, I cannot figure out how to write the code for a counting query for a 2d layered range tree.
I've found a great resource (from page 113 onwards) about range trees, including 2d-range tree pseudocode. But I can't figure out how to introduce fractional cascading, nor how to correctly implement the counting query so that it is of O(log n) complexity.
I've also found two range tree implementations here and here in Java, and one in C++ here, although I'm not sure this uses fractional cascading as it states above the countInRange method that
It returns the number of such points in worst case
* O(log(n)^d) time. It can also return the points that are in the rectangle in worst case
* O(log(n)^d + k) time where k is the number of points that lie in the rectangle.
which suggests to me it does not apply fractional cascading.
Refined question
To answer the question above, therefore, all I need to know is whether there are any libraries with 2d-range trees with fractional cascading that have a range counting query of complexity O(log n), so I don't go reinventing any wheels, or can you help me to write/modify the resources above to perform a query of that complexity?
Also not complaining if you can provide me with any other method to achieve a range counting query of 2d points in O(log n)!
I suggest using a plane sweep algorithm. This allows one-dimensional range queries instead of 2-d queries, which is more efficient, simpler, and in the case of a square neighborhood does not require fractional cascading:
Sort points by Y-coordinate to array S.
Advance 3 pointers to array S: one (C) for currently inspected (center) point; other one, A (a little bit ahead) for nearest point at distance > R below C; and the last one, B (a little bit behind) for farthest point at distance < R above it.
Insert the points pointed to by A into an order statistic tree (ordered by X coordinate) and remove the points pointed to by B from this tree. Use this tree to find the points at distance R to the left/right of C, and use the difference of these points' positions in the tree to get the number of points in the square area around C.
Use results of previous step to select "most surrounded" point.
This algorithm could be optimized if you rotate points (or just exchange X-Y coordinates) so that width of the occupied area is not larger than its height. Also you could cut points into vertical slices (with R-sized overlap) and process slices separately - if there are too many elements in the tree so that it does not fit in CPU cache (which is unlikely for only 1 million points). This algorithm (optimized or not) has time complexity O(n log n).
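A rough sketch of the sweep for the square neighborhood follows. It swaps the order statistic tree for a Fenwick (binary indexed) tree over the ranks of the X coordinates, which supports the same "how many stored points lie between two X values" query; that substitution and the names Fenwick and densest_point_square are assumptions made for this illustration.

```python
from bisect import bisect_left, bisect_right


class Fenwick:
    """Binary indexed tree holding one counter per X rank."""

    def __init__(self, n):
        self.tree = [0] * (n + 1)

    def add(self, i, delta):
        i += 1
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i

    def prefix(self, i):
        """Sum of the counters at ranks 0 .. i-1."""
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s


def densest_point_square(points, r):
    """Return (point, neighbour count) for the square |dx| <= r, |dy| <= r."""
    pts = sorted(points, key=lambda p: p[1])      # sweep order: by Y
    xs = sorted({p[0] for p in pts})              # distinct X values
    rank = {x: i for i, x in enumerate(xs)}
    window = Fenwick(len(xs))
    best, best_count = None, -1
    a = b = 0                                     # pointers A (ahead) and B (behind)
    for cx, cy in pts:
        while a < len(pts) and pts[a][1] <= cy + r:
            window.add(rank[pts[a][0]], 1)        # point enters the Y window
            a += 1
        while pts[b][1] < cy - r:
            window.add(rank[pts[b][0]], -1)       # point leaves the Y window
            b += 1
        lo = bisect_left(xs, cx - r)
        hi = bisect_right(xs, cx + r)
        count = window.prefix(hi) - window.prefix(lo) - 1   # exclude the centre itself
        if count > best_count:
            best, best_count = (cx, cy), count
    return best, best_count
```

Sorting dominates, and each point is inserted, removed and queried once at O(log n) each, so the sketch stays within the O(n log n) bound stated above.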
For circular neighborhood (if R is not too large and points are evenly distributed) you could approximate circle with several rectangles:
In this case step 2 of the algorithm should use more pointers to allow insertion/removal to/from several trees. And on step 3 you should do a linear search near points at proper distance (<=R) to distinguish points inside the circle from the points outside it.
Other way to deal with circular neighborhood is to approximate circle with rectangles of equal height (but here circle should be split into more pieces). This results in much simpler algorithm (where sorted arrays are used instead of order statistic trees):
Cut area occupied by points into horizontal slices, sort slices by Y, then sort points inside slices by X.
For each point in each slice, assume it to be a "center" point and do step 3.
For each nearby slice use binary search to find points with Euclidean distance close to R, then use linear search to tell "inside" points from "outside" ones. Stop linear search where the slice is completely inside the circle, and count remaining points by difference of positions in the array.
Use results of previous step to select "most surrounded" point.
This algorithm allows optimizations mentioned earlier as well as fractional cascading.
I would start by creating something like a https://en.wikipedia.org/wiki/K-d_tree, where you have a tree with points at the leaves and each node holds information about its descendants. At each node I would keep a count of the number of descendants, and a bounding box enclosing those descendants.
Now for each point I would recursively search the tree. At each node I visit, either all of the bounding box is within R of the current point, all of the bounding box is more than R away from the current point, or some of it is inside R and some outside R. In the first case I can use the count of the number of descendants of the current node to increase the count of points within R of the current point and return up one level of the recursion. In the second case I can simply return up one level of the recursion without incrementing anything. It is only in the intermediate case that I need to continue recursing down the tree.
So I can work out for each point the number of neighbours within R without checking every other point, and pick the point with the highest count.
If the points are spread out evenly then I think you will end up constructing a k-d tree where the lower levels are close to a regular grid, and I think if the grid is of size A x A then in the worst case R is large enough so that its boundary is a circle that intersects O(A) low level cells, so I think that if you have O(n) points you could expect this to cost about O(n * sqrt(n)).
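The counting search described above might look roughly like this. It is a minimal sketch: the class name KDNode, the leaf_size parameter and the helper names are made up for the example, and the point set is assumed to be non-empty.

```python
class KDNode:
    """k-d tree node storing a descendant count and a bounding box."""

    def __init__(self, pts, depth=0, leaf_size=4):
        self.count = len(pts)
        self.xmin = min(p[0] for p in pts)
        self.xmax = max(p[0] for p in pts)
        self.ymin = min(p[1] for p in pts)
        self.ymax = max(p[1] for p in pts)
        self.points = self.left = self.right = None
        if len(pts) <= leaf_size:
            self.points = pts                     # leaf: keep the points themselves
        else:
            pts = sorted(pts, key=lambda p: p[depth % 2])   # alternate split axis
            mid = len(pts) // 2
            self.left = KDNode(pts[:mid], depth + 1, leaf_size)
            self.right = KDNode(pts[mid:], depth + 1, leaf_size)


def count_within(node, q, r):
    """Count points under `node` whose distance to q is at most r."""
    # Distance from q to the closest part of the bounding box.
    near_dx = max(node.xmin - q[0], 0, q[0] - node.xmax)
    near_dy = max(node.ymin - q[1], 0, q[1] - node.ymax)
    if near_dx ** 2 + near_dy ** 2 > r * r:
        return 0                                  # box entirely outside the circle
    # Distance from q to the farthest corner of the bounding box.
    far_dx = max(q[0] - node.xmin, node.xmax - q[0])
    far_dy = max(q[1] - node.ymin, node.ymax - q[1])
    if far_dx ** 2 + far_dy ** 2 <= r * r:
        return node.count                         # box entirely inside the circle
    if node.points is not None:                   # leaf straddling the boundary
        return sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= r * r
                   for p in node.points)
    return count_within(node.left, q, r) + count_within(node.right, q, r)


def densest_point(points, r):
    """Return the point with the most points (itself included) within r."""
    root = KDNode(list(points))
    return max(points, key=lambda p: count_within(root, p, r))
```

Note that count_within includes the query point itself when it is part of the set; since that adds one to every count, it does not change which point wins.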
You can speed up whatever algorithm you use by preprocessing your data in O(n) time to estimate the number of neighbouring points.
For a circle of radius R, create a grid whose cells have dimension R in both the x- and y-directions. For each point, determine to which cell it belongs. For a given cell c this test is easy:
c.x<=p.x && p.x<=c.x+R && c.y<=p.y && p.y<=c.y+R
(You may want to think deeply about whether a closed or half-open interval is correct.)
If you have relatively dense/homogeneous coverage, then you can use an array to store the values. If coverage is sparse/heterogeneous, you may wish to use a hashmap.
Now, consider a point on the grid. The extremal locations of a point within a cell are as indicated:
Points at the corners of the cell can only be neighbours with points in four cells. Points along an edge can be neighbours with points in six cells. Points not on an edge are neighbours with points in 7-9 cells. Since it's rare for a point to fall exactly on a corner or edge, we assume that any point in the focal cell is neighbours with the points in all 8 surrounding cells.
So, if a point p is in a cell (x,y), N[p] identifies the number of neighbours of p within radius R, and Np[y][x] denotes the number of points in cell (x,y), then N[p] is given by:
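N[p] = Np[y][x] + Np[y][x-1] + Np[y][x+1] + Np[y-1][x-1] + Np[y-1][x] + Np[y-1][x+1] + Np[y+1][x-1] + Np[y+1][x] + Np[y+1][x+1]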
Once we have the number of neighbours estimated for each point, we can heapify that data structure into a maxheap in O(n) time (with, e.g. make_heap). The structure is now a priority-queue and we can pull points off in O(log n) time per query ordered by their estimated number of neighbours.
Do this for the first point and use a O(log n + k) circle search (or some more clever algorithm) to determine the actual number of neighbours the point has. Make a note of this point in a variable best_found and update its N[p] value.
Peek at the top of the heap. If the estimated number of neighbours is less than N[best_found] then we are done. Otherwise, repeat the above operation.
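The estimate-then-verify loop could be sketched along these lines. Several things here are assumptions made for the illustration: the function name densest_point_grid, half-open cells obtained with floor, a hash map (Counter) for the cell counts, and a brute-force exact count standing in for the O(log n + k) circle search. The loop also stops as soon as the best remaining estimate cannot beat the best exact count found so far.

```python
import heapq
import math
from collections import Counter


def densest_point_grid(points, r):
    """Return (point, exact neighbour count within radius r)."""
    def cell(p):
        # Half-open cells of side r: [c*r, (c+1)*r) in each direction.
        return (math.floor(p[0] / r), math.floor(p[1] / r))

    per_cell = Counter(cell(p) for p in points)   # O(n) preprocessing

    def estimate(p):
        # Upper bound: everything in the 3x3 block of cells, minus p itself.
        cx, cy = cell(p)
        return sum(per_cell[(cx + dx, cy + dy)]
                   for dx in (-1, 0, 1) for dy in (-1, 0, 1)) - 1

    def exact(p):
        # Stand-in for a proper circle search; brute force for clarity.
        return sum(1 for q in points if math.dist(p, q) <= r) - 1

    # Max-heap via negated estimates; the index breaks ties.
    heap = [(-estimate(p), i) for i, p in enumerate(points)]
    heapq.heapify(heap)

    best, best_count = None, -1
    while heap and -heap[0][0] > best_count:
        _, i = heapq.heappop(heap)
        count = exact(points[i])                  # the expensive check
        if count > best_count:
            best, best_count = points[i], count
    return best, best_count
```

Because every point within r of p must fall in p's cell or one of the 8 surrounding cells, the estimate never undercounts, so stopping early cannot skip the true densest point.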
To improve estimates you could use a finer grid, like so:
along with some clever sliding window techniques to reduce the amount of processing required (see, for instance, this answer for rectangular cases - for circular windows you should probably use a collection of FIFO queues). To increase security you can randomize the origin of the grid.
Considering again the example you posed:
It's clear that this heuristic has the potential to save considerable time: with the above grid, only a single expensive check would need to be performed in order to prove that the middle point has the most neighbours. Again, a higher-resolution grid will improve the estimates and decrease the number of expensive checks which need to be made.
You could, and should, use a similar bounding technique in conjunction with mcdowella's answers; however, his answer does not provide a good place to start looking, so it is possible to spend a lot of time exploring low-value points.

Find furthest point in O(1) time

Consider a set S of n points in the plane such that the farthest pair is having distance at most 1. I would like to find the farthest point of a given query point q (not in S) in O(1) time. How do I pre-process the points in S to achieve the desired query time bound?
Can this be possible?
It is not possible stricto sensu. This is a point location problem in a planar straight line graph, which is known to require O(log(N)) query time.
Anyway, it can be addressed approximately by gridding.
Overlay a square grid over the furthest point Voronoi diagram, and for every cell note the regions it covers. Make sure that the number of covered regions is bounded. This can be approximately achieved by taking a grid pitch smaller than the distance of the two closest vertices in the diagram.
For a query pixel, finding the containing cell is done in constant time. Then finding the region among a bounded number takes constant time as well.
Assuming there is no relation between the points, there is no single operation that will give you the furthest point. So, the only way to do it is to compute it in advance, so you need a simple mapping between each point and the point furthest from it.

Algorithm - Finding closest empty square on a 2d grid

Given a starting square (y, x) on a 2d grid, I want to find the closest empty square to it. (Note: The 4 squares adjacent to the starting square should be considered closer than the 4 diagonal squares nearest it.)
The following image shows the order that I need to check the following cells on this grid:
The grid is bounded but can be quite large. In practice, the starting coordinate will be randomly located around the grid. (So, I don't think it's too important to worry about coordinates outside the bounds of the grid....)
What algorithm can I use to iterate around the circle in this manner?
A simple breadth-first search will do it. Push each neighbour to be examined onto a heap, prioritised by distance. You can probably get away with Manhattan distance (|dx| + |dy|), but if not just use squared radial distance (dx^2 + dy^2). Whenever you pop an item it'll be the nearest. If it's empty, you found it. Otherwise push its neighbours onto the heap and keep popping.
I would probably use the square radial distance and only add adjacent squares (not diagonals). The diagonals will be considered later because they are immediately adjacent to other squares. You do need a way to keep track of which squares have already been considered so you don't add them again. There must be a clever dynamic-programming way of tracking this without having to clear a large grid of booleans each time you search... But saying that, a large grid of booleans will do quite nicely.
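As a minimal sketch of that heap-based search (the name nearest_empty, the grid[y][x] convention with truthy meaning occupied, and the set used as the "already considered" record are assumptions for this example):

```python
import heapq


def nearest_empty(grid, start):
    """Return the (y, x) of the empty square closest to `start`, or None.

    grid[y][x] is truthy when the square is occupied. Squares are examined
    in order of squared radial distance from the start.
    """
    rows, cols = len(grid), len(grid[0])
    sy, sx = start
    heap = [(0, sy, sx)]
    seen = {(sy, sx)}                 # squares already pushed onto the heap
    while heap:
        _, y, x = heapq.heappop(heap)
        if not grid[y][x]:
            return (y, x)
        # Only the 4 edge-adjacent squares are pushed; diagonals are reached
        # later through them, with their larger squared distance.
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols and (ny, nx) not in seen:
                seen.add((ny, nx))
                dist2 = (ny - sy) ** 2 + (nx - sx) ** 2
                heapq.heappush(heap, (dist2, ny, nx))
    return None
```

If the starting square is itself empty it is returned immediately; start from its neighbours instead if that is not wanted.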
It can be solved with BFS (Breadth First Search). We have to process each square twice. The first time we visit the still unvisited squares that share an edge with the current square, and the next time we visit the squares that share at least a point with the current square (diagonally adjacent squares).
We can use two different queues to ensure that before processing a square 2nd time all squares with equal distance from the source to the current square have been processed at least once. :-)
Average Runtime: O(V*8) => O(V), where V is the number of squares inside the grid.
If the content of the grid changes often, use the methods described in previous answers, that is breadth-first search.
If the content of your grid seldom changes AND Manhattan distance is OK for your application, my advice is to compute the distance transform of the binarized grid (0 if empty, 1 otherwise). The distance transform is very simple for Manhattan distance and far more complicated for Euclidean distance. This step can be done at a cost of 2*N*M, where N*M is the number of elements of the grid. Then, for each request, you can visit the neighborhood in a very straightforward manner: follow the path of decreasing distance from the starting cell (like a gradient descent), and it will stop at the nearest empty cell. Searching this way may be much faster, as you never look in the wrong direction for an empty cell for more than one cell.
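A two-pass Manhattan distance transform and the descent that uses it could be sketched as follows. The function names and the grid[y][x] convention (truthy means occupied) are assumptions for this example.

```python
def manhattan_distance_transform(grid):
    """Return dist[y][x] = Manhattan distance to the nearest empty cell."""
    rows, cols = len(grid), len(grid[0])
    inf = rows + cols                 # larger than any real distance in the grid
    dist = [[inf if grid[y][x] else 0 for x in range(cols)] for y in range(rows)]
    # Forward pass: propagate distances from the top and from the left.
    for y in range(rows):
        for x in range(cols):
            if y > 0:
                dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
            if x > 0:
                dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)
    # Backward pass: propagate distances from the bottom and from the right.
    for y in reversed(range(rows)):
        for x in reversed(range(cols)):
            if y < rows - 1:
                dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
            if x < cols - 1:
                dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist


def descend_to_empty(dist, start):
    """Follow decreasing distances from `start` to the nearest empty cell."""
    rows, cols = len(dist), len(dist[0])
    y, x = start
    if dist[y][x] >= rows + cols:
        return None                   # the grid contains no empty cell
    while dist[y][x] > 0:
        # Step to the 4-neighbour with the smallest distance value.
        y, x = min(((y + dy, x + dx)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= y + dy < rows and 0 <= x + dx < cols),
                   key=lambda c: dist[c[0]][c[1]])
    return (y, x)
```

The transform is computed once per grid change; each query then costs only as many steps as the distance to the nearest empty cell.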

A star path finding in corridor

I am implementing A* for path finding of a mobile robot inside the corridor. As of now the path is produced inside the corridor but it slides over to the right following all the edges of the obstacles, but I prefer the path should lie in the middle of the corridor.
1. Is there any smoothing algorithm to do it?
2. How do I include the steering constraints so that I can get a realistic/feasible path?
3. How do I penalize turns so as to avoid zig-zag paths?
Since I am new to the A* algorithm, I find the above issues difficult. References to any link or book are also welcome.
You can preprocess the field on which you run A* by, say, shrinking it by 1 tile, so that a cell that borders an impassable cell in its 4-way neighborhood becomes impassable too. Then your resultant A* path will be closer to the center of the corridor. Of course, several corridors might become completely impassable, but this is expected, as we are practically simulating a 3x3 cross-shaped robot walking around the grid, and a 3x3 cross can't go through a 2xN path.
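One way to sketch that shrinking step is shown below. The name shrink_free_space and the grid[y][x] convention (truthy means impassable) are assumptions, and cells outside the grid are not treated as obstacles here, which is a choice the answer does not specify.

```python
def shrink_free_space(grid):
    """Return a new grid where every cell bordering an impassable cell
    (4-way neighbourhood) is impassable as well."""
    rows, cols = len(grid), len(grid[0])

    def blocked(y, x):
        # Cells outside the grid count as free in this sketch.
        return 0 <= y < rows and 0 <= x < cols and bool(grid[y][x])

    return [[blocked(y, x) or blocked(y - 1, x) or blocked(y + 1, x)
             or blocked(y, x - 1) or blocked(y, x + 1)
             for x in range(cols)]
            for y in range(rows)]
```

Running A* on the returned grid instead of the original keeps the path one tile away from the walls; a corridor only two tiles wide closes completely, as described above.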
About adding cost to turn - you have to add current direction to the array that holds A* data, and implement a two-argument function that will return a non-negative value for (old direction, new direction) pair of arguments. Say, "if old_direction is not equal to new direction, return 1, else return 0". Then add the result of that function to whatever cost you computed for each step of A* iteration.
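A small grid A* with the direction stored in the search state might look like this. It is a sketch under assumptions: 4-way moves of cost 1, a Manhattan heuristic, and the names turn_cost, astar_with_turn_penalty and turn_weight are all made up for the example.

```python
import heapq
from itertools import count

MOVES = ((0, 1), (1, 0), (0, -1), (-1, 0))


def turn_cost(old_direction, new_direction):
    """Return 1 when the direction changes, 0 otherwise (or at the start)."""
    return 0 if old_direction is None or old_direction == new_direction else 1


def astar_with_turn_penalty(grid, start, goal, turn_weight=1):
    """Return a list of (y, x) cells from start to goal, or None.

    grid[y][x] is truthy when impassable. The search state is
    (cell, direction of arrival), so turns can be charged per step.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    tie = count()                     # tie-breaker so the heap never compares paths
    heap = [(h(start), next(tie), 0, start, None, [start])]
    best_g = {(start, None): 0}
    while heap:
        _, _, g, cell, direction, path = heapq.heappop(heap)
        if cell == goal:
            return path
        if g > best_g.get((cell, direction), float("inf")):
            continue                  # stale heap entry
        for move in MOVES:
            ny, nx = cell[0] + move[0], cell[1] + move[1]
            if not (0 <= ny < rows and 0 <= nx < cols) or grid[ny][nx]:
                continue
            new_g = g + 1 + turn_weight * turn_cost(direction, move)
            state = ((ny, nx), move)
            if new_g < best_g.get(state, float("inf")):
                best_g[state] = new_g
                heapq.heappush(heap, (new_g + h((ny, nx)), next(tie), new_g,
                                      (ny, nx), move, path + [(ny, nx)]))
    return None
```

On an open 3x3 grid from (0, 0) to (2, 2) the result has a single turn instead of a staircase. Raising turn_weight makes the search accept longer detours in exchange for fewer turns.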
You could simply limit the usable area to the middle of the corridor.
