Finding The Nearest Empty Spot On An Infinite Grid - algorithm

This was an interview question I was asked. It involves a 2D grid of infinite size consisting of empty and occupied spots (0s and 1s, if you will). The input contains the coordinates of the occupied spots, and the task is to return the closest empty spot for each occupied spot. You can pass through occupied spots during traversal. This was a free-form coding challenge; it didn't come with a method signature to begin with.
I am experienced in algorithms and have solved similar problems before; nearest-location questions usually call for BFS on the given graph. However, the grid being infinite complicated matters, since I knew I wouldn't be able to store the entire grid in a matrix structure. Nevertheless, I went with BFS in the end. One problem that emerged at this point was how to check whether the current spot is occupied. Scanning the entire input for each visited node didn't seem like a good option, since it is too slow. So I suggested that if I could somehow map the occupied spots into a hashmap, the check could be done in constant time. The interviewer told me to assume I have a hash function for the coordinates, and my final solution was something like this:
for each occupied spot in the input
    create a queue and push the current spot into the queue
    while queue is not empty
        get the first element from the queue
        if the spot is not occupied and not marked
            add the spot into result list and break the while loop
        else if spot is not marked
            push its neighbors into queue and mark it
The outer loop runs once for each of the n spots in the input. The BFS runs in O(v + e); the interviewer suggested that this can be represented as O(n), and in the worst case it visits every occupied spot, so it is indeed O(n). My final algorithm therefore runs in O(n^2).
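For reference, here is a minimal runnable Python version of the pseudocode above (a reconstruction, not the original interview code); the hashmap the interviewer allowed is simply a Python set of (x, y) tuples:

from collections import deque

def nearest_empty_per_spot(occupied_list):
    # One BFS per occupied spot, as in the pseudocode above.
    occupied = set(occupied_list)  # constant-time occupancy checks
    result = {}
    for spot in occupied_list:
        queue = deque([spot])
        marked = {spot}
        while queue:
            x, y = queue.popleft()
            if (x, y) not in occupied:
                result[spot] = (x, y)  # first empty spot popped is a nearest one
                break
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb not in marked:
                    marked.add(nb)
                    queue.append(nb)
    return result

print(nearest_empty_per_spot([(0, 0), (0, 1), (1, 0)]))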
As you can probably guess, I failed the interview; otherwise I wouldn't be posting this question. Can you point out my mistakes? I thought the interview went well, and I couldn't find anything in an internet search about how to solve this question on an infinite grid. Maybe I should have started by clarifying how to store an infinite grid, but I couldn't think of it at that moment.

I'm guessing that what they wanted you to do was loop in a spiral centered on each known occupied spot.
So something like:
distance = 1;
foreach (occupiedSpace)
{
    occupied = true; // my own position is occupied by definition
    while (occupied) // loop in a spiral and check for an open space
    {
        for (possibleXPosition = -1 * distance + occupiedSpace.x to distance + occupiedSpace.x)
        {
            for (possibleYPosition = -1 * distance + occupiedSpace.y to distance + occupiedSpace.y)
            {
                occupied = check(array[possibleXPosition, possibleYPosition]);
                if (!occupied)
                {
                    output.print(closest position to occupiedSpace is
                                 (possibleXPosition, possibleYPosition));
                    break; // stop searching (break out of both loops)
                }
            }
        }
        distance++;
    }
}
There are some real implementations here:
Algorithm for iterating over an outward spiral on a discrete 2D grid from the origin
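As a rough illustration of the spiral idea (not taken from either post), here is a Python sketch that scans square rings of growing radius, assuming the occupied spots are kept in a set so the infinite grid is never materialized:

def nearest_empty_spiral(spot, occupied):
    # Scan square rings of growing radius around `spot` until one contains
    # an empty cell. The first such ring gives the nearest empty cell in
    # the Chebyshev (chessboard) metric; for an exact Manhattan or
    # Euclidean nearest you would keep scanning a few more rings, since a
    # corner of ring d can be farther than an edge cell of ring d + 1.
    x0, y0 = spot
    d = 1
    while True:
        ring = [(x0 + dx, y0 + dy)
                for dx in range(-d, d + 1)
                for dy in range(-d, d + 1)
                if max(abs(dx), abs(dy)) == d]
        empty = [c for c in ring if c not in occupied]
        if empty:
            # pick the Manhattan-nearest cell within this ring
            return min(empty, key=lambda c: abs(c[0] - x0) + abs(c[1] - y0))
        d += 1

print(nearest_empty_spiral((0, 0), {(0, 0), (0, 1), (1, 0)}))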

Look at LeetCode #286, Walls and Gates. Your problem can be solved with a variation of the BFS solution to that one.
However, instead of adding gates when you initialize the queue, you would add the occupied spots that are next to an unoccupied spot, and you would start a single BFS from all of those spots at once.
Hope this hint helps.
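A minimal Python sketch of my reading of that hint (all names are my own), assuming the occupied spots are given as (x, y) tuples: each occupied spot with an empty neighbour is seeded at distance 1, and one shared BFS expands only through occupied spots:

from collections import deque

def nearest_empty_all(occupied_list):
    occupied = set(occupied_list)
    queue = deque()
    nearest = {}
    # Seed: every occupied spot with an empty neighbour; that neighbour
    # is already its nearest empty spot (distance 1).
    for (x, y) in occupied:
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb not in occupied:
                nearest[(x, y)] = nb
                queue.append((x, y))
                break
    # One BFS from all seeds at once, expanding only through occupied
    # spots: the first time a spot is reached, the empty spot it inherits
    # is at minimum distance.
    while queue:
        x, y = queue.popleft()
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in occupied and nb not in nearest:
                nearest[nb] = nearest[(x, y)]
                queue.append(nb)
    return nearest

print(nearest_empty_all([(0, 0), (0, 1), (1, 0), (1, 1)]))

Since every occupied spot is pushed and popped at most once, the whole thing is O(n) rather than the O(n^2) of one BFS per spot.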

Related

Object stacking, dynamic programming

I'm working on a problem similar to the box stacking problem, which can be solved with a dynamic programming algorithm. I read posts about it here on SO, but I have a hard time understanding the DP approach and would like some explanation of how it works. Here's the problem at hand:
    Given X objects, each with its own weight 'w' and strength 's', how many can you stack on top of each other? An object can carry its own weight and the sum of all weights on top of it as long as it does not exceed its strength.
I understand that it has optimal substructure, but it's the overlapping-subproblems part that confuses me. I'm trying to create a recursion tree to see where it would calculate the same thing several times, but I can't figure out whether the function would take one or two parameters, for example.
The first step to solving this problem is proving that you can find an optimal stack with boxes ordered from highest to lowest strength.
Then you just have to sort the boxes by strength and figure out which ones are included in the optimal stack.
The recursive subproblem has two parameters: find the best stack you can put on top of a stack with X remaining strength, using boxes at positions >= Y in the list.
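As a sketch of that recursion (my own illustration, not part of the answer), here is a memoized Python version. It treats strength as the load an object can bear on top of itself, and sorts by strength as suggested, though the next answer questions whether that ordering alone suffices:

from functools import lru_cache

def max_stack(objects):
    # `objects` is a list of (weight, strength) pairs; boxes are placed
    # bottom-up in a fixed order, here highest strength first.
    order = tuple(sorted(objects, key=lambda o: -o[1]))

    @lru_cache(maxsize=None)
    def best(i, capacity):
        # Most boxes placeable from position i on, given that the partial
        # stack built so far can support `capacity` more units of weight.
        if i == len(order):
            return 0
        w, s = order[i]
        result = best(i + 1, capacity)  # option 1: skip box i
        if w <= capacity:
            # Option 2: place box i. The stack can now carry whichever is
            # smaller: the leftover capacity below, or box i's own strength.
            result = max(result, 1 + best(i + 1, min(capacity - w, s)))
        return result

    return best(0, float("inf"))

# All three fit when the strongest box is at the bottom.
print(max_stack([(10, 0), (5, 10), (1, 100)]))  # -> 3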
If a good DP solution exists, it takes two parameters:
the number of visited objects (or, equivalently, of unvisited objects)
the total weight of unvisited objects you can currently afford (the weight of visited objects does not matter)
To make it work you have to find an ordering in which placing an object on top of a later object is never useful; that is, for any solution that violates this ordering there is another solution that follows it and is as good or better.
You have to prove that such an ordering exists and define it clearly. I don't think simply sorting by strength, as Matt Timmermans suggested, is enough, since weight also matters. But that's the proof part...

Mutually Overlapping Subset of Activites

I am prepping for a final and this was a practice problem. It is not a homework problem.
How do I go about attacking this? Also, more generally, how do I know when to use greedy vs. dynamic programming? Intuitively, I think this is a good place to use greedy. I'm also thinking that if I could somehow create an orthogonal line and "sweep" it, checking the number of intersections at each point and updating a global max, then I could just return the max at the end of the sweep. I'm not sure how to do a plane sweep algorithmically, though.
a. We are given a set of activities I1 ... In: each activity Ii is represented by its left-point Li and its right-point Ri. Design a very efficient algorithm that finds the maximum number of mutually overlapping activities (write your solution in English, bullet by bullet).
b. Analyze the time complexity of your algorithm.
Proposed solution:
Ex set: {(0,2) (3,7) (4,6) (7,8) (1,5)}
Max is 3 from interval 4-5
1) Split start and end points into two separate arrays and sort them in non-decreasing order
Start points: [0,1,3,4,7] (SP)
End points: [2,5,6,7,8] (EP)
I know that I can use two pointers to sort of simulate the plane sweep, but I'm not exactly sure how. I'm stuck here.
I'd say your idea of a sweep is good.
You don't need to worry about a planar sweep; just use the start/end points. Treat the two sorted arrays as queues. At every step take the smaller of the two queue fronts: if it's a start point, increment the current task count, otherwise decrement it.
Since you only need the count of overlapping tasks, not which tasks they are, you don't need to worry about specific task durations.
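A short Python sketch of that sweep (my own illustration); ties are resolved by processing end points first, so intervals that merely touch at an endpoint are not counted as overlapping:

def max_overlap(intervals):
    # Two-pointer sweep over the sorted start and end points.
    starts = sorted(l for l, r in intervals)
    ends = sorted(r for l, r in intervals)
    i = j = current = best = 0
    while i < len(starts):
        if starts[i] < ends[j]:
            current += 1            # an activity begins
            best = max(best, current)
            i += 1
        else:
            current -= 1            # an activity ends
            j += 1
    return best

print(max_overlap([(0, 2), (3, 7), (4, 6), (7, 8), (1, 5)]))  # -> 3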
Regarding your greedy vs. DP question: in my non-professional opinion, greedy does not always produce a valid answer, whereas DP only works for problems that decompose well into smaller subproblems. In this case, I wouldn't call your sweep solution either one.

Finding the horizontal cost (and vertical cost) of an element?

I was browsing Stack Overflow for an algorithm to find the point closest to point A in a list of 2D points. I know the list must be sorted to get an optimal time, but I want something faster than the O(N^2) of brute force.
I found an answer that seems appealing here: Given list of 2d points, find the point closest to all other points, but I don't fully understand it. I'm wondering about the part where the author begins explaining the horizontal/vertical cost, and everything from that point onward. Could someone walk me through what to do in the case of these (random) points?
point A: (20, 40.6)
List of Points[(-20,200),(12,47), (4,0), (-82,92), (40,15), (112, 97), (-203, 84)]
If you can provide an alternative method to the one in the linked post, that would also be fine. I know it has something to do with sorting the list, and probably discarding the extremes, but again I'm not sure which method(s) to use.
EDIT: I understand now that it is not the Euclidean distance that I am most interested in. Would a divide-and-conquer algorithm be the best bet here? I don't fully understand it yet, but it sounds like it solves what I want in O(N*log(N)). Would this approach be optimal? If so, would someone mind breaking it down to the basics, as I haven't been able to follow the descriptions on other sites?
What you are trying to do is not possible if there is no structure in the list of points and they can really be random. Assume you have an algorithm that runs faster than linear time; then there is one point B in your list that is never read by the algorithm at all. If I change B to another value, the algorithm necessarily runs the same way and returns the same result. Now, if the algorithm does not return a point of the list identical to A, I can set B = A; the correct solution to the problem would now be B (you can't get any closer than being the same point), and the algorithm would necessarily return a wrong result.
What the question you are referring to is trying to do is find a point A out of a list L such that the sum of all distances between A and every point in L is minimal. The algorithm described in the answer runs in O(n*log(n)) time, where n is the number of points. Note that n*log(n) grows faster than n, so it is actually slower than looking at every element once.
Also, "distance" in that question does not refer to the Euclidean distance. Whereas you would normally define the distance between points (x_1, y_1) and (x_2, y_2) to be sqrt((x_2-x_1)^2 + (y_2-y_1)^2), the question refers to the "taxicab distance" |x_2-x_1| + |y_2-y_1|, where | | denotes the absolute value.
Re: edit
If you just want to find the one point of the list that is closest to a fixed point A, you can search for it linearly. See the following Python code:
def distance(a, b):
    # Manhattan (taxicab) distance between points a and b
    ax, ay = a
    bx, by = b
    return abs(ax - bx) + abs(ay - by)

def findClosest(a, listOfPoints):
    minDist = float("inf")
    minIndex = None
    for index, b in enumerate(listOfPoints):
        if distance(a, b) < minDist:
            minDist = distance(a, b)
            minIndex = index
    return minDist, minIndex

a = (20, 40.6)
listOfPoints = [(-20, 200), (12, 47), (4, 0), (-82, 92), (40, 15), (112, 97), (-203, 84)]
minDist, minIndex = findClosest(a, listOfPoints)
print("minDist:", minDist)
print("minIndex:", minIndex)
print("closest point:", listOfPoints[minIndex])
The challenge in the referenced question is that you don't want to minimize the distance to a fixed point; you want to find the point A in the list L whose average distance to all other points in L is minimal.
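For completeness, here is a Python sketch of the O(n*log(n)) idea from the linked answer (my own illustration): under the taxicab metric the total distance from a point to all others splits into independent x and y parts, each computable per point with sorting and prefix sums:

def total_distance_minimizer(points):
    def axis_costs(vals):
        n = len(vals)
        order = sorted(range(n), key=lambda i: vals[i])
        sorted_vals = [vals[i] for i in order]
        prefix = [0]
        for v in sorted_vals:
            prefix.append(prefix[-1] + v)
        total = prefix[-1]
        costs = [0] * n
        for rank, i in enumerate(order):
            v = sorted_vals[rank]
            left = v * rank - prefix[rank]                           # distances to smaller values
            right = (total - prefix[rank + 1]) - v * (n - rank - 1)  # distances to larger values
            costs[i] = left + right
        return costs

    xc = axis_costs([p[0] for p in points])
    yc = axis_costs([p[1] for p in points])
    best = min(range(len(points)), key=lambda i: xc[i] + yc[i])
    return points[best]

print(total_distance_minimizer(
    [(-20, 200), (12, 47), (4, 0), (-82, 92), (40, 15), (112, 97), (-203, 84)]))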

Finding fastest cycle that gains at least x bonus points using backtracking algorithm

There's a map with points: the green number next to each point is that point's ID and the red number is its bonus. I have to find the fastest cycle that starts and ends at point #1 and gains at least x bonus points (15 in this case). I can pass through cities several times, but I gain each city's bonus only once.
I have to do this with a backtracking algorithm, but I don't really know where to start. I've studied backtracking, but I can't see the connection between it and this problem.
The output would look like this:
(1,3,5,2,1) (11.813 length)
Backtracking is a technique for pruning the search space of a problem. You have a problem, you have a space of optimal and non-optimal solutions, and you have to pick an optimal one.
A simple strategy for your problem is to generate all possible solutions. However, that traverses the entire solution space, sometimes long after it has become clear that no optimal solution can be found down the current path.
That's the main role of backtracking: you traverse the space of solutions and, when you reach a point where you know no optimal answer can be reached by continuing along the same path, you simply undo the last step taken, go back in the traversal, and try the step that comes right after the one that proved hopeless.
In your problem, since nodes can be visited more than once, one idea is to maintain, for each vertex, a list of the other vertices sorted by their distance from it.
Then you simply start at one of the vertices and walk the graph vertex by vertex, always checking whether the objective is still achievable, and backtracking whenever you notice that no solution is possible from the current point.
You can use a recursive backtracking algorithm to list all possible cycles and keep the best answer:
visitCycles(list<Int> cycleSoFar)
{
if cycle formed by closing (cycleSoFar) > best answer so far
{
best answer so far = cycle formed by closing (cycleSoFar)
}
if (cannot improve (cycleSoFar))
{
return
}
for each point that makes sense
{
add point to cycleSoFar
visitCycles(cycleSoFar)
remove point from cycleSoFar
}
}
To add a bit more detail:
1) A cycle is no good unless it has at least 15 bonus points. If it is any good, it is better than the best answer so far if it is shorter.
2) As you add more points to a cycle you only make it longer, not shorter. So if you have found a possible answer and cycleSoFar is already at least as long as that possible answer, then you cannot improve it and you might as well return.
3) Since you don't get any bonus points by reusing points already in the cycle, it doesn't make sense to try adding a point twice.
4) You may be able to speed up the program by iterating over "each point that makes sense" in a sensible order, for instance by choosing the closest point to the current point first. You might save time by pre-computing, for each point, a list of all the other points in ascending order of distance (or you might not - you might have to try different schemes by experiment).
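Putting the pseudocode and the four notes together, here is a small runnable Python sketch (all names are illustrative; the graph is assumed complete with Euclidean distances, so revisiting a point never shortens a cycle, and the distance-ordering speed-up from note 4 is omitted):

import math

def best_cycle(points, bonus, need, start=1):
    # Branch-and-bound over cycles. `points` maps id -> (x, y), `bonus`
    # maps id -> bonus value, `need` is the bonus threshold.
    def dist(a, b):
        (x1, y1), (x2, y2) = points[a], points[b]
        return math.hypot(x2 - x1, y2 - y1)

    best = {"len": math.inf, "path": None}

    def visit(path, length, gained):
        here = path[-1]
        if gained >= need:                       # note 1: must gain enough bonus...
            closing = length + dist(here, start)
            if closing < best["len"]:            # ...and shorter is better
                best["len"] = closing
                best["path"] = path + [start]
        if length >= best["len"]:                # note 2: cannot improve, prune
            return
        for p in points:
            if p in path:                        # note 3: revisits earn no bonus
                continue
            visit(path + [p], length + dist(here, p), gained + bonus[p])

    visit([start], 0.0, bonus.get(start, 0))
    return best["path"], best["len"]

# Made-up data in the spirit of the map above.
pts = {1: (0, 0), 2: (1, 2), 3: (3, 1), 4: (2, 4), 5: (4, 3)}
bon = {1: 0, 2: 4, 3: 5, 4: 3, 5: 6}
print(best_cycle(pts, bon, need=15))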

Algorithms to find the number of Hamiltonian paths in a graph

I'm trying to solve a slightly modified version of the Hamiltonian Path problem. It is modified in that the start and end points are given to us and instead of determining whether a solution exists, we want to find the number of solutions (which could be 0).
The graph is given to us as a 2D array, with the nodes being the elements of the array. Also, we can only move horizontally or vertically, not diagonally. Needless to say, the path cannot branch from one city into two, since that would require visiting a city twice.
I wrote a brute force solution that tries all 4 (3 or 2 for nodes on the edges) possible moves at each node and then counts the number of solutions (which is when it reaches goal and has seen all the other nodes too), but it ran for ridiculous amounts of time on inputs of modest size (like, say a 7x7 array).
I also thought of using bidirectional search since we know the goal, but this didn't really help, since we don't just want the fringes to meet, we want to also ensure that all the nodes have been visited. Moreover, it could be that when all nodes have been exhausted between the two fringes, they end in a way such that they can't be joined.
I feel like there is something I don't know that's leaving me with only a brute force solution. I know that the problem itself is NP-complete, but I'm wondering if there are any improvements over brute force. Can someone suggest something else?
--Edit--
I mentioned that using bidirectional search doesn't really help and I'd like to clarify why I thought so. Consider a 2x3 graph with the top left and bottom right nodes being the start and goal respectively. Let the fringes for bidirectional search move right from start and left from goal. After 2 moves, all the nodes would have been visited but there is no way to join the fringes, since we can only go in one direction from one node. However, it might be possible to make the algorithm work with some modifications, as David pointed out in his answer below.
According to Wolfram Alpha,
    ... the only known way to determine whether a given general graph has a Hamiltonian path is to undertake an exhaustive search
I believe you would want to start by finding a single Hamiltonian path and then splitting it into two paths, choosing a split point that separates the two halves as cleanly as possible. Then you can find the permutations in the subgraphs (and recurse, of course!).
I don't know the exact algorithm, but that sort of divide-and-conquer method is where I would start.
Someone asked a question very similar to yours over on Math Overflow at https://mathoverflow.net/questions/36368/efficient-way-to-count-hamiltonian-paths-in-a-grid-graph-for-a-given-pair-of-vert and (1) they didn't get a deluge of "here's how to do it efficiently" responses (which probably means there isn't an easy way), (2) Mathematica apparently takes 5 hours to count the paths between opposite corners on a 7x7 grid, so you may well not be doing anything very wrong, and (3) there are a few interesting pointers among the answers.
You could still use a bidirectional search; just add a constraint so that previously seen nodes are no longer candidates for expansion.
Another approach, which lends itself to a parallelizable solution, is to break the search into smaller searches.
For example, try to solve your original problem by solving:
For each node, n, which is not a start or end node, find all paths from the start to n (set1) and from n to the end (set2).
After you find set1 and set2, you can discard all elements of their cross product which have a common node other than node n.
On a 7x7 array (i.e. 7*7 = 49 nodes in total), both an O(n!) algorithm and an O(2^n * n^2) algorithm will take far too much time.
Perhaps there is some way of speeding this up that takes into account the special characteristics of this particular graph (e.g. each node has at most 4 edges), but a fast solution seems improbable (unless someone incidentally finds a polynomial-time algorithm for the Hamiltonian Path problem itself).
It can be solved using DP with bitmasking for values of n up to 20 or a little more. Create a 2D DP table where dp[i][j] represents the number of paths for the case where you are on the i-th vertex and j encodes the set of visited vertices. Here's the C++ code.
Macros used (includes added so the snippet compiles):
#include <bits/stdc++.h>
using namespace std;

#define oncnt __builtin_popcount
typedef vector<int> vi;
Outside Main:
vi ad[21];                  // adjacency lists
int n, m;                   // vertex and edge counts
int dp[20][(1 << 19) + 1];  // dp[i][mask]: path count from vertex i, mask = visited set

int sol(int i, int mask)
{
    if (i == n - 1)
        return 1;           // reached the end vertex
    if (mask & (1 << i))
        return 0;           // current vertex already visited
    int &x = dp[i][mask];
    if (x != -1)
        return x;           // memoized result
    x = 0;
    for (int j = 0; j < (int)ad[i].size(); j++)
    {
        int k = ad[i][j];
        if (mask & (1 << k))
            continue;
        // Step onto the end vertex only when every other vertex has been
        // visited, and never skip it once they all have. These two
        // pruning statements are necessary.
        if (k == n - 1 && oncnt(mask) != n - 2)
            continue;
        if (k != n - 1 && oncnt(mask) == n - 2)
            continue;
        x += sol(k, mask | (1 << i));  // the original used madd (a modular add) here
    }
    return x;
}
Inside Main:
cin >> n >> m;
for (int i = 0; i <= n - 1; i++)
{
    for (int j = 0; j <= (1 << (n - 1)); j++)
        dp[i][j] = -1;
}
int a, b;
for (int i = 1; i <= m; i++)
{
    cin >> a >> b;
    a--;
    b--;
    ad[a].push_back(b);  // pb in the original; edges are read as directed
}
cout << sol(0, 0);
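For example, on the path graph 1-2-3 (input n = 3, m = 4 with the directed edges 1 2, 2 1, 2 3, 3 2), the program prints 1: the single Hamiltonian path 1-2-3.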
I found this approach to be extremely fast, and I was able to generalize it to work on a hexagonal grid: https://hal.archives-ouvertes.fr/hal-00172308/document. The trick is to push a frontier through the grid while keeping track of the possible paths. My implementation handles 20x20 grids in a few seconds.
