Greedy Algorithm: The Robot

Greedy Algorithm: The Robot - algorithm

Any ideas? I've tried drawing it out and I've narrowed down the minimum number of robots you'd need but I just don't know how to express it in a greedy algorithm or how to prove it. It's a bonus question from one of our lectures so we don't have to know how to do it but I feel like its a good exercise. Thanks in advance!

until robot is at the bottom row:
while there's a coin to the right on the same row:
go right
go down one step
go to the right corner
or perhaps more succinctly:
If there are coins to the right: go right
Otherwise: go down
Edit:
To see that the algorithm is optimal in the sense that it requires the fewest total number of robots to clear the board, observe that no robot ever makes a move that is worse than an optimal robot: "greedy stays ahead". Here's an attempt to make this argument more formal:
Let G be the greedy algorithm and R be any optimal algorithm.
From the perspective of a single robot, there is some set S of coins within reach. From the starting position, for example, all coins are within reach (though some coins may be mutually exclusive, of course). when a robot r makes a move, a subset V of S becomes unreachable for r. It is clear that for any single move, only one additional robot is needed to take all coins in V. Thus in some sense, the worst possible individual move will be such that V is not empty, and there is no way that a single move would cause the algorithm to require two or more additional robots.
For a robot in G, unless S is empty, V is always a true subset of S. In other words, G doesn't make any "obviously stupid" moves. Combined with the fact that G and R collect all coins, we see that the only interesting places where the robots differ are where they make different choices (down or right) after having taken the same coin.
Consider the robots r in R and g in G at a point where they differ. There are two possibilities:
g goes right and r goes down.
g goes down and r goes right.
In the first case, there is a coin to the right, and r goes down. Thus V is not empty for r at that step, and by the previous argument, g's decision can't be worse.
In the second case, there is no coin to the right, and g goes down. It is clear that V is empty for g, and r's decision can't be better than that.
We see that in any case where R and G differ, G is at least as good as R, which is optimal, so G must also be optimal.

I just don't know how to express it in a greedy algorithm
Use Manhattan/taxicab distance
Then a greedy heuristic may be
for any robot:
look right-down for the closest coin (using taxicab dist min(deltaX, deltaY))
collect it
from the current point, repeat from step 1.
Just "retire" a robot which see no coins at step 1.

The distance use Manhattan distance.
While robot cannot move:
if robot is at right edge:
nextStep = down
if robot is at bottom edge:
nextStep = right
nextStep = F(right,down)
F(right,down):
if(distanceAll(right)<distanceAll(down))
return right
else return down
distanceAll(location):
return the sum of distance between this location to remaining coins
distanceAll can be a heuristics function, maybe you can define a range to calculate the sum of distance. 5X5 or 3X3

Related

Find the closest set of points to another set

I have two sets, A and B, with N and M points in R^n respectively. I know that N < M always.
The distance between two points, P and Q, is denoted by d( P,Q ). As the problem is generic, this distance could be any function (e.g. Euclidean distance).
I want to find the closest subset of B to A. Mathematically I would say, I want to find the subset C of B with size N with the minimal global distance to A. The global distance between A and C is given by
D(A,C) = min([sum(d(P_i,Q_i),i=1,N) with P_i in A and Q_i in C* for C* in Permutations of C])
I've been thinking about this problem and I did an algorithm that get a local optimum, but not necessarily the optimal:
Step 1) Find the nearest point of each point of A in B. If there are no points repeated, I found the optimal subset and finish the algorithm. However, if there are points repeated, go to step 2.
Step 2) Compare their distances (of course I compare the distance between points with the same closest point). The point with the minimal distance keeps the point previously found and the others change their desired point for the "next" closest point that have not been selected for another point yet.
Step 3) Check if all the points are different. If they are, finish. If not, go back to step 2.
Any idea? Trying all the combinations is not a good one (I should calculate M!/(M-N)! global distances)

If M = N, this problem could be formulated as minimum-weight perfect matching in a bipartite graph or, in other words, an assignment problem. A well-known method for solving an assignment problem is the Hungarian algorithm.
To make Hungarian algorithm applicable in the case N < M, you could extend set A with (M-N) additional elements (each having zero distance to all elements of B).

Minimum spanning tree with two edges tied

I'd like to solve a harder version of the minimum spanning tree problem.
There are N vertices. Also there are 2M edges numbered by 1, 2, .., 2M. The graph is connected, undirected, and weighted. I'd like to choose some edges to make the graph still connected and make the total cost as small as possible. There is one restriction: an edge numbered by 2k and an edge numbered by 2k-1 are tied, so both should be chosen or both should not be chosen. So, if I want to choose edge 3, I must choose edge 4 too.
So, what is the minimum total cost to make the graph connected?
My thoughts:
Let's call two edges 2k and 2k+1 a edge set.
Let's call an edge valid if it merges two different components.
Let's call an edge set good if both of the edges are valid.
First add exactly m edge sets which are good in increasing order of cost. Then iterate all the edge sets in increasing order of cost, and add the set if at least one edge is valid. m should be iterated from 0 to M.
Run an kruskal algorithm with some variation: The cost of an edge e varies.
If an edge set which contains e is good, the cost is: (the cost of the edge set) / 2.
Otherwise, the cost is: (the cost of the edge set).
I cannot prove whether kruskal algorithm is correct even if the cost changes.
Sorry for the poor English, but I'd like to solve this problem. Is it NP-hard or something, or is there a good solution? :D Thanks to you in advance!

As I speculated earlier, this problem is NP-hard. I'm not sure about inapproximability; there's a very simple 2-approximation (split each pair in half, retaining the whole cost for both halves, and run your favorite vanilla MST algorithm).
Given an algorithm for this problem, we can solve the NP-hard Hamilton cycle problem as follows.
Let G = (V, E) be the instance of Hamilton cycle. Clone all of the other vertices, denoting the clone of vi by vi'. We duplicate each edge e = {vi, vj} (making a multigraph; we can do this reduction with simple graphs at the cost of clarity), and, letting v0 be an arbitrary original vertex, we pair one copy with {v0, vi'} and the other with {v0, vj'}.
No MST can use fewer than n pairs, one to connect each cloned vertex to v0. The interesting thing is that the other halves of the pairs of a candidate with n pairs like this can be interpreted as an oriented subgraph of G where each vertex has out-degree 1 (use the index in the cloned bit as the tail). This graph connects the original vertices if and only if it's a Hamilton cycle on them.
There are various ways to apply integer programming. Here's a simple one and a more complicated one. First we formulate a binary variable x_i for each i that is 1 if edge pair 2i-1, 2i is chosen. The problem template looks like
minimize sum_i w_i x_i (drop the w_i if the problem is unweighted)
subject to
<connectivity>
for all i, x_i in {0, 1}.
Of course I have left out the interesting constraints :). One way to enforce connectivity is to solve this formulation with no constraints at first, then examine the solution. If it's connected, then great -- we're done. Otherwise, find a set of vertices S such that there are no edges between S and its complement, and add a constraint
sum_{i such that x_i connects S with its complement} x_i >= 1
and repeat.
Another way is to generate constraints like this inside of the solver working on the linear relaxation of the integer program. Usually MIP libraries have a feature that allows this. The fractional problem has fractional connectivity, however, which means finding min cuts to check feasibility. I would expect this approach to be faster, but I must apologize as I don't have the energy to describe it detail.

I'm not sure if it's the best solution, but my first approach would be a search using backtracking:
Of all edge pairs, mark those that could be removed without disconnecting the graph.
Remove one of these sets and find the optimal solution for the remaining graph.
Put the pair back and remove the next one instead, find the best solution for that.
This works, but is slow and unelegant. It might be possible to rescue this approach though with a few adjustments that avoid unnecessary branches.
Firstly, the edge pairs that could still be removed is a set that only shrinks when going deeper. So, in the next recursion, you only need to check for those in the previous set of possibly removable edge pairs. Also, since the order in which you remove the edge pairs doesn't matter, you shouldn't consider any edge pairs that were already considered before.
Then, checking if two nodes are connected is expensive. If you cache the alternative route for an edge, you can check relatively quick whether that route still exists. If it doesn't, you have to run the expensive check, because even though that one route ceased to exist, there might still be others.
Then, some more pruning of the tree: Your set of removable edge pairs gives a lower bound to the weight that the optimal solution has. Further, any existing solution gives an upper bound to the optimal solution. If a set of removable edges doesn't even have a chance to find a better solution than the best one you had before, you can stop there and backtrack.
Lastly, be greedy. Using a regular greedy algorithm will not give you an optimal solution, but it will quickly raise the bar for any solution, making pruning more effective. Therefore, attempt to remove the edge pairs in the order of their weight loss.

Understanding ordering within a graph when doing traversals

I'm trying to understand depth first and breadth first traversals within the context of a graph. Most visual examples I've seen use trees to illustrate the difference between the two. The ordering of nodes within a tree is much more intuitive than in a graph (at least to me) and it makes perfect sense that nodes would be ordered top down, left to right from the root node.
When dealing with graphs, I see no such natural ordering. I've seen an example with various nodes labeled A though F, where the author explains traversals with nodes assuming the lexical order of their label. This would seem to imply that the type of value represented by a node must be inherently comparable. Is this the case? Any clarification would be much appreciated!

Node values in graphs need not be comparable.
An intuitive/oversimplified way to think about BFS vs DFS is this:
In DFS, you choose a direction to move, then you go as far as you can in that direction until you hit a dead end. Then you backtrack as little as possible, until you find a different direction you can go in. Follow that to its end, then backtrack again, and so on.
In BFS, you sequentially take one step in every possible direction. Then you take two steps in every possible direction, and so on.
Consider the following simple graph (I've deliberately chosen labels that are not A, B, C... to avoid the implication that the ordering of labels matters):
Q --> X --> T
| |
| |
v v
K --> W
A DFS starting at Q might proceed like this: Q to X to W (dead end), backtrack to X, go to T (dead end), backtrack to X and then to Q, go to K (dead end since W has already been visited).
A BFS starting at Q might proceed like this: Q, then X (one step away from Q), then K (one step away from Q), then W (two steps away from Q), then T (two steps away from Q).

Modified a-star pathfinding heuristic design

FIRST,
The ideal path was (in order of importance):
1. shortest
My heuristic (f) was:
manhattan distance (h) + path length (g)
This was buggy because it favored paths which veered towards the target then snaked back.
SECOND,
The ideal path was:
1. shortest
2. approaches the destination from the same y coordinate (if of equal length)
My heuristic stayed the same. I checked for the second criteria at the end, after reaching the target. The heuristic was made slightly inefficient (to fix the veering problem) which also resulted in the necessary adjacent coordinates always being searched.
THIRD,
The ideal path:
1. shortest
2. approaches the destination from the same y coordinate (if of equal length)
3. takes the least number of turns
Now I tried making the heuristic (f):
manhattan distance (h) + path length (g) * number of turns (t)
This of course works for criteria #1 and #3, and fixes the veering problem inherently. Unfortunately it's now so efficient that testing for criteria #2 at the end is not working because the set of nodes explored is not large enough to reconstruct the optimal solution.
Can anyone advise me how to fit criteria #2 into my heuristic (f), or how else to tackle this problem?
CRITERIA 2 example: If the goal is (4,6) and the paths to (3,6) and (4,5) are of identical length, then the ideal solution should go through (3,6) because it approaches from the Y plane instead, of (4,5) which comes from the X plane. However if the length is not identical, then the shortest path must be favored regardless of what plane it approaches in.

You seem to be confusing the A* heuristic, what Russell & Norvig call h, with the partial path cost g. Together, these constitute the priority f = g + h.
The heuristic should be an optimistic estimate of how much it costs to reach the goal from the current point. Manhattan distance is appropriate for h if steps go up, down, left and right and take at least unit cost.
Your criterion 2, however, should go in the path cost g, not in h. I'm not sure what exactly you mean by "approaches the destination from the same y coordinate", but you can forbid/penalize entry into the goal node by giving all other approaches an infinite or very high path cost. There's strictly no need to modify the heuristic h.
The number of turns taken so far should also go in the partial path cost g. You may want to include in h an (optimistic) estimate of how many turns there are left to take, if you can compute such a figure cheaply.

Answering my own question with somewhat of a HACK. Still interested in other answers, ideas, comments, if you know of a better way to solve this.
Hacked manhattan distance is calculated towards the nearest square in the Y plane, instead of the destination itself:
dy = min(absolute_value(dy), absolute_value(dy-1));
Then when constructing heuristic (f):
h = hacked_manhattan_distance();
if (h < 2)
// we are beside close to the goal
// switch back to real distance
h = real_manhattan_distance();

Algorithm to find two points furthest away from each other

Im looking for an algorithm to be used in a racing game Im making. The map/level/track is randomly generated so I need to find two locations, start and goal, that makes use of the most of the map.
The algorithm is to work inside a two dimensional space
From each point, one can only traverse to the next point in four directions; up, down, left, right
Points can only be either blocked or nonblocked, only nonblocked points can be traversed
Regarding the calculation of distance, it should not be the "bird path" for a lack of a better word. The path between A and B should be longer if there is a wall (or other blocking area) between them.
Im unsure on where to start, comments are very welcome and proposed solutions are preferred in pseudo code.
Edit: Right. After looking through gs's code I gave it another shot. Instead of python, I this time wrote it in C++. But still, even after reading up on Dijkstras algorithm, the floodfill and Hosam Alys solution, I fail to spot any crucial difference. My code still works, but not as fast as you seem to be getting yours to run. Full source is on pastie. The only interesting lines (I guess) is the Dijkstra variant itself on lines 78-118.
But speed is not the main issue here. I would really appreciate the help if someone would be kind enough to point out the differences in the algorithms.
In Hosam Alys algorithm, is the only difference that he scans from the borders instead of every node?
In Dijkstras you keep track and overwrite the distance walked, but not in floodfill, but thats about it?

Assuming the map is rectangular, you can loop over all border points, and start a flood fill to find the most distant point from the starting point:
bestSolution = { start: (0,0), end: (0,0), distance: 0 };
for each point p on the border
flood-fill all points in the map to find the most distant point
if newDistance > bestSolution.distance
bestSolution = { p, distantP, newDistance }
end if
end loop
I guess this would be in O(n^2). If I am not mistaken, it's (L+W) * 2 * (L*W) * 4, where L is the length and W is the width of the map, (L+W) * 2 represents the number of border points over the perimeter, (L*W) is the number of points, and 4 is the assumption that flood-fill would access a point a maximum of 4 times (from all directions). Since n is equivalent to the number of points, this is equivalent to (L + W) * 8 * n, which should be better than O(n2). (If the map is square, the order would be O(16n1.5).)
Update: as per the comments, since the map is more of a maze (than one with simple obstacles as I was thinking initially), you could make the same logic above, but checking all points in the map (as opposed to points on the border only). This should be in order of O(4n2), which is still better than both F-W and Dijkstra's.
Note: Flood filling is more suitable for this problem, since all vertices are directly connected through only 4 borders. A breadth first traversal of the map can yield results relatively quickly (in just O(n)). I am assuming that each point may be checked in the flood fill from each of its 4 neighbors, thus the coefficient in the formulas above.
Update 2: I am thankful for all the positive feedback I have received regarding this algorithm. Special thanks to #Georg for his review.
P.S. Any comments or corrections are welcome.

Follow up to the question about Floyd-Warshall or the simple algorithm of Hosam Aly:
I created a test program which can use both methods. Those are the files:
maze creator
find longest distance
In all test cases Floyd-Warshall was by a great magnitude slower, probably this is because of the very limited amount of edges that help this algorithm to achieve this.
These were the times, each time the field was quadruplet and 3 out of 10 fields were an obstacle.
Size Hosam Aly Floyd-Warshall
(10x10) 0m0.002s 0m0.007s
(20x20) 0m0.009s 0m0.307s
(40x40) 0m0.166s 0m22.052s
(80x80) 0m2.753s -
(160x160) 0m48.028s -
The time of Hosam Aly seems to be quadratic, therefore I'd recommend using that algorithm.
Also the memory consumption by Floyd-Warshall is n2, clearly more than needed.
If you have any idea why Floyd-Warshall is so slow, please leave a comment or edit this post.
PS: I haven't written C or C++ in a long time, I hope I haven't made too many mistakes.

It sounds like what you want is the end points separated by the graph diameter. A fairly good and easy to compute approximation is to pick a random point, find the farthest point from that, and then find the farthest point from there. These last two points should be close to maximally separated.
For a rectangular maze, this means that two flood fills should get you a pretty good pair of starting and ending points.

I deleted my original post recommending the Floyd-Warshall algorithm. :(
gs did a realistic benchmark and guess what, F-W is substantially slower than Hosam Aly's "flood fill" algorithm for typical map sizes! So even though F-W is a cool algorithm and much faster than Dijkstra's for dense graphs, I can't recommend it anymore for the OP's problem, which involves very sparse graphs (each vertex has only 4 edges).
For the record:
An efficient implementation of Dijkstra's algorithm takes O(Elog V) time for a graph with E edges and V vertices.
Hosam Aly's "flood fill" is a breadth first search, which is O(V). This can be thought of as a special case of Dijkstra's algorithm in which no vertex can have its distance estimate revised.
The Floyd-Warshall algorithm takes O(V^3) time, is very easy to code, and is still the fastest for dense graphs (those graphs where vertices are typically connected to many other vertices). But it's not the right choice for the OP's task, which involves very sparse graphs.

Raimund Seidel gives a simple method using matrix multiplication to compute the all-pairs distance matrix on an unweighted, undirected graph (which is exactly what you want) in the first section of his paper On the All-Pairs-Shortest-Path Problem in Unweighted Undirected Graphs
[pdf].
The input is the adjacency matrix and the output is the all-pairs shortest-path distance matrix. The run-time is O(M(n)*log(n)) for n points where M(n) is the run-time of your matrix multiplication algorithm.
The paper also gives the method for computing the actual paths (in the same run-time) if you need this too.
Seidel's algorithm is cool because the run-time is independent of the number of edges, but we actually don't care here because our graph is sparse. However, this may still be a good choice (despite the slightly-worse-than n^2 run-time) if you want the all pairs distance matrix, and this might also be easier to implement and debug than floodfill on a maze.
Here is the pseudocode:
Let A be the nxn (0-1) adjacency matrix of an unweighted, undirected graph, G
All-Pairs-Distances(A)
Z = A * A
Let B be the nxn matrix s.t. b_ij = 1 iff i != j and (a_ij = 1 or z_ij > 0)
if b_ij = 1 for all i != j return 2B - A //base case
T = All-Pairs-Distances(B)
X = T * A
Let D be the nxn matrix s.t. d_ij = 2t_ij if x_ij >= t_ij * degree(j), otherwise d_ij = 2t_ij - 1
return D
To get the pair of points with the greatest distance we just return argmax_ij(d_ij)

Finished a python mockup of the dijkstra solution to the problem.
Code got a bit long so I posted it somewhere else: http://refactormycode.com/codes/717-dijkstra-to-find-two-points-furthest-away-from-each-other
In the size I set, it takes about 1.5 seconds to run the algorithm for one node. Running it for every node takes a few minutes.
Dont seem to work though, it always displays the topleft and bottomright corner as the longest path; 58 tiles. Which of course is true, when you dont have obstacles. But even adding a couple of randomly placed ones, the program still finds that one the longest. Maybe its still true, hard to test without more advanced shapes.
But maybe it can at least show my ambition.

Ok, "Hosam's algorithm" is a breadth first search with a preselection on the nodes.
Dijkstra's algorithm should NOT be applied here, because your edges don't have weights.
The difference is crucial, because if the weights of the edges vary, you need to keep a lot of options (alternate routes) open and check them with every step. This makes the algorithm more complex.
With the breadth first search, you simply explore all edges once in a way that garantuees that you find the shortest path to each node. i.e. by exploring the edges in the order you find them.
So basically the difference is Dijkstra's has to 'backtrack' and look at edges it has explored before to make sure it is following the shortest route, while the breadth first search always knows it is following the shortest route.
Also, in a maze the points on the outer border are not guaranteed to be part of the longest route.
For instance, if you have a maze in the shape of a giant spiral, but with the outer end going back to the middle, you could have two points one at the heart of the spiral and the other in the end of the spiral, both in the middle!
So, a good way to do this is to use a breadth first search from every point, but remove the starting point after a search (you already know all the routes to and from it).
Complexity of breadth first is O(n), where n = |V|+|E|. We do this once for every node in V, so it becomes O(n^2).

Your description sounds to me like a maze routing problem. Check out the Lee Algorithm. Books about place-and-route problems in VLSI design may help you - Sherwani's "Algorithms for VLSI Physical Design Automation" is good, and you may find VLSI Physical Design Automation by Sait and Youssef useful (and cheaper in its Google version...)

If your objects (points) do not move frequently you can perform such a calculation in a much shorter than O(n^3) time.
All you need is to break the space into large grids and pre-calculate the inter-grid distance. Then selecting point pairs that occupy most distant grids is a matter of simple table lookup. In the average case you will need to pair-wise check only a small set of objects.
This solution works if the distance metrics are continuous. Thus if, for example there are many barriers in the map (as in mazes), this method might fail.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio