Number of closed regions created by a path - algorithm

Given a path P described by a list of positions in the xy-plane that are each connected by edges, compute the least number of edges that have to be removed from P such that P does not close off any regions in the xy-plane (i.e., it should be possible to go from any point to any other point). Every position will have integer coordinates, and each position will be one unit left, right, up, or down from the previous one.
For example, if P = {[0,0], [0,1], [1,1], [1,0], [0,0]}, then the path is a square starting and ending at (0,0). Any 1 of the 4 edges of the square could be removed, so the answer is 1.
Note that the same edge can be drawn twice. That is, if P = {[0,0], [0,1], [1,1], [1,0], [0,0], [0,1], [1,1], [1,0], [0,0]}, the answer would be 2, because now each side of the square has 2 edges, so at least 2 edges would have to be removed to "free" the square.
I've tried a naive approach: if any position is visited twice, there could be an enclosed region (not always, but my program relies on this assumption), so I add 1 to the minimum number of edges removed. In general, if a vertex is visited N times I add N-1 to the number of edges removed. However, if, for example, P = {[0,0], [0,1], [0,0]}, there is no enclosed region, whereas my program would think there is. Another case where it breaks down: if P = {[0,0], [0,1], [1,1], [1,0], [0,0], [1,0]}, my program would output 2 (since (0,0) and (1,0) are each visited twice), whereas the correct answer is 1, since we can just remove any of the other three sides of the square.
It seems that there are two primary subtasks to solve this problem: first, given the path, figure out which positions are enclosed (i.e., figure out the regions that the path splits the graph into); second, use knowledge of the regions to identify which edges must be removed to prevent enclosures.
Any hints, pseudocode, or code would be appreciated.
Source: Princeton's advanced undergraduate class on algorithms.

Here are a few ideas that might help. I'm going to assume that you have n points.
You could first insert all of the edges in a set S so that duplicate edges are removed:
for(int i = 0; i < n-1; i++)
S.insert( {min(p[i], p[i+1]), max(p[i], p[i+1])} );
Now iterate over the edges again and build a graph. Then find the longest simple path in this graph.
The resulting graph is bipartite (if a cycle exists it must have even length). This piece of information might help as well.

You could use a flood-fill algorithm to find the contiguous regions of the plane created by the path. One of these regions is infinite but it's easy to compute the perimeter with a scanline sweep, and that will limit the total size to be no worse than quadratic in the length of the path. If the path length is less than 1,000 then quadratic is acceptable. (Edit: I later realized that since it is only necessary to identify the regions adjacent to edges of the line, you can do this computation by sorting the segments and then applying a scanline sweep, resulting in O(n log n) time complexity.)
Every edge in the path is between two regions (or is irrelevant because the squares on either side are the same region). For the relevant edges, you can count repetitions and then find the minimum cost boundary between any pair of adjacent regions. All that is linear once you've identified the region id of each square.
Now you have a weighted graph. Construct a minimum spanning tree. That should be precisely the minimum collection of edges which need to be removed.
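A sketch of this regions-plus-MST approach in Python (all names here are mine, and the flood fill is the simple quadratic bounding-box version rather than the scanline refinement): count each unit segment of the path with its multiplicity, flood-fill the unit cells to label regions, take the cheapest separating segment between each pair of adjacent regions as an edge weight, and sum a minimum spanning tree of the region graph.

```python
from collections import Counter

def min_edges_to_remove(path):
    # Count each unit segment, normalized by endpoint order, since the
    # same edge may be drawn more than once.
    walls = Counter()
    for a, b in zip(path, path[1:]):
        walls[tuple(sorted((tuple(a), tuple(b))))] += 1

    # A segment between lattice points separates two unit cells; identify
    # each cell by its lower-left corner.
    def cells_of(seg):
        (x1, y1), (x2, y2) = seg
        if x1 == x2:                        # vertical segment
            y = min(y1, y2)
            return (x1 - 1, y), (x1, y)     # left cell, right cell
        x = min(x1, x2)
        return (x, y1 - 1), (x, y1)         # cell below, cell above

    def wall_between(c, d):
        (x, y), (nx, ny) = c, d
        if nx == x + 1:                     # shared vertical segment
            return ((x + 1, y), (x + 1, y + 1))
        if nx == x - 1:
            return ((x, y), (x, y + 1))
        if ny == y + 1:                     # shared horizontal segment
            return ((x, y + 1), (x + 1, y + 1))
        return ((x, y), (x + 1, y))

    xs = [p[0] for p in path]
    ys = [p[1] for p in path]
    lo_x, hi_x = min(xs) - 1, max(xs)
    lo_y, hi_y = min(ys) - 1, max(ys)

    # Flood-fill the cells of the bounding box without crossing walls.
    region, nregions = {}, 0
    for sx in range(lo_x, hi_x + 1):
        for sy in range(lo_y, hi_y + 1):
            if (sx, sy) in region:
                continue
            region[(sx, sy)] = nregions
            stack = [(sx, sy)]
            while stack:
                x, y = stack.pop()
                for c in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (lo_x <= c[0] <= hi_x and lo_y <= c[1] <= hi_y
                            and c not in region
                            and walls.get(wall_between((x, y), c), 0) == 0):
                        region[c] = nregions
                        stack.append(c)
            nregions += 1

    # Weight between two adjacent regions = cheapest separating segment.
    weight = {}
    for seg, cnt in walls.items():
        ra, rb = (region[c] for c in cells_of(seg))
        if ra != rb:
            key = (min(ra, rb), max(ra, rb))
            weight[key] = min(weight.get(key, cnt), cnt)

    # Kruskal's MST over the region graph; its total weight is the answer.
    parent = list(range(nregions))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    total = 0
    for (ra, rb), w in sorted(weight.items(), key=lambda kv: kv[1]):
        fa, fb = find(ra), find(rb)
        if fa != fb:
            parent[fa] = fb
            total += w
    return total
```

On the examples from the question this gives 1 for the square, 2 for the doubled square, and 0 for the degenerate back-and-forth path.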
There may well be a cleverer solution. The flood-fill strikes me as brute-force and naive, but it's the best I can do in ten minutes.
Good luck.

Related

A maximum flow solution to a modified Knight travel problem

You are given an n x n chessboard with k knights (of the same color) on it. Someone has spilled superglue on k of the squares, and if a knight ever finishes his move on one of these glue squares, it becomes stuck forever. Additionally (and this is why we can't have nice things) someone has cut out some of the squares so the chessboard has holes in it. You are given an initial position of the knights. The knights move as they do in regular chess, but unlike regular chess, on each turn all the knights move at once (except of course the stuck ones). At the end of each move, a square cannot be occupied by more than 1 knight. Hole squares can't be occupied by knights either (but they do count as squares that the knight can jump over). Give an O(t × poly(n))-time algorithm to determine whether you can use < t moves to move all the knights from their initial positions to new positions where they are each stuck at a glue square.
My initial thought is to formulate this as a maximum-flow problem and use the Ford-Fulkerson algorithm to solve it. But I am not sure what my nodes and edges should be. Any ideas? Thanks!
The described problem can be modeled as a layered network problem as follows. The node set of the network consists of an artificial starting node s and an artificial terminal node t. The intermediate node set consists of k copies of the n * n chessboard, which means that there are
2 + k * n * n
nodes in total. Imagine s at the top, followed by the k layers of the chessboard copies. The terminal node t would be at the bottom.
Connect s to the initial knight positions in the first chessboard and connect t to all desired terminal positions of the knights in the k-th chessboard.
For every i in {1,...,k-1}, connect each square in the i-th chessboard to every square in the (i+1)-th chessboard if and only if the latter can be reached from the former by a legal knight's move. Finally, delete all edges which leave a superglued square (except the edge into t), and delete all edges which lead to a hole. Furthermore, every edge is constrained to permit a flow of at least 0 and at most 1. In total, the network has at most
k + n * n + 8 * (k - 1) * n * n = O( k * n * n )
edges. To furthermore take into account that every square is to be occupied by at most one knight, the flow in every intermediate node must also be constrained by 1. This can be done by expanding each intermediate node into two nodes and connecting them by an additional edge in which the flow is constrained by 1, which causes the set of nodes and edges to grow by a factor of at most 2.
The k knights can be moved from their initial positions to their terminal positions if and only if the network admits an s-t-flow of value k; the sequences of knight moves correspond bijectively to the flows realizing this value.
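A rough sketch of this construction in Python (all names are mine; I index the layers by move number 0..t rather than by knight, and model a stuck knight with a stay edge on glue squares, which is one way to read the construction above). The in/out pair is the node-splitting gadget for the vertex capacity of 1, and Edmonds-Karp on the unit-capacity network checks for a flow of value k:

```python
from collections import deque, defaultdict

MOVES = ((1, 2), (2, 1), (-1, 2), (-2, 1),
         (1, -2), (2, -1), (-1, -2), (-2, -1))

def max_flow_unit(adj, s, t):
    # Edmonds-Karp; all capacities are 1, so each augmenting path adds 1.
    flow = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, cap in list(adj[u].items()):
                if cap > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:   # push 1 unit along the path found
            u = parent[v]
            adj[u][v] -= 1
            adj[v][u] = adj[v].get(u, 0) + 1
            v = u
        flow += 1

def can_glue_all(n, knights, glue, holes, t):
    adj = defaultdict(dict)
    S, T = 's', 't'
    squares = {(x, y) for x in range(n) for y in range(n)} - set(holes)
    for layer in range(t + 1):
        for sq in squares:
            # Split each board node so at most one knight occupies it.
            adj[('in', layer, sq)][('out', layer, sq)] = 1
            if layer < t:
                if sq in glue:
                    # A stuck knight stays put for the remaining turns.
                    adj[('out', layer, sq)][('in', layer + 1, sq)] = 1
                else:
                    for dx, dy in MOVES:
                        nxt = (sq[0] + dx, sq[1] + dy)
                        if nxt in squares:   # skips holes and off-board squares
                            adj[('out', layer, sq)][('in', layer + 1, nxt)] = 1
    for sq in knights:
        adj[S][('in', 0, sq)] = 1
    for sq in glue:
        if sq in squares:
            adj[('out', t, sq)][T] = 1
    return max_flow_unit(adj, S, T) == len(knights)
```

For example, on a 3x3 board a knight at (0,0) can reach a glue square at (1,2) within one move, but two knights cannot both end up stuck on that single glue square.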

Find a subset of k points maximally distant from each other

I have a set of N points (specifically, binary strings), and for each pair of them I have a discrete metric (the Hamming distance), so that given two points i and j, Dij is the distance between the i-th and the j-th point.
I want to find a subset of k elements (with k < N, of course) such that the distance between these k points is as large as possible.
In other words, I want to find a sort of set of "border points" that covers the maximum area in the space of the points.
If k = 2 the answer is trivial, because I can search the distance matrix for the two most distant elements, and those are the two points. But how can I generalize this question when k > 2?
Any suggestions? Is this an NP-hard problem?
Thanks for the answers.
One generalisation would be "find k points such that the minimum distance between any two of these k points is as large as possible".
Unfortunately, I think this is hard, because I think if you could do this efficiently you could find cliques efficiently. Suppose somebody gives you a matrix of distances and asks you to find a k-clique. Create another matrix with entries 1 where the original matrix had infinity, and entries 1000000 where the original matrix had any finite distance. Now a set of k points in the new matrix where the minimum distance between any two points in that set is 1000000 corresponds to a set of k points in the original matrix which were all connected to each other - a clique.
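Concretely, the matrix transformation in this reduction is tiny (a sketch; the function name is mine, and I take a boolean adjacency matrix, where True plays the role of a finite distance in the original matrix):

```python
def to_distance_matrix(adj):
    """Map an adjacency matrix (True = edge) to distances: connected pairs
    become very far apart (1000000), non-edges become distance 1, so a
    k-subset whose minimum pairwise distance is 1000000 is exactly a
    k-clique in the original graph."""
    n = len(adj)
    return [[0 if i == j else (1000000 if adj[i][j] else 1)
             for j in range(n)] for i in range(n)]
```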
This construction does not take account of the fact that the points correspond to bit-vectors and the distance between them is the Hamming distance, but I think it can be extended to cope with this. To show that a program capable of solving the original problem can be used to find cliques I need to show that, given an adjacency matrix, I can construct a bit-vector for each point so that pairs of points connected in the graph, and so with 1 in the adjacency matrix, are at distance roughly A from each other, and pairs of points not connected in the graph are at distance B from each other, where A > B. Note that A could be quite close to B. In fact, the triangle inequality will force this to be the case. Once I have shown this, k points all at distance A from each other (and so with minimum distance A, and a sum of distances of k(k-1)A/2) will correspond to a clique, so a program finding such points will find cliques.
To do this I will use bit-vectors of length kn(n-1)/2, where k will grow with n, so the length of the bit-vectors could be as much as O(n^3). I can get away with this because this is still only polynomial in n. I will divide each bit-vector into n(n-1)/2 fields each of length k, where each field is responsible for representing the connection or lack of connection between two points. I claim that there is a set of bit-vectors of length k so that all of the distances between these k-long bit-vectors are roughly the same, except that two of them are closer together than the others. I also claim that there is a set of bit-vectors of length k so that all of the distances between them are roughly the same, except that two of them are further apart than the others. By choosing between these two different sets, and by allocating the nearer or further pair to the two points owning the current bit-field of the n(n-1)/2 fields within the bit-vector I can create a set of bit-vectors with the required pattern of distances.
I think these exist because I think there is a construction that creates such patterns with high probability. Create n random bit-vectors of length k. Any two such bit-vectors have an expected Hamming distance of k/2 with a variance of k/4 so a standard deviation of sqrt(k)/2. For large k we expect the different distances to be reasonably similar. To create within this set two points that are very close together, make one a copy of the other. To create two points that are very far apart, make one the not of the other (0s in one where the other has 1s and vice versa).
Given any two points their expected distance from each other will be (n(n-1)/2 - 1)k/2 + k (if they are supposed to be far apart) and (n(n-1)/2 -1)k/2 (if they are supposed to be close together) and I claim without proof that by making k large enough the expected difference will triumph over the random variability and I will get distances that are pretty much A and pretty much B as I require.
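The distance claims behind this construction are easy to check empirically (a sketch; the helper names are mine): two independent random bit-vectors of length k sit near distance k/2, a copy is at distance 0, and a complement is at distance k.

```python
import random

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(42)
k = 10000
v = [rng.randint(0, 1) for _ in range(k)]
w = [rng.randint(0, 1) for _ in range(k)]

near = v[:]                     # a copy: distance 0, as close as possible
far = [1 - bit for bit in v]    # the complement: distance k, as far as possible

# Two independent random vectors land near k/2, with standard
# deviation sqrt(k)/2 = 50 here, so deviations are small relative to k.
```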
@mcdowella, I think I probably didn't explain my problem very well.
In my problem I have binary strings, and for each of them I can compute the distance to the others using the Hamming distance.
In this way I have a distance matrix D that has a finite value in every element D(i,j).
I can view this distance matrix as a graph: each row is a vertex in the graph, and each column entry is the weight of the arc connecting vertex Vi to vertex Vj.
For the reason I explained, this graph is complete, and is itself a clique.
For this reason, if I pick k vertices at random from the original graph, I obtain a subgraph that is also complete.
From all the possible subgraphs of order k, I want to choose the best one.
What is the best one? A subgraph in which the distances between the vertices are as large as possible, but also as uniform as possible.
Suppose I have two vertices v1 and v2 in my subgraph, at distance 25, and three other vertices v3, v4, v5 such that
d(v1, v3) = 24, d(v1, v4) = 7, d(v2, v3) = 5, d(v2, v4) = 22, d(v1, v5) = 14, d(v2, v5) = 14
With these distances, v3 is far from v1 but very near to v2, and v4 is in the opposite situation: far from v2 but near to v1.
I would instead prefer to add vertex v5 to my subgraph, because it is distant from the other two in a more uniform way.
I hope that my problem is clear now.
Do you think your formulation is still correct?
I have claimed that the problem of finding k points such that the minimum distance between these points, or the sum of the distances between these points, is as large as possible is NP-hard, so no polynomial-time exact algorithm is known. This suggests that we should look for some sort of heuristic solution, so here is one, based on an idea for clustering. I will describe it for maximising the total distance. I think it can be made to work for maximising the minimum distance as well, and perhaps for other goals.
Pick k arbitrary points and note down, for each point, the sum of the distances to the other points. For each other point in the data, look at the sum of the distances to the k chosen points and see if replacing any of the chosen points with that point would increase the sum. If so, replace whichever point increases the sum most and continue. Keep trying until none of the points can be used to increase the sum. This is only a local optimum, so repeat with another set of k arbitrary/random points in the hope of finding a better one until you get fed up.
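A sketch of this local search in Python (the function and parameter names are mine), maximising the sum of pairwise distances over a given distance matrix:

```python
import random

def max_sum_diversity(dist, k, restarts=10, seed=0):
    """Local-search heuristic: start from random k-subsets and swap points
    in while the total pairwise distance improves. Returns (best value,
    best subset); only a local optimum is guaranteed."""
    rng = random.Random(seed)
    n = len(dist)

    def total(sel):
        return sum(dist[a][b] for i, a in enumerate(sel) for b in sel[i + 1:])

    best_val, best_sel = -1, None
    for _ in range(restarts):
        sel = rng.sample(range(n), k)
        while True:
            cur = total(sel)
            swap = None
            # Try replacing each chosen point with each unchosen one.
            for cand in range(n):
                if cand in sel:
                    continue
                for i in range(k):
                    trial = sel[:i] + [cand] + sel[i + 1:]
                    v = total(trial)
                    if v > cur:
                        cur, swap = v, trial
            if swap is None:      # local optimum: no swap improves the sum
                break
            sel = swap
        if cur > best_val:
            best_val, best_sel = cur, sorted(sel)
    return best_val, best_sel
```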
This inherits from its clustering forebear the following property, which might at least be useful for testing: if the points can be divided into k classes such that the distance between any two points in the same class is always less than the distance between any two points in different classes then, when you have found k points where no local improvement is possible, these k points should all be from different classes (because if not, swapping out one of a pair of points from the same class would increase the sum of distances between them).
This problem is known as the MaxMin Diversity Problem (MMDP). It is known to be NP-hard. However, there are published algorithms that give good approximate solutions in reasonable time.
I'm answering this question years after it was asked because I was looking for algorithms to solve the same problem, and had trouble even finding out what to call it.

least cost path, destination unknown

Question
How would one go about finding a least-cost path when the destination is unknown, but the number of edges traversed is a fixed value? Is there a specific name for this problem, or for an algorithm that solves it?
Note that maybe the term "walk" is more appropriate than "path", I'm not sure.
Explanation
Say you have a weighted graph, and you start at vertex V1. The goal is to find a path of length N (where N is the number of edges traversed, can cross the same edge multiple times, can revisit vertices) that has the smallest cost. This process would need to be repeated for all possible starting vertices.
As an additional heuristic, consider a turn-based game where there are rooms connected by corridors. Each corridor has a cost associated with it, and your final score is lowered by an amount equal to each cost 'paid'. It takes 1 turn to traverse a corridor, and the game lasts 10 turns. You can stay in a room (self-loop), but staying put has a cost associated with it too. If you know the cost of all corridors (and for staying put in each room; i.e., you know the weighted graph), what is the optimal (highest-scoring) path to take for a 10-turn (or N-turn) game? You can revisit rooms and corridors.
Possible Approach (likely to fail)
I was originally thinking of using Dijkstra's algorithm to find the least-cost path between all pairs of vertices, and then, for each starting vertex, selecting the least-cost paths of length N. However, I realized that this might not give the LCP of length N for a given starting vertex. For example, Dijkstra's LCP between V1 and V2 might have length < N, and Dijkstra's might have excluded an unnecessary but low-cost edge which, if included, would have made the path length equal N.
It's an interesting fact that if A is the adjacency matrix and you compute A^k using addition and min in place of the usual multiply and sum of normal matrix multiplication, then A^k[i,j] is the cost of the cheapest walk from node i to node j using exactly k edges. Now the trick is to use repeated squaring so that computing A^k needs only log k matrix multiplications.
If you need the path in addition to the minimum length, you must track where the result of each min operation came from.
For your purposes, you want the location of the min of each row of the result matrix and corresponding path.
This is a good algorithm if the graph is dense. If it's sparse, then doing one breadth-first search per node to depth k will be faster.
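A sketch of this min-plus ("tropical") matrix power in Python (the names are mine; INF marks a missing edge):

```python
INF = float('inf')

def min_plus(A, B):
    # (A ⊗ B)[i][j] = min over m of A[i][m] + B[m][j]
    n = len(A)
    return [[min(A[i][m] + B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def cheapest_k_edge_walk(W, k):
    """W[i][j] is the edge weight (INF if no edge). Returns the matrix of
    cheapest walk costs using exactly k edges, by repeated squaring."""
    n = len(W)
    # The min-plus identity matrix: 0 on the diagonal, INF elsewhere.
    result = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    power = [row[:] for row in W]
    while k:
        if k & 1:
            result = min_plus(result, power)
        power = min_plus(power, power)
        k >>= 1
    return result
```

For the minimum over all destinations from a start node i, take min(result[i]); recovering the actual walk requires also recording where each min came from, as noted above.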

maximum distribution with weighed edges

Imagine a graph where each vertex has a value (say, a number of stones) and is connected to others through weighted edges, where the weight represents the cost in stones of traversing that edge. I want to find the largest possible value such that every vertex Vn ends up with at least that many stones. Vertices can send stones to others, but the amount received is reduced by the distance, i.e. the weight of the edges connecting them.
I need to solve this with a greedy algorithm in O(n) complexity, where n is the number of vertices, but I have problems identifying the subproblems/greedy choice. I was hoping that someone could provide a stepping stone or some hints on how to accomplish this; much appreciated.
Summary of Question
I am not sure I have understood the question correctly, so first I will summarize my understanding.
We have a graph with vertices v1,v2,..,vn and weighted edges. Let the weight between vi and vj be W[i,j]
Each vertex starts with a number of stones, let us call the number of stones on vertex vi equal to A[i]
You wish to perform multiple transfers in order to maximise the value of min(A[i] for i = 1..n)
x stones can be transferred between vi and vj if x > W[i,j]; this operation transforms the values as:
A[i] -= x
A[j] += x-W[i,j] # Note fewer stones arrive than leave
Is this correct?
Response
I believe this problem is NP-hard because it can be used to solve 3-SAT, a known NP-complete problem.
For a 3-sat example with M clauses such as:
(A+B+!C).(B+C+D)
Construct a directed graph which has a node for each clause (with no stones), a node for each variable with 3M+1 stones, and two auxiliary nodes for each variable with 1 stone (one represents the variable having a positive value, and one represents the variable having a negative value).
Then connect each variable node to its two auxiliary nodes, and connect each auxiliary node to every clause node that uses the corresponding literal.
This graph will have a solution with all vertices having value >= 1, if and only if the 3-sat is soluble.
The point is that each red node (e.g. for variable A) can only send stones to either A=1 or A=0, but not both. If we provide stones to the green node A=1, then this node can supply stones to all of the blue clauses which use that variable in a positive sense.
(Your original question does not involve a directed graph, but I doubt that this additional change will make a material difference to the complexity of the problem.)
Summary
I am afraid it is going to be very hard to get an O(n) solution to this problem.

Minimum manhattan distance with certain blocked points

The Manhattan distance between any two points in the Cartesian plane is the sum of the absolute differences of their respective X and Y coordinates. That is, for two points (X,Y) and (U,V) the distance is ABS(X-U) + ABS(Y-V). Now, how should I determine the minimum distance between several pairs of points, moving only parallel to the coordinate axes, when certain given points must not be visited on the selected path? I need a very efficient algorithm, because the number of avoided points can range up to 10000, with the same range for the number of queries. The coordinates of the points are less than ABS(50000). I would be given the set of points to be avoided in the beginning, so I might use some offline algorithm and/or precomputation.
As an example, the Manhattan distance between (0,0) and (1,1) is 2 from either path (0,0)->(1,0)->(1,1) or (0,0)->(0,1)->(1,1). But, if we are given the condition that (1,0) and (0,1) cannot be visited, the minimum distance increases to 6. One such path would then be: (0,0)->(0,-1)->(1,-1)->(2,-1)->(2,0)->(2,1)->(1,1).
This problem can be solved by breadth-first search, which is the standard approach here (depth-first search will find a path, but not necessarily a shortest one). You can also use the A* algorithm, which may give better results in practice, but in theory (worst case) is no better than BFS.
Intuitively, your problem reduces to solving a maze: with enough obstacles the grid essentially becomes a maze, and graph search is the standard way to solve mazes. See Maze Solving Algorithms (wikipedia) for more information.
My final recommendation: use the A* algorithm and hope for the best.
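A minimal BFS sketch for the example in the question (names are mine; this searches the unbounded grid and assumes the goal is reachable, since an enclosed goal would make the search run forever, so a real solution should bound the searched area):

```python
from collections import deque

def grid_distance(start, goal, blocked):
    """Shortest axis-parallel path length on the integer grid,
    avoiding the points in the set `blocked`."""
    if start == goal:
        return 0
    seen = {start}
    q = deque([(start, 0)])
    while q:
        (x, y), d = q.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt == goal:
                return d + 1
            if nxt not in blocked and nxt not in seen:
                seen.add(nxt)
                q.append((nxt, d + 1))
    return -1  # start is fully enclosed by blocked points
```

On the example above, blocking (1,0) and (0,1) raises the distance from (0,0) to (1,1) from 2 to 6.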
You are not understanding the solutions here or we are not understanding the problem:
1) You have a Cartesian plane. Therefore, every node has exactly 4 adjacent nodes, given by (x±1, y) and (x, y±1) (ignoring the edges of the grid).
2) Do a BFS (or DFS, A*). All you can traverse to is (x±1, y) or (x, y±1). Pre-store your 10000 obstacles in a hash set and just check whether a neighbouring node is visitable on demand; you don't need an explicit graph object.
If it's too slow, you said you can do an offline calculation: a 10^10-cell grid only requires 1.25 GB to store an indexed obstacle lookup table at one bit per cell. Leave the algorithm running?
Where am I going wrong?
