I have a question which is part of my program.
For a tree T=(V,E) we need to find the node v in the tree that minimize the length of the longest path from v to any other node.
so how we find the center of the tree? Is there can be only one center or more?
If anyone can give me good algorithm for this so i can get the idea on how i can fit into my program.
There are two approaches to do this (both runs in the same time):
using BFS (which I will describe here)
using FIFO queue.
Select any vertex v1 on your tree. Run BFS from this vertex. The last vertex (v2) you will proceed will be the furthest vertex from v1. Now run another BFS, this time from vertex v2 and get the last vertex v3.
The path from v2 to v3 is the diameter of the tree and your center lies somewhere on it. More precisely it lies in the middle of it. If the path has 2n + 1 points, there will be only 1 center (in the position n + 1). If the path has 2n points, there will be two centers at the positions n and n + 1.
You only use 2 BFS calls which runs in 2 * O(V) time.
Consider a tree with two nodes? Which is the center? Either one will suffice, ergo a tree can have more than one center.
Now, think about what it means to be the center. If all of the branches are the same height, the center is the root (all paths go through the root). If the branches are of different heights then the center must be either the root or in the branch with the greatest height otherwise the maximum path is greater than the height of the tallest branch and the root would be a better choice. Now, how far down the tallest branch do we need to look? Half the difference in height between the tallest branch (from the root) and the next tallest branch (if the difference is at most 1 the root will suffice). Why, because as we go down the tallest branch by one level we are lengthening the path to the deepest node of the next tallest branch by one and reducing the distance to the deepest node in the current branch by one. Eventually, they will be equal when you've traversed half the difference in the depth. Now as we go down the tallest branch, we simply need to choose at each level the tallest sub-branch. If more than one has the same height, we simply choose one arbitrarily.
Essentially, what you are doing is finding the longest path in the graph, which is the distance between the tallest branch of the tree and the next tallest branch, then finding the middle node on that branch. Because there may be multiple paths of the same length as the longest path and the length of the longest path may be even, you have multiple possible centers.
Rather than do this homework problem for you, I'm going to ask you through the thought process that gets the answer...
1) What would you do with the graph a-b-c (three vertices, two edges, and definitely acyclic)? Imagine for a moment that you have to put some labels on some of the vertices, you know you're going to get the minimum of the longest path on the "center" vertex. (b, with eventual label "1") But doing that in one step requires psychic powers. So ask yourself what b is 1 step away from. If the longest path to b is 1, and we've just stepped one step backwards along that path, what's the length of our path so far? (longest path = 1, -1 for the back one step. Aha: 0). So that must be the label for the leaves.
2) What does this suggest as a first cut for an algorithm? Mark the leaves "0", mark their upstreams "1", mark their upstreams "2" and so on. Marching in from the leaves and counting the distance as we go...
3) Umm... We have a problem with the graph a-b-c-d. (From now on, a labelled vertex will be replaced with its label.) Labeling the leaves "0" gives 0-b-c-0... We can't get two centers... Heck, what do we do in the simpler condition 0-b-1? We want to label b with both "1" and "2"... Handle those in reverse order...
In 0-b-1, if we extend the path from b's left by one, we get a path of length 1. If we extend the path from b's right, we get 2. We want to track "the length of the longest path from v to any other node", so we want to keep track of the longest path to b. That means we mark b with a "2".
0-b-1 -> 0-2-1
In 0-b-c-0, the computer doesn't actually simultaneously update b and c. It updates one of them, giving either 0-1-c-0 or 0-b-1-0, and the next update gives 0-1-2-0 or 0-2-1-0. Both b and c are "center"s of this graph since each of them meets the demand "the node v in the tree that minimize the length of the longest path from v to any other node." (That length is 2.)
This leads to another observation: The result of the computation is not to label a graph, it's to find the last vertex we label and/or the vertex that ends up with the largest label. (It's possible that we won't find a good way to order labels, so we'll end up needing to find the max at the end. Or maybe we will. Who knows.)
4) So now we have something like: Make two copies of the graph -- the labelled copy and the burn-down copy. The first one will store the labels and it's the one that will have the final answer in it. The burn-down copy will get smaller and smaller as we delete unlabeled vertices from it (to find new labelable vertices). (There are other ways to organize this so that only one copy of the graph is used. When you get to the end of understanding this answer, you should find a way to reduce this waste.) Outline:
label = 0
while the burndown graph is nonempty
collect all the leaves in the burndown-graph into the set X
for each leaf in the set X
if the leaf does not have a label
set the leaf's label (to the current value of label)
delete the leaf from the burn-down graph (these leafs are two copies of the same leaf in the input graph)
label = label+1
find the vertex with the largest label and return it
5) If you actually watch this run, you'll notice several opportunities for short-cutting. Including replacing the search on the last line of the outline with a much quicker method for recognizing the answer.
And now for general strategy tips for algorithm problems:
* Do a few small examples by hand. If you don't understand how to do the small cases, there's no way you can jump straight in and tell the computer how to do the large cases.
* If any of the above steps seemed unmotivated or totally opaque, you will need to study much, much harder to do well in Computer Science. It may be the Computer Science is not for you...
Related
I'm solving the problem described below and can't think of a better algorithm than trying every permutation of every vertex of every group with every.
I'm given a graph of vertices, along with a list of groups of specific vertices, the goal is to find the shortest path from a specific starting vertex to a specific ending vertex, and the path must pass through at least one vertex from each specified group of vertices.
There are also vertices in the graph that are not part of any given group.
Re-visiting vertices and edges is possible.
The graph data is specified as follows:
Vertex list - each vertex is identified by a sequence number (0 to the number of vertices -1 )
Edge list - list of vertex pairs (by vertex number)
Vertex group list - list of lists of vector numbers
A specific starting and ending vertex.
I would be grateful for any ideas for a better solution, thank you.
Summary:
We can use bitmasks to efficiently check which groups we have visited so far, and combine this with a traditional BFS/ Dijkstra's shortest-path algorithm.
If we assume E edges, V vertices, and K vertex-groups that have to be included, the below algorithm has a time complexity of O((V + E) * 2^K) and a space complexity of O(V * 2^K). The exponential 2^K term means it will only work for a relatively small K, say up to 10 or 20.
Details:
First, are the edges weighted?
If yes then a "shortest path" algorithm will usually be a variation of Dijkstra's algorithm, in which we keep a (min) priority queue of the shortest paths. We only visit a node once it's at the top of the queue, meaning that this must be the shortest path to this node. Any other shorter path to this node would already have been added to the priority queue and would come before the current iteration. (Note: this doesn't work for negative paths).
If no, meaning all edges have the same weight, then there is no need to maintain a priority queue with the shortest edges. We can instead just run a regular Breadth-first search (BFS), in which we maintain a deque with all nodes at the current depth. At each step we iterate over all nodes at the current depth (popping them from the left of the deque), and for each node we add all it's not-yet-visited neighbors to the right side of the deque, forming the next level.
The below algorithm works for both BFS and Dijkstra's, but for simplicity's sake for the rest of the answer I'll pretend that the edges have positive weights and we will use Dijkstra's. What is important to take away though is that for either algorithm we will only "visit" or "explore" a node for a path that must be the shortest path to that node. This property is essential for the algorithm to be efficient, since we know that we will at most visit each of the V nodes and E edges only one time, giving us a time complexity of O(V + E). If we use Dijkstra's we have to multiply this with log(V) for the priority queue usage (this also applies to the time complexity mentioned in the summary).
Our Problem
In our case we have the additional complexity that we have K vertex-groups, for each of which our shortest path has to contain at least one the nodes in it. This is a big problem, since it destroys our ability to simple go along with the "shortest current path".
See for example this simple graph. Notation: -- means an edge, start is that start node, and end is the end node. A vertex with value 0 does not have a vertex-group, and a vertex with value >= 1 belongs to the vertex-group of that index.
end -- 0 -- 2 -- start -- 1 -- 2
It is clear that the optimal path will first move right to the node in group 1, and then move left until the end. But this is impossible to do for the BFS and Dijkstra's algorithm we introduced above! After we move from the start to the right to capture the node in group 1, we would never ever move back left to the start, since we have already been there with a shorter path.
The Trick
In the above example, if the right-hand side would have looked like start -- 0 -- 0, where 0 means the vertex does not not belonging to a group, then it would be of no use to go there and back to the start.
The decisive reason of why it makes sense to go there and come back, although the path will get longer, is that it includes a group that we have not seen before.
How can we keep track of whether or not at a current position a group is included or not? The most efficient solution is a bit mask. So if we for example have already visited a node of group 2 and 4, then the bitmask would have a bit set at the position 2 and 4, and it would have the value of 2 ^ 2 + 2 ^ 4 == 4 + 16 == 20
In the regular Dijkstra's we would just keep a one-dimensional array of size V to keep track of what the shortest path to each vertex is, initialized to a very high MAX value. array[start] begins with value 0.
We can modify this method to instead have a two-dimensional array of dimensions [2 ^ K][V], where K is the number of groups. Every value is initialized to MAX, only array[mask_value_of_start][start] begins with 0.
The value we store at array[mask][node] means Given the already visited groups with bit-mask value of mask, what is the length of the shortest path to reach this node?
Suddenly, Dijkstra's resurrected
Once we have this structure, we can suddenly use Dijkstra's again (it's the same for BFS). We simply change the rules a bit:
In regular Dijkstra's we never re-visit a node
--> in our modification we differentiate by mask and never re-visit a node if it's already been visited for that particular mask.
In regular Dijkstra's, when exploring a node, we look at all neighbors and only add them to the priority queue if we managed to decrease the shortest path to them.
--> in our modification we look at all neighbors, and update the mask we use to check for this neighbor like: neighbor_mask = mask | (1 << neighbor_group_id). We only add a {neighbor_mask, neighbor} pair to the priority queue, if for that particular array[neighbor_mask][neighbor] we managed to decrease the minimal path length.
In regular Dijkstra's we only visit unexplored nodes with the current shortest path to it, guaranteeing it to be the shortest path to this node
--> In our modification we only visit nodes that for their respective mask values are not explored yet. We also only visit the current shortest path among all masks, meaning that for any given mask it must be the shortest path.
In regular Dijkstra's we can return once we visit the end node, since we are sure we got the shortest path to it.
--> In our modification we can return once we visit the end node for the full mask, meaning the mask containing all groups, since it must be the shortest path for the full mask. This is the answer to our problem.
If this is too slow...
That's it! Because time and space complexity are exponentially dependent on the number of groups K, this will only work for very small K (of course depending on the number of nodes and edges).
If this is too slow for your requirements then there might be a more sophisticated algorithm for this that someone smarter can come up with, it will probably involve dynamic programming.
It is very possible that this is still too slow, in which case you will probably want to switch to some heuristic, that sacrifices accuracy in order to gain more speed.
There is one thing about a star path finding algorithm that I do not understand. In the pseudocode; if the current node's (the node being analysed) g cost is less than the adjacent nodes g cost then recalculate the adjacent nodes g,h a f cost and reassign the parent node.
Why do you do this?
Why do you need to reevaluate the adjacent nodes costs and parent if it's gCost is greater than the current nodes gCost? I'm what instance would you need to do this?
Edit; I am watcing this video
https://www.youtube.com/watch?v=C0qCR18gXdU\
At at 8.19 he says: When you come across blocks (nodes) that have already been analysed, the question is should we change the properties of the block?
First a tip. You can actually add the time you want as a bookmark to get a video that starts right where you want. In this case https://www.youtube.com/watch?v=C0qCR18gXdU#t=08m19s is the bookmarked time link.
Now the quick answer to your question. We fill in a node the first time we find a path to it. But the first path we found to it might not be the cheapest one. We want the cheapest one, and if we find a cheaper one second, we want it.
Here is a visual metaphor. Imagine a running path with a fence next to it. The spot we want is on the other side of the fence. Actually draw this out, it will help.
The first path that our algorithm finds to it is run down the path, jump over the fence. The second path that we find is run part way down the path, go through the gate, then get to that spot. We don't want to throw away the idea of using the gate just because we already figured out that we could get there by jumping the fence!
Now in your picture put costs of moving from one spot to another that are reasonable for moving along a running path, an open field, through a gate, and jumping a fence. Run the algorithm by hand and you'll see that you figure out first that you can jump the fence, and then later that you really wanted to use the gate.
This guy is totally wrong because he says change the parent node however your succesors are based on your parent node and if you change parent Node then you you can't have a valid path because the path is simply by moving from parent to child.
Instead of changing parent, Pathmax function. It says that if a parent Node A have a child node whose cost (heuristic(A) <= heuristic(A) + accumulated(cost)) then set the cost of child equal to the cost of parent.
PathMax to ensure monotonicty:
Monotonicity: Every parent Node has cost greater or equal then the cost of it's child node.
A* has a property: It says that if your cost is monotonically increasing then the first (sub)path that A* finds is always the part of the final path. More precisely: Under monotonicity each node is reach first through the best path.
Do you see why?
Suppose you have a graph :(A,B) (B,C) (A,E) ,(E,D) here every tuple means they are connected. Suppose cost is monotonically increasing and your algortihm chooses (A,B),(B,C) and at this point you know your algorithm have chosen best path till now and everyother path which can reach this node,must have cost higher but if the cost is not monotonically increasing then it can be the case that (A,E) is cost greater than your current cost and from (E,D) it's zero. so you have better path there.
This algorithm relies on it's heuristic function , if it's underustimated then it gets corrected by accumulated cost but if it's over-estimated then it can explore extra node and i leave it for you why this happends.
Why do you need to re-evaluate an adjacent node that's already in the open list if it has a lower g cost to the current node?
Don't do this because it's just extra work.
Corollary: if you later come the same node from a node p with same cost then simply remove that node from queue. Do not extend it.
The following graph sample is a portion of a directed acyclic graph which is to be layered and cleaned up so that only edges connecting consecutive layers are kept.
So what I need is to eliminate edges that form "shortcuts", that is, that jump between non-consecutive layers.
The following considerations apply:
The bluish ring layering is valid because, starting at 83140 and ending at 29518, both branches have the same amount (3) of intermediary nodes, and there is no path that is longer between start and end node;
The green ring, starting at 94347 and ending at 107263, has an invalid edge (already red-crossed), because the left branch encompasses only one intermediary node, while the right branch encompasses three intermediary nodes; Besides, since the first edge of that branch is already valid - we know it pertains to the valid blue ring - it is possible to know which is the right edge to cross-out - otherwise it would be impossible to know which layer should be assigned to node 94030 and so it should be eliminated;
If we consider the pink ring after considering the green one, we know that the lower red-crossed edge is to be removed.
BUT if we consider only the yellow ring, both branches seem to be right (they contain the same number of inner nodes), but actually they only seem right because they contain symmetric errors (shortcuts jumping the same amount of nodes on both branches). If we take this ring locally, at least one of the branches would end up in wrong layers, so it is necessary to use more global data to avoid this error.
My questions are:
What typical concepts and operations are involved in the formulation and possible solution of this problem?
Is there an algorithm for that?
First, topologically sort the graph.
Now from the beginning of sorted array, start breadth first search and try to find the proper "depth" (i.e distance from root) of every node. Since a node can have multiple parents, for a node x, depth[x] is maximum of depth of all it's parents, plus one. We initialize depth for all nodes as -1.
Now in bfs traversal, when we encounter a node p, we try to update the depth of all it's childs c, where depth[c] = max(depth[c],depth[p]+1). Now there are two ways we can detect a child with shortcut.
if depth[p]+1 < depth[c], it means c has a parent with higher depth than p. So edge p to c must be a shortcut.
if depth[p]+1 > depth[c] and depth[c]!=-1, it means c have a parent with lower depth than p. So p is a better parent, and that other parent of c must have a shortcut with p.
In both cases, we mark c as problematic.
Now our goal is for every 'problematic' node x, we check all it's parent, whose depth should be depth[x]-1. If any of them have depth that is lower than that, that one have a shortcut edge with x that needs to be removed.
Since the graph can have multiple roots, we should have a variable to mark visited nodes, and repeat the above thing for any that's left unvisited.
This will sort the yellow ring problem, because before we visit any node, all it's predecessors has already been visited and properly ranked. This is ensured by the topological sort.
(Note : we can do this by just one pass. Instead of marking problematic nodes, we can maintain a parent variable for all nodes, and delete edge with the old parent whenever case 2 occurs. case 1 should be obvious)
I came across this interview question. The first part is as follows:
You are given matrix containing letters 'U', 'D', 'L', 'R' and 'X'. 'U' allow you to move up, 'L' allows to move left, etc. 'X' is the
destination. Check if it is possible to reach the destination from the
top left corner.
This part is easily done with DFS. I struggle with the second part though:
How many edits (e.g. change 'U' -> 'L') you need to make in order to
reach X
I assume that it can be done with a modified BFS counting the number of times we went against specified direction. Could someone give a hint?
In essence, you want to compute what might be called the "edit distance" from the top left corner to each element of the matrix (and in particular, to the element marked X), by which I mean the number of edits you'd have to make in the matrix in order to be able to reach a given element from the top left corner.
To do this, you can start by finding all the elements where the "edit distance" is zero, meaning, the top-left corner and all elements that are already reachable from it. (You mention that you can use depth-first search for this, but it's not even really a "search", since each element tells you exactly which element comes after it. So there's just a single path from the top left corner, that either ends at X, ends by "walking off" an edge, or gets into a cycle.)
You can then use breadth-first search, starting from the set of elements you just found. For any given d, the elements with edit distance = d + 1 are all the elements that:
don't have edit distance ≤ d;
and either:
are neighbors of an element with edit distance = d
or are reachable from such a neighbor
You can stop as soon as you've computed the edit distance of the square marked X.
This approach visits each element at most once, so if the matrix has N elements, this approach is in O(N) time and space.
The matrix represents a directed graph: the nodes are locations in the matrix, and the (directed) edges are represented by the letter in the node.
Then, there's a path from the top-left corner to X if there's a path in the graph from the top-left corner to X. You can determine this by using Dijkstra's shortest path algorithm. (You can also use DFS, but we'll need Dijkstra in the second part anyway).
To determine the smallest number of edits to make a path possible, one can turn the matrix into a weighted, directed graph. Between any two adjacent nodes a and b, add a weighted edge between a and b with weight 0 if a already contains the direction of b, and weight 1 otherwise. The edge of weight 0 represents following the direction already in the matrix, and the edge of weight 1 represents editing the node.
Then, use Dijkstra's algorithm find the lowest-weight path from the top-left corner to X in this graph. The weight of this shortest path is the number of edits needed.
There is only one path, so iterate it from the beginning and mark every square you visit as "reachable from start". The go to the end and do the same thing, running backwards. Because you are running backwards,, there are potentially many paths (an UP and a DOWN could both lead into the same square). Anyway, mark those square as "end reachable from". If the first set contains a square adjacent to the second set, you need one edit. Otherwise you have to mark the separating squares, and do a distance transform, followed by a feature transform (see my binary image library). The "narrowest neck" has to be edited to drive a path from set one to set two).
Here is the full title I would have posted, but it happens to be too long:
Given a source node, dest node, and intermediate nodes, how does one detect if the shortest Manhattan Distance is blocked by the intermediate nodes?
I've drawn a diagram to make it more clear. On the left side, "u" is the source node and "v" is the destination node. The nodes labeled 1 through 6 are the intermediate nodes. The shortest Manhattan Distance from u -> v would be 12, but the intermediate nodes form a wall blocking it. The diagram on the right, with u' being the source, and v' being the destination, shows that the intermediate nodes 1 through 5 do not block the shortest Manhattan distance from u' to v'.
I'm trying to find an algorithm that won't require me to actually do a graph search (e.g. BFS), because the distance between u and v could potentially be very large.
If all you want to do is detect whether the shortest path (one consisting of moves that monotonically take you in the right direction) is blocked, then you are trying to check whether the blocking nodes cut the rectangle whose corners are given by the source and destination node into two different regions that are disconnected. If no shortest path from the source to the destination is possible, then every path must have some point in it that's blocked.
Let's suppose for simplicity that your start point is below and to the left of the destination point. In that case, find, in O(n), all of the other points that are obstacle points contained in the bounding box holding the start and end point. You now want to see if there is some subset of those nodes that cuts the rectangle into two pieces, one containing the bottom-left corner and one containing the top-right corner. This is possible iff there is a path of the blocking nodes from the left side to the right side, from the left side to the bottom side, from the top side to the right side, or from the top side to the bottom side. Thus we just need to check if any of these are possible.
Fortunately, this can be done efficiently by modeling the problem as a graph search in a graph that has size O(n), where n is the number of blocking points, and has nothing to do with the size of the bounding box. That is, no matter how far apart the test points are, the size of the graph to search depends solely on the number of blocking points.
The graph you want to construct has two parts. First, build a graph where each blocking point is connected to each other blocking point in the 3x3 square surrounding it. These edges link together blocking points that could be part of the same barrier, in that no path from the source to the target could pass between two blocking points joined by an edge. Now, add in four new nodes representing the top wall, left wall, right wall, and bottom wall and connect them to each node that is adjacent to the appropriate wall. That way, for example, a path from the left wall node to the right wall node would represent a series of blocking nodes that make it impossible to get from the bottom-left node to the top-right node.
This graph has size O(n), where n is the number of blocking nodes, since there are O(n) nodes and each node can have at most 12 edges - one for each of the 8 neighbors and potentially one for each of the four walls. You could construct it in at worst quadratic time by scanning over each node and, for each other node, seeing if they are adjacent. There is probably a better way to do this, but nothing comes to me at the moment.
Now that you have the graph, for each of the pairs of walls that, if connected, would disconnect the graph, run a graph search in this graph between those two wall nodes. If a path exists, report that the shortest path is blocked. If not, report that some shortest path is unblocked. These searches could be done with a simple DFS, or since you're running multiple searches and just want to know if they're connected, using a strongly connected components algorithm once and checking if any pair of important nodes are in the same SCC. Either approach takes time O(n).
Thus the time to solve this problem is at most O(n2), with the bottleneck being the time required to construct the graph.
Hope this helps!
Here's my idea:
I'll describe the case when the destination is upper and to right from the source, for other cases, rotate. (For simple cases where the nodes have the same x/y coordinate, just checks whether there's a blocking node directly between them)
Take the matrix with source and destination in corners. Now, a column at a time, from left to right and inside the column, bottom up, mark blocked nodes. A node B is blocked iff any of following is true:
B is an intermediate node
the nodes left to B and bottom from B are both blocked (both were already checked given the order of processing) or outside the bounds of the matrix
In the end, if destination is blocked, there's no free shortest path.
The time required is O(m*n), where m, n are the lengths of sides of the matrix. So when you'll only have several intermediate nodes, templatetypedef's solution may be more appropriate.
EDIT: Got it a little wrong previously, now I hope I didn't miss anything