I have some areas, each of which contains one or more polygons. Each polygon is represented in GL_TRIANGLE_STRIP format, where each vertex is a (lat, long) pair. Is there any way to get the contours of the area?
Some specs:
Contours must be in counter clockwise order.
Any 2 polygons can have one common edge.
Polygons can be concave
A polygon can have maximum 1 'gap' inside of it, which will be represented by another contour, in clockwise order.
I'm looking for an algorithm whose complexity is around O(N*logN), where N is the number of vertices.
EDIT: I tried solutions like going 2 by 2 until I reach the end of the dataset and then going backwards, but this algorithm works badly on polygons with gaps. For example, take
this polygon, where the input is A B C D E F G H I J with I = A and J = B. The output will be A C E G I J H F D B, but it should be A C E G and B H F D (the order is inverted because it was easier to draw like that).
Another solution was to treat the points as an undirected graph with edges between them (according to the GL_TRIANGLE_STRIP format) and to apply a DFS in order to extract the connected components. After that I computed the area of each component, and I considered the maximum-area polygon as the counter-clockwise contour and the rest as clockwise contours. That doesn't work because the adjacency list requires some sorting, which makes the algorithm inefficient.
Another solution I tried was a tweaked convex hull, but a convex hull is still a convex hull and did not work on concave polygons.
I also read about concave hulls, but they do not always seem to give precise results.
Thank you for your answers!
Let's start by converting a triangle strip to a polygon. We take the following strip as an example:
(Courtesy of Wikimedia Commons)
Your strip definition would be:
A B C D E F
Converting that into a polygon is very simple. Just go through the list and use every second vertex. When you are at the end, return backwards and use the other vertices. In this case:
A C E (reached the end, now return) F D B
This can be done in O(N), where N is the number of vertices of the strip. Whether this is clockwise or counter-clockwise depends on the orientation of the strip.
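The conversion described above can be sketched in a few lines of Python. The function name strip_to_polygon is only illustrative, and vertices are assumed to be any hashable values (labels here, (lat, long) tuples in practice):

```python
def strip_to_polygon(strip):
    """Return the outline of a triangle strip as a vertex loop.

    Walk forward over every second vertex, then walk back over the rest.
    """
    forward = strip[0::2]            # A, C, E, ...
    backward = strip[1::2][::-1]     # ..., F, D, B
    return forward + backward


print(strip_to_polygon(list("ABCDEF")))  # ['A', 'C', 'E', 'F', 'D', 'B']
```

Both slices are linear in the strip length, so the whole conversion stays O(N).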
So, we can turn every strip into a polygon. All that remains is to remove shared edges. Let's say we have two polygons
A C E F D B
W X E C Y Z
Note that any edge that is shared (in this case C E) will appear in opposite directions (C E in the first polygon, E C in the second one). To find the area contour, we simply need to find matching edges and merge the two polygons.
To find matching edges, it is enough to write all the polygon edges into a hash map (store what polygon they belong to and where they are in the polygon). This can be done in O(E), where E is the total number of polygon edges.
To finally merge polygons, it is actually simpler to create new ones. Modification is definitely possible, but a bit more delicate. To do that we just need to walk along our polygon edges. As long as we are on an edge whose inverse is not in the hash map, then write this edge to the output polygon. If it is, ignore it and continue on the other polygon. Mark the edges that you visited and stop as soon as you are back to an edge that you visited before. Do this until all edges are visited (or both directions are in the hash map). This whole process can be done in O(E), where E is the total number of polygon edges.
Here is what that would look like in our example:
Start at polygon 1
Start at edge (A, C)
(A, C) is neither visited nor is its inverse in the hash map
Create new output area = [A, C]
the inverse of (C, E) is in the hash map, continue to polygon 2
(C, Y) is neither visited nor is its inverse in the hash map
Append it to the area = [A, C, Y]
(Y, Z) is neither visited nor is its inverse in the hash map
Append it to the area = [A, C, Y, Z]
(Z, W) is neither visited nor is its inverse in the hash map
Append it to the area = [A, C, Y, Z, W]
(W, X) is neither visited nor is its inverse in the hash map
Append it to the area = [A, C, Y, Z, W, X]
(X, E) is neither visited nor is its inverse in the hash map
Append it to the area = [A, C, Y, Z, W, X, E]
the inverse of (E, C) is in the hash map, continue to polygon 1
(E, F) is neither visited nor is its inverse in the hash map
Append it to the area = [A, C, Y, Z, W, X, E, F]
(F, D) is neither visited nor is its inverse in the hash map
Append it to the area = [A, C, Y, Z, W, X, E, F, D]
(D, B) is neither visited nor is its inverse in the hash map
Append it to the area = [A, C, Y, Z, W, X, E, F, D, B]
(B, A) is neither visited nor is its inverse in the hash map
Append it to the area = [A, C, Y, Z, W, X, E, F, D, B, A]
(A, C) is already visited
Close the area contour, continue to check other unvisited edges
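A minimal Python sketch of this walk follows. It assumes that all polygons have the same orientation, that a shared edge appears exactly once in each direction, and that vertices are hashable; the name merge_contours and the (polygon index, position) bookkeeping are illustrative choices, not part of the original description.

```python
def merge_contours(polygons):
    """Merge polygons (lists of vertices) that share edges into area contours."""
    # Hash map: directed edge -> (polygon index, position of its start vertex).
    edge_at = {}
    for pi, poly in enumerate(polygons):
        for i, u in enumerate(poly):
            v = poly[(i + 1) % len(poly)]
            edge_at[(u, v)] = (pi, i)

    def next_edge(pi, i):
        """Edge following position i, hopping to the neighbour on shared edges."""
        while True:
            poly = polygons[pi]
            i = (i + 1) % len(poly)
            u, v = poly[i], poly[(i + 1) % len(poly)]
            if (v, u) not in edge_at:
                return pi, i
            pi, i = edge_at[(v, u)]  # inverse exists: continue on the other polygon

    visited = set()
    contours = []
    for (u, v), start in edge_at.items():
        if (v, u) in edge_at or start in visited:
            continue  # shared edge, or already part of an emitted contour
        contour = []
        pi, i = start
        while (pi, i) not in visited:
            visited.add((pi, i))
            contour.append(polygons[pi][i])
            pi, i = next_edge(pi, i)
        contours.append(contour)
    return contours


print(merge_contours([list("ACEFDB"), list("WXECYZ")]))
# [['A', 'C', 'Y', 'Z', 'W', 'X', 'E', 'F', 'D', 'B']]
```

Each directed edge is inserted once and stepped over a bounded number of times, which matches the O(E) estimate given above for well-formed input.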
If you want, you can then also group the generated contours by the polygons they were created from to find connected areas that are bounded by multiple contours. A disjoint set over the polygons would be helpful for that task. If you need, you could also try to classify the contours into holes and outer contours. But be aware that this notion is highly ambiguous on the sphere (imagine a sphere and an area that is a band along the equator - which of the two contours is the hole and which is outside?) For comparatively small areas, you could use the area for this classification.
Consider a unit square containing n 2D points. We say that two points p and q are independent in a square, if the Euclidean distance between them is greater than 1. A unit square can contain at most 3 mutually independent points. I would like to find those 3 mutually independent points in the given unit square in O(n log n). Is it possible? Please help me.
Can this problem be solved in O(n^2) without using any spatial data structures such as Quadtree, kd-tree, etc?
Use a spatial data structure such as a Quadtree to store your points. Each node in the quadtree has a bounding box and a set of 4 child nodes, and a list of points (empty except for the leaf nodes). The points are stored in the leaf nodes.
The point quadtree is an adaptation of a binary tree used to represent two-dimensional point data. It shares the features of all quadtrees but is a true tree as the center of a subdivision is always on a point. The tree shape depends on the order in which data is processed. It is often very efficient in comparing two-dimensional, ordered data points, usually operating in O(log n) time.
For each point, maintain a set of all points that are independent of that point.
Insert all your points into the quadtree, then iterate through the points and use the quadtree to find the points that are independent of each:
main()
{
    for each point p
        insert p into quadtree
        set p's set to empty
    for each point p
        findIndependentPoints(p, root node of quadtree)
}
findIndependentPoints(Point p, QuadTreeNode n)
{
    Point f = corner of bounding box of n that is farthest from p
    if distance between f and p <= 1
        return // none of the points in this node or
               // its children are independent of p
    for each point q in n
        if distance between p and q > 1
            find intersection r of q's set and p's set
            if r is non-empty then
                p, q, r are the 3 points -> ***SOLVED***
            add p to q's set of independent points
            add q to p's set of independent points
    for each subnode m of n (up to 4 of them)
        findIndependentPoints(p, m)
}
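The early-out test at the top of findIndependentPoints can be sketched in Python as follows. The box layout (xmin, ymin, xmax, ymax) and both function names are assumptions made for this illustration. The comparison uses <= 1 because independence requires a distance strictly greater than 1.

```python
import math


def farthest_corner_distance(p, box):
    """Distance from p to the corner of box = (xmin, ymin, xmax, ymax) farthest from p."""
    xmin, ymin, xmax, ymax = box
    dx = max(abs(p[0] - xmin), abs(p[0] - xmax))
    dy = max(abs(p[1] - ymin), abs(p[1] - ymax))
    return math.hypot(dx, dy)


def can_skip(p, box):
    """True when no point inside box can be more than distance 1 away from p."""
    return farthest_corner_distance(p, box) <= 1.0


print(can_skip((0.5, 0.5), (0.0, 0.0, 1.0, 1.0)))  # True
print(can_skip((0.0, 0.0), (0.5, 0.5, 1.0, 1.0)))  # False
```

A point in the middle of the unit square can never have an independent partner, so the whole tree is pruned for it, while a corner point still has to inspect the opposite quadrant.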
You could speed up this step:
    find intersection r of q's set and p's set
by storing each set as a quadtree. Then you could find the intersection by searching in q's quadtree for a point independent of p using the same early-out technique:
// find intersection r of q's set and p's set:
// r = findMutuallyIndependentPoint(p, q's quadtree root)
Point findMutuallyIndependentPoint(Point p, QuadTreeNode n)
{
    Point f = corner of bounding box of n that is farthest from p
    if distance between f and p <= 1
        return null // none of the points in this node or
                    // its children are independent of p
    for each point r in n
        if distance between p and r > 1
            return r
    for each subnode m of n (up to 4 of them)
        Point r = findMutuallyIndependentPoint(p, m)
        if r is not null
            return r
    return null
}
An alternative to using Quadtrees is using K-d trees, which produces more balanced trees where each leaf node is a similar depth from the root. The algorithm for finding independent points in that case would be the same, except that there would only be up to 2 and not 4 child nodes for each node in the data structure, and the bounding boxes at each level would be of variable size.
You might want to try this out.
Pick the top left point (Y) with coordinate (0,1). Calculate the distance from each point in the list to point Y.
Sort the result in increasing order into SortedPointList (L)
If the first point (A) and the last point (B) in list L are independent:
    Foreach point P in list L:
        if P is independent of both A and B:
            Return A, B, P
Pick the top right point (X) with coordinate (1,1). Calculate the distance from each point in the list to point X.
Sort the result in increasing order into SortedPointList (S)
If the first point (C) and the last point (D) in list S are independent:
    Foreach point O in list S:
        if O is independent of both C and D:
            Return C, D, O
Return null
This is a wrong solution. Kept it just for comments. If one finds another solution based on smallest enclosing circle, please put a link as a comment.
Solve the Smallest-circle problem.
If the diameter of the circle is <= 1, return null.
If the circle is determined by 3 points, check which are "mutually independent". If there are only two of them, try to find the third by iteration.
If the circle is determined by 2 points, they are "mutually independent". Try to find the third one by iteration.
The smallest-circle problem can be solved in O(N), thus the complexity of the whole problem is also O(N).
How would you output all the possible topological sorts for a directed acyclic graph? For example, given a graph where V points to W and X, W points to Y and Z, and X points to Z:
V --> W --> Y
W --> Z
V --> X --> Z
How do you topologically sort this graph to produce all possible results? I was able to use a breadth-first-search to get V, W, X, Y, Z and a depth-first search to get V, W, Y, Z, X. But wasn't able to output any other sorts.
An algorithm for generating all topological sorts for a given DAG (aka generating all linear extensions of a partial order) is given in the paper "Generating Linear Extensions Fast" by Pruesse and Ruskey. The algorithm has an amortized running time that is linear in the output (i.e., if it outputs M topological sorts, it runs in time O(M)).
Note that in general you can't really have anything that has a runtime that's efficient with respect to the size of the input since the size of the output can be exponentially larger than the input. For example, a completely disconnected DAG of N nodes has N! possible topological sorts.
It might be possible to count the number of orderings faster, but the only way to actually generate all orderings that I can think of is with a full brute-force recursion. (I say "brute force", but this is still much better than the brutest-possible brute force approach of testing every possible permutation :) )
Basically, at every step there is a set S of vertices remaining (i.e. which have not been added to the order yet), and a subset X of these can be safely added in the next step. This subset X is exactly the set of vertices that have no in-edges from vertices in S.
For a given partial solution L consisting of some number of vertices that are already in the order, the set S of remaining vertices, and the set X of vertices in S that have no in-edges from other vertices in S, the call Generate(L, X, S) will generate all valid topological orders beginning with L.
Generate(L, X, S):
    If X is empty:
        Either L is already a complete solution, in which case it contains all n vertices and S is also empty, or the original graph contains a cycle.
        If S is empty:
            Output L as a solution.
        Otherwise:
            Report that a cycle exists. (In fact, all vertices in S participate in some cycle, though there may be more than one.)
    Otherwise:
        For each x in X:
            Let L' be L with x added to the end.
            Let X' be X\{x} plus any vertices whose only in-edge among vertices in S came from x.
            Let S' = S\{x}.
            Generate(L', X', S')
To kick things off, find the set X of all vertices having no in-edges and call Generate((), X, V). Because every x chosen in the "For each" loop is different, every partial solution L' generated by the iterations of this loop must also be distinct, so no solution is generated more than once by any call to Generate(), including the top-level call.
In practice, forming X' can be done more efficiently than the above pseudocode suggests: When we choose x, we can delete all out-edges from x, but also add them to a temporary list of edges, and by tracking the total number of in-edges for each vertex (e.g. in an array indexed by vertex number) we can efficiently detect which vertices now have 0 in-edges and should thus be added to X'. Then at the end of the loop iteration, all the edges that we deleted can be restored from the temporary list.
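A minimal Python sketch of Generate() using the in-edge counting trick might look like this. The function name all_topological_sorts and the vertex-list plus edge-list input format are assumptions for illustration. Instead of physically deleting and restoring edges, it decrements and re-increments the in-edge counts.

```python
def all_topological_sorts(vertices, edges):
    """Return every topological order of a DAG given as vertices and (u, v) edges."""
    successors = {v: [] for v in vertices}
    indegree = {v: 0 for v in vertices}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    order = []                                              # partial solution L
    available = [v for v in vertices if indegree[v] == 0]   # the set X

    def generate():
        if len(order) == len(vertices):
            yield list(order)
            return
        for x in list(available):          # iterate over a snapshot of X
            idx = available.index(x)
            available.pop(idx)
            order.append(x)
            released = []
            for w in successors[x]:        # "delete" the out-edges of x
                indegree[w] -= 1
                if indegree[w] == 0:
                    released.append(w)
            available.extend(released)
            yield from generate()
            # Undo everything so the next choice of x starts from the same state.
            del available[len(available) - len(released):]
            for w in successors[x]:
                indegree[w] += 1
            order.pop()
            available.insert(idx, x)

    return list(generate())


edges = [("V", "W"), ("V", "X"), ("W", "Y"), ("W", "Z"), ("X", "Z")]
for order in all_topological_sorts(list("VWXYZ"), edges):
    print(order)
```

For the example graph from the question this prints five orders. If the graph contains a cycle, this sketch simply returns an empty list rather than reporting the cycle.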
So this approach is flawed! Unsure if it can be salvaged, I'll leave it a little while, if anyone can spot how to fix it, either grab what you can and post a new answer or edit mine.
Specifically, I used the below algorithm on the example from the comment and it will not output the example given, so it is clearly flawed.
The way I've learned to do a topological sort is the following:
Create a list of all the elements with no arrows pointing into it
Create a dictionary of element -> number, where element here is any element in the original collection that has an arrow into it, and the number is how many elements point to it.
Create a dictionary of element -> list, where element here is any element in the original collection that has an arrow out of it, and the list is all the elements those arrows point to
In your example, the two dictionaries and the list would be like this:
D1      D2          List
W: 1    V: W, X     V
Y: 1    W: Y, Z
Z: 2    X: Z
X: 1
Then, start a loop where on each iteration you do the following:
Output all elements of the list, these currently have no arrows pointing into them. Make a temporary copy of the list, and clear the list, preparing it for the following iteration
Loop through the temporary copy, and find each element (if it exists) in the dictionary that is element -> list
For each element in those lists, decrement the corresponding number in the element -> number dictionary by 1 (removing 1 arrow). Once a number for an element here reaches 0, add that element to the list (it has no arrows left)
If the list is non-empty, redo the iteration loop
If you reach this point, and the dictionary with element -> number still has any elements left in it with a number above 0 (if you want to, you can remove the elements as you go in the above iteration once their numbers reach zero to make this part easier), then you have a cycle, since the above loop should not terminate until all arrows have been removed.
For your example, each iteration would output the following:
V
W, X (2nd iteration output both W and X)
Y, Z
If you want to know how I arrived at this solution, simply go through my iteration description step by step using the above dictionaries and list as the starting point.
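The iteration described above can be sketched in Python as follows. The name topological_layers and the input format are assumptions for this illustration. Note that it only reproduces the per-iteration output; as stated at the top of this answer, permuting within these layers does not yield all topological sorts.

```python
def topological_layers(vertices, edges):
    """Group vertices by the iteration in which they lose their last incoming arrow."""
    indegree = {v: 0 for v in vertices}   # element -> number of incoming arrows
    targets = {v: [] for v in vertices}   # element -> elements its arrows point to
    for u, v in edges:
        indegree[v] += 1
        targets[u].append(v)

    current = [v for v in vertices if indegree[v] == 0]
    layers = []
    while current:
        layers.append(current)
        following = []
        for u in current:
            for v in targets[u]:
                indegree[v] -= 1          # remove one arrow
                if indegree[v] == 0:
                    following.append(v)
        current = following

    if any(count > 0 for count in indegree.values()):
        raise ValueError("graph contains a cycle")
    return layers


edges = [("V", "W"), ("V", "X"), ("W", "Y"), ("W", "Z"), ("X", "Z")]
print(topological_layers(list("VWXYZ"), edges))
# [['V'], ['W', 'X'], ['Y', 'Z']]
```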
Now, to specifically answer your question, how to output all combinations. The only places where "combinations" comes into play is per iteration. Basically, all the elements that you output in the first step of the iteration (the ones you made a temporary copy of) are considered "equivalent" and any internal ordering between these would have no impact on the topological sort.
So, do this:
In the first point in the iteration, place those elements into a list, and add that to another list, giving you a list of lists
This lists of lists will now contain each iteration as one element, and one element will be yet another list with the elements output in that iteration
Now, combine all permutations of the first list with all the permutations of the second list with all the permutations of the third list, and so on
This means taking this output:
V
W, X
Y, Z
Which gives you 1 * 2 * 2 = 4 permutations in total and you would combine all permutations of the 1st iteration (which is 1) with all the permutations of the 2nd iteration (which is 2, W, X and X, W) with all the permutations of the 3rd iteration (which is 2, Y, Z and Z, Y).
The final list of permutations that are valid topological sorts would be this:
V, W, X, Y, Z
V, X, W, Y, Z
V, W, X, Z, Y
V, X, W, Z, Y
Here is the example from the comment:
A and B with no in-edges. Both A and B have an edge to C, but only A has an edge to D. Neither C nor D has any out-edges.
Which gives:
A --> C
A --> D
B --> C
Dictionaries and list:
D1      D2          List
C: 2    A: C, D     A
D: 1    B: C        B
Iterations would output:
A, B
D, C
All permutations (2 * 2 = 4):
A, B, D, C
A, B, C, D
B, A, D, C
B, A, C, D
The following pseudo-code is from the first chapter of an online preview version of The Algorithm Design Manual (page 7 from this PDF).
The example is of a flawed algorithm, but I still really want to understand it:
[...] A different idea might be to repeatedly connect the closest pair of
endpoints whose connection will not create a problem, such as
premature termination of the cycle. Each vertex begins as its own
single vertex chain. After merging everything together, we will end up
with a single chain containing all the points in it. Connecting the
final two endpoints gives us a cycle. At any step during the execution
of this closest-pair heuristic, we will have a set of single vertices
and vertex-disjoint chains available to merge. In pseudocode:
ClosestPair(P)
    Let n be the number of points in set P.
    For i = 1 to n − 1 do
        d = ∞
        For each pair of endpoints (s, t) from distinct vertex chains
            if dist(s, t) ≤ d then sm = s, tm = t, and d = dist(s, t)
        Connect (sm, tm) by an edge
    Connect the two endpoints by an edge
Please note that sm and tm should be s_m and t_m (with m as a subscript).
First of all, I don't understand what "from distinct vertex chains" would mean. Second, i is used as a counter in the outer loop, but i itself is never actually used anywhere! Could someone smarter than me please explain what's really going on here?
This is how I see it, after the explanation by Ernest Friedman-Hill (accepted answer):
Take the example from the same book (Figure 1.4).
I've added names to the vertices to make it clear.
At the first step all the vertices are single-vertex chains, so we connect the pairs A-D, B-E and C-F, because the distance between them is the smallest.
At the second step we have 3 chains, and the distance between A-D and B-E is the same as between B-E and C-F, so we connect, let's say, A-D with B-E and are left with two chains: A-D-E-B and C-F.
At the third step the only way to connect them is through B and C, because B-C is shorter than B-F, A-F and A-C (remember that we consider only the endpoints of chains). So we now have one chain: A-D-E-B-C-F.
At the last step we connect the two endpoints (A and F) to get a cycle.
1) The description states that every vertex always belongs either to a "single-vertex chain" (i.e., it's alone) or to one other chain; a vertex can only belong to one chain. The algorithm says that at each step you consider every possible pair of vertices which are each an endpoint of the chain they belong to and which don't already belong to the same chain, and you connect the closest such pair. Sometimes they'll be singletons; sometimes one or both will already belong to a non-trivial chain, so you'll join two chains.
2) You repeat the loop n − 1 times, so that you eventually merge every vertex into one chain; but yes, the actual iteration count isn't used for anything. All that matters is that you run the loop enough times.
Though the question is already answered, here's a Python implementation of the closest pair heuristic. It starts with every point as its own chain, then successively links chains to build one long chain containing all points.
This algorithm does build a path, yet it is not a sequence of robot arm movements, because the arm's starting point is unknown.
import matplotlib.pyplot as plot
import math
import random


def draw_arrow(axis, p1, p2, rad):
    """draw an arrow connecting point 1 to point 2"""
    axis.annotate("",
                  xy=p2,
                  xytext=p1,
                  arrowprops=dict(arrowstyle="-", linewidth=0.8, connectionstyle="arc3,rad=" + str(rad)),)


def closest_pair(points):
    distance = lambda c1p, c2p: math.hypot(c1p[0] - c2p[0], c1p[1] - c2p[1])
    chains = [[points[i]] for i in range(len(points))]
    edges = []
    for i in range(len(points)-1):
        dmin = float("inf")  # infinitely big distance
        # test each chain against each other chain
        for chain1 in chains:
            for chain2 in [item for item in chains if item is not chain1]:
                # test each chain1 endpoint against each of chain2 endpoints
                for c1ind in [0, len(chain1) - 1]:
                    for c2ind in [0, len(chain2) - 1]:
                        dist = distance(chain1[c1ind], chain2[c2ind])
                        if dist < dmin:
                            dmin = dist
                            # remember endpoints as closest pair
                            chain2link1, chain2link2 = chain1, chain2
                            point1, point2 = chain1[c1ind], chain2[c2ind]
        # connect two closest points
        edges.append((point1, point2))
        chains.remove(chain2link1)
        chains.remove(chain2link2)
        if len(chain2link1) > 1:
            chain2link1.remove(point1)
        if len(chain2link2) > 1:
            chain2link2.remove(point2)
        linkedchain = chain2link1
        linkedchain.extend(chain2link2)
        chains.append(linkedchain)
    # connect first endpoint to the last one
    edges.append((chains[0][0], chains[0][len(chains[0])-1]))
    return edges


data = [(0.3, 0.2), (0.3, 0.4), (0.501, 0.4), (0.501, 0.2), (0.702, 0.4), (0.702, 0.2)]
# random.seed()
# data = [(random.uniform(0.01, 0.99), 0.2) for i in range(60)]
edges = closest_pair(data)
# draw path
figure = plot.figure()
axis = figure.add_subplot(111)
plot.scatter([i[0] for i in data], [i[1] for i in data])
nedges = len(edges)
for i in range(nedges - 1):
    draw_arrow(axis, edges[i][0], edges[i][1], 0)
# draw last - curved - edge
draw_arrow(axis, edges[nedges-1][0], edges[nedges-1][1], 0.3)
plot.show()
TLDR: Skip to the section "Clarified description of ClosestPair heuristic" below if already familiar with the question asked in this thread and the answers contributed thus far.
Remarks: I started the Algorithm Design Manual recently, and the ClosestPair heuristic example bothered me because of what I felt was a lack of clarity. It looks like others have felt similarly. Unfortunately, the answers provided on this thread didn't quite do it for me; they all felt a bit too vague and hand-wavy. But the answers did help nudge me in the direction of what I feel is the correct interpretation of Skiena's description.
Problem statement and background: From page 5 of the book for those who don't have it (3rd edition):
Skiena first details how the NearestNeighbor heuristic is incorrect, using the following image to help illustrate his case:
The figure on top illustrates a problem with the approach employed by the NearestNeighbor heuristic, with the bottom figure being the optimal solution. Clearly a different approach is needed to find this optimal solution. Cue the ClosestPair heuristic and the reason for this question.
Book description: The following description of the ClosestPair heuristic is outlined in the book:
Maybe what we need is a different approach for the instance that proved to be a bad instance for the nearest-neighbor heuristic. Always walking to the closest point is too restrictive, since that seems to trap us into making moves we didn't want.
A different idea might repeatedly connect the closest pair of endpoints whose connection will not create a problem, such as premature termination of the cycle. Each vertex begins as its own single vertex chain. After merging everything together, we will end up with a single chain containing all the points in it. Connecting the final two endpoints gives us a cycle. At any step during the execution of this closest-pair heuristic, we will have a set of single vertices and the end of vertex-disjoint chains available to merge. The pseudocode that implements this description appears below.
Clarified description of ClosestPair heuristic
It may help to first "zoom back" a bit and answer the basic question of what we are trying to find in graph theory terms:
What is the shortest closed trail?
That is, we want to find a sequence of edges (e_1, e_2, ..., e_{n-1}) for which there is a sequence of vertices (v_1, v_2, ..., v_n) where v_1 = v_n and all edges are distinct. The edges are weighted, where the weight for each edge is simply the distance between vertices that comprise the edge--we want to minimize the overall weight of whatever closed trails exist.
Practically speaking, the ClosestPair heuristic gives us one of these distinct edges for every iteration of the outer for loop in the pseudocode (lines 3-10), where the inner for loop (lines 5-9) ensures the distinct edge being selected at each step, (s_m, t_m), is comprised of vertices coming from the endpoints of distinct vertex chains; that is, s_m comes from the endpoint of one vertex chain and t_m from the endpoint of another distinct vertex chain. The inner for loop simply ensures we consider all such pairs, minimizing the distance between potential vertices in the process.
Note (ties in distance between vertices): One potential source of confusion is that no sort of "processing order" is specified in either for loop. How do we determine the order in which to compare endpoints and, furthermore, the vertices of those endpoints? It doesn't matter. The nature of the inner for loop makes it clear that, in the case of ties, the most recently encountered vertex pairing with minimal distance is chosen.
Good instance of ClosestPair heuristic
Recall what happened in the bad instance of applying the NearestNeighbor heuristic (observe the newly added vertex labels):
The total distance covered was absurd because we kept jumping back and forth over 0.
Now consider what happens when we use the ClosestPair heuristic. We have n = 7 vertices; hence, the pseudocode indicates that the outer for loop will be executed 6 times. As the book notes, each vertex begins as its own single vertex chain (i.e., each point is a singleton where a singleton is a chain with one endpoint). In our case, given the figure above, how many times will the inner for loop execute? Well, how many ways are there to choose a 2-element subset of an n-element set (i.e., the 2-element subsets represent potential vertex pairings)? There are n choose 2 such subsets: C(n, 2) = n! / (2! (n - 2)!) = n(n - 1) / 2.
Since n = 7 in our case, there's a total of 21 possible vertex pairings to investigate. The nature of the figure above makes it clear that (C, D) and (D, E) are the only possible outcomes from the first iteration since the smallest possible distance between vertices in the beginning is 1 and dist(C, D) = dist(D, E) = 1. Which vertices are actually connected to give the first edge, (C, D) or (D, E), is unclear since there is no processing order. Let's assume we encounter vertices D and E last, thus resulting in (D, E) as our first edge.
Now there are 5 more iterations to go and 6 vertex chains to consider: A, B, C, (D, E), F, G.
Note (each iteration eliminates a vertex chain): Each iteration of the outer for loop in the ClosestPair heuristic results in the elimination of a vertex chain. The outer for loop iterations continue until we are left with a single vertex chain comprised of all vertices, where the last step is to connect the two endpoints of this single vertex chain by an edge. More precisely, for a graph G comprised of n vertices, we start with n vertex chains (i.e., each vertex begins as its own single vertex chain). Each iteration of the outer for loop results in connecting two vertices of G in such a way that these vertices come from distinct vertex chains; that is, connecting these vertices results in merging two distinct vertex chains into one, thus decrementing by 1 the total number of vertex chains left to consider. Repeating such a process n - 1 times for a graph that has n vertices results in being left with n - (n - 1) = 1 vertex chain, a single chain containing all the points of G in it. Connecting the final two endpoints gives us a cycle.
One possible depiction of how each iteration looks is as follows:
ClosestPair outer for loop iterations
1: connect D to E # -> dist: 1, chains left (6): A, B, C, (D, E), F, G
2: connect D to C # -> dist: 1, chains left (5): A, B, (C, D, E), F, G
3: connect E to F # -> dist: 3, chains left (4): A, B, (C, D, E, F), G
4: connect C to B # -> dist: 4, chains left (3): A, (B, C, D, E, F), G
5: connect F to G # -> dist: 8, chains left (2): A, (B, C, D, E, F, G)
6: connect B to A # -> dist: 16, single chain: (A, B, C, D, E, F, G)
Final step: connect A and G
Hence, the ClosestPair heuristic does the right thing in this example where previously the NearestNeighbor heuristic did the wrong thing:
Bad instance of ClosestPair heuristic
Consider what the ClosestPair algorithm does on the point set in the figure below (it may help to first try imagining the point set without any edges connecting the vertices):
How can we connect the vertices using ClosestPair? We have n = 6 vertices; thus, the outer for loop will execute 6 - 1 = 5 times, where our first order of business is to investigate the distance between the vertices of C(6, 2) = 15 total possible pairs. The figure above helps us see that dist(A, D) = dist(B, E) = dist(C, F) = 1 - ɛ are the only possible options in the first iteration since 1 - ɛ is the shortest distance between any two vertices. We arbitrarily choose (A, D) as the first pairing.
Now there are 4 more iterations to go and 5 vertex chains to consider: (A, D), B, C, E, F. One possible depiction of how each iteration looks is as follows:
ClosestPair outer for loop iterations
1: connect A to D # --> dist: 1-ɛ, chains left (5): (A, D), B, C, E, F
2: connect B to E # --> dist: 1-ɛ, chains left (4): (A, D), (B, E), C, F
3: connect C to F # --> dist: 1-ɛ, chains left (3): (A, D), (B, E), (C, F)
4: connect D to E # --> dist: 1+ɛ, chains left (2): (A, D, E, B), (C, F)
5: connect B to C # --> dist: 1+ɛ, single chain: (A, D, E, B, C, F)
Final step: connect A and F
Note (correctly considering the endpoints to connect from distinct vertex chains): Iterations 1-3 depicted above are fairly uneventful in the sense that we have no other meaningful options to consider. Even once we have the distinct vertex chains (A, D), (B, E), and (C, F), the next choice is similarly uneventful and arbitrary. There are four possibilities given that the smallest possible distance between vertices on the fourth iteration is 1 + ɛ: (A, B), (D, E), (B, C), (E, F). The distance between vertices for all of the points above is 1 + ɛ. The choice of (D, E) is arbitrary. Any of the other three vertex pairings would have worked just as well. But notice what happens during iteration 5--our possible choices for vertex pairings have been tightly narrowed. Specifically, the vertex chains (A, D, E, B) and (C, F), which have endpoints (A, B) and (C, F), respectively, allow for only four possible vertex pairings: (A, C), (A, F), (B, C), (B, F). Even if it may seem obvious, it is worth explicitly noting that neither D nor E were viable vertex candidates above--neither vertex is included in the endpoint, (A, B), of the vertex chain of which they are vertices, namely (A, D, E, B). There is no arbitrary choice at this stage. There are no ties in the distance between vertices in the pairs above. The (B, C) pairing results in the smallest distance between vertices: 1 + ɛ. Once vertices B and C have been connected by an edge, all iterations have been completed and we are left with a single vertex chain: (A, D, E, B, C, F). Connecting A and F gives us a cycle and concludes the process.
The total distance traveled across (A, D, E, B, C, F), including the closing edge from F back to A, is as follows:
d(A, D) + d(D, E) + d(E, B) + d(B, C) + d(C, F) + d(F, A) = (1 - ɛ) + (1 + ɛ) + (1 - ɛ) + (1 + ɛ) + (1 - ɛ) + √((1 - ɛ)^2 + (2 + 2ɛ)^2)
The distance above evaluates to 5 - ɛ + √(5ɛ^2 + 6ɛ + 5), as opposed to the total distance traveled by just going around the boundary (the right-hand figure in the image above, where all edges are colored in red): 6 + 2ɛ. As ɛ -> 0, the ClosestPair tour approaches 5 + √5 ≈ 7.24, whereas only 6 is necessary. Hence, we end up traveling about (√5 - 1)/6 ≈ 20.6% farther than is necessary by using the ClosestPair heuristic in this case.
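The comparison above can be checked numerically. The following is only a sketch under an assumed layout: A, B, C sit on a top row and D, E, F directly below them, with columns 1 + ɛ apart and rows 1 - ɛ apart (the row spacing is inferred from the totals quoted above, not stated explicitly). The class and method names are made up for illustration.

```java
class ClosestPairTourCheck {
    // Euclidean distance between two points given as {x, y}.
    static double dist(double[] p, double[] q) {
        return Math.hypot(p[0] - q[0], p[1] - q[1]);
    }

    // Length of the closed tour that visits the points in order and returns to the start.
    static double tourLength(double[][] tour) {
        double total = 0.0;
        for (int i = 0; i < tour.length; i++) {
            total += dist(tour[i], tour[(i + 1) % tour.length]);
        }
        return total;
    }

    // Assumed coordinates: index 0..5 = A, B, C (top row), D, E, F (bottom row).
    static double[][] points(double eps) {
        double w = 1 + eps; // horizontal spacing
        double h = 1 - eps; // vertical spacing (assumption)
        return new double[][] {
            {0, h}, {w, h}, {2 * w, h},
            {0, 0}, {w, 0}, {2 * w, 0}
        };
    }

    // Tour produced by the ClosestPair walk-through: A, D, E, B, C, F.
    static double[][] closestPairTour(double eps) {
        double[][] p = points(eps);
        return new double[][] {p[0], p[3], p[4], p[1], p[2], p[5]};
    }

    // Tour that simply follows the boundary: A, B, C, F, E, D.
    static double[][] boundaryTour(double eps) {
        double[][] p = points(eps);
        return new double[][] {p[0], p[1], p[2], p[5], p[4], p[3]};
    }

    public static void main(String[] args) {
        double eps = 1e-6;
        double heuristic = tourLength(closestPairTour(eps));
        double optimal = tourLength(boundaryTour(eps));
        System.out.println(heuristic + " vs " + optimal + ", ratio " + heuristic / optimal);
    }
}
```

For small ɛ the ratio comes out close to (5 + √5)/6, which is where the roughly 20% overhead comes from.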
When traversing a tree/graph, what is the difference between breadth-first and depth-first? Any coding or pseudocode examples would be great.
These two terms differentiate between two different ways of walking a tree.
It is probably easiest just to exhibit the difference. Consider the tree:
    A
   / \
  B   C
 /   / \
D   E   F
A depth first traversal would visit the nodes in this order
A, B, D, C, E, F
Notice that you go all the way down one leg before moving on.
A breadth first traversal would visit the nodes in this order
A, B, C, D, E, F
Here we work all the way across each level before going down.
(Note that there is some ambiguity in the traversal orders, and I've cheated to maintain the "reading" order at each level of the tree. In either case I could get to B before or after C, and likewise I could get to E before or after F. This may or may not matter, depending on your application...)
Both kinds of traversal can be achieved with the pseudocode:
Store the root node in Container
While (there are nodes in Container)
    N = Get the "next" node from Container
    Store all the children of N in Container
    Do some work on N
The difference between the two traversal orders lies in the choice of Container.
For depth first use a stack. (The recursive implementation uses the call-stack...)
For breadth-first use a queue.
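To make the "choice of Container" concrete, here is a minimal Java sketch (Java 9+ assumed for `List.of`/`Map.of`). The class name, the map-based tree representation, and the `depthFirst` flag are all my own choices for illustration. A single deque plays both roles: taking from the back makes it a stack, taking from the front makes it a queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;

class ContainerTraversal {
    // The example tree from above, stored as parent -> children (left to right).
    static final Map<String, List<String>> TREE = Map.of(
            "A", List.of("B", "C"),
            "B", List.of("D"),
            "C", List.of("E", "F"));

    static List<String> traverse(Map<String, List<String>> children, String root, boolean depthFirst) {
        Deque<String> container = new ArrayDeque<>();
        List<String> order = new ArrayList<>();
        container.addLast(root);
        while (!container.isEmpty()) {
            // Stack behaviour: take the newest node. Queue behaviour: take the oldest.
            String n = depthFirst ? container.removeLast() : container.removeFirst();
            List<String> kids = new ArrayList<>(children.getOrDefault(n, List.of()));
            if (depthFirst) {
                // Push in reverse so the leftmost child is the next one popped.
                Collections.reverse(kids);
            }
            for (String kid : kids) {
                container.addLast(kid);
            }
            order.add(n); // "Do some work on N"
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(traverse(TREE, "A", true));  // depth first
        System.out.println(traverse(TREE, "A", false)); // breadth first
    }
}
```

The reversal in the depth-first branch is only there to reproduce the left-to-right "reading" order; without it you still get a valid depth-first walk, just one that goes down the right leg first.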
The recursive implementation looks like
ProcessNode(Node)
    Work on the payload Node
    Foreach child of Node
        ProcessNode(child)
    /* Alternate time to work on the payload Node (see below) */
The recursion ends when you reach a node that has no children, so it is guaranteed to end for finite, acyclic graphs.
At this point, I've still cheated a little. With a little cleverness you can also work on the nodes in this order:
D, B, E, F, C, A
which is a variation of depth-first, where I don't do the work at each node until I'm walking back up the tree. I have however visited the higher nodes on the way down to find their children.
This traversal is fairly natural in the recursive implementation (use the "Alternate time" line above instead of the first "Work" line), and not too hard if you use an explicit stack, but I'll leave it as an exercise.
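As a rough illustration of the two "work" positions in the recursive version, here is a small Java sketch (Java 9+ assumed). The names `RecursiveTraversal`, `processNode` and the `workOnTheWayDown` flag are hypothetical; the tree is the same A..F example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RecursiveTraversal {
    // The example tree, stored as parent -> children (left to right).
    static final Map<String, List<String>> TREE = Map.of(
            "A", List.of("B", "C"),
            "B", List.of("D"),
            "C", List.of("E", "F"));

    static void processNode(String node, boolean workOnTheWayDown, List<String> out) {
        if (workOnTheWayDown) {
            out.add(node); // work on the payload before visiting the children
        }
        for (String child : TREE.getOrDefault(node, List.of())) {
            processNode(child, workOnTheWayDown, out);
        }
        if (!workOnTheWayDown) {
            out.add(node); // "alternate time": work while walking back up
        }
    }

    static List<String> order(boolean workOnTheWayDown) {
        List<String> out = new ArrayList<>();
        processNode("A", workOnTheWayDown, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(order(true));  // work on the way down
        System.out.println(order(false)); // work on the way back up
    }
}
```

The second call yields the D, B, E, F, C, A order mentioned above; the explicit-stack version of it is still left as the exercise.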
Understanding the terms:
This picture should give you the idea about the context in which the words breadth and depth are used.
Depth-First Search:
Depth-first search algorithm acts as if it wants to get as far away
from the starting point as quickly as possible.
It generally uses a Stack to remember where it should go when it reaches a dead end.
Rules to follow: Push the first vertex, A, onto the stack, then:
1) If possible, visit an adjacent unvisited vertex, mark it as visited, and push it on the stack.
2) If you can’t follow Rule 1, then, if possible, pop a vertex off the stack.
3) If you can’t follow Rule 1 or Rule 2, you’re done.
Java code:
public void searchDepthFirst() {
    // Begin at vertex 0 (A)
    vertexList[0].wasVisited = true;
    displayVertex(0);
    stack.push(0);
    while (!stack.isEmpty()) {
        int adjacentVertex = getAdjacentUnvisitedVertex(stack.peek());
        // If no such vertex
        if (adjacentVertex == -1) {
            stack.pop();
        } else {
            vertexList[adjacentVertex].wasVisited = true;
            // Do something
            stack.push(adjacentVertex);
        }
    }
    // Stack is empty, so we're done, reset flags
    for (int j = 0; j < nVerts; j++)
        vertexList[j].wasVisited = false;
}
Applications: Depth-first searches are often used in simulations of games (and game-like situations in the real world). In a typical game you can choose one of several possible actions. Each choice leads to further choices, each of which leads to further choices, and so on into an ever-expanding tree-shaped graph of possibilities.
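The depth-first Java method above relies on fields and a helper (`vertexList`, `stack`, `nVerts`, `getAdjacentUnvisitedVertex`) that are not shown. One possible way to fill them in is sketched below, assuming the graph is stored as an adjacency matrix. That storage choice, the class name `MatrixGraph`, and the method that returns the visit order instead of displaying it are my assumptions, not something the original snippet states.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class MatrixGraph {
    private final char[] labels;        // vertex labels, e.g. 'A', 'B', ...
    private final boolean[] wasVisited; // visited flag per vertex
    private final boolean[][] adjacent; // adjacent[i][j] is true if an edge joins i and j
    private int nVerts = 0;

    MatrixGraph(int maxVerts) {
        labels = new char[maxVerts];
        wasVisited = new boolean[maxVerts];
        adjacent = new boolean[maxVerts][maxVerts];
    }

    void addVertex(char label) {
        labels[nVerts++] = label;
    }

    void addEdge(int a, int b) {
        adjacent[a][b] = true;
        adjacent[b][a] = true; // undirected graph
    }

    // Lowest-numbered unvisited neighbour of v, or -1 if there is none.
    int getAdjacentUnvisitedVertex(int v) {
        for (int j = 0; j < nVerts; j++) {
            if (adjacent[v][j] && !wasVisited[j]) {
                return j;
            }
        }
        return -1;
    }

    // Same control flow as the stack-based search, but collects the visit order.
    List<Character> depthFirstOrder() {
        List<Character> order = new ArrayList<>();
        Deque<Integer> stack = new ArrayDeque<>();
        wasVisited[0] = true;
        order.add(labels[0]);
        stack.push(0);
        while (!stack.isEmpty()) {
            int next = getAdjacentUnvisitedVertex(stack.peek());
            if (next == -1) {
                stack.pop(); // dead end: back up one vertex
            } else {
                wasVisited[next] = true;
                order.add(labels[next]);
                stack.push(next);
            }
        }
        Arrays.fill(wasVisited, false); // reset flags for the next search
        return order;
    }
}
```

With vertices A to E and edges A-B, B-C, A-D, D-E, the search runs down A, B, C, backs up to A, and then continues with D and E. Note that scanning a matrix row for every lookup makes this O(V^2) overall; an adjacency list would be the usual choice for sparse graphs.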
Breadth-First Search:
The breadth-first search algorithm likes to stay as close as possible
to the starting point.
This kind of search is generally implemented using a Queue.
Rules to follow: Make the starting vertex, A, the current vertex, then:
1) Visit the next unvisited vertex (if there is one) that’s adjacent to the current vertex, mark it, and insert it into the queue.
2) If you can’t carry out Rule 1 because there are no more unvisited vertices, remove a vertex from the queue (if possible) and make it the current vertex.
3) If you can’t carry out Rule 2 because the queue is empty, you’re done.
Java code:
public void searchBreadthFirst() {
    vertexList[0].wasVisited = true;
    displayVertex(0);
    queue.insert(0);
    int v2;
    while (!queue.isEmpty()) {
        int v1 = queue.remove();
        // Until it has no unvisited neighbors, get one
        while ((v2 = getAdjUnvisitedVertex(v1)) != -1) {
            vertexList[v2].wasVisited = true;
            // Do something
            queue.insert(v2);
        }
    }
    // Queue is empty, so we're done, reset flags
    for (int j = 0; j < nVerts; j++)
        vertexList[j].wasVisited = false;
}
Applications: Breadth-first search first finds all the vertices that are one edge away from the starting point, then all the vertices that are two edges away, and so on. This is useful if you’re trying to find the shortest path from the starting vertex to a given vertex.
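The shortest-path use can be sketched as follows for an unweighted graph. This is an illustration only: the adjacency-list representation, the class name `BfsDistances`, and the convention of -1 for unreachable vertices are my assumptions.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

class BfsDistances {
    // Returns the number of edges on a shortest path from start to every vertex,
    // or -1 for vertices that cannot be reached.
    static int[] distancesFrom(List<List<Integer>> adjacency, int start) {
        int[] dist = new int[adjacency.size()];
        Arrays.fill(dist, -1);
        Queue<Integer> queue = new ArrayDeque<>();
        dist[start] = 0;
        queue.add(start);
        while (!queue.isEmpty()) {
            int v = queue.remove();
            for (int w : adjacency.get(v)) {
                if (dist[w] == -1) { // first time w is reached, so this path is a shortest one
                    dist[w] = dist[v] + 1;
                    queue.add(w);
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        List<List<Integer>> adjacency = List.of(
                List.of(1, 2), List.of(0, 3), List.of(0, 3),
                List.of(1, 2, 4), List.of(3), List.of());
        System.out.println(Arrays.toString(distancesFrom(adjacency, 0)));
    }
}
```

Because the queue hands out vertices in order of increasing distance, the first time a vertex is reached is also via a shortest path. This only holds when every edge counts the same; with weighted edges you would need something like Dijkstra's algorithm instead.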
Hopefully that should be enough for understanding the Breadth-First and Depth-First searches. For further reading I would recommend the Graphs chapter from an excellent data structures book by Robert Lafore.
Given this binary tree:
          G
        /   \
       D     I
      / \   / \
     B   E H   K
    / \   \
   A   C   F
Breadth First Traversal:
Traverse across each level from left to right.
"I'm G, my kids are D and I, my grandkids are B, E, H and K, and their kids are A, C, F"
- Level 1: G
- Level 2: D, I
- Level 3: B, E, H, K
- Level 4: A, C, F
Order Searched: G, D, I, B, E, H, K, A, C, F
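A level-by-level walk like this can be sketched in Java with a queue. The `Node` class and the tree shape below are my reconstruction from the traversal orders listed in this answer (G at the root, D and I below it, and so on), so treat them as assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class LevelOrder {
    static final class Node {
        final String label;
        final Node left;
        final Node right;

        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }

        static Node leaf(String label) {
            return new Node(label, null, null);
        }
    }

    // Tree reconstructed from the orders in the text (assumed shape).
    static Node exampleTree() {
        Node b = new Node("B", Node.leaf("A"), Node.leaf("C"));
        Node e = new Node("E", null, Node.leaf("F"));
        Node d = new Node("D", b, e);
        Node i = new Node("I", Node.leaf("H"), Node.leaf("K"));
        return new Node("G", d, i);
    }

    // Returns the labels grouped by level, each level read left to right.
    static List<List<String>> levels(Node root) {
        List<List<String>> result = new ArrayList<>();
        Queue<Node> queue = new ArrayDeque<>();
        if (root != null) {
            queue.add(root);
        }
        while (!queue.isEmpty()) {
            int width = queue.size(); // everything queued right now belongs to one level
            List<String> level = new ArrayList<>();
            for (int k = 0; k < width; k++) {
                Node n = queue.remove();
                level.add(n.label);
                if (n.left != null) queue.add(n.left);
                if (n.right != null) queue.add(n.right);
            }
            result.add(level);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(levels(exampleTree()));
    }
}
```

Flattening the returned lists gives the "Order Searched" sequence; keeping them separate gives the four levels.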
Depth First Traversal:
Traversal is not done ACROSS entire levels at a time. Instead, traversal dives into the DEPTH (from root to leaf) of the tree first. However, it's a bit more complex than simply up and down.
There are three methods:
1) PREORDER: ROOT, LEFT, RIGHT.
You need to think of this as a recursive process:
Grab the Root. (G)
Then Check the Left. (It's a tree)
    Grab the Root of the Left. (D)
    Then Check the Left of D. (It's a tree)
        Grab the Root of the Left (B)
        Then Check the Left of B. (A)
        Check the Right of B. (C, and it's a leaf node. Finish B tree. Continue D tree)
    Check the Right of D. (It's a tree)
        Grab the Root. (E)
        Check the Left of E. (Nothing)
        Check the Right of E. (F, Finish D Tree. Move back to G Tree)
Check the Right of G. (It's a tree)
    Grab the Root of I Tree. (I)
    Check the Left. (H, it's a leaf.)
    Check the Right. (K, it's a leaf. Finish G tree)
DONE: G, D, B, A, C, E, F, I, H, K
2) INORDER: LEFT, ROOT, RIGHT
Where the root is "in" or between the left and right child node.
Check the Left of the G Tree. (It's a D Tree)
Check the Left of the D Tree. (It's a B Tree)
Check the Left of the B Tree. (A)
Check the Root of the B Tree (B)
Check the Right of the B Tree (C, finished B Tree!)
Check the Right of the D Tree (It's an E Tree)
Check the Left of the E Tree. (Nothing)
Check the Right of the E Tree. (F, it's a leaf. Finish E Tree. Finish D Tree)...
Onwards until...
DONE: A, B, C, D, E, F, G, H, I, K
3) POSTORDER: LEFT, RIGHT, ROOT
DONE: A, C, B, F, E, D, H, K, I, G
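The three depth-first methods differ only in where the root is handled relative to the two recursive calls. A minimal Java sketch, with a hypothetical `Node` class and the tree shape reconstructed from the orders listed above:

```java
import java.util.ArrayList;
import java.util.List;

class DepthFirstOrders {
    static final class Node {
        final String label;
        final Node left;
        final Node right;

        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }

        static Node leaf(String label) {
            return new Node(label, null, null);
        }
    }

    // Tree reconstructed from the orders in the text (assumed shape).
    static Node exampleTree() {
        Node b = new Node("B", Node.leaf("A"), Node.leaf("C"));
        Node e = new Node("E", null, Node.leaf("F"));
        Node d = new Node("D", b, e);
        Node i = new Node("I", Node.leaf("H"), Node.leaf("K"));
        return new Node("G", d, i);
    }

    // ROOT, LEFT, RIGHT
    static void preorder(Node n, List<String> out) {
        if (n == null) return;
        out.add(n.label);
        preorder(n.left, out);
        preorder(n.right, out);
    }

    // LEFT, ROOT, RIGHT
    static void inorder(Node n, List<String> out) {
        if (n == null) return;
        inorder(n.left, out);
        out.add(n.label);
        inorder(n.right, out);
    }

    // LEFT, RIGHT, ROOT
    static void postorder(Node n, List<String> out) {
        if (n == null) return;
        postorder(n.left, out);
        postorder(n.right, out);
        out.add(n.label);
    }

    public static void main(String[] args) {
        List<String> pre = new ArrayList<>();
        List<String> in = new ArrayList<>();
        List<String> post = new ArrayList<>();
        preorder(exampleTree(), pre);
        inorder(exampleTree(), in);
        postorder(exampleTree(), post);
        System.out.println(pre + "\n" + in + "\n" + post);
    }
}
```

Since the example tree happens to be ordered like a binary search tree on its letters, the in-order result comes out alphabetically sorted.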
Usage (aka, why do we care):
I really enjoyed this simple Quora explanation of the Depth First Traversal methods and how they are commonly used:
"In-Order Traversal will print values [in order for the BST (binary search tree)]"
"Pre-order traversal is used to create a copy of the [binary search tree]."
"Postorder traversal is used to delete the [binary search tree]."
https://www.quora.com/What-is-the-use-of-pre-order-and-post-order-traversal-of-binary-trees-in-computing
I think it would be interesting to write both of them in such a way that switching only a few lines of code gives you one algorithm or the other, so that you can see that your dilemma is not as strong as it seems at first.
I personally like the interpretation of BFS as flooding a landscape: the low-altitude areas are flooded first, and only then do the high-altitude areas follow. If you imagine the landscape altitudes as isolines, as we see in geography books, it's easy to see that BFS fills all the area under the same isoline at the same time, just as it would happen physically. Thus, interpreting altitude as distance or scaled cost gives a pretty intuitive idea of the algorithm.
With this in mind, you can adapt the idea behind breadth-first search to find shortest paths, minimum spanning trees, and many other minimization problems.
I haven't seen any intuitive interpretation of DFS yet (only the standard one about the maze, which isn't as powerful as the BFS flooding one), so to me BFS seems to correlate better with physical phenomena as described above, while DFS correlates better with choice dilemmas in rational systems (i.e. people or computers deciding which move to make in a chess game or how to get out of a maze).
So, for me, the difference between them lies in which natural phenomenon best matches their propagation model (traversal) in real life.