Proof by contradiction for edge in tree - algorithm

I have a problem from my textbook that goes as follows. Assume that I have a shortest path matrix S (an n x n matrix whose entry S_{ij} is the length of the shortest path between nodes i and j; the textbook gives a concrete example matrix).
And a tree T that consists of the shortest paths constructed from the shortest path matrix S (like a minimum spanning tree).
The tree has the following properties: it has n - 1 edges, and all nodes are connected with each other.
The task is then to prove by contradiction that if the entry S_{ij} has the minimum value, then that entry must be an edge in the tree T. I don't quite understand what there is to prove. The way I see it is that if we assume that T does not contain the smallest element from S, then we will have a contradiction at the end, since there will be a path that is longer than the one chosen with the smallest element. This doesn't seem like much of a proof to me, and even if it is, I don't see how I could generalize it.

Since T is a tree, there exists only one path between every pair of nodes. If nodes i and j are not connected by an edge, then the path connecting them has to contain at least one more node; call it k. Then for S_{ij} (the length of the path between i and j) we have, since S_{ij} is the minimum entry of S (so S_{ik} >= S_{ij} and S_{kj} >= S_{ij}):
S_{ij} = S_{ik} + S_{kj} >= S_{ij} + S_{ij} = 2 * S_{ij}
which is a contradiction, since distances are positive.
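For what it's worth, here is the argument written out (a sketch in LaTeX; it assumes, as the problem setup suggests, that path lengths in T agree with the entries of S and that all entries are positive):

% Formalized contradiction; assumptions: tree path lengths agree with S,
% and all entries of S are positive.
Suppose $S_{ij} = \min_{p \neq q} S_{pq}$ but $\{i, j\}$ is not an edge of $T$.
Since $T$ is a tree, the unique $i$--$j$ path in $T$ passes through some
intermediate node $k$, so
\[
  S_{ij} = S_{ik} + S_{kj} \ge S_{ij} + S_{ij} = 2\,S_{ij},
\]
where the inequality uses the minimality of $S_{ij}$, i.e. $S_{ik} \ge S_{ij}$
and $S_{kj} \ge S_{ij}$. Since $S_{ij} > 0$, this is impossible, so
$\{i, j\}$ must be an edge of $T$.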

Related

How to find longest increasing subsequence among all simple paths of an unweighted general graph?

Let G = (V, E) be an unweighted general graph in which every vertex v has a weight w(v).
An increasing subsequence of a simple path p in G is a sequence of vertices of p in which the weights of all vertices along this sequence increase. The simple paths can be closed paths.
A longest increasing subsequence (LIS) of a simple path p is an increasing subsequence of p that has the maximum number of vertices.
The question is that, how to find a longest increasing subsequence among all simple paths of G?
Note that the graph is undirected, therefore it is not a directed acyclic graph (DAG).
Here's a very fast algorithm for solving this problem. The longest increasing subsequence in the graph is a subsequence of a path in the graph, and each path must belong purely to a single connected component. So if we can solve this problem on connected components, we can solve it for the overall graph by finding the best solution across all connected components.
Next, think about the case where you're solving this problem for a connected graph G. In that case, the longest increasing subsequence you could find would be formed by sorting the nodes by their weight, then traversing from the lowest-weight node to the second, then to the third, then to the fourth, etc. If there are any ties or duplicates, you can just skip them. In other words, you can solve this problem by
Sorting all the nodes by weight,
Discarding all but one node of each weight, and
Forming an LIS by visiting each node in sequence.
This leads to a very fast algorithm for the overall problem. In time O(m + n), find all connected components. For each connected component, use the preceding algorithm in time O(Sort(n)), where Sort(n) is the time required to sort n elements (which could be Θ(n log n) if you use heapsort, Θ(n + U) for bucket sort, Θ(n lg U) for radix sort, etc.). Then, return the longest sequence you find.
Overall, the runtime is O(m + n + Sort(n)), which beats my previous approach and should be a lot easier to code up.
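Here's a minimal sketch of this approach in Python (my own illustration: adj maps each node to its neighbours and weight maps each node to w(v), both hypothetical names). Since only the length is asked for, the sketch just counts distinct weights per component; sorting them, as in the steps above, recovers the subsequence itself.

def lis_over_components(adj, weight):
    # Find each connected component with an iterative DFS, then apply the
    # sort-and-deduplicate observation: the answer for a component is the
    # number of distinct weights it contains.
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, len({weight[u] for u in comp}))
    return best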
I had originally posted this answer, which I'll leave up because I think it's interesting:
Imagine that you pick a simple path out of the graph G and look at the longest increasing subsequence of that path. Although the path walks all over the graph and might have lots of intermediary nodes, the longest increasing subsequence of that path really only cares about
the first node on the path that's also a part of the LIS, and
from that point, the next-largest value in the path.
As a result, we can think about forming an LIS like this. Start at any node in the graph. Now, travel to any node in the graph that (1) has a higher value than the current node and (2) is reachable from the current node, then repeat this process as many times as desired. The goal is to do so in a way that gives the longest possible sequence of increasing values.
We can model this process as finding a longest path in a DAG. Each node in the DAG represents a node in the original graph G, and there's an edge from a node u to a node v if
there's a path from u to v in G, and
w(u) < w(v).
This is a DAG because of that second condition, even though the original graph isn't a DAG.
So we can solve this overall problem in a two-step process. First, build the DAG described above. To do so:
Find the connected components of the original graph G and label each node with its connected component number. Time: O(m + n).
For each node u in G, construct a corresponding node u' in a new DAG D. Time: O(n).
For each node u in G, and for each node v in G that's in the same connected component as u, if w(u) < w(v), add an edge from u' to v'. Time: Θ(n²) in the worst case, Θ(n) in the best case.
Find the longest path in D. This path corresponds to the longest increasing subsequence of any simple path in G. Time: O(m + n).
Overall runtime: Θ(n²) in the worst case, Θ(m + n) in the best case.
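And a sketch of this original approach, under the same assumptions (hypothetical names adj and weight) as the earlier snippet; the memoized recursion is fine for an illustration but would want an explicit stack on very large components:

from collections import defaultdict

def lis_via_dag(adj, weight):
    # 1. Label every node with a connected-component id (iterative DFS).
    comp = {}
    for start in adj:
        if start in comp:
            continue
        comp[start] = start
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp[v] = start
                    stack.append(v)
    # 2. Build the DAG: an edge u -> v whenever u and v share a component
    #    and w(u) < w(v). This is the Theta(n^2) worst-case step.
    dag = defaultdict(list)
    nodes = list(adj)
    for u in nodes:
        for v in nodes:
            if u != v and comp[u] == comp[v] and weight[u] < weight[v]:
                dag[u].append(v)
    # 3. Longest path in the DAG via memoized DFS; it is acyclic because
    #    weights strictly increase along every edge.
    memo = {}
    def longest_from(u):
        if u not in memo:
            memo[u] = 1 + max((longest_from(v) for v in dag[u]), default=0)
        return memo[u]
    return max((longest_from(u) for u in nodes), default=0)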

undirected graphs algorithms

Suppose that we have an n-node, m-edge undirected graph G = (V, E) and we have two distinct nodes called s and t. Suppose that the distance between s and t is strictly greater than n/2. Show that there is a node v which is different from s and t such that every path from s to t goes through v. Give an algorithm with running time O(n + m) to find such a vertex. You do not have to prove that your algorithm is correct, but you must give a proof that a vertex like v exists.
I cannot figure out an exact answer to this past paper question; help me out!
Suppose there are two paths between s and t that don't share a node other than s and t (if no such vertex v existed, Menger's theorem would guarantee two such paths). Since the distance between s and t is > n/2, each path has more than n/2 - 1 nodes strictly between s and t. That means the graph has more than n nodes, which is a contradiction.
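Written out a little more carefully (a sketch; Menger's theorem is what supplies the two disjoint paths):

% Counting argument for the existence of the cut vertex v.
Suppose no such $v$ exists. By Menger's theorem there are two internally
disjoint $s$--$t$ paths $P_1, P_2$. Since $d(s, t) > n/2$, each $P_i$ has
more than $n/2$ edges, hence more than $n/2 - 1$ internal vertices. Counting
$s$, $t$, and both sets of internal vertices gives more than
\[
  2 + 2\left(\frac{n}{2} - 1\right) = n
\]
vertices, a contradiction.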
For the algorithm it is enough to find any s-t path and then see where the part of the graph that is connected to one side (s) without using the path's nodes ends (a code sketch of a closely related idea follows this list). In more detail:
if s is connected to only one node, then that node is the one we are looking for.
if not, run BFS from s
find a path s-t
find the nodes connected to s without using edges leaving the nodes of the path s-t
the last node on the path s-t that is in that connected part is the node we are looking for.
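Here's a sketch of a closely related O(n + m) realization of this idea (my own variant, using BFS levels rather than the exact steps above): since dist(s, t) > n/2, the pigeonhole principle forces some BFS level strictly between s and t to contain exactly one vertex, and every s-t path has to cross every intermediate level.

from collections import deque

def find_cut_vertex(adj, s, t):
    # BFS from s, recording every vertex's level (= distance from s).
    n = len(adj)
    level = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in level:
                level[w] = level[u] + 1
                queue.append(w)
    d = level[t]  # by the problem's guarantee, d > n/2
    # Count the vertices on each level between s and t.
    count = [0] * (d + 1)
    member = [None] * (d + 1)
    for u, lu in level.items():
        if lu <= d:
            count[lu] += 1
            member[lu] = u
    # Pigeonhole: if every intermediate level had >= 2 vertices, the graph
    # would contain at least 2 + 2(d - 1) = 2d > n vertices. So some level
    # 0 < i < d has exactly one vertex, and every s-t path goes through it.
    for i in range(1, d):
        if count[i] == 1:
            return member[i]
    return None  # cannot happen when dist(s, t) > n/2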

How to prove that finding a successor n-1 times in the BST from the minimum node is O(n)?

How to prove that finding a successor n-1 times in the BST from the minimum node is O(n)?
The question says that we can create sorted order by:
1) letting the node = the minimum node of the BST.
2) from that node, repeatedly finding the successor.
I was told that the result is O(n) but I do not understand and do not know how to prove it.
Shouldn't it be O(n log n) instead? Because step 1 is O(log n), and step 2 is also O(log n) but is called n-1 times. Therefore, it will be O(n log n).
Please clarify my doubt. Thank you! :)
You are correct that any individual operation might take O(log n) time, so if you perform those operations n times, you should get a runtime of O(n log n). This bound is correct, but it's not tight. The actual runtime is Θ(n).
One way to see this is to look at any individual edge in the tree. How many times will you visit each edge if you start at the leftmost node and repeatedly perform a successor query? If you look closely at how the operations work, you'll discover that every edge is visited exactly twice: once downward and once upward. Since all the work done is done traversing up and down edges, this means that the total amount of work done is proportional to twice the number of edges. In any tree, the number of edges is the number of nodes minus one, and so the total work done is Θ(n).
To formalize this as a proof, try showing that you never descend down the same edge twice and that when you ascend up an edge, you never descend down that edge again. Once you've done this, the conclusion that the runtime is Θ(n) follows from the above logic.
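If you'd like to see the counting argument in action before formalizing it, here's a quick self-contained experiment (my own illustration, not part of the original question): it runs successor n-1 times from the minimum and counts every edge traversal, checking that the count never exceeds 2(n-1).

import random

class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.parent = key, None, None, None

def insert(root, key):
    if root is None:
        return Node(key)
    cur = root
    while True:
        side = "left" if key < cur.key else "right"
        child = getattr(cur, side)
        if child is None:
            node = Node(key)
            node.parent = cur
            setattr(cur, side, node)
            return root
        cur = child

steps = 0  # counts every edge traversal made by successor calls

def successor(node):
    global steps
    if node.right:  # case 1: one step right, then all the way left
        steps += 1
        node = node.right
        while node.left:
            steps += 1
            node = node.left
        return node
    while node.parent and node is node.parent.right:
        steps += 1  # case 2: climb while we are a right child
        node = node.parent
    if node.parent:
        steps += 1  # one final step up to the successor
    return node.parent

keys = random.sample(range(10000), 500)
root = None
for k in keys:
    root = insert(root, k)
node = root
while node.left:  # find the minimum
    node = node.left
out = [node.key]
while (node := successor(node)) is not None:
    out.append(node.key)
assert out == sorted(keys)            # the walk produces sorted order
assert steps <= 2 * (len(keys) - 1)   # every edge is traversed at most twice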
Hope this helps!
I wanted to post this as a comment on templatetypedef's answer, but it's too long.
His answer is right in that the easiest way to see that this is linear is because every edge is visited exactly twice, and the number of edges in a tree is always one less than the number of nodes (because every node has one parent, except the root!).
The issue is that the way he phrases the formal proof uses words that seem to imply contradiction as the way to go. In general, mathematicians frown on using contradiction because it often produces proofs with superfluous content. For instance:
Proof that 2 + 2 != 5:
Assume for contradiction that 2 + 2 = 5 (<- Remove this line)
Well 2 + 2 = 4
And 4 != 5
Contradiction! (<- Remove this line)
Contradiction tends to be verbose, and sometimes it can even obfuscate the idea behind the proof! There are times when contradiction seems pretty much necessary, but it's relatively rare and that's a separate discussion.
In this case, I don't see a proof by contradiction being any easier than a direct proof. On the other hand, regardless of proof technique, this proof is pretty ugly to do formally. Here's an attempt:
1) The succ(n) algorithm traverses one of two paths
In the first case every edge is visited on the simple path from a node to the leftmost node of its right subtree
In the other case, the node n has no right child, in which case we go up its ancestors p_1, p_2, p_3, ..., p_k such that p_(k-1) is the first ancestor which is the left child of its parent. All of those edges are visited in that simple path
We want to show that an arbitrary edge is traversed in precisely two succ() calls, once for the first case of succ() and once for the second case of succ(). Well, this is true for every edge other than the rightmost branch, but you can handle those edge cases separately. Alternatively we could prove the simpler argument where we return to the root after visiting the last element
This is two-fold because for a given edge e we have to find the n1 and n2 such that succ(n1) traverses e and succ(n2) also traverses e, as well as prove that every other succ() generates a path which does not include e.
2) First we actually prove that for each type of path that succ() visits, no two paths overlap (i.e. if succ(n) and succ(n') both traverse paths of the same type, those paths share no edges)
In the first case, the simple path is precisely defined as follows. Start at node n and go one edge to the right to r. Then traverse the left branch of the subtree rooted at r. Now consider any other such path that starts at some other node n' (note, we don't assume that n != n'). It must go right one node to r'. Then it traverses the leftmost branch of the subtree rooted at r'. If the paths overlap then pick one of the edges that overlap. If it's (n,r) = (n',r') then we have n = n' and so it's the same path. If it's some e = e' in both leftmost branches then you can show, again, that n = n' (you can trace the leftmost branches and show that every edge is the same, then finally reach the conclusion that r = r' => n = n' because in a tree the parent is unique. You'll see this tracing argument below). Thus we know that for any n and n', if their paths overlap, they are actually the same node! The contrapositive says this: if they are different nodes, then their paths don't overlap. That's exactly what we want (and the contrapositive is logically equivalent to the original statement).
In the second case we define the simple path starting at node n and going up the ancestors p_1, p_2, ..., p_k = g until we reach the first node p_k such that p_(k-1) is to the left of p_k. Consider some other path of the same type that starts at node n' where n != n'. Similarly it visits p_1', p_2', ..., p_k' = g'. If the two paths shared an edge, then tracing downward as before (each node on such a climb is the unique right child of its parent) we would conclude that n = n'. Since n != n', none of the edges can be the same, and hence succ(n) and succ(n') do not traverse any of the same edges
3) Now we just need to show that at least one path of each type exists for a given edge. Well take any such edge e = (c,p) (note here I am ignoring the special edges on the rightmost branch which are technically only visited once and I am also ignoring the special edges on the leftmost branch which are technically visited once by find_min() and then once by succ() calls)
If it's from a left child c to its parent p then succ(c) will cover the second type of path. To find the other path, keep going up p's ancestors p_1, p_2, ..., p_k such that p_(k-1) is to the right of p_k. succ(p_k) will traverse a path containing e by definition (since e is on the leftmost branch of the subtree of p_(k-1) which is p_k's right child).
A similar argument holds for symmetric case when c is the right child of p
To summarize the proof we've shown that succ() generates two types of path. For each type of path, all of the paths of those types do not overlap. Furthermore, for any edge we have at least one of each of those types of paths. Since we call succ() on every node we can finally conclude that each edge is traversed twice (and hence the algorithm is Theta(n)).
Despite how long this proof was, it isn't actually complete (even ignoring the points when I explicitly said I was skipping details!). There are cases where I said something exists without proving it exists. You can figure out those details if you want and it is actually really satisfying to get it completely right (in my opinion at least. Maybe when you're a genius you'll find it tedious, heh)
Hope this helped. Let me know if you want me to clarify some steps

Path from s to e in a weighted DAG graph with limitations

Consider a directed graph with n nodes and m edges. Each edge is weighted. There is a start node s and an end node e. We want to find the path from s to e that has the maximum number of nodes such that:
the total distance is less than some constant d
starting from s, each node in the path is closer than the previous one to the node e (as in, when you traverse the path you are getting closer to your destination e, in terms of the edge weight of the remaining path).
We can assume there are no cycles in the graph. There are no negative weights. Does an efficient algorithm already exist for this problem? Is there a name for this problem?
Whatever you end up doing, do a BFS/DFS starting from s first to see if e can even be reached; this only takes you O(n+m) so it won't add to the complexity of the problem (since you need to look at all vertices and edges anyway). Also, delete all edges with weight 0 before you do anything else since those never fulfill your second criterion.
EDIT: I figured out an algorithm; it's polynomial, depending on the size of your graphs it may still not be sufficiently efficient though. See the edit further down.
Now for some complexity. The first thing to think about here is an upper bound on how many paths we can actually have, so depending on the choice of d and the weights of the edges, we also have an upper bound on the complexity of any potential algorithm.
How many edges can there be in a DAG? The answer is n(n-1)/2, which is a tight bound: take n vertices, order them from 1 to n; for two vertices i and j, add an edge i->j to the graph iff i<j. This sums to a total of n(n-1)/2, since this way, for every pair of vertices, there is exactly one directed edge between them, meaning we have as many edges in the graph as we would have in a complete undirected graph over n vertices.
How many paths can there be from one vertex to another in the graph described above? The answer is 2^(n-2). Proof by induction:
Take the graph over 2 vertices as described above; there is 1 = 2^0 = 2^(2-2) path from vertex 1 to vertex 2: (1->2).
Induction step: assuming there are 2^(n-2) paths from the vertex with number 1 of an n-vertex graph as described above to the vertex with number n, increment the number of each vertex and add a new vertex 1 along with the required n edges. It has its own edge to the vertex now labeled n+1. Additionally, it has 2^(i-2) paths to that vertex for every i in [2;n] (it has all the paths the other vertices have to the vertex n+1 collectively, each "prefixed" with the edge 1->i). This gives us 1 + Σ_{k=2}^{n} 2^(k-2) = 1 + Σ_{k=0}^{n-2} 2^k = 1 + (2^(n-1) - 1) = 2^(n-1) = 2^((n+1)-2).
So we see that there are DAGs that have 2^(n-2) distinct paths between some pairs of their vertices; this is a bit of a bleak outlook, since depending on weights and your choice of d, you may have to consider them all. This in itself doesn't mean we can't choose some form of optimum (which is what you're looking for) efficiently, though.
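If you want to sanity-check the 2^(n-2) count, here's a tiny DP (my own illustration) that counts the paths from vertex 1 to vertex n in the complete DAG described above:

def count_paths(n):
    # paths[j] = number of paths from vertex 1 to vertex j in the DAG with
    # an edge i -> j exactly when i < j.
    paths = [0] * (n + 1)
    paths[1] = 1
    for j in range(2, n + 1):
        paths[j] = sum(paths[i] for i in range(1, j))  # pick the last step i -> j
    return paths[n]

for n in range(2, 12):
    assert count_paths(n) == 2 ** (n - 2)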
EDIT: Ok so here goes what I would do:
Delete all edges with weight 0 (and smaller, but you ruled that out), since they can never fulfill your second criterion.
Do a topological sort of the graph; in the following, let's only consider the part of the topological sorting of the graph from s to e, let's call that the integer interval [s;e]. Delete everything from the graph that isn't strictly in that interval, meaning all vertices outside of it along with the incident edges. During the topSort, you'll also be able to see whether there is a path from s to e, so you'll know whether there are any paths s-...->e. Complexity of this part is O(n+m).
Now the actual algorithm:
traverse the vertices of [s;e] in the order imposed by the topological sorting
for every vertex v, store a two-dimensional array of information; let's call it prev[][], since it's going to store information about the predecessors of a node on the paths leading towards it
in prev[i][j], store how long the total path of length (counted in vertices) i is as a sum of the edge weights, if j is the predecessor of the current vertex on that path. For example, prev_(s+1)[1][s] would have the weight of the edge s->s+1 in it, while all other entries in prev_(s+1) would be 0/undefined.
when calculating the array for a new vertex v, all we have to do is check its incoming edges and iterate over the arrays for the start vertices of those edges. For example, let's say vertex v has an incoming edge from vertex w, having weight c. Consider what the entry prev[i][w] should be. We have an edge w->v, so we need to set prev[i][w] in v to min(prev_w[i-1][k] for all k, but ignore entries with 0) + c (notice the subscript of the array!); we effectively take the cost of a path of length i-1 that leads to w, and add the cost of the edge w->v. Why the minimum? The vertex w can have many predecessors for paths of length i-1; however, we want to stay below a cost limit, which greedy minimization at each vertex will do for us. We will need to do this for all i in [1; v-s].
While calculating the array for a vertex, do not set entries that would give you a path with cost above d; since all edges have positive weights, we can only get more costly paths with each edge, so just ignore those.
Once you've reached e and finished calculating prev_e, you're done with this part of the algorithm.
Iterate over prev_e, starting with prev_e[e-s]; since we have no cycles, all paths are simple paths, and therefore the longest path from s to e can have e-s edges. Find the largest i such that prev_e[i] has a non-zero (meaning it is defined) entry; if none exists, there is no path fitting your criteria. You can reconstruct any existing path using the arrays of the other vertices.
Now that gives you a space complexity of O(n³) and a time complexity of O(n²m): the arrays have O(n²) entries, and we have to iterate over O(m) arrays, one array for each edge. But I think it's very obvious where the wasteful use of data structures here can be optimized using hashing structures and other things than arrays. Or you could just use a one-dimensional array and only store the current minimum instead of recomputing it every time (you'll have to store the sum of edge weights of the path together with the predecessor vertex, though, since you need to know the predecessor to reconstruct the path), which would change the size of the arrays from n² to n, since you now only need one entry per number-of-nodes-on-path-to-vertex, bringing down the space complexity of the algorithm to O(n²) and the time complexity to O(nm). You can also try to do some form of topological sort that gets rid of the vertices from which you can't reach e, because those can be safely ignored as well.
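Here's a minimal sketch of the O(nm) variant just described; the names are my own assumptions: edges_in[v] lists the incoming edges (u, c) of weight c, vertices are relabeled 0..n-1 in topological order with the interval [s;e] already extracted, and only the cost bound d is enforced (the "getting closer to e" criterion is handled, as above, only by deleting weight-0 edges beforehand).

def longest_bounded_path(n, edges_in, s, e, d):
    # best[v][i] = (cost, pred): cheapest cost of an s->v path using i edges.
    best = [dict() for _ in range(n)]
    best[s][0] = (0, None)
    for v in range(s + 1, e + 1):           # vertices in topological order
        for (u, c) in edges_in[v]:
            if c <= 0 or u < s:
                continue                     # weight-0 edges were deleted above
            for i, (cost, _) in best[u].items():
                nc = cost + c
                if nc >= d:                  # stay strictly below the budget d
                    continue
                if i + 1 not in best[v] or nc < best[v][i + 1][0]:
                    best[v][i + 1] = (nc, u)
    if not best[e]:
        return None                          # no admissible path
    i = max(best[e])                         # maximum number of edges
    path, v = [], e
    while v is not None:                     # walk predecessor pointers back
        path.append(v)
        v = best[v][i][1]
        i -= 1
    return path[::-1]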

Using a minimum spanning tree algorithm

Suppose I have a weighted non-directed graph G = (V,E). Each vertex has a list of elements.
We start at a root vertex and start looking for all occurrences of elements with a value x. We wish to travel the least amount of distance (in terms of edge weight) to uncover all occurrences of elements with value x.
The way I think of it, an MST will contain all vertices (and hence all vertices that satisfy our condition). Therefore the algorithm to uncover all occurrences can just be done by finding the shortest path from the root to all other vertices (this will be done on the MST, of course).
Edit:
As Louis pointed out, the MST will not work in all cases if the root is chosen arbitrarily. However, to make things clear, the root is part of the input and therefore there will be one and only one MST possible (given that the edges have distinct weights). This spanning tree will indeed have all minimum-cost paths to all other vertices in the graph starting from the root.
I don't think this will work. Consider the following example:
x
|
3
|
y--3--root
|     /
5    3
|   /
|  /
x
The minimum spanning tree contains all three edges with weight 3, but this is clearly not the optimum solution.
If I understand the problem correctly, you want to find the minimum-weight tree in the graph which includes all vertices labeled x. (That is, the correct answer would have total weight 8, and would be the two edges drawn vertically in this drawing.) But this does not include your arbitrarily selected root at all.
I am pretty confident that the following variation on Prim's algorithm would work. Not sure if it's optimal, though.
Let's say the label we are looking for is called L.
Use an all-pairs shortest path algorithm to compute d(v, w) for all v, w.
Pick some node labeled L; call this the root. (We can be sure that this will be in the result tree, since we are including all nodes labeled L.)
Initialize a priority queue with the root initialized to 0. (The priority queue will consist of vertices labeled L, and their minimum distance from any node in the tree, including vertices not labeled L.)
While the priority queue is nonempty, do the following:
Pick out the top vertex in the queue; call it v, and its distance from the tree d.
For each vertex w on the path from v to the tree, v inclusive, find the nearest L-labeled node x to w, and add x to the priority queue, or update its priority. Add w to the tree.
The answer is no, if I'm understanding correctly. The minimum spanning tree will contain all vertices V, but you only want to reach the vertices with value x. Thus, your MST result may have unneeded vertices adding extra path length and therefore be sub-optimal.
An example has been given where the MST M1 from Root differs from an MST M2 containing all x nodes but not containing Root.
Here's an example where Root is in both MST's: Let graph G contain nodes R,S,T,U,V (R=Root), and a clockwise path R-S-T-U-V-R, with edge weights 1,1,3,2,2 going clockwise, and x at R, S, T, U. The first MST, M1, will have subtrees S-T and V-U below R, with cost 6 = 2+4, and cost-3 edge T-U not included in M1. But M2 has subtree S-T-U (only) below R, at cost 5.
Negative. If the idea is to find for every node that contains 'x' a separate path from root to it, and minimize the total cost of the paths, then you can just use simple shortest-path calculation separately for every node starting from the root, and put the paths together.
Some of those shortest paths will not be in the minimum spanning tree, so if this is your goal, the MST solution does not work. MST optimizes the cost of the tree, not the sum of costs of paths from root to the nodes.
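A minimal sketch of this "separate shortest paths" reading (names are mine: adj maps each vertex to (neighbour, weight) pairs and has_x tells whether a vertex's list contains x): run Dijkstra once from the root; the parent pointers then give each individual path, and the total cost is the sum of the distances to the x-vertices.

import heapq

def sum_of_paths_to_x(adj, root, has_x):
    # One Dijkstra run from the root; parent[] lets you read off each path.
    dist, parent = {root: 0}, {root: None}
    pq = [(0, root)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], parent[v] = nd, u
                heapq.heappush(pq, (nd, v))
    # Total cost of the separate root-to-x paths; shared edges are paid
    # once per path, which is what "sum of the costs of the paths" means here.
    return sum(dv for v, dv in dist.items() if has_x(v))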
If your idea is to find one path that starts from root and traverses through all nodes that contain 'x', then this is the traveling salesman problem and it is an NP-complete optimization problem, i.e. very hard.
