Does a subsequence in a tree certificate guarantee that it contains the given tree? - algorithm

I am using an algorithm for tree certificates described for example here (p. 24-29).
Let's say I have two trees, A and B, and each tree has its certificate produced by the algorithm above (C1 and C2).
Is it true that if C1 contains C2 (as an exact sequence anywhere), then A contains B as a subtree (B can basically be contracted and considered as a leaf node of A)? If not, could you give a counter-example?
--edit--
Algorithm (please take a look at the linked document for examples):
Label all vertices with the string 01.
While there are more than 2 vertices in G:
    For each non-leaf x do:
        Let Y be the set of labels of the leaves adjacent to x, together with the label of x with its initial 0 and trailing 1 deleted.
        Replace the label of x with the concatenation of the labels in Y, sorted in increasing lexicographic order, with a 0 prepended and a 1 appended.
        Remove all leaves adjacent to x.
If there is only one vertex x left, report x's label as the certificate.
If there are 2 vertices x and y left, concatenate the labels of x and y in increasing lexicographic order, and report the result as the certificate.
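The steps above can be sketched in Python roughly as follows. This is only one reading of the description, not the linked document's code: the function name `tree_certificate`, the edge-list input format and the optional `vertices` argument are assumptions made for illustration, and leaves are determined once at the start of each round.

```python
def tree_certificate(edges, vertices=None):
    """Certificate of an undirected tree given as a list of (a, b) edges.

    `vertices` (hypothetical parameter) lets the caller name isolated
    vertices, e.g. for the one-vertex tree.
    """
    adj = {}
    for v in (vertices or []):
        adj.setdefault(v, set())
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    label = {v: "01" for v in adj}
    while len(adj) > 2:
        # Leaves are fixed for the whole round.
        leaves = {v for v in adj if len(adj[v]) == 1}
        for x in adj:
            if x in leaves:
                continue
            # Labels of adjacent leaves plus x's own label without
            # its leading 0 and trailing 1.
            parts = [label[y] for y in adj[x] if y in leaves]
            parts.append(label[x][1:-1])
            label[x] = "0" + "".join(sorted(parts)) + "1"
        # Remove this round's leaves from the tree.
        for leaf in leaves:
            for nb in adj[leaf]:
                adj[nb].discard(leaf)
            del adj[leaf]
    # One vertex left: its label. Two left: both labels, sorted.
    return "".join(sorted(label[v] for v in adj))
```

For example, two differently named paths on four vertices receive the same certificate, which is what makes the string usable as an isomorphism check.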

Yes, it is true.
Assuming the certificate is correct, a certificate cannot contain another certificate unless the tree of the contained certificate is a subtree of the other tree.

Related

Deletion of the leaf node from B-tree

What are the rules for deleting keys from the leaf nodes of a B-tree? I have given an example below. I need to delete the keys J, K and U from leaf nodes. The 't' of the B-tree is 3, so the minimum number of keys in a node should be 2.
J can be deleted without any issue.
But when J is deleted, the remaining keys would be K, L. Next, when deleting K, since the node contains only 2 keys, K cannot be deleted directly.
Since its sibling node, which is N, O, also contains the minimum number of keys, what should I do here? Is it a merge?
How can I delete K and also U?
Please help.
I referred to the book Introduction to Algorithms, 3rd edition, by Thomas H. Cormen et al., which explains this very well.
Here are the 3 steps that cover all the cases. Hope it helps.
1. If the key k is in node x and x is a leaf, delete the key k from x.
2. If the key k is in node x and x is an internal node, do the following:
   a. If the child y that precedes k in node x has at least t keys, then find the predecessor k' of k in the subtree rooted at y. Recursively delete k', and replace k by k' in x. (We can find k' and delete it in a single downward pass.)
   b. If y has fewer than t keys, then, symmetrically, examine the child z that follows k in node x. If z has at least t keys, then find the successor k' of k in the subtree rooted at z. Recursively delete k', and replace k by k' in x. (We can find k' and delete it in a single downward pass.)
   c. Otherwise, if both y and z have only t-1 keys, merge k and all of z into y, so that x loses both k and the pointer to z, and y now contains 2t-1 keys. Then free z and recursively delete k from y.
3. If the key k is not present in internal node x, determine the root x.ci of the appropriate subtree that must contain k, if k is in the tree at all. If x.ci has only t-1 keys, execute step 3a or 3b as necessary to guarantee that we descend to a node containing at least t keys. Then finish by recursing on the appropriate child of x.
   a. If x.ci has only t-1 keys but has an immediate sibling with at least t keys, give x.ci an extra key by moving a key from x down into x.ci, moving a key from x.ci's immediate left or right sibling up into x, and moving the appropriate child pointer from the sibling into x.ci.
   b. If x.ci and both of x.ci's immediate siblings have t-1 keys, merge x.ci with one sibling, which involves moving a key from x down into the new merged node to become the median key for that node.
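Steps 3a and 3b are the ones that matter for the K deletion in the question, so here is a minimal Python sketch of just that part. The `Node` class and the `fill_child` helper are hypothetical names, not taken from the book. Since the original figure is not reproduced here, the parent keys G and M and the leaf D, E in the example are made-up placeholders; only the leaves K, L and N, O and t = 3 come from the question.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = children or []  # an empty list means a leaf


def fill_child(parent, i, t):
    """Ensure parent.children[i] has at least t keys (steps 3a/3b).

    Returns the index of the child to descend into afterwards.
    """
    child = parent.children[i]
    if len(child.keys) >= t:
        return i
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None

    if left is not None and len(left.keys) >= t:
        # 3a: rotate a key from the left sibling through the parent.
        child.keys.insert(0, parent.keys[i - 1])
        parent.keys[i - 1] = left.keys.pop()
        if left.children:
            child.children.insert(0, left.children.pop())
        return i
    if right is not None and len(right.keys) >= t:
        # 3a: rotate a key from the right sibling through the parent.
        child.keys.append(parent.keys[i])
        parent.keys[i] = right.keys.pop(0)
        if right.children:
            child.children.append(right.children.pop(0))
        return i

    # 3b: both siblings are minimal, so merge around a parent key.
    j = i if right is not None else i - 1
    a, b = parent.children[j], parent.children[j + 1]
    a.keys += [parent.keys.pop(j)] + b.keys
    a.children += b.children
    del parent.children[j + 1]
    return j


# Hypothetical fragment: after J was deleted the leaf holds K, L.
root = Node(["G", "M"], [Node("DE"), Node("KL"), Node("NO")])
i = fill_child(root, 1, 3)          # merges K, L + M + N, O
root.children[i].keys.remove("K")   # step 1: plain leaf deletion
```

Under these assumed keys the answer to "is it a merge?" is yes: the leaf becomes K, L, M, N, O, the parent loses M, and K can then be removed directly. A full implementation would also have to shrink the tree when the root ends up with no keys.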

Encoding directed graph as numbers

Let's say that I have a directed graph, with a single root and without cycles. I would like to add a type on each node (for example as an integer with some custom ordering) with the following property:
if Node1.type <= Node2.type then there exists a path from Node1 to Node2
Note that topological sorting actually satisfies the reversed property:
if there exists a path from Node1 to Node2 then Node1.type <= Node2.type
so it cannot be used here.
Now note that integers with natural ordering cannot be used here because every 2 integers can be compared, i.e. the ordering of integers is linear while the tree does not have to be.
So here's an example. Assume that the graph has 4 nodes A, B, C, D and 4 arrows:
A->B, A->C, B->D, C->D
So it's a diamond. Now we can put
A.type = 00
B.type = 01
C.type = 10
D.type = 11
where on the right side we have integers in binary format. The comparison is defined bitwise:
(X <= Y) if and only if (n-th bit of X <= n-th bit of Y for all n)
So I guess such ordering could be used, the question is how to construct values from a given graph? I'm not even sure if the solution always exists. Any hints?
UPDATE: Since there is some misunderstanding about the terminology I'm using, let me be more explicit: I'm interested in a directed acyclic graph such that there is exactly one node without predecessors (a.k.a. the root) and there's at most one arrow between any two nodes. The diamond would be an example. It does not have to have one leaf (i.e. a node without successors). Each node might have multiple predecessors and multiple successors. You might say that this is a partially ordered set with a smallest element (i.e. a unique globally minimal element).
You call the relation <=, but it's not necessarily total (that is: it may be that for a given pair a and b, neither a <= b nor b <= a).
Here's one idea for how to define it.
If your nodes are numbered 0, 1..., N-1, then you can define type like this:
type(i) = (1 << i) + sum(1 << (N + j), for j such that Path(i, j))
And define <= like this:
type1 <= type2 if ((type1 >> N) & type2) != 0
That is, type(i) encodes the value of i in the lowest N bits, and the set of all reachable nodes in the highest N bits. The <= relation looks for the target node in the encoded set of reachable nodes.
This definition works whether or not there are cycles in the graph, and in fact just encodes an arbitrary relation on your set of nodes.
You could make the definition a little more efficient by using ceil(log2(N)) bits to encode the node number (for a total of N + ceil(log2(N)) bits per type).
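A small Python sketch of this encoding, tried on the diamond from the question (A, B, C, D numbered 0 to 3). The helper names `encode_types` and `leq` are made up for illustration, and the sketch assumes that Path(i, i) counts as true, which the answer does not state.

```python
def encode_types(n, edges):
    """type(i) = (1 << i) + sum(1 << (n + j)) over all j reachable from i."""
    succ = [[] for _ in range(n)]
    for a, b in edges:
        succ[a].append(b)

    types = []
    for i in range(n):
        # Depth-first search; i itself is included (assumed reflexive).
        seen, stack = {i}, [i]
        while stack:
            u = stack.pop()
            for w in succ[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        types.append((1 << i) + sum(1 << (n + j) for j in seen))
    return types


def leq(type1, type2, n):
    """True if the node encoded in type2 is in type1's reachable set."""
    return ((type1 >> n) & type2) != 0


# Diamond: A->B, A->C, B->D, C->D with A=0, B=1, C=2, D=3.
types = encode_types(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
```

With these values `leq(types[0], types[3], 4)` holds (A reaches D), while neither `leq(types[1], types[2], 4)` nor `leq(types[2], types[1], 4)` does, so B and C are incomparable as expected.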
For any DAG, you can define x <= y as "there's a path from x to y". This relation is a partial order. I take it that the question is how to represent this relation efficiently.
For each vertex X, define ¡X to be the set of vertices reachable from X (including X itself). The two statements
¡X is a subset of ¡Y
X is reachable from Y
are equivalent.
Encode these sets as bitsets (N-bit binary numbers), and you are set.
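As a rough illustration of the bitset idea, here is a Python sketch that stores each reachable set as an integer and answers queries with a subset test. The function names are hypothetical, vertices are assumed to be numbered 0 to n-1, and the recursion assumes the graph really is acyclic.

```python
def reach_bitsets(n, edges):
    """bitsets[v] has bit w set iff w is reachable from v (v included)."""
    succ = [[] for _ in range(n)]
    for a, b in edges:
        succ[a].append(b)

    memo = {}

    def bits(v):
        # Memoised: each vertex is expanded once (DAG assumed).
        if v not in memo:
            b = 1 << v
            for w in succ[v]:
                b |= bits(w)
            memo[v] = b
        return memo[v]

    return [bits(v) for v in range(n)]


def is_reachable(bitsets, x, y):
    """True if x is reachable from y, tested as a subset relation."""
    return (bitsets[x] & bitsets[y]) == bitsets[x]


# Diamond: A->B, A->C, B->D, C->D with A=0, B=1, C=2, D=3.
diamond = reach_bitsets(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
```

Here `is_reachable(diamond, 3, 0)` is true, and `is_reachable(diamond, 1, 2)` is false because B's set {B, D} is not contained in C's set {C, D}.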
The question said (and continues to say) that the input is a tree, but a later edit contradicted this with an example of a diamond graph. In such non-tree cases, my algorithm below won't apply.
The existing answers work for general relations on general directed graphs, which inflates their representation sizes to O(n) bits for n vertices. Since you have a tree, a shorter O(log n)-bit representation is possible.
In a tree directed away from the root, for any two vertices u and v, the sets of leaves L(u) and L(v) reachable from u and v, respectively, must either be disjoint, or one must be a subset of the other. If they are disjoint, then u is not reachable from v (and vice versa); if one is a proper subset of the other, the one with the smaller set is reachable from the other (and in this case, the one with the smaller set will necessarily have strictly greater depth). If L(u) = L(v), then u is reachable from v if and only if depth(v) < depth(u), where depth(u) is the number of edges on the path from the root to u. (In particular, if L(u) = L(v) and depth(u) = depth(v), then u = v.)
We can encode this relationship concisely by noticing that all leaves reachable from a given vertex v occupy a contiguous segment of the leaves output by an inorder traversal of the tree. For any given vertex v, this set of leaves can therefore be represented by a pair of integers (first, last), with first identifying the first leaf (in inorder traversal order) and last the last. The test for whether a path exists from i to j is then very simple -- in pseudo-C++:
bool doesPathExist(int i, int j) {
    return x[i].first <= x[j].first && x[i].last >= x[j].last && depth[i] <= depth[j];
}
Note that if every non-leaf vertex in the tree has at least 2 children, then you don't need to bother with depths, since L(u) = L(v) implies u = v in this case. (My original version of the post made this assumption; I've now fixed it to work even when this is not the case.)
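To make the labelling concrete, here is a Python sketch that computes the (first, last) leaf interval and the depth of every vertex with one depth-first traversal and then applies the same test. The input format (a dict from vertex to its list of children) and the names `label_tree` and `path_exists` are assumptions for illustration.

```python
def label_tree(children, root):
    """Return (first, last, depth) dicts for a tree rooted at `root`.

    `children` maps a vertex to its list of children; leaves may be
    missing from the dict.
    """
    first, last, depth = {}, {}, {}
    next_leaf = 0

    def visit(v, d):
        nonlocal next_leaf
        depth[v] = d
        kids = children.get(v, [])
        if not kids:
            # Leaves are numbered in traversal order.
            first[v] = last[v] = next_leaf
            next_leaf += 1
            return
        for c in kids:
            visit(c, d + 1)
        first[v] = first[kids[0]]
        last[v] = last[kids[-1]]

    visit(root, 0)
    return first, last, depth


def path_exists(labels, i, j):
    first, last, depth = labels
    return first[i] <= first[j] and last[i] >= last[j] and depth[i] <= depth[j]


# r has children a and b; a has the single child c; c has leaves d, e.
labels = label_tree({"r": ["a", "b"], "a": ["c"], "c": ["d", "e"]}, "r")
```

In this example a and c cover the same leaves, so only the depth comparison separates `path_exists(labels, "a", "c")` (true) from `path_exists(labels, "c", "a")` (false). The recursive traversal is fine for a sketch; a very deep tree would need an iterative version.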

Finding the biggest subset of elements that does not correlate

I have a set of integers and I want to find the largest subset in which the elements do not correlate with each other in a specific way. For example, a subset in which, if any of the elements is multiplied by 13, the result is not in the subset.
My first thought is to iterate through all the possible subsets, filter out those that don't meet the condition and then find the largest one, but this is too slow and I don't know how to generate all possible subsets.
I'll be answering this one (from the comments). In general there's no good solution for an arbitrary "correlation".
The relationship is the following: if you multiply any element of the subset by some number, the resulting number must not be in the subset.
Let that number be m.
Generate all chains x, x*m, x*m*m, ..., such that all numbers in the chain are in the set and x/m is not.
Remove every second element of each chain, i.e. x*m, x*m^3, ..., from the original set. The elements left are your target set.
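A minimal Python sketch of the chain idea, assuming positive integers and m >= 2 (the function name `largest_subset` is made up). It walks each chain from its first element and keeps every other element starting with that first one, so x, x*m^2, x*m^4, ... stay and x*m, x*m^3, ... are dropped.

```python
def largest_subset(numbers, m):
    """Largest subset with no pair (y, y*m); assumes positive ints, m >= 2."""
    source = set(numbers)
    keep = set()
    for x in source:
        # x starts a chain only if x/m is not in the source set.
        if x % m == 0 and x // m in source:
            continue
        take = True
        y = x
        while y in source:
            if take:
                keep.add(y)
            take = not take  # alternate along the chain
            y *= m
    return keep


result = largest_subset([1, 13, 169, 2, 26, 5], 13)
```

For this input the chains are 1, 13, 169 and 2, 26 and 5, and the result is {1, 2, 5, 169}. A chain of length n contributes ceil(n/2) elements, which is the most any valid subset can take from it.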
A better way is to build a graph, find the vertices with the most edges, and remove them until you get rid of all edges. The complexity is about O(N^2).
Here is a detailed algorithm:
for each possible pair (x, y) from the source set
begin
    if x = y * 13 or y = x * 13 then make edge between x and y
end
while graph has edges
begin
    let V = find: a vertex with maximum count of edges (it can be 1 or 2)
    remove V from the graph
end
result: the remaining vertexes in the graph

Topological Sorting of a directed acyclic graph

How would you output all the possible topological sorts for a directed acyclic graph? For example, given a graph where V points to W and X, W points to Y and Z, and X points to Z:
V --> W --> Y
W --> Z
V --> X --> Z
How do you topologically sort this graph to produce all possible results? I was able to use a breadth-first-search to get V, W, X, Y, Z and a depth-first search to get V, W, Y, Z, X. But wasn't able to output any other sorts.
An algorithm for generating all topological sorts for a given DAG (aka generating all linear extensions of a partial order) is given in the paper "Generating Linear Extensions Fast" by Pruesse and Ruskey. The algorithm has an amortized running time that is linear in the output (e.g.: if it outputs M topological sorts, it runs in time O(M)).
Note that in general you can't really have anything that has a runtime that's efficient with respect to the size of the input since the size of the output can be exponentially larger than the input. For example, a completely disconnected DAG of N nodes has N! possible topological sorts.
It might be possible to count the number of orderings faster, but the only way to actually generate all orderings that I can think of is with a full brute-force recursion. (I say "brute force", but this is still much better than the brutest-possible brute force approach of testing every possible permutation :) )
Basically, at every step there is a set S of vertices remaining (i.e. which have not been added to the order yet), and a subset X of these can be safely added in the next step. This subset X is exactly the set of vertices that have no in-edges from vertices in S.
For a given partial solution L consisting of some number of vertices that are already in the order, the set S of remaining vertices, and the set X of vertices in S that have no in-edges from other vertices in S, the call Generate(L, X, S) will generate all valid topological orders beginning with L.
Generate(L, X, S):
    If X is empty:
        Either L is already a complete solution, in which case it contains all n vertices and S is also empty, or the original graph contains a cycle.
        If S is empty:
            Output L as a solution.
        Otherwise:
            Report that a cycle exists. (In fact, all vertices in S participate in some cycle, though there may be more than one.)
    Otherwise:
        For each x in X:
            Let L' be L with x added to the end.
            Let X' be X\{x} plus any vertices whose only in-edge among vertices in S came from x.
            Let S' = S\{x}.
            Generate(L', X', S')
To kick things off, find the set X of all vertices having no in-edges and call Generate((), X, V). Because every x chosen in the "For each" loop is different, every partial solution L' generated by the iterations of this loop must also be distinct, so no solution is generated more than once by any call to Generate(), including the top-level call.
In practice, forming X' can be done more efficiently than the above pseudocode suggests: When we choose x, we can delete all out-edges from x, but also add them to a temporary list of edges, and by tracking the total number of in-edges for each vertex (e.g. in an array indexed by vertex number) we can efficiently detect which vertices now have 0 in-edges and should thus be added to X'. Then at the end of the loop iteration, all the edges that we deleted can be restored from the temporary list.
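Here is one way the recursion could look in Python. It is a sketch rather than a literal transcription: instead of passing X and S around, it keeps an in-degree count per vertex, recomputes the ready set at every step (simpler, but slower than the bookkeeping described above), and undoes its changes when backtracking. The function name and input format are assumptions.

```python
def all_topological_sorts(vertices, edges):
    """Return every topological order of the DAG as a list of lists."""
    succ = {v: [] for v in vertices}
    indeg = {v: 0 for v in vertices}
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1

    order, placed, results = [], set(), []

    def generate():
        if len(order) == len(vertices):
            results.append(list(order))
            return
        # Vertices with no in-edges from the remaining vertices.
        ready = [v for v in vertices if v not in placed and indeg[v] == 0]
        if not ready:
            raise ValueError("graph contains a cycle")
        for x in ready:
            order.append(x)
            placed.add(x)
            for y in succ[x]:
                indeg[y] -= 1  # "delete" x's out-edges
            generate()
            for y in succ[x]:
                indeg[y] += 1  # restore them when backtracking
            placed.remove(x)
            order.pop()

    generate()
    return results


sorts = all_topological_sorts(
    "VWXYZ", [("V", "W"), ("V", "X"), ("W", "Y"), ("W", "Z"), ("X", "Z")]
)
```

For the graph in the question this yields five orders: VWXYZ, VWXZY, VWYXZ, VXWYZ and VXWZY.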
So this approach is flawed! I'm unsure whether it can be salvaged, so I'll leave it up for a little while. If anyone can spot how to fix it, either grab what you can and post a new answer, or edit mine.
Specifically, I ran the algorithm below on the example from the comment and it does not output the example given, so it is clearly flawed.
The way I've learned to do a topological sort is the following:
Create a list of all the elements with no arrows pointing into them
Create a dictionary of element -> number, where element here is any element in the original collection that has an arrow into it, and the number is how many elements point to it.
Create a dictionary of element -> list, where element here is any element in the original collection that has an arrow out of it, and the list is all the elements those arrows point to
In your example, the two dictionaries and the list would be like this:
D1      D2          List
W: 1    V: W, X     V
Y: 1    W: Y, Z
Z: 2    X: Z
X: 1
Then, start a loop where on each iteration you do the following:
Output all elements of the list, these currently have no arrows pointing into them. Make a temporary copy of the list, and clear the list, preparing it for the following iteration
Loop through the temporary copy, and find each element (if it exists) in the dictionary that is element -> list
For each element in those lists, decrement the corresponding number in the element -> number dictionary by 1 (removing 1 arrow). Once a number for an element here reaches 0, add that element to the list (it has no arrows left)
If the list is non-empty, redo the iteration loop
If you reach this point, and the dictionary with element -> number still has any elements left in it with a number above 0 (if you want to, you can remove the elements as you go in the above iteration once their numbers reach zero to make this part easier), then you have a cycle, since the above loop should not terminate until all arrows have been removed.
For your example, each iteration would output the following:
V
W, X (2nd iteration output both W and X)
Y, Z
If you want to know how I arrived at this solution, simply go through my iteration description step by step using the above dictionaries and list as the starting point.
Now, to specifically answer your question, how to output all combinations. The only places where "combinations" comes into play is per iteration. Basically, all the elements that you output in the first step of the iteration (the ones you made a temporary copy of) are considered "equivalent" and any internal ordering between these would have no impact on the topological sort.
So, do this:
In the first point in the iteration, place those elements into a list, and add that to another list, giving you a list of lists
This lists of lists will now contain each iteration as one element, and one element will be yet another list with the elements output in that iteration
Now, combine all permutations of the first list with all the permutations of the second list with all the permutations of the third list, and so on
This means taking this output:
V
W, X
Y, Z
This gives you 1 * 2 * 2 = 4 permutations in total: you combine all permutations of the 1st iteration (1 of them) with all permutations of the 2nd iteration (2 of them: W, X and X, W) and all permutations of the 3rd iteration (2 of them: Y, Z and Z, Y).
The final list of permutations that are valid topological sorts would be this:
V, W, X, Y, Z
V, X, W, Y, Z
V, W, X, Z, Y
V, X, W, Z, Y
Here is the example from the comment:
A and B with no in-edges. Both A and B have an edge to C, but only A has an edge to D. Neither C nor D has any out-edges.
Which gives:
A --> C
A --> D
B --> C
Dictionaries and list:
D1      D2          List
C: 2    A: C, D     A
D: 1    B: C        B
Iterations would output:
A, B
D, C
All permutations (2 * 2 = 4):
A, B, D, C
A, B, C, D
B, A, D, C
B, A, C, D
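The layer-by-layer procedure is easy to reproduce in Python, which also makes the gap visible. The sketch below (function names are made up) builds the per-iteration lists and combines their permutations. For the A, B, C, D example it returns exactly the four orders listed above and never produces A, D, B, C, which is also a valid topological order because D depends only on A.

```python
from itertools import permutations, product


def kahn_layers(vertices, edges):
    """Group vertices by the iteration in which their in-degree hits 0."""
    succ = {v: [] for v in vertices}
    indeg = {v: 0 for v in vertices}
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1

    layer = [v for v in vertices if indeg[v] == 0]
    layers = []
    while layer:
        layers.append(layer)
        nxt = []
        for v in layer:
            for w in succ[v]:
                indeg[w] -= 1
                if indeg[w] == 0:
                    nxt.append(w)
        layer = nxt
    if sum(len(l) for l in layers) != len(vertices):
        raise ValueError("graph contains a cycle")
    return layers


def layered_orders(layers):
    """All concatenations of one permutation per layer."""
    combos = product(*(permutations(l) for l in layers))
    return [[v for part in combo for v in part] for combo in combos]


layers = kahn_layers("ABCD", [("A", "C"), ("A", "D"), ("B", "C")])
orders = layered_orders(layers)
```

Every order produced this way is valid, but orders that interleave vertices from different layers are missed.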

An algorithm to get all connected subgraphs from graph, is it correct?

I am trying to find a quick algorithm to obtain all connected subgraphs of an undirected graph, with the subgraph size restricted. Simple methods, such as BFS or DFS from every vertex, generate a huge number of equal subgraphs, so in every iteration of the algorithm we have to prune the set of subgraphs. I found the following algorithm on a Russian mathematical forum:
Procedure F(X,Y)
//X: set of included vertices
//Y: set of vertices forbidden when constructing a new subgraph
1. if |X|=k, then return;
2. construct the set T[X] of vertices that are adjacent to vertices from X (if X is the empty set, then T[X]=V), but do not belong to the sets X, Y;
3. Y1=Y;
4. Foreach v from T[X] do:
    4.1. X1=X+v;
    4.2. show subgraph X1;
    4.3. F(X1,Y1);
    4.4. Y1=Y1+v;
Initial call F(X,Y):
X, Y = empty set;
F(X,Y);
The main idea of this algorithm is to use a "forbidden set" so that no pruning is required. The author of the algorithm said that it is 300 times faster than a solution based on pruning equal subgraphs. But I haven't found any proof that this algorithm is correct at all.
UPDATE:
A more efficient solution was found here
Here is a Python implementation of what I believe to be your original algorithm:
from collections import defaultdict

# Adjacency lists of an undirected graph
D = defaultdict(list)

def addedge(a, b):
    D[a].append(b)
    D[b].append(a)

addedge(1, 2)
addedge(2, 3)
addedge(3, 4)
V = list(D)  # all vertices
k = 2        # maximum subgraph size

def F(X, Y):
    # X: vertices already included, Y: forbidden vertices
    if len(X) == k:
        return
    if X:
        T = set(a for x in X for a in D[x] if a not in Y and a not in X)
    else:
        T = V
    Y1 = set(Y)
    for v in T:
        X.add(v)
        print(X)
        F(X, Y1)
        X.remove(v)
        Y1.add(v)

print('original method')
F(set(), set())
F generates all connected subgraphs of size <=k where the subgraph must include vertices in X (a connected subgraph itself), and must not include vertices in Y.
We know that to include another vertex in the subgraph we must use a connected vertex, so we can recurse based on the identity of the first connected vertex v that is inside the final subgraph. The forbidden set ensures that a second copy of a subgraph cannot be generated, as this copy would have to use v, but v is in the forbidden set and so cannot be used again.
So at this superficial level of analysis, this algorithm appears efficient and correct.
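As a sanity check rather than a proof, the same recursion can be compared against a brute-force enumeration on small graphs. The sketch below is an assumption-laden variant: it collects the vertex sets in a list instead of printing them, takes the adjacency dict and k as arguments, and uses made-up helper names.

```python
from itertools import combinations


def connected_subsets(adj, k):
    """Vertex sets produced by F, collected instead of printed."""
    out = []

    def F(X, Y):
        if len(X) == k:
            return
        if X:
            T = {a for x in X for a in adj[x] if a not in X and a not in Y}
        else:
            T = set(adj)
        Y1 = set(Y)
        for v in sorted(T):
            out.append(frozenset(X | {v}))
            F(X | {v}, Y1)
            Y1.add(v)

    F(frozenset(), frozenset())
    return out


def is_connected(adj, subset):
    subset = set(subset)
    start = next(iter(subset))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in subset and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == subset


def brute_force(adj, k):
    """Every connected vertex set of size 1..k, by exhaustive search."""
    return {frozenset(c)
            for r in range(1, k + 1)
            for c in combinations(adj, r)
            if is_connected(adj, c)}


# Triangle 1-2-3 with a pendant vertex 4 attached to 3.
graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
found = connected_subsets(graph, 3)
```

On this graph the recursion yields each connected vertex set of size at most 3 exactly once, matching the brute-force result. That agrees with the argument above but, of course, only for the graphs actually tried.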
You did not describe the algorithm well. We don't know what k is or what V is in this algorithm. I just assume k is the size limit on the subgraph and V is some root vertex.
If that is true, then it looks to me that this algorithm is incorrect. Suppose we have a graph with only two connected vertices v1, v2, and the size limit on the subgraph is k = 1.
In the first iteration: X, Y = empty, T(X) = {v1}, X1 = {v1}, Y1 = empty, and we show X1.
Then we recursively call F(X1, Y1), and it returns immediately because |X| = |{v1}| = 1.
Back in the first iteration, now Y1 = {v1}. The loop ends and the initial call also ends here. So we print only X1, whereas we are supposed to print X1 and X2.
By the way, do not "test" an algorithm: there is no way to test it exhaustively (the number of possible test cases is infinite). You should formally prove it instead.
