Determine if u is an ancestor of v - binary-tree

Here's the problem, and my attempted solution.
My solution:
1. Run a topological sort on the tree, which runs in linear time BigTheta(E+V) where E is the number of edges and V the number of vertices. This puts the vertices in a linked list, which also takes constant time.
2. A vertex u would be an ancestor if it has a higher finishing time than vertex v.
3. Look at the 2 vertices in the linked list, compare their finishing times, and return true or false depending on the result from step 2.
Does this sound correct, or am I missing something?

I don't think your understanding of what "constant time" means is quite correct. "...time BigTheta(E+V) where E is the number of edges and V the number of vertices" is linear time, not constant time.
Granted, you are allowed to take linear time for the pre-processing, so that's OK, but how are you going to do your step 3 ("Look at the 2 vertices in the linked list") in constant time? Finding a given vertex in a linked list takes linear time in general.

Here is an approach that will work for any tree (not only binary). The pre-processing step is to perform an Euler Tour of the tree (this is just a DFS traversal) and create a list out of this tour. When you visit a node for the first time you append it to the list and when you visit it the last time you append it to the list.
Example:
  x
 / \
y   z
The list will look like: [b(x), b(y), e(y), b(z), e(z), e(x)]. Here b(x) means enter x and e(x) means leave x. Once you have this list, you can answer the query "is x an ancestor of y?" by testing whether b(x) comes before b(y) and e(y) comes before e(x) in the list.
The question is how can you do this in constant time?
For static trees (which is the case for you), you can use a lookup table (aka array), indexed by node, that stores the list positions of b and e for each node. The test is then two array lookups and two comparisons, which takes constant time. So this solves your problem.
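A minimal sketch of the Euler-tour idea in Python, in case it helps. The names `euler_times` and `is_ancestor`, and the representation of the tree as a dict from node to list of children, are assumptions made for illustration; with nodes numbered 0..n-1 the two dicts could just as well be plain arrays.

```python
def euler_times(children, root):
    """Record the position of b(node) and e(node) in the Euler tour.

    `children` maps a node to the list of its children (assumed layout).
    """
    enter, leave = {}, {}
    clock = 0
    stack = [(root, False)]          # (node, are we leaving it?)
    while stack:
        node, leaving = stack.pop()
        if leaving:
            leave[node] = clock      # position of e(node) in the list
        else:
            enter[node] = clock      # position of b(node) in the list
            stack.append((node, True))
            for child in reversed(children.get(node, [])):
                stack.append((child, False))
        clock += 1
    return enter, leave


def is_ancestor(u, v, enter, leave):
    """Constant-time test: b(u) before b(v) and e(v) before e(u)."""
    return enter[u] < enter[v] and leave[v] < leave[u]


tree = {"x": ["y", "z"]}
enter, leave = euler_times(tree, "x")
print(is_ancestor("x", "y", enter, leave))  # True
print(is_ancestor("y", "z", enter, leave))  # False
```

The pre-processing is one linear DFS; each query afterwards is two lookups per node. With the strict comparisons used here a node is not counted as its own ancestor.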

Related

Number of walks from source to sink with exactly h hops

Given an undirected graph, a starting vertex and an ending vertex, find the number of walks (so a vertex can be visited more than once) from the source to the sink that involve exactly h hops. For example, if the graph is a triangle, the number of such walks with h hops is given by the h-th Jacobsthal number. This can be extended to a fully connected k-node graph, producing the recurrence (and closed-form solution) here.
When the graph is an n-sided polygon, the accepted answer here expresses the number of walks as a sum of binomial terms.
I assume there might be an efficient algorithm for finding this number for any given graph? We can assume the graph is provided in adjacency matrix or adjacency list or any other convenient notation.
A solution to this would be to use a modified BFS with two alternating queues and a per-node counter for paths to this node of a certain length:
paths(start, end, n):
    q = set(start)
    q_next = set()
    path_ct = map()
    path_ct_next = map()
    path_ct[start] = 1
    for i in [0, n): # counting loop
        for node in q: # queue loop
            for a in adjacent(node): # neighbor-loop
                path_ct_next[a] += path_ct[node]
                q_next.add(a)
        q = q_next
        q_next = set()
        path_ct = path_ct_next
        path_ct_next = map()
    return path_ct[end]
The basic assumption here is that map() produces a dictionary that returns zero if the entry doesn't exist yet, and otherwise returns the previously set value. The counting-loop simply takes care of doing exactly as many iterations as hops are required. The queue loop iterates over all nodes that can be reached using exactly i hops. Finally, the neighbor-loop finds all nodes that can be reached in i + 1 hops. In this loop the adjacent nodes are stored in a queue for the next iteration of the counting-loop. The number of possible walks that reach such a node is the sum of the number of walks that reach its predecessors. Once this is done for each node of the current iteration of the counting-loop, the queues and tables are swapped/replaced by empty instances and the algorithm can start over.
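The layer-by-layer counting described above could look roughly like this in Python. This is only a sketch: `count_walks` is a made-up name, the graph is assumed to be a dict from node to list of neighbors, and a `defaultdict(int)` plays the role of the zero-returning map().

```python
from collections import defaultdict


def count_walks(adjacent, start, end, hops):
    """Count walks from start to end that use exactly `hops` edges."""
    counts = {start: 1}                      # walks of length 0
    for _ in range(hops):                    # counting loop
        next_counts = defaultdict(int)
        for node, ways in counts.items():    # nodes reachable in i hops
            for neighbor in adjacent[node]:  # nodes reachable in i + 1 hops
                next_counts[neighbor] += ways
        counts = next_counts
    return counts.get(end, 0)


triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print([count_walks(triangle, 0, 1, h) for h in range(1, 6)])  # [1, 1, 3, 5, 11]
```

The keys of `counts` double as the queue, so no separate set is needed. Each iteration touches every edge at most twice, giving O(h * (V + E)) arithmetic operations overall.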
If you take the adjacency matrix of a graph and raise it to the nth power, the resulting matrix counts the number of walks from each node to each other node that use exactly n edges. That would provide one way of computing the number you want - plus many others you aren’t all that interested in. :-)
Assuming the number of walks is “small” (say, something that fits into a 64-bit integer), you could use exponentiation by squaring to compute the matrix in O(log n) multiplies for a total cost of O(|V|^ω log n), where ω is the exponent of the fastest matrix multiplication algorithm. However, if the quantity you’re looking for doesn’t fit into a machine word, then the cost of this approach will depend on how big the answer is, as the multiplies will take variable amounts of time. For most graphs and small n this won’t be an issue, but if n is large and parts of the graph are densely connected this will slow down a bit.
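Here is a small sketch of the matrix-power approach, assuming the graph is given as a square adjacency matrix of Python ints. The helper names are made up for this example. Note that it uses the schoolbook cubic multiply, so it runs in O(|V|^3 log n) multiplications rather than the O(|V|^ω log n) quoted above, and Python's arbitrary-precision ints sidestep the overflow concern at the cost of slower arithmetic on big numbers.

```python
def mat_mul(a, b):
    """Schoolbook product of two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(matrix, power):
    """Raise a square matrix to a non-negative power by repeated squaring."""
    n = len(matrix)
    result = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    base = matrix
    while power > 0:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    return result


def count_walks_matrix(adj_matrix, source, sink, hops):
    """Entry (source, sink) of A^hops is the number of walks with `hops` edges."""
    return mat_pow(adj_matrix, hops)[source][sink]


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(count_walks_matrix(triangle, 0, 1, 10))  # 341
```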
Hope this helps!
You can write an algorithm that searches all possible walks while keeping a variable that holds the number of hops remaining.
Along each walk, every hop decrements that variable. When it reaches zero, the walk is counted if it has arrived at the target, and in either case the algorithm backtracks and tries another walk. Note that reaching the target earlier is not enough, since the walk must use exactly h hops.
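For comparison, the exhaustive search with a hop counter can be sketched as a short recursion. This is an illustration only: `count_walks_dfs` is a hypothetical name, the graph is assumed to be a dict from node to list of neighbors, and the running time is exponential in the number of hops, so it is only practical for small inputs.

```python
def count_walks_dfs(adjacent, node, end, hops_left):
    """Try every walk from `node`; count those that sit on `end` at zero hops left."""
    if hops_left == 0:
        return 1 if node == end else 0
    return sum(count_walks_dfs(adjacent, nxt, end, hops_left - 1)
               for nxt in adjacent[node])


triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(count_walks_dfs(triangle, 0, 1, 4))  # 5
```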

Queries on Tree Path with Modifications

Question:
You are given a tree with n nodes (n can be up to 10^5) and n-1 bidirectional edges. Each node holds two values:
Its index (just a unique number for the node), from 1 to n.
Its value Vi, which can range from 1 to 10^8.
There will be multiple queries of the same type (up to 10^5 of them) on this same tree, as follows:
You are given node1, node2 and a value P (which can range from 1 to 10^8).
For every such query, you have to find the number of nodes on the path from node1 to node2 whose value is less than P.
NOTE: There is a unique path between every pair of nodes, and no two edges connect the same pair of nodes.
Required time complexity: O(n log n), or something else, as long as it is solvable in 1 second with the given constraints.
What I have tried:
(A). I could solve it easily if the value of P were fixed, using the LCA approach in O(n log n), by storing the following info at each node:
Number of nodes whose value is less than P, from root to given node.
But here P varies from query to query, so this will not help.
(B). The other approach I was thinking of is a simple DFS per query. But that takes O(nq), where q is the number of queries. Since n and q can both be as large as 10^5, this is too slow for the given time limit.
I could not think of anything else. Any help would be appreciated. :)
Source:
I read this problem somewhere on SPOJ, I guess, but cannot find it now. I tried searching the web but could not find a solution for it anywhere (Codeforces, CodeChef, SPOJ, StackOverflow).
Let ans(v, P) be the answer on a vertical path from the root to v and the given value of P.
How can we compute it? There's a simple offline solution: store all queries for a given node in a vector associated with it, then run a depth-first search that keeps all values on the current path from the root in a data structure that can do the following:
add a value
delete a value
count the number of elements smaller than X
Any balanced binary search tree would do. You can make it even simpler: if you know all the queries beforehand, you can compress the numbers so that they're in the [0..n - 1] range and use a binary indexed tree.
Back to the original problem: ans(v, P) + ans(u, P) - 2 * ans(LCA(u, v), P) counts every node on the path except LCA(u, v) itself, which is subtracted twice. So the answer to a (u, v, P) query is ans(v, P) + ans(u, P) - 2 * ans(LCA(u, v), P), plus 1 if the value of LCA(u, v) is less than P.
That's it. The time complexity is O((N + Q) log N).
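A rough Python sketch of this offline approach, under several assumptions that are not in the original post: the function name `count_less_on_paths`, rooting the tree at node 1, 1-based node indices with `values[0]` unused, and binary lifting for the LCA. Since the root-path counts include the LCA node itself, the sketch adds the LCA back once when its value is below P.

```python
from bisect import bisect_left


def count_less_on_paths(n, values, edges, queries):
    """For each (u, v, P), count nodes on the u-v path with value < P."""
    adj = [[] for _ in range(n + 1)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    # Root at node 1; up[k][v] is the 2^k-th ancestor of v (root maps to itself).
    LOG = max(1, n.bit_length())
    up = [[1] * (n + 1) for _ in range(LOG)]
    depth = [0] * (n + 1)
    seen = [False] * (n + 1)
    seen[1] = True
    stack = [1]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if not seen[w]:
                seen[w] = True
                up[0][w] = v
                depth[w] = depth[v] + 1
                stack.append(w)
    for k in range(1, LOG):
        for v in range(1, n + 1):
            up[k][v] = up[k - 1][up[k - 1][v]]

    def lca(u, v):
        if depth[u] < depth[v]:
            u, v = v, u
        diff = depth[u] - depth[v]
        for k in range(LOG):
            if (diff >> k) & 1:
                u = up[k][u]
        if u == v:
            return u
        for k in reversed(range(LOG)):
            if up[k][u] != up[k][v]:
                u, v = up[k][u], up[k][v]
        return up[0][u]

    # Coordinate compression plus a Fenwick (binary indexed) tree of counts.
    sorted_vals = sorted(set(values[1:]))
    m = len(sorted_vals)
    bit = [0] * (m + 1)

    def update(i, delta):
        while i <= m:
            bit[i] += delta
            i += i & -i

    def prefix(i):
        total = 0
        while i > 0:
            total += bit[i]
            i -= i & -i
        return total

    # Attach each query to u, v (coefficient +1) and to the LCA (coefficient -2).
    ans = [0] * len(queries)
    pending = [[] for _ in range(n + 1)]
    for qi, (u, v, p) in enumerate(queries):
        w = lca(u, v)
        pending[u].append((qi, p, 1))
        pending[v].append((qi, p, 1))
        pending[w].append((qi, p, -2))
        if values[w] < p:
            ans[qi] += 1  # the LCA was removed twice, add it back once

    # DFS that keeps exactly the root-to-current-node values in the Fenwick tree.
    stack = [(1, False)]
    while stack:
        v, leaving = stack.pop()
        rank = bisect_left(sorted_vals, values[v]) + 1
        if leaving:
            update(rank, -1)
            continue
        update(rank, 1)
        for qi, p, coef in pending[v]:
            ans[qi] += coef * prefix(bisect_left(sorted_vals, p))
        stack.append((v, True))
        for w in adj[v]:
            if w != up[0][v]:
                stack.append((w, False))
    return ans


values = [0, 5, 3, 8, 1, 7]
edges = [(1, 2), (1, 3), (2, 4), (2, 5)]
print(count_less_on_paths(5, values, edges, [(4, 5, 6), (4, 3, 4)]))  # [2, 2]
```

Both traversals are iterative so that deep trees (n up to 10^5) do not hit Python's recursion limit.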

Top k-best paths in HMM with k > number of hidden states

I have implemented a k-best Viterbi algorithm in order to extract k-best paths through an HMM as described here. However, I get an error in case k is greater than the number of hidden states.
Consider the following: At the first observation at time t, every k for each state j is the same (i.e. all paths to that state are the same, since it's the first observation). I then want to compute the k-best paths for a state i at time t+1. In order to do that, I extract the k-best predecessor paths at time t. However, since all paths for each state at t are the same, I end up with the same best predecessor state k times for my state i (the same applies for all states at time t+1). This effectively results in all paths being the same path (1st-best).
As suggested in the literature, I disregarded paths that have already been taken when looking for k-best predecessor states. However, that effectively leaves me with N different paths at time t, with N referring to the number of hidden states. So, choosing k to be bigger than N results in an error when looking for k-best predecessor paths at time t.
I hope the point I am trying to make got through. Obviously, I am missing something here, but I cannot figure out what.

Time cost analysis of trees

I am having trouble with the time analysis of the following algorithm on an arbitrary tree of size N.
Question is:
Consider the following algorithm,
which makes the following assumptions. x and y are the roots of two binary
trees, Tx and Ty. Left(z) is a pointer to the left child of node z in either
tree, and Right(z) points to the right child. If the node doesn't have a
left or right child, the pointer returns "NIL". Each node z also has a field
Size(z) which returns the number of nodes in the sub-tree rooted at z.
Size(NIL) is defined to be 0. The algorithm SameTree(x, y) returns a
boolean answer that says whether or not the trees rooted at x and y are
the same if you ignore the difference between left and right pointers.
Program: SameTree(x, y: Nodes): Boolean;
IF Size(x) ≠ Size(y) THEN return False; halt.
IF x = NIL THEN return True; halt.
IF (SameTree(Left(x), Left(y)) AND SameTree(Right(x), Right(y)))
OR (SameTree(Right(x), Left(y)) AND SameTree(Left(x), Right(y)))
THEN return True; halt.
Return False; halt
Give the time analysis to run the above algorithm on any arbitrary tree of size N. I got O(nlog2^3) for dense graphs and O(n) for less dense graphs. Am I right? Can someone help me determine the time costs please?
Well, let's use the Master theorem. We shall consider the worst case, where line 4 checks the condition before the OR and then also checks the condition after it on EACH recursive call.
We will also simplify by assuming the binary trees are more or less balanced (the two children of each node root subtrees with almost the same number of nodes).
You have:
T(n) = 4*T(n/2)+2.
Look at http://en.wikipedia.org/wiki/Master_theorem to understand what I will do next:
We have case 1 from the Master theorem.
The log base 2 of 4 is 2, so the answer is O(n^2). This is the analysis for the general case. If you want a more precise analysis, you need to tell us much more about how likely your tree is to be balanced or unbalanced, and how likely it is to be built in such a way that line 4 evaluates both conditions in each recursive call.
Average cases are much more complicated.
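Spelled out, the Master theorem step for the recurrence above goes as follows, writing c for the constant work per call (c = 2 in the recurrence above) and assuming the balanced worst case described earlier:

```latex
\begin{aligned}
T(n) &= 4\,T(n/2) + c, \qquad a = 4,\ b = 2,\ f(n) = c \\
n^{\log_b a} &= n^{\log_2 4} = n^2, \qquad
  f(n) = O\!\left(n^{2-\varepsilon}\right) \text{ for } \varepsilon = 1 \\
T(n) &= \Theta\!\left(n^{\log_b a}\right) = \Theta\!\left(n^2\right)
\end{aligned}
```

Since f(n) is polynomially smaller than n^2, case 1 applies and the recursion is dominated by its leaves.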

Checking if A is a part of binary tree B

Let's say I have binary trees A and B and I want to know if A is a "part" of B. I am not only talking about subtrees. What I want to know is if B has all the nodes and edges that A does.
My thought was that since a tree is essentially a graph, I could view this question as a subgraph isomorphism problem (i.e. checking to see if A is a subgraph of B). But according to Wikipedia this is an NP-complete problem.
http://en.wikipedia.org/wiki/Subgraph_isomorphism_problem
I know that you can check if A is a subtree of B or not with O(n) algorithms (e.g. using preorder and inorder traversals to flatten the trees to strings and checking for substrings). I was trying to modify this a little to see if I can also test for just "parts" as well, but to no avail. This is where I'm stuck.
Are there any other ways to view this problem other than using subgraph isomorphism? I'm thinking there must be faster methods since binary trees are much more restricted and simpler versions of graphs.
Thanks in advance!
EDIT: I realized that even a brute-force method for my question would only take O(m * n) in the worst case, which is polynomial. So I guess this isn't an NP-complete problem after all. Then my next question is: is there an algorithm that is faster than O(m * n)?
I would approach this problem in two steps:
Find the root of A in B (with either BFS or DFS)
Verify that A is contained in B (given that starting node), using a recursive algorithm, as below (I concocted some crazy pseudo-language, because you didn't specify the language. I think this should be understandable, no matter your background). Note that a is a node from A (initially the root) and b is a node from B (initially the node found in step 1)
function checkTrees(node a, node b) returns boolean
    if a does not exist or b does not exist then
        // base of the recursion
        return false
    else if a is different from b then
        // compare the current nodes
        return false
    else
        // check the children of a
        boolean leftFound = true
        boolean rightFound = true
        if a.left exists then
            // try to match the left child of a with
            // every possible neighbor of b
            leftFound = checkTrees(a.left, b.left)
                     or checkTrees(a.left, b.right)
                     or checkTrees(a.left, b.parent)
        if a.right exists then
            // try to match the right child of a with
            // every possible neighbor of b
            rightFound = checkTrees(a.right, b.left)
                      or checkTrees(a.right, b.right)
                      or checkTrees(a.right, b.parent)
        return leftFound and rightFound
About the running time: let m be the number of nodes in A and n be the number of nodes in B. The search in the first step takes O(n) time. The running time of the second step depends on one crucial assumption I made, but that might be wrong: I assumed that every node of A is equal to at most one node of B. If that is the case, the running time of the second step is O(m) (because you can never search too far in the wrong direction). So the total running time would be O(m + n).
While writing down my assumption, I start to wonder whether that's not oversimplifying your case...
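For what it's worth, the two steps could be sketched in Python as below. Everything here is an assumption for illustration: the `Node` class with `key`, `left`, `right` and `parent` fields, the helper names, and above all the premise that keys are unique within each tree, so that "a is different from b" can be decided by comparing keys.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self


def find(node, key):
    """Step 1: depth-first search in B for the node carrying `key`."""
    if node is None:
        return None
    if node.key == key:
        return node
    return find(node.left, key) or find(node.right, key)


def check_trees(a, b):
    """Step 2: every child of a must match some neighbor of b, recursively."""
    if a is None or b is None or a.key != b.key:
        return False
    neighbors = (b.left, b.right, b.parent)
    for child in (a.left, a.right):
        if child is not None and not any(check_trees(child, nb) for nb in neighbors):
            return False
    return True


def is_part_of(a_root, b_root):
    start = find(b_root, a_root.key)
    return start is not None and check_trees(a_root, start)


B = Node(1, Node(2, Node(4), Node(5)), Node(3))
print(is_part_of(Node(2, Node(4)), B))  # True
print(is_part_of(Node(2, Node(3)), B))  # False
```

Like the pseudocode, this treats edges as undirected and ignores whether a child hangs on the left or the right; drop `b.parent` from `neighbors` if parent/child direction should be preserved.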
you could compare the trees bottom-up as follows:
for each leaf in tree A, identify the corresponding node in tree B.
start a parallel traversal towards the root in both trees from the nodes just matched.
specifically, move to the parent of a node in A and subsequently move towards the root in B until you either encounter the corresponding node in B (proceed) or a marked node in A (see below, if a match in B is found proceed, else fail) or the root of B (fail)
mark all nodes visited in A.
you succeed, if you haven't failed ;-).
the main part of the algorithm runs in O(e_B) - in the worst case, all edges in B are visited a constant number of times. the leaf node matching will run in O(n_A * log n_B) if the B vertices are sorted, and in O(n_A * log n_A + n_B * log n_B + n) = O(n_B * log n_B) otherwise (sort each node set, then linearly scan the results).
EDIT:
re-reading your question, the abovementioned step 2 is even easier, as for matching nodes in A, B, their parents must match too (otherwise there would be a mismatch between the edge sets). no effect on worst-case run time, of course.

Resources