Suppose a Node (in a BST) is defined as follows (ignore all the setters/getters/inits).
class Node
{
    Node parent;
    Node leftChild;
    Node rightChild;
    int value;
    // ... other stuff
}
Given a reference to some Node in a BST (called startNode) and another Node (called target), one is to check whether the tree containing startNode has any node whose value is equal to target.value.
I have two algorithms to do this:
Algorithm #1:
- From `startNode`, trace the way to the root node (using the `Node.parent` reference) : O(n)
- From the root node, do a regular binary search for the target : O(log(n))
T(n) = O(log(n) + n)
Algorithm #2: Basically perform a DFS
(Pseudo-code only)
current_node = startNode
while the root has not been reached:
    go up one level from current_node
    perform a binary search from this node downward (excluding the branch we just came up from)
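A minimal Python sketch of Algorithm #2 may make the pseudo-code easier to reason about. The `Node` class mirrors the one above; `insert`, `search_down` and `contains_from` are hypothetical helper names, and the sketch assumes that the subtree of `startNode` itself is searched before climbing (the pseudo-code does not say so explicitly).

```python
class Node:
    """Hypothetical Python mirror of the Node class above."""
    def __init__(self, value):
        self.value = value
        self.parent = None
        self.left = None
        self.right = None


def insert(root, value):
    """Plain BST insert (demo helper only); returns the new node."""
    node = Node(value)
    cur = root
    while True:
        if value < cur.value:
            if cur.left is None:
                cur.left = node
                break
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = node
                break
            cur = cur.right
    node.parent = cur
    return node


def search_down(node, target):
    """Ordinary binary search in the subtree rooted at node."""
    while node is not None:
        if target == node.value:
            return True
        node = node.left if target < node.value else node.right
    return False


def contains_from(start, target):
    """Algorithm #2: search start's subtree, then climb towards the root."""
    if search_down(start, target):
        return True
    child, current = start, start.parent
    while current is not None:
        if current.value == target:
            return True
        # Search only the branch we did not come up from.
        other = current.right if current.left is child else current.left
        if search_down(other, target):
            return True
        child, current = current, current.parent
    return False
```

Each call to `search_down` costs at most the height of the sub-branch it enters, which is the quantity the analysis below tries to bound.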
What is the time-complexity of this algorithm?
The naive answer would be O(n * log(n)), where n is for the while loop, as there are at most n nodes, and log(n) is for the binary-search. But obviously, that is way-overestimating!
The best (partial) answer I could come up with was:
Suppose each sub-branch has some m_i nodes and that there are k
sub-branches.
In other words, k is the number of nodes between startNode and the root node
The total time would be:
T(n) = log(m1) + log(m2) + ... + log(mk)
= log(m1 * m2 * ... * mk)
Where m1 + m2 + ... + mk = n (the total number of nodes in the tree)
(This is the best estimation I could get as I forgot most of my maths to do any better!)
So I have two questions:
0) What is the time-complexity of algorithm #2 in terms of n?
1) Which algorithm does better in terms of time-complexity?
Ok, after digging through my old Maths books, I was able to find that the upper bound of a product of k numbers whose sum is n is p <= (n /k) ^k.
With that said, the T(n) function would become:
T(n) = O(f(n, k))
Where
f(n, k) = log((n/k)^k)
= k * log(n/k)
= k * log(n) - k * log(k)
(Remember, k is the number of nodes between startNode and the root, while n is the total number of nodes.)
How would I go from here? (ie., how do I simplify the f(n, k)? Or is that good enough for Big-O analysis? )
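One possible way to continue, as a sketch only: treat k as a continuous variable in [1, n], use natural logarithms (a different base only changes a constant factor), and maximise f over k. Both simplifications are assumptions made for this derivation.

```latex
\begin{align*}
f(n,k) &= k \ln\frac{n}{k}, \qquad 1 \le k \le n \\
\frac{\partial f}{\partial k} &= \ln\frac{n}{k} - 1 = 0
  \iff k = \frac{n}{e} \\
\frac{\partial^2 f}{\partial k^2} &= -\frac{1}{k} < 0
  \quad\text{(so this is a maximum)} \\
f\!\left(n, \tfrac{n}{e}\right) &= \frac{n}{e}\,\ln e = \frac{n}{e} \\
\Rightarrow\quad f(n,k) &\le \frac{n}{e} = O(n) \quad\text{for every } k
\end{align*}
```

Under those assumptions, and adding the O(k) cost of the climb itself, algorithm #2 would be O(n) in the worst case, the same order as algorithm #1. The bound relies on each sub-branch search costing log(m_i), which holds only if the sub-branches are balanced.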
I have written the following algorithm that given a node x in a Binary Search Tree T, will set the field s for all nodes in the subtree rooted at x, such that for each node, s will be the sum of all odd keys in the subtree rooted in that node.
OddNodeSetter(T, x):
    if (x == NIL):
        return 0
    if (x.key mod 2 == 1):
        x.s = x.key + OddNodeSetter(T, x.left) + OddNodeSetter(T, x.right)
    else:
        x.s = OddNodeSetter(T, x.left) + OddNodeSetter(T, x.right)
    return x.s
I've thought of using the master theorem for this, with the recurrence
T(n) = T(k) + T(n-k-1) + 1 for 1 <= k < n
However, since the sizes of the two recursive calls depend on k and n-k-1 (i.e. the number of nodes in the left and right subtrees of x), I can't quite figure out how to solve this recurrence. For example, if the left and right subtrees of x contain the same number of nodes, we can express the recurrence in the form
T(n) = 2T(n/2) + 1
which can be solved easily, but that doesn't prove the running time in all cases.
Is it possible to prove this algorithm runs in O(n) with the master theorem, and if not what other way is there to do this?
The algorithm visits every node in the tree exactly once, hence O(N).
Update:
And obviously, a visit takes constant time (not counting the recursive calls).
There is no need to use the Master theorem here.
Think of the problem this way: what is the maximum number of operations you have to do for each node in the tree? It is bounded by a constant. And what is the number of nodes in the tree? It is n.
A constant multiplied by n is still O(n).
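The counting argument can be checked empirically. The sketch below is a hypothetical Python rendering of OddNodeSetter with an added call counter (the `counter` argument and the `chain` helper are not part of the original algorithm). On a tree with n nodes it makes exactly 2n + 1 calls, one per node plus one per NIL child, regardless of the tree's shape.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.s = 0


def odd_node_setter(x, counter):
    """Set x.s to the sum of odd keys in x's subtree; counter[0] counts calls."""
    counter[0] += 1
    if x is None:
        return 0
    own = x.key if x.key % 2 == 1 else 0
    x.s = own + odd_node_setter(x.left, counter) + odd_node_setter(x.right, counter)
    return x.s


def chain(n):
    """Degenerate tree with keys 1..n where every node has only a right child."""
    root = None
    for key in range(n, 0, -1):
        root = Node(key, right=root)
    return root


calls = [0]
total = odd_node_setter(chain(5), calls)
print(total, calls[0])  # 9 11
```

The degenerate chain is the case where the 2T(n/2) form of the recurrence does not apply, yet the call count is still linear.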
Given a tree with n nodes (n can be as large as 2 * 10^5), where each node has a cost associated with it, let us define the following functions:
g(u, v) = the sum of all costs on the simple path from u to v
f(n) = the (n + 1)th Fibonacci number (n + 1 is not a typo)
The problem I'm working on requires me to compute the sum of f(g(u, v)) over all possible pairs of nodes in the tree modulo 10^9 + 7.
As an example, let's take a tree with 3 nodes.
Without loss of generality, let's say node 1 is the root, and its children are 2 and 3.
cost[1] = 2, cost[2] = 1, cost[3] = 1
g(1, 1) = 2; f(2) = 2
g(2, 2) = 1; f(1) = 1
g(3, 3) = 1; f(1) = 1
g(1, 2) = 3; f(3) = 3
g(2, 1) = 3; f(3) = 3
g(1, 3) = 3; f(3) = 3
g(3, 1) = 3; f(3) = 3
g(2, 3) = 4; f(4) = 5
g(3, 2) = 4; f(4) = 5
Summing all of the values, and taking the result modulo 10^9 + 7 gives 26 as the correct answer.
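For reference, the example can be reproduced with a quadratic brute force. This is only a checking aid, not a solution for n up to 2 * 10^5; the `parent` and `cost` dictionaries and all function names are hypothetical.

```python
from itertools import product

MOD = 10**9 + 7


def fib(n):
    """f(n) from the question: the (n + 1)-th Fibonacci number, so f(1) = 1, f(2) = 2."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, (a + b) % MOD
    return a


def path_cost(u, v, parent, cost):
    """g(u, v): sum of node costs on the simple path from u to v."""
    up_u = []
    x = u
    while x is not None:
        up_u.append(x)
        x = parent[x]
    seen = set(up_u)
    total = 0
    x = v
    while x not in seen:      # climb from v until we reach u's path to the root
        total += cost[x]
        x = parent[x]
    lca = x
    for y in up_u:            # add u's side, up to and including the LCA
        total += cost[y]
        if y == lca:
            break
    return total


def brute_force(parent, cost):
    nodes = list(cost)
    return sum(fib(path_cost(u, v, parent, cost))
               for u, v in product(nodes, repeat=2)) % MOD


parent = {1: None, 2: 1, 3: 1}
cost = {1: 2, 2: 1, 3: 1}
print(brute_force(parent, cost))  # 26
```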
My attempt:
I implemented an algorithm to compute g(u, v) in O(log n) by finding the lowest common ancestor using a sparse table.
For the finding of the appropriate Fibonacci values, I tried two approaches, namely using exponentiation on the matrix form and another by noticing that the sequence modulo 10^9 + 7 is cyclical.
Now comes the extremely tricky part. No matter how I do the above computations, I still end up going through up to O(n^2) pairs when calculating the sum of all possible f(g(u, v)). I mean there's the obvious improvement of only going up to n * (n - 1) / 2 pairs but that's still quadratic.
What am I missing? I've been at it for several hours, but I can't see a way to get that sum without actually producing a quadratic algorithm.
To know how many times the cost of a node X is to be included in the total sum, we divide the other nodes into 3 (or more) groups:
the subtree A connected to the left of X
the subtree B connected to the right of X
(subtrees C, D... if the tree is not binary)
all other nodes Y, connected through X's parent
When two nodes belong to different groups, their simple path goes through X, and so does every path that has X itself as an endpoint. Counting each unordered pair of distinct nodes once, the number of simple paths that go through X is:
(N - 1) + #A × #B + #A × #Y + #B × #Y
(If ordered pairs and the single-node paths (u, u) are counted too, as in the question's example, this becomes twice that number plus one.)
So by counting the total number of nodes N, and the size of the subtrees under X, you can calculate how many times the cost of node X should be included in the total sum. Do this for every node and you have the total cost.
The code for this could be straightforward. I'll assume that the total number of nodes N is known, and that you can add properties to the nodes (both of these assumptions simplify the algorithm, but it can be done without them).
We'll add a child_count to store the number of nodes in the child subtrees handled so far (and, once the node is finished, the size of its whole subtree), and a path_count to store the number of simple paths that the node is part of; both are initialised to zero.
For each node, starting from the root:
  If not all children have been visited, go to an unvisited child.
  If all children have been visited (or node is leaf):
    Increment child_count.
    Increase path_count with N - 1.
    Add this node's path_count × cost to the total cost.
    If the current node is the root, we're done; otherwise:
      Increase the parent node's child_count with this node's child_count.
      Increase the parent node's path_count with this node's child_count × (N - 1 - the parent's updated child_count).
      Go to the parent node.
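A minimal sketch of this counting idea, under the convention that each unordered pair of distinct nodes is counted once. The `children` dictionary (node to list of children) and the function name are hypothetical, and the sketch uses an explicit traversal order instead of per-node properties.

```python
def path_counts(children, root):
    """For every node X, count unordered pairs {u, v}, u != v, whose path contains X."""
    n = len(children)
    size = {}
    count = {}
    # Pre-order listing; reversing it visits children before their parents.
    order, stack = [], [root]
    while stack:
        x = stack.pop()
        order.append(x)
        stack.extend(children[x])
    for x in reversed(order):
        seen = 0            # nodes in the child subtrees handled so far
        paths = n - 1       # paths that start or end at x
        for c in children[x]:
            seen += size[c]
            # Pair c's subtree with every node that is not x and not in a
            # child subtree handled so far (later children and the "Y" group).
            paths += size[c] * (n - 1 - seen)
        size[x] = seen + 1
        count[x] = paths
    return count


children = {1: [2, 3], 2: [], 3: []}
cost = {1: 2, 2: 1, 3: 1}
counts = path_counts(children, 1)
print(counts)                                   # {2: 2, 3: 2, 1: 3}
print(sum(counts[x] * cost[x] for x in cost))   # 10
```

For the three-node example this gives 10, which is g(1, 2) + g(1, 3) + g(2, 3) = 3 + 3 + 4.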
The algorithm below runs in O(n^3).
A tree is a connected graph without cycles. So when we want to get all possible pairs' costs, we are effectively finding the shortest paths for all pairs. Thus, we can use Dijkstra's idea and a dynamic programming approach for this problem (I took it from Weiss's book). Then we apply the Fibonacci function to the cost, assuming that we already have a table to look up.
Dijkstra's idea: We start from the root and search all simple paths from the root to all other nodes, and then do the same for the other vertices of the graph.
Dynamic programming approach: We use a 2D matrix D[][] to represent the lowest path/cost (the two terms are used interchangeably) between node i and node j. Initially, D[i][i] is already set. If node i and node j are parent/child, D[i][j] = g(i, j), which is the cost between them. If node k lies on a lower-cost path between node i and node j, we update D[i][j], i.e., D[i][j] = D[i][k] + D[k][j] if D[i][j] > D[i][k] + D[k][j] else D[i][j].
When done, we go through the D[][] matrix, apply the Fibonacci function to each cell, add the results up, and apply the modulo operation.
I am working on binary-tree problems and came across this one: find the rightmost node in the last level of a complete binary tree. Doing it in O(n) is simple, by traversing all the elements, but is there a way to do it in less than O(n)? I have searched the internet a lot and couldn't find anything about it.
Thanks in advance.
Yes, you can do it in O(log(n)^2) by doing a variation of binary search.
This can be done by first going to the leftmost element (1), then to the 2nd leftmost element, then to the 4th leftmost element, the 8th, ... until you find there is no such element.
Let's say the last element you found was the ith, and the first you didn't find was the 2ith.
Now you can simply do a binary search over that range.
This takes O(log(n/2)) = O(log n) iterations in total, and since each iteration goes down the entire tree, the total time is O(log(n)^2).
(1) Here and in what follows, the "xth leftmost element" refers only to the nodes in the deepest level of the tree.
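The doubling-then-binary-search idea can be sketched as follows. The `Node` class, `build_complete` and `leaf_at` are hypothetical helpers; `leaf_at` walks from the root to the index-th slot of the last level by reading the bits of the index, which is one assumed way of "going to the xth leftmost element".

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None


def build_complete(n):
    """Complete tree holding 1..n in level order (demo helper)."""
    nodes = [None] + [Node(i) for i in range(1, n + 1)]
    for i in range(1, n + 1):
        if 2 * i <= n:
            nodes[i].left = nodes[2 * i]
        if 2 * i + 1 <= n:
            nodes[i].right = nodes[2 * i + 1]
    return nodes[1]


def leaf_at(root, depth, index):
    """Node in slot `index` (0-based) of the last level, or None. O(depth)."""
    node = root
    for bit in range(depth - 1, -1, -1):
        if node is None:
            return None
        node = node.right if (index >> bit) & 1 else node.left
    return node


def rightmost_last_level(root):
    depth, node = 0, root
    while node.left is not None:      # depth of the last level
        node = node.left
        depth += 1
    width = 1 << depth                # number of slots in the last level
    # Doubling phase: probe the 2nd, 4th, 8th, ... leftmost slot (1-based).
    i = 1
    while 2 * i <= width and leaf_at(root, depth, 2 * i - 1) is not None:
        i *= 2
    # Slot i exists; the answer lies in [i, min(2i - 1, width)].
    lo, hi = i, min(2 * i - 1, width)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if leaf_at(root, depth, mid - 1) is not None:
            lo = mid
        else:
            hi = mid - 1
    return leaf_at(root, depth, lo - 1)
```

Both phases make O(log n) probes and each probe walks one root-to-leaf path, which matches the O(log(n)^2) bound stated above.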
I assume that you know the number of nodes. Let n be that number.
In a complete binary tree, every full level i has twice as many nodes as level i - 1.
So, numbering the nodes from 1 in level order, you can repeatedly divide n by 2. If there is a remainder, then n is a right child; otherwise, it is a left child. You store whether or not there was a remainder in a sequence, preferably a stack.
Something like:
Stack<char> s;
while (n > 1)
{
    if (n % 2 == 0)
        s.push('L');
    else
        s.push('R');
    n = n / 2; // n is an int, so the division rounds down
}
When the while loop finishes, the stack contains the path from the root to the rightmost node of the last level (pop the stack to read it top-down).
The number of times that the while loop is executed is floor(log_2(n)).
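The same idea as a small Python sketch, assuming the nodes are numbered from 1 in level order so that node n is the rightmost node of the last level. The function name is hypothetical, and a list reversed at the end plays the role of the stack.

```python
def path_to_last_node(n):
    """Root-to-node path ('L'/'R' steps) to node n, numbered from 1 in level order."""
    steps = []
    while n > 1:
        steps.append('R' if n % 2 else 'L')
        n //= 2
    steps.reverse()   # the steps were collected bottom-up
    return steps


print(path_to_last_node(6))   # ['R', 'L']
print(path_to_last_node(12))  # ['R', 'L', 'L']
```

Following these steps from the root reaches the node in O(log n) time, provided n is known in advance.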
This is a recursive solution with time complexity O(lg n * lg n) and O(lg n) space complexity (counting the call-stack space).
Space complexity can be reduced to O(1) by using an iterative version of the code below.
// helper function: counts the nodes along the leftmost path
int getLeftHeight(TreeNode * node) {
    int c = 0;
    while (node) {
        c++;
        node = node -> left;
    }
    return c;
}
int getRightMostElement(TreeNode * node) {
    int h = getLeftHeight(node);
    // base case: we have reached the rightmost element, which is our answer
    if (h == 1)
        return node -> val;
    // answer lies in the right subtree
    else if ((h - 1) == getLeftHeight(node -> right))
        return getRightMostElement(node -> right);
    // answer lies in the left subtree
    else
        return getRightMostElement(node -> left);
}
Time complexity derivation:
At each recursion step we descend into either the left or the right subtree, i.e. about n/2 elements, for at most lg n function calls (the height of the tree), and calculating the height takes lg n time at each step:
T(n) = T(n/2) + c lg n
     = T(n/4) + c lg n + c (lg n - 1)
     = ...
     = T(1) + c [lg n + (lg n - 1) + (lg n - 2) + ... + 1]
     = O(lg n * lg n)
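The iterative version mentioned above might look like the following Python sketch. The names are hypothetical, and `build_complete` exists only to make the example self-contained. It uses the fact that the left height of whichever child is chosen is exactly one less than the current one, so the height does not need to be recomputed for the new node.

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def build_complete(n, i=1):
    """Complete tree holding 1..n in level order (demo helper)."""
    if i > n:
        return None
    return TreeNode(i, build_complete(n, 2 * i), build_complete(n, 2 * i + 1))


def left_height(node):
    h = 0
    while node is not None:
        h += 1
        node = node.left
    return h


def rightmost_element(root):
    """Iterative form of getRightMostElement: O(lg n * lg n) time, O(1) extra space."""
    node = root
    h = left_height(node)
    while h > 1:
        if left_height(node.right) == h - 1:
            node = node.right    # the last level reaches into the right subtree
        else:
            node = node.left     # the last level ends inside the left subtree
        h -= 1
    return node.val


print(rightmost_element(build_complete(10)))  # 10
```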
Since it's a complete binary tree, following the right children until you reach a leaf takes O(log N), not O(N). In a regular binary tree it takes O(N), because in the worst case all the nodes are lined up to the right, but in a complete binary tree that can't happen. Note that this walk ends at the rightmost node of the last level only when the last level is full.
We implement a disjoint-set data structure with trees. In this data structure, makeset() creates a set with one element, and merge(i, j) merges the trees of sets i and j in such a way that the tree with the lower height becomes a child of the root of the other tree. If we do n makeset() operations and n-1 merge() operations in random order, and then do one find operation, what is the cost of this find operation in the worst case?
I) O(n)
II) O(1)
III) O(n log n)
IV) O(log n)
Answer: IV.
Could anyone give some hints on how the author arrived at this solution?
The O(log n) find is only true when you use union by rank (also known as weighted union). When we use this optimisation, we always place the tree with lower rank under the root of the tree with higher rank. If both have the same rank, we choose arbitrarily, but increase the rank of the resulting tree by one. This gives an O(log n) bound on the depth of the tree. We can prove this by showing that a node that is i levels below the root (equivalent to being in a tree of rank >= i) is in a tree of at least 2^i nodes (this is the same as showing that a tree of size n has depth at most log n). This is easily done with induction.
Induction hypothesis: tree size is >= 2^j for j <= i.
Case i == 0: the node is the root, size is 1 = 2^0.
Case i + 1: the length of a path becomes i + 1 if it was i and the tree was then placed underneath
another tree. By the induction hypothesis, it was in a tree of size >= 2^i at
that time. It is being placed under another tree, which by our merge rules means
the other tree has at least rank i as well, and therefore also had >= 2^i nodes. The new tree
therefore has >= 2^i + 2^i = 2^(i + 1) nodes.
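The bound can also be observed experimentally. Below is a minimal union-by-rank sketch without path compression; the class and function names are hypothetical, and `find` additionally returns the number of parent links it followed so the depth can be measured.

```python
import random


class DisjointSet:
    def __init__(self):
        self.parent = []
        self.rank = []

    def make_set(self):
        self.parent.append(len(self.parent))
        self.rank.append(0)
        return len(self.parent) - 1

    def find(self, x):
        """Plain find (no path compression); returns (root, links followed)."""
        steps = 0
        while self.parent[x] != x:
            x = self.parent[x]
            steps += 1
        return x, steps

    def merge(self, a, b):
        ra, _ = self.find(a)
        rb, _ = self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra              # lower-rank root goes under the other root
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def max_find_depth(n, seed=0):
    """n make_set calls, n - 1 random effective merges, then the deepest find."""
    rng = random.Random(seed)
    ds = DisjointSet()
    for _ in range(n):
        ds.make_set()
    merges = 0
    while merges < n - 1:
        a, b = rng.randrange(n), rng.randrange(n)
        if ds.find(a)[0] != ds.find(b)[0]:
            ds.merge(a, b)
            merges += 1
    return max(ds.find(x)[1] for x in range(n))
```

Whatever the merge order, `max_find_depth(n)` should never exceed floor(log2(n)), which is what the induction above shows.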
In another question about finding an algorithm to compute the diameter of a binary tree the following code is provided as a possible answer to the problem.
public static int getDiameter(BinaryTreeNode root) {
    if (root == null)
        return 0;
    int rootDiameter = getHeight(root.getLeft()) + getHeight(root.getRight()) + 1;
    int leftDiameter = getDiameter(root.getLeft());
    int rightDiameter = getDiameter(root.getRight());
    return Math.max(rootDiameter, Math.max(leftDiameter, rightDiameter));
}
public static int getHeight(BinaryTreeNode root) {
    if (root == null)
        return 0;
    return Math.max(getHeight(root.getLeft()), getHeight(root.getRight())) + 1;
}
In the comments section it's being said that the time complexity of the above code is O(n^2). At a given call of the getDiameter function, the getHeight and the getDiameter functions are called for the left and right subtrees.
Let's consider the average case of a binary tree. Height can be computed in Θ(n) time (true for the worst case too). So how do we compute the time complexity of the getDiameter function?
My two theories:
1) T(n) = 4T(n/2) + Θ(1) = Θ(n^2), where the height computation is considered the (same?) subproblem.
2) T(n) = 2T(n/2) + n + Θ(1) = Θ(n log n), where n = 2 * n/2 accounts for the height computation?
Thank you for your time and effort!
One point of confusion is that you think the binary tree is balanced. Actually, it can be a line. In this case, we need n operations from the root to the leaf to find the height, n - 1 from the root's child to the leaf and so on. This gives O(n^2) operations to find the height alone for all nodes.
The algorithm could be optimised if the height of each node was calculated independently, before finding the diameter. Then we would spend O(n) time for finding all heights. Then the complexity of finding the diameter would be of the following type:
T(n) = T(a) + T(n - 1 - a) + 1
where a is the size of the left subtree. This relation would give linear time for finding diameter also. So the total time would be linear.
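One way to obtain the linear-time behaviour is sketched below. It is a variant of the suggestion above: instead of storing the heights in a separate pass, each call returns the height and the diameter together. Names are hypothetical, and the diameter is counted in nodes, as in the Java code of the question.

```python
class BinaryTreeNode:
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right


def height_and_diameter(node):
    """Return (height, diameter) of the subtree, both counted in nodes, in O(n)."""
    if node is None:
        return 0, 0
    lh, ld = height_and_diameter(node.left)
    rh, rd = height_and_diameter(node.right)
    # Longest path through this node, versus the best path inside either subtree.
    return max(lh, rh) + 1, max(lh + rh + 1, ld, rd)


def get_diameter(root):
    return height_and_diameter(root)[1]


# A root whose left and right subtrees are both chains of two nodes.
left = BinaryTreeNode(left=BinaryTreeNode())
right = BinaryTreeNode(right=BinaryTreeNode())
print(get_diameter(BinaryTreeNode(left, right)))  # 5
```

Every node is visited once and does constant work, so the recurrence is T(n) = T(a) + T(n - 1 - a) + 1, as stated above. For very deep trees an explicit stack would be needed to avoid Python's recursion limit.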