Complexity analysis exercise on RB-Trees - algorithm

BLACK_PATH(T, x)
  if x == NIL
    then return TRUE
  if COLOR(x) == BLACK
    then return BLACK_PATH(T, left(x)) || BLACK_PATH(T, right(x))
  return FALSE
The exercise asks to analyse the complexity of this procedure. I believe the recurrence is the following:
T(n)<=2T(2n/3)+O(1)
Using the recursion tree I obtain T(n)=O(n). Is this correct?

The complexity of this method is linear, O(n), in the worst case with respect to the number of elements in the tree.
Using the master theorem in terms of the total number of nodes is difficult here because it does not take the properties of a red-black tree into account. While it is true in general for heaps (nearly complete binary trees) that every subtree of a tree with n nodes has at most 2n/3 nodes, for red-black trees it is also true that every subtree has at most half of the tree's black nodes. This is because red-black trees are balanced with respect to black nodes (every downward path from an arbitrary node to a leaf has the same number of black nodes).
Most importantly: the total number of nodes is not asymptotically higher than the number of black nodes (every red node has a black parent, so with m black nodes there are at most 2m red ones and n <= 3m). Therefore, by analyzing the complexity purely in terms of the number of black nodes, you implicitly analyze the complexity in terms of the total number of nodes.
So rather than using T(n) <= 2T(2n/3) + O(1) you should use T(m) <= 2T(m/2) + O(1), where m is the number of black nodes. This gives O(m), and because, as discussed above, O(m) = O(n), we have O(n).
Another way to think about it: as long as you can see that this algorithm is O(n) when all the nodes in the tree are black, you can see that it can only require fewer operations when some of the nodes are red, since wherever a red node sits, every node in the subtree rooted at it is ignored and never visited by this recursive algorithm. So it can only be O(n) or better, establishing O(n) as the worst case.
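For reference, here is a minimal executable sketch of BLACK_PATH in Python (the Node class and the color strings are my own assumptions; NIL is represented by None):

```python
class Node:
    def __init__(self, color, left=None, right=None):
        self.color = color          # "BLACK" or "RED"
        self.left = left
        self.right = right

def black_path(x):
    # NIL (None) terminates an all-black path successfully.
    if x is None:
        return True
    if x.color == "BLACK":
        # Recurse into both children; a red node cuts off its whole subtree.
        return black_path(x.left) or black_path(x.right)
    return False
```

On a tree whose root is red this returns FALSE immediately without visiting the subtree, which is exactly why red subtrees contribute nothing to the running time.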

Related

The time complexity for 1-NN search using a KD tree (i.e. balanced binary tree) is in what range?
Assume there are N points in the dataset
There's a hint here, but still can't figure it out: https://www.coursera.org/lecture/ml-clustering-and-retrieval/complexity-of-nn-search-with-kd-trees-BkZTg
A) O(N^2) - O(N^3)
B) O(log N) - O(N)
C) O(N log N) - O(N^2)
D) None of the above
TLDR. The best and average case for finding the nearest neighbor with a kd-tree is going to be O(logβ‚‚(N)). But the worst case can be closer to O(N). Hence the answer is B) O(log N) - O(N)
Assuming the kd-tree is already built, each level of the tree normally bifurcates the entire range of points across one of the given dimensions, just like a binary tree. So if the tree is perfectly balanced and each leaf node consists of just one point, there will be approximately N leaves at the bottom of a tree that is logβ‚‚(N) levels high. Hence, finding the original point in the tree always takes logβ‚‚(N) steps.
But you aren't looking for the original point, you're looking for its closest neighbor. So that's where it gets complicated. In practice, your leaf nodes don't contain just one point. They contain some reasonable number of points (~logβ‚‚(N), or some small number like "10") that are considered to be in the same box or "cluster".
So when you find the initial point, you can immediately do a distance computation on the other points in the leaf node's cluster to find an initial candidate for the nearest neighbor. Hence, the highest probability is that the nearest neighbor is in the same cluster, with a lesser probability that it's in one of the adjacent leaf-node clusters. As you recurse back up the tree, you have to decide whether recursing into the other child node is needed, based on the dimension and mid-point value of each node. But if you've already found the nearest neighbor, you probably won't do too many more recursions down the tree.
But it's theoretically possible that the initial bifurcation of the set of points separated the original point from its closest neighbor. And in some crazy layout of points, you wind up having to do a distance computation with most of the points in the tree. Hence, O(N). Try making a 2-d graph of points and creating this scenario yourself.
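To make the search concrete, here is a minimal 2-d kd-tree with 1-NN search (one point per leaf, no clustering; all names and the dict encoding are my own assumptions):

```python
def build(points, depth=0):
    """Build a kd-tree over 2-d points, splitting on alternating axes."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"point": pts[mid],
            "left": build(pts[:mid], depth + 1),
            "right": build(pts[mid + 1:], depth + 1)}

def nearest(node, target, depth=0, best=None):
    """Return (point, squared_distance) of the nearest neighbor of target."""
    if node is None:
        return best
    d = sum((a - b) ** 2 for a, b in zip(node["point"], target))
    if best is None or d < best[1]:
        best = (node["point"], d)
    axis = depth % 2
    diff = target[axis] - node["point"][axis]
    near, far = ("left", "right") if diff < 0 else ("right", "left")
    best = nearest(node[near], target, depth + 1, best)
    # Only search the far side if the splitting plane is closer than the
    # current best candidate.
    if diff ** 2 < best[1]:
        best = nearest(node[far], target, depth + 1, best)
    return best
```

The pruning test on the splitting plane is what usually keeps the search near O(logβ‚‚(N)); when the geometry defeats it, both children are explored at every level and the search degrades toward O(N).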

Time Complexity of traversing n-ary tree seems to have multiple correct answers, which is most correct?

Ignoring space complexity, assuming each node in the tree is touched exactly once and considering DFS and BFS traversal time equivalent, what is the time complexity of traversing an n-ary tree?
Big O notation is an asymptotic measure, meaning that we are looking for a function that gives us a line or curve that best fits the problem as it is extended to more levels or branches.
My intuition tells me that for tree structures in general we would want a function of the sort b^l where b is the branching factor and l is the number of levels in the tree (for a full and complete tree).
However for a partial tree, it would make sense to take some sort of average of b and l, perhaps AVG(b)^AVG(l).
In looking for this answer I find many people are saying it is O(n), where n is the number of vertices in the tree. See:
What is the time complexity of tree traversal? and
Complexity of BFS in n-ary tree
But a linear solution in my mind does not model the cost (in time) that the algorithm will take as the tree adds additional levels (on average). Which is what I understand Big O notation is intended to do.
The height or branching factor of a tree are not the determining factors for the complexity of a complete traversal (whether it be BFS or DFS). It is only the number of vertices that is the determining factor.
If you want to express the complexity in terms of branching factor and height of the tree, then you are really asking what the relation is between these measures and the number of nodes in a tree.
As you already indicate, if the branching factor is 100 but it is only rarely that a node has more than 3 children, then the branching factor is not really telling us much. The same can be said about the height of the tree. A tree of height 4 and branching factor 2 can have between 5 and 31 nodes.
What to do then? Often, the worst case will be taken (maximising the number of nodes). This means that you'll translate branching factor and height to a tree that is perfect, i.e. where each node has the maximum number of children, except for the leaves of the tree, which are all on the same level.
The number of nodes n is then (b^(h+1) − 1)/(b − 1), where b is the branching factor and h the height (the number of edges on the longest path from root to leaf).
That means the worst-case time complexity for a given b and h is O(b^(h+1)).
Working with average is not really practical. The distribution of the branching factor may not be linear, and the distribution of leaf-depths might not be linear either, so working with averages is not going to give much insight.
As to the cost of adding a node: once the insertion point is determined, the time complexity for adding a node is constant. It does not matter whether that insertion increases the branching factor or the height of the tree.
There is some variation when it comes to finding a node if the tree is a search tree (like a BST or B-tree). In that case the height of the tree becomes important, and a search would cost O(h). If however the tree is self-balancing (like an AVL tree or B-tree), this variation is limited and the complexity would be O(log n).
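To see that only the node count matters for a full traversal, here is a small sketch (the dict encoding of an n-ary tree is my own assumption) that does constant work per node; a wide tree and a deep tree with the same number of nodes cost the same:

```python
def count_visits(node):
    # One unit of constant work per node, regardless of tree shape.
    if node is None:
        return 0
    visits = 1
    for child in node.get("children", []):
        visits += count_visits(child)
    return visits

# Five nodes arranged wide (branching factor 4, height 1) ...
wide = {"children": [{"children": []} for _ in range(4)]}
# ... and five nodes arranged deep (branching factor 1, height 4).
deep = {"children": [{"children": [{"children": [
    {"children": [{"children": []}]}]}]}]}
```

Both traversals visit exactly five nodes, so the cost is determined by n alone, not by b or h separately.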

Time/Space Complexity of Depth First Search

I've looked at various other StackOverflow answer's and they all are different to what my lecturer has written in his slides.
Depth First Search has a time complexity of O(b^m), where b is the
maximum branching factor of the search tree and m is the maximum depth
of the state space. Terrible if m is much larger than d, but if search
tree is "bushy", may be much faster than Breadth First Search.
He goes on to say..
The space complexity is O(bm), i.e. space linear in length of action
sequence! Need only store a single path from the root to the leaf
node, along with remaining unexpanded sibling nodes for each node on
path.
Another answer on StackOverflow states that it is O(n + m).
Time Complexity: If you can access each node in O(1) time, then with branching factor b and max depth m, the total number of nodes in this tree in the worst case is 1 + b + b^2 + … + b^(m−1). Using the formula for summing a geometric series (or even deriving it ourselves) tells us that this sums to (b^m − 1)/(b − 1), resulting in total time to visit each node proportional to b^m. Hence the complexity is O(b^m).
On the other hand, if instead of using the branching factor and max depth you have the number of nodes n, then you can directly say that the complexity will be proportional to n or equal to O(n).
The other answers that you have linked in your question are similarly using different terminologies. The idea is same everywhere. Some solutions have added the edge count too to make the answer more precise, but in general, node count is sufficient to describe the complexity.
Space Complexity: The length of longest path = m. For each node, you have to store its siblings so that when you have visited all the children, and you come back to a parent node, you can know which sibling to explore next. For m nodes down the path, you will have to store b nodes extra for each of the m nodes. That’s how you get an O(bm) space complexity.
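The O(bm) frontier can be observed directly by running DFS with an explicit stack on a perfect tree; build_perfect and the nested-list encoding are illustrative assumptions of mine:

```python
def build_perfect(b, m):
    # Perfect tree of branching factor b and depth m, as nested lists;
    # a node at depth m is a leaf (the empty list).
    if m == 0:
        return []
    return [build_perfect(b, m - 1) for _ in range(b)]

def dfs_max_stack(root):
    # Iterative DFS that records the peak size of the explicit stack.
    stack, visited, peak = [root], 0, 1
    while stack:
        node = stack.pop()
        visited += 1
        stack.extend(node)           # push up to b children
        peak = max(peak, len(stack))
    return visited, peak
```

For b = 3 and m = 4 this visits all 121 nodes, while the stack never holds more than (b − 1)·m + 1 = 9 nodes at once, i.e. O(bm) space against O(b^m) time.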
The complexity is O(n + m) where n is the number of nodes in your tree, and m is the number of edges.
The reason why your teacher represents the complexity as O(b ^ m), is probably because he wants to stress the difference between Depth First Search and Breadth First Search.
When using BFS, if your tree has a very large spread compared to its depth, and you're expecting results to be found at the leaves, then clearly DFS makes much more sense here, as it reaches leaves faster than BFS, even though both reach the last node in the same amount of time (work).
When a tree is very deep, and non-leaves can give information about deeper nodes, BFS can detect ways to prune the search tree in order to reduce the amount of nodes necessary to find your goal. Clearly, the higher up the tree you discover you can prune a sub tree, the more nodes you can skip.
This is harder when you're using DFS, because you prioritize reaching a leaf over exploring nodes that are closer to the root.
I suppose this DFS time/space complexity is taught in an AI class, not in an algorithms class.
The DFS Search Tree here has slightly different meaning:
A node is a bookkeeping data structure used to represent the search
tree. A state corresponds to a configuration of the world. ...
Furthermore, two different nodes can contain the same world state if
that state is generated via two different search paths.
Quoted from book 'Artificial Intelligence - A Modern Approach'
So the time/space complexity here focuses on how many nodes you visit while checking whether each is the goal state. #displayName already gives a very clear explanation.
O(n + m), on the other hand, is what you see in an algorithms class, where the focus is the algorithm itself: the graph is stored as an adjacency list and we count how nodes and edges are discovered.

Why the Red Black Tree is kept unbalanced after insertion?

Here is a red-black tree which seems unbalanced. If this is the case, can someone please explain why it is unbalanced?
The term "balanced" is a bit ambiguous, since different kinds of balanced trees have different constraints.
A red-black tree ensures that every path to a leaf has the same number of black nodes, and at least as many black nodes as red nodes. The result is that the longest path is at most twice as long as the shortest path, which is good enough to guarantee O(log N) time for search, insert, and delete operations.
Most other kinds of balanced trees have tighter balancing constraints. An AVL tree, for example, ensures that the lengths of the longest paths on either side of every node differ by at most 1. This is more than you need, and that has costs -- inserting or deleting in an AVL tree (after finding the target node) takes O(log N) operations on average, while inserting or deleting in a red-black tree takes O(1) operations on average.
If you wanted to keep a tree completely balanced, so that you had the same number of descendents on either side of every node, +/- 1, it would be very expensive -- insert and delete operations would take O(N) time.
Yes, it is balanced. The rule says that, counting the black NIL leaves, the longest possible path should consist of at most 2B − 1 nodes, where B is the number of black nodes on the shortest possible path from the root to any leaf. In your example the shortest path has 2 black nodes, so B = 2 and the longest path can have up to 3 nodes, but it has just 2.
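The "longest path at most twice the shortest" consequence is easy to check mechanically. A small sketch, assuming nodes are encoded as (color, left, right) tuples with None for the black NIL leaves:

```python
def path_lengths(node):
    """Return (shortest, longest) root-to-NIL path lengths, in nodes,
    counting the NIL leaves themselves."""
    if node is None:
        return 1, 1
    ls, ll = path_lengths(node[1])
    rs, rl = path_lengths(node[2])
    return 1 + min(ls, rs), 1 + max(ll, rl)
```

For any valid red-black tree the longest value this returns is at most twice the shortest, which is the balance guarantee being discussed.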

Complexity of a tree labeling algorithm

I have a generic weighted tree (undirected graph without cycles, connected) with n nodes and n-1 edges connecting a node to another one.
My algorithm does the following:
do
  compute the current leaves (nodes with degree 1)
  remove all the leaves and their edges from the tree, labelling each parent
  with the maximum cost among the edges to its removed leaves
  (for example, if an internal node is connected to two leaves by edges with
  costs 5 and 6, then after removing the leaves we label the internal node 6)
until the tree has size <= 2
return the node with the maximum label
Can I say that the complexity is O(n) to compute the leaves and O(n) to eliminate the leaves and their edges, so I have O(n) + O(n) = O(n)?
You can easily do this in O(n) with a set implemented as a simple list, queue, or stack (order of processing is unimportant).
Put all the leaves in the set.
In a loop, remove a leaf from the set, delete it and its edge from the graph. Process the label by updating the max of the parent. If the parent is now a leaf, add it to the set and keep going.
When the set is empty you're done, and the node labels are correct.
Initially constructing the set is O(n). Every vertex is placed in the set, removed, and has its label processed exactly once, all in constant time per vertex. So for n nodes it is O(n) time, and we have O(n) + O(n) = O(n).
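The steps above can be sketched in Python as follows (the edge-list input format and all names are illustrative assumptions of mine):

```python
from collections import deque

def peel_and_label(n, edges):
    """edges: (u, v, cost) triples of an undirected tree on nodes 0..n-1.
    Returns the node that ends up with the maximum label."""
    adj = {v: {} for v in range(n)}
    for u, v, c in edges:
        adj[u][v] = c
        adj[v][u] = c
    label = {}
    leaves = deque(v for v in adj if len(adj[v]) <= 1)  # initial leaves
    remaining = n
    while remaining > 2:
        leaf = leaves.popleft()
        (parent, cost), = adj[leaf].items()   # a leaf has exactly one edge
        label[parent] = max(label.get(parent, cost), cost)
        del adj[parent][leaf]                 # remove the leaf and its edge
        del adj[leaf]
        remaining -= 1
        if len(adj[parent]) == 1:             # parent just became a leaf
            leaves.append(parent)
    return max(label, key=label.get) if label else None
```

Each vertex enters the queue at most once and each edge is deleted once, which is the O(n) argument made above.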
It's certainly possible to do this process in O(n), but whether your algorithm actually does depends on how the steps are implemented.
If either "compute the actual leaves" or "remove all the leaves and their edges" loops over the entire tree, that step would take O(n).
And both the above steps will be repeated O(n) times in the worst case (if the tree is greatly unbalanced), so, in total, it could take O(n^2).
To do this in O(n), you could have each node point to its parent so you can remove the leaf in constant time and maintain a collection of leaves so you always have the leaves, rather than having to calculate them - this would lead to O(n) running time.
As your tree is an arbitrary one, it can also be a linked list, in which case you would eliminate only one node in each iteration, and you would need (n − 2) iterations of O(n) each to find the leaf.
So your algorithm is actually O(n^2).
Here is a better algorithm that does that in O(n) for any tree:
deleteLeaf(Node k) {
    max = label(k)                  // a leaf simply returns its own label
    for each child c of k {
        value = deleteLeaf(c)
        if (value > max)
            max = value
        delete(c)
    }
    return max
}
deleteLeaf(root) or deleteLeaf(root.child)
