Is the height of all binary trees with average depth of O(log n) for all nodes also O(log n)?

If it is true that every binary tree with average depth O(log n) also has height O(log n), I need to prove it. If it isn't, I'm asked to find a family of trees that have an average depth of O(log n) but whose height is not O(log n). Since O is defined asymptotically, the family of trees must be general and not specific (for example, not trees with 10 nodes).

If it is true that every binary tree with average depth O(log n) also has
height O(log n)
No, that's not true.
If it isn't, I'm asked to find a family of trees that have an
average depth of O(log n) but whose height is not O(log n).
Basically, the family of randomly generated BSTs could suit your needs.
If c is a positive integer, n is sufficiently large, and T is a randomly
constructed BST with size n, then the probability that
height(T) ≥ 3c log n
is less than 1/c.
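If it helps to see this empirically, here is a small simulation sketch of my own (the helper name random_bst_depths is invented for illustration) that inserts random permutations into a plain BST and reports the height and average depth next to log2(n):

```python
# Build a random BST from a shuffled range and record every node's depth.
# Expected average depth is about 1.39*log2(n); expected height is roughly
# 3*log2(n), so both track log n for random inputs.
import math, random

def random_bst_depths(n):
    left, right = {}, {}              # child pointers, keyed by node value
    keys = list(range(n))
    random.shuffle(keys)
    root, depths = keys[0], [0]
    for k in keys[1:]:
        cur, d = root, 0
        while True:
            d += 1
            children = left if k < cur else right
            if cur in children:
                cur = children[cur]   # keep descending
            else:
                children[cur] = k     # attach new leaf at depth d
                break
        depths.append(d)
    return depths

for n in (1_000, 10_000, 100_000):
    depths = random_bst_depths(n)
    print(n, max(depths), sum(depths) / n, math.log2(n))
```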

Related

Complete binary tree time complexity

Suppose someone wants to generate a complete binary tree. The tree has h levels, where h can be any positive integer and is given as input to the algorithm. What complexity will the generation lie in, and why?
A complete binary tree is a tree where all levels are full of nodes except possibly the last level; we can define the time complexity in terms of an upper bound.
If we know the height of the tree is h, then the maximum possible number of nodes in the tree is 2^h - 1.
Therefore, time complexity = O(2^h - 1) = O(2^h).
To sell your algorithm in the market, you need tight upper bounds to prove that your algorithm is better than the others'.
A slightly tighter upper bound for this problem can be given once we know exactly how many nodes are in the tree. Let's say there are N.
Then, the time complexity = O(N).
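As a sketch (the function name is my own), generating such a tree level by level does O(1) work per node, which makes the O(2^h - 1) = O(N) bound concrete:

```python
# Generate a complete (perfect) binary tree of h levels as a level-order
# array: index i's children sit at 2*i + 1 and 2*i + 2, so no pointers are
# needed and the total work is one O(1) step per node.

def build_complete_tree(h):
    n = 2**h - 1                      # node count for h full levels
    return list(range(1, n + 1))      # node labels in level order

for h in (1, 2, 3, 4):
    print(h, len(build_complete_tree(h)))   # 1, 3, 7, 15 nodes
```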

Big O for Height of Balanced Binary Tree

Perhaps a dumb question. In a balanced binary tree where n is the total number of nodes, I understand why the height is equal to log(n). What I don't understand is what people mean when they refer to the height as being O(log(n)). I've only seen Big O used in the context of algorithms, where if an algorithm runs in O(n) and if the input doubles then the running time doubles. But height isn't an algorithm. How does this apply to the height of a tree? What does it mean for the height to be O(log(n))?
This is because a complete binary tree of n nodes does not have height log(n).
Consider a complete binary tree of height k. Such a tree has 2^k leaf nodes. How many nodes does it have in total? If you look at each level, you will find that it has 1 + 2 + 4 + 8 + ... + 2^k nodes, or 2^0 + 2^1 + 2^2 + 2^3 + ... + 2^k.
After some math, you will find that this series equals 2^(k+1) - 1.
So, if your tree has n nodes, what is its height? If you solve the equation n = 2^(k+1) - 1 with respect to k, you obtain k = log2(n+1) - 1.
This expression is slightly less nice than log2(n), and it is certainly not the same number. However, by the properties of big-O notation,
log2(n+1) - 1 = O(log(n)).
In the source you are reading, the emphasis is on the fact that the height grows as fast as log(n): they belong to the same complexity class. This information can be useful when designing algorithms, since you know that doubling the input will increase the tree height only by a constant. This property gives tree structures immense power. So even if the expression for the exact height of the tree is more complicated (and if your binary tree is not complete, it will look more complicated still), it is of logarithmic complexity with respect to n.
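As a quick check of the formula (a sketch of my own, not from the answer):

```python
# A perfect binary tree of height k has n = 2^(k+1) - 1 nodes, so
# log2(n + 1) - 1 recovers k exactly.
import math

for k in range(6):
    n = 2**(k + 1) - 1
    print(n, math.log2(n + 1) - 1)   # 1 -> 0.0, 3 -> 1.0, 7 -> 2.0, ...
```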
To add to Berthur's excellent answer, Big-Oh notation is not specific to the analysis of algorithms; it applies to functions in general. In the analysis of algorithms we care about the function T(n), which gives the (typically worst-case) runtime for input size n, and we want an upper bound (Big-Oh) on that function's rate of growth. Here, there is a function that gives the true height of a tree with whatever property, and we want an upper bound on that function's rate of growth. We could find upper bounds on arbitrary functions devoid of any context at all, like f(n) = n!^(1/2^n) or whatever.
I think they mean it takes O(log(n)) to traverse the tree

Why isn't the time complexity of building a binary heap by insertion O(n)?

The background
According to Wikipedia and other sources I've found, building a binary heap of n elements by starting with an empty binary heap and inserting the n elements into it is O(n log n), since binary heap insertion is O(log n) and you're doing it n times. Let's call this the insertion algorithm.
It also presents an alternate approach in which you sink/trickle down/percolate down/cascade down/heapify down/bubble down the first/top half of the elements, starting with the middle element and ending with the first element, and that this is O(n), a much better complexity. The proof of this complexity rests on the insight that the sink complexity for each element depends on its height in the binary heap: if it's near the bottom, it will be small, maybe zero; if it's near the top, it can be large, maybe log n. The point is that the complexity isn't log n for every element sunk in this process, so the overall complexity is much less than O(n log n), and is in fact O(n). Let's call this the sink algorithm.
The question
Why isn't the complexity for the insertion algorithm the same as that of the sink algorithm, for the same reasons?
Consider the actual work done for the first few elements in the insertion algorithm. The cost of the first insertion isn't log n, it's zero, because the binary heap is empty! The cost of the second insertion is at worst one swap, and the cost of the fourth is at worst two swaps, and so on. The actual complexity of inserting an element depends on the current depth of the binary heap, so the complexity for most insertions is less than O(log n). The insertion cost doesn't even technically reach O(log n) until after all n elements have been inserted [it's O(log (n - 1)) for the last element]!
These savings sound just like the savings gotten by the sink algorithm, so why aren't they counted the same for both algorithms?
Actually, when n = 2^x - 1 (the lowest level is full), the n/2 elements inserted into the lowest level may require log(n) swaps each in the insertion algorithm (to end up as leaf nodes). So you need (n/2)(log(n)) swaps for the leaves alone, which already makes the total Ω(n log n).
In the other algorithm, only one element needs log(n) swaps, 2 need log(n)-1 swaps, 4 need log(n)-2 swaps, and so on. Wikipedia shows a proof that the resulting series converges to a constant in place of a logarithm.
The intuition is that the sink algorithm moves only a few things (those in the small layers at the top of the heap/tree) distance log(n), while the insertion algorithm moves many things (those in the big layers at the bottom of the heap) distance log(n).
The intuition for why the sink algorithm can get away with this is that the insertion algorithm is also meeting an additional (nice) requirement: if we stop the insertions at any point, the partially formed heap has to be (and is) a valid heap. For the sink algorithm, all we get is a weird malformed bottom portion of a heap. Sort of like a pine tree with the top cut off.
Also, summations and blah blah. It's best to think asymptotically about what happens when inserting, say, the last half of the elements of an arbitrarily large set of size n.
While it's true that log(n-1) is less than log(n), it's not smaller by enough to make a difference.
Mathematically: the worst-case cost of inserting the i'th element is ceil(log i). Therefore the worst-case cost of inserting elements 1 through n is sum(i = 1..n, ceil(log i)) ≥ sum(i = 1..n, log i) = log 1 + log 2 + ... + log n = log(1 × 2 × ... × n) = log n! = Θ(n log n) by Stirling's approximation.
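To make the contrast concrete, here is a counting sketch of my own (the function names are invented): it tallies swaps for both construction strategies on an ascending input, which is the worst case for sift-up when building a max-heap. The insertion count grows like n log n, while the sink count stays O(n):

```python
# Count swaps when building a max-heap two ways: repeated sift-up insertion
# versus Floyd's bottom-up sink (heapify).

def build_by_insertion(values):
    heap, swaps = [], 0
    for v in values:
        heap.append(v)
        i = len(heap) - 1
        while i > 0 and heap[(i - 1) // 2] < heap[i]:   # sift up
            parent = (i - 1) // 2
            heap[parent], heap[i] = heap[i], heap[parent]
            i, swaps = parent, swaps + 1
    return swaps

def build_by_sinking(values):
    heap, n, swaps = list(values), len(values), 0
    for start in range(n // 2 - 1, -1, -1):             # internal nodes only
        i = start
        while True:
            largest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and heap[child] > heap[largest]:
                    largest = child
            if largest == i:
                break
            heap[i], heap[largest] = heap[largest], heap[i]  # sink one level
            i, swaps = largest, swaps + 1
    return swaps

for n in (2**10 - 1, 2**15 - 1):
    data = list(range(n))   # ascending: every inserted key sifts to the root
    print(n, build_by_insertion(data), build_by_sinking(data))
```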
Ran into the same problem yesterday. I tried coming up with some form of proof to satisfy myself. Does this make any sense?
If you start inserting from the bottom, the leaves will have constant-time insertion: you just copy them into the array.
The worst case running time for a level above the leaves is:
k * (n/2^h) * h
where h is the height (leaves being 0, top being log(n)) and k is a constant (just for good measure). So n/2^h is the number of nodes at height h, and h is the MAXIMUM number of 'sinking' operations per insert at that height.
There are log(n) levels,
Hence, the total running time will be
Sum for h from 1 to log(n): k * n * (h/2^h)
which is k*n * SUM h=[1,log(n)]: (h/2^h)
The sum is a simple arithmetico-geometric series, which comes out to 2.
So you get a running time of k*n*2, which is O(n).
The running time per level isn't strictly what I said it was, but it is strictly less than that. Any pitfalls?
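And a one-line sanity check (my own) on that series:

```python
# Partial sums of sum_{h >= 1} h / 2^h approach 2 from below.
print(sum(h / 2**h for h in range(1, 60)))   # just under 2.0
```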

What exactly does a log n height mean?

I came to know that the height of random BSTs, red-black trees, and some other trees is O(log n).
I wonder how this can be. Let's say I have a tree like this:
The height of the tree is essentially the depth of the tree, which in this case is 4 (not counting the root's level). But how can people say that the height can be represented by the O(log n) notation?
I'm very new to algorithms, and this point is confusing me a lot. Where am I missing the point?
In algorithm complexity the variable n typically refers to the total number of items in a collection or involved in some calculation. In this case, n is the total number of nodes in the tree. So, in the picture you posted n=31. If the height of the tree is O(log n) that means that the height of the tree is proportional to the log of n. Since this is a binary tree, you'd use log base 2.
⌊log₂(31)⌋ = 4
Therefore, the height of the tree should be about 4—which is exactly the case in your example.
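In code form (a trivial check of my own):

```python
import math

n = 31                            # total nodes in the pictured tree
print(math.floor(math.log2(n)))  # 4, matching the height in the example
```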
As I explained in a comment, a binary tree can have multiple cases:
In the degenerate case, a binary tree is simply a chain, and its height is O(n).
In the best case (for most search algorithms), a complete binary tree has the property that for any node, the heights of its subtrees are the same. In this case the height will be the floor of log(n) (base 2, or base k for k branches). You can prove this by induction on the size of the tree (structural induction on the constructors).
In the general case you will have a mix of these: a tree where any node may have subtrees of different heights.

In Big-O notation for tree structures: Why do some sources refer to O(logN) and some to O(h)?

In researching complexity for any algorithm that traverses a binary search tree, I see two different ways to express the same thing:
Version #1: The traversal algorithm at worst case compares once per height of the tree; therefore complexity is O(h).
Version #2: The traversal algorithm at worst case compares once per height of the tree; therefore complexity is O(logN).
It seems to me that the same logic is at work, yet different authors use either logN or h. Can someone explain to me why this is the case?
The correct value for the worst-case time to search a tree is O(h), where h is the height of the tree. If you are using a balanced search tree (one where the height of the tree is O(log n)), then the lookup time is worst-case O(log n). That said, not all trees are balanced. For example, here's a tree with height n - 1:
1
 \
  2
   \
    3
     \
      ...
       \
        n
Here, h = O(n), so the lookup is O(n). It's correct to say that the lookup time is also O(h), but h ≠ O(log n) in this case and it would be erroneous to claim that the lookup time was O(log n).
In short, O(h) is the correct bound. O(log n) is the correct bound in a balanced search tree when the height is at most O(log n), but not all trees have lookup time O(log n) because they may be imbalanced.
Hope this helps!
If your binary tree is balanced so that every node has exactly two child nodes, then the number of nodes in the tree will be exactly N = 2^h − 1, so the height is the logarithm of the number of elements (and similarly for any complete n-ary tree).
An arbitrary, unconstrained tree may look totally different, though; for instance, it could just have one node at every level, so N = h in that case. So the height is the more general measure, as it relates to actual comparisons, but under the additional assumption of balance you can express the height as the logarithm of the number of elements.
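A tiny sketch of my own contrasting the two extremes:

```python
# Height of the two extreme shapes for a tree of N nodes: a one-node-per-level
# chain (height N - 1) versus a perfect binary tree (height log2(N + 1) - 1).
import math

def chain_height(N):
    return N - 1                         # degenerate: O(N)

def perfect_tree_height(N):
    return int(math.log2(N + 1)) - 1     # assumes N = 2^h - 1 for some h

print(chain_height(31))         # 30
print(perfect_tree_height(31))  # 4
```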
O(h) would refer to a binary tree that is sorted but not balanced
O(logn) would refer to a tree that is sorted and balanced
It's sort of two ways of saying the same thing, because your average balanced binary tree of height 'h' will have around 2^h nodes.
Depending on the context, either height or #nodes may be more relevant, and so that's what you'll see referenced.
because the (h)eight of a balanced tree varies as the log of the (N)umber of elements
