You intend to run heapify on an array of n integers, in order to turn it into a heap in-place. How long will this operation take, in the worst case? (choose the tightest possible bound)
Options are:
a) O(n)
b) O(nlogn)
c) O(nlog^2n)
d) O(n^2)
I tried this out and got the following:
Since we have at most n nodes we have O(n) and since we need to move up and compare only the height of the tree times we get O(logn) thereby giving us O(nlogn). But this solution is wrong.
Then I thought maybe we don't compare the only the height of the tree times because a smaller node can be placed on the right side of the tree forcing us to go all the way to the right side and I marked O(n^2). And that was wrong too. Any suggestions?
Related
I guess it's rather simple but it seems I'm troubling myself..
what's the complexity of the following?
// let's say that Q has M initial items
while Q not empty
v <- Q.getFirst
for each z in v // here, every v cannot have more than 3 z's
...
O(1) operations here
...
Q.insert(z)
end
end
The number of the times this will happen, depends on if at some point v's do not have more z's (let's call this number N)
Is the complexity O(MxN^2) or I'm wrong? It's like having a tree with M parent nodes and each node, at most, can have three children. N is the total number of nodes.
Your algorithmic complexity should have an upper bound of O( (M * v) - parent nodes that are children nodes ) which is much better stated as O(n) where n is the number of nodes in your tree, since you only iterate the tree once.
Depending on your operation, you would want to consider the runtime of your Q.insert(z) and Q.getFirst() operation, because depending on your data structure that may be worth considering.
Assuming Q.insert() and Q.getFirst() runtimes are O(1), you can say O(M * v) is an approximate bounding, but since v elements can be repeated, you are better off stating that the runtime is just O(n) because O(m*v) actually overestimates the upper bound in all cases. O(n) is exact for every instance of the tree (n being the number of nodes).
I would say that it's much more safe to call it O(n) since I don't know the exact implementation of your insert - although with a linked list both insert and the get first can be O(1) operations. (Most binary tree inserts will be O(log n) if properly implemented - sufficient information was not provided)
It should not harm you to play it safe and consider your runtime analysis O(n), but depending on who you're pitching it to, that extra variable may seem unnecessary.
HTH
edited: clarity of problem in comments helped me understand the question better, fixed nonsense
In the CLRS book, building a heap by top-down heapify has the complexity O(n). A heap can also be built by repeatedly calling insertion, which has the complexity nlg(n) in the worst case.
My question is: is there any insight why the latter method has the worse performance?
I asked this question since I feel there are simple insights behind the math. For example,
quicksort, merge sort, and heapsort are all based on reducing unnecessary comparisons, but with different methods.
quicksort: balanced partition, no need to compare left subset to right subset.
merge sort: simply compare the two minimum elements from two sub-arrays.
heapsort: if A has larger value than B, A has larger value than B's descendants, and no need to compare with them.
The main difference between the two is what direction they work: upwards (the O(n log n) algorithm) or downwards (the O(n)) algorithm.
In the O(n log n) algorithm done by making n insertions, each insertion might potentially bubble up an element from the bottom of the (current) heap all the way up to the top. So imagine that you've built all of the heap except the last full layer. Imagine that every time you do an insertion in that layer, the value you've inserted is the smallest overall value. In that case, you'd have to bubble the new element all the way up to the top of the heap. During this time, the heap has height (roughly) log n - 1, so the total number of swaps you'll have to do is (roughly) n log n / 2 - n / 2, giving a runtime of Θ(n log n) in the worst-case.
In the O(n) algorithm done by building the heap in one pass, new elements are inserted at the tops of various smaller heaps and then bubbled down. Intuitively, there are progressively fewer and fewer elements higher and higher up in the heap, so most of the work is spent on the leaves, which are lower down, than in the higher elements.
The major difference in the runtimes has to do with the direction. In the O(n log n) version, since elements are bubbled up, the runtime is bounded by the sum of the lengths of the paths from each node to the root of the tree, which is Θ(n log n). In the O(n) version, the runtime is bounded by the lengths of the paths from each node to the leaves of the tree, which is much lower (O(n)), hence the better runtime.
Hope this helps!
I already know if you try to find the item with particular key the running time of worst case
is O(n) ,nis the number of node. If you try to print out all the data items in order of their keys then the running time of worst case is O(n). If you try to search for a particular data item(you don't know the key) then the running time of worst case is O(n). However, what if the keys and data are both integers and, the input items were randomly scrambled before they were inserted. Will the worst cases of running time still the same?
In the worst-case, yes. A randomly-built BST with n nodes has a 2n-1 / n! chance of being built degenerately, which is extremely rare as n gets to any reasonable size but still possible. In that case, a lookup might take Θ(n) time because the search might need to descend all the way down to the deepest leaf.
On expectation, though, the tree height will be Θ(log n), so lookups will take expected O(log n) time.
The time to print a tree is independent of the shape of the tree, by the way. It's always Θ(n).
Hope this helps!
You might not be able to change the worst case running time of a normal BST, however, if you randomize the input(in less than O(log n) time, if you're targeting O(log n) overall), then chances of that worst case occurring are highly rare. See mathematical analysis here.
In case you are interested in guaranteed O(log n) time, you can use Balanced BSTs like Red Black Trees etc. However, time to print will still be O(n) as you still need to visit each and every node before you can print it.
The background
According to Wikipedia and other sources I've found, building a binary heap of n elements by starting with an empty binary heap and inserting the n elements into it is O(n log n), since binary heap insertion is O(log n) and you're doing it n times. Let's call this the insertion algorithm.
It also presents an alternate approach in which you sink/trickle down/percolate down/cascade down/heapify down/bubble down the first/top half of the elements, starting with the middle element and ending with the first element, and that this is O(n), a much better complexity. The proof of this complexity rests on the insight that the sink complexity for each element depends on its height in the binary heap: if it's near the bottom, it will be small, maybe zero; if it's near the top, it can be large, maybe log n. The point is that the complexity isn't log n for every element sunk in this process, so the overall complexity is much less than O(n log n), and is in fact O(n). Let's call this the sink algorithm.
The question
Why isn't the complexity for the insertion algorithm the same as that of the sink algorithm, for the same reasons?
Consider the actual work done for the first few elements in the insertion algorithm. The cost of the first insertion isn't log n, it's zero, because the binary heap is empty! The cost of the second insertion is at worst one swap, and the cost of the fourth is at worst two swaps, and so on. The actual complexity of inserting an element depends on the current depth of the binary heap, so the complexity for most insertions is less than O(log n). The insertion cost doesn't even technically reach O(log n) until after all n elements have been inserted [it's O(log (n - 1)) for the last element]!
These savings sound just like the savings gotten by the sink algorithm, so why aren't they counted the same for both algorithms?
Actually, when n=2^x - 1 (the lowest level is full), n/2 elements may require log(n) swaps in the insertion algorithm (to become leaf nodes). So you'll need (n/2)(log(n)) swaps for the leaves only, which already makes it O(nlogn).
In the other algorithm, only one element needs log(n) swaps, 2 need log(n)-1 swaps, 4 need log(n)-2 swaps, etc. Wikipedia shows a proof that it results in a series convergent to a constant in place of a logarithm.
The intuition is that the sink algorithm moves only a few things (those in the small layers at the top of the heap/tree) distance log(n), while the insertion algorithm moves many things (those in the big layers at the bottom of the heap) distance log(n).
The intuition for why the sink algorithm can get away with this that the insertion algorithm is also meeting an additional (nice) requirement: if we stop the insertion at any point, the partially formed heap has to be (and is) a valid heap. For the sink algorithm, all we get is a weird malformed bottom portion of a heap. Sort of like a pine tree with the top cut off.
Also, summations and blah blah. It's best to think asymptotically about what happens when inserting, say, the last half of the elements of an arbitrarily large set of size n.
While it's true that log(n-1) is less than log(n), it's not smaller by enough to make a difference.
Mathematically: The worst-case cost of inserting the i'th element is ceil(log i). Therefore the worst-case cost of inserting elements 1 through n is sum(i = 1..n, ceil(log i)) > sum(i = 1..n, log i) = log 1 + log 1 + ... + log n = log(1 × 2 × ... × n) = log n! = O(n log n).
Ran into the same problem yesterday. I tried coming up with some form of proof to satisfy myself. Does this make any sense?
If you start inserting from the bottom, The leaves will have constant time insertion- just copying it into the array.
The worst case running time for a level above the leaves is:
k*(n/2h)*h
where h is the height (leaves being 0, top being log(n) ) k is a constant( just for good measure ). So (n/2h) is the number of nodes per level and h is the MAXIMUM number of 'sinking' operations per insert
There are log(n) levels,
Hence, The total running time will be
Sum for h from 1 to log(n): n* k* (h/2h)
Which is k*n * SUM h=[1,log(n)]: (h/2h)
The sum is a simple Arithmetico-Geometric Progression which comes out to 2.
So you get a running time of k*n*2, Which is O(n)
The running time per level isn't strictly what i said it was but it is strictly less than that.Any pitfalls?
I am rather clear on how to programme it, but I am not sure on the definition, e.g. how to write it down in mathematics terms.
A normal heapsort is done with N elements in O notation. So O(log(n))
I just started with heapsort, so I might be a little bit off here.
But how can I for example look for a random element, when there are N elements?
And then pick that random element and delete it?
I was thinking that in a worst case - situation it has to go through the whole tree (Because the element could either be at the first place or at the last place, e.g. highest or lowest).
But how can I write that down in mathematics terms?
Heapsort's worst case performance is O(n log n), and to quote alestanis:
Max in max-heap: O(1). Min in min-heap: O(1). Opposite cases in O(n).
Here's an SO-answer explaining how to do the opposite cases in O(1) if you create the heap yourself.
To build maxheap array worstcase is O(n) and to max heapify complexcity in worst case is O(logn) so HeapSort worstCase is O(nlogn)