O(1) and O(log n), are they the same thing? - runtime

There is a question asking whether taking the minimum element from a min-heap is O(log n) time. I thought it was false, because it takes constant time: the minimum element is at the top. But the answer is true, and the instructor's explanation is that constant time is also O(log n); it was a wording issue. I am quite confused. So in practice, do we consider constant time to be a runtime of O(log n)?

It is true that anything that is O(1) is O(log n). However, it is not true that everything that is O(log n) is O(1). O is an upper bound, so you can always use a faster-growing function.
Think of it like this:
Mickey Mouse is a mouse
All mice are mammals
Hence, Mickey Mouse is a mammal
... but "mice" and "mammals" are not synonymous

It turns out that O notation is just an upper bound. Constant time can be put under any O bound, because an upper bound can be as loose as you like, maybe even all the way up to O(n!). Big O notation is not Theta (Θ).
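
As a quick check against the definition: f(n) is O(g(n)) if there are constants c > 0 and n₀ such that f(n) ≤ c·g(n) for all n ≥ n₀. For instance:

    f(n) = 5        is O(log n), since 5 ≤ 5·log₂(n) for all n ≥ 2
    f(n) = log₂(n)  is not O(1), since no constant c satisfies log₂(n) ≤ c for all n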

Taking the minimum element from a min-heap is an O(log n) operation because it consists of two steps:
Getting the minimum item, which, as you point out, is O(1) because it's known to be at the top of the heap.
Removing the minimum item. This is O(log n) because it involves moving the last item from the heap to the root, and then sifting it down to readjust the heap.
If you just want to get the minimum item without removing it, then the operation is indeed O(1). But any time you want to modify the heap (and removing the first item certainly qualifies), it's an O(log n) operation.
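
To make the two steps concrete, here is a minimal Python sketch of extract-min on a list-based binary min-heap (the function name is my own; Python's built-in heapq.heappop does the same work internally):

    def extract_min(heap):
        minimum = heap[0]                  # step 1: peek at the root, O(1)
        last = heap.pop()                  # detach the last leaf
        if heap:
            heap[0] = last                 # step 2: move it to the root...
            i, n = 0, len(heap)
            while 2 * i + 1 < n:           # ...and sift it down, O(log n)
                child = 2 * i + 1
                if child + 1 < n and heap[child + 1] < heap[child]:
                    child += 1             # pick the smaller child
                if heap[i] <= heap[child]:
                    break
                heap[i], heap[child] = heap[child], heap[i]
                i = child
        return minimum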
Nitpicker's note:
I understand the argument that inserting into a min-heap is, on average, an O(1) operation. While that can be true in some situations, the worst case is still O(log n). (Try inserting values into a heap in reverse order: every insertion is O(log n).)

Related

Amortized cost of insert/remove on min-heap

I ran into an interview question recently. No additional information is given in the question (perhaps the default implementation should be assumed...)
A sequence of n arbitrary insert and remove operations on an initially empty min-heap
(where the location of the element to be deleted is known) has an amortized cost of:
A) insert O(1), remove O(log n)
B) insert O(log n), remove O(1)
Option (B) is correct.
I was surprised when I saw the answer sheet. I know this is tricky: maybe it's the empty heap, maybe it's knowing the locations of elements for deletion... I don't know why (A) is false and why (B) is true.
When assigning amortized costs to operations on a data structure, you need to ensure that, for any sequence of operations performed, the sum of the amortized costs is always at least as big as the sum of the actual costs of those operations.
So let's take option (A), which assigns an amortized cost of O(1) to insertions and an amortized cost of O(log n) to deletions. The question we have to ask is the following: is it true that for any sequence of operations on an empty binary heap, the real cost of those operations is upper-bounded by the amortized cost of those operations? And in this case, the answer is no. Imagine that you do a sequence purely of n insertions into the heap. The actual cost of performing these operations can be Θ(n log n) if each element has to bubble all the way up to the top of the heap. However, the amortized cost of those operations, with this accounting scheme, would be O(n), since we did n operations and pretended that each one cost O(1) time. Therefore, this amortized accounting scheme doesn't work, since it would let us underestimate the work that we're doing.
On the other hand, let's look at option (B), where we assign O(log n) as our amortized insertion cost and O(1) as our amortized removal cost. Now, can we find a sequence of n operations where the real cost of those operations exceeds the amortized costs? In this case, the answer is no. Here's one way to see this. We've set the amortized cost of an insertion to be O(log n), which matches its real cost, so the only way we could end up underestimating the total is with our amortized cost of a deletion (O(1)), which is lower than the true cost of a deletion. However, that's not a problem here. In order to be able to do a delete operation, we have to have previously inserted the element that we're deleting. The combined real cost of the insertion and the deletion is O(log n) + O(log n) = O(log n), and the combined amortized cost of the insertion and the deletion is O(log n) + O(1) = O(log n). So in that sense, pretending that deletions are faster doesn't change our overall cost.
A nice intuitive way to see why the second approach works but the first one doesn't is to think about what amortized analysis is all about. The intuition behind amortization is to charge earlier operations a bit more so that future operations appear to take less time. In the case of the second accounting scheme, that's exactly what we're doing: we're shifting the cost of the deletion of an element from the binary heap back onto the cost of inserting that element into the heap in the first place. In that way, since we're only shifting work backwards, the sum of the amortized costs can't be lower than the sum of the real costs. On the other hand, in the first case, we're shifting work forward in time by making deletions pay for insertions. But that's a problem, because if we do a bunch of insertions and then never do the corresponding deletions we'll have shifted the work to operations that don't exist.
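
If you want to see the first scheme fail numerically, here is a small Python experiment (the helper name is my own) that counts the real sift-up work for n insertions in decreasing order; the total grows like n log n, so no O(1)-per-insert accounting can cover it:

    import math

    def count_sift_up_swaps(values):
        heap, swaps = [], 0
        for v in values:
            heap.append(v)                         # insert at the last leaf
            i = len(heap) - 1
            while i > 0 and heap[(i - 1) // 2] > heap[i]:
                heap[i], heap[(i - 1) // 2] = heap[(i - 1) // 2], heap[i]
                i = (i - 1) // 2                   # bubble up toward the root
                swaps += 1
        return swaps

    n = 1 << 15
    print(count_sift_up_swaps(range(n, 0, -1)))    # ~n log n swaps, far more than c*n
    print(n * math.ceil(math.log2(n)))             # upper estimate for comparison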
Because the heap is initially empty, you can't have more deletes than inserts.
An amortized cost of O(1) per deletion and O(log n) per insertion is exactly the same as an amortized cost of O(log n) for both inserts and deletes, because you can just count the deletion cost when you do the corresponding insert.
It does not work the other way around. Since you can have more inserts than deletes, there might not be enough deletes to pay the cost of each insert.

Find log n greatest entries in O(n) time

Is there a way to find the log n greatest elements in an array with n elements in O(n) time?
I would create an array-based HeapPriorityQueue, because if all elements are available, the heap can be created in O(n) time using bottom-up heap construction.
Then removing the first element of this priority queue should be O(1) time, shouldn't it?
That will be O(log n), since you also remove the first element; just looking at it without removing is O(1). Repeating this removal log n times will be O(log²(n)), which is still in O(n), so this solution will indeed meet the requirements.
Another option is to use a selection algorithm to find the log(n)-th biggest element directly, which will be O(n) as well.
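
A sketch of the heap-based approach in Python (heapq is a min-heap, so values are negated to act as a max-heap; the function name is my own):

    import heapq
    import math

    def greatest_log_n(arr):
        k = max(1, int(math.log2(len(arr))))       # log n elements to extract
        heap = [-x for x in arr]                   # negate: min-heap acts as max-heap
        heapq.heapify(heap)                        # bottom-up construction, O(n)
        return [-heapq.heappop(heap) for _ in range(k)]  # k pops, O(log n) each

    print(greatest_log_n([5, 1, 9, 3, 7, 8, 2, 6]))  # n = 8, log2(8) = 3 -> [9, 8, 7]

Total time is O(n + log²(n)) = O(n), as described above.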
Basically, yes. The creation of the heap takes O(n) and this dominates the algorithm.
Removing the first element may take either O(1), if the heap does not re-heapify after removal, or O(log n) if it does. Either way, the complexity of removing log n elements from the heap, with and without re-heapifying, would be O(log n · log n) and O(log n) respectively, both of which are sublinear.

Find minimum element of a subsequence

Given a sequence S of n integer elements, I need a function min(i,j) that finds the minimum element of the sequence between index i and index j (both inclusive) such that:
Initialization takes O(n);
Memory space O(n);
min(i,j) takes O(log(n)).
Please suggest an algorithm for this.
A segment tree is what you need, because it fulfils all your requirements:
Initialisation takes O(n) with a segment tree
Memory is also O(n)
Queries can be done in O(log n)
Besides this, the tree is dynamic and can support updates in O(log n). This means one can modify the value of some element i in O(log n) and still retrieve the minimum.
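
A minimal Python sketch of such a segment tree (class and method names are my own), meeting all three bounds plus O(log n) updates:

    class SegmentTree:
        def __init__(self, data):                  # O(n) build, O(n) space
            self.n = len(data)
            self.tree = [0] * self.n + list(data)  # leaves hold the sequence
            for i in range(self.n - 1, 0, -1):     # fill internal nodes bottom-up
                self.tree[i] = min(self.tree[2 * i], self.tree[2 * i + 1])

        def min(self, i, j):                       # min of data[i..j], inclusive, O(log n)
            lo, hi = i + self.n, j + self.n + 1    # shift to leaves, half-open interval
            best = float("inf")
            while lo < hi:
                if lo & 1:                         # lo is a right child: take it, step past
                    best = min(best, self.tree[lo])
                    lo += 1
                if hi & 1:                         # hi ends on a right child: step back, take it
                    hi -= 1
                    best = min(best, self.tree[hi])
                lo //= 2                           # climb one level
                hi //= 2
            return best

        def update(self, i, value):                # set data[i] = value in O(log n)
            i += self.n
            self.tree[i] = value
            while i > 1:
                i //= 2
                self.tree[i] = min(self.tree[2 * i], self.tree[2 * i + 1])

    t = SegmentTree([4, 2, 7, 1, 9])
    print(t.min(1, 3))    # -> 1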
This TopCoder tutorial, An <O(n), O(1)> approach, discusses your problem in more detail. In the notation <f(n), g(n)>, the approach takes f(n) complexity to set up and g(n) complexity per query.
Also, this post works through the algorithm again: Range Minimum Query <O(n), O(1)> approach (from tree to restricted RMQ).
Hope these clarify your question :)
A segment tree is just what you need (it can be built in O(n) time and one query takes O(log n) time).
Here is an article about it: http://wcipeg.com/wiki/Segment_tree.
Even though there is an algorithm that uses O(n) time for initialization and O(1) time per query, a segment tree can be a good choice because it is much simpler.

Heapsort. How is it possible to simulate the worst-case scenario?

I am rather clear on how to program it, but I am not sure about the definition, e.g. how to write it down in mathematical terms.
A normal heapsort is done with N elements in O notation, so O(log(n))?
I just started with heapsort, so I might be a little bit off here.
But how can I, for example, look for a random element when there are N elements?
And then pick that random element and delete it?
I was thinking that in a worst-case situation it would have to go through the whole tree (because the element could be in either the first or the last position, i.e. the highest or the lowest).
But how can I write that down in mathematical terms?
Heapsort's worst-case performance is O(n log n), and to quote alestanis:
Max in max-heap: O(1). Min in min-heap: O(1). Opposite cases in O(n).
Here's an SO answer explaining how to do the opposite cases in O(1) if you build the heap yourself.
Building a max-heap from an array is O(n) in the worst case, and one max-heapify (sift-down) call is O(log n) in the worst case, so heapsort's worst case is O(n log n).
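
For reference, here is a compact Python heapsort sketch (my own illustration) that shows exactly those two phases:

    def heapsort(arr):
        def sift_down(end, root):                  # O(log n) worst case
            while 2 * root + 1 < end:
                child = 2 * root + 1
                if child + 1 < end and arr[child + 1] > arr[child]:
                    child += 1                     # pick the larger child
                if arr[root] >= arr[child]:
                    return                         # heap property restored
                arr[root], arr[child] = arr[child], arr[root]
                root = child

        n = len(arr)
        for i in range(n // 2 - 1, -1, -1):        # phase 1: bottom-up build, O(n)
            sift_down(n, i)
        for end in range(n - 1, 0, -1):            # phase 2: n - 1 extractions
            arr[0], arr[end] = arr[end], arr[0]    # move current max into place
            sift_down(end, 0)                      # restore heap, O(log n)

    data = [5, 3, 8, 1, 9, 2]
    heapsort(data)
    print(data)                                    # [1, 2, 3, 5, 8, 9]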

Is there one type of set-like data structure supporting merging in O(log n) time and k-th search in O(log n) time? (n is the size of this set)

Is there one type of set-like data structure supporting merging in O(log n) time and k-th element search in O(log n) time? n is the size of this set.
You might try a Fibonacci heap, which does merge in constant amortized time and decrease-key in constant amortized time. Most of the time, such a heap is used for operations where you are repeatedly pulling the minimum value, so a check-for-membership function isn't implemented. However, it is simple enough to add one using the decrease-key logic, simply with the decrease portion removed.
If k is a constant, then any meldable heap will do, including leftist heaps, skew heaps, pairing heaps and Fibonacci heaps. Both merging and getting the first element in these structures typically take O(1) or O(lg n) amortized time, so O(k lg n) at most.
Note, however, that getting to the k-th element may be destructive in the sense that the first k - 1 items may have to be removed from the heap.
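
To illustrate, here is a skew heap sketch in Python (the node representation and names are my own): merge runs in O(log n) amortized time, and the k-th smallest element is reached by k - 1 destructive pops:

    def merge(a, b):
        # A node is [value, left, right] or None.
        if a is None:
            return b
        if b is None:
            return a
        if b[0] < a[0]:
            a, b = b, a                        # keep the smaller root on top
        a[1], a[2] = merge(a[2], b), a[1]      # merge into right child, then swap children
        return a

    def kth_smallest(heap, k):
        # Pop k - 1 times (destructive), then read the root: O(k log n).
        for _ in range(k - 1):
            heap = merge(heap[1], heap[2])
        return heap[0]

    h = None
    for v in [7, 2, 9, 4]:
        h = merge(h, [v, None, None])
    print(kth_smallest(h, 2))                  # -> 4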
If you're willing to accept amortization, you could achieve the desired bounds of O(lg n) time for both meld and search by using a binary search tree to represent each set. Melding two trees of size m and n together requires time O(m log(n / m)) where m < n. If you use amortized analysis and charge the cost of the merge to the elements of the smaller set, at most O(lg n) is charged to each element over the course of all of the operations. Selecting the k-th element of each set takes O(lg n) time as well, provided each node stores the size of its subtree.
I think you could also use a collection of sorted arrays to represent each set, but the amortization argument is a little trickier.
As stated in the other answers, you can use heaps, but getting O(lg n) for both meld and select requires some work.
Finger trees can do this and some more operations:
http://en.wikipedia.org/wiki/Finger_tree
There may be something even better if you are not restricted to purely functional data structures (a.k.a. "persistent", by which is meant not "backed up on non-volatile disk storage" but "all previous 'versions' of the data structure remain available even after 'adding' additional elements").
