Find minimum element of a subsequence - algorithm

Given a sequence S of n integer elements, I need a function min(i,j) that finds the minimum element of the sequence between index i and index j (both inclusive) such that:
Initialization takes O(n);
Memory space O(n);
min(i,j) takes O(log(n)).
Please suggest an algorithm for this.

A segment tree is what you need, because it fulfils all your requirements:
Initialisation takes O(n) with a segment tree;
Memory is also O(n);
Queries can be done in O(log n).
Besides this, the tree is dynamic and supports updates in O(log n). This means you can modify the value of some element i in O(log n) and still retrieve the minimum.
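The bounds listed above can be sketched in code. This is a minimal illustration, not a reference implementation: the class name MinSegmentTree and its methods are made up for this example, and it uses the compact iterative (bottom-up) array layout, which is one of several ways to store a segment tree.

```python
class MinSegmentTree:
    """Iterative segment tree for range-minimum queries (hypothetical name).

    Assumes a non-empty sequence of comparable values.
    """

    def __init__(self, values):
        self.n = len(values)
        # Leaves live at indices n..2n-1; node k has children 2k and 2k+1.
        self.tree = [0] * self.n + list(values)
        for k in range(self.n - 1, 0, -1):  # O(n) bottom-up build
            self.tree[k] = min(self.tree[2 * k], self.tree[2 * k + 1])

    def update(self, i, value):
        """Set element i to value and repair its ancestors: O(log n)."""
        k = i + self.n
        self.tree[k] = value
        k //= 2
        while k >= 1:
            self.tree[k] = min(self.tree[2 * k], self.tree[2 * k + 1])
            k //= 2

    def query(self, i, j):
        """Minimum of elements i..j, both inclusive: O(log n)."""
        lo, hi = i + self.n, j + self.n + 1  # half-open range [lo, hi)
        best = self.tree[lo]
        while lo < hi:
            if lo & 1:  # lo is a right child: take it, then step right
                best = min(best, self.tree[lo])
                lo += 1
            if hi & 1:  # hi is a right child: step left, then take it
                hi -= 1
                best = min(best, self.tree[hi])
            lo //= 2
            hi //= 2
        return best
```

For example, MinSegmentTree([5, 2, 7, 1, 9, 3]).query(2, 4) looks at the elements 7, 1, 9 and returns 1. The array holds 2n entries, so memory stays O(n).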

This TopCoder tutorial: An < O(n), O(1) > approach discusses your problem in more detail. In the notation <f(n), g(n)>, the approach takes f(n) time to set up and g(n) time per query.
Also, this post goes over the algorithm again: Range Minimum Query <O(n), O(1)> approach (from tree to restricted RMQ).
Hope they clarify your question :)

A segment tree is just what you need (it can be built in O(n) time and one query takes O(log n) time).
Here is an article about it: http://wcipeg.com/wiki/Segment_tree.
Even though there is an algorithm that uses O(n) time for initialization and O(1) time per query, a segment tree can be a good choice because it is much simpler.

Related

Analysis of speed and memory for heapsort

I tried googling and wiki'ing these questions but can't seem to find concrete answers. Most of what I found involved using proofs with the master theorem, but I'm hoping for something in plain English that can be more intuitively remembered. Also I am not in school and these questions are for interviewing.
MEMORY:
What exactly does it mean to determine big-o in terms of memory usage? For example, why is heapsort considered to run with O(1) memory when you have to store all n items? Is it because you are creating only one structure for the heap? Or is it because you know its size and so you can create it on the stack, which is always constant memory usage?
SPEED:
How is the creation of the heap done in O(n) time if adding elements is done in O(1) but percolating is done in O(logn)? Wouldn't that mean you do n inserts at O(1) making it O(n) and percolating after each insert is O(logn). So O(n) * O(logn) = O(nlogn) in total. I also noticed most implementations of heap sort use a heapify function instead of percolating to create the heap? Since heapify does n comparisons at O(logn) that would be O(nlogn) and with n inserts at O(1) we would get O(n) + O(nlogn) = O(nlogn)? Wouldn't the first approach yield better performance than the second with small n?
I kind of assumed this above, but is it true that doing an O(1) operation n times would result in O(n) time? Or does n * O(1) = O(1)?
So I found some useful info about building a binary heap from wikipedia: http://en.wikipedia.org/wiki/Binary_heap#Building_a_heap.
I think my main source of confusion was how "inserting" into a heap is both O(1) and O(logn), even though the first shouldn't be called an insertion and maybe just a build step or something. So you wouldn't use heapify anymore after you've already created your heap, instead you'd use the O(logn) insertion method.
The method of adding items iteratively while maintaining the heap property runs in O(nlogn), whereas creating the heap without respecting the heap property and then heapifying actually runs in O(n). The reason for this isn't very intuitive and requires a proof, so I was wrong about that.
The removal step to get the ordered items is the same cost, O(nlogn), after each method has a heap that respects the heap property.
So in the end you'd have either an O(1) + O(n) + O(nlogn) = O(nlogn) for the build heap method, and an O(nlogn) + O(nlogn) = O(nlogn) for the insertion method. Obviously the first is preferable, especially for small n.
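The difference between the two build methods can be observed by counting swaps. The sketch below is only an illustration under assumptions: the function names are invented for this example, it builds a max-heap, and swap counts stand in for running time.

```python
def sift_down(a, i, n):
    """Move a[i] down within a[:n] until the max-heap property holds.

    Returns the number of swaps performed.
    """
    swaps = 0
    while True:
        largest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:
            return swaps
        a[i], a[largest] = a[largest], a[i]
        i = largest
        swaps += 1


def build_bottom_up(values):
    """Heapify: copy everything first, then sift down from the last parent."""
    a = list(values)
    swaps = 0
    for i in range(len(a) // 2 - 1, -1, -1):
        swaps += sift_down(a, i, len(a))
    return a, swaps


def build_by_insertion(values):
    """Insert one item at a time, percolating each new item up."""
    a = []
    swaps = 0
    for v in values:
        a.append(v)
        i = len(a) - 1
        while i > 0 and a[(i - 1) // 2] < a[i]:
            parent = (i - 1) // 2
            a[i], a[parent] = a[parent], a[i]
            i = parent
            swaps += 1
    return a, swaps
```

On ascending input, which is the worst case for insertion into a max-heap, every inserted item percolates all the way to the root, so the insertion count grows like n log n. The bottom-up count stays below n because most nodes sit near the bottom of the tree and can only sink a level or two.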

Heapsort. How is it possible to simulate worst-case scenario?

I am rather clear on how to programme it, but I am not sure on the definition, e.g. how to write it down in mathematics terms.
A normal heapsort is done with N elements in O notation. So O(log(n))
I just started with heapsort, so I might be a little bit off here.
But how can I for example look for a random element, when there are N elements?
And then pick that random element and delete it?
I was thinking that in a worst case - situation it has to go through the whole tree (Because the element could either be at the first place or at the last place, e.g. highest or lowest).
But how can I write that down in mathematics terms?
Heapsort's worst case performance is O(n log n), and to quote alestanis:
Max in max-heap: O(1). Min in min-heap: O(1). Opposite cases in O(n).
Here's an SO-answer explaining how to do the opposite cases in O(1) if you create the heap yourself.
Building the max-heap array is O(n) in the worst case, and each max-heapify call is O(log n) in the worst case, so heapsort's worst case is O(n log n).
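The two phases named above (an O(n) build followed by n - 1 extractions of O(log n) each) can be sketched as follows. This is a minimal in-place version written for illustration; the function names are this example's own. Apart from a few index variables it uses no storage beyond the input list, which is the sense in which heapsort needs only O(1) extra memory.

```python
def heapsort(a):
    """Sort the list a in place in ascending order using a max-heap."""
    n = len(a)

    def sift_down(i, end):
        # Restore the max-heap property for the subtree rooted at i in a[:end].
        while True:
            largest = i
            left, right = 2 * i + 1, 2 * i + 2
            if left < end and a[left] > a[largest]:
                largest = left
            if right < end and a[right] > a[largest]:
                largest = right
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    # Phase 1: build the heap bottom-up, O(n) in total.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)

    # Phase 2: move the current maximum to the end and shrink the heap.
    # Each of the n - 1 rounds costs O(log n).
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
```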

Intuition behind splay tree (self balancing trees)

I am reading the basics of splay trees. The amortized cost of an operation is O(log n) over n operations. Some rough basic idea is that when you access a node, you splay it i.e. you take it to root so next time this is quickly accessed and also if the node was deep, it enhances balance-ness of tree.
I don't understand how the tree can perform amortized O(log n) for this sample input:
Say a tree of n nodes is already built. My next n operations are n reads. I access a deep node, say at depth n. This takes O(n). True, after this access the tree will become more balanced. But say every time I access the current deepest node. This will never be less than O(log n). Then how can we ever compensate for the first costly O(n) operation and bring the amortized cost of each read down to O(log n)?
Thanks.
Assuming your analysis is correct and the operations are O(log(n)) per access and O(n) the first time...
If you always access the bottommost element (using some kind of worst-case oracle), a sequence of a accesses will take O(a*log(n) + n). And thus the amortized cost per operation is O((a*log(n) + n)/a)=O(log(n) + n/a) or just O(log(n)) as the number of accesses grows large.
This is the definition of asymptotic average-case performance/time/space, also called "amortized performance/time/space". You are accidentally thinking that a single O(n) step means all steps are at least O(n); one such step is only a constant amount of work in the long run; the O(...) is hiding what's really going on, which is taking the limit of [total amount of work]/[queries]=[average ("amortized") work per query].
This will never be less than O(log n).
It has to be in order to get O(log n) average performance. To get intuition, the following website may be good: http://users.informatik.uni-halle.de/~jopsi/dinf504/chap4.shtml specifically the image http://users.informatik.uni-halle.de/~jopsi/dinf504/splay_example.gif -- it seems that while performing the O(n) operations, you move the path you searched scrunching it towards the top of the tree. You probably only have a finite number of such O(n) operations to perform until the entire tree is balanced.
Here's another way to think about it:
Consider an unbalanced binary search tree. You can spend O(n) time balancing it. Assuming you don't add elements to it*, it takes O(log(n)) amortized time per query to fetch an element. The balancing setup cost is included in the amortized cost because it is effectively a constant which, as demonstrated in the equations in the answer, disappears (is dwarfed) by the infinite amount of work you are doing. (*if you do add elements to it, you need a self-balancing binary search tree, one of which is a splay tree)
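The expression O(log(n) + n/a) from the first answer can be checked with a few numbers. The sketch below is not a splay tree. It only evaluates that cost model, with all hidden constants assumed to be 1 and a made-up function name.

```python
import math


def amortized_cost(n, accesses):
    """Average cost per access under the model (a * log2(n) + n) / a.

    One-off cost n for the expensive access plus log2(n) for each access,
    with constant factors assumed to be 1.
    """
    total = accesses * math.log2(n) + n
    return total / accesses
```

With n = 1024, a single access averages 1034 units, while a million accesses average roughly 10.001 units each, which is essentially log2(1024) = 10. The one-off O(n) term is divided by the number of accesses and fades away.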

Is there one type of set-like data structure supporting merging in O(logn) time and k-th search in O(logn) time?(n is the size of this set)

Is there one type of set-like data structure supporting merging in O(logn) time and k-th element search in O(logn) time? n is the size of this set.
You might try a Fibonacci heap which does merge in constant amortized time and decrease key in constant amortized time. Most of the time, such a heap is used for operations where you are repeatedly pulling the minimum value, so a check-for-membership function isn't implemented. However, it is simple enough to add one using the decrease key logic, and simply removing the decrease portion.
If k is a constant, then any meldable heap will do this, including leftist heaps, skew heaps, pairing heaps and Fibonacci heaps. Both merging and getting the first element in these structures typically take O(1) or O(lg n) amortized time, so O( k lg n) maximum.
Note, however, that getting to the k'th element may be destructive in the sense that the first k-1 items may have to be removed from the heap.
If you're willing to accept amortization, you could achieve the desired bounds of O(lg n) time for both meld and search by using a binary search tree to represent each set. Melding two trees of size m and n together requires time O(m log(n / m)) where m < n. If you use amortized analysis and charge the cost of the merge to the elements of the smaller set, at most O(lg n) is charged to each element over the course of all of the operations. Selecting the kth element of each set takes O(lg n) time as well.
I think you could also use a collection of sorted arrays to represent each set, but the amortization argument is a little trickier.
As stated in the other answers, you can use heaps, but getting O(lg n) for both meld and select requires some work.
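The charging argument in the binary search tree answer (bill each merge to the elements of the smaller set) can be illustrated with a small simulation. This sketch makes a substitution: it uses Python's built-in hash sets rather than search trees, so it does not support k-th element selection and only shows why no element is charged more than about lg n times. The function name is hypothetical.

```python
from collections import deque


def max_moves_small_to_large(n):
    """Merge n singleton sets into one, always moving the smaller set's
    elements into the larger one. Returns the most times any single
    element was moved.
    """
    moves = [0] * n
    pending = deque({i} for i in range(n))
    while len(pending) > 1:
        a, b = pending.popleft(), pending.popleft()
        if len(a) < len(b):
            a, b = b, a  # a is now the larger (or equal-sized) set
        for x in b:  # the merge cost is charged to the smaller set's elements
            a.add(x)
            moves[x] += 1
        pending.append(a)
    return max(moves)
```

Every time an element is moved, the set it lands in is at least twice the size of the set it left, so an element can be moved at most floor(lg n) times before its set contains everything.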
Finger trees can do this and some more operations:
http://en.wikipedia.org/wiki/Finger_tree
There may be something even better if you are not restricted to purely functional data structures (also known as "persistent", which here does not mean "backed up on non-volatile disk storage" but "all previous 'versions' of the data structure remain available even after 'adding' additional elements").

How to compute the algorithmic space complexity

I am reviewing my data structures and algorithm analysis lesson, and I have a question about how to determine the space complexity of the merge sort and quick sort algorithms.
The depth of recursion is only O(log n) for linked list merge-sort
The amount of extra storage space needed for contiguous quick sort is O(n).
My thoughts:
Both use divide-and-conquer strategy, so I guess the space complexity of linked list merge sort should be same as the contiguous quick sort. Actually I opt for O(log n) because before every iteration or recursion call the list is divided in half.
Thanks for any pointers.
The worst case depth of recursion for quicksort is not (necessarily) O(log n), because quicksort doesn't divide the data "in half", it splits it around a pivot which may or may not be the median. It's possible to implement quicksort to address this[*], but presumably the O(n) analysis was of a basic recursive quicksort implementation, not an improved version. That would account for the discrepancy between what you say in the blockquote, and what you say under "my thoughts".
Other than that I think your analysis is sound - neither algorithm uses any extra memory other than a fixed amount per level of recursion, so depth of recursion dictates the answer.
Another possible way to account for the discrepancy, I suppose, is that the O(n) analysis is just wrong. Or, "contiguous quicksort" isn't a term I've heard before, so if it doesn't mean what I think it does ("quicksorting an array"), it might imply a quicksort that's necessarily space-inefficient in some sense, such as returning an allocated array instead of sorting in-place. But it would be silly to compare quicksort and mergesort on the basis of the depth of recursion of the mergesort vs. the size of a copy of the input for the quicksort.
[*] Specifically, instead of calling the function recursively on both parts, you put it in a loop. Make a recursive call on the smaller part, and loop around to do the bigger part, or equivalently push (pointers to) the larger part onto a stack of work to do later, and loop around to do the smaller part. Either way, you ensure that the depth of the stack never exceeds log n, because each chunk of work not put on the stack is at most half the size of the chunk before it, down to a fixed minimum (1 or 2 if you're sorting purely with quicksort).
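The footnote's technique (recurse on the smaller part, loop on the larger part) can be sketched as follows. The partition scheme is an assumption of this example: it uses a Lomuto partition with the last element as pivot, which the answer does not specify. The function also returns the maximum recursion depth it reached, purely so the bound can be observed.

```python
def partition(a, lo, hi):
    """Lomuto partition of a[lo..hi] around a[hi]; returns the pivot's final index."""
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i


def quicksort(a, lo=0, hi=None, depth=1):
    """Sort a[lo..hi] in place; return the maximum recursion depth reached."""
    if hi is None:
        hi = len(a) - 1
    max_depth = depth
    while lo < hi:
        m = partition(a, lo, hi)
        if m - lo < hi - m:
            # Left part is smaller: recurse on it, then loop on the right part.
            max_depth = max(max_depth, quicksort(a, lo, m - 1, depth + 1))
            lo = m + 1
        else:
            # Right part is smaller (or equal): recurse on it, loop on the left.
            max_depth = max(max_depth, quicksort(a, m + 1, hi, depth + 1))
            hi = m - 1
    return max_depth
```

Already-sorted input is the worst case for this pivot choice: the running time degrades to O(n^2), but the recursion depth stays small, because each recursive call receives at most half of the current range.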
I'm not really familiar with the term "contiguous quicksort". But quicksort can have either O(n) or O(log n) space complexity depending on how it is implemented.
If it is implemented as follows:
quicksort(start, stop) {
    m = partition(start, stop);
    quicksort(start, m-1);
    quicksort(m+1, stop);
}
Then the space complexity is O(n), not O(log n) as is commonly believed.
This is because you are pushing onto the stack twice at each level, so the space complexity is determined from the recurrence:
T(n) = 2*T(n/2)
Assuming the partitioning divides the array into 2 equal parts (best case). The solution to this according to the Master Theorem is T(n) = O(n).
If we replace the second quicksort call with tail recursion in the code snippet above, then you get T(n) = T(n/2) and therefore T(n) = O(log n) (by case 2 of the Master theorem).
Perhaps the "contiguous quicksort" refers to the first implementation because the two quicksort calls are next to each other, in which case the space complexity is O(n).
