quick sort implementation using stack - algorithm

I am reading a quick sort implementation using a stack at the following link.
link
My question is regarding the following paragraph.
The policy of putting the larger of the small subfiles on the stack
ensures that each entry on the stack is no more than one-half of the
size of the one below it, so that the stack needs to contain room for
only about lg N entries. This maximum stack usage occurs when the
partition always falls at the center of the file. For random files,
the actual maximum stack size is much lower; for degenerate files it
is likely to be small.
This technique does not necessarily work in a truly recursive
implementation, because it depends on end- or tail-recursion removal.
If the last action of a procedure is to call another procedure, some
programming environments will arrange things such that local variables
are cleared from the stack before, rather than after, the call.
Without end-recursion removal, we cannot guarantee that the stack size
will be small for quicksort.
What does the author mean by "that each entry on the stack is no more than one-half of the size of the one below it"? Could you please give an example of this?
How did the author come to the conclusion that the stack needs space for only about lg N entries?
What does the author mean by "Without end-recursion removal, we cannot guarantee that the stack size will be small for quicksort"?
Thanks for your time and help.

The policy of putting the larger of the small subfiles on the stack ensures that each entry on the stack is no more than one-half of the size of the one below it,
That is not quite true. Consider you want to sort a 100-element array, and the first pivot goes right in the middle. Then you have a stack
49
50
then you pop the 49-element part off the stack, partition, and push the two parts on the stack. Let's say the choice of pivot was not quite as good this time, there were 20 elements not larger than the pivot. Then you'd get the stack
20
28
50
and each stack entry is more than half of the one below.
But that cannot continue forever, and we have
During the entire sorting, if stack level k is occupied, its size is at most total_size / (2^k).
That is obviously true when the sorting begins, since then there is only one element on the stack, at level 0, which is the entire array of size total_size.
Now, assume the stated property holds on entering the loop (while(!stack.empty())).
A subarray of length s is popped from stack level m. If s <= 1, nothing else is done before the next loop iteration, and the invariant continues to hold. Otherwise, if s >= 2, partitioning it yields two new subarrays to be pushed on the stack, with s-1 elements together (the pivot is already in its final position). The smaller of those two then has a size smaller_size <= (s-1)/2, and the larger has a size larger_size <= s-1. Stack level m will be occupied by the larger of the two, and we have
larger_size <= s-1 < s <= total_size / (2^m)
smaller_size <= (s-1)/2 < s/2 <= total_size / (2^(m+1))
for the stack levels m resp. m+1 at the end of the loop body. The invariant holds for the next iteration.
Since at most one subarray of size 0 is ever on the stack (it is then immediately popped off in the next iteration), there are never more than lg total_size + 1 stack levels occupied.
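The loop described above can be sketched roughly as follows. This is only an illustration under assumptions that are not in the quoted text: the names quicksortWithStack and partitionRange are made up, and the partition is a simple Lomuto scheme with the last element as pivot. The function returns the largest number of entries the explicit stack ever held, so the bound can be checked.

```cpp
#include <algorithm>
#include <cstddef>
#include <stack>
#include <utility>
#include <vector>

// Lomuto partition of a[lo..hi] (inclusive), pivot = a[hi] (an assumption).
// Returns the final index of the pivot.
std::size_t partitionRange(std::vector<int>& a, std::size_t lo, std::size_t hi) {
    int pivot = a[hi];
    std::size_t i = lo;
    for (std::size_t j = lo; j < hi; ++j) {
        if (a[j] < pivot) std::swap(a[i++], a[j]);
    }
    std::swap(a[i], a[hi]);
    return i;
}

// Sorts a and returns the maximum number of entries on the explicit stack.
std::size_t quicksortWithStack(std::vector<int>& a) {
    std::stack<std::pair<std::size_t, std::size_t>> work; // half-open [lo, hi)
    work.emplace(std::size_t{0}, a.size());
    std::size_t maxDepth = 1;
    while (!work.empty()) {
        std::size_t lo = work.top().first;
        std::size_t hi = work.top().second;
        work.pop();
        if (hi - lo < 2) continue; // nothing to sort
        std::size_t p = partitionRange(a, lo, hi - 1);
        std::size_t leftSize = p - lo;
        std::size_t rightSize = hi - (p + 1);
        // Push the larger part first, so the smaller part ends up on top
        // and is processed in the next iteration.
        if (leftSize >= rightSize) {
            work.emplace(lo, p);
            work.emplace(p + 1, hi);
        } else {
            work.emplace(p + 1, hi);
            work.emplace(lo, p);
        }
        maxDepth = std::max(maxDepth, work.size());
    }
    return maxDepth;
}
```

Swapping the two branches (smaller part pushed first) still sorts correctly, but the returned depth can then grow linearly on unlucky inputs.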
Regarding
What does author mean by "Without end-recursion removal, we cannot guarantee that the stack size will be small for quicksort" ?
In a recursive implementation, you can have deep recursion, and when the stack frame is not reused for the end-call, you may need linear stack space. Consider a stupid pivot selection, always choosing the first element as pivot, and an already sorted array.
[0,1,2,3,4]
partition, pivot goes in position 0, the smaller subarray is empty. The recursive call for the larger subarray [1,2,3,4], allocates a new stack frame (so there are now two stack frames). Same principle, the next recursive call with the subarray [2,3,4] allocates a third stack frame, etc.
If one has end-recursion removal, i.e. the stack frame is reused, one has the same guarantees as with the manual stack above.

I will try to answer your question (hopefully I am not wrong)...
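In every step, quicksort divides its input into two parts. If the larger part is always pushed first and the smaller one is processed next, the part on top of the stack is at most half the size of the range it came from, so about log N entries are enough. This explains your first and second question ("each entry on the stack is no more than one-half" and "logN" entries).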

Related

Amortized analysis of an ordered stack

I was working through a tutorial sheet I found online and came across a question I couldn't figure out how to solve.
http://www.bowdoin.edu/~ltoma/teaching/cs231/fall08/Problems/amortized.pdf
An ordered stack S is a stack where the elements appear in increasing order. It supports the following operations:
Init(S): Create an empty ordered stack.
Pop(S): Delete and return the top element from the ordered stack.
Push(S, x): Insert x at top of the ordered stack and re-establish the increasing
order by repeatedly removing the element immediately below x until x is the
largest element on the stack.
Destroy(S): Delete all elements on the ordered stack.
Argue that the amortized running time of all operations is O(1). Can anyone help?
I think what you can do is this:
First, show that Init(S) and Pop(S) take O(1) time in the worst case.
Then, for Push(S, x), whose worst case is O(n), argue that a push costs O(1) unless the number pushed is smaller than elements already on the stack. Every element removed in that case (or by Destroy) was pushed exactly once earlier and can be removed only once, so the cost of removing it can be charged to the push that inserted it. Any sequence of n operations therefore costs O(n) in total, which is O(1) amortized per operation.
(do comment if something is not correct)
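A small sketch can make the accounting concrete. The class OrderedStack and its counters are hypothetical names used only for illustration, and it assumes that Push removes every element that is not smaller than x. The point is that the removal counter can never exceed the push counter, whatever the sequence of operations.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical ordered stack that counts how much work it does.
class OrderedStack {
public:
    // Push x, first removing every element that is not smaller than x
    // (assumption: the stack is kept strictly increasing).
    void push(int x) {
        while (!items_.empty() && items_.back() >= x) {
            items_.pop_back();
            ++removals_;
        }
        items_.push_back(x);
        ++pushes_;
    }

    // Precondition: the stack is not empty.
    int pop() {
        int x = items_.back();
        items_.pop_back();
        ++removals_;
        return x;
    }

    // Delete all elements, counting one removal per element.
    void destroy() {
        removals_ += items_.size();
        items_.clear();
    }

    std::size_t size() const { return items_.size(); }
    std::size_t pushes() const { return pushes_; }
    std::size_t removals() const { return removals_; }

private:
    std::vector<int> items_;
    std::size_t pushes_ = 0;
    std::size_t removals_ = 0;
};
```

Because every removal is paid for by an earlier push, the total work after any sequence of operations is at most about twice the number of pushes.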

Designing a data structure acts like improved stack

I have been asked to design a data structure which will act like a stack, not limited in size, which will support the following methods, with given run-time restrictions.
push(s) - push s to the data structure - O(1)
pop() - remove and return the last element inserted O(1)
middle() - return the element (without removing) with index n/2 by insertion order where n is the current amount of elements in the data structure. - O(1)
peekAt(k) - return the kth element by insertion order (the bottom of the stack is k=1) - O(log(k))
I thought of using a linked list and always keeping a pointer to the middle element, but then I had a problem implementing peekAt(k). Any ideas how I can implement this?
If the O(1) restriction can be relaxed to amortized O(1), a typical variable-length array implementation will do. When you allocate space for the array of current length N, reserve say N extra space at the end. Once you grow beyond this border, reallocate with the new size following the same strategy, copy the old contents there and free the old memory. Of course, you will have to maintain both the allocated and the actual length of your stack. The operations middle and peekAt can be done trivially in O(1).
Conversely, you may also shrink the array if it occupies less than 1/4 of the allocated space if the need arises.
All operations will be amortized O(1). The precise meaning of this is that for any K stack operations since the start, you will have to execute O(K) instructions in total. In particular, the number of reallocations after N pushes will be O(log(N)), and the total amount of elements copied due to reallocation will be no more than 1 + 2 + 4 + 8 ... + N <= 2N = O(N).
This can be done asymptotically better, requiring non-amortized O(1) for each operation, provided that the memory manager's allocate and free perform in O(1) for any size. The basic idea is to maintain the currently allocated stack and the 2x bigger future stack, and to start preparing the bigger copy in advance. Each time you push a value onto the present stack, copy two more elements into the future stack. When the present stack is full, all of its elements will be already copied into the future stack. After that, discard the present stack, declare that the future stack is now the present stack, and allocate a new future stack (currently empty, but allocated 2x bigger than the current one).
If you also need shrinking, you can maintain a smaller copy in a similar fashion when your stack occupies between 1/2 and 1/4 of the allocated space.
As you can see by the description, while this may be theoretically better, it is generally slower since it has to maintain two copies of the stack instead of one. However, this approach can be useful if you have a strict realtime O(1) requirement for each operation.
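As a rough sketch of the simpler, amortized variant (not the real-time two-copy scheme), the following hypothetical ArrayStack doubles its buffer when it is full. The class name, the 0-based interpretation of "index n/2" in middle(), and the omission of shrinking and bounds checks are all assumptions made for brevity.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical growable-array stack with amortized O(1) push.
class ArrayStack {
public:
    void push(int value) {
        if (size_ == capacity_) grow();
        data_[size_++] = value;
    }

    // Precondition: the stack is not empty.
    int pop() { return data_[--size_]; }

    // Assumption: "index n/2" is read as the 0-based index size/2.
    int middle() const { return data_[size_ / 2]; }

    // k = 1 is the bottom of the stack; precondition: 1 <= k <= size().
    int peekAt(std::size_t k) const { return data_[k - 1]; }

    std::size_t size() const { return size_; }

private:
    // Double the capacity and copy the old contents over.
    void grow() {
        std::size_t newCapacity = capacity_ == 0 ? 1 : 2 * capacity_;
        std::unique_ptr<int[]> bigger(new int[newCapacity]);
        for (std::size_t i = 0; i < size_; ++i) bigger[i] = data_[i];
        data_ = std::move(bigger);
        capacity_ = newCapacity;
    }

    std::unique_ptr<int[]> data_;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
};
```

With a contiguous buffer, middle() and peekAt(k) are plain index lookups, which is why they come out as O(1) here.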
The implementation using a doubly linked list makes sense to me. Push and Pop would be implemented as it is usually done for a stack; The access to the 'middle' element would be done with an additional reference which would be updated on Push and Pop, depending on whether the number of contained elements would change from even to odd or vice versa. The peekAt Operation could be done using binary search.

recursion and memory usage in it

I recently saw a question that requires reversing a stack in O(1) space.
1) The stack is not necessarily an array, so we can't access elements by index.
2) The number of elements is not known.
I came up with the code below and it is working, but I am not convinced that it is O(1) space, because I have declared "int temp" exactly n times (suppose there are initially n elements in the stack), so it has taken O(n) space.
Please tell me whether I am right, and is there a better way to solve this?
code:
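#include <stack>
using namespace std;

stack<int> st;

// Put x underneath everything currently on the stack.
void insertAtBottom(int x)
{
    if (st.empty()) {
        st.push(x);
        return;
    }
    int temp = st.top();
    st.pop();
    insertAtBottom(x);
    st.push(temp);
}

// Reverse the stack. Pushing temp straight back after the recursive call
// would restore the original order, so it is inserted at the bottom instead.
// Every active call keeps its own temp, so the call stack holds O(n) values.
void rec()
{
    if (st.empty())
        return;
    int temp = st.top();
    st.pop();
    rec();
    insertAtBottom(temp);
}

int main()
{
    st.push(1);
    st.push(2);
    st.push(3);
    st.push(4);
    rec();
}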
You can build 2 stacks "back to back" in a single array with n elements. Basically stack #1 is a "normal" stack, and stack #2 grows "downwards" from the end of the array.
Whenever the 2 stacks together contain all n elements, there is no gap between them, so for example popping an element from stack #1 and immediately pushing it onto stack #2 in this situation can be accomplished without even moving any data: just move the top pointer for stack #1 down, and the top pointer for stack #2 physically down (but logically up).
Suppose we start with all elements in stack #1. Now you can pop all of them except the last one, immediately pushing each onto stack #2. The last element you can pop off and store in a temporary place x (O(1) extra storage, which we are allowed). Now pop all n-1 items in stack #2, pushing each in turn back onto stack #1, and then finally push x back onto (the now-empty) stack #2. At this point, we have succeeded in deleting the bottom element in stack #1, and putting it at the top of (well, it's the only element in) stack #2.
Now just recurse: pretend we only have n-1 items, and solve this smaller problem. Keep recursing until all elements have been pushed onto stack #2 in reverse order. In one final step, pop each of them off and push them back onto stack #1.
All in all, O(n^2) steps are required, but we manage with just O(1) space.
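The back-to-back idea can be sketched like this. It is only an illustration: reverseInPlace and the helper lambdas are made-up names, and a std::vector stands in for the fixed array that holds both stacks. Stack #1 occupies a[0..top1) and stack #2 occupies a[top2..n).

```cpp
#include <cstddef>
#include <vector>

// Reverses the contents of stack #1 (stored bottom-to-top in a) using only
// push/pop on two back-to-back stacks and one temporary variable.
void reverseInPlace(std::vector<int>& a) {
    const std::size_t n = a.size();
    std::size_t top1 = n; // stack #1 grows upwards from index 0
    std::size_t top2 = n; // stack #2 grows downwards from index n
    auto pop1 = [&]() { return a[--top1]; };
    auto push1 = [&](int v) { a[top1++] = v; };
    auto pop2 = [&]() { return a[top2++]; };
    auto push2 = [&](int v) { a[--top2] = v; };

    // Each round moves the current bottom element of stack #1 to the top
    // of stack #2.
    for (std::size_t remaining = n; remaining > 0; --remaining) {
        for (std::size_t i = 1; i < remaining; ++i) push2(pop1());
        int x = pop1(); // bottom element of stack #1 (the O(1) extra storage)
        for (std::size_t i = 1; i < remaining; ++i) push1(pop2());
        push2(x);
    }

    // Stack #2 now holds everything; move it back onto stack #1.
    while (top2 < n) push1(pop2());
}
```

Each of the n rounds moves up to n elements back and forth, which is where the O(n^2) step count comes from.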
The only way I can think of is to write your own stack using a linked list and then swap the head/tail pointers and a "direction" indicator which tells your routine to go forward or backwards when you push/pop. Any other way I can think of would be O(n).
If you know upper limit of n you can also use an array/index instead of a list.
Whether it makes sense to do so is probably dependent on the reason for doing so and the language.

Quicksort - which sub-part should be sorted first?

I am reading some text which claims this regarding the ordering of the two recursive Quicksort calls:
... it is important to call the smaller subproblem first, this in conjunction with tail recursion ensures that the stack depth is log n.
I am not at all sure what that means, why should I call Quicksort on the smaller subarray first?
Look at quicksort as an implicit binary tree. The pivot is the root, and the left and right subtrees are the partitions you create.
Now consider doing a depth first search of this tree. The recursive calls actually correspond to doing a depth first search on the implicit tree described above. Also assume that the tree always has the smaller sub-tree as the left child, so the suggestion is in fact to do a pre-order on this tree.
Now suppose you implement the preorder using a stack, where you push only the left child (but keep the parent on the stack) and when the time comes to push the right child (say you maintained a state where you knew whether a node has its left child explored or not), you replace the top of stack, instead of pushing the right child (this corresponds to the tail recursion part).
The maximum stack depth is the maximum 'left depth': i.e. if you mark each edge going to a left child as 1, and going to a right child as 0, then you are looking at the path with maximum sum of edges (basically you don't count the right edges).
Now since the left sub-tree has no more than half the elements, each time you go left (i.e. traverse an edge marked 1), you are reducing the number of nodes left to explore by at least half.
Thus the maximum number of edges marked 1 that you see, is no more than log n.
Thus the stack usage is no more than log n, if you always pick the smaller partition, and use tail recursion.
Some language implementations eliminate tail calls. This means that if you write f(x) { ... ... .. ... .. g(x)} then the final call, to g(x), isn't implemented with a function call at all, but with a jump, so that the final call does not use any additional stack space.
Quicksort splits the data to be sorted into two sections. If you always handle the shorter section first, then each call that consumes stack space has a section of data to sort that is at most half the size of the recursive call that called it. So if you start off with 10 elements to sort, the stack at its deepest will have a call sorting those 10 elements, and then a call sorting at most 5 elements, and then a call sorting at most 2 elements, and then a call sorting at most 1 element - and then, for 10 elements, the stack cannot go any deeper - the stack size is limited by the log of the data size.
If you didn't worry about this, you could end up with the stack holding a call sorting 10 elements, and then a call sorting 9 elements, and then a call sorting 8 elements, and so on, so that the stack was as deep as the number of elements to be sorted. But this can't happen with tail recursion if you sort the short sections first, because although you can split 10 elements into 1 element and 9 elements, the call sorting 9 elements is done last of all and implemented as a jump, which doesn't use any more stack space - it reuses the stack space previously used by its caller, which was just about to return anyway.
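A sketch of this, assuming the compiler cannot be relied on to turn the last call into a jump: the tail call for the longer section is written as a loop by hand, and only the shorter section is handled by a real recursive call. The function name quicksortSmallerFirst, the depth parameter, and the last-element pivot are illustrative choices, not part of any quoted text.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts a[lo, hi) and returns the deepest recursion level that was reached.
std::size_t quicksortSmallerFirst(std::vector<int>& a, std::size_t lo,
                                  std::size_t hi, std::size_t depth = 1) {
    std::size_t deepest = depth;
    while (hi - lo > 1) {
        // Lomuto partition with the last element as pivot (an assumption).
        int pivot = a[hi - 1];
        std::size_t p = lo;
        for (std::size_t j = lo; j + 1 < hi; ++j) {
            if (a[j] < pivot) std::swap(a[p++], a[j]);
        }
        std::swap(a[p], a[hi - 1]);

        if (p - lo < hi - (p + 1)) {
            // Left section is shorter: recurse into it, then loop on the right.
            deepest = std::max(deepest,
                               quicksortSmallerFirst(a, lo, p, depth + 1));
            lo = p + 1;
        } else {
            // Right section is shorter (or equal): recurse into it, loop on the left.
            deepest = std::max(deepest,
                               quicksortSmallerFirst(a, p + 1, hi, depth + 1));
            hi = p;
        }
    }
    return deepest;
}
```

On an already sorted input, where this pivot choice is as bad as it gets, the returned depth stays tiny because the long section never gets its own stack frame.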
Ideally, the list is partitioned into two sublists of roughly similar size. It doesn't matter much which sublist you work on first.
But suppose that on a bad day the list partitions in the most lopsided way possible: a sublist of two or three items, maybe four, and a sublist nearly as long as the original. This could be due to bad choices of partition value or wickedly contrived data. Imagine what would happen if you worked on the bigger sublist first. The first invocation of Quicksort is holding the pointers/indices for the short list in its stack frame while recursively calling quicksort for the long list. This too partitions badly into a very short list and a long one, and we do the longer sublist first, repeat...
Ultimately, on the baddest of bad days with the wickedest of wicked data, we'll have stack frames built up in number proportional to the original list length. This is quicksort's worst case behavior, O(n) depth of recursive calls. (Note we are talking of quicksort's depth of recursion, not performance.)
Doing the shorter sublist first gets rid of it fairly quickly. We still process a larger number of tiny lists, in proportion to the original list length, but now each one is taken care of by a shallow one or two recursive calls. We still make O(n) calls (performance) but each is depth O(1).
Surprisingly, this turns out to be important even when quicksort is not confronted with wildly unbalanced partitions, and even when introsort is actually being used.
The problem arises (in C++) when the values in the container being sorted are really big. By this, I don't mean that they point to really big objects, but that they are themselves really big. In that case, some (possibly many) compilers will make the recursive stack frame quite big, too, because it needs at least one temporary value in order to do a swap. Swap is called inside of partition, which is not itself recursive, so you would think that the quicksort recursive driver would not require the monster stack-frame; unfortunately, partition usually ends up being inlined because it's nice and short, and not called from anywhere else.
Normally the difference between 20 and 40 stack frames is negligible, but if the values weigh in at, say, 8kb, then the difference between 20 and 40 stack frames could mean the difference between working and stack overflow, if stacks have been reduced in size to allow for many threads.
If you use the "always recurse into the smaller partition" algorithm, the stack cannot ever exceed log2 N frames, where N is the number of elements in the vector. Furthermore, N cannot exceed the amount of memory available divided by the size of an element. So on a 32-bit machine, there could only be 2^19 8kb elements in a vector, and the quicksort call depth could not exceed 19.
In short, writing quicksort correctly makes its stack usage predictable (as long as you can predict the size of a stack frame). Not bothering with the optimization (to save a single comparison!) can easily cause the stack depth to double even in non-pathological cases, and in pathological cases it can get a lot worse.

Data design Issue to find insertion deletion and getMin in O(1)

As said in the title, I need to define a data structure that takes only O(1) time for insertion, deletion and getMin. There are no space constraints.
I have searched SO for the same, and all I have found covers insertion and deletion in O(1) time, which even a stack does. The previous posts I saw on Stack Overflow all suggest hashing.
By my analysis, for getMin in O(1) time we can use a heap data structure,
and for insertion and deletion in O(1) time we have a stack.
So in order to achieve my goal, I think I need to tweak the heap data structure and the stack.
How would I add a hashing technique to this situation?
If I use a hash table, what should my hash function look like, and how do I analyze the situation in terms of hashing? Any good references will be appreciated.
If you go with your initial assumption that insertion and deletion are O(1) complexity (if you only want to insert into the top and delete/pop from the top then a stack works fine) then in order to have getMin return the minimum value in constant time you would need to store the min somehow. If you just had a member variable keep track of the min then what would happen if it was deleted off the stack? You would need the next minimum, or the minimum relative to what's left in the stack. To do this you could have your elements in a stack contain what it believes to be the minimum. The stack is represented in code by a linked list, so the struct of a node in the linked list would look something like this:
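struct Node
{
    int value;   // the element stored in this node
    int min;     // minimum of this node and every node below it
    Node *next;  // next node down the stack
};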
If you look at an example list: 7->3->1->5->2. Let's look at how this would be built. First you push in the value 2 (to an empty stack), this is the min because it's the first number, keep track of it and add it to the node when you construct it: {2, 2}. Then you push the 5 onto the stack, 5>2 so the min is the same push {5,2}, now you have {5,2}->{2,2}. Then you push 1 in, 1<2 so the new min is 1, push {1, 1}, now it's {1,1}->{5,2}->{2,2} etc. By the end you have:
{7,1}->{3,1}->{1,1}->{5,2}->{2,2}
In this implementation, if you popped off 7, 3, and 1 your new min would be 2 as it should be. And all of your operations are still in constant time because you just added a comparison and another value to the node. (You could use something like top() on C++'s std::stack, or just use a pointer to the head of the list, to take a look at the top of the stack and grab the min there; it'll give you the min of the stack in constant time.)
A tradeoff in this implementation is that you'd have an extra integer in your nodes, and if you only have one or two mins in a very large list it is a waste of memory. If this is the case then you could keep track of the mins in a separate stack and just compare the value of the node that you're deleting to the top of this list and remove it from both lists if it matches. It's more things to keep track of so it really depends on the situation.
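The per-node minimum can be sketched as below. For brevity this hypothetical MinStack stores (value, min) pairs in a std::vector instead of a hand-written linked list; that swap, the class name, and the missing empty-stack checks are assumptions of the sketch, not part of the description above.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical stack where every entry remembers the minimum of itself and
// everything below it.
class MinStack {
public:
    void push(int value) {
        int currentMin = entries_.empty()
                             ? value
                             : std::min(value, entries_.back().second);
        entries_.push_back({value, currentMin});
    }

    // Precondition: the stack is not empty.
    int pop() {
        int value = entries_.back().first;
        entries_.pop_back();
        return value;
    }

    // Precondition: the stack is not empty.
    int getMin() const { return entries_.back().second; }

    bool empty() const { return entries_.empty(); }

private:
    std::vector<std::pair<int, int>> entries_; // (value, min so far)
};
```

Pushing 2, 5, 1, 3, 7 and then popping three times walks through the same states as the {7,1}->{3,1}->{1,1}->{5,2}->{2,2} example.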
DISCLAIMER: This is my first post in this forum so I'm sorry if it's a bit convoluted or wordy. I'm also not saying that this is "one true answer" but it is the one that I think is the simplest and conforms to the requirements of the question. There are always tradeoffs and depending on the situation different approaches are required.
This is a design problem, which means they want to see how quickly you can augment existing data-structures.
start with what you know:
O(1) update, i.e. insertion/deletion, is screaming hashtable
O(1) getMin is screaming hashtable too, but this time ordered.
Here, I am presenting one way of doing it. You may find something else that you prefer.
create a HashMap, call it main, where to store all the elements
create a LinkedHashMap (java has one), call it mins where to track the minimum values.
the first time you insert an element into main, add it to mins as well.
for every subsequent insert, if the new value is less than what's at the head of your mins map, add it to the map with something equivalent to addToHead.
when you remove an element from main, also remove it from mins. 2*O(1) = O(1)
Notice that getMin is simply peeking without deleting. So just peek at the head of mins.
EDIT:
Amortized algorithm:
(thanks to #Andrew Tomazos - Fathomling, let's have some more fun!)
We all know that the cost of insertion into a hashtable is O(1). But in fact, if you have ever built a hash table you know that you must keep doubling the size of the table to avoid overflow. Each time you double the size of a table with n elements, you must re-insert the elements and then add the new element. By this analysis it would seem that the worst-case cost of adding an element to a hashtable is O(n). So why do we say it's O(1)? Because not all the elements take the worst case! Indeed, only the insertions where doubling occurs take the worst case. Therefore, inserting n elements takes n+summation(2^i where i=0 to lg(n-1)) which gives n+2n = O(n) so that O(n)/n = O(1) !!!
Why not apply the same principle to the linkedHashMap? You have to reload all the elements anyway! So, each time you are doubling main, put all the elements in main in mins as well, and sort them in mins. Then for all other cases proceed as above (bullets steps).
A hashtable gives you insertion and deletion in O(1) (a stack does not because you can't have holes in a stack). But you can't have getMin in O(1) too, because ordering your elements by comparisons can't be faster than O(n*Log(n)) (it is a theorem), which means O(Log(n)) for each element.
You can keep a pointer to the min to have getMin in O(1). This pointer can be updated easily for an insertion but not for the deletion of the min. But depending on how often you use deletion it can be a good idea.
You can use a trie. A trie has O(L) complexity for both insertion, deletion, and getmin, where L is the length of the string (or whatever) you're looking for. It is of constant complexity with respect to n (number of elements).
It requires a huge amount of memory, though. As they emphasized "no space constraints", they were probably thinking of a trie. :D
Strictly speaking your problem as stated is provably impossible, however consider the following:
Given a type T place an enumeration on all possible elements of that type such that value i is less than value j iff T(i) < T(j). (ie number all possible values of type T in order)
Create an array of that size.
Make the elements of the array:
struct PT
{
    T t;
    PT* next_higher;
    PT* prev_lower;
};
Insert and delete elements into the array maintaining double linked list (in order of index, and hence sorted order) storage
This will give you constant getMin and delete.
For insertion you need to find the next element in the array in constant time, so I would use a type of radix search.
If the size of the array is 2^x then maintain x "skip" arrays where element j of array i points to the nearest element of the main array to index (j << i).
This will then always require a fixed x number of lookups to update and search so this will give constant time insertion.
This uses exponential space, but this is allowed by the requirements of the question.
in your problem statement " insertion and deletion in O(1) time we have stack..."
so I am assuming deletion = pop()
in that case, use another stack to track min
algo:
Stack 1 -- normal stack; Stack 2 -- min stack
Insertion
push to stack 1.
if stack 2 is empty or new item <= stack2.peek(), push to stack 2 as well (using <= rather than < keeps duplicate minimums in step with the deletion rule below)
objective: at any point of time stack2.peek() should give you min O(1)
Deletion
pop() from stack 1.
if popped element equals stack2.peek(), pop() from stack 2 as well
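A minimal sketch of the two-stack scheme, assuming deletion means pop(). The class name TwoStackMin is made up, and empty-stack checks are left out. It uses <= when deciding whether to push onto the min stack, so that a minimum that occurs twice is also recorded twice.

```cpp
#include <stack>

// Hypothetical stack with O(1) getMin, backed by a second stack of minimums.
class TwoStackMin {
public:
    void push(int value) {
        values_.push(value);
        // Record the value if it is a new minimum or ties the current one.
        if (mins_.empty() || value <= mins_.top()) mins_.push(value);
    }

    // Precondition: the stack is not empty.
    int pop() {
        int value = values_.top();
        values_.pop();
        if (value == mins_.top()) mins_.pop();
        return value;
    }

    // Precondition: the stack is not empty.
    int getMin() const { return mins_.top(); }

private:
    std::stack<int> values_; // stack 1: all elements
    std::stack<int> mins_;   // stack 2: running minimums
};
```

Compared with storing a minimum in every node, the second stack only grows when a new minimum (or a tie) arrives, which saves space when minimums change rarely.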

Resources