quicksort stack size - algorithm

Why do we prefer to sort the smaller partition of a file and push the larger one on the stack after partitioning, for quicksort (non-recursive implementation)? Doing this reduces the space complexity of quicksort to O(log n) for random files. Could someone elaborate?

As you know, at each recursive step, you partition an array. Push the larger part on the stack, continue working on the smaller part.
Because the part you carry on working with is the smaller one, it is at most half the size of the one you were working with before. So each time we push a range onto the stack, we at least halve the size of the range we're working with.
That means we can't push more than log n ranges onto the stack before the range we're working with hits size 1 (and therefore is sorted). This bounds the amount of stack we need to complete the first descent.
When we start processing the "big parts", each "big part" B(k) is bigger than the "small part" S(k) produced at the same time, so we might need more stack to handle B(k) than we needed to handle S(k). But B(k) is still smaller than the previous "small part" S(k-1), and once we're processing B(k), we've taken it back off the stack, which is therefore one item shorter than when we processed S(k), and the same size as when we processed S(k-1). So we still have our bound.
Suppose we did it the other way around - push the small part and continue working with the large part. Then in the pathologically nasty case, we'd push a size-1 range onto the stack each time, and continue working with a range only 2 smaller than the previous one. Hence we'd need n / 2 slots in our stack.
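To make this concrete, here is a rough non-recursive sketch in Java (illustrative code, not from the answer; the Lomuto-style partition helper is included only so the snippet is self-contained):

import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative non-recursive quicksort: push the larger sub-range, keep
// working on the smaller one, so the explicit stack stays O(log n) deep.
class IterativeQuicksort {
    static void sort(int[] a) {
        Deque<int[]> stack = new ArrayDeque<>();     // holds {lo, hi} ranges still to be sorted
        stack.push(new int[] { 0, a.length - 1 });
        while (!stack.isEmpty()) {
            int[] range = stack.pop();
            int lo = range[0], hi = range[1];
            while (lo < hi) {
                int p = partition(a, lo, hi);        // pivot's final position
                if (p - lo < hi - p) {               // left part is the smaller one
                    if (p + 1 < hi) stack.push(new int[] { p + 1, hi });
                    hi = p - 1;                      // continue with the smaller (left) part
                } else {                             // right part is the smaller one
                    if (lo < p - 1) stack.push(new int[] { lo, p - 1 });
                    lo = p + 1;                      // continue with the smaller (right) part
                }
            }
        }
    }

    // Plain Lomuto partition around the last element, for completeness.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }
}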

Consider the worst case, where you partition in such a way that the split is 1:n. If you sort the small subfile first, then you only need O(1) stack space: you push the large subfile, finish the (trivial) small one, and pop the large one back, so the stack never holds more than one entry. But if you sort the large subfile first, then you need O(n) space, because you keep pushing the 1-element subfiles onto the stack and they pile up.
Here is a quote from Algorithms by Robert Sedgewick (who wrote the original paper on this):
For Quicksort, the combination of end-recursion removal and a policy of processing the smaller of the two subfiles first turns out to ensure that the stack need only contain room for about lg N entries, since each entry on the stack after the top one must represent a subfile less than half the size of the previous entry.

OK, am I right that you mean: if we make the quicksort algorithm non-recursive, we have to use a stack on which we put the partitions?
If so: an algorithm must allocate memory for every variable it uses. So if you run two instances of it in parallel, they allocate twice the memory of a single instance...
Now, in a recursive version, you start a new instance of the algorithm (which needs to allocate its own memory), BUT the instance that made the recursive call has NOT ended, so its memory is still in use! In effect, if we have started, let's say, 10 recursive instances, we need 10*X memory, where X is the memory needed by one instance (its indices, pivot and other helper variables).
Now we use the non-recursive algorithm. The helper variables only have to be allocated ONCE; they take the space of a single instance. To do the same work as the recursion, we must remember the partitions we have not processed yet, so we put them on a stack and take them off until the last "recursion" step is done.
So the non-recursive version allocates the helper variables only once, BUT it needs a stack of pending partitions. In the end, though, you never put so many partitions on the stack that it costs more than the recursive version, which keeps a full set of local variables alive for every call that has not finished yet.
I hope I have described it so that you can understand it, but my English isn't so good. :)

Related

What data structure to use in maintaining k most frequently dialed numbers in a phone?

I was asked this question in an interview: "How do you maintain the k most frequently dialed numbers in a phone?" So what kind of data structure should be used in this case?
The tasks are:
Keep track of the number of times each number is dialed;
Keep track of the k most-dialed numbers.
So you'll have to use an augmented data structure. In this case, that is a HashSet plus a PriorityQueue (i.e., a heap) of size k, with the least-dialed of the tracked numbers at the top.
Since the number of times a number has been dialed can only increase, this makes our job a bit easier: you will never have to pull a number out of the heap because its count went down. Instead, you only ever add the number that was just dialed and then remove the top of the heap, because the top is the least-dialed number.
The class PhoneNumber would contain:
the phone number;
the count of times it has been dialed; and,
a boolean telling whether it is currently among the top-k numbers (i.e., in the heap) or not.
General steps would be:
Whenever a number is dialed:
If it has never been dialed before, add it to the HashSet with a dial count of 1 and its heap-membership boolean set to true (it is about to be pushed onto the heap, see below);
If it is already present in the HashSet, increase its dial count by 1; make sure the hash function is independent of the dial count (otherwise you will not be able to retrieve the number from the HashSet after updating it);
If the number is already in the heap (which you can tell from the boolean in the PhoneNumber object), its count has changed, so heapify() again to restore the heap order;
If the number is not in the heap, add it to the heap and then remove the top, setting the tracking boolean of the removed number to false. This ensures that only the top-k dialed numbers are present in the heap;
Make sure you don't remove anything until the heap has reached size k.
Space complexity: O(n) for the n numbers dialed so far, stored in the HashSet and referenced from the heap.
Time complexity: O(k + log k) per dialed number, because the heap may have to be fixed up on every dial: locating the changed entry takes O(k) in the worst case, and sifting it back into place takes O(log k) for that single entry. Getting the k top dialed numbers is O(1) in the sense that they are sitting right there in your heap.
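A rough Java sketch of this approach (illustrative names, not from the answer; a HashMap keyed by the number plays the role of the HashSet above, and the sketch removes and re-inserts into java.util.PriorityQueue instead of calling heapify(), since that class has no increase-key operation):

import java.util.*;

// Illustrative sketch of the "hash lookup + size-k min-heap" idea described above.
class TopKDialed {
    static final class PhoneNumber {
        final String number;
        int dialCount = 0;
        boolean inHeap = false;                          // membership in the top-k heap
        PhoneNumber(String number) { this.number = number; }
    }

    private final int k;
    private final Map<String, PhoneNumber> all = new HashMap<>();   // keyed by the number only,
                                                                    // so hashing ignores the count
    private final PriorityQueue<PhoneNumber> heap =
        new PriorityQueue<>(Comparator.comparingInt((PhoneNumber p) -> p.dialCount)); // least-dialed on top

    TopKDialed(int k) { this.k = k; }

    void dialed(String number) {
        PhoneNumber p = all.computeIfAbsent(number, PhoneNumber::new);
        if (p.inHeap) heap.remove(p);    // O(k): PriorityQueue has no increase-key
        p.dialCount++;
        heap.offer(p);                   // O(log k)
        p.inHeap = true;
        if (heap.size() > k) {           // evict the least-dialed entry once we exceed k
            heap.poll().inHeap = false;
        }
    }

    List<String> topK() {                // the k most-dialed numbers, in no particular order
        List<String> result = new ArrayList<>();
        for (PhoneNumber p : heap) result.add(p.number);
        return result;
    }
}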
Priority queue (typically implemented as a max heap)
A max heap, known as a priority queue in many programming languages, can be used, where each entry is a <phone_number, count_of_dial> pair. The max heap is ordered by count_of_dial, and the top k items are the answer.
The purpose of this question is twofold:
To get you to ask questions.
To get you to talk through drawbacks and advantages of different approaches.
The interviewer isn't terribly interested in you getting the "right" answer, as much as he's interested in how you approach the problem. For example, the problem as stated is not well specified. You probably should have asked questions like:
Most frequent over what period? All time? Per month? Year to date?
How will this information be used?
How frequently will the information be queried?
How fast does response have to be?
How many phone numbers do you expect it to handle?
Is k constant? Or will users ask at one point for the top 10, and some other time for the top 100?
All of these questions are relevant to solving the problem.
Once you know all of the requirements, then you can start thinking about how to implement a solution. It could be as simple as maintaining a call count with every phone entry. Then, when the data is queried, you run a simple heap selection algorithm on the entire phone list, picking the top k items. That's a perfectly reasonable solution if queries are infrequent. The typical phone isn't going to have a huge number of called numbers, so this could work quite well.
Another possible solution would be to maintain the call count for each number, and then, after every call, run the heap selection algorithm and cache the result. The idea here is that the data can only update when a new call is made, and calls are very infrequent, in terms of computer time. If you could make a call every 15 seconds (quite unlikely), that's only 5,760 calls in a day. Even the slowest phone should be able to keep up with that.
There are other solutions, all of which have their advantages and disadvantages. Balancing memory use, CPU resources, simplicity, development time, and many other factors is a large part of what we do as software developers. The interviewer purposely under-specified a problem with a seemingly straightforward solution in order to see how you approach things.
If you did not do well on the interview, learn from it. Also understand that the interviewer probably thought you were technically competent, otherwise he wouldn't have moved on to see how well you approach problems. You're expected to ask questions. After all, anybody can regurgitate simple solutions to simple problems. But in the real world we don't get simple problems, and the people asking us to do things don't always fully specify the requirements. So we have to learn how to extract the requirements from them: usually by asking questions.
I'd use a structure holding the number and how many times it was dialed, and put that in a B-tree organized by the number of times each number was dialed.
Add O(log(n)) [Balanced]
Add O(n) [NOT balanced]
Search O(log(n))
Balance O(log(n))
Add(not balanced) + balance O(log(n))
IN THE WORST CASE, searching + adding + balancing would be O(n). The average complexity of all operations in a B-tree is still O(log(n)).
A B-tree grows at the root, not at the leaves, so it is guaranteed to stay balanced at all times, since you keep it under control when inserting by splitting full nodes and pushing the median key up.
This specific case, where I never have to forget a number that was once dialed, is even simpler.
The advantage is that the tree is always ordered, so what you are looking for is simply the first k (say, the first 50) key/value entries of the tree.
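A rough Java sketch of this "keep everything ordered by dial count" idea, using a TreeSet (a red-black tree from java.util) as a stand-in for the B-tree, since the standard library has no B-tree; all names are illustrative:

import java.util.*;

// Keep every number ordered by how often it was dialed; the top k are just
// the first k elements of the ordered set.
class DialCounter {
    private final Map<String, Integer> counts = new HashMap<>();
    private final TreeSet<String> ordered = new TreeSet<>((a, b) -> {
        int byCount = Integer.compare(counts.getOrDefault(b, 0), counts.getOrDefault(a, 0));
        return byCount != 0 ? byCount : a.compareTo(b);   // most-dialed first, ties broken by number
    });

    void dialed(String number) {
        ordered.remove(number);               // remove before the ordering key (the count) changes
        counts.merge(number, 1, Integer::sum);
        ordered.add(number);
    }

    List<String> topK(int k) {
        List<String> top = new ArrayList<>();
        for (String n : ordered) {
            if (top.size() == k) break;
            top.add(n);
        }
        return top;
    }
}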

What is modified Garwick's Algorithm exactly?

Garwick's algorithm is an algorithm for dealing with stack overflows. I know what the original algorithm is and how it works. However, there is a modified Garwick's algorithm, and I only have a very vague description of it: "even stacks growing in the left direction, and odd stacks in the right direction".
The illustration of the modified algorithm from my lecture notes is also very vague.
Can anyone help give more details about this modified algorithm, or provide some reference? Thank you!
If you need to put 2 stacks in one array, you can start one stack at the start of the array, growing upward as you push elements, and the other at the end of the array, growing downward.
This way you don't need to worry about redistributing free space when one of them fills up, because they both use the same free space, and you can freely push onto either stack until the whole array is full.
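A minimal Java sketch of that two-stacks-in-one-array layout (illustrative code; neither stack overflows until the whole array is full):

// Two stacks sharing one array and growing toward each other.
class TwoStacks {
    private final int[] data;
    private int top1;                 // next free slot for stack 1 (grows upward)
    private int top2;                 // next free slot for stack 2 (grows downward)

    TwoStacks(int capacity) {
        data = new int[capacity];
        top1 = 0;
        top2 = capacity - 1;
    }

    void push1(int v) {
        if (top1 > top2) throw new IllegalStateException("array full");
        data[top1++] = v;
    }

    void push2(int v) {
        if (top1 > top2) throw new IllegalStateException("array full");
        data[top2--] = v;
    }

    int pop1() { return data[--top1]; }   // caller must check stack 1 is non-empty
    int pop2() { return data[++top2]; }   // caller must check stack 2 is non-empty
}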
The modified Garwick algorithm you refer to extends this idea to more than 2 stacks. With the original Garwick algorithm, the array is divided into N segments, and each segment has one stack, with all stacks growing in the same direction. In the modified version, the array is divided into N/2 segments, and each segment has 2 stacks, one growing upward from the start of the segment, and one growing downward from the end.
In the modified algorithm, when one segment fills up, free space is redistributed among segments (pairs of stacks) in the same way that the original algorithm redistributes space among single stacks.

Determining the maximum stack depth

Imagine I have a stack-based toy language that comes with the operations Push, Pop, Jump and If.
I have a program whose input is this toy language. For instance, I get the sequence
Push 1
Push 1
Pop
Pop
In that case the maximum stack would be 2. A more complicated example would use branches.
Push 1
Push true
If .success
Pop
Jump .continue
.success:
Push 1
Push 1
Pop
Pop
Pop
.continue:
In this case the maximum stack would be 3. However, it is not possible to get the maximum stack size by simply walking the instructions from top to bottom, as this example shows: doing so would actually result in a stack underflow.
CFGs to the rescue: you can build a control-flow graph and walk every possible path through its basic blocks. However, the number of paths can grow quickly; for n vertices you can get on the order of (n-1)! possible paths.
My current approach is to simplify the graph as much as possible and to have less possible paths. This works but I would consider it ugly. Is there a better (read: faster) way to attack this problem? I am fine if the algorithm produces a stack depth that is not optimal. If the correct stack size is m then my only constraint is that the result n is n >= m. Is there maybe a greedy algorithm available that would produce a good result here?
Update: I am aware of cycles and of the invariant that all control-flow merges have the same stack depth. I thought I'd write down a simple toy-like language to illustrate the issue. Basically I have a deterministic stack-based language (JVM bytecode), so each operation has a known stack delta.
Please note that I do have a working solution to this problem that produces good results (simplified cfg) but I am looking for a better/faster approach.
Given that your language doesn't seem to have any user input, every program will simply compute the same way every time. Therefore, you could execute the program and keep track of the maximum stack size during execution. Probably not what you want, though.
As for your path argument: be aware that jumping allows cycles, hence, without further analysis, a cycle might imply non-termination and stack overflows (i.e. the stack size increases after each execution of the cycle). [n nodes still means infinitely many paths if there is a cycle]
Instead of actually executing the code, you might be able to do some form of abstract interpretation.
Regarding the comment from IVlad: simply counting the pushes is wrong due to the existence of possible cycles.
I am not sure what the semantics of your if-statement is, though, so this could be useful too: assume that an if-statement's label can only be a forward label (i.e., you can never jump back in your code). In that case your path-counting argument comes back to life; in effect, the resulting CFG will be a tree (or a DAG if you don't copy code). In that case you could do an approximate count by a bottom-up computation of the number of pushes, taking the maximum of the two branches at each if-statement (see the sketch below). It's still not the optimal correct result, but it yields a better approximation than a simple count of push statements.
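A rough Java sketch of that bottom-up over-approximation (illustrative types, not from the answer; it ignores pops entirely, so it only bounds the depth from above):

import java.util.*;

// For a forward-jump-only CFG (a DAG): bound the stack depth by
// maxPushes(b) = pushes in b + max over successors s of maxPushes(s).
final class Block {
    int pushes;                                // number of Push instructions in this block
    List<Block> successors = new ArrayList<>();
}

final class PushBound {
    private final Map<Block, Integer> memo = new HashMap<>();

    int maxPushes(Block b) {
        Integer cached = memo.get(b);
        if (cached != null) return cached;
        int best = 0;
        for (Block s : b.successors) best = Math.max(best, maxPushes(s));
        int result = b.pushes + best;
        memo.put(b, result);                   // memoization keeps the DAG walk linear
        return result;
    }
}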
You generally want to have the stack depth invariant over jumps and loops.
That means that for every node, every incoming edge should have the same stack depth. This simplifies walking the CFG significantly, because backedges can no longer change the stack depth of already calculated instructions.
This is also a requirement for a bounded stack depth. If it is not enforced, you can have loops in your code that grow the stack on every iteration.
Another thing you should consider is making the stack effect of all opcodes deterministic. An example of a nondeterministic opcode would be: POP IF TopOfStack == 0.
Edit:
If you do have a deterministic set of opcodes and the stack depth invariant, there is no need to visit every possible path of the program. It's enough to do a DFS/BFS through the CFG to determine the maximum stack depth. This can be done in linear time (in the number of instructions), but not faster.
Checking whether the basic blocks that the outgoing edges of your current basic block point to still need to be evaluated should not be performance-relevant. Even in the worst case, where every instruction is an If, there are only 2*N edges to evaluate.
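A sketch of that linear-time walk in Java (illustrative types; it assumes each basic block already knows its net stack delta and the highest point reached inside it, and it enforces the same-depth-at-merges invariant):

import java.util.*;

final class BasicBlock {
    int stackDelta;                         // net change in stack size over the block
    int maxRelativeDepth;                   // highest stack level reached inside the block, relative to its entry
    List<BasicBlock> successors = new ArrayList<>();
}

final class MaxStackDepth {
    static int compute(BasicBlock entry) {
        Map<BasicBlock, Integer> entryDepth = new HashMap<>();  // depth on entering each block
        Deque<BasicBlock> worklist = new ArrayDeque<>();
        entryDepth.put(entry, 0);
        worklist.push(entry);
        int max = 0;
        while (!worklist.isEmpty()) {
            BasicBlock b = worklist.pop();
            int in = entryDepth.get(b);
            max = Math.max(max, in + b.maxRelativeDepth);
            int out = in + b.stackDelta;
            for (BasicBlock s : b.successors) {
                Integer seen = entryDepth.get(s);
                if (seen == null) {
                    entryDepth.put(s, out);          // each block is visited exactly once
                    worklist.push(s);
                } else if (seen != out) {
                    throw new IllegalStateException("stack depth differs at a merge point");
                }
            }
        }
        return max;
    }
}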

Why are hash table expansions usually done by doubling the size?

I've done a little research on hash tables, and I keep running across the rule of thumb that when there are a certain number of entries (either max or via a load factor like 75%) the hash table should be expanded.
Almost always, the recommendation is to double (or double plus 1, i.e., 2n+1) the size of the hash table. However, I haven't been able to find a good reason for this.
Why double the size, rather than, say, increasing it 25%, or increasing it to the size of the next prime number, or next k prime numbers (e.g., three)?
I already know that it's often a good idea to choose an initial hash table size which is a prime number, at least if your hash function uses modulus such as universal hashing. And I know that's why it's usually recommended to do 2n+1 instead of 2n (e.g., http://www.concentric.net/~Ttwang/tech/hashsize.htm)
However as I said, I haven't seen any real explanation for why doubling or doubling-plus-one is actually a good choice rather than some other method of choosing a size for the new hash table.
(And yes I've read the Wikipedia article on hash tables :) http://en.wikipedia.org/wiki/Hash_table
Hash-tables could not claim "amortized constant time insertion" if, for instance, the resizing was by a constant increment. In that case the cost of resizing (which grows with the size of the hash-table) would make the cost of one insertion linear in the total number of elements to insert. Because resizing becomes more and more expensive with the size of the table, it has to happen "less and less often" to keep the amortized cost of insertion constant.
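For a rough worked example (not from the original answer): start with capacity 1 and double whenever the table fills. Inserting n = 2^m elements triggers resizes that copy 1 + 2 + 4 + ... + 2^(m-1) = 2^m - 1 < n elements in total, so the copying work is O(n) overall, i.e. O(1) amortized per insertion. With a fixed increment c instead, the resizes copy roughly c + 2c + 3c + ... ≈ n^2 / (2c) elements over n insertions, i.e. O(n) per insertion on average.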
Most implementations allow the average bucket occupation to grow until a bound fixed in advance before resizing (anywhere between 0.5 and 3, which are all acceptable values). With this convention, just after resizing the average bucket occupation becomes half that bound. Resizing by doubling keeps the average bucket occupation within a band of width 2x.
Sub-note: because of statistical clustering, you have to take an average bucket occupation as low as 0.5 if you want many buckets to have at most one element (maximum lookup speed, ignoring the complex effects of cache size), or as high as 3 if you want a minimum number of empty buckets (which correspond to wasted space).
I had read a very interesting discussion on growth strategy on this very site... just cannot find it again.
While 2 is commonly used, it's been demonstrated that it is not the best value. One often-cited problem is that it does not cope well with allocator schemes (which often allocate power-of-two blocks), since it would always require a reallocation, while a smaller factor might in fact let the block be reallocated in place (simulating in-place growth) and thus be faster.
Thus, for example, the VC++ Standard Library uses a growth factor of 1.5 (ideally, it should be below the golden ratio if a first-fit memory allocation strategy is being used) after an extensive discussion on the mailing list. The reasoning is explained here:
I'd be interested if any other vector implementations uses a growth factor other than 2, and I'd also like to know whether VC7 uses 1.5 or 2 (since I don't have that compiler here).
There is a technical reason to prefer 1.5 to 2 -- more specifically, to prefer values less than (1+sqrt(5))/2.
Suppose you are using a first-fit memory allocator, and you're progressively appending to a vector. Then each time you reallocate, you allocate new memory, copy the elements, then free the old memory. That leaves a gap, and it would be nice to be able to use that memory eventually. If the vector grows too rapidly, it will always be too big for the available memory.
It turns out that if the growth factor is >= (1+sqrt(5))/2, the new memory will always be too big for the hole that has been left so far; if it is < (1+sqrt(5))/2, the new memory will eventually fit. So 1.5 is small enough to allow the memory to be recycled.
Surely, if the growth factor is >= 2, the new memory will always be too big for the hole that has been left so far; if it is < 2, the new memory will eventually fit. Presumably the reason for (1+sqrt(5))/2 is...
Initial allocation is s.
The first resize is k*s.
The second resize is k*k*s, which will fit the hole (of size s + k*s) iff k*k*s <= k*s + s, i.e. iff k*k <= k + 1, i.e. iff k <= (1+sqrt(5))/2
...the hole can be recycled asap.
It could, by storing its previous size, grow fibonaccily.
Of course, it should be tailored to the memory allocation strategy.
One reason for doubling size that is specific to hash containers is that if the container capacity is always a power of two, then instead of using a general purpose modulo for converting a hash to an offset, the same result can be achieved with bit shifting. Modulo is a slow operation for the same reasons that integer division is slow. (Whether integer division is "slow" in the context of whatever else is going in a program is of course case dependent but it's certainly slower than other basic integer arithmetic.)
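For example (an illustrative snippet, assuming the table's capacity is a power of two and the hash value is non-negative):

// With a power-of-two capacity, the bucket index needs no division at all:
static int bucketIndex(int hash, int capacityPowerOfTwo) {
    return hash & (capacityPowerOfTwo - 1);   // same result as hash % capacity in this case
}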
Doubling the memory when expanding any type of collection is an often-used strategy to prevent memory fragmentation and to avoid reallocating too often. As you point out, there might be reasons to have a prime number of buckets. When you know your application and your data, you might also be able to predict the growth of the number of elements and thus choose a different (larger or smaller) growth factor than doubling.
The general implementations found in libraries are exactly that: General implementations. They have to focus on being a reasonable choice in a variety of different situations. When knowing the context, it is almost always possible to write a more specialized and more efficient implementation.
If you don't know how many objects you will end up using (let's say N), then by doubling the space you'll do at most log2(N) reallocations.
I assume that if you choose a proper initial "n", you increase the odds that 2*n + 1 will produce prime numbers in subsequent reallocations.
The same reasoning applies for doubling the size as for vector/ArrayList implementations, see this answer.
