Recursion vs iteration with regards to memory usage - algorithm

Suppose I have a recursive as well as an iterative solution (using a stack) to some problem e.g. preorder traversal of a binary tree. With current computers, memory-wise, is there an advantage to using the recursive solution over the iterative version or vice versa for very large trees?
I'm aware that for certain recursive solutions where sub-problems repeat, there are additional time and memory costs if recursion is used. Assume that this is not the case here. For example,
preOrder(Node n){
if (n == null) return;
print(n);
preOrder(n.left);
preOrder(n.right);
}
vs
preOrder(Node n){
stack s;
s.push(n);
while(!s.empty()){
Node node = s.pop();
print(node);
s.push(node.right);
s.push(node.left);
}
}

If there is a risk of stack overflow (in this case, because the trees are not guaranteed to be even semi-balanced), then a robust program will avoid recursion and use an explicit stack.
The explicit stack may use less memory, because stack frames tend to be larger than is strictly necessary to maintain the context of recursive calls. (For example, the stack frame will contain at least a return address as well as the local variables.)
However, if the recursion depth is known to be limited, then not having to dynamically allocate can save space and time, as well as programmer time. For example, walking a balanced binary tree only requires recursion to the depth of the tree, which is log2 of the number of nodes; that cannot be a very large number.
As suggested by a commentator, one possible scenario is that the tree is known to be right-skewed. In that case, you can recurse down the left branches without worrying about stack overflow (as long as you are absolutely certain that the tree is right-skewed). Since the second recursive call is in the tail position, it can just be rewritten as a loop:
void preOrder(Node n) {
for (; n; n = n.right) {
print(n);
preOrder(n.left);
n = n.right;
}
A similar technique is often (and should always be) applied to quicksort: after partitioning, the function recurses on the smaller partition, and then loops to handle the larger partition. Since the smaller partition must be less than half the size of the original array, that will guarantee the recursion depth to be less than log2 of the original array size, which is certainly less than 50 stack frames, and probably a lot less.

Related

What is the easiest DS to a Sorted Sequence with fast inserts and fast searching?

I am looking for a easy-to-implement (in C++) data structure D that maintains a sorted sequence of integers (or any comparable objects) with fast insertion and fast searching (i.e. can check if X is in D, or get ith element). Fast should be something like O(log(n)), O(1) or O(sqrt(N))?
template<type T>
Class D {
D(...) { ... }
void insert(T x);
int get_ith(int x);
boolean check_x(T x);
}
Currently, I know about AVL Trees, Red Black Tree, Skip Lists, Tries, Hashing. All of these require very complex solutions that are hard to code. Tries would probably be the least complex. However, it is hard to do things such as deleting elements and choosing an alphabet for the trie.
I'm hoping to explore novel solutions that's hopefully short and fast.
Thank you in advance!
if you know the upper limit of the values of your integers, you can simply take a chunk of memory and use the set the bits of the number in your set.
if X is in D : O(1)
insert: O(1)
delete: O(1)
Iteration is slow if the integers are sparse in the range of lowest to highest value. In this case also the memory consumption might be inacceptable.

Realistic usage of unrolled skip lists

Why there is no any information in Google / Wikipedia about unrolled skip list? e.g. combination between unrolled linked list and skip list.
Probably because it wouldn't typically give you much of a performance improvement, if any, and it would be somewhat involved to code correctly.
First, the unrolled linked list typically uses a pretty small node size. As the Wikipedia article says: " just large enough so that the node fills a single cache line or a small multiple thereof." On modern Intel processors, a cache line is 64 bytes. Skip list nodes have, on average, two pointers per node, which means an average of 16 bytes per node for the forward pointers. Plus whatever the data for the node is: 4 or 8 bytes for a scalar value, or 8 bytes for a reference (I'm assuming a 64 bit machine here).
So figure 24 bytes, total, for an "element." Except that the elements aren't fixed size. They have a varying number of forward pointers. So you either need to make each element a fixed size by allocating an array for the maximum number of forward pointers for each element (which for a skip list with 32 levels would require 256 bytes), or use a dynamically allocated array that's the correct size. So your element becomes, in essence:
struct UnrolledSkipListElement
{
void* data; // 64-bit pointer to data item
UnrolledSkipListElement* forward_pointers; // dynamically allocated
}
That would reduce your element size to just 16 bytes. But then you lose much of the cache-friendly behavior that you got from unrolling. To find out where you go next, you have to dereference the forward_pointers array, which is going to incur a cache miss, and therefore eliminate the savings you got by doing the unrolling. In addition, that dynamically allocated array of pointers isn't free: there's some (small) overhead involved in allocating that memory.
If you can find some way around that problem, you're still not going to gain much. A big reason for unrolling a linked list is that you must visit every node (up to the node you find) when you're searching it. So any time you can save with each link traversal adds up to very big savings. But with a skip list you make large jumps. In a perfectly organized skip list, for example, you could skip half the nodes on the first jump (if the node you're looking for is in the second half of the list). If your nodes in the unrolled skip list only contain four elements, then the only savings you gain will be at levels 0, 1, and 2. At higher levels you're skipping more than three nodes ahead and as a result you will incur a cache miss.
So the skip list isn't unrolled because it would be somewhat involved to implement and it wouldn't give you much of a performance boost, if any. And it might very well cause the list to be slower.
Linked list complexity is O(N)
Skip list complexity is O(Log N)
Unrolled Linked List complexity can be calculate as following:
O (N / (M / 2) + Log M) = O (2N/M + Log M)
Where M is number of elements in single node.
Because Log M is not significant,
Unrolled Linked List complexity is O(N/M)
If we suppose to combine Skip list with Unrolled linked list, the new complexity will be
O(Log N + "something from unrolled linked list such N1/M")
This means the "new" complexity will not be as better as first someone will think. New complexity might be even worse than original O(Log N). The implementation will more complex as well. So gain is questionable and rather dubious.
Also, since single node will have lots of data, but only single "forward" array, the "tree" will not be so-balanced either and this will ruin O(Log N) part of the equation.

DFS questions (one for rooted trees, one for graphs)

Question 1:
For rooted trees, like binary trees, why do people make sure never to push NULL/nil/null onto the stack?
e.g.
while(!stack.empty() ) {
x = stack.pop();
visit(x);
if(x->right) stack.push(x->right);
if(x->left) stack.push(x->left);
}
instead of
while(!stack.empty() ) {
x = stack.pop();
if(!x) continue;
visit(x);
stack.push(x->left), stack.push(x->right);
}
I mean, the second form more naturally aligns with the recursive form of preorder/DFS, so why do people use the "check before" with iterative usually, and "check after" with recursive? Any reason I am missing, other than saving (n+1) stack spots (which doesn't add to the space complexity)?
Question 2:
DFS on Graph: Using iterative, why do people set visited flag when pushing adjacent nodes of current node, but in recursive, we only do it after the recursion takes place?
e.g. iterative:
while(!stack.empty()){
x=stack.pop();
// we do not need to check if x is visited after popping
for all u adjacent from x
if(!visited[u]) stack.push(u), visited[u]=true; //we check/set before push
}
but in recursive:
void DFS(Graph *G, int x)
{
if( !visited[x] ) return; //we check/set after popping into node
visited[x]=true;
for all u adjacent from x
DFS(G, u); //we do not check if u is already visited before push
}
SO in general, the connection between two questions is: Why we are more careful before pushing valid things onto an actual stack object (iterative methods of DFS), but are less careful when using the hardware stack (recursive)? Am I missing something? Isn't the hardware stack more of a "luxury" because it can overflow (for stack objects, we can declare them off heap) ?
Thank you kind people for the insights.
[ This is not simply coding style, this is about total rearrangements of how algorithms are coded. I'm talking about being careless with harware stack vs. being careful with software stack? I'm also wondering about some technical differences (i.e., are they truly equal methods for all situations). And almost all books follow above patterns. ]
Check when you push or when you pop is both OK, but check when you push perform better,
For example if you have a binary tree of depth 10, if you check when you pop, you are basically traversing a tree of depth 11(It's like you added added two more NULL Children to each leaf) thats 2048 more wasted operation. If its recursive then it means it will make 2048 more none necessary function call.
Other than that it's OK to check before or after, it's just happened that the code you saw been written that way.

Quicksort - which sub-part should be sorted first?

I am reading some text which claims this regarding the ordering of the two recursive Quicksort calls:
... it is important to call the smaller subproblem first, this in conjunction with tail recursion ensures that the stack depth is log n.
I am not at all sure what that means, why should I call Quicksort on the smaller subarray first?
Look at quicksort as an implicit binary tree. The pivot is the root, and the left and right subtrees are the partitions you create.
Now consider doing a depth first search of this tree. The recursive calls actually correspond to doing a depth first search on the implicit tree described above. Also assume that the tree always has the smaller sub-tree as the left child, so the suggestion is in fact to do a pre-order on this tree.
Now suppose you implement the preorder using a stack, where you push only the left child (but keep the parent on the stack) and when the time comes to push the right child (say you maintained a state where you knew whether a node has its left child explored or not), you replace the top of stack, instead of pushing the right child (this corresponds to the tail recursion part).
The maximum stack depth is the maximum 'left depth': i.e. if you mark each edge going to a left child as 1, and going to a right child as 0, then you are looking at the path with maximum sum of edges (basically you don't count the right edges).
Now since the left sub-tree has no more than half the elements, each time you go left (i.e. traverse and edge marked 1), you are reducing the number of nodes left to explore by at least half.
Thus the maximum number of edges marked 1 that you see, is no more than log n.
Thus the stack usage is no more than log n, if you always pick the smaller partition, and use tail recursion.
Some languages have tail recursion. This means that if you write f(x) { ... ... .. ... .. g(x)} then the final call, to g(x), isn't implemented with a function call at all, but with a jump, so that the final call does not use any stack space.
Quicksort splits the data to be sorted into two sections. If you always handle the shorter section first, then each call that consumes stack space has a section of data to sort that is at most half the size of the recursive call that called it. So if you start off with 10 elements to sort, the stack at its deepest will have a call sorting those 10 elements, and then a call sorting at most 5 elements, and then a call sorting at most 2 elements, and then a call sorting at most 1 element - and then, for 10 elements, the stack cannot go any deeper - the stack size is limited by the log of the data size.
If you didn't worry about this, you could end up with the stack holding a call sorting 10 elements, and then a call sorting 9 elements, and then a call sorting 8 elements, and so on, so that the stack was as deep as the number of elements to be sorted. But this can't happen with tail recursion if you sort the short sections first, because although you can split 10 elements into 1 element and 9 elements, the call sorting 9 elements is done last of all and implemented as a jump, which doesn't use any more stack space - it reuses the stack space previously used by its caller, which was just about to return anyway.
Ideally, the list is partitions into two roughly similar size sublists. It doesn't matter much which sublist you work on first.
But if on a bad day the list partitions in the most lopsided way possible, a sublist of two or three items, maybe four, and a sublist nearly as long as the original. This could be due to bad choices of partition value or wickedly contrived data. Imagine what would happen if you worked on the bigger sublist first. The first invocation of Quicksort is holding the pointers/indices for the short list in its stack frame while recursively calling quicksort for the long list. This too partitions badly into a very short list and a long one, and we do the longer sublist first, repeat...
Ultimately, on the baddest of bad days with the wickedest of wicked data, we'll have stack frames built up in number proportional to the original list length. This is quicksort's worst case behavior, O(n) depth of recursive calls. (Note we are talking of quicksort's depth of recursion, not performance.)
Doing the shorter sublist first gets rid of it fairly quickly. We still process a larger number of tiny lists, in proportion to the original list length, but now each one is taken care of by a shallow one or two recursive calls. We still make O(n) calls (performance) but each is depth O(1).
Surprisingly, this turns out to be important even when quicksort is not confronted with wildly unbalanced partitions, and even when introsort is actually being used.
The problem arises (in C++) when the values in the container being sorted are really big. By this, I don't mean that they point to really big objects, but that they are themselves really big. In that case, some (possibly many) compilers will make the recursive stack frame quite big, too, because it needs at least one temporary value in order to do a swap. Swap is called inside of partition, which is not itself recursive, so you would think that the quicksort recursive driver would not require the monster stack-frame; unfortunately, partition usually ends up being inlined because it's nice and short, and not called from anywhere else.
Normally the difference between 20 and 40 stack frames is negligible, but if the values weigh in at, say, 8kb, then the difference between 20 and 40 stack frames could mean the difference between working and stack overflow, if stacks have been reduced in size to allow for many threads.
If you use the "always recurse into the smaller partition" algorithm, the stack cannot every exceed log2 N frames, where N is the number of elements in the vector. Furthermore, N cannot exceed the amount of memory available divided by the size of an element. So on a 32-bit machine, the there could only be 219 8kb elements in a vector, and the quicksort call depth could not exceed 19.
In short, writing quicksort correctly makes its stack usage predictable (as long as you can predict the size of a stack frame). Not bothering with the optimization (to save a single comparison!) can easily cause the stack depth to double even in non-pathological cases, and in pathological cases it can get a lot worse.

Data design Issue to find insertion deletion and getMin in O(1)

As said in the title i need to define a datastructure that takes only O(1) time for insertion deletion and getMIn time.... NO SPACE CONSTRAINTS.....
I have searched SO for the same and all i have found is for insertion and deletion in O(1) time.... even a stack does. i saw previous post in stack overflow all they say is hashing...
with my analysis for getMIn in O(1) time we can use heap datastructure
for insertion and deletion in O(1) time we have stack...
so inorder to achieve my goal i think i need to tweak around heapdatastructure and stack...
How will i add hashing technique to this situation ...
if i use hashtable then what should my hash function look like how to analize the situation in terms of hashing... any good references will be appreciated ...
If you go with your initial assumption that insertion and deletion are O(1) complexity (if you only want to insert into the top and delete/pop from the top then a stack works fine) then in order to have getMin return the minimum value in constant time you would need to store the min somehow. If you just had a member variable keep track of the min then what would happen if it was deleted off the stack? You would need the next minimum, or the minimum relative to what's left in the stack. To do this you could have your elements in a stack contain what it believes to be the minimum. The stack is represented in code by a linked list, so the struct of a node in the linked list would look something like this:
struct Node
{
int value;
int min;
Node *next;
}
If you look at an example list: 7->3->1->5->2. Let's look at how this would be built. First you push in the value 2 (to an empty stack), this is the min because it's the first number, keep track of it and add it to the node when you construct it: {2, 2}. Then you push the 5 onto the stack, 5>2 so the min is the same push {5,2}, now you have {5,2}->{2,2}. Then you push 1 in, 1<2 so the new min is 1, push {1, 1}, now it's {1,1}->{5,2}->{2,2} etc. By the end you have:
{7,1}->{3,1}->{1,1}->{5,2}->{2,2}
In this implementation, if you popped off 7, 3, and 1 your new min would be 2 as it should be. And all of your operations is still in constant time because you just added a comparison and another value to the node. (You could use something like C++'s peek() or just use a pointer to the head of the list to take a look at the top of the stack and grab the min there, it'll give you the min of the stack in constant time).
A tradeoff in this implementation is that you'd have an extra integer in your nodes, and if you only have one or two mins in a very large list it is a waste of memory. If this is the case then you could keep track of the mins in a separate stack and just compare the value of the node that you're deleting to the top of this list and remove it from both lists if it matches. It's more things to keep track of so it really depends on the situation.
DISCLAIMER: This is my first post in this forum so I'm sorry if it's a bit convoluted or wordy. I'm also not saying that this is "one true answer" but it is the one that I think is the simplest and conforms to the requirements of the question. There are always tradeoffs and depending on the situation different approaches are required.
This is a design problem, which means they want to see how quickly you can augment existing data-structures.
start with what you know:
O(1) update, i.e. insertion/deletion, is screaming hashtable
O(1) getMin is screaming hashtable too, but this time ordered.
Here, I am presenting one way of doing it. You may find something else that you prefer.
create a HashMap, call it main, where to store all the elements
create a LinkedHashMap (java has one), call it mins where to track the minimum values.
the first time you insert an element into main, add it to mins as well.
for every subsequent insert, if the new value is less than what's at the head of your mins map, add it to the map with something equivalent to addToHead.
when you remove an element from main, also remove it from mins. 2*O(1) = O(1)
Notice that getMin is simply peeking without deleting. So just peek at the head of mins.
EDIT:
Amortized algorithm:
(thanks to #Andrew Tomazos - Fathomling, let's have some more fun!)
We all know that the cost of insertion into a hashtable is O(1). But in fact, if you have ever built a hash table you know that you must keep doubling the size of the table to avoid overflow. Each time you double the size of a table with n elements, you must re-insert the elements and then add the new element. By this analysis it would
seem that worst-case cost of adding an element to a hashtable is O(n). So why do we say it's O(1)? because not all the elements take worst-case! Indeed, only the elements where doubling occurs takes worst-case. Therefore, inserting n elements takes n+summation(2^i where i=0 to lg(n-1)) which gives n+2n = O(n) so that O(n)/n = O(1) !!!
Why not apply the same principle to the linkedHashMap? You have to reload all the elements anyway! So, each time you are doubling main, put all the elements in main in mins as well, and sort them in mins. Then for all other cases proceed as above (bullets steps).
A hashtable gives you insertion and deletion in O(1) (a stack does not because you can't have holes in a stack). But you can't have getMin in O(1) too, because ordering your elements can't be faster than O(n*Log(n)) (it is a theorem) which means O(Log(n)) for each element.
You can keep a pointer to the min to have getMin in O(1). This pointer can be updated easily for an insertion but not for the deletion of the min. But depending on how often you use deletion it can be a good idea.
You can use a trie. A trie has O(L) complexity for both insertion, deletion, and getmin, where L is the length of the string (or whatever) you're looking for. It is of constant complexity with respect to n (number of elements).
It requires a huge amount of memory, though. As they emphasized "no space constraints", they were probably thinking of a trie. :D
Strictly speaking your problem as stated is provably impossible, however consider the following:
Given a type T place an enumeration on all possible elements of that type such that value i is less than value j iff T(i) < T(j). (ie number all possible values of type T in order)
Create an array of that size.
Make the elements of the array:
struct PT
{
T t;
PT* next_higher;
PT* prev_lower;
}
Insert and delete elements into the array maintaining double linked list (in order of index, and hence sorted order) storage
This will give you constant getMin and delete.
For insertition you need to find the next element in the array in constant time, so I would use a type of radix search.
If the size of the array is 2^x then maintain x "skip" arrays where element j of array i points to the nearest element of the main array to index (j << i).
This will then always require a fixed x number of lookups to update and search so this will give constant time insertion.
This uses exponential space, but this is allowed by the requirements of the question.
in your problem statement " insertion and deletion in O(1) time we have stack..."
so I am assuming deletion = pop()
in that case, use another stack to track min
algo:
Stack 1 -- normal stack; Stack 2 -- min stack
Insertion
push to stack 1.
if stack 2 is empty or new item < stack2.peek(), push to stack 2 as well
objective: at any point of time stack2.peek() should give you min O(1)
Deletion
pop() from stack 1.
if popped element equals stack2.peek(), pop() from stack 2 as well

Resources