Stack overflow solution with O(n) runtime - performance

I have a problem related to runtime for push and pop in a stack.
Here, I implemented a stack using array.
I want to avoid overflow in the stack when I insert a new element to a full stack, so when the stack is full I do the following (Pseudo-Code):
(I consider a stack as an array)
Generate a new array with the size of double of the origin array.
Copy all the elements in the origin stack to the new array in the same order.
Now, I know that for a single push operation to the stack with the size of n the action executes in the worst case in O(n).
I want to show that the runtime of n pushes to an empty stack in the worst case is also O(n).
Also how can I update this algorithm that for every push the operation will execute in a constant runtime in the worst case?

Amortized constant-time is often just as good in practice if not better than constant-time alternatives.
Generate a new array with the size of double of the origin array.
Copy all the elements in the origin stack to the new array in the same order.
This is actually a very decent and respectable solution for a stack implementation because it has good locality of reference and the cost of reallocating and copying is amortized to the point of being almost negligible. Most generalized solutions to "growable arrays" like ArrayList in Java or std::vector in C++ rely on this type of solution, though they might not exactly double in size (lots of std::vector implementations increase their size by something closer to 1.5 than 2.0).
One of the reasons this is much better than it sounds is because our hardware is super fast at copying bits and bytes sequentially. After all, we often rely on millions of pixels being blitted many times a second in our daily software. That's a copying operation from one image to another (or frame buffer). If the data is contiguous and just sequentially processed, our hardware can do it very quickly.
Also how can I update this algorithm that for every push the operation
will execute in a constant runtime in the worst case?
I have come up with stack solutions in C++ that are ever-so-slightly faster than std::vector for pushing and popping a hundred million elements and meet your requirements, but only for pushing and popping in a LIFO pattern. We're talking about something like 0.22 secs for vector as opposed to 0.19 secs for my stack. That relies on just allocating blocks like this:
... of course typically with more than 5 elements worth of data per block! (I just didn't want to draw an epic diagram). There each block stores an array of contiguous data but when it fills up, it links to a next block. The blocks are linked (storing a previous link only) but each one might store, say, 512 bytes worth of data with 64-byte alignment. That allows constant-time pushes and pops without the need to reallocate/copy. When a block fills up, it just links a new block to the previous block and starts filling that up. When you pop, you just pop until the block becomes empty and then once it's empty, you traverse its previous link to get to the previous block before it and start popping from that (you can also free the now-empty block at this point).
Here's your basic pseudo-C++ example of the data structure:
template <class T>
struct UnrolledNode
{
// Points to the previous block. We use this to get
// back to a former block when this one becomes empty.
UnrolledNode* prev;
// Stores the number of elements in the block. If
// this becomes full with, say, 256 elements, we
// allocate a new block and link it to this one.
// If this reaches zero, we deallocate this block
// and begin popping from the previous block.
size_t num;
// Array of the elements. This has a fixed capacity,
// say 256 elements, though determined at runtime
// based on sizeof(T). The structure is a VLS to
// allow node and data to be allocated together.
T data[];
};
template <class T>
struct UnrolledStack
{
// Stores the tail end of the list (the last
// block we're going to be using to push to and
// pop from).
UnrolledNode<T>* tail;
};
That said, I actually recommend your solution instead for performance since mine barely has a performance edge over the simple reallocate and copy solutions and yours would have a slight edge when it comes to traversal since it can just traverse the array in a straightforward sequential fashion (as well as straightforward random-access if you need it). I didn't actually implement mine for performance reasons. I implemented it to prevent pointers from being invalidated when you push things to the container (the actual thing is a memory allocator in C) and, again, in spite of achieving true constant-time push backs and pop backs, it's still barely any faster than the amortized constant-time solution involving reallocation and memory copying.

Related

Mutable data types that use stack allocation

Based on my earlier question, I understand the benefit of using stack allocation. Suppose I have an array of arrays. For example, A is a list of matrices and each element A[i] is a 1x3 matrix. The length of A and the dimension of A[i] are known at run time (given by the user). Each A[i] is a matrix of Float64 and this is also known at run time. However, through out the program, I will be modifying the values of A[i] element by element. What data structure can also allow me to use stack allocation? I tried StaticArrays but it doesn't allow me to modify a static array.
StaticArrays defines MArray (MVector, MMatrix) types that are fixed-size and mutable. If you use these there's a higher chance of the compiler determining that they can be stack-allocated, but it's not guaranteed. Moreover, since the pattern you're using is that you're passing the mutable state vector into a function which presumably modifies it, it's not going to be valid or helpful to stack allocate that anyway. If you're going to allocate state once and modify it throughout the program, it doesn't really matter if it is heap or stack allocated—stack allocation is only a big win for objects that are allocated, used locally and then don't escape the local scope, so they can be “freed” simply by popping the stack.
From the code snippet you showed in the linked question, the state vector is allocated in the outer function, test_for_loop, which shouldn't be a big deal since it's done once at the beginning of execution. Using a variably sized state vector to index into an array with a splat (...) might be an issue, however, and that's done in test_function. Using something with fixed size like MVector might be better for that. It might, however, be better still, to use a state tuple and return a new rather than mutated state tuple at the end. The compiler is very good at turning that kind of thing into very efficient code because of immutability.
Note that by convention test_function should be called test_function! since it modifies its M argument and even more so if it modifies the state vector.
I would also note that this isn't a great question/answer pair since it's not standalone at all and really just a continuation of your other question. StackOverflow isn't very good for this kind of iterative question/discussion interaction, I'm afraid.

"cut and paste" the last k elements of std::vector efficiently?

Is it possible in C++11 "cut and paste" the last k elements of an std::vector A to a new std:::vector B in constant time?
One way would be to use B.insert(A.end() - k, A.end()) and then use erase on A but these are both O(k) time operations.
Mau
No, vectors own their memory.
This operation is known as splice. forward_list is ridiculously slow otherwise, but it does have an O(1) splice.
Typically, the process of deciding which elements to move is already O(n), so having the splice itself take O(n) time is not a problem. The other operations being faster on vector more than make up for it.
This isn't possible in general, since (at least in the C++03 version -- there it's 23.2.4/1) the C++ standard guarantees that the memory used by a vector<T> is a single contiguous block. Thus the only way to "transfer" more than a fixed number of elements in O(1) time would be if the receiving vector were empty, and you had somehow arranged to have it's allocated block of memory begin at the right place inside the first vector -- in which case the "transfer" could be argued to have taken no time at all. (Deliberately overlapping objects in this way is almost certain to constitute Undefined Behaviour in theory -- and in practice, it's also very fragile, since any operation that invalidates iterators to a vector<T> can also reallocate memory, thus breaking things.)
If you're prepared to sacrifice a whole bunch of portability, I've heard it's possible to play OS-level (or hardware-level) tricks with virtual memory mapping to achieve tricks like no-overhead ring buffers. Maybe these tricks could also be applied here -- but bear in mind that the assumption that the mapping of virtual to physical memory within a single process is one-to-one is very deeply ingrained, so you could be in for some surprises.

Purpose of Xor Linked List?

I stumbled on a Wikipedia article about the Xor linked list, and it's an interesting bit of trivia, but the article seems to imply that it occasionally gets used in real code. I'm curious what it's good for, since I don't see why it makes sense even in severely memory/cache constrained systems:
The main advantage of a regular doubly linked list is the ability to insert or delete elements from the middle of the list given only a pointer to the node to be deleted/inserted after. This can't be done with an xor-linked list.
If one wants O(1) prepending or O(1) insertion/removal while iterating then singly linked lists are good enough and avoid a lot of code complexity and caveats with respect to garbage collection, debugging tools, etc.
If one doesn't need O(1) prepending/insertion/removal then using an array is probably more efficient in both space and time. Even if one only needs efficient insertion/removal while iterating, arrays can be pretty good since the insertion/removal can be done while iterating.
Given the above, what's the point? Are there any weird corner cases where an xor linked list is actually worthwhile?
Apart from saving memory, it allows for O(1) reversal, while still supporting all the other destructive update operations efficienctly, like
concating two lists destructively in O(1)
insertAfter/insertBefore in O(1), when you only have a reference to the node and its successor/predecessor (which differs slightly from standard doubly linked lists)
remove in O(1), also with a reference to either the successor or predecessor.
I don't think the memory aspect is really important, since for most scenarios where you might use a XOR list, you can use a singly-linked list instead.
It is about saving memory. I had a situation where my data structure was 40 bytes. The memory manager aligned things on a 16 byte boundary, so each allocation was 48 bytes; regardless of the fact that I only needed 40. By using xor chain list, I was able to eliminate 8 bytes and drop my data structure size down to 32 bytes. Now, I can fit 2 nodes in the 64 byte pipeline cache at the same time. So, I was able to reduce memory usage, and improve performance.
Its purpose is (or more precisely was) just to save memory.
With a xor-linked-list you can do anything you can do with a ordinary doubly-linked list. The only difference is that you have to decode the previous and next memory addresses from the xor-ed pointer for each node every time you need them.

How does the stack work?

I have a couple questions about the stack. One thing I don't understand about the stack is the "pop" and "push" idea. Say I have integers a and b, with a above b on the stack. To access b, as I understand it, a would have to be popped off the stack to access b. So where is "a" stored when it is popped off the stack.
Also if stack memory is more efficient to access than heap memory, why isn't heap memory structured like the stack? Thanks.
So where is "a" stored when it is popped off the stack.
It depends. It goes where the program that's reading the stack decides. It may store the value, ignore it, print it, anything.
Also if stack memory is more efficient to access than heap memory, why
isn't heap memory structured like the stack?
A stack isn't more efficient to access than a heap is, it depends on the usage. The program's flow gets deeper and shallower just like a stack does. Local variables, arguments and return addresses are, in mainstream languages, stored in a stack structure because this kind of structure implements more easily the semantics of what we call a function's stack frame. A function can very efficiently access its own stack frame, but not necessarily its caller functions' stack frames, that is, the whole stack.
On the other hand, the heap would be inefficient if it were implemented that way, because it's expected for the heap to be able to access and possibly delete items anywhere, not just from its top/bottom.
I'm not an expert, but you can sort of think of this like the Tower of Hanoi puzzle. To access a lower disc, you "pop" discs above it and place them elsewhere - in this case, on other stacks, but in the case of programming it could be just a simple variable or pointer or anything. When you've got the item you need, then the other ones can be put back on the stack or moved elsewhere entirely.
Lets take your case scenario .
You have a stack with n elements on it, the last one is a, b is underneath.
pop operation returns the popped value, so if you want to access the second from the top being b, you could do:
var temp = stack.pop()
var b = stack.pop()
stack.push(temp)
However, stack would rarely be used this way. It is a LIFO queue and works best when accessed like a LIFO queue.
It seems you would rather need a collection with a random index based access.
That collection would probably be stored on the heap. Hope it clarified stack pop/push a little.
a is stored wherever you decide to store it. :-) You need to provide a variable in which to store the value at the top of the stack (a) when you remove it, then remove the next item (b) and store it in a different variable to use it, and then push the first value (a) back on the stack.
Picture an actual pile of dirty plates sitting on your counter to your left. You pick one up to wash it (pop it from the "dirty" stack), wash it, dry it, and put it on the top of the clean stack (push it) on your right.
If you want to reach the second plate from the top in either stack, you have to move the top one to get to it. So you pick it up (pop it), put it somewhere temporarily, pick up the next plate (pop it) and put it somewhere, and then put the first one you removed back on the pile (push it back on the stack).
If you can't picture it with plates, use an actual deck of playing cards (or baseball cards, or a stack of paper - anything you can neatly pile ("stack")) and put it on your desk at your left hand. Then perform the steps in my last paragraph, replacing the word "plate" with "card" and physically performing the steps.
So to access b, you declare a variable to store a in, pop a and save it in that variable, pop b into it's own variable, and then push a back onto the stack.

How much stack space does this routine use?

Assuming the tree is balanced, how much stack space will the routine use for a tree of 1,000,000 elements?
void printTree(const Node *node) {
char buffer[1000];
if(node) {
printTree(node->left);
getNodeAsString(node, buffer);
puts(buffer);
printTree(node->right);
}
}
This was one of the algo questions in "The Pragmatic Programmer" where the answer was 21 buffers needed (lg(1m) ~= 20 and with the additional 1 at very top)
But I am thinking that it requires more than 1 buffer at levels lower than top level, due to the 2 calls to itself for left and right node. Is there something I missed?
*Sorry, but this is really not a homework. Don't see this on the booksite's errata.
First the left node call is made, then that call returns (and so its stack is available for re-use), then there's a bit of work, then the right node call is made.
So it's true that there are two buffers at the next level down, but those two buffers are required consecutively, not concurrently. So you only need to count one buffer in the high-water-mark stack usage. What matters is how deep the function recurses, not how many times in total the function is called.
This assuming of course that the code is written in a language similar to C, and that the C implementation uses a stack for automatic variables (I've yet to see one that doesn't), blah blah.
The first call will recurse all the way to the leaf node, then return. Then the second call will start -- but by the time the second call takes place, all activation records from the first call will have been cleared off the stack. IOW, there will only be data from one of those on the stack at any given time.

Resources