Space complexity of reassigning an array - algorithm

What is the space complexity of the following Java code?
public int[] foo(int[] x) {
    x = new int[x.length];
    // Do stuff with x that does not require additional memory
    return x;
}
Is it O(1) or O(N)? I've seen both answers, but I can't understand how it could be O(1). I would guess that it's O(N): we create a new array of the same size while the original array might still exist. Thus the original array is not replaced, i.e. we allocate additional storage that grows linearly with the length N of the input array. Am I correct?

The semantics of this piece of code are uncertain, as the language is not specified. In any case, O(1) isn't possible, because a new array is allocated while the original still exists. (In a garbage-collected language one could imagine, in very bad faith, that x is deallocated and then immediately reallocated at the same place.)
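To make the distinction concrete, here is a minimal C++ sketch. It uses C++ rather than the question's Java, and the function names are hypothetical. It contrasts a function that allocates a second buffer of the input's size, which needs O(N) auxiliary space, with one that works in place and needs only O(1) auxiliary space:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// O(N) auxiliary space: a second buffer of x.size() ints is alive at the same
// time as x. Constructing it value-initializes (zeroes) every element.
std::vector<int> negate_copy(const std::vector<int>& x) {
    std::vector<int> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = -x[i];
    }
    return y;
}

// O(1) auxiliary space: nothing is allocated beyond the loop variable.
void negate_in_place(std::vector<int>& x) {
    for (int& v : x) {
        v = -v;
    }
}
```

In the first function both x and y exist simultaneously, which is exactly the situation the question describes.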

O(N) with details that are highly dependent on various external factors such as your operating system.
A naive implementation actually has to zero out the memory and assign it. If the space requested is small, and particularly if you've already freed a good place to put it, this is probably what will happen. That operation is O(N).
If the space requested is large, you're probably just going to set up page table entries and NOT allocate any space. This is again O(N), but with extremely good constants. As you use the memory, it actually has to be assigned, which is no faster than doing it up front. (It is actually slower.) But in the meantime, being slow to use up memory is good for reducing contention on RAM.

Related

Mutable data types that use stack allocation

Based on my earlier question, I understand the benefit of using stack allocation. Suppose I have an array of arrays. For example, A is a list of matrices and each element A[i] is a 1x3 matrix. The length of A and the dimension of A[i] are known at run time (given by the user). Each A[i] is a matrix of Float64, and this is also known at run time. However, throughout the program, I will be modifying the values of A[i] element by element. What data structure would also allow me to use stack allocation? I tried StaticArrays, but it doesn't allow me to modify a static array.
StaticArrays defines MArray (MVector, MMatrix) types that are fixed-size and mutable. If you use these there's a higher chance of the compiler determining that they can be stack-allocated, but it's not guaranteed. Moreover, since the pattern you're using is that you're passing the mutable state vector into a function which presumably modifies it, it's not going to be valid or helpful to stack allocate that anyway. If you're going to allocate state once and modify it throughout the program, it doesn't really matter if it is heap or stack allocated—stack allocation is only a big win for objects that are allocated, used locally and then don't escape the local scope, so they can be “freed” simply by popping the stack.
From the code snippet you showed in the linked question, the state vector is allocated in the outer function, test_for_loop, which shouldn't be a big deal since it's done once at the beginning of execution. Using a variably sized state vector to index into an array with a splat (...) might be an issue, however, and that's done in test_function. Using something with a fixed size, like MVector, might be better for that. It might, however, be better still to use a state tuple and return a new state tuple rather than a mutated one at the end. The compiler is very good at turning that kind of thing into very efficient code because of immutability.
Note that by convention test_function should be called test_function! since it modifies its M argument and even more so if it modifies the state vector.
I would also note that this isn't a great question/answer pair since it's not standalone at all and really just a continuation of your other question. StackOverflow isn't very good for this kind of iterative question/discussion interaction, I'm afraid.

Stack overflow solution with O(n) runtime

I have a problem related to runtime for push and pop in a stack.
Here, I implemented a stack using an array.
I want to avoid overflow when I push a new element onto a full stack, so when the stack is full I do the following (pseudo-code):
(I consider a stack as an array)
Create a new array twice the size of the original array.
Copy all the elements of the original stack to the new array in the same order.
Now, I know that a single push onto a stack of size n executes in O(n) in the worst case.
I want to show that the worst-case runtime of n pushes onto an empty stack is also O(n).
Also, how can I update this algorithm so that every push executes in constant time in the worst case?
Amortized constant-time is often just as good in practice if not better than constant-time alternatives.
Create a new array twice the size of the original array.
Copy all the elements of the original stack to the new array in the same order.
This is actually a very decent and respectable solution for a stack implementation because it has good locality of reference and the cost of reallocating and copying is amortized to the point of being almost negligible. Most generalized solutions to "growable arrays" like ArrayList in Java or std::vector in C++ rely on this type of solution, though they might not exactly double in size (lots of std::vector implementations increase their size by something closer to 1.5 than 2.0).
One of the reasons this is much better than it sounds is because our hardware is super fast at copying bits and bytes sequentially. After all, we often rely on millions of pixels being blitted many times a second in our daily software. That's a copying operation from one image to another (or frame buffer). If the data is contiguous and just sequentially processed, our hardware can do it very quickly.
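The amortized bound the question asks about can be checked by counting copies. Start from capacity 1 and double whenever the array is full. During n pushes, the resizes copy 1 + 2 + 4 + ... + 2^k elements, where 2^k < n, so the total is 2^(k+1) - 1 < 2n. That makes all n pushes O(n) combined. The following C++ sketch makes that count observable. It uses a hypothetical int-only DoublingStack, and the copy counter is added purely for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Array-backed stack that doubles its capacity when full (sketch, ints only).
class DoublingStack {
public:
    void push(int value) {
        if (size_ == capacity_) {
            grow();  // the occasional O(size) step
        }
        data_[size_++] = value;
    }

    int pop() {
        assert(size_ > 0);  // precondition: stack not empty
        return data_[--size_];
    }

    std::size_t size() const { return size_; }
    std::size_t copies() const { return copies_; }  // elements copied by all resizes so far

private:
    void grow() {
        const std::size_t new_capacity = capacity_ == 0 ? 1 : 2 * capacity_;
        std::unique_ptr<int[]> bigger(new int[new_capacity]);
        for (std::size_t i = 0; i < size_; ++i) {
            bigger[i] = data_[i];  // copy existing elements in the same order
        }
        copies_ += size_;
        data_ = std::move(bigger);
        capacity_ = new_capacity;
    }

    std::unique_ptr<int[]> data_;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
    std::size_t copies_ = 0;
};
```

For n = 1000 pushes this reports 1023 copies (1 + 2 + ... + 512), which is under 2n.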
Also, how can I update this algorithm so that every push executes in constant time in the worst case?
I have come up with stack solutions in C++ that are ever-so-slightly faster than std::vector for pushing and popping a hundred million elements and meet your requirements, but only for pushing and popping in a LIFO pattern. We're talking about something like 0.22 secs for vector as opposed to 0.19 secs for my stack. That relies on just allocating blocks like this:
... of course typically with more than 5 elements worth of data per block! (I just didn't want to draw an epic diagram). There each block stores an array of contiguous data but when it fills up, it links to a next block. The blocks are linked (storing a previous link only) but each one might store, say, 512 bytes worth of data with 64-byte alignment. That allows constant-time pushes and pops without the need to reallocate/copy. When a block fills up, it just links a new block to the previous block and starts filling that up. When you pop, you just pop until the block becomes empty and then once it's empty, you traverse its previous link to get to the previous block before it and start popping from that (you can also free the now-empty block at this point).
Here's your basic pseudo-C++ example of the data structure:
template <class T>
struct UnrolledNode
{
// Points to the previous block. We use this to get
// back to a former block when this one becomes empty.
UnrolledNode* prev;
// Stores the number of elements in the block. If
// this becomes full with, say, 256 elements, we
// allocate a new block and link it to this one.
// If this reaches zero, we deallocate this block
// and begin popping from the previous block.
size_t num;
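// Array of the elements. This has a fixed capacity,
// say 256 elements, though determined at runtime
// based on sizeof(T). data is a flexible array
// member (a C99 feature that C++ compilers accept
// only as an extension) so the node header and its
// elements can be allocated together in one block.
T data[];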
};
template <class T>
struct UnrolledStack
{
// Stores the tail end of the list (the last
// block we're going to be using to push to and
// pop from).
UnrolledNode<T>* tail;
};
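A compilable C++ sketch of the same block-linked idea follows. It differs from the pseudo-code in three assumptions. First, it uses a fixed compile-time block capacity (the hypothetical BlockCapacity parameter) instead of a runtime-sized flexible array. Second, it requires T to be default-constructible. Third, it treats operator new as constant-time, which depends on the allocator and is not guaranteed:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Unrolled (block-linked) stack sketch: each block holds up to BlockCapacity
// elements and links only to the previous block.
template <class T, std::size_t BlockCapacity = 64>
class UnrolledStackSketch {
    struct Block {
        Block* prev;
        std::size_t num = 0;
        T data[BlockCapacity];  // T must be default-constructible in this sketch
        explicit Block(Block* p) : prev(p) {}
    };

    Block* tail_ = nullptr;  // block we push to and pop from

public:
    UnrolledStackSketch() = default;
    UnrolledStackSketch(const UnrolledStackSketch&) = delete;
    UnrolledStackSketch& operator=(const UnrolledStackSketch&) = delete;

    ~UnrolledStackSketch() {
        while (tail_ != nullptr) {
            Block* prev = tail_->prev;
            delete tail_;
            tail_ = prev;
        }
    }

    void push(T value) {
        if (tail_ == nullptr || tail_->num == BlockCapacity) {
            tail_ = new Block(tail_);  // link a fresh block; existing elements never move
        }
        tail_->data[tail_->num++] = std::move(value);
    }

    T pop() {
        assert(!empty());
        T value = std::move(tail_->data[--tail_->num]);
        if (tail_->num == 0) {  // block emptied: free it and resume in the previous block
            Block* prev = tail_->prev;
            delete tail_;
            tail_ = prev;
        }
        return value;
    }

    bool empty() const { return tail_ == nullptr; }

    const T& top() const {
        assert(!empty());
        return tail_->data[tail_->num - 1];
    }
};
```

Because a full block is never reallocated, references to existing elements stay valid across pushes, which was the stated motivation for this design. One caveat is that freeing a block as soon as it empties means alternating push/pop exactly at a block boundary allocates and frees repeatedly. Keeping one spare block around avoids that.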
That said, I actually recommend your solution instead for performance since mine barely has a performance edge over the simple reallocate and copy solutions and yours would have a slight edge when it comes to traversal since it can just traverse the array in a straightforward sequential fashion (as well as straightforward random-access if you need it). I didn't actually implement mine for performance reasons. I implemented it to prevent pointers from being invalidated when you push things to the container (the actual thing is a memory allocator in C) and, again, in spite of achieving true constant-time push backs and pop backs, it's still barely any faster than the amortized constant-time solution involving reallocation and memory copying.

"cut and paste" the last k elements of std::vector efficiently?

Is it possible in C++11 to "cut and paste" the last k elements of an std::vector A to a new std::vector B in constant time?
One way would be to use B.insert(B.end(), A.end() - k, A.end()) and then use erase on A, but these are both O(k) time operations.
No, vectors own their memory.
This operation is known as splice. forward_list is ridiculously slow otherwise, but it does have an O(1) splice.
Typically, the process of deciding which elements to move is already O(n), so having the splice itself take O(n) time is not a problem. The other operations being faster on vector more than make up for it.
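Since the transfer is O(k) either way, the practical improvement over copying with insert is to move the elements instead. Here is a minimal C++11 sketch using std::make_move_iterator. The helper name cut_last_k is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Moves the last k elements of `a` into a new vector and shortens `a` by k.
// Still O(k): each element is move-constructed once, then the moved-from
// tail of `a` is destroyed by erase.
template <class T>
std::vector<T> cut_last_k(std::vector<T>& a, std::size_t k) {
    assert(k <= a.size());
    auto first = a.end() - static_cast<std::ptrdiff_t>(k);
    std::vector<T> b(std::make_move_iterator(first), std::make_move_iterator(a.end()));
    a.erase(first, a.end());
    return b;
}
```

For element types that own heap memory (strings, nested vectors), moving transfers the owned buffers instead of duplicating them. The vector-level operation is still linear in k.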
This isn't possible in general, since (at least in the C++03 version -- there it's 23.2.4/1) the C++ standard guarantees that the memory used by a vector<T> is a single contiguous block. Thus the only way to "transfer" more than a fixed number of elements in O(1) time would be if the receiving vector were empty, and you had somehow arranged for its allocated block of memory to begin at the right place inside the first vector -- in which case the "transfer" could be argued to have taken no time at all. (Deliberately overlapping objects in this way is almost certain to constitute Undefined Behaviour in theory -- and in practice, it's also very fragile, since any operation that invalidates iterators to a vector<T> can also reallocate memory, thus breaking things.)
If you're prepared to sacrifice a whole bunch of portability, I've heard it's possible to play OS-level (or hardware-level) tricks with virtual memory mapping to achieve tricks like no-overhead ring buffers. Maybe these tricks could also be applied here -- but bear in mind that the assumption that the mapping of virtual to physical memory within a single process is one-to-one is very deeply ingrained, so you could be in for some surprises.

Pseudo Least Recently Used Binary Tree

The logic behind Pseudo-LRU is to use fewer bits and to speed up block replacement. The logic is given as "let 1 represent that the left side has been referenced more recently than the right side, and 0 vice-versa".
But I am unable to understand the implementation given in the following diagram:
Details are given at : http://courses.cse.tamu.edu/ejkim/614/CSCE614-2011c-HW4-tutorial.pptx
I'm also studying Pseudo-LRU.
Here is my understanding; I hope it's helpful.
"Hit CL1": there is a reference to CL1, and it hits.
The LRU state bits (B0 and B1) are updated to record that CL1 was recently referenced.
"Hit CL0": there is a reference to CL0, and it hits.
The LRU state bit B1 is updated to record that CL0 was used more recently (than CL1).
"Miss; CL2 replace"
There is a miss, and the LRU logic is asked for a replacement index.
Given the current state, CL2 is chosen.
The LRU state bits (B0 and B2) are updated to record that CL2 was recently used.
(This also means the next replacement will be CL1.)
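The walkthrough above can be reproduced with a small C++ sketch of a 4-way tree pseudo-LRU. The sketch makes four assumptions:
- B0 chooses between the pair {CL0, CL1} and the pair {CL2, CL3}.
- B1 chooses between CL0 and CL1.
- B2 chooses between CL2 and CL3.
- The state starts at all zeros, since the diagram's actual initial bits are not reproduced here.

It follows the question's convention (1 = left side referenced more recently):

```cpp
#include <array>
#include <cassert>

// Tree pseudo-LRU for a 4-way set (sketch). Bit convention from the question:
// 1 = the left side was referenced more recently, 0 = the right side was.
//   b[0]: pair {CL0, CL1} (left) vs pair {CL2, CL3} (right)
//   b[1]: CL0 (left) vs CL1 (right)
//   b[2]: CL2 (left) vs CL3 (right)
struct TreePlru4 {
    std::array<bool, 3> b{};  // assumed initial state: all bits 0

    // Record a reference to line 0..3 by pointing every bit on its path at it.
    void touch(int line) {
        assert(line >= 0 && line < 4);
        const bool in_left_pair = line < 2;
        b[0] = in_left_pair;
        if (in_left_pair) {
            b[1] = (line == 0);
        } else {
            b[2] = (line == 2);
        }
    }

    // Pick the replacement line by walking away from the more recent side.
    int victim() const {
        if (b[0]) {               // left pair more recent: evict from the right pair
            return b[2] ? 3 : 2;  // CL2 more recent -> evict CL3, else CL2
        }
        return b[1] ? 1 : 0;      // CL0 more recent -> evict CL1, else CL0
    }
};
```

Replaying "Hit CL1", "Hit CL0" and then a miss selects CL2 as the victim. Once CL2 is marked as used, the next victim is CL1, matching the walkthrough. The set uses N - 1 = 3 bits for 4 lines.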
I know there is already an answer that clearly explains the picture, but I wanted to post my way of thinking about implementing a fast pseudo-LRU algorithm and its advantages over classic LRU.
From the memory point of view, if there are N objects (pointers, 32/64-bit values), you need N-1 flag bits and a HashMap storing information about the objects (pointer to the actual address and position in the array) to query whether an element already exists in the cache. It doesn't use less memory than classic LRU; it actually uses N-1 auxiliary bits.
The optimization comes from CPU time. Comparing some flags takes almost no time because they are bits. In classic LRU you need some sort of structure that permits insertion/deletion and lets you take the LRU element quickly (maybe a heap). This structure takes O(log(N)) for a usual operation, but the comparison between values is also expensive. So in the end you end up with O(log(N)^2) complexity per operation, instead of O(log(N)) for Pseudo-LRU.
Even if Pseudo-LRU doesn't always evict the LRU object on a cache miss, in practice it seems to behave pretty well, and this is not a major drawback.

Purpose of Xor Linked List?

I stumbled on a Wikipedia article about the Xor linked list, and it's an interesting bit of trivia, but the article seems to imply that it occasionally gets used in real code. I'm curious what it's good for, since I don't see why it makes sense even in severely memory/cache constrained systems:
The main advantage of a regular doubly linked list is the ability to insert or delete elements from the middle of the list given only a pointer to the node to be deleted/inserted after. This can't be done with an xor-linked list.
If one wants O(1) prepending or O(1) insertion/removal while iterating then singly linked lists are good enough and avoid a lot of code complexity and caveats with respect to garbage collection, debugging tools, etc.
If one doesn't need O(1) prepending/insertion/removal then using an array is probably more efficient in both space and time. Even if one only needs efficient insertion/removal while iterating, arrays can be pretty good since the insertion/removal can be done while iterating.
Given the above, what's the point? Are there any weird corner cases where an xor linked list is actually worthwhile?
Apart from saving memory, it allows O(1) reversal while still supporting all the other destructive update operations efficiently, such as
concatenating two lists destructively in O(1)
insertAfter/insertBefore in O(1), when you only have a reference to the node and its successor/predecessor (which differs slightly from standard doubly linked lists)
remove in O(1), also with a reference to either the successor or the predecessor.
I don't think the memory aspect is really important, since for most scenarios where you might use a XOR list, you can use a singly-linked list instead.
It is about saving memory. I had a situation where my data structure was 40 bytes. The memory manager aligned allocations on a 16-byte boundary, so each allocation took 48 bytes even though I only needed 40. By using an XOR linked list, I was able to eliminate 8 bytes and bring my data structure down to 32 bytes. Now I can fit 2 nodes in a 64-byte cache line at the same time. So I was able to reduce memory usage and improve performance.
Its purpose is (or, more precisely, was) just to save memory.
With an XOR linked list you can do anything you can do with an ordinary doubly linked list. The only difference is that you have to decode the previous and next memory addresses from each node's XOR-ed pointer field every time you need them.
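To illustrate the decoding step, here is a minimal C++ sketch of an XOR linked list of ints. The XorList class is hypothetical. Each node stores the XOR of its neighbours' addresses as a std::uintptr_t, and walking the list requires carrying the previous node's address to recover the next one. Round-tripping pointers through std::uintptr_t and XOR is implementation-defined in C++, and it also hides the links from tools that track pointers. Both issues are part of the caveats the question mentions. The sketch also shows the O(1) reversal mentioned above:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// XOR linked list of ints (sketch). Each node keeps one link field,
// link = address(prev) XOR address(next), with "no node" encoded as 0.
class XorList {
    struct Node {
        int value;
        std::uintptr_t link;
    };

    Node* head_ = nullptr;
    Node* tail_ = nullptr;

    static std::uintptr_t addr(const Node* n) { return reinterpret_cast<std::uintptr_t>(n); }

    // Given the current node and the address we came from, decode the other neighbour.
    static Node* step(const Node* cur, std::uintptr_t came_from) {
        return reinterpret_cast<Node*>(cur->link ^ came_from);
    }

public:
    XorList() = default;
    XorList(const XorList&) = delete;
    XorList& operator=(const XorList&) = delete;

    ~XorList() {
        std::uintptr_t prev = 0;
        Node* cur = head_;
        while (cur != nullptr) {
            Node* next = step(cur, prev);
            prev = addr(cur);  // remember the address before freeing the node
            delete cur;
            cur = next;
        }
    }

    void push_back(int value) {
        Node* n = new Node{value, addr(tail_)};  // prev = tail_, next = none (0)
        if (tail_ != nullptr) {
            tail_->link ^= addr(n);  // the old tail's "next" changes from 0 to n
        } else {
            head_ = n;
        }
        tail_ = n;
    }

    // O(1) reversal: the links are symmetric, so swapping the ends reverses the list.
    void reverse() { std::swap(head_, tail_); }

    std::vector<int> to_vector() const {
        std::vector<int> out;
        std::uintptr_t prev = 0;
        for (const Node* cur = head_; cur != nullptr;) {
            out.push_back(cur->value);
            const Node* next = step(cur, prev);
            prev = addr(cur);
            cur = next;
        }
        return out;
    }
};
```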
