What is a Shadow Array?

What is a shadow array and how is it implemented?
I came across the term while reading about compiler optimizations, but I couldn't find any substantial reference for it.

When using arrays to implement dynamically resizable abstract data types, such as a List, Queue or Stack, the obvious problem one encounters is that arrays themselves cannot be freely resized: add enough items and the array will eventually run out of space.
The naive solution to this problem is to wait until the array in use runs out of space, then create a new, larger array, copy all of the items from the old array into the new array, and then start using the new array.
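For contrast, a minimal Java sketch of this naive grow-and-copy approach (class and field names are illustrative):

```java
// Naive growth: when the backing array is full, allocate a bigger one
// and copy everything over in a single O(n) burst.
public class NaiveList<T> {
    private Object[] items = new Object[8];
    private int size = 0;

    public void add(T item) {
        if (size == items.length) {
            Object[] bigger = new Object[items.length * 2];
            System.arraycopy(items, 0, bigger, 0, size); // the occasional O(n) pause
            items = bigger;
        }
        items[size++] = item;
    }

    @SuppressWarnings("unchecked")
    public T get(int index) { return (T) items[index]; }
}
```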
A shadow-array implementation of an abstract data type is an alternative to this. Instead of waiting until the old array is full, a second, larger array is created once the array in use passes some fullness threshold. Thereafter, as items are added to the old array, a few existing items are also copied from the old array to the shadow array, such that when the old array is full, all of its items have already been copied to the new array.
The advantage of a shadow-array implementation over the naive "copy everything at the end" approach is that the time required by each add operation is much more consistent: the copying cost is spread across many adds, so there are no occasional O(n) pauses.
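A minimal sketch of the scheme just described, again in Java. The 50% threshold and the rate of two copies per add are illustrative choices (they happen to guarantee that migration finishes exactly when the old array fills up); all names are hypothetical:

```java
// Shadow-array growth: once the primary array is half full, allocate the
// larger array immediately and migrate a couple of elements on every add,
// so no single add ever pays the full O(n) copying cost.
public class ShadowList<T> {
    private Object[] primary = new Object[8];
    private Object[] shadow;      // larger array, filled a little per add
    private int size = 0;
    private int migrated = 0;     // prefix of primary already copied to shadow

    public void add(T item) {
        if (shadow == null && size == primary.length / 2) {
            shadow = new Object[primary.length * 2]; // start shadowing at half full
            migrated = 0;
        }
        if (shadow != null) {
            shadow[size] = item;  // new items go straight into the shadow array
            // Copy two old elements per add; the gap between `migrated` and
            // `size` shrinks by one per add and hits zero exactly at capacity.
            for (int i = 0; i < 2 && migrated < size; i++) {
                shadow[migrated] = primary[migrated];
                migrated++;
            }
        }
        primary[size] = item;     // primary stays complete until the swap
        size++;
        if (size == primary.length) { // migration is guaranteed done here
            primary = shadow;
            shadow = null;
        }
    }

    @SuppressWarnings("unchecked")
    public T get(int index) { return (T) primary[index]; }
}
```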

I think of it as a form of dynamic array.
The term shadow refers to the underlying resizing machinery, which tries to achieve good performance while staying hidden behind a simple interface (for example, ArrayList in Java).


Related

Best statically allocated data structure for writing and extending contiguous blocks of data?

Here's what I want to do:
I have an arbitrary number of values of different kinds: string, int, float, bool, etc., that I need to store somehow. Multiple elements are often written and read as a whole, forming "contiguous blocks" that can also be extended and shortened at the user's wish, and even elements in the middle may be taken out. Also, the whole thing should be statically allocated.
I was thinking about using some kind of statically allocated forward list. The way I imagine this working is to define an array of a struct containing one std::variant field and a "previous head" field, which always points to the location of the previous head of the list. A new element is always placed at the globally known "head", which it stores in its "previous head" field. This way I can keep track of holes inside my list, because once an element is taken out, its location is written to the global head and will be filled by subsequent inserts.
This approach, however, has downsides: when a "contiguous block" is extended, further elements of other blocks may already have queued up in the list past its last element. So I either need to move all subsequent entries, or copy over the last element of the previous block and insert a link object that allows me to jump to the new location when traversing the contiguous block.
The priorities for optimizing this data structure are as follows (by number of use cases):
Initially write contiguous blocks
Read the whole data structure
Add new elements to contiguous blocks
Remove elements from contiguous blocks
At the moment my data structure has time complexity of O(1) for writes, O(n) for contiguous reads (with the caveat that in the worst case there is a jump to another location inside the array every other element), O(1) for adding new elements and O(1) for removing elements. However, space complexity is 2n in the worst case: when I have to jump every second element, half the slots are lost to "links".
What I'm wondering now is: Is the described way the best viable way to accomplish what I'm trying or is there a better data structure? Is there an official name for this data structure?
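For concreteness, the slot-recycling part of what's described is a classic free list threaded through a fixed-size pool; the per-block linking on top of it behaves like an indexed linked list. Here is a minimal sketch of just the pool-plus-free-list idea, in Java standing in for the question's C++ std::variant setup (all names are hypothetical):

```java
// Fixed-size pool whose unused slots form a linked list (a "free list").
// Removing an element pushes its slot onto the free list; inserting pops
// a slot off it, so holes are reused by subsequent inserts.
public class SlotPool {
    private final Object[] payload; // stands in for the std::variant field
    private final int[] next;       // index of the next free slot, or -1
    private int freeHead;           // first free slot, or -1 if pool is full

    public SlotPool(int capacity) {
        payload = new Object[capacity];
        next = new int[capacity];
        for (int i = 0; i < capacity - 1; i++) next[i] = i + 1;
        next[capacity - 1] = -1;
        freeHead = 0;
    }

    /** Stores a value and returns the slot index it landed in. */
    public int insert(Object value) {
        if (freeHead == -1) throw new IllegalStateException("pool full");
        int slot = freeHead;
        freeHead = next[slot];      // pop the slot off the free list
        payload[slot] = value;
        return slot;
    }

    public void remove(int slot) {
        payload[slot] = null;
        next[slot] = freeHead;      // push the slot back onto the free list
        freeHead = slot;
    }
}
```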

Resizing tradeoffs when implementing a hashtable using linear probing

I am trying to implement a hashtable using linear probing.
Before inserting a (key, value) pair into the hashtable, I want to check if it's half full. If it is, I need to double the size of the underlying array.
Obviously, there are two ways to do that:
One is to create another array of double the size, rehash all entries from the old array and add them to the new one, then rebind the table's reference from the old array to the new one. This way is easy to implement, but it uses a lot of space while both arrays exist.
The other is to grow the array and do the rehashing in place. It seems this way may lead to longer running time, because rehashed entries can collide both with entries that have already been rehashed and with entries that haven't been moved yet.
Which way should I use?
Your second solution only saves space during the resize if there is in fact room to expand the existing hash table in place. I think the chances of that being the case for a large hash table are quite slim, so I would just go with your first solution.
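For reference, a minimal sketch of the first approach for a linear-probing table, shown here for keys only (values would move alongside); names are illustrative:

```java
// Resize a linear-probing table by rehashing every occupied slot into a
// freshly allocated array of twice the size. The old and new arrays never
// alias, so a rehashed entry can't collide with one that hasn't moved yet.
static Object[] resize(Object[] oldTable) {
    Object[] newTable = new Object[oldTable.length * 2];
    for (Object key : oldTable) {
        if (key == null) continue;                      // empty slot
        int i = Math.floorMod(key.hashCode(), newTable.length);
        while (newTable[i] != null) {                   // linear probe
            i = (i + 1) % newTable.length;
        }
        newTable[i] = key;
    }
    return newTable;
}
```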

Using dynamic array to handle collisions in hash tables

Looking around at some of the hash table implementations, separate chaining seems to be handled via a linked list or a tree. Is there a reason why a dynamic array is not used? I would imagine that having a dynamic array would have better cache performance as well. However, since I've not seen any such implementation, I'm probably missing something.
What am I missing?
One advantage of a linked list over a dynamic array is that rehashing can be accomplished more quickly. Rather than having to make a bunch of new dynamic arrays and then copy all the elements from the old dynamic arrays into the new, the elements from the linked lists can be redistributed into the new buckets without performing any allocations.
Additionally, if the load factor is small, the space overhead of using linked lists may be better than the space overhead for dynamic arrays. When using dynamic arrays, you usually need to store a pointer, a length, and a capacity. This means that if you have an empty dynamic array, you end up needing space for two integers and a pointer, plus any space preallocated to hold the elements. In an empty bucket, this space overhead is large compared to storing just a null pointer for a linked list. On the other hand, if the buckets have large numbers of elements in them, then dynamic arrays will be a bit more space-efficient and have higher performance due to locality of reference.
Hope this helps!
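For comparison, here is a minimal sketch of the array-based chaining the question asks about, with an ArrayList per bucket (an illustration, not how any particular library implements it):

```java
import java.util.ArrayList;
import java.util.List;

// Separate chaining where each bucket is a dynamic array rather than a
// linked list: lookups scan a small contiguous array (cache-friendly),
// but every non-empty bucket pays the ArrayList overhead described above.
public class ArrayChainedSet<K> {
    private final List<K>[] buckets;

    @SuppressWarnings("unchecked")
    public ArrayChainedSet(int capacity) {
        buckets = (List<K>[]) new List[capacity];
    }

    private int indexFor(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    public void add(K key) {
        int i = indexFor(key);
        if (buckets[i] == null) buckets[i] = new ArrayList<>(2);
        if (!buckets[i].contains(key)) buckets[i].add(key);
    }

    public boolean contains(K key) {
        int i = indexFor(key);
        return buckets[i] != null && buckets[i].contains(key);
    }
}
```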
One advantage I can think of is deletion: insertion into a chain happens at the head, but a value I want to delete may sit in the middle of the chain. Unlinking it from a linked list is cheap, whereas deleting from the middle of an array means shifting the remaining elements down.

Optimizing Inserting into the Middle of a List

I have algorithms that work with dynamically growing lists (contiguous memory, like a C++ vector, Java ArrayList or C# List). Until recently, these algorithms would insert new values into the middle of the lists. Of course, this was usually a very slow operation: every time an item was added, all the items after it needed to be shifted to a higher index. Do this a few times in each algorithm and things get really slow.
My realization was that I could add the new items to the end of the list and then rotate them into position later. That's one option!
Another option, when I know how many items I'm adding ahead of time, is to add that many items to the back, shift the existing items, and then perform the algorithm in place in the hole I've made for myself. The downside is that I have to add default values to the end of the list and then just overwrite them.
I did a quick analysis of these options and concluded that the second option is more efficient. My reasoning was that the rotation with the first option would result in in-place swaps (requiring a temporary). My only concern with the second option is that I am creating a bunch of default values that just get thrown away. Most of the time, these default values will be null or a mem-filled value type.
However, I'd like someone else familiar with algorithms to tell me which approach would be faster. Or, perhaps there's an even more efficient solution I haven't considered.
Arrays aren't efficient for lots of insertions or deletions into anywhere other than the end of the array. Consider whether using a different data structure (such as one suggested in one of the other answers) may be more efficient. Without knowing the problem you're trying to solve, it's near-impossible to suggest a data structure (there's no one solution for all problems). That being said...
The second option is definitely the better of the two. A somewhat better variant avoids the default-value issue: to insert 456 into the middle of 0123789, simply copy the tail 789 to the end and overwrite the original 789 with 456, so the only intermediate step is 0123789789.
Your default-value concern is, however, (generally) not a big issue:
In Java, for one, you cannot (to my knowledge) even allocate an array that isn't 0- or null-filled. C++ STL containers also enforce this, I believe (but not raw C++ arrays).
The size of a pointer is minimal compared to any moderate-sized class, so assigning it a default value also takes minimal time. In Java and C# everything is a reference anyway; in C++ you can use pointers, though something like boost::shared_ptr or a pointer-vector is preferred over raw pointers. None of this applies to primitives, which are small to start with, so they're generally not a big issue either.
I'd also suggest forcing a reallocation to a specified size before you start inserting at the end of the array (Java's ArrayList.ensureCapacity or C++'s vector::reserve). In case you didn't know: varying-length-array implementations tend to keep an internal array that's bigger than what size() returns or what's accessible, in order to avoid reallocating memory on every insert or delete.
Also note that there are more efficient methods to copy parts of an array than doing it manually with for loops (e.g. Java's System.arraycopy).
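Putting the copy-the-tail-then-overwrite trick into Java with System.arraycopy (method name and values are illustrative, matching the 0123789 example above):

```java
import java.util.Arrays;

// Insert `block` at `pos` by first moving the displaced tail to the end,
// then overwriting the hole: two bulk copies, no per-element loop.
static int[] insertBlock(int[] src, int pos, int[] block) {
    int[] dst = Arrays.copyOf(src, src.length + block.length);
    // Move the tail out of the way: 0123789 -> 0123789789
    System.arraycopy(src, pos, dst, pos + block.length, src.length - pos);
    // Overwrite the hole with the new values: 0123789789 -> 0123456789
    System.arraycopy(block, 0, dst, pos, block.length);
    return dst;
}
```

So insertBlock(new int[]{0, 1, 2, 3, 7, 8, 9}, 4, new int[]{4, 5, 6}) passes through the intermediate 0123789789 and returns 0123456789.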
You might want to consider changing your representation of the list from using a dynamic array to using some other structure. Here are two options that allow you to implement these operations efficiently:
An order statistic tree is a modified type of binary tree that supports insertions and selections anywhere in O(log n) time, as well as lookups in O(log n) time. This will increase your memory usage quite a bit because of the overhead for the pointers and extra bookkeeping, but should dramatically speed up insertions. However, it will slow down lookups a bit.
If you always know the insertion point in advance, you could consider switching to a linked list instead of an array, and just keep a pointer to the linked list cell where insertions will occur. However, this slows down random access to O(n), which could possibly be an issue in your setup.
Alternatively, if you always know where insertions will happen, you could consider representing your array as two stacks: one stack holding the contents of the array to the left of the insertion point, and one holding the elements to the right of it in reverse order (see the sketch below). This makes insertions fast, and with the right type of stack implementation random access can stay fast as well.
Hope this helps!
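A minimal sketch of the two-stacks representation from that last option (names are illustrative; ArrayDeque is used for the stacks):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A list with a movable insertion point, stored as two stacks: `left`
// holds everything before the point (top = element just before it), and
// `right` holds everything after it in reverse (top = element just after).
public class GapList<T> {
    private final Deque<T> left = new ArrayDeque<>();
    private final Deque<T> right = new ArrayDeque<>();

    public void insert(T item) { left.push(item); }            // O(1)

    public void movePointLeft()  { right.push(left.pop()); }   // O(1)
    public void movePointRight() { left.push(right.pop()); }   // O(1)

    public int size() { return left.size() + right.size(); }
}
```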
HashMaps and linked lists were designed for the problem you are having. Given an indexed data structure with numbered items, inserting an item in the middle requires renumbering every item after it.
You need a data structure optimized for constant-time O(1) inserts. HashMaps were designed to make insert and delete operations fast regardless of dataset size.
I can't pretend to do the HashMap subject justice by describing it. Here is a good intro: http://en.wikipedia.org/wiki/Hash_table

removeObjectAtIndex without releasing the object

I have a mutable array and I would like to arrange it in a nested order based on some criteria.
To achieve this I'd like to move certain elements into mutable arrays owned by other elements, removing them from the main mutable array, but without releasing them. So the question is really about removing elements from an array without releasing them. How do I achieve that?
Thanks
You cannot remove an object from an array without the array releasing it. If you want to make sure it sticks around, just retain it yourself first, and release it when you're done. These are pretty cheap operations, so you shouldn't worry too much about it.
Since you're moving items from one array to another, it would be easier if you first added the object to the new array and then removed it from the original array.
When you add it to the new array it is implicitly retained.
When you remove it from the old array it is implicitly released.
This is faster than retaining it, removing it from the array, adding it to the new array and then releasing it.
