What is the best way to implement Tree : LinkedList - Array - data-structures

I am preparing for an exam, and one of the questions I came across is: what is the best way to implement a tree, a linked list or an array?
My thinking so far:
- An array uses one address per element.
- A linked list uses two addresses per element.
Using a linked list, we can insert a value exactly where we need it (we manage the memory precisely), but accessing that element most likely takes O(N), while in an array it is O(1).
How should I answer this question? Or should I just say that it is subjective?

For a Binary Search Tree, the answer would definitely be an array (at least, hopefully, an extendable array like a vector<> so you aren't limited to a fixed size). I'll do an analysis of the common operations, assuming the tree is balanced.
Query
In a BST, nodes need to have pointers to their left and right children, and it is also very common to have parent pointers. In an array implementation, the "pointers" can simply be integer indexes into the array (this would mean the array stores Node objects). Thus looking up the parent or children of a node is constant, since indexing into the array is constant: O(1). A linked list implementation would probably also need to store a reference to where each node's ancestors/children sit, thus requiring an O(N) pass through the list to get the desired references.
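To make the index-as-pointer idea concrete, here is a small sketch (the names are mine, purely illustrative) of BST nodes stored in a backing array, where following a parent/child link is just an O(1) array access:

class Node:
    def __init__(self, key, parent=-1, left=-1, right=-1):
        self.key = key
        self.parent = parent   # index of the parent node in the array, -1 if none
        self.left = left       # index of the left child, -1 if none
        self.right = right     # index of the right child, -1 if none

nodes = [Node(50)]                    # root lives at index 0
nodes.append(Node(30, parent=0))      # a child stored at index 1
nodes[0].left = 1                     # "pointer" assignment is just storing an index
left_key = nodes[nodes[0].left].key   # following the pointer is one O(1) index lookup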
Search
Starting at the root, array[0], searching would be an O(log N) operation. Searching would just call/get the info of the children per node, which is O(1) amount of work, roughly O(log N) times, thus O(log N) for search in an array.
A linked list would require an O(N) pass through the data to get the required left/right pointers at each of the O(log N) search steps, thus producing an O(N log N) search in linked lists.
Insert
Arrays would be similar to search, except would require additional O(1) constant time for pointer assignments. So O(log N) insert.
Linked-lists would also be similar to their search routine, except with an additional O(n) time for adjusting the pointers, so O(n log n)
Delete
Arrays would also be similar to search, except you could take more than a single O(log n) factor to delete, since you have to traverse back up the tree, but it still is O(log n).
Linked lists would likewise take O(n log n), plus more O(n log n) for traversing back up, so O(n log n) for linked lists as well.
Conclusion
The answer should be fairly evident by now :) With arrays you'll also get the benefit of better caching than with linked lists. In addition, some derivatives of binary search trees, such as heaps (usually min-heaps/max-heaps), are commonly represented as arrays.
I hope this helps :)

An array is what I'd like to call a fundamental type. That is, it's essentially operating on the raw memory of the system. You can wrap a fundamental array with a class to add things like out-of-bounds protection, but you still need raw support for accessing/indexing into a block of memory in anything that isn't assembly.
For a (singly) linked list, in Python it would look like this:
class Entry(object):
    def __init__(self, value, child=None):
        self.child = child
        self.value = value

class LinkedList(object):
    def __init__(self, root_entry=None):
        self.root = root_entry
        self.tail = self.root
        self.size = 1 if root_entry else 0

    def add_entry(self, value):
        new_entry = Entry(value)
        if self.size:
            self.tail.child = new_entry
        else:
            self.root = new_entry
        self.tail = new_entry
        self.size += 1
...
And so on; you can see that for most operations you simply follow child links from the root until you get where you want to be. Inserting an item simply means replacing that entry's child with the new entry and setting the new entry's child to the old child (and incrementing the size).
In general, for singly linked lists, element insertion is O(1) if you already have the old entry (which you will for most iterators) or are inserting at the tail/root. Finding an element by index is O(n), as you need to traverse the list, and iterators can only move down the list (that is, from index = 0 to index = size-1).
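For example, a mid-list insertion on the sketch above might look like this (the method name insert_after is my own, assumed for illustration):

# An extra method for the LinkedList class above: splice a new entry in
# right after an existing one, in O(1).
def insert_after(self, entry, value):
    new_entry = Entry(value, child=entry.child)  # new entry adopts the old child
    entry.child = new_entry                      # old entry now points at the new one
    if entry is self.tail:                       # keep the tail pointer up to date
        self.tail = new_entry
    self.size += 1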
Basically all of this information can be studied by simply reading the relevant wikipedia entries. These are all pretty fundamental types that have been around for a very long time.

Related

Data structure for inverting a subarray in log(n)

Build a Data structure that has functions:
set(arr,n) - initialize the structure with array arr of length n. Time O(n)
fetch(i) - fetch arr[i]. Time O(log(n))
invert(k,j) - (when 0 <= k <= j <= n) inverts the sub-array [k,j]. meaning [4,7,2,8,5,4] with invert(2,5) becomes [4,7,4,5,8,2]. Time O(log(n))
How about saving the indices in a binary search tree and using a flag saying the index is inverted? But if I do more than one invert, it gets messed up.
Here is how we can approach designing such a data structure.
Indeed, using a balanced binary search tree is a good idea to start.
First, let us store array elements as pairs (index, value).
Naturally, the elements are sorted by index, so that the in-order traversal of a tree will yield the array in its original order.
Now, if we maintain a balanced binary search tree, and store the size of the subtree in each node, we can already do fetch in O(log n).
Next, let us only pretend we store the index.
Instead, we still arrange elements as we did with (index, value) pairs, but store only the value.
The index is now stored implicitly and can be calculated as follows.
Start from the root and go down to the target node.
Whenever we move to a left subtree, the index does not change.
When moving to a right subtree, add the size of the left subtree plus one (for the current vertex) to the index.
What we got at this point is a fixed-length array stored in a balanced binary search tree. It takes O(log n) to access (read or write) any element, as opposed to O(1) for a plain fixed-length array, so it is about time to get some benefit for all the trouble.
The next step is to devise a way to split our array into left and right parts in O(log n) given the required size of the left part, and merge two arrays by concatenation.
This step introduces dependency on our choice of the balanced binary search tree.
Treap is the obvious candidate since it is built on top of the split and merge primitives, so this improvement comes for free.
Perhaps it is also possible to split a Red-black tree or a Splay tree in O(log n) (though I admit I didn't try to figure out the details myself).
Right now, the structure is already more powerful than an array: it allows splitting and concatenation of "arrays" in O(log n), although element access is as slow as O(log n) too.
Note that this would not be possible if we still stored index explicitly at this point, since indices would be broken in the right part of a split or merge operation.
Finally, it is time to introduce the invert operation.
Let us store a flag in each node to signal whether the whole subtree of this node has to be inverted.
This flag will be lazily propagating: whenever we access a node, before doing anything, check if the flag is true.
If this is the case, swap the left and right subtrees, toggle (true <-> false) the flag in the root nodes of both subtrees, and set the flag in the current node to false.
Now, when we want to invert a subarray:
split the array into three parts (before the subarray, the subarray itself, and after the subarray) by two split operations,
toggle (true <-> false) the flag in the root of the middle (subarray) part,
then merge the three parts back in their original order by two merge operations.
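Here is a minimal Python sketch of the whole construction, assuming an implicit treap (all names are illustrative, and set() is built by repeated merges in O(n log n) here rather than the O(n) construction the exercise asks for):

import random

class Node:
    def __init__(self, value):
        self.value = value
        self.prio = random.random()   # heap priority that keeps the treap balanced
        self.size = 1                 # number of nodes in this subtree
        self.rev = False              # lazy "reverse this whole subtree" flag
        self.left = None
        self.right = None

def size(t):
    return t.size if t else 0

def update(t):
    t.size = 1 + size(t.left) + size(t.right)

def push(t):
    # Lazily propagate the reversal flag before touching children.
    if t and t.rev:
        t.left, t.right = t.right, t.left
        if t.left:
            t.left.rev = not t.left.rev
        if t.right:
            t.right.rev = not t.right.rev
        t.rev = False

def split(t, k):
    # Split t into (first k elements, the rest).
    if not t:
        return None, None
    push(t)
    if size(t.left) < k:
        t.right, right = split(t.right, k - size(t.left) - 1)
        update(t)
        return t, right
    left, t.left = split(t.left, k)
    update(t)
    return left, t

def merge(a, b):
    # Concatenate two treaps, keeping the heap property on prio.
    if not a or not b:
        return a or b
    if a.prio > b.prio:
        push(a)
        a.right = merge(a.right, b)
        update(a)
        return a
    push(b)
    b.left = merge(a, b.left)
    update(b)
    return b

def build(arr):               # set(arr, n)
    root = None
    for v in arr:
        root = merge(root, Node(v))
    return root

def fetch(t, i):              # fetch(i) in O(log n) expected
    push(t)
    if i < size(t.left):
        return fetch(t.left, i)
    if i == size(t.left):
        return t.value
    return fetch(t.right, i - size(t.left) - 1)

def invert(t, k, j):          # invert(k, j), inclusive, in O(log n) expected
    left, rest = split(t, k)
    mid, right = split(rest, j - k + 1)
    mid.rev = not mid.rev     # the whole subarray is reversed lazily
    return merge(merge(left, mid), right)

root = build([4, 7, 2, 8, 5, 4])
root = invert(root, 2, 5)
print([fetch(root, i) for i in range(6)])   # -> [4, 7, 4, 5, 8, 2]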

Efficiently storing objects allowing rapid access to highest value (hashmap vs. priority queue)

So I have a set of objects X, and each of them has a value v[x].
How can I store the objects X in a way that allows me to efficiently compute the x with the highest value?
Also I would like to be able to change the value of v[x], and have x automatically fall to the correct place in the data structure.
I thought about using a priority queue for this but my friend told me I should use a hashmap instead. Which confused me because hashmaps are unordered.
You are correct, and your friend is wrong: a hash map is not going to work, because it is unordered. A hash map may be useful if you wish to maintain the values v externally to your objects x, but then it would be a separate data structure, in addition to the one providing the ordering.
Priority queue with a comparator that compares the value v attached to the object x will provide you with a fast way to get the object with the highest value.
No matter what data structure you are going to use, it would be up to you to update it when the value v[x] changes. Generally, you will need to remove the object from the structure, and then insert it back right away, so that it could be placed at its new position according to its updated value.
You have 2 operations that you wish to support efficiently:
Find maximum
Update value
For #1, a priority queue (i.e. heap) is a good idea, but it doesn't allow you to efficiently do #2 - you'll have to look through the whole queue to find the correct node, then update and move (or delete and reinsert) it - this takes O(n).
To support #2 efficiently, you can use a hash map in addition to a priority queue (perhaps this is what your friend was talking about) - have each object map to the applicable node in the tree, then you can find the correct node in expected O(1) and update it in O(log n).
As an alternative, you can use a (self-balancing) binary search tree. You'll firstly sort on the value, then on a unique member of the object (like a unique ID). This will allow you to find any object in O(log n). #1 can be implemented to take O(1) and #2 will take O(log n) (through delete and reinsert).
Lastly, for completeness, elements in a hash map are unordered - you'll have to look through all the values to find the maximum (i.e. it takes O(n)) (but update can be performed in expected O(1)).
Summary:
              Find Max   Update
Heap only     O(1)       O(n)
Heap + HM     O(1)       O(log n) (expected)
BST           O(1)       O(log n)
HM only       O(n)       O(1) (expected)
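To illustrate the "Heap + HM" row, here is a minimal hand-rolled sketch (the names are mine): a max-heap stored in a list plus a dict mapping each key to its position in the heap, so find-max is O(1) and an update re-sifts in O(log n).

class IndexedMaxHeap:
    def __init__(self):
        self.heap = []          # list of (value, key) pairs in max-heap order
        self.pos = {}           # key -> index of that key in self.heap

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i
        self.pos[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0 and self.heap[i][0] > self.heap[(i - 1) // 2][0]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            largest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.heap[child][0] > self.heap[largest][0]:
                    largest = child
            if largest == i:
                return
            self._swap(i, largest)
            i = largest

    def insert(self, key, value):
        self.heap.append((value, key))
        self.pos[key] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def find_max(self):                    # O(1)
        return self.heap[0][1] if self.heap else None

    def update(self, key, value):          # expected O(1) lookup + O(log n) re-sift
        i = self.pos[key]
        old_value = self.heap[i][0]
        self.heap[i] = (value, key)
        if value > old_value:
            self._sift_up(i)
        else:
            self._sift_down(i)

h = IndexedMaxHeap()
h.insert("x", 3); h.insert("y", 7)
h.update("x", 10)
print(h.find_max())   # -> "x"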

Data Structure for fast position lookup

I'm looking for a data structure that logically represents a sequence of elements keyed by unique ids (for the purpose of simplicity let's consider them to be strings, or at least hashable objects). Each element can appear only once, there are no gaps, and the first position is 0.
The following operations should be supported (demonstrated with single-letter strings):
insert(id, position) - add the element keyed by id into the sequence at offset position. Naturally, the position of each element later in the sequence is now incremented by one. Example: [S E L F].insert(H, 1) -> [S H E L F]
remove(position) - remove the element at offset position. Decrements the position of each element later in the sequence by one. Example: [S H E L F].remove(2) -> [S H L F]
lookup(id) - find the position of element keyed by id. [S H L F].lookup(H) -> 1
The naïve implementation would be either a linked list or an array. Both would give O(n) lookup, remove, and insert.
In practice, lookup is likely to be used the most, with insert and remove happening frequently enough that it would be nice not to be linear (which a simple combination of hashmap + array/list would get you).
In a perfect world it would be O(1) lookup, O(log n) insert/remove, but I actually suspect that wouldn't work from a purely information-theoretic perspective (though I haven't tried it), so O(log n) lookup would still be nice.
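For reference, the "simple combination of hashmap + array/list" mentioned above could look like the sketch below (illustrative names): lookup is O(1), but insert and remove stay O(n) because every later position has to be renumbered.

class Sequence:
    def __init__(self):
        self.items = []        # position -> id
        self.index = {}        # id -> position

    def insert(self, id_, position):           # O(n): shifts and renumbers
        self.items.insert(position, id_)
        for p in range(position, len(self.items)):
            self.index[self.items[p]] = p

    def remove(self, position):                # O(n): shifts and renumbers
        del self.index[self.items.pop(position)]
        for p in range(position, len(self.items)):
            self.index[self.items[p]] = p

    def lookup(self, id_):                     # O(1)
        return self.index[id_]

s = Sequence()
for i, c in enumerate("SELF"):
    s.insert(c, i)
s.insert("H", 1)          # [S H E L F]
s.remove(2)               # [S H L F]
print(s.lookup("H"))      # -> 1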
A combination of trie and hash map allows O(log n) lookup/insert/remove.
Each node of the trie contains an id, a counter of the valid elements rooted at this node, and up to two child pointers. A bit string, determined by the left (0) or right (1) turns taken while traversing the trie from its root to a given node, is part of the value stored in the hash map for the corresponding id.
The remove operation marks the trie node as invalid and updates all the counters of valid elements on the path from the deleted node to the root. It also deletes the corresponding hash map entry.
The insert operation should use the position parameter and the counters of valid elements in each trie node to search for the new node's predecessor and successor nodes. If the in-order traversal from predecessor to successor contains any deleted nodes, choose the one with the lowest rank and reuse it. Otherwise choose either the predecessor or the successor, and add a new child node to it (a right child for the predecessor or a left one for the successor). Then update all the counters of valid elements on the path from this node to the root and add the corresponding hash map entry.
The lookup operation gets a bit string from the hash map and uses it to go from the trie root to the corresponding node while summing all the counters of valid elements to the left of this path.
All this allows O(log n) expected time for each operation if the sequence of inserts/removes is random enough. If not, the worst-case complexity of each operation is O(n). To get back to O(log n) amortized complexity, watch the sparsity and balancing factors of the tree: if there are too many deleted nodes, re-create a new perfectly balanced and dense tree; if the tree is too imbalanced, rebuild the most imbalanced subtree.
Instead of a hash map it is possible to use some binary search tree or any dictionary data structure. Instead of the bit string used to identify a path in the trie, the hash map may store a pointer to the corresponding node in the trie.
Other alternative to using trie in this data structure is Indexable skiplist.
O(log N) time for each operation is acceptable, but not perfect. It is possible, as explained by Kevin, to use an algorithm with O(1) lookup complexity in exchange for a larger complexity of the other operations: O(sqrt(N)). But this can be improved.
If you choose some number of memory accesses (M) for each lookup operation, the other operations may be done in O(M * N^(1/M)) time. The idea of such an algorithm is presented in this answer to a related question. The trie structure described there allows easily converting a position to the array index and back. Each non-empty element of this array contains an id, and each element of the hash map maps this id back to the array index.
To make it possible to insert an element into this data structure, each block of contiguous array elements should be interleaved with some empty space. When one of the blocks exhausts all its available empty space, we should rebuild the smallest group of blocks, related to some element of the trie, that has more than 50% empty space. When the total amount of empty space is less than 50% or more than 75%, we should rebuild the whole structure.
This rebalancing scheme gives O(M * N^(1/M)) amortized complexity only for random and evenly distributed insertions/removals. The worst-case complexity (for example, if we always insert at the leftmost position) is much larger for M > 2. To guarantee an O(M * N^(1/M)) worst case, we need to reserve more memory and change the rebalancing scheme so that it maintains an invariant like this: keep the empty space reserved for the whole structure at least 50%, keep the empty space reserved for all data related to the top trie nodes at least 75%, for the next-level trie nodes 87.5%, etc.
With M=2, we have O(1) time for lookup and O(sqrt(N)) time for other operations.
With M=log(N), we have O(log(N)) time for every operation.
But in practice small values of M (like 2 .. 5) are preferable. This may be treated as O(1) lookup time, and it allows this structure (while performing a typical insert/remove operation) to work with up to 5 relatively small contiguous blocks of memory in a cache-friendly way with good vectorization possibilities. It also limits memory requirements if we require good worst-case complexity.
You can achieve everything in O(sqrt(n)) time, but I'll warn you that it's going to take some work.
Start by having a look at a blog post I wrote on ThriftyList. ThriftyList is my implementation of the data structure described in Resizable Arrays in Optimal Time and Space along with some customizations to maintain O(sqrt(n)) circular sublists, each of size O(sqrt(n)). With circular sublists, one can achieve O(sqrt(n)) time insertion/removal by the standard insert/remove-then-shift in the containing sublist followed by a series of push/pop operations across the circular sublists themselves.
Now, to get the index at which a query value falls, you'll need to maintain a map from value to sublist/absolute-index. That is to say, a given value maps to the sublist containing the value, plus the absolute index at which the value falls (the index at which the item would fall were the list non-circular). From these data, you can compute the relative index of the value by taking the offset from the head of the circular sublist and summing with the number of elements which fall behind the containing sublist. To maintain this map requires O(sqrt(n)) operations per insert/delete.
Sounds roughly like Clojure's persistent vectors - they provide O(log_32 n) cost for lookup and update. For smallish values of n, O(log_32 n) is as good as constant...
Basically they are array mapped tries.
Not quite sure on the time complexity for remove and insert - but I'm pretty sure that you could get a variant of this data structure with O(log n) removes and inserts as well.
See this presentation/video: http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey
Source code (Java): https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/PersistentVector.java

Data design Issue to find insertion deletion and getMin in O(1)

As said in the title, I need to define a data structure that takes only O(1) time for insertion, deletion, and getMin. There are no space constraints.
I have searched SO for the same, and all I have found is insertion and deletion in O(1) time (even a stack does that). The previous posts I saw on Stack Overflow all just say hashing.
From my analysis, for getMin in O(1) time we can use a heap data structure,
and for insertion and deletion in O(1) time we have a stack,
so in order to achieve my goal I think I need to tweak the heap data structure and the stack together.
How would I add a hashing technique to this situation?
If I use a hash table, then what should my hash function look like, and how do I analyze the situation in terms of hashing? Any good references will be appreciated.
If you go with your initial assumption that insertion and deletion are O(1) complexity (if you only want to insert into the top and delete/pop from the top then a stack works fine) then in order to have getMin return the minimum value in constant time you would need to store the min somehow. If you just had a member variable keep track of the min then what would happen if it was deleted off the stack? You would need the next minimum, or the minimum relative to what's left in the stack. To do this you could have your elements in a stack contain what it believes to be the minimum. The stack is represented in code by a linked list, so the struct of a node in the linked list would look something like this:
struct Node
{
    int value;
    int min;
    Node *next;
};
If you look at an example list: 7->3->1->5->2. Let's look at how this would be built. First you push in the value 2 (to an empty stack), this is the min because it's the first number, keep track of it and add it to the node when you construct it: {2, 2}. Then you push the 5 onto the stack, 5>2 so the min is the same push {5,2}, now you have {5,2}->{2,2}. Then you push 1 in, 1<2 so the new min is 1, push {1, 1}, now it's {1,1}->{5,2}->{2,2} etc. By the end you have:
{7,1}->{3,1}->{1,1}->{5,2}->{2,2}
In this implementation, if you popped off 7, 3, and 1, your new min would be 2, as it should be. And all of your operations are still constant time, because you just added a comparison and another value to the node. (You could use something like peek(), which is top() on C++'s std::stack, or just use a pointer to the head of the list to look at the top of the stack and grab the min there; it'll give you the min of the stack in constant time.)
A tradeoff in this implementation is that you'd have an extra integer in your nodes, and if you only have one or two mins in a very large list it is a waste of memory. If this is the case then you could keep track of the mins in a separate stack and just compare the value of the node that you're deleting to the top of this list and remove it from both lists if it matches. It's more things to keep track of so it really depends on the situation.
DISCLAIMER: This is my first post in this forum so I'm sorry if it's a bit convoluted or wordy. I'm also not saying that this is "one true answer" but it is the one that I think is the simplest and conforms to the requirements of the question. There are always tradeoffs and depending on the situation different approaches are required.
This is a design problem, which means they want to see how quickly you can augment existing data-structures.
start with what you know:
O(1) update, i.e. insertion/deletion, is screaming hashtable
O(1) getMin is screaming hashtable too, but this time ordered.
Here, I am presenting one way of doing it. You may find something else that you prefer.
create a HashMap, call it main, where to store all the elements
create a LinkedHashMap (java has one), call it mins where to track the minimum values.
the first time you insert an element into main, add it to mins as well.
for every subsequent insert, if the new value is less than what's at the head of your mins map, add it to the map with something equivalent to addToHead.
when you remove an element from main, also remove it from mins. 2*O(1) = O(1)
Notice that getMin is simply peeking without deleting. So just peek at the head of mins.
EDIT:
Amortized algorithm:
(thanks to @Andrew Tomazos - Fathomling, let's have some more fun!)
We all know that the cost of insertion into a hash table is O(1). But in fact, if you have ever built a hash table, you know that you must keep doubling the size of the table to avoid overflow. Each time you double the size of a table with n elements, you must re-insert the elements and then add the new element. By this analysis it would seem that the worst-case cost of adding an element to a hash table is O(n). So why do we say it's O(1)? Because not all the elements take the worst case! Indeed, only the insertions where doubling occurs take the worst case. Therefore, inserting n elements takes n + sum(2^i for i = 0 to lg n), which gives n + 2n = O(n), so that O(n)/n = O(1)!
Why not apply the same principle to the LinkedHashMap? You have to reload all the elements anyway! So, each time you are doubling main, put all the elements of main into mins as well, and sort them in mins. Then for all other cases proceed as above (the bulleted steps).
A hash table gives you insertion and deletion in O(1) (a stack does not, because you can't have holes in a stack). But you can't have getMin in O(1) too, because ordering your elements can't be faster than O(n log n) (it is a theorem), which means O(log n) per element.
You can keep a pointer to the min to have getMin in O(1). This pointer can be updated easily for an insertion but not for the deletion of the min. But depending on how often you use deletion it can be a good idea.
You can use a trie. A trie has O(L) complexity for insertion, deletion, and getMin, where L is the length of the string (or whatever) you're looking up. It is of constant complexity with respect to n (the number of elements).
It requires a huge amount of memory, though. As they emphasized "no space constraints", they were probably thinking of a trie. :D
Strictly speaking your problem as stated is provably impossible, however consider the following:
Given a type T, place an enumeration on all possible elements of that type such that value i is less than value j iff T(i) < T(j) (i.e. number all possible values of type T in order).
Create an array of that size.
Make the elements of the array:
struct PT
{
    T t;
    PT* next_higher;
    PT* prev_lower;
};
Insert and delete elements in the array while maintaining doubly linked list storage (in order of index, and hence in sorted order).
This will give you constant getMin and delete.
For insertion you need to find the next element in the array in constant time, so I would use a type of radix search.
If the size of the array is 2^x then maintain x "skip" arrays where element j of array i points to the nearest element of the main array to index (j << i).
This will then always require a fixed x number of lookups to update and search so this will give constant time insertion.
This uses exponential space, but this is allowed by the requirements of the question.
Your problem statement says "insertion and deletion in O(1) time we have stack...",
so I am assuming deletion = pop().
In that case, use another stack to track the min.
algo:
Stack 1 -- normal stack; Stack 2 -- min stack
Insertion
push to stack 1.
if stack 2 is empty or the new item <= stack2.peek(), push it to stack 2 as well (<= rather than < so that duplicates of the minimum are handled)
objective: at any point in time, stack2.peek() should give you the min in O(1)
Deletion
pop() from stack 1.
if the popped element equals stack2.peek(), pop() from stack 2 as well
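A small Python sketch of this two-stack scheme (illustrative names; note the <= so that duplicates of the minimum survive a pop):

stack1, stack2 = [], []        # stack1: the data, stack2: running minimums

def push(x):
    stack1.append(x)
    if not stack2 or x <= stack2[-1]:
        stack2.append(x)

def pop():
    x = stack1.pop()
    if stack2 and x == stack2[-1]:
        stack2.pop()
    return x

def get_min():                 # O(1) at any point in time
    return stack2[-1]

for v in (2, 5, 1, 3):
    push(v)
print(get_min())   # -> 1
pop(); pop()       # removes 3 and 1
print(get_min())   # -> 2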

What is the fastest way of updating an ordered array of numbers?

I need to calculate a 1d histogram that must be dynamically maintained and looked up frequently. One idea I had involves keeping an ordered array with the data (because that way I can determine percentiles in O(1), and this suffices for quickly finding a histogram with non-uniform bins that contain exactly the same number of points each).
So, is there a way that is less than O(N) to insert a number into an ordered array while keeping it ordered?
I guess the answer is very well known but I don't know a lot about algorithms (physicists doing numerical calculations rarely do).
In the general case, you could use a more flexible tree-like data structure. This would allow access, insertion and deletion in O(log n) time and is also relatively easy to get ready-made from a library (e.g. C++'s STL map).
(Or a hash map...)
An ordered array with binary search does the same things as a tree, but is more rigid. It will probably be faster for access and use less memory, but you will pay when having to insert or delete things in the middle (O(n) cost).
Note, however, that an ordered array might be enough for you: if your data points are often the same, you can maintain a list of pairs {key, count}, ordered by key, being able to quickly add another instance of an existing item (but still having to do more work to add a new item).
You could use binary search. This is O(log(n)).
If you would like to insert a number x, take the number in the middle of your array and compare it to x; if x is smaller, then take the number in the middle of the first half, else the number in the middle of the second half, and so on.
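In Python, for instance, the standard bisect module already does this search; note that finding the insertion point is O(log n), but the list insert itself still shifts elements, so it remains O(n):

import bisect

data = [1, 3, 4, 7, 9]
i = bisect.bisect_left(data, 5)   # O(log n): index where 5 keeps the order
data.insert(i, 5)                 # O(n) shift; data is now [1, 3, 4, 5, 7, 9]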
You can perform insertions in O(1) time if you rearrange your array as a bunch of linked-lists hanging off of each element:
keys = Array([0][1][2][3][4]......)
              a  c  b  e  f  .  .
              d  g  i  .  .  .
              h  j  .
              |__|__|__|__|__|__|__/ linked lists
There's also the strategy of keeping two datastructures at the same time, if your update workload supports it without increasing time-complexity of common operations.
So, is there a way that is less than O(N) to insert a number into an ordered array while keeping it ordered?
Yes, you can use an array to implement a binary search tree and do the insertion in O(log n) time. How?
Keep index 0 empty; index 1 = root; if node is the left child of parent node, index of node = 2 * index of parent node; if node is the right child of parent node, index of node = 2 * index of parent node + 1.
Insertion will thus be O(log n). Unfortunately, you might notice that the binary search tree for an ordered list might degenerate into a linear search, i.e. O(n), if you don't balance the tree, which is pointless. Here, you may have to implement a red-black tree to keep the height balanced. This is quite complicated, BUT insertion can be done with arrays in O(log n). Note that the array elements will no longer be ints; instead, they'll have to be objects with a colour attribute.
I wouldn't recommend it.
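For what it's worth, here is a tiny sketch of the indexing rule described above (no balancing, so insertion is O(height) rather than a guaranteed O(log n); the fixed array size is purely illustrative):

tree = [None] * 16            # index 0 unused, root at index 1

def insert(value):
    i = 1
    while tree[i] is not None:
        i = 2 * i if value < tree[i] else 2 * i + 1   # left child 2i, right child 2i+1
    tree[i] = value           # will raise IndexError if the tree grows too deep

for v in (8, 3, 10, 14):
    insert(v)
print(tree[1], tree[2], tree[3], tree[7])   # -> 8 3 10 14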
Is there any particular reason this demands an array? You need a data structure which keeps data ordered and allows you to insert quickly. Why not a binary search tree? Or better still, a red-black tree. In C++, you could use std::set from the Standard Template Library, which is implemented as a red-black tree. It gives you O(log n) insertion time and the ability to iterate over it in sorted order.
