Data Structure for a particular problem?

Which data structure can perform insertion, deletion and searching in O(1) time in the worst case?
We may assume the elements are integers drawn from a finite set {1, 2, ..., n}, and that initialization may take O(n) time.
I can only think of implementing a hash table.
Implementing it with trees will not give O(1) time complexity for any of the operations. Or is it possible?
Kindly share your views on this, or any other data structure apart from these.
Thanks.

Although this sounds like homework, given enough memory you can just use an array. Access to any one element is O(1). If you use each cell to keep a count of how many integers of that value have been seen, insertion is also O(1). Searching is O(1) because it only requires indexing the array at that value and checking the count. This is essentially how counting sort works.
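A minimal sketch of that idea in Python (the class name is illustrative), with O(n) initialization and O(1) worst-case operations:

    class DirectAddressTable:
        """Direct-address table for integers drawn from {1, ..., n}.

        Initialization is O(n); insert, delete and search are O(1) worst case.
        """

        def __init__(self, n):
            self.count = [0] * (n + 1)  # count[x] = occurrences of x

        def insert(self, x):
            self.count[x] += 1

        def delete(self, x):
            if self.count[x] > 0:
                self.count[x] -= 1

        def search(self, x):
            return self.count[x] > 0

For example, t = DirectAddressTable(10); t.insert(3); t.search(3) returns True.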

Depending on the range of elements an array might do, but for a lot of data you want a hash table. It gives O(1) amortized operations.

Related

Is it possible to create a LinkedList implementation with O(1) Insertion, Deletion, and O(1) Access to Minimum?

Assume you have unlimited space for this problem. I believe I was shown the solution and have completely forgotten it. If I recall correctly, one solution involved a stack to keep track of the minimum and the other involved adding extra data values to the linked-list node.
A min-heap implementation would give O(log n) insertion and deletion, but is there a way to make it O(1)?
What would be the implementation of a data structure that can do this, if it is even possible?
If you had such a data structure, you could sort n items using O(n) comparisons: add them to the list, then repeatedly find the minimum and remove it. Since comparison-based sorting requires Ω(n log n) comparisons in the worst case, a data structure with these performance bounds cannot exist in general.
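For reference, the stack-based trick the asker half-remembers is probably the classic "min-stack": it gives O(1) push, pop and minimum, but deletion only in LIFO order, which is why it doesn't contradict the bound above. A sketch in Python:

    class MinStack:
        """Stack with O(1) push, pop and get_min.

        Each entry on the auxiliary stack records the minimum of everything
        at or below it, so the current minimum is always on top.
        """

        def __init__(self):
            self.items = []
            self.mins = []

        def push(self, x):
            self.items.append(x)
            self.mins.append(x if not self.mins else min(x, self.mins[-1]))

        def pop(self):
            self.mins.pop()
            return self.items.pop()

        def get_min(self):
            return self.mins[-1]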

Why use a skip list rather than an array of pointers to optimize a linked list?

Recently I was learning about the skip list, and I've learned that it is designed to speed up lookups in a linked list. But I am wondering: why not use a data structure that adds an array of pointers to all nodes on top of the linked list? For a list of 2^n nodes, if each level has half as many pointers as the level below, we add 2^n - 1 pointers in total, almost the same as keeping a single array of pointers to every node, and at the same time access by index is O(1).
There must be some reason not to implement my idea; can anyone tell me what it is?
A skip list offers average performance of O(log n) for reading, inserting, updating and deleting the element at any index.
Your array-of-pointers idea offers average performance of O(1), O(n), O(1) and O(n) for the same operations: insertion and deletion must shift pointers to keep the indices contiguous. Admittedly the O(n) operations have a good constant, but that is still how they scale.
Both data structures are used in practice. They are just suited to different situations.
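For reference, a minimal skip-list sketch in Python (search and insert only, with node levels chosen by coin flips) showing why both operations are expected O(log n):

    import random

    class Node:
        def __init__(self, value, level):
            self.value = value
            self.forward = [None] * level  # forward[i] = next node at level i

    class SkipList:
        MAX_LEVEL = 16

        def __init__(self):
            self.head = Node(None, self.MAX_LEVEL)
            self.level = 1

        def search(self, target):
            node = self.head
            # Start at the top level and drop down; expected O(log n) steps.
            for i in range(self.level - 1, -1, -1):
                while node.forward[i] is not None and node.forward[i].value < target:
                    node = node.forward[i]
            node = node.forward[0]
            return node is not None and node.value == target

        def insert(self, value):
            # Remember, per level, the last node before the insertion point.
            update = [self.head] * self.MAX_LEVEL
            node = self.head
            for i in range(self.level - 1, -1, -1):
                while node.forward[i] is not None and node.forward[i].value < value:
                    node = node.forward[i]
                update[i] = node
            # Choose the new node's level by repeated coin flips.
            lvl = 1
            while random.random() < 0.5 and lvl < self.MAX_LEVEL:
                lvl += 1
            self.level = max(self.level, lvl)
            new = Node(value, lvl)
            for i in range(lvl):
                new.forward[i] = update[i].forward[i]
                update[i].forward[i] = new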

What datastructure is effective for minimizing the cost of look ups in hash table buckets?

Given a hash table with collisions, the generic hash table implementation will cause look-ups within a bucket to run in O(n), assuming that a linked list is used.
If we switch the linked list for a binary search tree, we go down to O(log n). Is this the best we can do, or is there a better data structure for this use case?
Using hash tables for the buckets themselves would bring the look-up time to O(1), but that would require clever revisions of the hash function.
There is a trade-off between insertion time and look-up time in your solution (keeping each bucket sorted).
If you keep every bucket sorted, you get O(log n) look-up time using binary search. However, when you insert a new element, you have to place it in the right location so the bucket stays sorted, which also costs O(log n) to find the insertion point.
So in your solution you get O(log n) for both insertion and look-up (in contrast to the traditional solution, which takes O(n) for look-up in the worst case and O(1) for insertion).
EDIT:
If you choose to use sorted buckets, of course you can't use a linked list any more, since binary search needs random access. You can switch to any other suitable data structure.
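As one concrete illustration (the names are illustrative, not from the question), here is a Python sketch that keeps each bucket sorted with the bisect module. Note that finding the slot is O(log k) for a bucket of size k, but inserting into a Python list still shifts elements, which is why a balanced BST per bucket would be needed for true O(log k) insertion:

    import bisect

    class SortedBucketHashTable:
        """Chained hash table whose buckets are kept sorted, giving
        O(log k) lookup within a bucket of size k via binary search."""

        def __init__(self, num_buckets=16):
            self.buckets = [[] for _ in range(num_buckets)]

        def _bucket(self, key):
            return self.buckets[hash(key) % len(self.buckets)]

        def insert(self, key):
            bucket = self._bucket(key)
            i = bisect.bisect_left(bucket, key)
            if i == len(bucket) or bucket[i] != key:
                bucket.insert(i, key)  # O(log k) to find, O(k) to shift

        def contains(self, key):
            bucket = self._bucket(key)
            i = bisect.bisect_left(bucket, key)  # O(log k) binary search
            return i < len(bucket) and bucket[i] == key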
Perfect hashing is known to achieve collision-free O(1) hashing of a limited set of keys known at the time the hash function is constructed. The Wikipedia article mentions several approaches to applying those ideas to a dynamic set of keys, such as dynamic perfect hashing and cuckoo hashing, which might be of interest to you.
You've pretty much answered your own question. Since a hash table is just an array of other data structures, your lookup time is just dependent on the lookup time of the secondary data structure and how well your hash function distributes items across the buckets.

find index of element inside a collection, which collection to use?

I have a problem choosing the right data structure(s); these are the requirements:
I must be able to insert and delete elements
I must also be able to get the index of an element in the collection (its order in the collection)
Elements have a unique identifier number
I can sort (if necessary) the elements using any criterion
Ordering is not really a must; the important thing is getting the index of the element, no matter how it is internally implemented, but anyway I think the best approach is ordering.
The index of an element is its order inside the collection, so some kind of order has to be used. When I delete an element, the elements from that one to the end change their order/index.
My first approach was a linked list, but I don't want O(n).
I have also thought about using an ordered dictionary, which would give O(log n) for lookup/insert/delete, wouldn't it?
Is there a better approach? I know a trie would give O(1) for common operations, but I don't see how to get the index of an element; I would have to iterate over the trie, which would be O(n). Am I wrong?
Sounds like you want an ordered data structure, i.e. a (balanced) BST. Insertion and deletion would indeed be O(lg n), which suffices for many applications. If you also want elements to have an index in the structure, then you'd want an order statistic tree (see e.g., CLR, Introduction to Algorithms, chapter 14) which provides this operation in O(lg n). Dynamically re-sorting the entire collection would be O(n lg n).
If by "order in the collection" you mean any random order is good enough, then just use a dynamic array (vector): amortized O(1) append and delete, O(n lg n) in-place sort, but O(n) lookup until you do the sort, after which lookup becomes O(lg n) with binary search. Deletion would be O(n) if the data is to remain sorted, though.
If your data is string-like, you might be able to extend a trie in the same way that a BST is extended to become an order statistic tree.
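To make the order-statistic idea concrete, here is a minimal sketch of a size-augmented BST in Python (unbalanced for brevity and assuming distinct keys; a production version would balance the tree as described in CLR):

    class Node:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None
            self.size = 1  # number of nodes in the subtree rooted here

    def size(node):
        return node.size if node else 0

    def insert(node, key):
        if node is None:
            return Node(key)
        if key < node.key:
            node.left = insert(node.left, key)
        else:
            node.right = insert(node.right, key)
        node.size = 1 + size(node.left) + size(node.right)
        return node

    def rank(node, key):
        """Number of keys strictly smaller than key, i.e. its 0-based index."""
        if node is None:
            return 0
        if key <= node.key:
            return rank(node.left, key)
        return 1 + size(node.left) + rank(node.right, key)

With a balanced variant, both insert and rank run in O(lg n).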
You don't mention an array/vector here, but it meets most of these criteria.
(Note that "Elements has a unique identifer number" is really irrespective of datastructure; does this mean the same thing as the index? Or is it an immutable key, which is more a function of the data you're putting into the structure...)
There are going to be timing tradeoffs in any scenario: you say linked list is O(n), but O(n) for what? You don't really get into your performance requirements for additions vs. deletes vs. searches; which is more important?
Well, if your collection is sorted, you don't need O(n) to find elements. You can use binary search, for example, to determine the index of an element. It's also possible to write a simple wrapper around each entry in your array that remembers its index inside the collection.
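For example, with Python's bisect module on a sorted list:

    import bisect

    def index_of(sorted_items, x):
        """Return the index of x in a sorted list in O(log n), or -1 if absent."""
        i = bisect.bisect_left(sorted_items, x)
        if i < len(sorted_items) and sorted_items[i] == x:
            return i
        return -1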

Fastest data structure for inserting/sorting

I need a data structure that can insert elements and sort itself as quickly as possible. I will be inserting a lot more often than sorting. Deleting is not much of a concern and neither is space. My specific implementation will additionally store nodes in an array, so lookup will be O(1); i.e., you don't have to worry about it.
If you're inserting a lot more than sorting, then it may be best to use an unsorted list/vector and quicksort it when you need it sorted. This keeps inserts very fast. The one drawback [1] is that sorting is a comparatively lengthy operation, since the cost is not amortized over the many inserts. If you depend on relatively constant time, this can be bad.
[1] Come to think of it, there's a second drawback. If you underestimate your sort frequency, this could quickly end up being overall slower than a tree or a sorted list. If you sort after every insert, for instance, then the insert+quicksort cycle would be a bad idea.
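A minimal sketch of that sort-on-demand pattern in Python (the class name is illustrative, and Python's built-in sort is Timsort rather than quicksort):

    class SortOnDemandList:
        """Append in amortized O(1); sort lazily only when a sorted view is needed."""

        def __init__(self):
            self.items = []
            self.dirty = False

        def insert(self, x):
            self.items.append(x)
            self.dirty = True

        def sorted_items(self):
            if self.dirty:
                self.items.sort()  # O(n log n), amortized over many inserts
                self.dirty = False
            return self.items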
Just use one of the self-balancing binary search trees, such as a red-black tree.
Use any of the balanced binary search trees, such as AVL trees. They give O(lg N) time complexity for both of the operations you are looking for.
If you don't need random access into the array, you could use a Heap.
Worst and average time complexity:
O(log N) insertion
O(1) read largest value
O(log N) to remove the largest value
It can be reconfigured to give the smallest value instead of the largest. By repeatedly removing the largest/smallest value you get a sorted list in O(N log N).
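With Python's heapq (a min-heap, so it yields the smallest value first), for example:

    import heapq

    def heap_sorted(values):
        """O(log n) per insertion, O(n log n) total to drain in sorted order."""
        heap = []
        for v in values:
            heapq.heappush(heap, v)  # O(log n) insert
        # heap[0] is always the smallest value: O(1) to read.
        return [heapq.heappop(heap) for _ in range(len(heap))]  # O(log n) each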
If you can do a lot of inserts before each sort then obviously you should just append the items and sort no sooner than you need to. My favorite is merge sort: it is O(N log N), well behaved, and involves a minimum of storage manipulation (new, malloc, tree balancing, etc.).
HOWEVER, if the values in the collection are integers and reasonably dense, you can use an O(N) sort: just use each value as an index into a big-enough array, set a boolean TRUE at that index, then scan the whole array and collect the indices that are TRUE.
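A sketch of that O(N) approach (sometimes called a bitmap sort; as written it drops duplicates, so switch to a count array if duplicates matter):

    def bitmap_sort(values, max_value):
        """O(N + max_value) sort for dense non-negative integers.
        Duplicates collapse to one occurrence; use counts to keep them."""
        seen = [False] * (max_value + 1)
        for v in values:
            seen[v] = True
        return [i for i, flag in enumerate(seen) if flag]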
You say you're storing items in an array where lookup is O(1). Unless you're using a hash table, that suggests your items may be dense integers, so I'm not sure if you even have a problem.
Regardless, memory allocation/deallocation is expensive, and you should avoid it by pre-allocating or pooling if you can.
I had good experience using a skip list for that kind of task.
At least in my case it was about 5 times faster than adding everything to a list first and then sorting it at the end.
