I have the following question that I haven't been able to answer:
Design a data structure that supports the following features:
Insertion is made into the first empty slot.
Access to the object at index i takes O(1) time.
It doesn't need to support extraction.
The goal is to minimize both the amount of unused memory and the complexity of insertion.
Show that with K slots of unused memory, the amortized complexity of insertion is O(n/K).
Does anyone have an idea?
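One standard way to achieve the stated bound is to grow the backing array by exactly K slots at a time, so at most K slots are ever unused and a full O(n) copy happens only once every K insertions. A minimal Python sketch under that assumption (class and field names are illustrative):

```python
class TightArray:
    """Array grown by a fixed K slots at a time, so at most K slots
    are ever unused; a full O(n) copy happens only once every K
    insertions, i.e. O(n/K) amortized cost per insert."""

    def __init__(self, k):
        self.k = k
        self.data = []      # backing storage; len(data) - size <= k unused
        self.size = 0
        self.copies = 0     # total elements copied so far, for demonstration

    def insert(self, x):
        if self.size == len(self.data):
            # Grow by exactly K slots, copying the existing n elements.
            self.data = self.data + [None] * self.k
            self.copies += self.size
        self.data[self.size] = x   # first empty slot
        self.size += 1

    def get(self, i):
        return self.data[i]        # O(1) indexed access
```

With K = 10 and 100 insertions, copies occur at sizes 0, 10, ..., 90, for 450 copied elements in total, i.e. 4.5 per insert, matching the O(n/K) bound.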
I'm making a Forth-like language that is supposed to run in environments where memory is at a premium.
The language uses a linked list for its dictionary of language words, due to space constraints and the lack of resizable memory.
However, it hurts me to think of the performance costs of the linked list lookup. I understand that the worst case will always be O(n), but I was trying to think of ways to at least improve the typical case when I realized something: what if the "find" method, in addition to finding a key, also performs a single bubble-sort-like operation each time a key is hit. This way the most common keys will "bubble" to the top. Even better, the entire list will be re-weighted as compilation continues and should roughly correlate to a key's continuous statistical likelihood.
Has this technique been used in other places? I'm curious whether there is a decent mathematical demonstration of its runtime complexity (assuming a statistical curve of some names vs others). A single bubble-sort-style swap is clearly O(1), so at least it can't hurt the theoretical runtime complexity.
While this strategy should improve the average run time in common cases, it does not change the worst-case complexity, which is O(n).
Indeed, if the searched key is at the end of a list of size n, a find runs in O(n) time. The bubble-swap operation runs in O(1) time (assuming keys can be compared in constant time). The next find of the same key is slightly faster, but still O(n). After n fetches of the same key, fetching it can be done in O(1) time. However, fetching n other keys in a specific order can reorder the list so that the initial key ends up at the back again; more specifically, fetching the item next to the initial key n times does exactly that. In the end, fetching the initial key n times interleaved with those n other fetches costs (n + (n-1) + ... + 2 + 1) + (1 + 2 + ... + (n-1) + n) = n(n+1) = O(n²) operations, and it leaves the list in the same state as the initial one. Averaged over these 2n fetches, the cost per find is therefore still O(n).
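A minimal sketch of the transpose heuristic discussed above (a found node is swapped with its predecessor in O(1)); the class and method names are illustrative, not from the original post:

```python
class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None

class SelfOrganizingList:
    """Singly linked list using the transpose heuristic:
    each successful find swaps the hit node one step forward."""

    def __init__(self):
        self.head = None

    def insert(self, key, value):
        node = Node(key, value)
        node.next = self.head
        self.head = node

    def find(self, key):
        prev2, prev, cur = None, None, self.head
        while cur is not None and cur.key != key:
            prev2, prev, cur = prev, cur, cur.next
        if cur is None:
            return None
        if prev is not None:
            # Transpose: unlink cur and relink it one position earlier.
            prev.next = cur.next
            cur.next = prev
            if prev2 is not None:
                prev2.next = cur
            else:
                self.head = cur
        return cur.value
```

Repeated lookups of the same key bubble it toward the head, so frequently used keys become cheap to find while the worst case stays O(n).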
Note that you can implement a cache to speed up fetches. Many policies exist; you can find some of them described here. This should not change the complexity, but it can greatly improve the execution time. The cache can store an iterator to a node of the linked list; iterators are not invalidated when items are inserted or deleted (unless the target item itself is deleted).
Note also that linked lists are generally slow in practice. They are not very efficient in terms of memory usage either, because the pointer to the next item takes space (8 bytes on a 64-bit architecture). Allocated nodes can also require hidden space, depending on the standard-library allocator used (some store metadata such as the allocated size). One way to use less memory is a linked list whose nodes are buckets of key-value pairs.
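The bucket idea can be sketched as an unrolled linked list, where each node holds several key-value pairs so the per-node pointer overhead is amortized over many entries (a minimal sketch; the capacity and all names are illustrative):

```python
BUCKET_CAPACITY = 8  # pairs per node; amortizes the next-pointer cost

class Bucket:
    __slots__ = ("pairs", "next")
    def __init__(self):
        self.pairs = []   # up to BUCKET_CAPACITY (key, value) tuples
        self.next = None

class UnrolledList:
    """Linked list of buckets: one next-pointer per BUCKET_CAPACITY
    entries instead of one per entry."""

    def __init__(self):
        self.head = None

    def insert(self, key, value):
        if self.head is None or len(self.head.pairs) >= BUCKET_CAPACITY:
            b = Bucket()
            b.next = self.head
            self.head = b
        self.head.pairs.append((key, value))

    def find(self, key):
        b = self.head
        while b is not None:
            for k, v in b.pairs:
                if k == key:
                    return v
            b = b.next
        return None
```

Scanning a bucket is also cache-friendlier than chasing one pointer per element, which is often the larger win in practice.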
Note that while balanced binary search trees require a bit more memory, they can be much more efficient for this problem: finding a key takes O(log n). A good hash-map implementation can also be quite compact in memory (see hopscotch hashing), although its memory consumption can spike while the table is resized.
I need to frequently find the minimum value object in a set that's being continually updated. I need to have a priority queue type of functionality. What's the best algorithm or data structure to do this? I was thinking of having a sorted tree/heap, and every time the value of an object is updated, I can remove the object, and re-insert it into the tree/heap. Is there a better way to accomplish this?
A binary heap is hard to beat for simplicity, but it has the disadvantage that decrease-key takes O(n) time. I know, the standard references say that it's O(log n), but first you have to find the item. That's O(n) for a standard binary heap.
By the way, if you do decide to use a binary heap, changing an item's priority doesn't require a remove and re-insert. You can change the item's priority in-place and then either bubble it up or sift it down as required.
If the performance of decrease-key is important, a good alternative is a pairing heap, which is theoretically slower than a Fibonacci heap, but is much easier to implement and in practice is faster than the Fibonacci heap due to lower constant factors. In practice, pairing heap compares favorably with binary heap, and outperforms binary heap if you do a lot of decrease-key operations.
You could also marry a binary heap and a dictionary or hash map, and keep the dictionary updated with the position of the item in the heap. This gives you faster decrease-key at the cost of more memory and increased constant factors for the other operations.
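A minimal sketch of that combination: a binary min-heap paired with a dict that tracks each item's index, so decrease-key can skip the O(n) search and sift in place (all names are illustrative):

```python
class IndexedMinHeap:
    """Binary min-heap plus a dict mapping item -> heap index,
    giving O(log n) decrease-key instead of an O(n) scan."""

    def __init__(self):
        self.heap = []   # list of (priority, item)
        self.pos = {}    # item -> index in self.heap

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i
        self.pos[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self.heap[i][0] >= self.heap[parent][0]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            left, right, smallest = 2 * i + 1, 2 * i + 2, i
            if left < n and self.heap[left][0] < self.heap[smallest][0]:
                smallest = left
            if right < n and self.heap[right][0] < self.heap[smallest][0]:
                smallest = right
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def push(self, item, priority):
        self.heap.append((priority, item))
        self.pos[item] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def decrease_key(self, item, new_priority):
        i = self.pos[item]   # O(1) lookup instead of O(n) search
        self.heap[i] = (new_priority, item)
        self._sift_up(i)

    def pop_min(self):
        self._swap(0, len(self.heap) - 1)
        priority, item = self.heap.pop()
        del self.pos[item]
        if self.heap:
            self._sift_down(0)
        return item, priority
```

The dict costs extra memory and a little bookkeeping on every swap, which is exactly the trade-off described above.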
Quoting Wikipedia:
To improve performance, priority queues typically use a heap as their backbone, giving O(log n) performance for inserts and removals, and O(n) to build initially. Alternatively, when a self-balancing binary search tree is used, insertion and removal also take O(log n) time, although building trees from existing sequences of elements takes O(n log n) time; this is typical where one might already have access to these data structures, such as with third-party or standard libraries.
If you are looking for a better way, there must be something special about the objects in your priority queue. For example, if the keys are small integers from 1 to 10, a counting-sort-based approach may outperform the usual ones.
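For such small integer keys, the counting idea can be sketched as a bucket queue: one bucket per priority, O(1) insert, and pop-min scans only the fixed number of buckets (a sketch; names and the priority range are illustrative):

```python
class BucketQueue:
    """Priority queue for integer priorities in a small fixed range
    (here 1..10): insert is O(1), pop-min is O(range), independent
    of the number of stored items."""

    def __init__(self, max_priority=10):
        self.buckets = [[] for _ in range(max_priority + 1)]

    def push(self, item, priority):
        self.buckets[priority].append(item)   # O(1)

    def pop_min(self):
        # Scan the constant number of buckets for the lowest non-empty one.
        for p, bucket in enumerate(self.buckets):
            if bucket:
                return bucket.pop(), p
        raise IndexError("pop from empty queue")
```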
If your application looks anything like repeatedly choosing the next scheduled event in a discrete event simulation, you might consider the options listed in e.g. http://en.wikipedia.org/wiki/Discrete_event_simulation and http://www.acm-sigsim-mskr.org/Courseware/Fujimoto/Slides/FujimotoSlides-03-FutureEventList.pdf. The latter summarizes results from different implementations in this domain, including many of the options considered in other comments and answers, and a search will find a number of papers in this area. Priority-queue overhead really does make a difference in how much faster than real time your simulation can run, and if you wish to simulate something that takes weeks of real time, this can be important.
Everybody knows (or should know) that it is impossible to design a list data structure that supports both O(1) insertion in the middle and O(1) lookup.
For instance, linked lists support O(1) insertion but O(N) lookup, while arrays support O(1) lookup but O(N) insertion (possibly amortized O(1) for insertion at the beginning, the end, or both).
However, suppose you are willing to trade O(1) insertion for:
Amortized O(1) insertion
O(log(N)) insertion
Then what is the theoretical bound for lookup in each of these cases? Do you know existing data structures? What about memory complexity?
Tree-based data structures, like a rope or finger tree, can often provide logarithmic insertion time at arbitrary positions. The tradeoff is in access time, which tends to also be logarithmic except in special cases, like the ends of a finger tree.
Dynamic arrays can provide amortized constant insertion at the ends, but insertion in the middle requires copying part of the array, and is O(N) in time, as you mention.
It's probably possible to implement a data structure which supports amortized constant middle insertion. If adding to either end, treat as a dynamic array. If inserting in the middle, keep the old array and add a new array "above" it which contains the new "middle" of the list, using the old array for data which is left or right of the middle. Access time would be logarithmic after your first middle insertion, and keeping track of what data was in which layer would quickly get complicated.
This might be the 'tiered' dynamic array mentioned in the Wikipedia article; I haven't researched it further.
I suspect the reason no one really uses a data structure like that is that inserting in the middle is rarely the case you most need to optimize for, and logarithmic insertion (using trees) is good enough for most real-world cases.
These are still open problems, but the best bounds that I am aware of are from Arne Andersson's Sublogarithmic searching without multiplications, which has insertions, deletions, and lookups in O(sqrt(lg(n))). However, this comes at a cost of 2^k additional space, where k is the number of bits in the integers being stored in the data structure, hence the reason we're still using balanced binary trees instead of Andersson's data structure. A variant of the data structure allows O(1) lookups, but then the additional space increases to n·2^k, where n is the number of elements in the data structure. A randomized variant doesn't use any additional space, but then the sqrt(lg(n)) insertion/deletion/lookup times become average-case times instead of worst-case times.
This is regarding amortized analysis. Following is text from an article.
Amortized analysis is for problems in which one must perform a series of operations, and our goal is to analyze the time per operation. The motivation for amortized analysis is that looking at the worst-case time per operation can be too pessimistic if the only way to produce an expensive operation is to "set it up" with a large number of cheap operations beforehand.
Question: What does the author mean by the last statement, i.e., "if the only way to produce an expensive operation is to 'set it up' with a large number of cheap operations beforehand"? Can anyone please explain with an example what this statement means?
Thanks!
Another example: consider an array that dynamically increases its capacity when an element is added beyond the current capacity. Let increasing the capacity be O(n), where n is the old size of the array. Now adding an element has a worst-case complexity of O(n), because we might have to increase the capacity. The idea behind amortized analysis is that you must do n simple adds costing O(1) each before the capacity is exhausted. Thus, many cheap operations lead up to one expensive operation; in other words, the expensive operation is amortized over the cheap ones.
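The doubling-array example above can be sketched as follows; the `copies` counter (an illustrative addition) shows that total copying work stays proportional to the number of appends:

```python
class DynamicArray:
    """Illustrates amortized O(1) append: most appends are O(1);
    every capacity-doubling copies n elements, paid for by the
    n cheap appends that preceded it."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None] * self.capacity
        self.copies = 0   # total elements copied during grows

    def append(self, x):
        if self.size == self.capacity:
            self._grow()
        self.data[self.size] = x
        self.size += 1

    def _grow(self):
        # O(n) step: allocate double the space and copy everything over.
        self.capacity *= 2
        new_data = [None] * self.capacity
        for i in range(self.size):
            new_data[i] = self.data[i]
        self.copies += self.size
        self.data = new_data
```

After 1000 appends the grows copy 1 + 2 + 4 + ... + 512 = 1023 elements in total, so the copying cost averages out to about one element per append.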
The author means that the only way an expensive operation can occur is for it to be preceded by a large number of cheap operations.
Look at this example:
We have a stack, and we want it to support, in addition to the usual operations, an operation called multipop(k) that pops k elements. Now, multipop costs O(min(n, k)), where n is the size of the stack; thus the prerequisite for a multipop costing, say, O(k) is that it be preceded by at least k cheap pushes, each costing O(1).
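A sketch of that stack (names are illustrative): each element is pushed once at O(1) cost and popped at most once, so any sequence of n operations costs O(n) overall, i.e. amortized O(1) per operation.

```python
class MultipopStack:
    def __init__(self):
        self.items = []

    def push(self, x):
        self.items.append(x)   # O(1)

    def multipop(self, k):
        """Pop up to k elements: O(min(n, k)) for one call, but
        amortized O(1), since each element can only be popped
        after its single O(1) push."""
        popped = []
        while self.items and len(popped) < k:
            popped.append(self.items.pop())
        return popped
```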
Which data structure can perform insertion, deletion and searching operation in O(1) time in the worst case?
We may assume the set of elements are integers drawn from a finite set 1,2,...,n, and initialization can take O(n) time.
I can only think of implementing a hash table.
Implementing it with trees will not give O(1) time complexity for any of the operations. Or is it possible?
Kindly share your views on this, or on any other data structure apart from these.
Thanks!
Although this sounds like homework, given enough memory you can just use an array. Access to any one element is O(1). If each cell keeps a count of how many integers of that value have been encountered, insertion is also O(1). Searching is O(1) because it just indexes the array at that value and checks the count. This is basically the counting used by counting/radix sort.
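The counting-array approach can be sketched as a direct-address table over the integers 1..n (a sketch; all names are illustrative):

```python
class DirectAddressTable:
    """Direct addressing over integers 1..n: O(n) initialization,
    then O(1) worst-case insert, delete, and search. The counts
    allow duplicates, as in counting sort."""

    def __init__(self, n):
        self.count = [0] * (n + 1)   # index 0 unused; O(n) setup

    def insert(self, x):
        self.count[x] += 1           # O(1)

    def delete(self, x):
        if self.count[x] > 0:        # O(1)
            self.count[x] -= 1

    def search(self, x):
        return self.count[x] > 0     # O(1)
```

Unlike a hash table, these bounds are worst-case rather than expected or amortized, at the cost of O(n) space for the whole key universe.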
Depending on the range of elements, an array might do, but for a lot of data you want a hash table. It will give you O(1) amortized operations.