Consider the scenario where data to be inserted in an array is always in order, i.e. (1, 5, 12, 20, ...)/A[i] >= A[i-1] or (1000, 900, 20, 1, -2, ...)/A[i] <= A[i-1].
To support such a dataset, is it more efficient to use a binary search tree or an array?
(Side note: I am just trying to run some naive analysis for a timed hash map of type (K, T, V) and the time is always in order. I am debating using Map<K, BST<T,V>> vs Map<K, Array<T,V>>.)
As I understand, the following costs (worst case) apply:
Operation  Array      BST
Space      O(n)       O(n)
Search     O(log n)   O(n)
Max/Min    O(1)       O(1) *
Insert     O(1) **    O(n)
Delete     O(n)       O(n)
*: Max/Min pointers
**: Amortized time complexity
Q: I want to be more clear about the question. What kind of data structure should I be using for such a scenario between these two? Please feel free to discuss other data structures like self balancing BSTs, etc.
EDIT:
Please note I didn't consider the complexity for a balanced binary search tree (RBTree, etc). As mentioned, a naive analysis using a binary search tree.
Deletion has been updated to O(n) (didn't consider time to search the node).
Max/Min for skewed BST will cost O(n). But it's also possible to store pointers for Max & Min so overall time complexity will be O(1).
See below the table which will help you choose. Note that I am assuming 2 things:
1) Data will always come in sorted order, as you mentioned: if 1000 is the last value inserted, new data will always be more than 1000. If data does not come in sorted order, insertion can take O(log n), but deletion will not change.
2) Your "array" is actually similar to java.util.ArrayList, i.e. its length is mutable (it is unfair to compare a mutable and an immutable data structure). However, if it is a normal array, deletion will take amortized O(log n) {O(log n) to search and O(1) to delete, amortized because you may need to create a new array} and insertion will take amortized O(1) {you may need to create a new array}.
Operation  ArrayList  BST
Space      O(n)       O(n)
Search     O(log n)   O(log n) {optimized from O(n)}
Max/Min    O(1)       O(log n) {instead of O(1): you need to traverse down to the leaf}
Insert     O(1)       O(log n) {optimized from O(n)}
Delete     O(log n)   O(log n)
So, based on this, ArrayList seems better
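To make the array-backed option concrete, here is a minimal sketch of the Map<K, Array<T,V>> idea from the question. The class name `TimedMap` and its `put`/`get` methods are hypothetical, and the sketch assumes, as the question states, that timestamps for a key never decrease. Appending is amortized O(1) and lookup is a binary search.

```python
from bisect import bisect_right
from collections import defaultdict


class TimedMap:
    """Hypothetical Map<K, Array<(T, V)>>; timestamps per key arrive in order."""

    def __init__(self):
        self._times = defaultdict(list)   # key -> sorted list of timestamps
        self._values = defaultdict(list)  # key -> values, parallel to _times

    def put(self, key, time, value):
        times = self._times[key]
        # Assumption taken from the question: time never decreases for a key.
        if times and time < times[-1]:
            raise ValueError("timestamps must be non-decreasing")
        times.append(time)                # amortized O(1) append at the end
        self._values[key].append(value)

    def get(self, key, time):
        """Return the value with the largest timestamp <= time, or None."""
        times = self._times.get(key)
        if not times:
            return None
        i = bisect_right(times, time)     # O(log n) binary search
        return self._values[key][i - 1] if i else None
```

Because new entries only ever go at the end, no shifting or rebalancing is needed, which is where the array gets its advantage in this scenario.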
Related
I have an AVL tree implementation where the insertion method runs in O(log n) time and the method that returns an in-order list representation runs in O(n^2) time. Suppose I have a list that needs to be sorted. Using a for-loop, I can iterate through the list and insert each element into the AVL tree, which takes O(n log n) time in total. So what is the performance of this entire sorting algorithm (i.e. iterate through the list, insert each element, then use in-order traversal to return a sorted list)?
You correctly say that adding n elements to the tree takes O(n log n) time. A simple in-order traversal of a BST can be performed in O(n) time. It is thus possible to get a sorted list of the elements in O(n log n + n) = O(n log n) time. If your algorithm for generating the sorted list from the tree is quadratic (i.e. it takes O(n^2) time in the worst case rather than O(n)), the worst-case time complexity of the procedure you describe is O(n log n + n^2) = O(n^2), which is not optimal.
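A linear-time in-order traversal can be sketched as follows. This is only an illustration: a plain, unbalanced BST stands in for the AVL tree (so insertion here is not guaranteed O(log n)), and the names `Node`, `insert`, `in_order` and `tree_sort` are made up for the example. The point is the traversal, which pushes and pops every node exactly once.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Plain BST insert (no AVL rebalancing in this sketch); duplicates go right."""
    if root is None:
        return Node(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = Node(key)
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = Node(key)
                return root
            cur = cur.right


def in_order(root):
    """Iterative in-order traversal; each node is pushed and popped once: O(n)."""
    out, stack, cur = [], [], root
    while stack or cur is not None:
        while cur is not None:      # walk down the left spine
            stack.append(cur)
            cur = cur.left
        cur = stack.pop()
        out.append(cur.key)         # visit the node
        cur = cur.right             # then continue in its right subtree
    return out


def tree_sort(items):
    root = None
    for x in items:
        root = insert(root, x)
    return in_order(root)
```

With a balanced tree in place of the plain BST, the insertion phase dominates and the whole sort is O(n log n).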
What is the time complexity of displaying data in alphabetical order using a skip list?
And what is the time complexity of a skip list if we implement it using quad nodes?
Let's assume that your input contains N elements. First you have to construct the skip list. A single insert operation takes O(log N) on average, so inserting N elements takes O(N log N). Once the skip list is constructed, its elements are sorted, so to enumerate them you only need to visit each element once, which is O(N).
It is worth noting that a skip list relies on randomness, so the O(log N) cost of a single insert is not guaranteed. The worst-case cost is O(N), which means that in the worst case inserting N elements into the skip list takes O(N^2).
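The two phases described above (build the list, then walk the bottom level) might look like the sketch below. It uses forward-pointer arrays rather than quad nodes, and `SkipNode`, `SkipList`, `MAX_LEVEL` and the coin-flip probability of 0.5 are choices made for this illustration, not something taken from the question.

```python
import random


class SkipNode:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i] is the next node on level i


class SkipList:
    MAX_LEVEL = 16  # assumed cap, enough for small examples

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = SkipNode(None, self.MAX_LEVEL)  # sentinel, holds no key
        self._level = 1                              # levels currently in use

    def _random_level(self):
        # Flip a fair coin until it comes up tails; this is the source of
        # the "on average" in the O(log N) insert bound.
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def insert(self, key):
        update = [self._head] * self.MAX_LEVEL  # last node before key per level
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        level = self._random_level()
        self._level = max(self._level, level)
        new = SkipNode(key, level)
        for i in range(level):                  # splice into each level
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def __iter__(self):
        # Level 0 links every element in sorted order: O(N) to enumerate.
        node = self._head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]
```

Iterating the structure after inserting a list of words yields them in alphabetical order, regardless of the random levels chosen.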
I'm working on a question from a test in Data Structures. I need to suggest a data structure S that will comply with the following requirements:
NOTE: S should allow multiple values with the same keys to be inserted
INSERT(S, k): insert object with key k to S with time complexity O(lg n)
DELETE_OLD(S): Delete the oldest object in S with time complexity O(lg n)
DELETE_OLD_MIN(S): Delete the oldest object that has the lowest key in S with time complexity O(lg n)
MAX_COUNT(S): Return the key with the maximum frequency (most common key in S) with time complexity O(lg n)
FREQ_SUM(S,z): Finding two keys (a and b) in S such that frequency.a + frequency.b = z with time complexity O(lg n)
I tried some ideas but could not get past the last two.
EDIT: The question A data structure traversable by both order of insertion and order of magnitude does NOT answer my question. Please do not mark it as duplicate. Thank you.
EDIT #2: Example for what freq_sum(S,z) does:
Suppose that one called freq_sum(S,5) over the data structure that contains: 2, 2, 2, 3, 4, 4, 4, 5, 5
The combination 2 and 5 could be a possible answer, because 2 exists 3 times in the structure and 5 exists 2 times, so 3+2=z
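To pin down what FREQ_SUM is expected to return, here is a small reference sketch of its semantics only. It scans all frequencies, so it does not meet the O(lg n) bound the exercise asks for; the function name `freq_sum` follows the example above, and the assumption that a and b must be two different keys is mine.

```python
from collections import Counter


def freq_sum(items, z):
    """Return two distinct keys whose frequencies add up to z, or None.

    Reference semantics only: linear in the number of distinct keys,
    not the O(lg n) structure the exercise requires.
    """
    freq = Counter(items)
    by_freq = {}                        # frequency -> keys with that frequency
    for key, f in freq.items():
        by_freq.setdefault(f, []).append(key)
    for f, keys in by_freq.items():
        g = z - f                       # frequency the partner key must have
        if g == f:
            if len(keys) >= 2:          # need two different keys (assumption)
                return keys[0], keys[1]
        elif g in by_freq:
            return keys[0], by_freq[g][0]
    return None
```

For the data above, `freq_sum([2, 2, 2, 3, 4, 4, 4, 5, 5], 5)` returns a pair such as 2 and 5, whose frequencies 3 and 2 add up to 5.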
You could use a Red-Black Tree to accomplish this.
Red-Black Trees are very fast data structures that adhere to the requirements you have stated above (the last two would require slight modification to the structure).
You would simply have to allow for duplicate keys, since Red-Black Trees follow the properties of Binary Search Trees. Here is an example of a BST allowing duplicate keys
Red-Black Trees are sufficient to maintain running time of:
Search: O(log N)
Insert: O(log N)
Delete: O(log N)
Space : O(N)
EDIT:
You could implement a Self-Balancing Binary Tree, with modification to allow duplicate keys and find the oldest key (see above reference). Building on this, a Splay Tree can meet all of your requirements with an amortized runtime of O(log N)
For finding FREQ_SUM(S,z):
Since search runs in amortized O(log N) time and you are searching for 2 nodes in the tree, you end up with a runtime of O(2 * log N). Constant factors are ignored in asymptotic analysis, so the result is O(log N). You then find the node 'z', also with a runtime of O(log N).
This is the fundamental runtime of a search utilizing a Binary Search Tree, which the Splay tree is built on.
By using the Split operation, you can return two new trees: one that contains all of the elements less than or equal to x, and the other that contains all of the elements greater than x.
The problem is online
Details: The length of array <= 35000, the number of insertions <= 35000, the number of assignments <= 70000 and the number of queries <= 70000; time limit: 10s (Java:20s).
The vague solution I found online says that I need to maintain intervals using a scapegoat tree and in each node of the scapegoat tree, maintain a functional interval tree to query the kth largest element. I do know how to do the second step, but I don't know how to do the first one.
Let's suppose that we have (semantically) an array like
0: 31337
1: 42
2: 314159
3: 9000
4: 100 .
We have a scapegoat tree where the array entries are ordered by index. Each node of the tree stores the number of left-descendants so that we can search efficiently by index. (This makes the scapegoat implementation simpler too.)
             9000(3)
            /       \
        42(1)       100(0)
       /     \
31337(0)   314159(0)
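The left-descendant counts are what make lookup by index possible. The following sketch builds the example tree by hand (no scapegoat rebalancing is shown) and walks it by index; the names `Node`, `size` and `get` are illustrative only.

```python
def size(node):
    """Number of nodes in a subtree, derived from the stored left counts."""
    if node is None:
        return 0
    return node.left_count + 1 + size(node.right)


class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
        self.left_count = size(left)   # number of left-descendants


def get(node, index):
    """Return the value at the given array index by descending the tree."""
    while node is not None:
        if index < node.left_count:
            node = node.left                 # target is in the left subtree
        elif index == node.left_count:
            return node.value                # this node is the target
        else:
            index -= node.left_count + 1     # skip left subtree and this node
            node = node.right
    raise IndexError("index out of range")


# The example tree from the diagram above.
root = Node(9000,
            Node(42, Node(31337), Node(314159)),
            Node(100))
```

Here `root.left_count` is 3 and the node holding 42 has a count of 1, matching the numbers in parentheses in the diagram, and `get(root, i)` for i = 0..4 reproduces the array.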
For each subtree, we also maintain a value-ordered BST of values that it contains. This BST can be a scapegoat tree and also has left-descendant counts for implementing selection.
31337: {31337}
42: {42, 31337, 314159}
314159: {314159}
9000: {42, 100, 9000, 31337, 314159}
100: {100}
To insert, we insert into the scapegoat tree, updating the left-descendant counts and inserting the new value into the BSTs as we walk down. The amortized insertion cost is O(log^2 n) if we reconstitute the BSTs in linear time (proof: each value belongs to O(log n) BSTs, so scapegoating is O(log n) per node touched, for a total of O(log^2 n); inserting into O(log n) BSTs above the scapegoated node is O(log^2 n)). To update, we have to delete/insert from the BSTs (O(log^2 n)).
The query path is where things get ugly. Identifying the O(log n) BSTs and singleton sets whose union is the array section is the easy part. The hard part is actually doing the selection. Binary search will yield O(log^3 n)-time queries, because we have O(log n) rounds of selecting in O(log n) arrays, each with a selection cost of O(log n). Perhaps the Frederickson-Johnson algorithm points to an answer, but it's complicated even for arrays.
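The binary-search selection can be sketched with sorted Python lists standing in for the per-subtree BSTs. This is a simplification under stated assumptions: values are integers, the search runs over the value range (so the number of rounds is logarithmic in the range rather than in n), rank counting uses `bisect` instead of left-descendant counts, and it selects the k-th smallest (the k-th largest is symmetric). The name `kth_smallest` is made up for the example.

```python
from bisect import bisect_right


def kth_smallest(sorted_parts, k):
    """k-th smallest (1-based) value in the union of several sorted lists.

    Each round counts, in every part, how many values are <= mid; that
    count is one O(log n) rank query per part.
    """
    total = sum(len(p) for p in sorted_parts)
    if not 1 <= k <= total:
        raise IndexError("k out of range")
    lo = min(p[0] for p in sorted_parts if p)
    hi = max(p[-1] for p in sorted_parts if p)
    while lo < hi:
        mid = (lo + hi) // 2
        count = sum(bisect_right(p, mid) for p in sorted_parts)
        if count >= k:
            hi = mid          # the answer is <= mid
        else:
            lo = mid + 1      # fewer than k values are <= mid
    return lo
```

For the array section covering indices 0 to 3 of the example, the parts would be the BST of the subtree rooted at 42 plus the singleton {9000}, i.e. `kth_smallest([[42, 31337, 314159], [9000]], 2)`.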
Partial answer:
The basic key to performance in such cases is a data structure (aka collection) which supports the required operations with O(log n) complexity (or better).
In your case you need insertions and lookups (called assignments and queries in your question).
Because you also ask for "largest" you need a sorted collection. (This rules out collections based on Hashes which have O(1) complexity)
So you should start with binary trees or tries.
It is currently impossible to give more details because your question is too vague.
Is there one type of set-like data structure supporting merging in O(logn) time and k-th element search in O(logn) time? n is the size of this set.
You might try a Fibonacci heap which does merge in constant amortized time and decrease key in constant amortized time. Most of the time, such a heap is used for operations where you are repeatedly pulling the minimum value, so a check-for-membership function isn't implemented. However, it is simple enough to add one using the decrease key logic, and simply removing the decrease portion.
If k is a constant, then any meldable heap will do this, including leftist heaps, skew heaps, pairing heaps and Fibonacci heaps. Both merging and getting the first element in these structures typically take O(1) or O(lg n) amortized time, so O(k lg n) at most.
Note, however, that getting to the k'th element may be destructive in the sense that the first k-1 items may have to be removed from the heap.
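As an illustration of the meldable-heap approach and of why selection is destructive, here is a minimal skew heap sketch. The names `SkewNode`, `meld`, `push`, `pop_min` and `kth_smallest_destructive` are made up for this example; the recursive `meld` is fine for small inputs but could hit Python's recursion limit on very unbalanced heaps.

```python
class SkewNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def meld(a, b):
    """Merge two skew heaps; O(lg n) amortized."""
    if a is None:
        return b
    if b is None:
        return a
    if b.key < a.key:
        a, b = b, a                       # keep the smaller root on top
    # Merge b into the right subtree, then swap children (the "skew" step).
    a.left, a.right = meld(a.right, b), a.left
    return a


def push(heap, key):
    return meld(heap, SkewNode(key))


def pop_min(heap):
    """Return (smallest key, remaining heap)."""
    return heap.key, meld(heap.left, heap.right)


def kth_smallest_destructive(heap, k):
    """Pop the k-1 smallest items, then return (k-th key, remaining heap).

    The first k-1 items are gone from the returned heap, which is the
    destructive behaviour described above.
    """
    for _ in range(k - 1):
        _, heap = pop_min(heap)
    return heap.key, heap
```

Melding a heap built from 5, 1, 9 with one built from 4, 7, 2 and asking for the third smallest yields 4, and the heap that comes back no longer contains 1 or 2.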
If you're willing to accept amortization, you could achieve the desired bounds of O(lg n) time for both meld and search by using a binary search tree to represent each set. Melding two trees of size m and n together requires time O(m log(n / m)) where m < n. If you use amortized analysis and charge the cost of the merge to the elements of the smaller set, at most O(lg n) is charged to each element over the course of all of the operations. Selecting the kth element of each set takes O(lg n) time as well.
I think you could also use a collection of sorted arrays to represent each set, but the amortization argument is a little trickier.
As stated in the other answers, you can use heaps, but getting O(lg n) for both meld and select requires some work.
Finger trees can do this and some more operations:
http://en.wikipedia.org/wiki/Finger_tree
There may be something even better if you are not restricted to purely functional data structures (a.k.a. "persistent", which here does not mean "backed up on non-volatile disk storage" but "all previous 'versions' of the data structure remain available even after 'adding' additional elements").