Linked List
A linked list’s insertion time complexity is O(1) for the actual operation, but requires O(n) time to traverse to the proper position. Most online resources list a linked list’s average insertion time as O(1):
https://stackoverflow.com/a/17410009/10426919
https://www.bigocheatsheet.com/
https://www.geeksforgeeks.org/time-complexities-of-different-data-structures/
BST
A binary search tree’s insertion requires the traversal of nodes, taking O(log n) time.
Problem
Am I mistaken to believe that insertion in a BST also takes O(1) time for the actual operation?
Similar to the nodes of a linked list, inserting a node in a BST simply points the current node's pointer to the inserted node, and the inserted node points to the current node's former child node.
If my thinking is correct, why do most online resources list the average insert time for a BST to be O(log n), as opposed to O(1) like for a linked list?
It seems that for a linked list, the actual insertion time is listed as the insertion time complexity, but for a BST, the traversal time is listed as the insertion time complexity.
It reflects the usage. It's O(1) and O(log n) for the operations you'll actually request from them.
With a BST, you'll likely let it manage itself while you stay out of the implementation details. That is, you'll issue commands like tree.insert(value) or queries like tree.contains(value). And those things take O(log n).
With a linked list, you'll more likely manage it yourself, at least the positioning. You wouldn't issue commands like list.insert(value, index), unless the index is very small or you don't care about performance. You're more likely to issue commands like insertAfter(node, newNode) or insertBeginning(list, newNode), which really do take only O(1) time. Note that I took these two from Wikipedia's Linked list operations > Singly linked lists section, which doesn't even have an operation for inserting at a certain position given as an index. Because in reality, you'll manage the "position" (in the form of a node) with the algorithm that uses the linked list, and the time to manage the position is attributed to that algorithm instead. That can, by the way, also be O(1); examples:
You're building a linked list from an array. You'll do this by keeping a variable referencing the last node. To append the next value/node, insert it after that last node (an O(1) operation indeed), and update your variable to reference the new last node instead (also O(1)).
Imagine you don't find a position with a linear scan but with a hash map, storing references directly to linked list nodes. Then looking up the reference takes O(1), and inserting after the looked-up node likewise takes only O(1) time.
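To make this concrete, here is a minimal Python sketch of those O(1) operations (insert_after and from_array are illustrative names, not taken from the cited sources):

    class Node:
        def __init__(self, value, next=None):
            self.value, self.next = value, next

    def insert_after(node, new_node):
        # O(1): splice new_node in right after node
        new_node.next = node.next
        node.next = new_node

    def from_array(values):
        # Build a list in O(n) total by keeping a reference to the last
        # node, so that each append is an O(1) insert_after, as described.
        head = last = None
        for v in values:
            node = Node(v)
            if last is None:
                head = last = node
            else:
                insert_after(last, node)   # O(1)
                last = node
        return head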
If you had shown us some of those resources that "list a linked list's average insertion time as O(1)", we'd likely see that they're indeed showing insertion operations like insertAfterNode, not insertAtIndex.

Edit, now that you've included some links in the question, my thoughts on those sources regarding the O(1) insertion for linked lists: The first one does point out that it's O(1) only if you already have something like an "iterator to the location". The second one in turn refers to the same Wikipedia section I showed above, i.e., with insertions after a given node or at the beginning of a list. The third one is, well, the worst site about programming I know, so I'm not surprised they just say O(1) without any further information.
Put differently, as I like real-world analogies: If you ask me how much it costs to replace part X inside a car motor, I might say $200, even though the part only costs $5. Because I wouldn't do that myself. I'd let a mechanic do that, and I'd have to pay for their work. But if you ask me how much it costs to replace the bell on a bicycle, I might say $5 when the bell costs $5. Because I'd do the replacing myself.
A binary search tree is ordered, and it's typically balanced (to avoid O(n) worst-case search times), which means that when you insert a value, some amount of shuffling has to be done to balance out the tree. That rebalancing takes an average of O(log n) operations, whereas a linked list only needs to update a fixed number of pointers once you've found your place to insert an item between nodes.
To insert into a linked list, you just need to maintain the end node of the list (assuming you are inserting at the end).
To insert into a binary search tree (BST) and maintain it as a BST after insertion, there is no way to do that in O(1), since the tree might need to rebalance. This operation is not as simple as inserting into a linked list.
Check out some of the examples here.
The insertion time of a linked list actually depends on where you are inserting and on the type of linked list.
For example, consider the following cases:
You are using a singly linked list and you are going to insert at the end or in the middle: you have a running time of O(n) to traverse the list to the end node or middle node.
You are using a doubly linked list (with two pointers, the first pointing to the head element and the second pointing to the last element) and you are going to insert in the middle: you still have O(n) time complexity, since you need to traverse to the middle of the list using either the first or the second pointer.
You are using a singly linked list and you are going to insert at the first position of the list: this time the complexity is O(1), since you don't need to traverse any nodes at all. The same is true for a doubly linked list inserting at the end of the list.
So you can see that in the worst-case scenario a linked list takes O(n) instead of O(1).
Now, in the case of a BST, you can achieve O(log n) time if your BST is balanced and not skewed. If your tree is skewed (where every element is greater than the previous element), you need to traverse all the nodes to find the insertion position. For example, consider the tree 1->2->4->6, into which you are going to insert node 9; you need to visit all the nodes to find the insertion position:
    1
     \
      2
       \
        4
         \
          6   (last node, after which the new node will be inserted)
           \
            9   (the new node's insertion position)
Therefore you can see that you need to visit all the nodes to find the proper place; if you have n nodes, you get O(n+1) => O(n) running time.
But if your BST is balanced and not skewed, the situation changes dramatically, since with every move you can eliminate the nodes that don't come under the condition.
PS: What exactly "don't come under the condition" means, you can take as homework!
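To make the distinction concrete, here is a minimal (unbalanced, illustrative) BST insertion in Python: the final pointer update is O(1), but the walk to the insertion position is what costs O(log n) on a balanced tree and O(n) on a skewed chain like the one above.

    class Node:
        def __init__(self, value):
            self.value, self.left, self.right = value, None, None

    def insert(root, value):
        # Walk down from the root to find the insertion position.
        if root is None:
            return Node(value)
        cur = root
        while True:
            if value < cur.value:
                if cur.left is None:
                    cur.left = Node(value)    # the O(1) pointer update
                    break
                cur = cur.left
            else:
                if cur.right is None:
                    cur.right = Node(value)   # the O(1) pointer update
                    break
                cur = cur.right
        return root

    root = None
    for v in (1, 2, 4, 6, 9):   # builds the skewed chain shown above
        root = insert(root, v)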
I have to implement a double-ended priority queue using both a doubly linked list and a binary search tree.
Main functions should be getMin() and getMax()
Using Doubly Linked List:
The idea for getting the minimum and maximum elements in O(1) is to insert small elements at one side of the list and greater elements at the other side, but then there will be a problem with the insertion of elements every time (it will not be O(1)).
Is there any better way to implement it?
Using BST:
I couldn't understand how I will be able to implement getMin() and getMax() in a BST.
Normal priority queues are usually implemented using a heap, so we can get the top value easily in O(1) and insert new elements in O(log n). I doubt there is a way to implement a priority queue using doubly linked lists that gets the same asymptotic complexity, let alone a double-ended priority queue (I may be wrong, though). Using a BST we can do both operations in O(log n):
Insertions and deletions are the same as in your usual BST
To get the min value, start a traversal at the root and go all the way to the left until the current node has no left child. The last node you visited contains the min value
To get the max value, start a traversal at the root and go all the way to the right until the current node has no right child. The last node you visited contains the max value
Of course, getMin and getMax will only be O(log n) if the BST is balanced; otherwise they can degenerate to O(n)
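A minimal sketch of those two traversals, assuming plain BST nodes with value/left/right attributes:

    class Node:
        def __init__(self, value, left=None, right=None):
            self.value, self.left, self.right = value, left, right

    def get_min(root):
        # follow left children; the last node visited holds the minimum
        node = root
        while node.left is not None:
            node = node.left
        return node.value

    def get_max(root):
        # follow right children; the last node visited holds the maximum
        node = root
        while node.right is not None:
            node = node.right
        return node.value

    root = Node(4, Node(2), Node(6))
    assert get_min(root) == 2 and get_max(root) == 6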
"...there will be a problem with the insertion of elements every time (it will not be O(1)). Is there any better way to implement it?"
What you are basically trying to do is search for the first larger element in order, which can't be done in O(1). Linear search (going through the elements one by one) is probably the best way to do this. If you have huge lists and want to focus on efficiency, you could use exponential search or interpolation search (interpolation only works if you know the distribution of the stored keys), but you can't get closer than O(log log n).
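For illustration, a sketch of that linear-scan insertion into a sorted doubly linked list (names are hypothetical):

    class Node:
        def __init__(self, value):
            self.value = value
            self.prev = self.next = None

    def sorted_insert(head, value):
        # Linear scan: find the first node with a larger value, O(n) worst case.
        node = Node(value)
        if head is None or value <= head.value:
            node.next = head
            if head is not None:
                head.prev = node
            return node                      # node becomes the new head
        cur = head
        while cur.next is not None and cur.next.value < value:
            cur = cur.next
        node.next, node.prev = cur.next, cur # splice in after cur: O(1)
        if cur.next is not None:
            cur.next.prev = node
        cur.next = node
        return head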
"I couldn't understand how I will be able to implement getMin() and getMax() in a BST."
If you're not allowed to add any additional structure to your BST, then the only way to get min and max is traversal, as Lucas Sampaio already mentioned.
However, it can be helpful to store a reference to the current minimum and maximum, so you can access them faster.
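A sketch of that caching idea (hypothetical names; only the bookkeeping around the usual BST insert is shown):

    class MinMaxCache:
        def __init__(self):
            self.min_node = None
            self.max_node = None

        def after_insert(self, node):
            # call this right after the normal BST insertion of `node`
            if self.min_node is None or node.value < self.min_node.value:
                self.min_node = node
            if self.max_node is None or node.value > self.max_node.value:
                self.max_node = node

Note that deleting the cached minimum or maximum invalidates the reference, so a deletion of either endpoint still needs the O(log n) traversal to refresh it.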
I'm working on this problem but I'm pretty confused on how to solve it:
Design a data structure that supports the following operations in amortized O(log n) time, where n is the total number of elements:
Ins(k): Insert a new element with key k
Extract-Max: Find and remove the element with largest key
Extract-Min: Find and remove the element with smallest key
Union: Merge two different sets of elements
How do I calculate the amortized time? Isn't this already something like a hash table? Or is it a variant of it?
I would really appreciate if someone can help me with this.
Thank you!!
What you're proposing isn't something that most hash tables are equipped to deal with because hash tables don't usually support finding the min and max elements quickly while supporting deletions.
However, this is something that you could do with a pair of priority queues that support melding. For example, suppose that you back your data structure with two binomial heaps - a min-heap and a max-heap. Every time you insert an element into your data structure, you add it to both the min-heap and the max-heap. However, you slightly modify the two heaps so that each element in the heap stores a pointer to its corresponding element in the other heap; that way, given a node in the min-heap, you can find the corresponding node in the max-heap and vice-versa.
Now, to do an extract-min or extract-max, you just apply a find-min operation to the min-heap or a find-max operation to the max-heap to get the result. Then, delete that element from both heaps using the normal binomial heap delete operation. (You can use the pointer you set up during the insert step to quickly locate the sibling element in the other heap).
Finally, for a union operation, just apply the normal binomial heap merge operation to the corresponding min-heaps and max-heaps.
Since each of the described operations requires only O(1) underlying binomial-heap operations, each of them runs in O(log n) worst-case time, with no amortization needed.
Generally speaking, the data structure you're describing is called a double-ended priority queue. There are a couple of specialized data structures you can use to meet those requirements, though the one described above is probably the easiest to build with off-the-shelf components.
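As an illustrative sketch of the two-heap bookkeeping (not the structure described above: Python's heapq is a plain binary heap, and a shared "alive" flag with lazy deletion stands in for the cross-pointer delete, so union is not O(log n) here):

    import heapq, itertools

    class DoubleEndedPQ:
        def __init__(self):
            self._min = []                   # min-heap
            self._max = []                   # max-heap, via negated keys
            self._ids = itertools.count()    # tie-breaker for equal keys

        def insert(self, key):
            entry = [key, True]              # shared record: key, alive flag
            i = next(self._ids)
            heapq.heappush(self._min, (key, i, entry))
            heapq.heappush(self._max, (-key, i, entry))

        def extract_min(self):
            while self._min:
                key, _, entry = heapq.heappop(self._min)
                if entry[1]:                 # still alive?
                    entry[1] = False         # lazily delete it from the other heap
                    return key
            raise IndexError("empty queue")

        def extract_max(self):
            while self._max:
                negkey, _, entry = heapq.heappop(self._max)
                if entry[1]:
                    entry[1] = False
                    return -negkey
            raise IndexError("empty queue")

With real binomial heaps and cross-pointers, insert, extract, and union are all O(log n) worst-case, as described above.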
Is there a data structure that can be traversed in both order of insertion and order of magnitude in O(n) with at most O(log(n)) insertion and deletion?
In other words given elements 5, 1, 4, 3, 2 (inserted in this order), it can be traversed either as 1,2,3,4,5 or as 5,1,4,3,2 in O(n) time.
Of course I could use an array and simply sort it before traversing, but this would require an O(n*log(n)) pre-traversal step. Also, I could use a multi-linked list to achieve O(n) traversal, but in this case the insertion and deletion operations will also take O(n) since I cannot guarantee that the newest element will necessarily be the largest.
If there exists such a data structure, please provide me with a formal name for it so that I may research it further, or if it doesn't have one, a brief surface-level description would be appreciated.
Thank you
One solution that also permits sublinear deletion is to build a data structure D that uses a linked list D.L for the traversal in order of insertion, and a sorted tree D.T for the traversal in order of magnitude. But how to link them to additionally achieve a remove operation in sublinear time? The trick is that D.T should not store the values, but just a reference to the corresponding linked list element in D.L.
Insertion: Append to D.L in time O(1), and insert a reference to the appended element into D.T in time O(log n). (Any comparisons in D.T are of course made on the referenced values, not on the references themselves.)
Traverse by order of insertion (or backwards): simply traverse D.L in time O(n) linearly
Traverse by order of magnitude (or backwards): simply traverse D.T in time O(n) by tree-walk
Deletion: first find and remove the element in D.T in time O(log n), which also gives you the correct element reference into D.L, so it can be removed from D.L in time O(1).
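A minimal Python sketch of D (the tree is left unbalanced for brevity; swap in an AVL or red-black tree to get the stated O(log n) bounds, and deletion follows the find-and-remove recipe above):

    class ListNode:                        # element of D.L
        def __init__(self, value):
            self.value, self.prev, self.next = value, None, None

    class TreeNode:                        # element of D.T: references into D.L
        def __init__(self, ref):
            self.ref, self.left, self.right = ref, None, None

    class D:
        def __init__(self):
            self.head = self.tail = None   # D.L
            self.root = None               # D.T

        def insert(self, value):
            node = ListNode(value)         # O(1) append to D.L
            if self.tail is None:
                self.head = self.tail = node
            else:
                node.prev = self.tail
                self.tail.next = node
                self.tail = node
            tnode, parent, cur = TreeNode(node), None, self.root
            while cur is not None:         # O(log n) on a balanced tree
                parent = cur
                cur = cur.left if value < cur.ref.value else cur.right
            if parent is None:
                self.root = tnode
            elif value < parent.ref.value:
                parent.left = tnode
            else:
                parent.right = tnode

        def by_insertion(self):            # O(n) walk of D.L
            cur = self.head
            while cur:
                yield cur.value
                cur = cur.next

        def by_magnitude(self):            # O(n) in-order walk of D.T
            def walk(t):
                if t:
                    yield from walk(t.left)
                    yield t.ref.value
                    yield from walk(t.right)
            yield from walk(self.root)

    d = D()
    for v in (5, 1, 4, 3, 2):
        d.insert(v)
    assert list(d.by_insertion()) == [5, 1, 4, 3, 2]
    assert list(d.by_magnitude()) == [1, 2, 3, 4, 5]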
The commenters are right: your best bet is to store your objects twice: once in a linked list (order of insertion) and once in a binary tree (intrinsic sort order).
This is not as bad as it may sound as you do not have to copy the objects, thus the only cost is the list/tree scaffolding and that would cost you 4 machine words per object you store.
You don't even really need two data structures. Just use a binary tree, but rather than inserting your object, wrap it in an object which also contains pointers to the previous and next objects. This is fairly trivial to do in mainstream languages like Java, where you can use the default tree implementation with a comparator to order your tree by a property.
As long as you keep a reference to the first and last element you can traverse them in order using the internal pointers of the object.
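A sketch of that single-structure variant (again with an unbalanced BST standing in for a self-balancing one): the tree nodes themselves carry the prev/next pointers that thread the insertion order.

    class Node:
        def __init__(self, value):
            self.value = value
            self.left = self.right = None    # tree structure (sort order)
            self.prev = self.next = None     # insertion-order threading

    class ThreadedTree:
        def __init__(self):
            self.root = None
            self.first = self.last = None    # insertion-order endpoints

        def insert(self, value):
            node = Node(value)
            if self.last is not None:        # thread the insertion order
                self.last.next, node.prev = node, self.last
            else:
                self.first = node
            self.last = node
            parent, cur = None, self.root    # ordinary BST insert
            while cur is not None:
                parent = cur
                cur = cur.left if value < cur.value else cur.right
            if parent is None:
                self.root = node
            elif value < parent.value:
                parent.left = node
            else:
                parent.right = node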
How can I efficiently compute the intersection or union of two AVL trees? Any help would be appreciated.
You can intersect any two sorted lists in linear time.
get the in-order (left child, then parent data, then right child) iterators for both AVL trees.
peek at the head of both iterators.
if one iterator is exhausted, return the result set.
if both elements are equal or the union is being computed, add their minimum to the result set.
pop the lowest (if the iterators are in ascending order) element. If both are equal, pop both
This runs in O(n1+n2) and is optimal for the union operation (where you are bound by the output size).
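A straightforward Python rendering of that merge over two sorted sequences (e.g., materialized in-order traversals of the two trees):

    def union(a, b):
        # O(n1 + n2): always emit the smaller head; emit equal heads once
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                out.append(a[i]); i += 1; j += 1
            elif a[i] < b[j]:
                out.append(a[i]); i += 1
            else:
                out.append(b[j]); j += 1
        out.extend(a[i:]); out.extend(b[j:])
        return out

    def intersection(a, b):
        # O(n1 + n2): emit only elements present in both
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                out.append(a[i]); i += 1; j += 1
            elif a[i] < b[j]:
                i += 1
            else:
                j += 1
        return out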
Alternatively, you can look at all elements of the smaller tree to see if they are present in the larger tree. This runs in O(n1 log n2).
This is the algorithm Google uses (or considered using) in their BigTable engine to find an intersection:
Get iterators for all sources.
Start with pivot = null.
Loop over all n iterators in sequence until any of them is exhausted:
    find the smallest element in this iterator that is >= the pivot
    if that element equals the pivot:
        increment the count of iterators the pivot is in
        if the pivot is in all iterators, add the pivot to the result set
    else:
        reset the count of iterators the pivot is in
        use the found element as the new pivot
To find an element, or the next larger element, in a binary tree iterator:
    start from the current element
    walk up until the current element is >= the element being searched for, or you reach the root
    walk down until you find the element or you can't go further left
    if the current element is smaller than the element being searched for, return null (this iterator is exhausted)
    else return the current element
This decays to O(n1+n2) for similarly-sized sets that are perfectly mixed, and to O(n1 log n2) if the second tree is much bigger. If the range of a subtree in one tree does not intersect any node in the other tree / all other trees, then at most one element from this subtree is ever visited (its minimum). This is possibly the fastest algorithm available.
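As a Python sketch of that pivot scheme, with sorted lists and bisect standing in for the tree iterators (bisect_left plays the role of the "smallest element >= pivot" seek described above, which a balanced tree supports in O(log n)):

    from bisect import bisect_left

    def pivot_intersection(lists):
        k = len(lists)
        if k == 1:
            return list(lists[0])
        pos = [0] * k                        # per-iterator cursor
        out, pivot, count, i = [], None, 0, 0
        while True:
            lst = lists[i]
            if pivot is not None:            # seek: smallest element >= pivot
                pos[i] = bisect_left(lst, pivot, pos[i])
            if pos[i] >= len(lst):
                return out                   # this iterator is exhausted
            elem = lst[pos[i]]
            if elem != pivot:
                pivot, count = elem, 0       # new, larger pivot; reset count
            count += 1
            if count == k:                   # pivot present in all iterators
                out.append(pivot)
                pos[i] += 1                  # step past it
                if pos[i] >= len(lst):
                    return out
                pivot, count = lst[pos[i]], 1
            i = (i + 1) % k                  # next iterator in sequence

    assert pivot_intersection([[1, 2, 3, 7], [2, 3, 4, 7]]) == [2, 3, 7]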
Here is a paper with efficient algorithms for finding intersections and unions of AVL trees (or other kinds of trees for that matter, and other operations).
Implementing Sets Efficiently in a Functional Language
I found this paper when I was researching this subject. The algorithms are in Haskell and they are primarily designed for immutable trees, but they will work as well for any kind of tree (though there might be some overhead in some languages). Their performance guarantees are similar to the algorithm presented above.