Linked List vs. Binary Search Tree Insertion Time Complexity - algorithm

Linked List
Insertion into a linked list is O(1) for the actual operation, but it takes O(n) time to traverse to the proper position. Most online resources list a linked list's average insertion time as O(1):
https://stackoverflow.com/a/17410009/10426919
https://www.bigocheatsheet.com/
https://www.geeksforgeeks.org/time-complexities-of-different-data-structures/
BST
A binary search tree’s insertion requires the traversal of nodes, taking O(log n) time.
Problem
Am I mistaken to believe that insertion in a BST also takes O(1) time for the actual operation?
Similar to the nodes of a linked list, an insertion of a node in a BST will simply point the current node’s pointer to the inserted-node, and the inserted-node will point to the current node’s child node.
If my thinking is correct, why do most online resources list the average insert time for a BST to be O(log n), as opposed to O(1) like for a linked list?
It seems that for a linked list, the actual insertion time is listed as the insertion time complexity, but for a BST, the traversal time is listed as the insertion time complexity.

It reflects the usage. It's O(1) and O(log n) for the operations you'll actually request from them.
With a BST, you'll likely let it manage itself while you stay out of the implementation details. That is, you'll issue commands like tree.insert(value) or queries like tree.contains(value). And those things take O(log n).
With a linked list, you'll more likely manage it yourself, at least the positioning. You wouldn't issue commands like list.insert(value, index), unless the index is very small or you don't care about performance. You're more likely to issue commands like insertAfter(node, newNode) or insertBeginning(list, newNode), which really do take only O(1) time. Note that I took these two from Wikipedia's Linked list operations > Singly linked lists section, which doesn't even have an operation for inserting at a certain position given as an index. That's because in reality, you'll manage the "position" (in the form of a node) in the algorithm that uses the linked list, and the time to manage the position is attributed to that algorithm instead. That can, by the way, also be O(1); for example:
You're building a linked list from an array. You'll do this by keeping a variable referencing the last node. To append the next value/node, insert it after that last node (an O(1) operation indeed), and update your variable to reference the new last node instead (also O(1)).
Imagine you don't find a position with a linear scan but with a hash map, storing references directly to linked list nodes. Then looking up the reference takes O(1), and inserting after the looked-up node again takes only O(1) time.
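To make those two examples concrete, here is a minimal singly linked list sketch in Python; the names Node, insert_after, build_from_array and the index map are illustrative, not taken from any particular library. Building the list keeps a reference to the last node, and the hash map stores references straight to nodes, so every insertion step is O(1).

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def insert_after(node, new_node):
    # O(1): only two pointers change, no traversal
    new_node.next = node.next
    node.next = new_node

def build_from_array(values):
    # Build the list in O(n) total by remembering the last node, so each
    # append is an O(1) insert_after (the first example above). The dict
    # stores node references for O(1) positioning (the second example);
    # it assumes the values are distinct.
    head = tail = None
    index = {}
    for v in values:
        node = Node(v)
        if head is None:
            head = tail = node
        else:
            insert_after(tail, node)
        tail = node
        index[v] = node
    return head, index

head, index = build_from_array([10, 20, 40])
insert_after(index[20], Node(30))   # O(1) lookup + O(1) insertion, no scan
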
If you had shown us some of those "most online resources [that] list a linked list's average insertion time as O(1)", we'd likely see that they're indeed showing insertion operations like insertAfterNode, not insertAtIndex. Edit, now that you have included some links in the question, here are my thoughts on those sources regarding the O(1) insertion for linked lists: The first one does point out that it's O(1) only if you already have something like an "iterator to the location". The second one in turn refers to the same Wikipedia section I showed above, i.e., with insertions after a given node or at the beginning of a list. The third one is, well, the worst site about programming I know, so I'm not surprised they just say O(1) without any further information.
Put differently, as I like real-world analogies: If you ask me how much it costs to replace part X inside a car motor, I might say $200, even though the part only costs $5. Because I wouldn't do that myself. I'd let a mechanic do that, and I'd have to pay for their work. But if you ask me how much it costs to replace the bell on a bicycle, I might say $5 when the bell costs $5. Because I'd do the replacing myself.

A binary search tree is ordered, and it's typically balanced (to avoid O(n) worst-case search times), which means that when you insert a value some amount of shuffling has to be done to balance out the tree. That rebalancing takes an average of O(log n) operations, whereas a Linked List only needs to update a fixed number of pointers once you've found your place to insert an item between nodes.

To insert into a linked list, you just need to maintain the end node of the list (assuming you are inserting at the end).
To insert into a binary search tree (BST) and keep it a valid BST after the insertion, there is no way you can do that in O(1), since the tree might need to re-balance. This operation is not as simple as inserting into a linked list.
Check out some of the examples here.

The insertion time of a linked list actually depends on where you are inserting and on the type of linked list.
For example, consider the following cases:
If you are using a singly linked list and you insert at the end or in the middle, you have a running time of O(n), because you must traverse the list to reach the end or middle node.
If you are using a doubly linked list (with two pointers, one to the head element and one to the last element) and you insert in the middle, you still have O(n) time complexity, since you need to traverse to the middle of the list using either pointer.
If you are using a singly linked list and you insert at the first position of the list, the complexity is O(1), since you don't need to traverse any nodes at all. The same is true for a doubly linked list when inserting at the end of the list.
So you can see that in the worst case a linked list takes O(n) instead of O(1).
Now, in the case of a BST, you get O(log n) time if your BST is balanced and not skewed. If your tree is skewed (every element is greater than the previous element), you need to traverse all the nodes to find the insertion position. For example, if your tree is 1->2->4->6 and you are going to insert node 9, you need to visit all the nodes to find the insertion position:
1
 \
  2
   \
    4
     \
      6   (last node, after which the new node is going to be inserted)
       \
        9   (the new node, in its insertion position)
Therefore you can see that you need to visit all the nodes to find the proper place; if you have n nodes, you have O(n+1) => O(n) running time complexity.
But if your BST is balanced and not skewed, the situation changes dramatically, since with every move you can eliminate the nodes that do not satisfy the search condition.
PS: Working out exactly which nodes "do not satisfy the condition" is left as homework!
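To illustrate the difference, here is a small Python sketch (illustrative names, no balancing) that counts how many existing nodes each insertion visits. The skewed insertion order 1, 2, 4, 6, 9 from the example above pays one visit per existing node, while a less sorted insertion order pays far fewer.

class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    # Plain BST insertion; returns (root, number of nodes visited on the way down).
    visited = 0
    if root is None:
        return BSTNode(key), visited
    node = root
    while True:
        visited += 1
        if key < node.key:
            if node.left is None:
                node.left = BSTNode(key)
                return root, visited
            node = node.left
        else:
            if node.right is None:
                node.right = BSTNode(key)
                return root, visited
            node = node.right

def total_visits(keys):
    root, total = None, 0
    for k in keys:
        root, v = insert(root, k)
        total += v
    return total

print(total_visits([1, 2, 4, 6, 9]))   # skewed: 0+1+2+3+4 = 10 visits
print(total_visits([4, 2, 6, 1, 9]))   # roughly balanced: 6 visits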

Related

Is there a data structure representing an ordered list with O(n*log n) time on main operations?

I am looking for a data structure that allows a specific problem to be solved in O(n*log(n)) complexity. It needs to represent a set of integers, in which I can do the following operations:
- add an element
- check if an element exists in the set
- delete every value bigger than a given integer
Hopefully with logarithmic complexity.
I looked at linked lists, since adding an element in the middle and deleting a whole part of the structure is easy, but I don't know how to keep the list ordered or implement a dichotomic (binary) search. At first I was considering hash tables, but I don't know how to filter the set. I'm looking at balanced binary trees, and I do not know if I am looking for something delusional or if it exists somehow and I just can't find it.
For implementing from scratch, I would suggest a Treap.
A Treap is just a binary search tree where every node is given a random priority, and the tree satisfies the heap condition on those priorities. This randomized data structure makes the expected time to find, insert, delete and split the tree O(log(n)). The first three are fairly straightforward. To split, you just put a node in at the point to split with a higher priority than the root. Then one half winds up on one side of that node, and the other half on the other.
Please note, while splitting is O(log(n)), freeing up the deleted bits is O(n).
Please note that you may not have to implement anything. For example, in C++ you can just use an std::map. The performance of those operations, except the delete, is O(log(n)), while deleting a range of length m from a structure of size n is O(m + log(n)). If you consider the comment about freeing memory, that's about ideal.
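If you do implement it from scratch, here is a minimal split/merge treap sketch in Python covering the three requested operations; the names split, merge and delete_greater_than are illustrative. The split is O(log(n)) expected time, and, as noted above, reclaiming the pruned subtree is left to the garbage collector here.

import random

class TreapNode:
    def __init__(self, key):
        self.key = key
        self.prio = random.random()   # random priority gives expected O(log n) height
        self.left = None
        self.right = None

def split(node, key):
    # Returns (subtree with keys <= key, subtree with keys > key).
    if node is None:
        return None, None
    if node.key <= key:
        node.right, right = split(node.right, key)
        return node, right
    left, node.left = split(node.left, key)
    return left, node

def merge(a, b):
    # Assumes every key in a is smaller than every key in b.
    if a is None:
        return b
    if b is None:
        return a
    if a.prio > b.prio:
        a.right = merge(a.right, b)
        return a
    b.left = merge(a, b.left)
    return b

class Treap:
    def __init__(self):
        self.root = None

    def add(self, key):
        if self.contains(key):        # keep it a set: no duplicates
            return
        left, right = split(self.root, key)
        self.root = merge(merge(left, TreapNode(key)), right)

    def contains(self, key):
        node = self.root
        while node:
            if key == node.key:
                return True
            node = node.left if key < node.key else node.right
        return False

    def delete_greater_than(self, key):
        # Keep only keys <= key; one O(log n) split prunes everything bigger.
        self.root, _ = split(self.root, key)

t = Treap()
for x in (5, 1, 9, 7):
    t.add(x)
t.delete_greater_than(6)
print(t.contains(5), t.contains(9))   # True False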

Time complexity of deletion in a linked list

I'm having a bit of trouble understanding why the time complexity of deleting from a linked list is O(1), according to this website. From what I understand, if you want to delete an element, surely you must traverse the list to find out where the element is located (if it even exists at all)? Shouldn't it be O(n), or am I missing something completely?
No, you are not missing something.
If you want to delete a specific element, the time complexity is O(n) (where n is the number of elements) because you have to find the element first.
If you want to delete an element at a specific index i, the time complexity is O(i) because you have to follow the links from the beginning.
The time complexity of insertion is only O(1) if you already have a reference to the node you want to insert after. The time complexity for removal is only O(1) for a doubly-linked list if you already have a reference to the node you want to remove. Removal for a singly-linked list is only O(1) if you already have references to the node you want to remove and the one before. All this is in contrast to an array-based list where insertions and removal are O(n) because you have to shift elements along.
The advantage of using a linked list rather than a list based on an array is that you can efficiently insert or remove elements while iterating over it. This means for example that filtering a linked list is more efficient than filtering a list based on an array.
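A small doubly linked list sketch of that distinction in Python (illustrative names, not a real library's API): remove_node is O(1) because the caller already holds the node, while remove_value is O(n) because the search dominates.

class DNode:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, value):
        node = DNode(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        return node                      # hand the reference back to the caller

    def remove_node(self, node):
        # O(1): unlink using the node's own prev/next pointers
        if node.prev: node.prev.next = node.next
        else:         self.head = node.next
        if node.next: node.next.prev = node.prev
        else:         self.tail = node.prev

    def remove_value(self, value):
        # O(n): the search dominates, as pointed out above
        node = self.head
        while node and node.value != value:
            node = node.next
        if node:
            self.remove_node(node)

lst = DoublyLinkedList()
ref = lst.append(1); lst.append(2); lst.append(3)
lst.remove_node(ref)        # O(1), we already held the reference
lst.remove_value(3)         # O(n), must scan first
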
You are correct.
Deletion:
1. If a pointer to the node is given, the time complexity is O(1).
2. If you DON'T have a pointer to the node to be deleted (search and delete), the time complexity is O(n).

A data structure traversable by both order of insertion and order of magnitude

Is there a data structure that can be traversed in both order of insertion and order of magnitude in O(n) with at most O(log(n)) insertion and deletion?
In other words given elements 5, 1, 4, 3, 2 (inserted in this order), it can be traversed either as 1,2,3,4,5 or as 5,1,4,3,2 in O(n) time.
Of course I could use an array and simply sort it before traversing, but this would require an O(n*log(n)) pre-traversal step. Also, I could use a multi-linked list to achieve O(n) traversal, but in this case the insertion and deletion operations will also take O(n) since I cannot guarantee that the newest element will necessarily be the largest.
If there exists such a data structure, please provide me with a formal name for it so that I may research it further, or if it doesn't have one, a brief surface-level description would be appreciated.
Thank you
One solution that also permits sublinear deletion is to build a data structure D that uses a linked list D.L for the traversal in order of insertion, and a sorted tree D.T for the traversal in order of magnitude. But how to link them to additionally achieve a remove operation in sublinear time? The trick is that D.T should not store the values, but just a reference to the corresponding linked list element in D.L.
Insertion: Append to D.L in time O(1), and insert a reference to the appended element into D.T in time O(log(n)). (Any comparisons in D.T are of course made on the referenced values, not on the references themselves.)
Traverse by order of insertion (or backwards): simply traverse D.L in time O(n) linearly
Traverse by order of magnitude (or backwards): simply traverse D.T in time O(n) by tree-walk
Deletion: first find&remove the element in D.T in time O(log n), which also gives you the correct element reference into D.L, so it can be removed from D.L in time O(1).
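Here is a compact Python sketch of this design with illustrative names: D.L is a doubly linked list appended to in insertion order, and D.T stores references to the list nodes, comparing by the referenced values. For brevity the tree here is a plain (unbalanced) BST and deletion is omitted; the O(log n) bounds above assume a self-balancing tree.

class ListNode:
    def __init__(self, value):
        self.value = value
        self.prev = self.next = None

class TreeNode:
    def __init__(self, list_node):
        self.ref = list_node            # reference into D.L, not a copy of the value
        self.left = self.right = None

class DualOrder:
    def __init__(self):
        self.head = self.tail = None    # D.L endpoints
        self.root = None                # D.T root

    def insert(self, value):
        ln = ListNode(value)
        if self.tail is None:           # append to D.L in O(1)
            self.head = self.tail = ln
        else:
            ln.prev = self.tail
            self.tail.next = ln
            self.tail = ln
        # insert a reference into D.T, comparing on the referenced values
        parent, node = None, self.root
        while node:
            parent = node
            node = node.left if value < node.ref.value else node.right
        tn = TreeNode(ln)
        if parent is None:
            self.root = tn
        elif value < parent.ref.value:
            parent.left = tn
        else:
            parent.right = tn

    def by_insertion(self):
        node = self.head
        while node:
            yield node.value
            node = node.next

    def by_magnitude(self):
        def walk(n):
            if n:
                yield from walk(n.left)
                yield n.ref.value
                yield from walk(n.right)
        yield from walk(self.root)

d = DualOrder()
for v in (5, 1, 4, 3, 2):
    d.insert(v)
print(list(d.by_insertion()))   # [5, 1, 4, 3, 2]
print(list(d.by_magnitude()))   # [1, 2, 3, 4, 5]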
The commenters are right: your best bet is to store your objects twice: once in a linked list (order of insertion) and once in a binary tree (intrinsic sort order).
This is not as bad as it may sound, as you do not have to copy the objects; the only cost is the list/tree scaffolding, which would cost you 4 machine words per object you store.
You don't even really need two data structures. Just use a binary tree, but rather than inserting your object, wrap it in an object which also contains pointers to the previous and next objects. This is fairly trivial to do in mainstream languages like Java, where you can use the default tree implementation with a comparator to order your tree by a property.
As long as you keep a reference to the first and last element you can traverse them in order using the internal pointers of the object.

Can we use binary search tree to simulate heap operation?

I was wondering if we can use a binary search tree to simulate heap operations (insert, find minimum, delete minimum), i.e., use a BST for doing the same job?
Are there any kind of benefits for doing so?
Sure we can, but with a balanced BST.
The minimum is the leftmost element and the maximum is the rightmost element. Finding those elements is O(log n) each, and they can be cached on each insert/delete, after the data structure has been modified. [Note there is room for optimization here, but this naive approach also doesn't violate the complexity requirements!]
This way you get insert/delete in O(log n) and findMin/findMax in O(1).
EDIT:
The only advantage I can think of for this implementation is that you get both findMin and findMax in one data structure.
However, this solution will be much slower [more ops per step, more cache misses are expected...] and consume more space than the regular array-based implementation of a heap.
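To illustrate the leftmost/rightmost idea, here is a small Python sketch with a plain BST (balancing omitted, so the O(log n) bounds only hold if the tree happens to stay balanced; names are illustrative): find_min walks the left spine, find_max the right spine, and delete_min is the pop-minimum of the heap simulation.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None

def insert(root, key):               # plain BST insert, no rebalancing
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def find_min(root):                  # leftmost node
    while root.left:
        root = root.left
    return root.key

def find_max(root):                  # rightmost node
    while root.right:
        root = root.right
    return root.key

def delete_min(root):                # pop-min for the heap simulation
    if root.left is None:
        return root.right            # the minimum's right subtree takes its place
    root.left = delete_min(root.left)
    return root

root = None
for k in (5, 3, 8, 1, 9):
    root = insert(root, k)
print(find_min(root), find_max(root))   # 1 9
root = delete_min(root)
print(find_min(root))                   # 3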
Yes, but you lose the O(1) average insert of the heap
As others mentioned, you can use a BST to simulate a heap.
However this has one major downside: you lose the O(1) insert average time, which is basically the only reason to use the heap in the first place: https://stackoverflow.com/a/29548834/895245
If you want to track both min and max on a heap, I recommend that you do it with two heaps instead of a BST to keep the O(1) insert advantage.
Yes, we can, by simply inserting into and finding the minimum of the BST. There are few benefits, however, since a lookup will take O(log n) time, and other functions receive similar penalties due to the stricter ordering enforced throughout the tree.
Basically, I agree with @amit's answer. I will elaborate more on the implementation of this modified BST.
A heap can do findMin or findMax in O(1), but not both in the same data structure. With a slight modification, the BST can do both findMin and findMax in O(1).
In this modified BST, you keep track of the min node and the max node every time you do an operation that can potentially modify the data structure. For example, in the insert operation you can check whether the cached min value is larger than the newly inserted value and, if so, assign the newly added node as the min. The same technique can be applied to the max value. Hence, this BST carries that information, which you can retrieve in O(1) (same as a binary heap).
In this BST (specifically a balanced BST), when you pop the min or pop the max, the next min value to be assigned is the successor of the min node, whereas the next max value to be assigned is the predecessor of the max node, so that is O(1). However, as @JimMischel's comment below points out, we also need to re-balance the tree, so it will still run in O(log n) (same as a binary heap).
In my opinion, a heap can generally be replaced by a balanced BST, because a balanced BST performs well on almost everything the heap data structure can do. However, I am not sure whether the heap should be considered an obsolete data structure. (What do you think?)
PS: I have to cross-reference a related question: https://stackoverflow.com/a/27074221/764592

Listing values in a binary heap in sorted order using breadth-first search?

I'm currently reading this paper and on page five, it discusses properties of binary heaps that it considers to be common knowledge. However, one of the points they make is something that I haven't seen before and can't make sense of. The authors claim that if you are given a balanced binary heap, you can list the elements of that heap in sorted order in O(log n) time per element using a standard breadth-first search. Here's their original wording:
In a balanced heap, any new element can be inserted in logarithmic time. We can list the elements of a heap in order by weight, taking logarithmic time to generate each element, simply by using breadth first search.
I'm not sure what the authors mean by this. The first thing that comes to mind when they say "breadth-first search" would be a breadth-first search of the tree elements starting at the root, but that's not guaranteed to list the elements in sorted order, nor does it take logarithmic time per element. For example, running a BFS on this min-heap produces the elements out of order no matter how you break ties:
        1
      /   \
    10     100
   /  \
  11    12
This always lists 100 before either 11 or 12, which is clearly wrong.
Am I missing something? Is there a simple breadth-first search that you can perform on a heap to get the elements out in sorted order using logarithmic time each? Clearly you can do this by destructively modifying heap by removing the minimum element each time, but the authors' intent seems to be that this can be done non-destructively.
You can get the elements out in sorted order by traversing the heap with a priority queue (which requires another heap!). I guess this is what he refers to as a "breadth first search".
I think you should be able to figure it out (given your rep in algorithms) but basically the key of the priority queue is the weight of a node. You push the root of the heap onto the priority queue. Then:
while pq isn't empty:
    node = pop off pq
    append node to the output list (the sorted elements)
    push node's children (if any) onto pq
I'm not really sure (at all) if this is what he was referring to but it vaguely fitted the description and there hasn't been much activity so I thought I might as well put it out there.
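Here is a runnable Python sketch of that pseudocode, assuming the binary heap is stored in the usual array layout (children of index i at 2i+1 and 2i+2) and using the standard heapq module as the auxiliary priority queue; sorted_from_heap is an illustrative name. Each pop pushes at most two children, so the auxiliary queue grows by at most one entry per output element and every step stays O(log n).

import heapq

def sorted_from_heap(heap_array):
    if not heap_array:
        return []
    out = []
    pq = [(heap_array[0], 0)]               # (value, index of the node in the heap array)
    while pq:
        value, i = heapq.heappop(pq)
        out.append(value)
        for child in (2 * i + 1, 2 * i + 2):     # push children, if any
            if child < len(heap_array):
                heapq.heappush(pq, (heap_array[child], child))
    return out

# The min-heap from the question: 1 at the root, children 10 and 100, then 11 and 12.
print(sorted_from_heap([1, 10, 100, 11, 12]))    # [1, 10, 11, 12, 100]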
If you know that all elements lower than 100 are on the left, you can go left; but even if you do get to 100, you can see that there are no elements on its left, so you back out. In any case, you pass through a node (or any other node) at worst twice before you realise that the element you are searching for is not there. That means you move through this tree at most 2*log(N) times, which simplifies to O(log N) complexity.
The point is that even if you "screw up" and traverse to the "wrong" node, you visit that node at worst once.
EDIT
This is just how heapsort works. You can imagine that you have to do O(log n) work to reconstruct the heap each time you take out the top element.
