Is this O(n) or O(1)?
[What is] the worst-case time complexity of adding to a singly-linked list after the tenth node, with a size of at least ten?
That specific task, adding an item immediately after node ten, is O(1).(a)
That's because the time taken to do it does not depend on the list size at all. It takes exactly the same time whether the list has ten nodes or a hundred billion nodes, and could be implemented as per the following pseudo-code:
def insert_after_tenth(first, item):
    tenth = first.next.next.next.next.next.next.next.next.next  # node 10: O(1)
    item.next = tenth.next  # O(1)
    tenth.next = item       # O(1)
I've calculated the tenth variable in that way (rather than a loop) to make it clear that the time cost is not dependent on the list size. And, since every line is O(1), the complete operation is also O(1).
(a) Admittedly, that's based on a certain reading of the question, inserting the new item immediately after node ten. There is another reading you could argue, that of inserting at some point after node ten.
However, if that were the case, all that extra stuff about node ten would be irrelevant, since it would be O(n) regardless of where you wanted to insert.
If you wanted extra marks (assuming this is some form of classwork), you may want to mention both possibilities but the first is by far the most likely.
If a LinkedList has n nodes, where n > 10, then the worst-case time complexity is O(n), since it needs to iterate through the list to get to the last node.
Related
Linked List
A linked list’s insertion is O(1) for the actual operation, but requires O(n) time to traverse to the proper position. Most online resources list a linked list’s average insertion time as O(1):
https://stackoverflow.com/a/17410009/10426919
https://www.bigocheatsheet.com/
https://www.geeksforgeeks.org/time-complexities-of-different-data-structures/
BST
A binary search tree’s insertion requires the traversal of nodes, taking O(log n) time.
Problem
Am I mistaken to believe that insertion in a BST also takes O(1) time for the actual operation?
Similar to the nodes of a linked list, inserting a node in a BST simply points the current node’s pointer to the inserted node, and the inserted node points to the current node’s former child node.
If my thinking is correct, why do most online resources list the average insert time for a BST to be O(log n), as opposed to O(1) like for a linked list?
It seems that for linked list, the actual insertion time is listed as the insertion time complexity, but for BST, the traversal time is listed as the insertion time complexity.
It reflects the usage. It's O(1) and O(log n) for the operations you'll actually request from them.
With a BST, you'll likely let it manage itself while you stay out of the implementation details. That is, you'll issue commands like tree.insert(value) or queries like tree.contains(value). And those things take O(log n).
With a linked list, you'll more likely manage it yourself, at least the positioning. You wouldn't issue commands like list.insert(value, index), unless the index is very small or you don't care about performance. You're more likely to issue commands like insertAfter(node, newNode) or insertBeginning(list, newNode), which take only O(1) time. Note that I took these two from Wikipedia's Linked list operations > Singly linked lists section, which doesn't even have an operation for inserting at a certain position given as an index. That's because in reality, you'll manage the "position" (in the form of a node) with the algorithm that uses the linked list, and the time to manage the position is attributed to that algorithm instead. That can, by the way, also be O(1); two examples (with a sketch after them):
You're building a linked list from an array. You'll do this by keeping a variable referencing the last node. To append the next value/node, insert it after that last node (an O(1) operation indeed), and update your variable to reference the new last node instead (also O(1)).
Imagine you don't find a position with a linear scan but with a hash map, storing references directly to linked list nodes. Then looking up the reference takes O(1) and inserting after the looked-up node also again only takes O(1) time.
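Here's a rough Python sketch of both patterns (Node, insert_after, and from_array are illustrative names, not from any particular library):

class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def insert_after(node, new_node):
    # O(1): rewire two pointers, no matter how long the list is
    new_node.next = node.next
    node.next = new_node

def from_array(values):
    # O(n) total, but each individual append is O(1), because we keep
    # a variable referencing the last node instead of re-scanning
    head = tail = Node(values[0])
    index = {values[0]: head}      # hash map for O(1) node lookup later
    for v in values[1:]:
        node = Node(v)
        insert_after(tail, node)   # O(1)
        tail = node                # O(1)
        index[v] = node
    return head, index

head, index = from_array([10, 20, 30])
insert_after(index[20], Node(25))  # O(1) lookup + O(1) insert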
If you had shown us some of those "most online resources [that] list a linked list’s average insertion time as O(1)", we'd likely see that they're indeed showing insertion operations like insertAfterNode, not insertAtIndex.
Edit, now that you've included some links in the question — my thoughts on those sources regarding the O(1) insertion for linked lists:
The first one does point out that it's O(1) only if you already have something like an "iterator to the location".
The second one in turn refers to the same Wikipedia section I showed above, i.e., with insertions after a given node or at the beginning of a list.
The third one is, well, the worst site about programming I know, so I'm not surprised they just say O(1) without any further information.
Put differently, as I like real-world analogies: If you ask me how much it costs to replace part X inside a car motor, I might say $200, even though the part only costs $5. Because I wouldn't do that myself. I'd let a mechanic do that, and I'd have to pay for their work. But if you ask me how much it costs to replace the bell on a bicycle, I might say $5 when the bell costs $5. Because I'd do the replacing myself.
A binary search tree is ordered, and it's typically balanced (to avoid O(n) worst-case search times), which means that when you insert a value some amount of shuffling has to be done to balance out the tree. That rebalancing takes an average of O(log n) operations, whereas a Linked List only needs to update a fixed number of pointers once you've found your place to insert an item between nodes.
To insert into a linked list, you just need to maintain the end node of the list (assuming you are inserting at the end).
To insert into a binary search tree (BST), and to maintain the BST properties after insertion, there is no way you can do it in O(1), since the tree might need to re-balance. This operation is not as simple as inserting into a linked list.
Check out some of the examples here.
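To make that concrete: even without any rebalancing, you can't reach the pointer update without first descending the tree. A minimal sketch of insertion into a plain (non-balancing) BST, where h is the tree height — O(log n) if balanced, O(n) if skewed:

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def bst_insert(root, value):
    if root is None:
        return TreeNode(value)
    node = root
    while True:                        # O(h) descent to find the parent
        if value < node.value:
            if node.left is None:
                node.left = TreeNode(value)    # the O(1) pointer update
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = TreeNode(value)   # the O(1) pointer update
                return root
            node = node.right

A self-balancing tree (red-black, AVL, ...) additionally performs rotations on the way back up, but those don't change the O(log n) bound.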
The insertion time of a Linked List actually depends on where you are inserting and the type of Linked List.
For example consider the following cases:
You are using a singly linked list and you are going to insert at the end or in the middle: you would have a running time of O(n) to traverse the list to the end node or middle node.
You are using a doubly linked list (with two pointers: the first pointing to the head element and the second pointing to the last element) and you are going to insert in the middle: you would still have O(n) time complexity, since you need to traverse to the middle of the list using either the first or the second pointer.
You are using a singly linked list and you are going to insert at the first position of the list: this time you would have complexity of O(1), since you don't need to traverse any nodes at all. The same is true for a doubly linked list when inserting at the end of the list.
So you can see that in the worst-case scenario a Linked List takes O(n) instead of O(1).
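A minimal sketch of the two extremes (assuming nodes with a next field, as in the earlier sketches):

def insert_at_head(head, new_node):
    # O(1): no traversal at all
    new_node.next = head
    return new_node

def insert_at_end(head, new_node):
    # O(n) without a tail pointer: must walk to the last node first
    if head is None:
        return new_node
    node = head
    while node.next is not None:
        node = node.next
    node.next = new_node
    return head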
Now, in the case of a BST, you can get O(log n) time if your BST is balanced and not skewed. If your tree is skewed (where every element is greater than the previous element), you need to traverse all the nodes to find the insertion position. For example, consider the tree 1->2->4->6, into which you are going to insert node 9: you need to visit all the nodes to find the insertion position.
1
 \
  2
   \
    4
     \
      6   (last node, after which the new node will be inserted)
       \
        9 (insertion position for the new node)
Therefore you can see that you need to visit all the nodes to find the proper place; if you have n nodes, you get O(n+1) = O(n) running time complexity.
But if your BST is balanced and not skewed, the situation changes dramatically, since with every move you can eliminate the nodes that do not satisfy the search condition.
PS: What exactly I mean by "do not satisfy the search condition", you can take as homework!
I was going through this article to understand BIT : https://www.hackerearth.com/practice/notes/binary-indexed-tree-or-fenwick-tree/#c217533
In this the author says the following at one place:
If we look at the for loop in update() operation, we can see that the loop runs at most the number of bits in index x which is restricted to be less or equal to n (the size of the given array), so we can say that the update operation takes at most O(log2(n)) time
My question is: if the loop can go up to n (the size of the given array), then how is the time complexity any different from the normal approach he mentioned at the start, where update is O(1) and prefixsum(int k) can take up to n steps?
The key is that you don't take steps of 1 in the loop, but steps of size x & -x.
This is equivalent to going upwards in the tree, to the next node that needs to include the current one, and thus gives you a worst case of O(log n).
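A minimal sketch of that update loop (1-indexed Fenwick array; names are my own):

def update(tree, n, x, delta):
    # tree is a 1-indexed Fenwick array of size n + 1
    while x <= n:
        tree[x] += delta
        x += x & -x   # add the lowest set bit of x

# For n = 16 and x = 3, the visited indices are 3, 4, 8, 16: the lowest
# set bit grows on every step, so the loop runs at most log2(n) times,
# never n times.
tree = [0] * 17
update(tree, 16, 3, 5)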
Suppose the list of intervals is [[1,3],[2,4],[6,12]] and the query time is T = 3. The number of intervals in the list that contain 3 is 2 (i.e., [[1,3],[2,4]]). Is it possible to do this in O(log n) time?
This cannot be done in O(log n) time in the general case.
You can binary search on the start time to find the last interval that could possibly contain the query time, but because there's no implied ordering on the end times, you have to sequentially search from the start of the list to the item you identified as the last, to determine if the query time is in any of those intervals.
Consider, for example, [(1,7),(2,11),(3,8),(4,5),(6,10),(7,9)], with a query time of 7.
Binary search on the start time will tell you that all of the intervals could contain the query time. But because the ending times are not in any particular order, you can't do a binary search on them. You have to look at each individual interval to determine if the ending time is greater than or equal to the query time. Here, you see that (4,5) does not contain the query time.
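A minimal sketch of that procedure, assuming closed intervals pre-sorted by start time (count_containing is an illustrative name):

import bisect

def count_containing(intervals, t):
    # intervals sorted by start, e.g. [(1,7), (2,11), (3,8), (4,5), (6,10), (7,9)]
    starts = [s for s, _ in intervals]
    hi = bisect.bisect_right(starts, t)                 # O(log n)
    # end times are unordered, so each candidate must be checked individually
    return sum(1 for _, e in intervals[:hi] if e >= t)  # O(n) worst case

print(count_containing([(1,7), (2,11), (3,8), (4,5), (6,10), (7,9)], 7))  # 5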
Well, one thing to note is that for an interval to contain T, its start time must be less than or equal to T. Since these are sorted by start time, you can use a basic binary search to eliminate all the ones which start too late in O(log n) time.
If we can assume that these are also sorted by end time -- that is, no interval completely encompasses a previous interval -- then you can use another binary search to eliminate all the ones whose end times are before T. That will keep the running time in O(log n).
If we can't make that assumption, things get more complex, and I can't think of any way to do better than O(n log n) [by sorting the remaining list by end time and performing another binary search on that]. Perhaps there's a way?
EDIT As Qbyte says below, the final sort is superfluous; you can get it down to O(n) with a simple linear search on the remaining set. Then again, if you're going with an O(n) solution anyway, you may as well skip the entire algorithm and just do a linear search on the original set.
Let's take your assumption that the intervals are sorted by start time. A binary search O(log n) will eliminate the intervals that can't contain T. The remaining might.
Assuming End Time is not also Sorted (OP)
You have to scan the remaining ones, O(n), counting them. Total complexity O(n). Given this, you might as well have never binary searched and just scanned the whole list.
Assuming End Time is also Sorted
If the remaining ones are sorted by end time as well, you can do another binary search, keeping the complexity at O(log n).
But you're not done. You need the count.
You know the count to start with; if you didn't, you couldn't have binary searched. You will also know the indexes where each binary search ended, and from there it's an O(1) calculation.
Thus the total complexity is O(log n) for this option.
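A small sketch of this option against the question's example, using the standard bisect module for both binary searches (the function name is my own):

import bisect

def count_containing(starts, ends, t):
    # assumes both lists are sorted, i.e. no interval encloses an earlier one
    n = len(starts)
    started_too_late = n - bisect.bisect_right(starts, t)  # start > t, O(log n)
    ended_too_early = bisect.bisect_left(ends, t)          # end < t, O(log n)
    return n - started_too_late - ended_too_early          # O(1) arithmetic

print(count_containing([1, 2, 6], [3, 4, 12], 3))  # 2, i.e. [1,3] and [2,4]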
I have this question in my DSA Course Mid-term test:
Consider a singly linked list containing N nodes (N > 8). A method f1() is designed to find the 8th node from the beginning, and a method f2() is designed to find the 8th node from the end.
What is the time complexity of f1() and f2()?
Select one:
a. O(N) and O(N)
b. O(1) and O(1)
c. O(1) and O(N)
d. O(N) and O(1)
The correct answer given is c. O(1) and O(N). However, I think that the correct answer is a. I know that if N = 8 it would take O(1) time to find the 8th node from the beginning (just return the tail node), but in this case N > 8. Could anyone explain this for me please?
Thank you in advance for any help you can provide.
O(1) implies constant running time. In other words, it doesn't depend on the input size.
When you apply that definition here, you can see that fetching the 8th element is always a constant-time operation irrespective of the input size. This is because, irrespective of the size of the input (e.g., 10, 100, 1000, ...), the operation get(8) will always take the same time. Also, since we know for sure that n > 8, there's no chance that trying to fetch the 8th element will go beyond the size of the input.
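For illustration, a sketch of both methods (assuming nodes with a next field): f1() always makes exactly seven hops, while f2() has to walk the rest of the list with a second pointer.

def f1(head):
    node = head
    for _ in range(7):      # always exactly 7 hops -> O(1)
        node = node.next
    return node

def f2(head):
    lead = f1(head)         # put lead 7 nodes ahead of trail
    trail = head
    while lead.next is not None:   # walks the remaining N - 8 nodes -> O(N)
        lead = lead.next
        trail = trail.next
    return trail            # 8th node from the end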
I have a massive (~10^9) set of events characterized by a start and end time. Given a time, I want to find how many of those events were ongoing at that time.
What sort of data structure would be helpful in this situation? The operations I need to be fast are:
Inserting new events, e.g., {start: 100000 milliseconds, end: 100010 milliseconds}.
Querying the number of concurrent events at a given time.
Update: Someone put a computational geometry flag on this, so I figure I should rephrase this in terms of computational geometry. I have a set of 1-dimensional intervals and I want to calculate how many of those intervals intersect with a given point. Insertion of new intervals must be fast.
You're looking for an interval tree.
Construction: O(n log n), where n is the number of intervals
Query: O(m+log n), where m is the number of query results and n is the number of intervals
Space: O(n)
Just to add to the other answers, depending on the length of time and the granularity desired, you could simply have an array of counters. For example, if the length of time is 24 hours and the desired granularity is 1 millisecond, there will be 86,400,000 cells in the array. With one 4-byte int per cell (which is enough to hold counts up to ~10^9), that will be under 350 MB of RAM, versus tree-based solutions which would take at least (8+8+4+4)*10^9 = 24 GB of RAM for two pointers plus two ints per tree node (since 32 bits of addressable memory is insufficient, you'd need 64 bits per pointer). You can use swap, but this will slow down some queries considerably.
You can also use this solution if you only care about the last 24 hours of data, for example, by using the array as a circular buffer. Besides the limitation on time and granularity, the other downside is that insertion time of an interval is proportional to the length of the interval, so if interval length is unbounded, you could be in trouble. Queries, on the other hand, are a single array lookup.
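A minimal sketch of the counter-array idea, assuming integer-millisecond timestamps within a fixed 24-hour window (names are my own):

from array import array

WINDOW_MS = 24 * 60 * 60 * 1000       # 86,400,000 cells at 1 ms granularity
counts = array('i', [0]) * WINDOW_MS  # 4-byte ints, as in the estimate above

def insert(start_ms, end_ms):
    # O(interval length): one increment per covered millisecond
    for t in range(start_ms, end_ms + 1):
        counts[t] += 1

def query(t_ms):
    # O(1): a single array lookup
    return counts[t_ms]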
(Extending the answers by tskuzzy and Snowball)
A balanced binary search tree makes sense, except that the memory requirements would be excessive for your data set. A B-tree would be much more memory efficient, albeit more complicated unless you can use a library.
Keep two trees, one of start times and one of end times. To insert an event, add the start time to the tree of start times and the end time to the tree of end times. To query the number of active events at time T, search the start-time tree to find out how many start times are less than T, and search the end-time tree to find out how many end times are less than T. Subtract the number of end times from the number of start times, and that's the number of active events.
Insertions and queries should both take O(log N) time.
A few comments:
The way you have phrased the question, you only care about the number of active events, not which events were active. This means you do not need to keep track of which start time goes with which end time! This also makes it easier to avoid the "+M" term in the queries cited by previous answers.
Be careful about the exact semantics of your query. In particular, does an event count as active at time T if it starts at time T? If it ends at time T? The answers to these questions affect whether you use < or <= in certain places.
Do not use a "set" data structure, because you almost certainly want to allow and count duplicates. That is, more than one event might start and/or end at the same time. A set would typically ignore duplicates. What you are looking for instead is a "multiset" (sometimes called a "bag").
Many binary search trees do not support "number of elements < T" queries out of the box. But it is easy to add this functionality by storing a size at each node.
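Here's an illustrative sketch of the whole scheme. One caveat: bisect.insort on a plain Python list shifts elements in O(N), so the sorted lists below are only a stand-in for the O(log N) inserts of a real balanced BST or B-tree augmented with subtree sizes:

import bisect

starts, ends = [], []   # two sorted multisets (duplicates allowed)

def insert(start, end):
    bisect.insort(starts, start)  # O(log N) in a real tree; O(N) here
    bisect.insort(ends, end)

def active_at(t):
    started = bisect.bisect_right(starts, t)  # events with start <= t
    ended = bisect.bisect_left(ends, t)       # events with end < t
    return started - ended  # an event ending exactly at t counts as active

insert(100000, 100010)
insert(100005, 100020)
print(active_at(100007))  # 2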
Suppose we have a sorted set (e.g., a balanced binary search tree or a skip list) data structure with N elements. Furthermore, suppose that the sorted set has O(log N) search time, O(log N) insert time, and O(N) space usage (these are reasonable assumptions, see red-black tree, for example).
One possibility is to have two sorted sets, bystart and byend, respectively sorted by the start and end times of the events.
To find the number of events that are ongoing at time t, ask byend for the first interval whose end time is greater than t: an O(log N) search operation. Call the start time of this interval left. Now, ask bystart for the number of intervals whose start time is greater than or equal to left and less than t. This is O(log N + M), where M is the number of such intervals. So, the total time for a search is O(log N + M).
Insertion was O(log N) for sorted sets, which we have to do once for each sorted set. This makes the total time for the insertion operation O(log N).
Construction of this structure from scratch just consists of N insertion operations, so the total time for construction is O(N log N).
Space usage is O(N) for each sorted set, so the total space usage is O(N).
Summary:
Insert: O(log N), where N is the number of intervals
Construct: O(N log N)
Query: O(log N + M), where M is the number of results
Space: O(N)