I implemented a recursive algorithm to check if an array is a min-heap. I can't figure out what the worst case time complexity should be. Here's the code:
CHECK-MIN-HEAP(A, i, n)
if i > (n - 1) / 2
return true
if A[i] > A[left(i)] or A[i] > A[right(i)]
return false
return CHECK-MIN-HEAP(A, left(i), n) and CHECK-MIN-HEAP(A, right(i), n)
A brief explanation: the base case is represented by the case in which a node is a leaf. This because the element A[(n-1)/2] represent the last not-leaf node. The other base case is when the min-heap condition is violated. In the recursive case we check if the left and right subtrees are heaps.
So in best case, when the array isn't an heap, we have a constant time complexity. In the worst one, when the array is an heap, the function check all the nodes of the heap and 'cause the height of the heap is logn, the time complexity should be O(logn). Is correct? Or the time complexity is O(n)?
O(N) is obviously the correct answer here.
It is obvious because you traverse the entire array element by element looking for invalid invariants.
The best case you state is quite useless as an analysis point, it might be that the final node is invalidating the entire heap O(N).
Related
A reversort Algorithm is defined as the following:
Reversort(L):
for i := 1 to length(L) - 1
j := position with the minimum value in L between i and length(L), inclusive
Reverse(L[i..j])
I understand that the time complexity is O(n^2) for a array
But for a array which is already sorted(in ascending) what is the complexity?
Will it remain same or will it become O(n)?
Still takes quadratic time. Not for reversals, since j will always be i so each reversal takes O(1). But for finding the minimum values.
(Finding the minima could be done faster if you for example additionally kept the remaining elements in a min-heap (leading to overall O(n log n) time), but that would really have to be stated. As it's written, it's doing a full search through the remaining part each time.)
We are processing a stream of positive integers. At any point in time, we can be asked a query to which the answer is the smallest positive number that we have not seen yet.
One can assume two APIs.
void processNext(int val)
int getSmallestNotSeen()
We can assume the numbers to be bounded by the range [1,10^6]. Let this range be N.
Here is my solution.
Let's take an array of size 10^6. Whenever processNext(val) is called we mark the array[val] to be 1. We make a sum segment tree on this array. This will be a point update in the segment tree. Whenever getSmallestNotSeen() is called I find the smallest index j such that sum [1..j] is less than j. I find j using binary search. processNext(val) -> O(1) getSmallestNotSeen() -> O((logN)^2)
I was thinking maybe if there was something more optimal. Or the above solution can be improved.
Make a map of id - > node (nodes of a doubly-linked list) and initialize for 10^6 nodes, each pointing to its neighbors. Initialize the min to one.
processNext(val): check if the node exists. If it does, delete it and point its neighbors at each other. If the node you delete has no left neighbor (i.e. was smallest), update the min to be the right neighbor.
getSmallestNotSeen(): return the min
The preprocessing is linear time and linear memory. Everything after that is constant time.
In case the number of processNext calls (i.e. the length of the stream) is fairly small compared with the range of N, then space usage could be limited by storing consecutive ranges of numbers, instead of all possible individual numbers. This is also interesting when N could be a much larger range, like [1, 264-1]
Data structure
I would suggest a binary search tree with such [start, end] ranges as elements, and self-balancing (like AVL, red-black, ...).
Algorithm
Initialise the tree with one (root) node: [1, Infinity]
Whenever a new value val is pulled with processNext, find the range [start, end] that includes val, using binary search.
If the range has size 1 (and thus only contains val), perform a deletion of that node (according to the tree rules)
Else if val is a bounding value of the range, then just update the range in that node, excluding val.
Otherwise split the range into two. Update the node with one of the two ranges (decide by the balance information) and let the other range sift down to a new leaf (and rebalance if needed).
In the tree maintain a reference to the node having the least start value. Only when this node gets deleted during processNext it will need a traversal up or down the tree to find the next (in order) node. When the node splits (see above) and it is decided the put the lower part in a new leaf, the reference needs to be updated to that leaf.
The getSmallestNotSeen function will return the start-value from that least-range node.
Time & Space Complexity
The space complexity is O(S), where S is the length of the stream
The time complexity of processNext is O(log(S))
The time complexity of getSmallestNotSeen is O(1)
The best case space and time complexity is O(1). Such a best case occurs when the stream has consecutive integers (increasing or decreasing)
bool array[10^6] = {false, false, ... }
int min = 1
void processNext(int val) {
array[val] = true // A
while (array[min]) // B
min++ // C
}
int getSmallestNotSeen() {
return min
}
Time complexity:
processNext: amortised O(1)
getSmallestNotSeen: O(1)
Analysis:
If processNext is invoked k times and n is the highest value stored in min (which could be returned in getSmallestNotSeen), then:
the line A will be executed exactly k times,
the line B will be executed exactly k + n times, and
the line C will be executed exactly n times.
Additionally, n will never be greater than k, because for min to reach n there needs to be a continous range of n true's in the array, and there can be only k true's in the array in total. Therefore, line B can be executed at most 2 * k times and line C at most k times.
Space complexity:
Instead of an array it is possible to use a HashMap without any additional changes in the pseudocode (non-existing keys in the HashMap should evaluate to false). Then the space complexity is O(k). Additionally, you can prune keys smaller than min, thus saving space in some cases:
HashMap<int,bool> map
int min = 1
void processNext(int val) {
if (val < min)
return
map.put(val, true)
while (map.get(min) = true)
map.remove(min)
min++
}
int getSmallestNotSeen() {
return min
}
This pruning technique might be most effective if the stream values increase steadily.
Your solution takes O(N) space to hold the array and the sum segment tree, and O(N) time to initialise them; then O(1) and O(logĀ² N) for the two queries. It seems pretty clear that you can't do better than O(N) space in the long run to keep track of which numbers are "seen" so far, if there are going to be a lot of queries.
However, a different data structure can improve on the query times. Here are three ideas:
Self-balancing binary search tree
Initialise the tree to contain every number from 1 to N; this can be done in O(N) time by building the tree from the leaves up; the leaves have all the odd numbers, then they're joined by all the numbers which are 2 mod 4, then those are joined by the numbers which are 4 mod 8, and so on. The tree takes O(N) space.
processNext is implemented by removing the number from the tree in O(log N) time.
getSmallestNotSeen is implemented by finding the left-most node in O(log N) time.
This is an improvement if getSmallestNotSeen is called many times, but if getSmallestNotSeen is rarely called then your solution is better because it does processNext in O(1) rather than O(log N).
Doubly-linked list
Initialise a doubly-linked list containing the numbers 1 to N in order, and create an array of size N holding pointers to each node. This takes O(N) space and is done in O(N) time. Initialise a variable holding a cached minimum value to be 1.
processNext is implemented by looking up the corresponding list node in the array, and deleting it from the list. If the deleted node has no predecessor, update the cached minimum value to be the value held by the successor node. This is O(1) time.
getSmallestNotSeen is implemented by returning the cached minimum, in O(1) time.
This is also an improvement, and is strictly better asymptotically, although the constants involved might be higher; there's a lot of overhead to hold an array of size N and also a doubly-linked list of size N.
Hash-set
The time requirements for the other solutions are largely determined by their initialisation stages, which take O(N) time. Initialising an empty hash-set, on the other hand, is O(1). As before, we also initialise a variable holding a current minimum value to be 1.
processNext is implemented by inserting the number into the set, in O(1) amortised time.
getSmallestNotSeen updates the current minimum by incrementing it until it's no longer in the set, and then returns it. Membership tests on a hash-set are O(1), and the number of increments over all queries is limited by the number of times processNext is called, so this is also O(1) amortised time.
Asymptotically, this solution takes O(1) time for initialisation and queries, and it uses O(min(Q,N)) space where Q is the number of queries, while the other solutions use O(N) space regardless.
I think it should be straightforward to prove that O(min(Q,N)) space is asymptotically optimal, so the hash-set turns out to be the best option. Credit goes to Dave for combining the hash-set with a current-minimum variable to do getSmallestNotSeen in O(1) amortised time.
A way of finding the median of a given set of n numbers is to distribute them among 2 heaps. 1 is a max-heap containing the lower n/2 (ceil(n/2)) numbers and a min-heap containing the rest. If maintained in this way the median is the max of the first heap (along with the min of the second heap if n is even). Here's my c++ code that does this:
priority_queue<int, vector<int> > left;
priority_queue<int,vector<int>, greater<int> > right;
cin>>n; //n= number of items
for (int i=0;i<n;i++) {
cin>>a;
if (left.empty())
left.push(a);
else if (left.size()<=right.size()) {
if (a<=right.top())
left.push(a);
else {
left.push(right.top());
right.pop();
right.push(a);
}
}
else {
if (a>=left.top())
right.push(a);
else {
right.push(left.top());
left.pop();
left.push(a);
}
}
}
We know that the heapify operation has linear complexity . Does this mean that if we insert numbers one by one into the two heaps as in the above code, we are finding the median in linear time?
Linear time heapify is for the cost of building a heap from an unsorted array as a batch operation, not for building a heap by inserting values one at a time.
Consider a min heap where you are inserting a stream of values in increasing order. The value at the top of the heap is the smallest, so each value trickles all the way down to the bottom of the heap. Consider just the last half of the values inserted. At this time the heap will have very nearly its full height, which is log(n), so each value trickles down log(n) slots, and the cost of inserting n/2 values is O(n log(n))
If I present a stream of values in increasing order to your median finding algorithm one of the things it has to do is build a min heap from a stream of values in increasing order so the cost of the median finding is O(n log(n)). In, fact the max heap is going to be doing a lot of deletes as well as insertions, but this is just a constant factor on top so I think the overall complexity is still O(n log(n))
When there is one element, the complexity of the step is Log 1 because of a single element being in a single heap.
When there are two elements, the complexity of the step is Log 1 as we have one element in each heap.
When there are four elements, the complexity of the step is Log 2 as we have two elements in each heap.
So, when there are n elements, the complexity is Log n as we have n/2 elements in each heap and
adding an element; as well as,
removing element from one heap and adding it to another;
takes O(Log n/2) = O(Log n) time.
So for keeping track of median of n elements essentially is done by performing:
2 * ( Log 1 + Log 2 + Log 3 + ... + Log n/2 ) steps.
The factor of 2 comes from performing the same step in 2 heaps.
The above summation can be handled in two ways. One way gives a tighter bound but it is encountered less frequently in general. Here it goes:
Log a + Log b = Log a*b (By property of logarithms)
So, the summation is actually Log ((n/2)!) = O(Log n!).
The second way is:
Each of the values Log 1, Log 2, ... Log n/2 is less than or equal to Log n/2
As there are a total n/2 terms, the summation is less than (n/2) * Log (n/2)
This implies the function is upper bound by (n/2) * Log (n/2)
Or, the complexity is O(n * Log n).
The second bound is looser but more well known.
This is a great question, especially since you can find the median of a list of numbers in O(N) time using Quickselect.
But the dual priority-queue approach gives you O(N log N) unfortunately.
Riffing in binary heap wiki article here, heapify is a bottom-up operation. You have all the data in hand and this allows you to be cunning and reduce the number of swaps/comparisons to O(N). You can build an optimal structure from the get-go.
Adding elements from the top, one at a time, as you are doing here, requires reorganizing every time. That's expensive so the whole operation ends up being O(N log N).
I have seen various posts here that computes the diameter of a binary tree. One such solution can be found here (Look at the accepted solution, NOT the code highlighted in the problem).
I'm confused why the time complexity of the code would be O(n^2). I don't see how traversing the nodes of a tree twice (once for the height (via getHeight()) and once for the diameter (via getDiameter()) would be n^2 instead of n+n which is 2n. Any help would be appreciated.
As you mentioned, the time complexity of getHeight() is O(n).
For each node, the function getHeight() is called. So the complexity for a single node is O(n). Hence the complexity for the entire algorithm (for all nodes) is O(n*n).
It should be O(N) to calculate the height of every subtree rooted at every node, you only have to traverse the tree one time using an in-order traversal.
int treeHeight(root)
{
if(root == null) return -1;
root->height = max(treeHeight(root->rChild),treeHeight(root->lChild)) + 1;
return root->height;
}
This will visit each node 1 time, so has order O(N).
Combine this with the result from the linked source, and you will be able to determine which 2 nodes have the longest path between in at worst another traversal.
Indeed this describes the way to do it in O(N)
The different between this solution (the optimized one) and the referenced one is that the referenced solution re-computes tree height every time after shrinking the search size by only 1 node (the root node). Thus from above the complexity will be O(N + (N - 1) + ... + 1).
The sum
1 + 2 + ... + N
is equal to
= N(N + 1)/2
And so the complexity of sum of all the operations from the repeated calls to getHeight will be O(N^2)
For completeness sake, conversely, the optimized solution getHeight() will have complexity O(1) after the pre computation because each node will store the value as a data member of the node.
All subtree heights may be precalculated (using O(n) time), so what total time complexity of finding the diameter would be O(n).
I am little bit confused regarding worst Case time and Avg case Time complexity. My source of confusion is Here
My aim is to short data in increasing Order: I choose BST to acomplish my task of sorting.Here I am putting what I am doing for printing data in Increasing order.
1) Construct a binary search tree for given input.
Time complexity: Avg Case O(log n)
Worst Case O(H) {H is height of tree, here we can Assume Height is equal to number of node H = n}
2)After Finishing first work I am traversing BST in Inorder to print data in Increasing order.
Time complexity: O(n) {n is the number of nodes in tree}
Now I analyzed total complexity for get my desire result (data in increasing order) is for Avg Case: T(n) = O(log n) +O(n) = max(log n, n) = O(n)
For Worst Case : T(n) = O(n) +O(n) = max(n, n) = O(n)
Above point was my understanding which is Differ from Above Link concept. I know I am doing some wrong interpratation Please correct me. I would appreciate your suggestion and thought.
Please Refer this title Under Slide which I have mentined:
In (1) you provide the time per element, you need to multiply with the # of elements.
The time complexity needed to construct the binary tree is n times the complexity you suggest as you need to insert each node.