Sorting by Binary Search Tree - algorithm

I am a little bit confused regarding worst case and average case time complexity. My source of confusion is here.
My aim is to sort data in increasing order. I chose a BST to accomplish my task of sorting. Here is what I am doing to print the data in increasing order:
1) Construct a binary search tree for the given input.
Time complexity: average case O(log n)
Worst case O(H) {H is the height of the tree; here we can assume the height equals the number of nodes, H = n}
2) After finishing the first step, I traverse the BST inorder to print the data in increasing order.
Time complexity: O(n) {n is the number of nodes in the tree}
Now I analyze the total complexity to get my desired result (data in increasing order):
For the average case: T(n) = O(log n) + O(n) = max(log n, n) = O(n)
For the worst case: T(n) = O(n) + O(n) = max(n, n) = O(n)
The above was my understanding, which differs from the concept in the link above. I know I am making some wrong interpretation somewhere; please correct me. I would appreciate your suggestions and thoughts.
Please refer to the title under the slide which I have mentioned:

In (1) you give the time per element; you need to multiply by the number of elements.

The time complexity needed to construct the binary search tree is n times the complexity you suggest, since you need to insert each of the n nodes.
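
For concreteness, here is a minimal tree-sort sketch in Python (my own illustration, not code from the question), showing the two phases the question describes: n insertions into a BST followed by one inorder traversal. The build phase is O(n log n) on average and O(n^2) when the tree degenerates into a chain; the traversal is O(n):

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    # One insertion costs O(h): h is O(log n) on average, O(n) in the worst case.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def inorder(root, out):
    # Visiting every node exactly once costs O(n) in total.
    if root is not None:
        inorder(root.left, out)
        out.append(root.key)
        inorder(root.right, out)

def tree_sort(values):
    root = None
    for v in values:          # n insertions: O(n log n) average, O(n^2) worst case
        root = insert(root, v)
    out = []
    inorder(root, out)        # single traversal: O(n)
    return out

print(tree_sort([5, 2, 8, 1, 9]))   # [1, 2, 5, 8, 9]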

Related

What is the time complexity of an array sorted in ascending order if it is passed to the Reversort algorithm?

The Reversort algorithm is defined as follows:
Reversort(L):
    for i := 1 to length(L) - 1
        j := position with the minimum value in L between i and length(L), inclusive
        Reverse(L[i..j])
I understand that the time complexity is O(n^2) for an arbitrary array.
But for an array that is already sorted (in ascending order), what is the complexity?
Will it remain the same, or will it become O(n)?
It still takes quadratic time. Not because of the reversals, since j will always equal i, so each reversal takes O(1). But because of finding the minimum values.
(Finding the minima could be done faster if you, for example, additionally kept the remaining elements in a min-heap, leading to overall O(n log n) time, but that would really have to be stated. As it's written, the algorithm does a full search through the remaining part each time.)
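To make this concrete, here is a small Python transcription of the pseudocode above (my own sketch, 0-indexed). Even when the input is already sorted, the minimum search still scans the whole remaining suffix on every iteration, so the total work stays quadratic; only the reversals become trivial:

def reversort(L):
    n = len(L)
    for i in range(n - 1):
        # Finding the minimum scans the whole suffix: Theta(n - i) comparisons.
        j = min(range(i, n), key=lambda k: L[k])
        # On an already sorted input j == i, so this reversal is O(1).
        L[i:j + 1] = reversed(L[i:j + 1])
    return L

print(reversort([4, 2, 1, 3]))   # [1, 2, 3, 4]
print(reversort([1, 2, 3, 4]))   # already sorted: reversals are trivial, the scans are not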

Check if an array is a min heap time complexity

I implemented a recursive algorithm to check if an array is a min-heap. I can't figure out what the worst case time complexity should be. Here's the code:
CHECK-MIN-HEAP(A, i, n)
    if i > (n - 1) / 2
        return true
    if A[i] > A[left(i)] or A[i] > A[right(i)]
        return false
    return CHECK-MIN-HEAP(A, left(i), n) and CHECK-MIN-HEAP(A, right(i), n)
A brief explanation: the first base case is the one in which the node is a leaf, because the element A[(n-1)/2] represents the last non-leaf node. The other base case is when the min-heap condition is violated. In the recursive case we check whether the left and right subtrees are heaps.
So in the best case, when the array isn't a heap, we have constant time complexity. In the worst case, when the array is a heap, the function checks all the nodes of the heap, and because the height of the heap is log n, the time complexity should be O(log n). Is that correct? Or is the time complexity O(n)?
O(N) is obviously the correct answer here.
It is obvious because you traverse the entire array element by element looking for violated invariants.
The best case you state is not very useful as an analysis point: it might be that the very last node is the one that invalidates the heap, and reaching it still takes O(N).
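For reference, here is a small iterative Python check (my own sketch, not the poster's recursive version) that makes the O(n) bound explicit: every internal node is compared with each child it actually has, exactly once:

def is_min_heap(A):
    n = len(A)
    # Internal nodes are indices 0 .. n//2 - 1; each is checked once, so the loop is O(n).
    for i in range(n // 2):
        left, right = 2 * i + 1, 2 * i + 2
        if left < n and A[i] > A[left]:
            return False
        if right < n and A[i] > A[right]:
            return False
    return True

print(is_min_heap([1, 3, 2, 7, 4]))   # True
print(is_min_heap([5, 3, 2]))         # False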

Time Complexity when processing output

I'm struggling to figure out what the time complexity for this code would be.
from typing import List

def under_ten(input_list: List[int]) -> List[int]:
    res = []
    for i in input_list:
        if i < 10:
            res.append(i)
    res.sort()
    return res
Since the loop iterates over every element of n, I think the best case should be O(n). What I'm not sure about is how sorting the result list affects the time complexity of the entire function. Is the worst case O(n log n) (all numbers in n are under 10, so the result list is the same size as the input list)? And what would be the average case?
EDIT: Changed input name from n to input_list and added type hints, sorry if that caused some confusion (added type hints as well).
Your first observation is correct: iterating the input collection is an O(N) operation, where N here is the length of the array called n. The running time of the sort operation at the end depends on how large the res array is. In the worst case, every number in n is less than 10 and therefore ends up in res. The internal algorithm Python uses for sort() is Timsort, a hybrid of mergesort and insertion sort (q.v. this SO question), which runs in O(N*lgN). So, in the worst case, your under_ten() function runs in O(N*lgN).
Let N be the length of the list and K the number of elements smaller than 10.
The complexity is O(N + K log K), assuming that append is done in amortized constant time.
In the worst case, K = N, hence O(N log N), provided the sort truly has a worst-case O(N log N). Otherwise, it could be O(N²).
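If it helps to see that decomposition in the code itself, here is the same function written so the two costs are visible (my own sketch, same behaviour as the original): the filter touches all N elements once and the sort only touches the K elements that survive the filter:

from typing import List

def under_ten(input_list: List[int]) -> List[int]:
    res = [i for i in input_list if i < 10]   # O(N): every element is inspected once
    res.sort()                                # O(K log K): only the K kept elements are sorted
    return res

print(under_ten([3, 42, 7, 100, 1]))   # [1, 3, 7]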

Complexity of finding the median using 2 heaps

A way of finding the median of a given set of n numbers is to distribute them between 2 heaps. One is a max-heap containing the lower ceil(n/2) numbers, and the other is a min-heap containing the rest. Maintained in this way, the median is the max of the first heap (together with the min of the second heap if n is even). Here's my C++ code that does this:
#include <iostream>
#include <queue>
#include <vector>
using namespace std;

int main() {
    priority_queue<int, vector<int> > left;                 // max-heap: lower half
    priority_queue<int, vector<int>, greater<int> > right;  // min-heap: upper half
    int n, a;
    cin >> n; // n = number of items
    for (int i = 0; i < n; i++) {
        cin >> a;
        if (left.empty())
            left.push(a);
        else if (left.size() <= right.size()) {
            if (a <= right.top())
                left.push(a);
            else {
                left.push(right.top());
                right.pop();
                right.push(a);
            }
        }
        else {
            if (a >= left.top())
                right.push(a);
            else {
                right.push(left.top());
                left.pop();
                left.push(a);
            }
        }
    }
    return 0;
}
We know that the heapify operation has linear complexity. Does this mean that if we insert numbers one by one into the two heaps as in the above code, we are finding the median in linear time?
Linear time heapify is for the cost of building a heap from an unsorted array as a batch operation, not for building a heap by inserting values one at a time.
Consider a max heap where you are inserting a stream of values in increasing order. Each new value is larger than everything already in the heap, so it bubbles all the way up to the root. Consider just the last half of the values inserted: at that point the heap has very nearly its full height, which is log(n), so each value moves through log(n) slots, and the cost of inserting n/2 values is O(n log(n)).
If I present a stream of values in increasing order to your median finding algorithm, one of the things it has to do is build the left max heap from values that arrive in increasing order (every element moved over from the min heap is larger than everything already in the max heap), so the cost of the median finding is O(n log(n)). In fact, the min heap is also doing a lot of deletes as well as insertions, but this is just a constant factor on top, so the overall complexity is still O(n log(n)).
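As a side note on the heapify point, here is a minimal Python sketch (assuming the standard heapq module) contrasting the two ways of building a heap: heapify builds from a whole array in O(n), while pushing the same n values one at a time is O(n log n) in the worst case:

import heapq

values = list(range(1000, 0, -1))   # decreasing stream: worst case for one-at-a-time min-heap pushes

# Batch build from an existing array: O(n).
batch = values[:]
heapq.heapify(batch)

# Incremental build: n pushes, each up to O(log n), so O(n log n) overall.
incremental = []
for v in values:
    heapq.heappush(incremental, v)

assert batch[0] == incremental[0] == 1   # both are valid min-heaps with the same minimum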
When there is one element, the complexity of the step is Log 1, since there is a single element in a single heap.
When there are two elements, the complexity of the step is Log 1, as we have one element in each heap.
When there are four elements, the complexity of the step is Log 2, as we have two elements in each heap.
So, when there are n elements, the complexity of the step is Log n, as we have n/2 elements in each heap and
adding an element, as well as
removing an element from one heap and adding it to another,
takes O(Log n/2) = O(Log n) time.
So keeping track of the median of n elements is essentially done by performing:
2 * ( Log 1 + Log 2 + Log 3 + ... + Log n/2 ) steps.
The factor of 2 comes from performing the same step in 2 heaps.
The above summation can be handled in two ways. One way gives a tighter bound but it is encountered less frequently in general. Here it goes:
Log a + Log b = Log a*b (By property of logarithms)
So, the summation is actually Log ((n/2)!) = O(Log n!).
The second way is:
Each of the values Log 1, Log 2, ..., Log n/2 is less than or equal to Log (n/2).
As there are n/2 terms in total, the summation is less than (n/2) * Log (n/2).
This implies the function is upper-bounded by (n/2) * Log (n/2).
Or, the complexity is O(n * Log n).
The second bound is looser but better known.
This is a great question, especially since you can find the median of a list of numbers in O(N) time using Quickselect.
But the dual priority-queue approach gives you O(N log N) unfortunately.
Riffing on the binary heap wiki article here: heapify is a bottom-up operation. You have all the data in hand, and this allows you to be cunning and reduce the number of swaps/comparisons to O(N). You can build an optimal structure from the get-go.
Adding elements one at a time, as you are doing here, requires reorganizing the heap every time. That's expensive, so the whole operation ends up being O(N log N).
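For completeness, here is the standard two-heap running median written with Python's heapq (my own sketch, equivalent in spirit to the C++ code in the question). Each of the n arrivals triggers a constant number of pushes and pops, each costing O(log n), which is where the overall O(n log n) comes from:

import heapq

def running_medians(values):
    lower = []    # max-heap of the lower half, stored as negated values
    upper = []    # min-heap of the upper half
    medians = []
    for a in values:
        if not lower or a <= -lower[0]:
            heapq.heappush(lower, -a)
        else:
            heapq.heappush(upper, a)
        # Rebalance so that len(lower) is len(upper) or len(upper) + 1.
        if len(lower) > len(upper) + 1:
            heapq.heappush(upper, -heapq.heappop(lower))
        elif len(upper) > len(lower):
            heapq.heappush(lower, -heapq.heappop(upper))
        # Median: top of lower, or the average of the two tops when the count is even.
        if len(lower) == len(upper):
            medians.append((-lower[0] + upper[0]) / 2)
        else:
            medians.append(-lower[0])
    return medians

print(running_medians([5, 2, 8, 1, 9]))   # [5, 3.5, 5, 3.5, 5]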

Data Structure algorithm

How do I arrange the data structures below in ascending order of the time complexity required for inserts in the average case?
1. Sorted Array
2. Hash Table
3. Binary Search Tree
4. B+ Tree
In this answer, I will give you starters on each data structure and let you complete the rest on your own.
Sorted Array: In a sorted array of size k, the problem with each insertion is that you first need to find the index i where the element should be inserted (easy), and then shift all elements i, i+1, ..., k to the right in order to "make place" for the new element. This takes O(k) time, and it's actually k/2 moves on average.
So, the average complexity of inserting n elements into a sorted array is 1/2 + 2/2 + 3/2 + ... + n/2 = (1 + ... + n)/2.
Use the sum of an arithmetic progression to see what its complexity is.
A hash table offers O(1) average (amortized) performance for inserting elements. What happens when you do n operations, each O(1)? What will be the total complexity?
In a Binary Search Tree (BST), each operation is O(h), where h is the current height of the tree. Luckily, when adding elements in random order to a binary search tree (even a non-self-balancing one), its average height is still O(log n).
So, to get the complexity of adding all elements, you need to sum Some_Const*(log(1) + log(2) + ... + log(n)).
See the hints at the end.
Similarly to a BST, a B+ tree also takes O(h) time per insertion. The difference is that h is bounded to be logarithmic even in the worst case. So the calculation of the time complexity is going to remain Some_Other_Const*(log(1) + log(2) + ... + log(n)) when calculating the average case.
Hints:
log(x) + log(y) = log(x*y)
log(n!) is in O(n log n)
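
As a small illustration of the per-insert costs the hints refer to (my own sketch; ordering the four structures is still left to you), compare inserting into a sorted Python list with inserting into a hash table:

import bisect

sorted_arr = []
hash_table = set()

for x in [5, 2, 8, 7]:
    # Sorted array: the binary search is O(log k), but the shift inside insort is O(k).
    bisect.insort(sorted_arr, x)
    # Hash table: O(1) amortized in the average case.
    hash_table.add(x)

print(sorted_arr)            # [2, 5, 7, 8]
print(sorted(hash_table))    # [2, 5, 7, 8] (a set itself has no guaranteed order)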
