Time Complexity of adding to a heap

I have a max-heapified ArrayList containing n elements and want to add m more elements to it. If the tree height is k, each of the first 2^k insertions would cost log n, and each of the next 2^(k+1) insertions would cost log n + 1 (one extra level). How do I generalize this and get an upper bound for adding these m elements?

Related

Linear sorting using additional data structure (O(1) to find median of set, O(1) for adding element)

Suppose we have arbitrary elements (we can compare them in O(1)) in an array, and a magic DS in which we can add an element in O(1) and find the median of the elements in the DS in O(1). We can't remove elements from the DS, and there are no equal elements in the array. Also, we can create as many such DS as we need.
The question: is there a way to sort the array in O(n) using this DS?
Yes, if this data structure exists then it can be used to sort in O(n) time.
Scan the array to find the minimum and maximum elements. Call this min and max.
Insert all of the array elements into the data structure, in any order.
Insert n - 1 copies of min - 1. The median is now the smallest element from the original array.
Repeat n - 1 times:
Insert two copies of max + 1.
Read off the median, which will now be the next element from the original array in ascending order.
This procedure takes O(n) time, because
Finding the min and max is O(n),
Inserting n elements is n * O(1) = O(n),
Inserting n-1 elements is (n - 1) * O(1) = O(n),
Inserting two elements and reading the median is O(1), so doing this n - 1 times is O(n).
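A sketch of this procedure in Java, assuming a hypothetical MagicDS interface with the stated O(1) add and O(1) median (no such structure is claimed to exist; it is a given of the exercise):
import java.util.Arrays;
// Hypothetical structure from the question: O(1) add, O(1) median, no removal.
interface MagicDS {
    void add(int x);
    int median();
}
class MagicSort {
    // Sorts a non-empty array of distinct ints in O(n) using one MagicDS.
    static int[] sort(int[] arr, MagicDS ds) {
        int n = arr.length;
        int min = Arrays.stream(arr).min().getAsInt();   // O(n)
        int max = Arrays.stream(arr).max().getAsInt();   // O(n)
        for (int x : arr) ds.add(x);                     // n O(1) inserts
        for (int i = 0; i < n - 1; i++) ds.add(min - 1); // n-1 pads below everything
        int[] sorted = new int[n];
        sorted[0] = ds.median();  // median of 2n-1 items = smallest original element
        for (int i = 1; i < n; i++) {
            ds.add(max + 1);      // two pads above everything shift the median
            ds.add(max + 1);      // up by exactly one original element
            sorted[i] = ds.median();
        }
        return sorted;
    }
}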

Find the top K elements in O(N log K) time using heaps

Let's say I have a list containing:
lst = [4,0,8,3,1,5,10]
and I'm planning to use a heap structure to help me retrieve the top k largest numbers, where k is a user input.
I understand that heap sort is O(N log N): we first take O(N) time to place the elements in a min/max heap, and then O(log N) time to extract each element.
But the problem I'm facing now is that I'm required to retrieve the top k elements in O(N log K) time. If my k is 4, I would have:
[10,8,5,4]
as my output. The thing I'm confused about is: at the early stage of forming the heap, am I supposed to heapify the entire list in order to retrieve the top k elements?
The log K term suggests that you only want a heap of size K. Here is one possible solution.
Start with the unsorted array. Convert its first K elements into a min-heap of size K; the top of the heap is the smallest of those K elements. Then, for each of the remaining N - K elements in the array (which are not part of the heap), if it is larger than the heap's minimum, replace the minimum with it, an O(log K) operation. After O(N) such operations, the K elements of the heap will be the K largest elements of your array.
There are other solutions, but this is the most straightforward.
import java.util.PriorityQueue;
// Maintain a min-heap of at most k elements: its root is always the
// smallest of the k largest values seen so far.
PriorityQueue<Integer> pq = new PriorityQueue<>();
for (int i : your_arraylist) {
    pq.add(i);         // O(log k) insert
    if (pq.size() > k) {
        pq.poll();     // evict the smallest once the queue exceeds size k
    }
}
System.out.println(pq);
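Each of the N elements is added once and removed at most once from a heap that never holds more than K+1 entries, so every operation costs O(log K) and the whole pass is O(N log K). Note that printing the queue shows heap order, not sorted order; poll the K remaining elements to get them ascending (4, 5, 8, 10) and reverse for the descending output above.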

Lower bound of merging k sorted arrays of size n

As the title suggests, I am wondering what the proof for the lower bound of merging k sorted arrays of size n is. I know that the bound is Ω(kn·log(k)), but how is it derived? I tried comparing to sorting an array of p elements using a decision tree, but I don't see how to carry the proof through.
This is fairly easy to prove; think about it in a merge-sort way. Merge-sorting an array of size K*N takes O(KN·log(K*N)).
But we don't have to go all the way down to leaves of size 1: we know that once a sub-array has size N, it is already sorted. For simplicity, assume K is a power of 2.
How many times do we have to divide by 2 to reach leaves of size N?
log(K) times!
So there are log(K) levels of merging, and each level costs O(KN). Hence, the time complexity is O(NK·log(K)).
Proof: Let's assume this is not a lower bound and we could do better. Take any unknown array of size N*K, split it in 2 until we reach sub-arrays of size N, and merge-sort each of those arrays of size N in N·log(N) time, for K*N*log(N) time across all the arrays.
After that we have K sorted arrays of size N, and by assumption we can merge them into one array of size N*K while paying less than O(NK·log(K)).
We would then have sorted an arbitrary array of size N*K in less than N*K*log(N*K) time, which is impossible in the comparison model.
Hence, you can't do better than O(NK·log(K)) when merging K sorted arrays of size N.
Possible implementation.
Let's create a heap that stores pairs (element, arrayIndex), ordered by element. Then:
Add the first element of each array, with the corresponding array index, to this heap.
On each step, remove the top (smallest) pair p from the heap, append p.element to the result, and insert into the heap the pair (next, p.arrayIndex), where next is the next element from the array with index p.arrayIndex (if that array is not exhausted).
For tracking the 'next' element you need an array of k indices/pointers/iterators, each pointing to the next unread element of the corresponding array.
There will be at most k elements in the heap at any time, so each heap insert/remove costs O(log(k)). Every element is inserted into and removed from the heap exactly once, and there are n*k elements in total. The overall complexity is O(n*k*log(k)).
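A possible Java version of this scheme (class and method names are my own; each heap entry packs {element, arrayIndex, positionInArray} into an int[] triple):
import java.util.PriorityQueue;
class KWayMerge {
    // Merges k sorted arrays holding n*k elements in total in O(n*k*log(k)).
    static int[] merge(int[][] arrays) {
        int total = 0;
        for (int[] a : arrays) total += a.length;
        // Min-heap of {element, arrayIndex, positionInArray}, ordered by element.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((p, q) -> Integer.compare(p[0], q[0]));
        for (int i = 0; i < arrays.length; i++)
            if (arrays[i].length > 0)
                heap.add(new int[]{arrays[i][0], i, 0});
        int[] result = new int[total];
        for (int out = 0; out < total; out++) {
            int[] top = heap.poll();           // smallest current head, O(log k)
            result[out] = top[0];
            int next = top[2] + 1;             // advance that array's pointer
            if (next < arrays[top[1]].length)
                heap.add(new int[]{arrays[top[1]][next], top[1], next});
        }
        return result;
    }
}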
Create a min heap of size k which stores the next item from each of the k arrays. Each node also stores which array it came from. Create your sorted array by adding the min from the heap to final_sorted_array, then adding the next element from the array that value came from to the heap.
Removing the min element of the heap is O(log k). You have a total of NK elements, so you do this NK times. Final result: O(NK log k).

What is the complexity of this approach to finding K largest of N numbers

In this post on how to find the K largest of N elements the 2nd method proposed is:
Store the first k elements in a temporary array temp[0..k-1].
Find the smallest element in temp[], let the smallest element be min.
For each element x in arr[k] to arr[n-1]
If x is greater than the min then remove min from temp[] and insert x.
Print final k elements of temp[]
While I understand the approach, I do not understand their computed time complexity of O((n-k)*k).
From my perspective, you are making a linear traversal of n-k elements and doing a single comparison on each element, and then perhaps replacing one element of the temporary array of k elements.
More specifically, where does the *k factor in their computed time complexity of O((n-k)*k) come from? Why do they multiply n-k by k?
Let's consider an iteration in which:
arr[i] > min(temp[0..k-1])
Now you replace min(temp[0..k-1]) with arr[i].
And now you need to recompute the min of temp[0..k-1], because it has changed. It can be any element of the updated temp[0..k-1], so finding it means scanning all k entries.
So in the worst case you update the min on every iteration, and each update costs O(k).
Thus, time complexity = O((n-k)*k)
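A Java sketch of the quoted method (my own rendering, not the linked post's code); the O(k) scan for the minimum on each pass of the loop is exactly where the *k factor comes from:
import java.util.Arrays;
class TopKByScan {
    // Returns the k largest elements of arr; assumes 1 <= k <= arr.length.
    // Worst case O((n-k)*k): an O(k) min-scan for each of the n-k candidates.
    static int[] topK(int[] arr, int k) {
        int[] temp = Arrays.copyOfRange(arr, 0, k);   // temp[0..k-1]
        for (int i = k; i < arr.length; i++) {
            int minIdx = 0;                           // O(k) scan for the current min
            for (int j = 1; j < k; j++)
                if (temp[j] < temp[minIdx]) minIdx = j;
            if (arr[i] > temp[minIdx])
                temp[minIdx] = arr[i];                // replace min with the larger x
        }
        return temp;
    }
}
Caching the minimum and rescanning only after a replacement does not improve the worst case, which is what the O((n-k)*k) bound describes.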

How to choose the least number of weights to get a total weight in O(n) time

There are n unsorted weights, and I need to find the least number of weights whose sum is at least W.
How do I find them in O(n)?
This problem has many solution methods:
Method 1 - Sorting - O(nlogn)
I guess the most trivial one would be to sort in descending order and then take the first K elements whose sum is at least W. The time complexity, though, will be O(nlogn).
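A minimal Java sketch of this method (the weights are boxed only so the array can be sorted in descending order):
import java.util.Arrays;
import java.util.Collections;
class MinWeightsBySorting {
    // O(n log n): sort descending, take the shortest prefix reaching W.
    static int count(Integer[] weights, long W) {
        Arrays.sort(weights, Collections.reverseOrder());
        long sum = 0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i];
            if (sum >= W) return i + 1;   // the first K elements reach W
        }
        return -1;                        // total weight is below W
    }
}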
Method 2 - Max Heap - O(n + klogn)
Another method would be to use a max heap.
Creating the heap takes O(n), and then we extract elements until we reach a total sum of at least W. Each extraction takes O(logn), so the total time complexity is O(n + klogn), where k is the number of elements we had to extract from the heap.
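A Java sketch of this method. One caveat: Java's PriorityQueue has no (Collection, Comparator) constructor, so building the max-heap by repeated add, as below, is O(nlogn); a bottom-up heapify would give the O(n) build assumed above:
import java.util.Collections;
import java.util.PriorityQueue;
class MinWeightsByMaxHeap {
    static int count(int[] weights, long W) {
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
        for (int w : weights) maxHeap.add(w);   // O(n) with a true heapify
        long sum = 0;
        int k = 0;
        while (sum < W && !maxHeap.isEmpty()) {
            sum += maxHeap.poll();              // O(log n) per extraction
            k++;
        }
        return sum >= W ? k : -1;               // -1: total weight below W
    }
}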
Method 3 - Using Min Heap - O(nlogk)
Adding this method that JimMischel suggested in the comments below.
Create a min heap and add elements from the list until their sum is at least W. Then, iterate over the remaining elements: if an element is greater than the minimum (the heap top), replace the minimum with it.
At this point, we might be holding more elements than we actually need to reach W, so we just extract minimums for as long as the remaining sum still covers W:
find_min_set(A, W)
    currentW = 0
    heap H                          // create an empty min-heap
    for each Elem in A
        if (currentW < W)           // still below the target: take everything
            H.add(Elem)
            currentW += Elem
        else if (Elem > H.top())    // swap a larger element in for the current minimum
            currentW += (Elem - H.top())
            H.pop()
            H.add(Elem)
    while (currentW - H.top() >= W) // >= : dropping the min can still leave exactly W
        currentW -= H.top()
        H.pop()
This method might be even faster in practice, depending on the relation between k and n. See when theory meets practice.
Method 4 - O(n)
The best method I can think of is a kind of quickselect: keep track of the total weight and always partition with the median as the pivot.
First, let's define a few things:
sum(A) - The total sum of all elements in array A.
num(A) - The number of elements in array A.
med(A) - The median of the array A.
find_min_set(A, W, T)
    // Partition A around its median:
    // L holds the elements of A that are less than med(A)
    // R holds the elements of A that are greater than or equal to med(A)
    L, R = partition(A, med(A))
    if (sum(R) == W)
        return T + num(R)
    if (sum(R) > W)
        return find_min_set(R, W, T)
    if (sum(R) < W)
        return find_min_set(L, W - sum(R), num(R) + T)
Call this method as find_min_set(A, W, 0).
Runtime Complexity:
Finding median is O(n).
Partitioning is O(n).
Each recursive call is taking half of the size of the array.
Summing it all up, we get the recurrence T(n) = T(n/2) + O(n), the same as the average case of quickselect; the per-level work forms a geometric series n + n/2 + n/4 + ... ≤ 2n, so it solves to O(n).
Note: when all values are unique, both the worst-case and the average complexity are indeed O(n). With possible duplicate values, the average complexity is still O(n), but the worst case is O(nlogn) when using the median-of-medians method for selecting the pivot.
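For completeness, a runnable Java sketch of the idea. It deviates from the pseudocode in two ways worth flagging: it partitions around a random pivot rather than the exact median (so the O(n) bound is expected rather than worst-case), and it partitions three ways so that the duplicate values mentioned in the note cannot stall the recursion; weights are assumed to be positive integers:
import java.util.Random;
class MinWeightsBySelect {
    static final Random RNG = new Random();
    // Least number of positive weights in a summing to at least W,
    // or -1 if even the total weight falls short. Expected O(n).
    static int findMinSet(int[] a, long W, int taken) {
        if (W <= 0) return taken;            // target already reached
        if (a.length == 0) return -1;        // infeasible: total weight < W
        int p = a[RNG.nextInt(a.length)];    // random pivot value
        long sumR = 0;
        int nL = 0, nE = 0, nR = 0;
        for (int x : a) {                    // three-way counts: <p, ==p, >p
            if (x < p) nL++;
            else if (x > p) { nR++; sumR += x; }
            else nE++;
        }
        int[] L = new int[nL], R = new int[nR];
        int iL = 0, iR = 0;
        for (int x : a) {
            if (x < p) L[iL++] = x;
            else if (x > p) R[iR++] = x;
        }
        if (sumR >= W) return findMinSet(R, W, taken);   // answer lies entirely in R
        long need = W - sumR;                            // take all of R, then pivots
        long fromPivots = (need + p - 1) / p;            // ceil(need / p)
        if (fromPivots <= nE) return taken + nR + (int) fromPivots;
        return findMinSet(L, need - (long) nE * p, taken + nR + nE);
    }
}
Call it as findMinSet(weights, W, 0).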
