Linear sorting using additional data structure (O(1) to find median of set, O(1) for adding element) - algorithm

Suppose we have arbitrary elements (we can compare them in O(1)) in an array and a magic DS into which we can add an element in O(1) and find the median of the elements in O(1). We can't remove elements from the DS, and there are no equal elements in the array. Also, we can create as many such DSes as we need.
The question: is there a way to sort the array in O(n) using this DS?

Yes, if this data structure exists then it can be used to sort in O(n) time.
Scan the array to find the minimum and maximum elements. Call this min and max.
Insert all of the array elements into the data structure, in any order.
Insert n - 1 copies of min - 1. The median is now the smallest element from the original array.
Repeat n - 1 times:
Insert two copies of max + 1.
Read off the median, which will now be the next element from the original array in ascending order.
This procedure takes O(n) time, because
Finding the min and max is O(n),
Inserting n elements is n * O(1) = O(n),
Inserting n - 1 elements is (n - 1) * O(1) = O(n),
Inserting two elements and reading the median is O(1), so doing this n - 1 times is O(n).
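
As a sanity check, here is a minimal Python sketch of the procedure. The MagicDS class below is only a stand-in for the hypothetical structure (it is not O(1); it just exposes the assumed add/median interface), so only magic_sort illustrates the actual algorithm.

import bisect

class MagicDS:
    """Toy stand-in: the real (hypothetical) structure would do both operations in O(1)."""
    def __init__(self):
        self.items = []
    def add(self, x):
        bisect.insort(self.items, x)          # keeps a sorted list (not O(1), just for demonstration)
    def median(self):
        return self.items[(len(self.items) - 1) // 2]   # lower median

def magic_sort(arr):
    n = len(arr)
    lo, hi = min(arr), max(arr)               # O(n) scan for min and max
    ds = MagicDS()
    for x in arr:                             # insert the n original elements
        ds.add(x)
    for _ in range(n - 1):                    # n-1 copies of a value below everything
        ds.add(lo - 1)
    result = [ds.median()]                    # the median is now the smallest original element
    for _ in range(n - 1):
        ds.add(hi + 1)                        # two copies above everything shift the median up by one
        ds.add(hi + 1)
        result.append(ds.median())
    return result

For example, magic_sort([3, 1, 2]) returns [1, 2, 3].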

Related

Find specific numbers of frequency in an unsorted array

I have a question:
I have an unsorted array with n numbers,
I need to find the numbers that appear in more than 10% of the array's positions.
Can you please write me pseudocode with the time complexity?
An example:
array A = {12,11,1,3,1,4,4,7,8,9,10}
The answer is 1 and 4 (each appears twice, which is more than 10% of the 11 elements).
You can use a Hash Table (Hash Map) to solve this.
Iterate over your array:
if the hash table does not contain your element (number), add it with a counter set to 1;
else increment its counter by 1.
Then iterate over the hash table and keep every entry whose counter is more than 10% of the size of the array.
Time complexity is the cost of iterating the array (n) plus iterating the hash table (worst case n): 2n = O(n).
Another solution would be to sort the array and then iterate over it, counting each element and keeping it if its count is more than 10%.
Time complexity is the cost of the sort (n*log(n)) plus the cost of iterating the array (n): n + n*log(n) = O(n*log(n)).
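
A minimal Python sketch of the hash-map approach, using collections.Counter as the hash table:

from collections import Counter

def more_than_ten_percent(arr):
    """Return the values that appear in more than 10% of arr's positions."""
    counts = Counter(arr)            # one pass over the array: O(n)
    threshold = len(arr) / 10        # a value qualifies if its count exceeds 10% of n
    return [value for value, c in counts.items() if c > threshold]

# Example from the question:
# more_than_ten_percent([12, 11, 1, 3, 1, 4, 4, 7, 8, 9, 10]) -> [1, 4]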

Lower bound of merging k sorted arrays of size n

As the title suggests, I am wondering what the proof for the lower bound of merging k sorted arrays of size n is. I know that the bound is O(kn*log(k)), but how is it derived? I tried comparing to sorting an array of p elements using a decision tree, but I don't see how to carry out this proof.
This is fairly easy to prove if you think about it in a merge-sort way. Merge-sorting an array of size K*N takes O(KN*log(K*N)).
But we don't have to go all the way down to leaves of size 1: we can stop as soon as a sub-array has size N, since each such piece is already sorted. For simplicity, assume K is a power of 2.
How many times do we have to divide by 2 to reach leaves of size N?
log2(K) times!
So there are log(K) merge levels, and merging a whole level costs KN. Hence, the time complexity is O(NK*log(K)).
Proof: Let's assume it is not a lower bound and we could do better. Then for any unknown array of size N*K we could split it in half repeatedly until we reach sub-arrays of size N, and merge-sort each of those K arrays in N*log(N) time, for K*N*log(N) time in total.
After the K arrays of size N are sorted, we merge them into one array of size N*K, paying less than O(NK*log(K)), since we assumed it is not the lower bound.
In the end we would have sorted an unknown array of size N*K in less than N*K*log(N*K) time, which is not possible in the comparison model.
Hence, you can't do better than O(NK*log(K)) when merging K sorted arrays of size N.
Possible implementation.
Let's create a heap data structure that stores pairs (element, arrayIndex) ordered by element. Then:
Add the first element of each array, with the corresponding array index, to this heap.
On each step, remove the top (lowest) pair p from the heap, append p.element to the result, and insert into the heap the pair (next, p.arrayIndex), where next is the next element of the array with index p.arrayIndex (if that array is not exhausted).
For tracking the 'next' element you need an array of k indices/pointers/iterators pointing to the next element of the corresponding array.
There will be at most k elements in the heap at any time, so the heap's insert/remove operations cost O(log(k)). Every element is inserted into and removed from the heap exactly once, and the number of elements is n*k. Overall complexity is O(n*k*log(k)).
Create a min heap of size k which stores the next item from each of the k arrays. Each node also stores which array it came from. Create your sorted array by adding the min from the heap to final_sorted_array, then adding the next element from the array that value came from to the heap.
Removing the min element of the heap is O(log k). You have a total of NK elements, so you do this NK times. Final result: O(NK log k).
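
A minimal Python sketch of the heap-based k-way merge described above, using the standard heapq module (heapq.merge already implements the same idea; the explicit version below just shows the mechanics):

import heapq

def merge_k_sorted(arrays):
    """Merge k sorted lists into one sorted list in O(n*k*log k) time."""
    heap = []
    for arr_index, arr in enumerate(arrays):
        if arr:                                      # seed the heap with each array's first element
            heapq.heappush(heap, (arr[0], arr_index, 0))
    result = []
    while heap:
        value, arr_index, pos = heapq.heappop(heap)  # smallest pending element across all arrays
        result.append(value)
        if pos + 1 < len(arrays[arr_index]):         # push that array's next element, if any
            heapq.heappush(heap, (arrays[arr_index][pos + 1], arr_index, pos + 1))
    return result

# merge_k_sorted([[1, 4, 7], [2, 5, 8], [3, 6, 9]]) -> [1, 2, 3, 4, 5, 6, 7, 8, 9]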

What is the complexity of this approach to finding K largest of N numbers

In this post on how to find the K largest of N elements the 2nd method proposed is:
Store the first k elements in a temporary array temp[0..k-1].
Find the smallest element in temp[], let the smallest element be min.
For each element x in arr[k] to arr[n-1]
If x is greater than the min then remove min from temp[] and insert x.
Print final k elements of temp[]
While I understand the approach, I do not understand their computed
Time Complexity of O((n-k)*k).
From my perspective, you are making a linear traversal of n-k elements and doing a single comparison on each element, and then perhaps replacing one element of the temporary array of k elements.
More specifically, where does the *k factor in their computed
time complexity of O((n-k)*k) come from? Why do they multiply n-k by k?
Let's consider the kth iteration, where:
arr[k] > min(temp[0..k-1])
Now you will replace min(temp[0..k-1]) with arr[k].
And now you again need to compute the updated min of temp[0..k-1], because it may have changed; it could be any element of the updated temp[0..k-1].
So in the worst case you update the min every time, and recomputing it is what costs the O(k).
Thus, time complexity = O((n-k)*k)
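
A minimal Python sketch of the method, with the O(k) step (rescanning temp for its minimum) marked; that rescan is where the *k factor comes from:

def k_largest(arr, k):
    """Return the k largest elements of arr using the O((n-k)*k) temp-array method."""
    temp = arr[:k]                       # the first k elements
    for x in arr[k:]:                    # the remaining n-k elements
        m = min(temp)                    # O(k): scan temp for its current minimum
        if x > m:
            temp[temp.index(m)] = x      # replace the minimum with x
    return temp

# k_largest([1, 23, 12, 9, 30, 2, 50], 3) -> [30, 23, 50] (the 3 largest, in no particular order)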

How to choose the least number of weights to get a total weight in O(n) time

If there are n unsorted weights and I need to find the least number of weights whose total is at least W,
how do I find them in O(n)?
This problem has many solution methods:
Method 1 - Sorting - O(nlogn)
I guess that the most trivial one would be to sort in descending order and then take the first K elements that give a sum of at least W. The time complexity, though, will be O(nlogn).
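
A minimal Python sketch of Method 1 (assuming the total of all weights is at least W):

def min_weights_by_sorting(weights, W):
    """Sort descending and count how many of the largest weights are needed to reach W. O(n log n)."""
    total, count = 0, 0
    for w in sorted(weights, reverse=True):
        if total >= W:
            break
        total += w
        count += 1
    return count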
Method 2 - Max Heap - O(n + klogn)
Another method would be to use a max heap.
Creating the heap will take O(n), and then we extract elements until we reach a total sum of at least W. Each extraction will take O(logn), so the total time complexity will be O(n + klogn), where k is the number of elements we had to extract from the heap.
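
A sketch of Method 2 using Python's heapq, which is a min-heap, so weights are negated to simulate a max heap (again assuming the total is at least W):

import heapq

def min_weights_by_max_heap(weights, W):
    """Heapify in O(n), then pop the largest weights until their sum reaches W."""
    heap = [-w for w in weights]       # negate so the min-heap behaves like a max heap
    heapq.heapify(heap)                # O(n)
    total, count = 0, 0
    while heap and total < W:
        total += -heapq.heappop(heap)  # O(log n) per extraction
        count += 1
    return count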
Method 3 - Using Min Heap - O(nlogk)
Adding this method, which JimMischel suggested in the comments.
Create a min heap with the first k elements in the list that sum to at least W. Then iterate over the remaining elements; whenever an element is greater than the minimum (the heap top), replace the top with it.
At this point we might be holding more elements than we actually need to reach W, so we simply extract minimums until we reach our limit:
find_min_set(A,W)
    currentW = 0
    heap H                              // create an empty min-heap
    for each Elem in A
        if (currentW < W)
            H.add(Elem)
            currentW += Elem
        else if (Elem > H.top())
            currentW += (Elem - H.top())
            H.pop()
            H.add(Elem)
    while (currentW - H.top() >= W)     // drop minimums that are no longer needed
        currentW -= H.top()
        H.pop()
    return H                            // the heap now holds the chosen weights
This method might be even faster in practice, depending on the relation between k and n. See when theory meets practice.
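
A runnable Python version of the pseudocode above, as a sketch assuming positive weights whose total is at least W, with heapq standing in for the min heap:

import heapq

def find_min_set_heap(A, W):
    """Keep a min-heap of the currently chosen weights; return the chosen weights."""
    current = 0
    heap = []
    for elem in A:
        if current < W:                         # still short of W: take the element
            heapq.heappush(heap, elem)
            current += elem
        elif elem > heap[0]:                    # otherwise swap it in if it beats the smallest kept weight
            current += elem - heapq.heapreplace(heap, elem)
    while heap and current - heap[0] >= W:      # drop the smallest weights that are no longer needed
        current -= heapq.heappop(heap)
    return heap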
Method 4 - O(n)
The best method I could think of would be to use a kind of quickselect, keeping track of the total weight and always partitioning around the median as the pivot.
First, let's define a few things:
sum(A) - The total sum of all elements in array A.
num(A) - The number of elements in array A.
med(A) - The median of the array A.
find_min_set(A,W,T)
    // partition A:
    // L contains all the elements of A that are less than med(A)
    // R contains all the elements of A that are greater than or equal to med(A)
    L, R = partition(A, med(A))
    if (sum(R) == W)
        return T + num(R)
    if (sum(R) > W)
        return find_min_set(R, W, T)
    if (sum(R) < W)
        return find_min_set(L, W - sum(R), num(R) + T)
Call this method as find_min_set(A, W, 0).
Runtime Complexity:
Finding the median is O(n).
Partitioning is O(n).
Each recursive call works on half the size of the array.
Summing it all up, we get the recurrence T(n) = T(n/2) + O(n), which is the same as the average case of quickselect: O(n).
Note: when all values are unique, both the worst-case and average complexity are indeed O(n). With possible duplicate values, the average complexity is still O(n), but the worst case is O(nlogn) when using the median-of-medians method for selecting the pivot.
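
A runnable sketch of Method 4, assuming positive weights whose total is at least W. Here statistics.median_low stands in for a true linear-time median (median-of-medians), so this version is only O(n) in spirit; a small greedy fallback handles the degenerate partitions mentioned in the note above:

import statistics

def greedy_count(A, W):
    """Fallback for tiny or duplicate-heavy sub-arrays: sort descending and count up to W."""
    count = 0
    for x in sorted(A, reverse=True):
        if W <= 0:
            break
        W -= x
        count += 1
    return count

def find_min_set(A, W, T=0):
    """Minimum number of weights from A whose sum is at least W (quickselect-style)."""
    if W <= 0:
        return T
    med = statistics.median_low(A)        # replace with median-of-medians for a worst-case O(n) bound
    L = [x for x in A if x < med]         # strictly smaller than the median
    R = [x for x in A if x >= med]        # greater than or equal to the median
    if not L:                             # the partition made no progress (tiny array or duplicates)
        return T + greedy_count(R, W)
    sum_R = sum(R)
    if sum_R == W:
        return T + len(R)
    if sum_R > W:
        return find_min_set(R, W, T)      # the answer uses only elements from the larger half
    return find_min_set(L, W - sum_R, T + len(R))   # take all of R, cover the remainder from L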

Construct an array from an existing array

Given an array of integers A[1...N], where N is the length of A, construct an array B such that B[i] = min(A[i], A[i+1], ..., A[i+K-1]), where K will be given. Array B will have N-K+1 elements.
We can solve the problem using a min-heap. Construct a min-heap of the first k elements - O(k). For every next element, delete the element that falls out of the window, insert the new element, and heapify.
Hence, worst-case time: O((n-k+1)*k) + O(k); space: O(k).
Can we do it better?
We can do better if, in the algorithm from the OP, we replace the expensive "heapify" procedure with the much cheaper "upheap" or "downheap" operations. This gives O(n*log(k)) time complexity.
Or, if we iterate through the input array and push each element into a min-queue of size k, we can do it in O(n) time. A min-queue is a queue that can also perform find-min in O(1) time; it may be implemented as a pair of min-stacks. See this answer for details.
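
A minimal O(n) sketch of the min-queue idea. Instead of the two-min-stack construction mentioned above, it uses the equivalent monotonic-deque formulation, which is a little shorter in Python:

from collections import deque

def sliding_window_min(A, k):
    """B[i] = min(A[i], ..., A[i+k-1]); O(n) because each index enters and leaves the deque once."""
    dq = deque()                           # indices into A; A[dq[0]] is the current window minimum
    B = []
    for i, x in enumerate(A):
        while dq and A[dq[-1]] >= x:       # drop elements that can never be a window minimum again
            dq.pop()
        dq.append(i)
        if dq[0] <= i - k:                 # the front index has slid out of the window
            dq.popleft()
        if i >= k - 1:                     # a full window ends at index i
            B.append(A[dq[0]])
    return B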

Resources