Range median query in an array in O(1) time using preprocessing - algorithm

In class we learned about the RMQ → LCA → RMQ reductions and how to support the range minimum query operation in O(1) time. As an exercise, my professor has assigned me to support the following operation: range median query, with O(n) space, O(n) preprocessing time, and O(1) query time.
Let's say I have an array A containing 8, 7, 9, 2, 6, 4, 5, 3. Given median(i, j), we should get the median of the elements in positions i through j (inclusive, 0-indexed) after sorting the array. A sorted is 2, 3, 4, 5, 6, 7, 8, 9. For example, median(2, 6) = 6 (the element at index 4 of the sorted array), because the median of 4, 5, 6, 7, 8 is 6.
I found other solutions that suggest using segment trees, but the query complexity is not O(1) in those cases. Is it possible to solve this problem in a manner similar to how we solve the RMQ to LCA to RMQ problem?

One option would be to use a non-comparison sorting algorithm. Examples I know of are radix sort (O(wn), where w is the word size) and counting sort (O(n + k), where k is the maximum key value). Both of these are linear with respect to the input size.
Then you can just look up the correct position within the sorted list, which is an O(1) operation. Both sorting methods are within your space requirements: radix sort uses O(n + k) and counting sort O(k) auxiliary space.
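As a concrete illustration, here is a minimal Python sketch of the counting-sort variant, assuming the keys are small non-negative integers (the names preprocess and median are mine, not from the question):

    def preprocess(a):
        # Counting sort: O(n + k) time, O(k) auxiliary space, k = max key value.
        count = [0] * (max(a) + 1)
        for x in a:
            count[x] += 1
        sorted_a = []
        for value, c in enumerate(count):
            sorted_a.extend([value] * c)
        return sorted_a

    def median(sorted_a, i, j):
        # Median of positions i..j (0-indexed, inclusive) of the sorted order:
        # a single O(1) array lookup.
        return sorted_a[(i + j) // 2]

    s = preprocess([8, 7, 9, 2, 6, 4, 5, 3])
    print(median(s, 2, 6))  # 6 (the median of 4, 5, 6, 7, 8)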

This isn't possible with comparisons alone. If it were, you could comparison-sort N elements in O(N) time: preprocess the input and then read off median(i, i) for each i from 0 to N-1.
You probably misunderstood the task you were assigned - most likely you were supposed to compute medians of subarrays of the original array, not of a sorted version of the array.

Related

Why is the Big O notation of merge sort not O(n) when it requires looping through every element in the array to merge?

Let's say I have an unsorted array Unsorted_Arr = [2, 8, 1, 3, 6, 7, 5, 4].
Right before the very last pass of merging, I would have two arrays, Arr_Left = [1, 2, 3, 8] and Arr_Right = [4, 5, 6, 7]. To merge them, I would need to iterate through all four elements of Arr_Right and through three of the four elements of Arr_Left. In total, that is n - 1 comparisons to merge the n elements of the original Unsorted_Arr. Drop the -1, and I have a time complexity of O(n) for the merge.
While I understand why the recursive portion of merge sort is log n, since our code contains a portion that runs in O(n), shouldn't the overall complexity of merge sort be O(n)?
But those two arrays are themselves not necessarily already sorted, so you have to split them and then re-merge them as well. And you have to repeat that all the way down until each piece has only one or zero elements (a 1- or 0-element array is always sorted), and then re-merge them all.
At each level, merging all of the split arrays at that level touches every element, which is O(n) work per level.
So how many levels of splitting do you need before every piece has length 1 or 0? That's log(n).
Combining the number of levels you have to split and re-merge all of the sublists (O(log n)) with the work performed at each level (O(n)) gives O(log n) * O(n), which reduces to O(n log n).
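Here is a minimal Python sketch of merge sort that makes both factors visible: the merge loop is the O(n)-per-level part, and the recursion supplies the log n levels:

    def merge_sort(a):
        if len(a) <= 1:                # a 0- or 1-element list is already sorted
            return a
        mid = len(a) // 2
        left = merge_sort(a[:mid])     # the recursion is O(log n) levels deep
        right = merge_sort(a[mid:])
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):   # the O(n)-per-level merge pass
            if left[i] <= right[j]:
                out.append(left[i])
                i += 1
            else:
                out.append(right[j])
                j += 1
        return out + left[i:] + right[j:]

    print(merge_sort([2, 8, 1, 3, 6, 7, 5, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]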

Given 1 billion numbers we need to find the largest 1 million numbers

I have been stuck on one question: given 1 billion numbers, we need to find the largest 1 million of them. One approach is to sort the numbers and then take the first million from the result, which is O(n log n). Propose an algorithm that has expected O(n) time complexity.
Is heapsort the algorithm that can do this with O(n) complexity?
The general version of the problem you're trying to solve here seems to be the following:
Given n numbers, report the largest k of them in (possibly expected) time O(n).
If you just need to find the top k elements and the ordering doesn't matter, there's a clever O(n)-time algorithm for this problem based on fast selection algorithms. As a refresher, a selection algorithm takes as input an array A and a number m, then reorders A so that the m smallest elements are in the first m slots and the remaining elements occupy the later slots. The quickselect algorithm does this in expected time linear in the length of A and is fast in practice; the median-of-medians algorithm does the same in worst-case linear time but is slower in practice. While these algorithms are typically framed in terms of finding the smallest m elements, they work just as well for finding the largest m elements.
Using this algorithm as a subroutine, here's how we can find the top k elements in time O(n) and space O(k):
    Initialize a buffer of 2k elements.
    Copy the first k elements of the array into the buffer.
    While there are elements remaining in the array:
        Copy the next k of them into the buffer.
        Use a selection algorithm to place the k largest elements
        of the buffer in the first k slots of the buffer.
        Discard the remaining elements of the buffer.
    Return the contents of the buffer.
To see why this works, notice that after each iteration of the loop, we maintain the invariant that the buffer holds the k largest elements of the ones that have been seen so far (though not necessarily in sorted order). Therefore, the algorithm will identify the top k elements of the input and return them in some order.
In terms of time complexity - there's O(k) work to create the buffer, and across all iterations of the loop we do O(n) work copying elements into the buffer. Each call to the selection algorithm takes (expected) time O(k), and there are O(n / k) calls to the algorithm for a net runtime of O(n + k). Under the assumption that k < n, this gives an overall runtime of O(n), with only O(k) total space required.
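Here is a rough Python sketch of the buffer algorithm above, using a random-pivot Hoare partition as the selection subroutine; the names select_largest and top_k are mine, and a production version would substitute a worst-case-linear selection algorithm if the expected-time guarantee isn't enough:

    import random

    def select_largest(buf, k, lo=0, hi=None):
        # Reorder buf in place so its k largest elements occupy buf[:k]
        # (in no particular order). Expected linear time via random pivots.
        if hi is None:
            hi = len(buf) - 1
        while lo < hi:
            pivot = buf[random.randint(lo, hi)]
            i, j = lo, hi
            while i <= j:                      # Hoare partition, descending
                while buf[i] > pivot:
                    i += 1
                while buf[j] < pivot:
                    j -= 1
                if i <= j:
                    buf[i], buf[j] = buf[j], buf[i]
                    i, j = i + 1, j - 1
            if k - 1 <= j:
                hi = j                         # boundary lies in the left part
            elif k - 1 >= i:
                lo = i                         # boundary lies in the right part
            else:
                return                         # slots j+1..i-1 all equal the pivot

    def top_k(stream, k):
        buf = []
        for x in stream:
            buf.append(x)
            if len(buf) == 2 * k:              # buffer full:
                select_largest(buf, k)         # keep the k largest seen so far,
                del buf[k:]                    # discard the rest
        if len(buf) > k:                       # handle the final partial batch
            select_largest(buf, k)
            del buf[k:]
        return buf

    print(sorted(top_k([6, 2, 4, 4, 8, 2, 4, 1, 9, 2], 3), reverse=True))  # [9, 8, 6]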
No comparison-based sorting algorithm can do this in O(n) time. Furthermore, without additional constraints (e.g., the billion numbers are drawn from the integers 1 to 1,000,000), no sorting approach at all is going to reach O(n) here.
However, there is a simple O(n) algorithm to do this:
    initialize a return buffer with 1,000,000 empty cells
    for each item in your list of 1,000,000,000, do the following:
        check each of the 1,000,000 cells in order
            if the number from the input is bigger than the number in the cell, swap them and keep going
            if you're looking at a blank cell, put the number you're holding down
        if you get to the end of the buffer and are still holding a number, throw it out
Here's an example with a list of 10 things and we want the biggest 5:
Input: [6, 2, 4, 4, 8, 2, 4, 1, 9, 2]
Buffer: [-, -, -, -, -]
[6, -, -, -, -] … see a blank, drop the 6
[6, 2, -, -, -] … 2 < 6, skip, then see a blank, drop the 2
[6, 4, 2, -, -] … 4 < 6 but 4 > 2, swap out 2, see blank, drop 2
[6, 4, 4, 2, -] … 4 <= 4,6 but 4 > 2, swap out 2, see blank, drop 2
[8, 6, 4, 4, 2] … 8 > 6, swap, then swap 6 for 4, etc.
[8, 6, 4, 4, 2] … 2 <= everything, drop it on the floor
[8, 6, 4, 4, 4] … 4 <= everything but 2, swap, then drop 2 on floor
[8, 6, 4, 4, 4] … 1 <= everything, drop it on the floor
[9, 8, 6, 4, 4] … 9 > everything, swap with 8 then drop a 4 on the floor
[9, 8, 6, 4, 4] … 2 <= everything, drop it on the floor
You do up to 1,000,000 comparisons and potentially up to 1,000,000 swaps for every element in the input (consider input in ascending sorted order). That means you do work proportional to 1,000,000 * n; since the buffer size is a fixed constant here, that is a linear amount of work in the size of the input n.
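For reference, a short Python sketch of this buffer approach (the name largest_k_naive is mine); it reproduces the trace above:

    def largest_k_naive(xs, k):
        # O(k) work per input element, so O(k * n) overall; "linear in n"
        # only because the buffer size k is a fixed constant here.
        buf = []                        # kept in descending order, length <= k
        for x in xs:
            for i in range(len(buf)):
                if x > buf[i]:
                    buf[i], x = x, buf[i]   # swap, keep carrying the smaller one
            if len(buf) < k:
                buf.append(x)               # drop it into a blank cell
            # else: x falls off the end ("drop it on the floor")
        return buf

    print(largest_k_naive([6, 2, 4, 4, 8, 2, 4, 1, 9, 2], 5))  # [9, 8, 6, 4, 4]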
You can generally do better than sorting. Most people would solve this problem with a heap. Building a heap over all n numbers takes O(n). You then do a million "pop" operations at O(log n) each, but you are not doing all n pops (only n/1000 pops in this case).
I say you can "generally" do better than sorting because most sorting algorithms in libraries are O(n log n). But there is distribution sort (e.g. counting sort), which is O(n + k), where k is the number of possible values in the range you are sorting; depending on the value of k, you might do better by sorting.
Update
To incorporate the suggestion made by @pjs: create a min-heap with the first million values of the billion, where a pop operation removes the minimum value from the heap. Then, for each of the remaining 999,000,000 values, check whether it is greater than the current minimum on the heap; if so, pop the current minimum and push the new value. When you are done, you will be left with the 1,000,000 largest values.
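As a sketch, this min-heap variant is only a few lines with Python's heapq module (the function name largest_m is mine):

    import heapq

    def largest_m(numbers, m):
        heap = numbers[:m]                   # the first m values...
        heapq.heapify(heap)                  # ...made into a min-heap in O(m)
        for x in numbers[m:]:                # for each remaining value:
            if x > heap[0]:                  # bigger than the current minimum?
                heapq.heapreplace(heap, x)   # pop the min, push x: O(log m)
        return heap                          # the m largest, in heap order

    print(sorted(largest_m([6, 2, 4, 4, 8, 2, 4, 1, 9, 2], 3), reverse=True))  # [9, 8, 6]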

Time complexity with insertion sort for 2^N array?

Consider an array of integers of size 2^N, where the element at index X (0 ≤ X < 2^N) is X xor 3 (that is, the two least significant bits of X are flipped). What is the running time of insertion sort on this array?
Examine the structure of what the lists looks like:
For n = 2:
{3, 2, 1, 0}
For n = 3:
{3, 2, 1, 0, 7, 6, 5, 4}
For insertion sort, you maintain the invariant that the prefix of the list up to your current index is sorted, so your task at each step is to place the current element in its correct position among the sorted elements before it. In the worst case, you would have to traverse all previous indices before you could insert the current value (think of a list in reverse sorted order). But it's clear from the structure above that, for a list where each value equals its index xor 3, the furthest back you ever have to go from any given index is 3 positions. So the insertion step, which can be O(n) work in general, is reduced to a constant. You still have to do O(n) work to examine each element of the list. So, for this particular input, the running time of insertion sort is linear in the size of the input, whereas in the worst case it is quadratic.
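One way to see this empirically is to run insertion sort on the X xor 3 arrays and count how far elements shift; a small Python sketch (the shift-counting is mine):

    def insertion_sort_shifts(a):
        # Standard insertion sort that also counts how far elements shift.
        shifts = 0
        for i in range(1, len(a)):
            x, j = a[i], i - 1
            while j >= 0 and a[j] > x:
                a[j + 1] = a[j]        # shift a larger element one slot right
                j -= 1
                shifts += 1
            a[j + 1] = x
        return shifts

    for n in range(2, 8):
        a = [x ^ 3 for x in range(2 ** n)]
        print(2 ** n, insertion_sort_shifts(a))
    # prints 4 6, 8 12, 16 24, ...: shifts per element stay at 1.5,
    # so total work is linear in 2^N rather than quadratic.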

Insert numbers from a non-sorted list of numbers into a sorted list of numbers

I have a list A whose elements are sorted from smallest to largest. For example:
A = 1, 5, 9, 11, 14, 20, 46, 99
I want to insert the elements of a non-sorted list of numbers into A while keeping A sorted. For example:
B = 0, 77, 88, 10, 4
is going to be inserted in A as follows:
A = 0, 1, 4, 5, 9, 10, 11, 14, 20, 46, 77, 88, 99
What is the best possible solution to this problem?
"Best possible" is too subjective; it depends on the definition of best. From a big-O point of view, with an array A of length n1 and an array B of length n2, you can achieve O(max(n2 log n2, n1 + n2)).
This can be achieved by sorting the array B in O(n2 log n2) and then merging the two sorted arrays in O(n1 + n2).
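A minimal Python sketch of this sort-then-merge approach (the helper name insert_all is mine):

    def insert_all(a, b):
        # Sort B in O(n2 log n2), then merge it with the already-sorted A
        # in O(n1 + n2).
        b = sorted(b)
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] <= b[j]:
                out.append(a[i])
                i += 1
            else:
                out.append(b[j])
                j += 1
        return out + a[i:] + b[j:]

    print(insert_all([1, 5, 9, 11, 14, 20, 46, 99], [0, 77, 88, 10, 4]))
    # [0, 1, 4, 5, 9, 10, 11, 14, 20, 46, 77, 88, 99]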
Best solution depends on how you define the best.
Even for time complexity, it still depends on your input size of A and B. Assume input size A is m and input size of B is n.
As Salvador mentioned, sorting B in O(n log n) and then merging with A in O(m + n) is a good solution. Notice that you can sort B in O(n) if you adopt a non-comparison-based sorting algorithm like counting sort or radix sort.
I'll provide another solution here: loop over every element in B, do a binary search in A to find its insertion position, and then insert. The time complexity is O(n log(m + n)).
Edit 1: As @moreON points out, the binary-search-then-insert approach assumes your list implementation supports at least amortized O(1) insert and random access. Also, the time complexity should be O(n log(m + n)) rather than O(n log m), since the binary searches take longer as more elements are added.
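In Python, for instance, bisect.insort performs exactly this binary-search-then-insert, subject to the list-shift caveat in the edit above:

    import bisect

    a = [1, 5, 9, 11, 14, 20, 46, 99]
    for x in [0, 77, 88, 10, 4]:
        bisect.insort(a, x)   # O(log len(a)) search + O(len(a)) shift on a list
    print(a)  # [0, 1, 4, 5, 9, 10, 11, 14, 20, 46, 77, 88, 99]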

Find minimum length interval that has K as its factor

Given an integer K and a list of N integers, we need to find all shortest intervals in the list such that the product of the integers in each interval is a multiple of K.
Example: let N = 5, K = 6, and the array be [2, 9, 4, 3, 16]. Here the minimum interval length is 2, and the length-2 intervals whose products are multiples of K are:
Intervals: [1, 2], [2, 3], [3, 4], [4, 5] (1-indexed).
I need to find both the minimum length and the start and end of every such interval.
The problem is that the constraints are large: 1 ≤ N ≤ 2×10^5, 1 ≤ K ≤ 10^17, and array elements are up to 10^15.
You can use a segment tree to be able to compute product(a[i...j])%K in O(log N).
Because the product over a larger interval contains the product over any sub-interval as a factor, if product(a[i...j]) % K == 0 then product(a[i...j']) % K == 0 for every j' >= j. So, for each i, you can perform a binary search to find the first j where product(a[i..j]) % K == 0.
In the first pass, find what's the minimum length. Then do another pass finding and printing which i's have that length.
That's O(n log^2 n). For 2*10^5 that should be fast enough, especially given that the answer can have O(n^2) items (e.g. n/2 subarrays with n/2 items each).
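A Python sketch of this approach (all names are mine, and Python's arbitrary-precision integers sidestep the overflow you would face in a 64-bit language, since products of two residues below K ≤ 10^17 need 128-bit intermediates):

    def shortest_multiple_intervals(a, K):
        # Segment tree over a[i] % K: prod(l, r) computes product(a[l..r]) % K
        # in O(log n) without ever forming the huge true product.
        n = len(a)
        size = 1
        while size < n:
            size *= 2
        tree = [1] * (2 * size)
        for i, x in enumerate(a):
            tree[size + i] = x % K
        for i in range(size - 1, 0, -1):
            tree[i] = tree[2 * i] * tree[2 * i + 1] % K

        def prod(l, r):                  # product of a[l..r] mod K, inclusive
            res, l, r = 1, l + size, r + size + 1
            while l < r:
                if l & 1:
                    res = res * tree[l] % K
                    l += 1
                if r & 1:
                    r -= 1
                    res = res * tree[r] % K
                l //= 2
                r //= 2
            return res

        best, starts = n + 1, []
        for i in range(n):
            if prod(i, n - 1) != 0:      # no valid interval starts at i
                continue
            lo, hi = i, n - 1            # binary-search the first good j:
            while lo < hi:               # divisibility is monotone in j
                mid = (lo + hi) // 2
                if prod(i, mid) == 0:
                    hi = mid
                else:
                    lo = mid + 1
            if lo - i + 1 < best:
                best, starts = lo - i + 1, []
            if lo - i + 1 == best:
                starts.append(i)
        return best, [(i + 1, i + best) for i in starts]   # 1-indexed

    print(shortest_multiple_intervals([2, 9, 4, 3, 16], 6))
    # (2, [(1, 2), (2, 3), (3, 4), (4, 5)])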
