K'th Min From a set of intervals - algorithm

You are given a set of intervals such as {2,7}, {3,8}, {9,11}, {-4,-1}, and so on. The question is to find the k'th min from this set of intervals.
A value covered by two intervals appears twice in the flattened sequence, but duplicates are counted only once when ranking. For example, if the intervals are {1,4} and {2,6} and k = 3 then the answer is 3, because if we flatten the intervals and merge-sort them we get the sequence
1,2,2,3,3,4,4,5,6
where the 3rd min is 3.
There are many ways to solve this problem. However, I am struggling to find the one with minimum time / space complexity.

Flatten the intervals.
Sort the flattened sequence.
Iterate over the sorted sequence until you reach the k-th element,
skipping duplicate values along the way.
Now let's do some analysis. Let N be the total count of numbers present in your intervals and M the average number of duplicates per value (M = 1 when the flattened sequence has no duplicates).
Space Complexity:
O(N)
where you could do better, if you have many duplicate elements, by discarding duplicates while iterating over the flattened sequence.
Time Complexity:
O(k*M + N log N)
Flattening takes O(N)
Sorting takes O(N log N)
Iterating takes O(k*M)
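The flatten / sort / scan approach above, as a minimal Python sketch (assuming closed integer intervals given as (lo, hi) pairs; the function name is mine):

```python
def kth_min(intervals, k):
    # flatten: expand each closed interval (lo, hi) into its integer points,
    # then sort the combined sequence -- O(N log N)
    flat = sorted(x for lo, hi in intervals for x in range(lo, hi + 1))
    # scan the sorted sequence, skipping duplicates, until the k-th distinct value
    seen = 0
    for i, x in enumerate(flat):
        if i == 0 or x != flat[i - 1]:
            seen += 1
            if seen == k:
                return x
    return None  # fewer than k distinct values covered
```

For the example above, `kth_min([(1, 4), (2, 6)], 3)` returns 3.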

Related

find multiple elements of different ranks from unsorted array

I am working with a very interesting [problem][1]:
Given an unsorted array of n distinct elements, we want to find this set of
log n elements: those at positions 1, 2, 4, 8, 16, ..., n/2 in the sorted order. (The element at position 1 in the sorted order is the minimum, the one at position 2 is the 2nd smallest, ..., and the one at position n is the largest.)
Assume n is a power of 2. How fast can you find all these log n elements?
My thought process:
We can find an element of any rank in an array in O(n) using median of medians. I will not find the elements in the order written above (1, 2, 4, 8, 16, ..., n/2). Instead, I will first take the middle of the asked ranks (say k), which is at index (log n)/2 in the list of ranks, then partition the original unsorted array into two parts around the element of rank k, and work on the two parts individually. With this approach I get log log n levels, and at each level O(n) work is done, so the time complexity is O(n log log n).
But the options in the linked exam do not include this answer.
[1]: https://d1b10bmlvqabco.cloudfront.net/attach/ixj45csz3f961q/grs52ylqx9y/j28yy7bo8jvm/final.pdf (page 2)
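The divide-on-the-middle-rank idea above can be sketched in Python. For brevity this uses randomized quickselect (expected O(n) selection); swapping in median of medians would give the worst-case O(n) selection the post assumes. All names here are mine:

```python
import random

def quickselect(a, k):
    # k-th smallest (1-indexed) of list a; expected O(len(a)) with a random
    # pivot, worst-case O(len(a)) if median-of-medians pivoting is used instead
    pivot = random.choice(a)
    lo = [x for x in a if x < pivot]
    hi = [x for x in a if x > pivot]
    eq = len(a) - len(lo) - len(hi)
    if k <= len(lo):
        return quickselect(lo, k)
    if k <= len(lo) + eq:
        return pivot
    return quickselect(hi, k - len(lo) - eq)

def ranked_elements(a, ranks, offset=0):
    # ranks: sorted global 1-indexed target positions, e.g. [1, 2, 4, ..., n//2];
    # offset: number of elements known to be smaller than everything in a.
    # Select the middle rank, split the array around it, recurse on both halves.
    if not ranks:
        return {}
    mid = len(ranks) // 2
    r = ranks[mid]
    v = quickselect(a, r - offset)
    left = [x for x in a if x < v]    # elements are assumed distinct
    right = [x for x in a if x > v]
    out = {r: v}
    out.update(ranked_elements(left, ranks[:mid], offset))
    out.update(ranked_elements(right, ranks[mid + 1:], r))
    return out
```

Each recursion level halves the number of remaining ranks, giving the log log n levels of O(n) work described above.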

What will be the running time of merging k sorted length n arrays?

Let's assume I want to merge k sorted length-n arrays into a single sorted length-kn array. Consider the algorithm that first divides the k arrays into k/2 pairs of arrays and uses the Merge subroutine to combine each pair, resulting in k/2 sorted length-2n arrays. The algorithm repeats this step until there is only one length-kn sorted array.
My question is what will be the running time of this procedure, as a function of k and n, ignoring constant factors and lower-order
terms?
The first merge pass deals with k/2 pairs of length-n arrays, each merge taking time proportional to 2n, so the pass takes Θ(nk) time.
The second merge pass deals with k/4 pairs of length-2n arrays, each merge taking time proportional to 4n, so this pass also takes Θ(nk) time.
Since we need only log2(k) merge passes, we obtain a running time of Θ(nk log k).
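The pass structure described above, sketched in Python (helper names are mine):

```python
def merge(a, b):
    # standard two-pointer merge of two sorted lists: O(len(a) + len(b))
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out

def merge_k(arrays):
    # each pass halves the number of arrays and does Theta(nk) total work,
    # so log2(k) passes give Theta(nk log k) overall
    while len(arrays) > 1:
        merged = [merge(arrays[i], arrays[i + 1])
                  for i in range(0, len(arrays) - 1, 2)]
        if len(arrays) % 2:          # odd array out carries over to the next pass
            merged.append(arrays[-1])
        arrays = merged
    return arrays[0]
```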

How can I find the minimum interval that contains half of n numbers?

If I have n numbers, how do I find the minimum interval [a,b] that contains half of those numbers?
Sort the numbers.
Set the left index to 0 and the right index to n/2 - 1, and compute the difference A[right] - A[left].
Walk n/2 steps in a for-loop, incrementing both indexes and recalculating the difference each time; remember the smallest difference and the corresponding indexes.
Sort the numbers in increasing order and compute all the differences A[i + n/2 - 1] - A[i] (each window of n/2 consecutive sorted values). The solution is given by the index i that minimizes the difference.
Explanation:
There is no need to search among the intervals that contain less than n/2 numbers (because they do not satisfy the conditions) nor those that contain more elements (because if you find a suitable interval, it won't be minimal because you can remove the extreme elements).
When the elements are sorted, any sequence in the array is bounded by its first and last elements. So it suffices to sort the numbers and slide a window of n/2 elements.
Now it is more challenging to tell if this O(n log n) approach is optimal.
How about the following?
Sort the given series of numbers in ascending order.
Start a loop with i running from the 1st to the (n/2)th number.
Calculate the difference d between the (i + n/2 - 1)th and the ith number, and store the values i, i + n/2 - 1, and d in an iterable collection arr.
End the loop.
Find the minimum value of d in arr. The values of i and i + n/2 - 1 corresponding to this d give your smallest range.
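The sliding-window idea common to these answers, as a short Python sketch (assuming "half" means ceil(n/2) of the numbers; the function name is mine):

```python
def min_half_interval(nums):
    # sort, then slide a window of w = ceil(n/2) consecutive elements;
    # the interval [a[i], a[i + w - 1]] covers exactly w of the numbers
    a = sorted(nums)
    n = len(a)
    w = (n + 1) // 2
    best = min(range(n - w + 1), key=lambda i: a[i + w - 1] - a[i])
    return a[best], a[best + w - 1]
```

For example, `min_half_interval([1, 2, 3, 10, 11, 12, 13])` returns `(10, 13)`, the tightest interval covering four of the seven numbers.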

Sorting m sets of total O(n) elements in O(n)

Suppose we have m sets S1, S2, ..., Sm of elements from {1, ..., n},
given that m = O(n) and |S1| + |S2| + ... + |Sm| = O(n).
Sort all the sets in O(n) time and O(n) space.
I was thinking of using the counting sort algorithm on each set.
Counting sort on each set would be O(|S1|) + O(|S2|) + ... + O(|Sm|) < O(n),
and in the worst case, where one set consists of n elements, it would still take O(n).
But will this solve the problem, and does it still use only O(n) space?
Your approach won't necessarily work in O(n) time. Imagine you have n sets of one element each, where each set holds just the value n. Then each run of counting sort will take time Θ(n) to complete, so the total runtime will be Θ(n²).
However, you can solve this with a modified counting sort that effectively processes all the sets at the same time. Create an array of length n that stores lists of numbers. Then iterate over all the sets, and for each element with value k belonging to set number r, append r to the list at index k. This process essentially builds a histogram of the distribution of the elements across the sets, where each element is annotated with the set it came from. Finally, iterate over the array and reconstruct the sets in sorted order using logic similar to counting sort.
Overall, this algorithm takes time Θ(n): Θ(n) to initialize the array, O(n) total to distribute the elements, and O(n) to write them back. It also uses only Θ(n) space, since the array has n slots and holds O(n) elements in total across all its lists.
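The simultaneous counting sort described above might look like this in Python (names are mine; sets are returned as sorted lists):

```python
def sort_sets(sets, n):
    # distribute: buckets[v] records the ids of the sets that contain value v
    buckets = [[] for _ in range(n + 1)]
    for set_id, s in enumerate(sets):
        for v in s:
            buckets[v].append(set_id)
    # collect: sweep the values 1..n in increasing order,
    # appending each value back to the sets that own it
    out = [[] for _ in sets]
    for v in range(1, n + 1):
        for set_id in buckets[v]:
            out[set_id].append(v)
    return out
```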
Hope this helps!

A data structure for counting integers within some range?

Question:
Given n integers in the range [1, k], preprocess the input and then
answer any query asking how many of the n integers have values between a and b, where 1 ≤ a ≤ b ≤ k
are two given parameters. Your algorithm should use O(n + k) preprocessing time.
Your algorithm is reasonably good, but it can be made much faster. Specifically, your algorithm has O(1) preprocessing time, but then spends O(n) time per query because of the linear cost of the partitioning step.
Let's consider an alternative approach. Suppose that all of your values were in sorted order. In this case, you could find the number of elements in a range very quickly by just doing two binary searches - a first binary search to find the index of the lower bound, and a second search to find the upper bound - and could just subtract the indices. This would take time O(log n). If you can preprocess the input array to sort it in time O(n + k), then this approach will result in exponentially faster lookup times.
To do this sorting, as @minitech has pointed out, you can use the counting sort algorithm, which sorts in time O(n + k) for integers between 1 and k. Consequently, using counting sort and binary search together gives O(n + k) setup time and O(log n) query time.
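A sketch of this counting-sort-plus-binary-search combination in Python, using the standard library's bisect module (function names are mine):

```python
import bisect

def counting_sort(nums, k):
    # O(n + k): histogram the values 1..k, then emit them in increasing order
    counts = [0] * (k + 1)
    for x in nums:
        counts[x] += 1
    return [v for v in range(1, k + 1) for _ in range(counts[v])]

def count_in_range(sorted_nums, a, b):
    # O(log n) per query: binary-search both endpoints and subtract the indices
    return bisect.bisect_right(sorted_nums, b) - bisect.bisect_left(sorted_nums, a)
```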
If you are willing to trade memory for efficiency, though, you can speed this up even further. Let's suppose that k is a reasonably small number (say, not more than 100). Then, if you are okay using O(k) space, you can answer these queries in O(1) time. The idea is as follows: build a table of k entries where entry v records how many elements of the original array are less than or equal to v. With this table, you can find the total number of elements in some subrange by looking up how many elements are less than or equal to b and how many are strictly less than a (each in O(1) time), then subtracting them.
Of course, to do this, you have to actually build up this table in time O(n + k). This can be done as follows. First, create an array of k elements, then iterate across the original n-element array and for each element increment the spot in the table corresponding to this number. When you're done (in time O(n + k)), you will have filled in this table with the number of times that each of the values in the range 1 - k exists in the original array (this is, incidentally, how counting sort works). Next, create a second table of k elements that will hold the cumulative frequency. Then, iterate across the histogram you built in the first step, and fill in the cumulative frequency table with the cumulative total number of elements encountered so far as you walk across the histogram. This last step takes time O(k), for a grand total of time O(n + k) for setup. You can now answer queries in time O(1).
Hope this helps!
Here is another simple algorithm:
First allocate an array A of size k, then iterate over the n elements and for each integer x increment A[x] by one. This takes O(n) time.
Then compute the prefix sums of array A and store them as array B. This takes O(k) time.
Now for any query (a, b) you can simply return B[b] - B[a] + A[a].
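This second algorithm might look like the following in Python (names are mine; note that B[b] - B[a] + A[a] is the same as B[b] - B[a-1]):

```python
def preprocess(nums, k):
    # A[v] = number of occurrences of v; B[v] = A[1] + ... + A[v] (prefix sums)
    A = [0] * (k + 1)
    for x in nums:
        A[x] += 1
    B = [0] * (k + 1)
    for v in range(1, k + 1):
        B[v] = B[v - 1] + A[v]
    return A, B

def query(A, B, a, b):
    # count of values x with a <= x <= b, answered in O(1)
    return B[b] - B[a] + A[a]
```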
