Greedy Attempt for covering all the numbers with the given intervals - algorithm

Let S be a set of intervals (containing n number of intervals) of the natural numbers that might overlap and N be a list of numbers (containing n number of numbers).
I want to find the smallest subset (let's call P) of S such that for each number
in our list N, there exists at least one interval in P that contains it. The intervals in P are allowed to overlap.
Trivial example:
S = {[1..4], [2..7], [3..5], [8..15], [9..13]}
N = [1, 4, 5]
// so P = {[1..4], [2..7]}
I think a dynamic algorithm might not work always, so if anybody knows of a solution to this problem (or a similar one that can be converted into), that would be great. I am trying to make a O(n^2 solution)
Here is one greedy approach
P = {}
for each q in N: // O(n)
if q in P // O(n)
continue
for each i in S // O(n)
if q in I: // O(n)
P.add(i)
break
But that is O(n^4).. Any help with creating a greedy approach that is O(n^2) would be great!
Thanks!
* Update: * I've been slamming at this problem and I think I have an O(n^2) solution!!
Let me know if you think I'm right!!!
N = MergeSort (N)
upper, lower = infinity, -1
P = empty set
for each q in N do
if (q>=lower and q<=upper)=False
max_interval = [-infinity, infinity]
for each r in S do
if q in r then
if r.rightEndPoint > max_interval.rightEndPoint
max_interval = r
P.append(max_interval)
lower = max_interval.leftEndPoint
upper = max_interval.rightEndPoint
S.remove(max_interval)
I think this should work!! I'm trying to find a counter solution; but yeah!!

This problem is similar to set cover problem, which is NP-complete (i.e., arguably has no solution faster than exponential). What makes it different is that intervals always cover adjacent elements (not arbitrary subset of N), which opens ways for faster solutions.
http://en.wikipedia.org/wiki/Set_cover_problem
I think that the solution proposed by Mike is good enough. But I think I have quite straightforward O(N^2) greedy algo. It starts like the Mike's one (moreover, I believe Mike's solution can also be improved in similar way):
You sort your N numbers and place them sorted into array ELEM; COMPLEXITY O(N*lg N);
Using binary search, for each interval S[i] you identify starting and ending index of elements in ELEM that are covered by S[i]. Say, you place this pair of numbers into array COVER, the difference between the two indices tells you how many elements you cover, for simplicity, let us place it array COVER_COUNT; COMPLEXITY O(N*lg N);
You introduce index pointer p, that shows till which element in ELEM, your N is already covered. you set p = 0, meaning that all elements up to 0-th (excluded) are initially covered (i.e., no elements); Complexity O(1). Moreover you introduce boolean array IS_INCLUDED, that reflects if interval S[i] is already included in your coverage set. Complexity O(N)
Then you start from the 0-th element in ELEM and see what is the interval that contains ELEM[0] and has greater coverage COVER_COUNT[i]. Imagine that it is i-th interval. We then mark it as included by setting IS_INCLUDED[i] to true. Then you set p to end[i] + 1 where end[i] is the ending index in COVER[i] pair (indeed now all elements til end[i] are covered). Then, knowing p you update all elements in COVER_COUNT so that they reflect how many elements of not yet covered elements each interval covers (this can be easily done in O(N) time). Then you perform the same step for ELEM[p] and continues till p >= ELEM.length. It can be observed that the overall complexity is O(N^2).
You finish in O(n^2) and in IS_INCLUDED has true for intervals of S included in optimal cover set
Let me know if this solution seems reasonable to you and if I calculated everything well.
P.S. Just wanted to add that the optimality of ythe solution found by algo can be proved by induction and contradiction. By contradiction, it is easy to show that at least one optimal solution includes the longest interval of those covering element ELEM[0]. If so, by induction we can show that for each next element in algo, we can keep on following the strategy of selelcting the interval that is the longest with respect to the number of remaining elements covered and that covers the leftmost yet uncovered element.

I am not sure, but mb some think like this.
1) For each interval create a list with elements from N witch contain in interval, it will take O(n^2) lets call it Q[i] for S[i]
2) Then sort our S by length of Q[i], O(n*lg(n))
3) Go throw this array excluding Q[i] from N O(n) and from Q[i+1]...Q[n] = O(n^2)
4) Repeat 2 while N is not empty.
It's not O(n^2), it's O(n^3) but if you can use hashmap, i think you can improve this.

Related

Find the pair of numbers in an unsorted array with a sum closest to an arbitrary target [duplicate]

This question already has answers here:
Linear time algorithm for 2-SUM
(13 answers)
Closed 8 years ago.
This is a generalization of the 2Sum problem
Given an unsorted array of numbers, how do you find the pair of numbers with a sum closest to an arbitrary target. Note that an exact match may not exist, so the O(n) hashtable solution doesn't fit here.
I can solve the problem in O(n*log(n)) time for a target of 0 as so:
Sort the numbers by absolute value.
Iterate across the sorted array, keeping the minimum of the sums of adjacent values.
This works because the three cases of pairs (+/+, +/-, -/-) are all handled by the logic of absolute value. That is, the sum of pairs of the same sign is minimized when they are closest to 0, and the sum of pairs of different sign is minimized when the components of the pair are closest to each other. Both of these cases are represented when sorting by absolute value.
How can I generalize this to an arbitrary target sum?
Step 1: Sort the array in non-decreasing order. Complexity: O( n lg n )
Step 2: Scan inwards from both ends. Complexity: O( n )
On the sorted array A, let l point to the left-most (i.e. minimum) element and r point to the right-most (i.e. maximum) element.
while true:
curSum = A[ l ] + A[ r ]
diff = currSum - target
if( diff > 0 ):
r = r - 1
else:
l = l + 1
If ever diff == 0, then you got a perfect match for 2Sum.
Otherwise, look for change of sign in diff, i.e. transition from positive to negative, or negative to positive. Whenever the transition happens, either just before or just after the change, currSum is closest to target.
Overall Complexity: O( n log n ) because of step 1.
If you are going to sort the numbers anyway, your algorithm already will have a complexity of at least O(n*log(n)). So here is what you can do: for each number v you can perform a binary search to find the least number u in the array that gets the sum u + v more than target. Now check u + v and u + t where t is the predecessor of v in the sorted array.
The complexity of this is n times the complexity of binary search i.e. O(n * log (n)) thus your overall complexity remains O(n*log(n)). Also this solution is way easier to implement than what you suggest.
As pointed out by amit in a comment to the question you can do the second phase with linear complexity, improving its speed. Still the overall computational complexity will remain the same and it is a little bit harder to implement the solution.

Maximum non-overlapping intervals in a interval tree

Given a list of intervals of time, I need to find the set of maximum non-overlapping intervals.
For example,
if we have the following intervals:
[0600, 0830], [0800, 0900], [0900, 1100], [0900, 1130],
[1030, 1400], [1230, 1400]
Also it is given that time have to be in the range [0000, 2400].
The maximum non-overlapping set of intervals is [0600, 0830], [0900, 1130], [1230, 1400].
I understand that maximum set packing is NP-Complete. I want to confirm if my problem (with intervals containing only start and end time) is also NP-Complete.
And if so, is there a way to find an optimal solution in exponential time, but with smarter preprocessing and pruning data. Or if there is a relatively easy to implement fixed parameter tractable algorithm. I don't want to go for an approximation algorithm.
This is not a NP-Complete problem. I can think of an O(n * log(n)) algorithm using dynamic programming to solve this problem.
Suppose we have n intervals. Suppose the given range is S (in your case, S = [0000, 2400]). Either suppose all intervals are within S, or eliminate all intervals not within S in linear time.
Pre-process:
Sort all intervals by their begin points. Suppose we get an array A[n] of n intervals.
This step takes O(n * log(n)) time
For all end points of intervals, find the index of the smallest begin point that follows after it. Suppose we get an array Next[n] of n integers.
If such begin point does not exist for the end point of interval i, we may assign n to Next[i].
We can do this in O(n * log(n)) time by enumerating n end points of all intervals, and use a binary search to find the answer. Maybe there exists linear approach to solve this, but it doesn't matter, because the previous step already take O(n * log(n)) time.
DP:
Suppose the maximum non-overlapping intervals in range [A[i].begin, S.end] is f[i]. Then f[0] is the answer we want.
Also suppose f[n] = 0;
State transition equation:
f[i] = max{f[i+1], 1 + f[Next[i]]}
It is quite obvious that the DP step take linear time.
The above solution is the one I come up with at the first glance of the problem. After that, I also think out a greedy approach which is simpler (but not faster in the sense of big O notation):
(With the same notation and assumptions as the DP approach above)
Pre-process: Sort all intervals by their end points. Suppose we get an array B[n] of n intervals.
Greedy:
int ans = 0, cursor = S.begin;
for(int i = 0; i < n; i++){
if(B[i].begin >= cursor){
ans++;
cursor = B[i].end;
}
}
The above two solutions come out from my mind, but your problem is also referred as the activity selection problem, which can be found on Wikipedia http://en.wikipedia.org/wiki/Activity_selection_problem.
Also, Introduction to Algorithms discusses this problem in depth in 16.1.

How do you find multiple ki smallest elements in array?

I am struggling with my homework and need a little push- the question is to design an algorithm that will in O(nlogm) time find multiple smallest elements 1<k1<k2<...<kn and you have m *k. I know that a simple selection algorithm takes o(n) time to find the kth element, but how do you reduce the m in your recurrence? I though to do both k1 and kn in each run, but that will only take out 2, not m/2.
Would appreciate some directions.
Thanks
If I understand the question correctly, you have a vector K containing m indices, and you want to find the k'th ranked element of A for each k in K. If K contains the smallest m indices (i.e. k=1,2,...,m) then this can be done easily in linear time T=O(n) by using quickselect to find the element k_m (since all the smaller elements will be on the left at the end of quickselect). So I'm assuming that K can contain any set of m indices.
One way to accomplish this is by running quickselect on all of K at the same time. Here is the algorithm
QuickselectMulti(A,K)
If K is empty, then return an empty result set
Pick a pivot p from A at random
Partition A into sets A0<p and A1>p.
i = A0.size + 1
if K contains i, then remove i from K and add (i=>p) to the result set.
Partition K into sets K0<i and K1>i
add QuickselectMulti(A0,K0) to the result set
subtract i from each k in K1
call QuickselectMulti(A1,K1), add i to each index of the output, and add this to the result set
return the result set
If K contains just one element, this is the same as randomized quickselect. To see why the running time is O(n log m) on average, first consider what happens when each pivot exactly splits both A and K in half. In this case, you get two recursive calls, so you have
T = n + 2T(n/2,m/2)
= n + n + 4T(n/4,m/4)
= n + n + n + 8T(n/8,m/8)
Since m drops in half each time, then n will show up log m times in this summation. To actually derive the expected running time requires a little more work, because you can't assume that the pivot will split both arrays in half, but if you work through the calculations, you will see that the running time is in fact O(n log m) on average.
On edit: The analysis of this algorithm can make this simpler by choosing the pivot by running p=Quickselect(A,k_i) where k_i is the middle element of K, rather than choosing p at random. This will guarantee that K gets split in half each time, and so the number of recursive calls will be exactly log m, and since quickselect runs in linear time, the result will still be O(n log m).

Efficient multiselection algorithm

I have to implement an algorithm that solves the multi-selection problem.
The multiselection problem is:
Given a set S of n elements drawn from a linearly ordered set, and a set K = {k1, k2,...,kr} of positive integers between 1 and n, the multiselection problem is to select the ki-th smallest element for all values of i, 1 <= i <= r
I need to solve the average case on Θ(n log r)
I've found a paper that implements the solution I need, but it assumes that there are no repeated numbers on the set S. The problem is that I can't assume that and I don't know how to adapt the algorithm of that paper to support repeated numbers.
The paper is here: http://www.ccse.kfupm.edu.sa/~suwaiyel/publications/multiselection_parCom.pdf
and the algorithm is on the second page. Any tips are welcome!
For posterity: the algorithm to which Ivan refers is to sort K, then solve the problem recursively as follows. Use QuickSelect to find the ki-th smallest element x where i is ceil(r/2), then recurse on the smaller halves of K and S, and the larger halves of K and S, splitting K about i and S about x.
Finding algorithms that work in the presence of degeneracy (here, equal elements) is often not a high priority for authors of theoretical works, because it makes the presentation of the common case more difficult and doesn't often play a role in determining the computational complexity of the problem. This is essentially a one-dimensional problem, and the black box solution is easy; replace the i-th element of the input yi by (yi, i) and break ties in the comparisons using the second component.
In practice, we can do better. Instead of recursing on {y : y in S, y < x} and {y : y in S, y > x}, use a three-way partitioning algorithm about x (see, e.g., every sufficiently complete treatment of QuickSort), then divide the array S by index instead of value.

Given a sorted array, find the maximum subarray of repeated values

Yet another interview question asked me to find the maximum possible subarray of repeated values given a sorted array in shortest computational time possible.
Let input array be A[1 ... n]
Find an array B of consecutive integers in A such that:
for x in range(len(B)-1):
B[x] == B[x+1]
I believe that the best algorithm is dividing the array in half and going from the middle outwards and comparing from the middle the integers with one another and finding the longest strain of the same integers from the middle. Then I would call the method recursively by dividing the array in half and calling the method on the two halves.
My interviewer said my algorithm is good but my analysis that the algorithm is O(logn) is incorrect but never got around to telling me what the correct answer is. My first question is what is the Big-O analysis of this algorithm? (Show as much work as possible please! Big-O is not my forte.) And my second question is purely for my curiosity whether there is an even more time efficient algorithm?
The best you can do for this problem is an O(n) solution, so your algorithm cannot possibly be both correct and O(lg n).
Consider for example, the case where the array contains no repeated elements. To determine this, one needs to examine every element, and examining every element is O(n).
This is a simple algorithm that will find the longest subsequence of a repeated element:
start = end = 0
maxLength = 0
i = 0
while i + maxLength < a.length:
if a[i] == a[i + maxLength]:
while i + maxLength < a.length and a[i] == a[i + maxLength]:
maxLength += 1
start = i
end = i + maxLength
i += maxLength
return a[start:end]
If you have reason to believe the subsequence will be long, you can set the initial value of maxLength to some heuristically selected value to speed things along, and then only look for shorter sequences if you don't find one (i.e. you end up with end == 0 after the first pass.)
I think we all agree that in the worst case scenario, where all of A is unique or where all of A is the same, you have to examine every element in the array to either determine there are no duplicates or determine all the array contains one number. Like the other posters have said, that's going to be O(N). I'm not sure divide & conquer helps you much with algorithmic complexity on this one, though you may be able to simplify the code a bit by using recursion. Divide & conquer really helps cut down on Big O when you can throw away large portions of the input (e.g. Binary Search), but in the case where you potentially have to examine all the input, it's not going to be much different.
I'm assuming the result here is you're just returning the size of the largest B you've found, though you could easily modify this to return B instead.
So on the algorithm front, given that A is sorted, I'm not sure there's going to be any answer faster/simpler answer than just walking through the array in order. It seems like the simplest answer is to have 2 pointers, one starting at index 0 and one starting at index 1. Compare them and then increment them both; each time they're the same you tick a counter upward to give you the current size of B and when they differ you reset that counter to zero. You also keep around a variable for the max size of a B you've found so far and update it every time you find a bigger B.
In this algorithm, n elements are visited with a constant number of calculations per each visited element, so the running time is O(n).
Given sorted array A[1..n]:
max_start = max_end = 1
max_length = 1
start = end = 1
while start < n
while A[start] == A[end] && end < n
end++
if end - start > max_length
max_start = start
max_end = end - 1
max_length = end - start
start = end
Assuming that the longest consecutive integers is only of length 1, you'll be scanning through the entire array A of n items. Thus, the complexity is not in terms of n, but in terms of len(B).
Not sure if the complexity is O(n/len(B)).
Checking the 2 edge case
- When n == len(B), you get instant result (only checking A[0] and A[n-1]
- When n == 1, you get O(n), checking all elements
- When normal case, I'm too lazy to write the algo to analyze...
Edit
Given that len(B) is not known in advance, we must take the worst case, i.e. O(n)

Resources