Interval selection algorithm with at most 2 overlaps

I am trying to solve this problem, in which N (N <= 1000) intervals are given, and the algorithm should compute the largest (by size) subset of intervals such that no three intervals in the subset share a common point. Right now I am looking only for a dynamic programming algorithm.
My idea was to sort the intervals by finishing time and then to consider the following subproblem:
A[i] - the solution to the problem (i.e. the maximum number of intervals) using only the first i intervals.
And now my question is about the recurrence: if the i-th interval is taken, I can't figure out what the recurrence should be.
Could someone explain where I went wrong (was it the subproblem definition, or am I simply missing something in the recurrence)?
EDIT (new idea?)
After some research, I have found this DP solution for the general Job Selection problem. So the next idea would be to have the q array, where q[i] = the index of the last interval that does not overlap with the i-th interval.
And then for sure, the formula for computing A[i] when the i-th interval is taken would be something like (note the missing part):
A[i] = 1 + A[q[i]] + [missing-part];
// 1 stands for the i-th interval
// A[q[i]] stands for the solution over the intervals that do not overlap with the i-th interval
// [missing-part] is the maximum number of intervals, among those that overlap the i-th interval, that are safe to add
The question remains: how to compute the missing part?
EDIT (greedy solution, not the wanted solution)
The greedy solution is pretty straightforward, very similar to the one for the Job Selection Problem, with the additional check that adding a new interval must not leave any point covered by three chosen intervals. One way that check could look is sketched below.
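Since the post does not spell the check out, here is one way it could look in Python, as a sketch (creates_triple and max_subset_no_triple are made-up names; intervals are closed pairs of integers). The check rejects a candidate whenever two already-chosen intervals overlap each other somewhere inside the candidate:

def creates_triple(chosen, s, f):
    # chosen intervals that overlap the candidate [s, f]
    over = [(a, b) for (a, b) in chosen if b >= s and a <= f]
    # a point covered three times exists iff two of these
    # also overlap each other somewhere inside [s, f]
    for i in range(len(over)):
        for j in range(i + 1, len(over)):
            lo = max(over[i][0], over[j][0], s)
            hi = min(over[i][1], over[j][1], f)
            if lo <= hi:
                return True
    return False

def max_subset_no_triple(intervals):
    # greedy: process by right endpoint, keep an interval whenever it
    # does not create a point shared by three chosen intervals
    chosen = []
    for s, f in sorted(intervals, key=lambda iv: iv[1]):
        if not creates_triple(chosen, s, f):
            chosen.append((s, f))
    return chosen

print(len(max_subset_no_triple([(1, 4), (2, 5), (3, 6), (5, 8)])))  # 3

The quadratic feasibility check is deliberately simple; for N <= 1000 a cleverer data structure is unlikely to be necessary.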

Related

Can I get an explanation for how optimal substructure is used to find the longest increasing subsequence in this powerpoint slide?

I'm learning about finding optimal solutions in my algorithms class at the moment and one of the topics is about finding optimal substructures in problems.
My understanding of it so far is that we see if we can find an optimal solution for a problem of size n. If we can, then we increase the size of the problem by 1 so it's n+1. If the optimal solution for n+1 includes the entire optimal solution of n plus the new solution introduced by the +1, then we have optimal substructure.
I was given an example of using optimal substructure to find the longest increasing subsequence given a set of numbers. This is shown on the powerpoint slide here:
Can someone explain to me the notation on the bottom of the slide and give me a proof that this problem can be solved using optimal substructure?
Lower(i) means a set of positions j in S to the left of the current index i such that Sj is less than Si. In other words, elements Sj and Si are in increasing order, even though there may be other elements in between them.
The expression with the brace on the left explains how we construct the answer:
The first line says that if the set Lower(i) is empty (i.e. Si is the largest number in the sequence so far), then the answer is 1. This is the base case: a single number is treated as a one-element sequence.
The second line says that if Lower(i) is not empty, then we take the largest value over all j in Lower(i) and add 1. In other words, we look to the left of the number Si for numbers Sj that are smaller than Si, and among those pick the one that ends the longest ascending subsequence.
All of this is an incredibly long way of writing these six lines of pseudocode:
L[0] = 1
for i = 1..N-1
    L[i] = 1
    for j = 0..i-1
        if S[i] > S[j]             // is j a member of Lower(i)?
            L[i] = MAX(L[i], L[j] + 1)
Just to add to @dasblinkenlight's answer:
This is an iterative approach based on optimal substructure because at any given iteration i, we figure out the length of the longest increasing subsequence ending at index i. By the time we reach this iteration, all corresponding LIS values are already established for every index j < i. Using this information we find the answer for index i, then i+1, and so on. Now, the original question asks for the LIS of the whole sequence, but every increasing subsequence has to end at some index, so it is enough to take the maximum LIS over all ending indexes.
Such an approach is strongly correlated with mathematical induction and with the quite broad programming/algorithm method of dynamic programming.
P.S.
There exists another, slightly more complicated approach, which computes the LIS more efficiently using binary search. The algorithm from the slides is O(n^2), while an O(n log n) algorithm exists as well.
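For reference, a short Python sketch of that O(n log n) variant (lis_length is a made-up name). tails[k] holds the smallest possible tail of an increasing subsequence of length k+1 seen so far, and binary search finds where each new element fits:

import bisect

def lis_length(s):
    tails = []   # tails[k] = smallest tail of an increasing subsequence of length k+1
    for x in s:
        k = bisect.bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)     # x extends the longest subsequence so far
        else:
            tails[k] = x        # x gives a smaller tail for length k+1
    return len(tails)

print(lis_length([3, 1, 4, 1, 5, 9, 2, 6]))  # 4, e.g. 1, 4, 5, 9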

Choose which tasks to do to earn the biggest amount of money possible

I can't get my head around this problem:
You're given n tasks
Each task has a time t, which represents the time needed to finish it (t[i] is the time needed for the i-th task)
r[i] represents the deadline for the i-th task (we start at time = 0, and r[i] is an int that represents when the task must be finished)
if the task is done before its deadline, you get a reward, which is p[i] for task i
You need to calculate the maximum reward you can get by doing a combination of these tasks
A reward is given only if a deadline is met
All values are integers
The solution must be of the lowest complexity possible
I tried to apply the greedy method, but I realized that the algorithm wouldn't always give the optimal solution. I can write a brute-force algorithm, but that's not the point. I think dynamic programming can be used here, but I don't know how.
If all the times are equal, then the system can be shown to be a matroid and the greedy algorithm will give optimal results (see chapter 16 of "Introduction To Algorithms" by Charles E. Leiserson, Thomas H. Cormen, Clifford Stein, and Ronald Rivest).
However, if the times are not equal, then in general the problem is NP-hard.
To see why, consider the case when all the deadlines are the same fixed value. Then the problem is equivalent to finding the best way of packing items into a fixed time budget, i.e. to the knapsack problem, which is known to be NP-complete.
In your particular case, the times are integers so you may be able to adapt the dynamic programming approach for Knapsack.
I would recommend attempting to solve this using dynamic programming based on the subproblem f(t) which is the smallest penalty for scheduling all tasks with deadlines less than or equal to t.
I solved it using dynamic programming. f[i] is the maximum reward you can get by the i-th hour. The solution is f[max(r)], where max(r) is the largest of the deadlines.
In my solution, you also need a list x[i], where x[i] represents the optimal list of tasks you should do by the i-th hour to get the biggest reward.
Here's the pseudocode:
LIST x[0..max(r)] = empty;   // x[0], x[1], ..., x[max(r)] are all different lists
f[0] = 0;
for i = 1 to max(r) do
    max = f[i-1];
    x[i] = x[i-1];
    for j = 1 to n do
        if t[j] <= i and r[j] >= i and j.isNotElementOf(x[i-t[j]]) then
            reward = p[j] + f[i-t[j]];
            if reward > max then
                max = reward;
                x[i] = copyOf(x[i-t[j]]);
                x[i].add(j);
    f[i] = max;
return f[max(r)];
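A direct Python transcription of the pseudocode above, under the same conventions (max_reward is a name chosen for this sketch; tasks are 0-indexed). It stores the chosen-task set per hour, trading memory for simplicity:

def max_reward(t, r, p):
    # t[j], r[j], p[j]: duration, deadline and reward of task j
    horizon = max(r)
    f = [0] * (horizon + 1)                  # f[i] = best reward by hour i
    x = [set() for _ in range(horizon + 1)]  # x[i] = tasks chosen for f[i]
    for i in range(1, horizon + 1):
        best, best_set = f[i - 1], x[i - 1]
        for j in range(len(t)):
            # task j fits into [i - t[j], i], meets its deadline, and is
            # not already part of the partial schedule it would extend
            if t[j] <= i and r[j] >= i and j not in x[i - t[j]]:
                reward = p[j] + f[i - t[j]]
                if reward > best:
                    best = reward
                    best_set = x[i - t[j]] | {j}   # copy, then add task j
        f[i], x[i] = best, best_set
    return f[horizon]

print(max_reward([1, 2], [1, 3], [5, 6]))  # 11: task 0 in hour 1, task 1 in hours 2-3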

Google Interview : Find the maximum sum of a polygon [closed]

Given a polygon with N vertices and N edges. There is an integer (possibly negative) on every vertex and an operation from the set (*, +) on every edge. Each time, we remove an edge E from the polygon and merge the two vertices linked by it (V1, V2) into a new vertex with value V1 op(E) V2. The last case would be two vertices joined by two edges; the result is the bigger of the two.
Return the maximum result value that can be obtained from a given polygon.
For the last case we might not need to merge at all, as the other number could be negative, so in that case we would just return the larger number.
How I am approaching the problem:
p[i,j] denotes the maximum value we can obtain by merging the nodes labelled i to j.
p[i,i] = v[i] -- base case
p[i,j] = p[i,k] (operator in between) p[k+1,j], for k between i and j-1.
and then p[0,n] will be my answer.
Second point: I will have to start from every vertex and repeat the above, since this is cyclic with n vertices and n edges.
The time complexity of this is n^3 * n, i.e. n^4.
Can I do better than this?
As you have identified (tagged) correctly, this indeed is very similar to the matrix chain multiplication problem (in what order do I multiply matrices in order to do it quickly).
This can be solved polynomially using a dynamic algorithm.
I'm going to instead solve a similar, more classic (and essentially equivalent) problem: given a formula with numbers, additions and multiplications, what way of parenthesizing it gives the maximal value? For example:
6+1*2 becomes (6+1)*2, which is more than 6+(1*2).
Let us denote our input a1, ..., an (real numbers) and o(1), ..., o(n-1) (each either * or +). Our approach works as follows: we observe the subproblem F(i,j), which represents the maximal value (after parenthesizing) of the sub-formula ai, ..., aj. We create a table of such subproblems and observe that F(1,n) is exactly the result we are looking for.
Define
F(i,j):
- If i > j, return 0                  // no sub-formula of negative length
- If i = j, return ai                 // the maximal formula for one number is the number itself
- If i < j, return the maximal value, over all m between i (included) and j (not included), of:
      F(i,m) o(m) F(m+1,j)            // check all places for a possible parenthesis split
This goes through all possible options. The proof of correctness is done by induction on the size n = j - i and is pretty trivial.
Let's go through the runtime analysis:
If we do not save the values of smaller subproblems, this runs pretty slowly; however, with dynamic programming we can make the algorithm perform relatively fast, in O(n^3).
We create an n*n table T in which the cell at index (i,j) contains F(i,j). Filling F(i,i), and F(i,j) for j smaller than i, is done in O(1) per cell, since we can calculate these values directly. We then fill the table diagonal by diagonal: every cell on a diagonal only depends on values from earlier diagonals, which are already known. Filling one cell takes O(n) (it checks O(n) split points), and there are O(n^2) cells, so we fill the whole table in O(n^3). After filling the table we obviously know F(1,n), which is the solution to your problem.
Now back to your problem
If you translate the polygon into n different formulas (one for each starting vertex) and run the formula algorithm on each, the largest of those values is exactly the value you want.
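Here is a Python sketch of the whole approach (function names are made up). One caveat the outline above glosses over: with negative numbers the maximal value of a product can come from two minimal sub-values, so the table tracks a minimum alongside each maximum; for strictly positive inputs the max table alone would suffice:

def best_value(a, ops):
    # a[0..n-1]: numbers; ops[m]: operator between a[m] and a[m+1]
    n = len(a)
    hi = [[None] * n for _ in range(n)]   # hi[i][j] = max value of a[i..j]
    lo = [[None] * n for _ in range(n)]   # lo[i][j] = min value of a[i..j]
    for i in range(n):
        hi[i][i] = lo[i][i] = a[i]
    for length in range(2, n + 1):        # fill the table diagonal by diagonal
        for i in range(n - length + 1):
            j = i + length - 1
            cands = []
            for m in range(i, j):         # try every split point
                for u in (hi[i][m], lo[i][m]):
                    for v in (hi[m + 1][j], lo[m + 1][j]):
                        cands.append(u + v if ops[m] == '+' else u * v)
            hi[i][j], lo[i][j] = max(cands), min(cands)
    return hi[0][n - 1]

def polygon_max(values, edge_ops):
    # edge_ops[m] sits between vertex m and vertex (m+1) % n;
    # removing edge k cuts the cycle open into a linear formula
    n = len(values)
    best = None
    for k in range(n):
        order = [(k + 1 + t) % n for t in range(n)]
        a = [values[v] for v in order]
        ops = [edge_ops[order[t]] for t in range(n - 1)]
        val = best_value(a, ops)
        if best is None or val > best:
            best = val
    return best

print(polygon_max([1, 2, 3], ['+', '*', '+']))  # 9, via (1+2)*3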
I think you can reduce the need for a brute force search. For example: if there is a chain of
x + y + z
You can replace it with a single vertex whose value is the sum; you can't do better than that. You need to do the multiplying after the addition when you're dealing with +ve integers. So if it's all positive, then simply collapse all + chains and then multiply.
So that leaves the cases where there are -ve numbers. Seems to me that the strategy for a single -ve number is pretty obvious; for two -ve numbers there are a few cases (remembering that - x - is positive), and for more than 2 -ve numbers it seems to get tricky :-)

Partition a set into k groups with minimum number of moves

You have a set of n objects for which integer positions are given. A group of objects is a set of objects at the same position (not necessarily all the objects at that position: there might be multiple groups at a single position). The objects can be moved to the left or right, and the goal is to move these objects so as to form k groups, and to do so with the minimum distance moved.
For example:
With initial positions at [4,4,7], and k = 3: the minimum cost is 0.
[4,4,7] and k = 2: minimum cost is 0
[1,2,5,7] and k = 2: minimum cost is 1 + 2 = 3
I've been trying to use a greedy approach (by calculating which move would be shortest), but that wouldn't work, because every move involves two elements, each of which could be moved either way. I haven't been able to formulate a dynamic programming approach yet, but I'm working on it.
This problem is a one-dimensional instance of the k-medians problem, which can be stated as follows. Given a set of points x_1...x_n, partition these points into k sets S_1...S_k and choose k locations y_1...y_k in a way that minimizes the sum over all x_i of |x_i - y_f(i)|, where y_f(i) is the location corresponding to the set to which x_i is assigned.
Due to the fact that the median is the population minimizer for absolute distance (i.e. L_1 norm), it follows that each location y_j will be the median of the elements x in the corresponding set S_j (hence the name k-medians). Since you are looking at integer values, there is the technicality that if S_j contains an even number of elements, the median might not be an integer, but in such cases choosing either the next integer above or below the median will give the same sum of absolute distances.
The standard heuristic for solving k-medians (and the related and more common k-means problem) is iterative, but this is not guaranteed to produce an optimal or even good solution. Solving the k-medians problem for general metric spaces is NP-hard, and finding efficient approximations for k-medians is an open research problem. Googling "k-medians approximation", for example, will lead to a bunch of papers giving approximation schemes.
http://www.cis.upenn.edu/~sudipto/mypapers/kmedian_jcss.pdf
http://graphics.stanford.edu/courses/cs468-06-winter/Papers/arr-clustering.pdf
In one dimension things become easier, and you can use a dynamic programming approach. A DP solution to the related one-dimensional k-means problem is described in this paper, and the source code in R is available here. See the paper for details, but the idea is essentially the same as what @SajalJain proposed, and it can easily be adapted to solve the k-medians problem rather than k-means. For j<=k and m<=n let D(j,m) denote the cost of an optimal j-medians solution to x_1...x_m, where the x_i are assumed to be in sorted order. We have the recurrence
D(j,m) = min over q of ( D(j-1,q) + Cost(x_{q+1},...,x_m) )
where q ranges from j-1 to m-1 and Cost is the sum of absolute distances from the median. With a naive O(n) implementation of Cost, this would yield an O(n^3 k) DP solution to the whole problem. However, this can be improved to O(n^2 k), because Cost can be updated in constant time rather than computed from scratch every time, using the fact that, for a sorted sequence:
Cost(x_1,...,x_h) = Cost(x_2,...,x_h) + median(x_1...x_h)-x_1 if h is odd
Cost(x_1,...,x_h) = Cost(x_2,...,x_h) + median(x_2...x_h)-x_1 if h is even
See the writeup for more details. Except for the fact that the update of the Cost function is different, the implementation will be the same for k-medians as for k-means.
http://journal.r-project.org/archive/2011-2/RJournal_2011-2_Wang+Song.pdf
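Here is a Python sketch of this DP (the names are mine, not from the paper). Instead of the paper's constant-time incremental update it uses prefix sums, which also makes each Cost query O(1) and keeps the whole thing O(n^2 k):

def k_medians(points, k):
    # points: integer positions; returns the minimum total move distance
    x = sorted(points)
    n = len(x)
    pre = [0]
    for v in x:
        pre.append(pre[-1] + v)        # pre[i] = x[0] + ... + x[i-1]

    def cost(l, r):
        # sum of |x[i] - median| for i in l..r (inclusive), via prefix sums
        m = (l + r) // 2
        left = x[m] * (m - l + 1) - (pre[m + 1] - pre[l])
        right = (pre[r + 1] - pre[m + 1]) - x[m] * (r - m)
        return left + right

    INF = float('inf')
    # D[j][m] = optimal cost of grouping x[0..m-1] into j groups
    D = [[INF] * (n + 1) for _ in range(k + 1)]
    D[0][0] = 0
    for j in range(1, k + 1):
        for m in range(1, n + 1):
            for q in range(j - 1, m):  # the last group is x[q..m-1]
                c = D[j - 1][q] + cost(q, m - 1)
                if c < D[j][m]:
                    D[j][m] = c
    return D[k][n]

print(k_medians([1, 2, 5, 7], 2))  # 3, matching the example in the question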
As I understand it, the problem is:
we have n points on a line.
we want to place k positions on the line. I call them destinations.
move each of the n points to one of the k destinations so that the sum of distances is minimum. I call this sum the total cost.
destinations can overlap.
An obvious fact is that for each point we should look at the nearest destination on its left and the nearest destination on its right, and choose the nearer of the two.
Another important fact is that all destinations should sit on points, because a destination can be slid left or right along the line until it reaches a point without increasing the total distance.
Based on these facts, consider the following DP solution:
DP[i][j] means the minimum total cost needed for the first i points, when we can use only j destinations and have to put a destination on the i-th point.
To calculate DP[i][j], fix the destination that comes before the i-th point (we have i choices), and for each choice (say, the k-th point) calculate the distance needed for the points between the i-th point and that earlier destination, each going to the nearer of the two. Add this to DP[k][j-1] and take the minimum over all k.
The calculation of the initial states (e.g. j = 1) and of the final answer is left as an exercise!
Task 0 - sort the positions of the objects in non-decreasing order.
Let us define the 'center' of a group as the position its objects are shifted to.
Now we have two observations:
For N positions, the 'center' is a median of those positions, since the median minimizes the sum of absolute distances (the mean does not, in general). Example: let 1,3,6,10 be the positions. A median position is 3 or 6, and either gives the minimum cost (12) of moving all elements into one group. This gives us the position with minimum cost of moving when all elements need to be grouped into 1 group.
Let N positions be grouped into K groups "optimally". When the (N+1)-th object (next in sorted order) is added, it will disturb only the K-th group, i.e., the first K-1 groups will remain unchanged.
From these observations, we build a dynamic programming approach.
Let Cost[i][k] and Center[i][k] be two 2D arrays:
Cost[i][k] = minimum cost when the first i objects are partitioned into k groups
Center[i][k] stores the center of the i-th object when Cost[i][k] is computed.
Let L be such that the elements i-L, i-L+1, ..., i-1 all share the same center:
(Center[i-L][k] = Center[i-L+1][k] = ... = Center[i-1][k]). These are the only objects that need to be considered in the computation for the i-th element (from observation 2).
Now
Cost[i][k] will be
min(Cost[i-1][k-1] , Cost[i-L-1][k-1] + computecost(i-L, i-L+1, ... ,i))
Update Center[i-L ... i][k]
computecost() can be found trivially by finding the center (from observation 1)
Time complexity:
Sorting: O(N log N)
Filling the cost matrix: (number of cells) x (cost of computecost) = O(NK * N)
Total: O(N log N + N^2 K) = O(N^2 K)
Let's look at k=1.
For k=1 and n odd, all points should move to the center point. For k=1 and n even, all points should move to either of the two center points, or to any spot between them. By 'center' I mean center in terms of the number of points on either side, i.e. the median.
You can see this because if you select a target spot x with more points to its right than to its left, then a target one step to the right of x would reduce the cost (unless there is exactly one more point on the right than on the left and the target spot is a point, in which case n is even and the target is on/between the two center points).
If your points are already sorted, finding the median is an O(1) operation. If not, I believe it's O(n) (via an order-statistic selection algorithm).
Once you've found the spot that all points are moving to, it's O(n) to find the cost.
Thus regardless of whether the points are sorted or not, this is O(n).
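For concreteness, here is the k=1 case in a few lines of Python (one_group_cost is a name made up for this sketch); with sorting it is O(n log n), and a linear-time selection algorithm for the median would give the O(n) bound mentioned above:

def one_group_cost(points):
    # moving every point to a median minimizes the total distance;
    # for even n, any spot between the two middle points does equally well
    x = sorted(points)
    med = x[len(x) // 2]
    return sum(abs(v - med) for v in x)

print(one_group_cost([1, 3, 6, 10]))  # 12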

Finding the minimal coverage of an interval with subintervals

Suppose I have an interval (a,b), and a number of subintervals {(ai,bi)}i whose union is all of (a,b). Is there an efficient way to choose a minimal-cardinality subset of these subintervals which still covers (a,b)?
A greedy algorithm starting at a or b always gives the optimal solution.
Proof: consider the set Sa of all the subintervals covering a. Clearly, one of them has to belong to the optimal solution. If we replace it with a subinterval (amax,bmax) from Sa whose right endpoint bmax is maximal in Sa (reaches furthest to the right), the remaining uncovered interval (bmax,b) will be a subset of the remaining interval from the optimal solution, so it can be covered with no more subintervals than the analogous uncovered interval from the optimal solution. Therefore, a solution constructed from (amax,bmax) and the optimal solution for the remaining interval (bmax,b) will also be optimal.
So, just start at a and iteratively pick the interval reaching furthest to the right among those that cover the end of the previously picked interval; repeat until you hit b. I believe that picking the next interval can be done in O(log n) if you store the intervals in an augmented interval tree. A sketch of this greedy follows.
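Here is a Python sketch of that greedy (min_cover is a made-up name; intervals are treated as closed pairs); it uses a plain sort by left endpoint instead of an interval tree, which already gives O(n log n) overall:

def min_cover(a, b, intervals):
    # greedy: repeatedly take the interval that starts at or before the
    # current frontier and reaches furthest to the right
    rest = sorted(intervals)              # sort by left endpoint
    chosen, point, i = [], a, 0
    while point < b:
        best = None
        while i < len(rest) and rest[i][0] <= point:
            if best is None or rest[i][1] > best[1]:
                best = rest[i]            # furthest-reaching candidate
            i += 1
        if best is None or best[1] <= point:
            return None                   # gap: (a, b) cannot be covered
        chosen.append(best)
        point = best[1]                   # advance the frontier
    return chosen

print(min_cover(0, 10, [(0, 4), (2, 9), (3, 6), (8, 10)]))
# [(0, 4), (2, 9), (8, 10)]

Skipped candidates are safe to discard: anything scanned past reaches at most as far as the current frontier, so it can never help later.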
Sounds like dynamic programming.
Here's an illustration of the algorithm (assume intervals are in a list sorted by ending time):
// works backwards from the end
int minCard(int current, int must_end_after)
{
    if (current < 0)
        if (must_end_after <= a)   // reached the start of (a,b)
            return 0;              // no more intervals needed
        else
            return infinity;       // doesn't cover (a,b)
    if (intervals[current].end < must_end_after)
        return infinity;           // all remaining intervals end too early
    return min( 1 + minCard(current - 1, intervals[current].start),  // include current interval
                minCard(current - 1, must_end_after) );              // or not?
}
But it should also involve caching (memoisation), otherwise the recursion repeats the same subproblems exponentially often; a memoised version is sketched below.
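A Python sketch of the same recursion with memoisation (min_cover_count is a made-up name). Since must_end_after only ever takes the value b or some interval's start, the cache stays polynomial in size:

from functools import lru_cache

def min_cover_count(a, b, intervals):
    ivs = sorted(intervals, key=lambda iv: iv[1])   # sorted by ending time
    INF = float('inf')

    @lru_cache(maxsize=None)
    def min_card(current, must_end_after):
        if current < 0:
            return 0 if must_end_after <= a else INF   # did we reach a?
        if ivs[current][1] < must_end_after:
            return INF          # everything left ends too early: a gap
        return min(1 + min_card(current - 1, ivs[current][0]),  # take it
                   min_card(current - 1, must_end_after))       # skip it

    ans = min_card(len(ivs) - 1, b)
    return ans if ans < INF else None

print(min_cover_count(0, 10, [(0, 4), (2, 9), (3, 6), (8, 10)]))  # 3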
There are two cases to consider:
Case 1: There are no overlapping intervals after the finish time of an interval. In this case, pick the next interval with the smallest starting time and the longest finishing time: (amin, bmax).
Case 2: One or more intervals overlap with the last interval you're looking at. In this case the start time doesn't matter, because you've already covered up to there. So optimize for the finishing time: (a, bmax).
Case 1 always picks the first interval as the first interval in the optimal set as well (the proof is the same as the one @RafalDowgrid provided).
You mean so that the subintervals still overlap in such a way that (a,b) remains completely covered at all points?
Maybe split the subintervals themselves into basic blocks, keeping track of where each block came from, so you can list the options for each basic-block interval while accounting for the other regions covered by the same subinterval. Then you can run a search over each sub-subinterval and at least be sure no gaps are left.
Then you would need to search... efficiently... which would be harder.
You could eliminate any collection of intervals that is entirely covered by a smaller set of other intervals, and work the problem after that preprocessing.
Wouldn't the minimal solution for the whole be minimal for at least one half? I'm not sure.
Found a link to a journal but couldn't read it. :(
This would be a hitting set problem and be NP-hard in general.
Couldn't read this either, but it looks like the opposite kind of problem.
Couldn't read it, but here is another link that mentions splitting intervals up.
Here is an available reference on Randomized Algorithms for Geometric Optimization Problems.
Page 35 of this PDF has a greedy algorithm.
Page 11 of Karp (1972) mentions hitting set and is cited a lot.
Google result. Researching was fun, but I have to go now.
