Given N jobs where every job is represented by following three elements of it.
1) Start Time
2) Finish Time.
3) Profit or Value Associated.
Find the maximum profit subset of jobs such that no two jobs in the subset overlap.
I know a dynamic programming solution which has a complexity of O(N^2) (close to LIS where we have to just check the previous elements with which we can merge the current interval and take the interval whose merging gives maximum till the i th element ).This solution can be further improved to O(N*log N ) using Binary search and simple sorting!
But my friend was telling me that it can be even solved by using Segment Trees and binary search! I have no clue as to where I am going to use Segment Tree and how .??
Can you help?
On request,sorry not commented
What I am doing is sorting on the basis of the starting index, storing the maximum obtainable value till i at DP[i] by merging previous intervals and their maximum obtainable value !
void solve()
{
int n,i,j,k,high;
scanf("%d",&n);
pair < pair < int ,int>, int > arr[n+1];// first pair represents l,r and int alone shows cost
int dp[n+1];
memset(dp,0,sizeof(dp));
for(i=0;i<n;i++)
scanf("%d%d%d",&arr[i].first.first,&arr[i].first.second,&arr[i].second);
std::sort(arr,arr+n); // by default sorting on the basis of starting index
for(i=0;i<n;i++)
{
high=arr[i].second;
for(j=0;j<i;j++)//checking all previous mergable intervals //Note we will use DP[] of the mergable interval due to optimal substructure
{
if(arr[i].first.first>=arr[j].first.second)
high=std::max(high , dp[j]+arr[i].second);
}
dp[i]=high;
}
for(i=0;i<n;i++)
dp[n-1]=std::max(dp[n-1],dp[i]);
printf("%d\n",dp[n-1]);
}
int main()
{solve();return 0;}
EDIT:
My working code finally took me 3 hours to debug it though! Morover this code is slower than the binary search and sorting one due to a larger constant and bad implementation :P (just for reference)
#include<stdio.h>
#include<algorithm>
#include<vector>
#include<cstring>
#include<iostream>
#include<climits>
#define lc(idx) (2*idx+1)
#define rc(idx) (2*idx+2)
#define mid(l,r) ((l+r)/2)
using namespace std;
int Tree[4*2*10000-1];
void update(int L,int R,int qe,int idx,int value)
{
if(value>Tree[0])
Tree[0]=value;
while(L<R)
{
if(qe<= mid(L,R))
{
idx=lc(idx);
R=mid(L,R);
}
else
{
idx=rc(idx);
L=mid(L,R)+1;
}
if(value>Tree[idx])
Tree[idx]=value;
}
return ;
}
int Get(int L,int R,int idx,int q)
{
if(q<L )
return 0;
if(R<=q)
return Tree[idx];
return max(Get(L,mid(L,R),lc(idx),q),Get(mid(L,R)+1,R,rc(idx),q));
}
bool cmp(pair < pair < int , int > , int > A,pair < pair < int , int > , int > B)
{
return A.first.second< B.first.second;
}
int main()
{
int N,i;
scanf("%d",&N);
pair < pair < int , int > , int > P[N];
vector < int > V;
for(i=0;i<N;i++)
{
scanf("%d%d%d",&P[i].first.first,&P[i].first.second,&P[i].second);
V.push_back(P[i].first.first);
V.push_back(P[i].first.second);
}
sort(V.begin(),V.end());
for(i=0;i<N;i++)
{
int &l=P[i].first.first,&r=P[i].first.second;
l=lower_bound(V.begin(),V.end(),l)-V.begin();
r=lower_bound(V.begin(),V.end(),r)-V.begin();
}
sort(P,P+N,cmp);
int ans=0;
memset(Tree,0,sizeof(Tree));
for(i=0;i<N;i++)
{
int aux=Get(0,2*N-1,0,P[i].first.first)+P[i].second;
if(aux>ans)
ans=aux;
update(0,2*N-1,P[i].first.second,0,ans);
}
printf("%d\n",ans);
return 0;
}
high=arr[i].second;
for(j=0;j<i;j++)//checking all previous mergable intervals //Note we will use DP[] of the mergable interval due to optimal substructure
{
if(arr[i].first.first>=arr[j].first.second)
high=std::max(high, dp[j]+arr[i].second);
}
dp[i]=high;
This can be done in O(log n) with a segment tree.
First of all, let's rewrite it a bit. The max you are taking is a bit complicated, because it takes the maximum of a sum involving both i and j. But i is constant in this part, so let's take it out.
high=dp[0];
for(j=1;j<i;j++)//checking all previous mergable intervals //Note we will use DP[] of the mergable interval due to optimal substructure
{
if(arr[i].first.first>=arr[j].first.second)
high=std::max(high, dp[j]);
}
dp[i]=high + arr[i].second;
Great, now we have reduced the problem to determining the maximum in [0, i - 1] out of the values that satisfy your if condition.
If we didn't have the if, it would be a simple application of segment trees.
Now there are two choices.
1. Deal with O(log V) query time and O(V) memory for the segment tree
Where V is the maximum size of an interval's endpoint.
You can build a segment tree to which you insert interval start points as you move your i. Then you query over the range of values. Something like this, where the segment tree is initialized to -infinity and of size O(V).
Update(node, index, value):
if node.associated_interval == [index, index]:
node.max = value
return
if index in node.left.associated_interval:
Update(node.left, index, value)
else:
Update(node.right, index, value)
node.max = max(node.left.max, node.right.max)
Query(node, left, right):
if [left, right] does not intersect node.associated_interval:
return -infinity
if node.associated_interval included in [left, right]:
return node.max
return max(Query(node.left, left, right),
Query(node.right, left, right))
[...]
high=Query(tree, 0, arr[i].first.first)
dp[i]=high + arr[i].second;
Update(tree, arr[i].first.first, dp[i])
2. Reducing to O(log n) query time and O(n) memory for the segment tree
Since the number of intervals might be significantly less than their length, it's reasonable to think that we might be able to encode them better somehow, so that their length is also O(n). Indeed, we can.
This involves normalizing your intervals in the range [1, 2*n]. Consider the following intervals
8 100
3 50
90 92
Let's plot them on a line. They'd look like this:
3 8 50 90 92 100
Now replace each of them with their index:
1 2 3 4 5 6
3 8 50 90 92 100
And write your new intervals:
2 6
1 3
4 5
Note that they retain the properties of your initial intervals: the same ones overlap, the same ones are included in each other etc.
This can be done with a sort. You can now apply the same segment tree algorithm, except you declare the segment tree for the size 2*n.
[Updates at bottom (including solution source code)]
I have a challenging business problem that a computer can help solve.
Along a mountainous region flows a long winding river with strong currents. Along certain parts of the river are plots of environmentally sensitive land suitable for growing a particular type of rare fruit that is in very high demand. Once field laborers harvest the fruit, the clock starts ticking to get the fruit to a processing plant. It's very costly to try and send the fruits upstream or over land or air. By far the most cost effective mechanism to ship them to the plant is downstream in containers powered solely by the river's constant current. We have the capacity to build 10 processing plants and need to locate these along the river to minimize the total time the fruits spend in transit. The fruits can take however long before reaching the nearest downstream plant but that time directly hurts the price at which they can be sold. Effectively, we want to minimize the sum of the distances to the nearest respective downstream plant. A plant can be located as little as 0 meters downstream from a fruit access point.
The question is: In order to maximize profits, how far up the river should we build the 10 processing plants if we have found 32 fruit growing regions, where the regions' distances upstream from the base of the river are (in meters):
10, 40, 90, 160, 250, 360, 490, ... (n^2)*10 ... 9000, 9610, 10320?
[It is hoped that all work going towards solving this problem and towards creating similar problems and usage scenarios can help raise awareness about and generate popular resistance towards the damaging and stifling nature of software/business method patents (to whatever degree those patents might be believed to be legal within a locality).]
UPDATES
Update1: Forgot to add: I believe this question is a special case of this one.
Update2: One algorithm I wrote gives an answer in a fraction of a second, and I believe is rather good (but it's not yet stable across sample values). I'll give more details later, but the short is as follows. Place the plants at equal spacings. Cycle over all the inner plants where at each plant you recalculate its position by testing every location between its two neighbors until the problem is solved within that space (greedy algorithm). So you optimize plant 2 holding 1 and 3 fixed. Then plant 3 holding 2 and 4 fixed... When you reach the end, you cycle back and repeat until you go a full cycle where every processing plant's recalculated position stops varying.. also at the end of each cycle, you try to move processing plants that are crowded next to each other and are all near each others' fruit dumps into a region that has fruit dumps far away. There are many ways to vary the details and hence the exact answer produced. I have other candidate algorithms, but all have glitches. [I'll post code later.] Just as Mike Dunlavey mentioned below, we likely just want "good enough".
To give an idea of what might be a "good enough" result:
10010 total length of travel from 32 locations to plants at
{10,490,1210,1960,2890,4000,5290,6760,8410,9610}
Update3: mhum gave the correct exact solution first but did not (yet) post a program or algorithm, so I wrote one up that yields the same values.
/************************************************************
This program can be compiled and run (eg, on Linux):
$ gcc -std=c99 processing-plants.c -o processing-plants
$ ./processing-plants
************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
//a: Data set of values. Add extra large number at the end
int a[]={
10,40,90,160,250,360,490,640,810,1000,1210,1440,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240,99999
};
//numofa: size of data set
int numofa=sizeof(a)/sizeof(int);
//a2: will hold (pt to) unique data from a and in sorted order.
int *a2;
//max: size of a2
int max;
//num_fixed_loc: at 10 gives the solution for 10 plants
int num_fixed_loc;
//xx: holds index values of a2 from the lowest error winner of each cycle memoized. accessed via memoized offset value. Winner is based off lowest error sum from left boundary upto right ending boundary.
//FIX: to be dynamically sized.
int xx[1000000];
//xx_last: how much of xx has been used up
int xx_last=0;
//SavedBundle: data type to "hold" memoized values needed (total traval distance and plant locations)
typedef struct _SavedBundle {
long e;
int xx_offset;
} SavedBundle;
//sb: (pts to) lookup table of all calculated values memoized
SavedBundle *sb; //holds winning values being memoized
//Sort in increasing order.
int sortfunc (const void *a, const void *b) {
return (*(int *)a - *(int *)b);
}
/****************************
Most interesting code in here
****************************/
long full_memh(int l, int n) {
long e;
long e_min=-1;
int ti;
if (sb[l*max+n].e) {
return sb[l*max+n].e; //convenience passing
}
for (int i=l+1; i<max-1; i++) {
e=0;
//sum first part
for (int j=l+1; j<i; j++) {
e+=a2[j]-a2[l];
}
//sum second part
if (n!=1) //general case, recursively
e+=full_memh(i, n-1);
else //base case, iteratively
for (int j=i+1; j<max-1; j++) {
e+=a2[j]-a2[i];
}
if (e_min==-1) {
e_min=e;
ti=i;
}
if (e<e_min) {
e_min=e;
ti=i;
}
}
sb[l*max+n].e=e_min;
sb[l*max+n].xx_offset=xx_last;
xx[xx_last]=ti; //later add a test or a realloc, etc, if approp
for (int i=0; i<n-1; i++) {
xx[xx_last+(i+1)]=xx[sb[ti*max+(n-1)].xx_offset+i];
}
xx_last+=n;
return e_min;
}
/*************************************************************
Call to calculate and print results for given number of plants
*************************************************************/
int full_memoization(int num_fixed_loc) {
char *str;
long errorsum; //for convenience
//Call recursive workhorse
errorsum=full_memh(0, num_fixed_loc-2);
//Now print
str=(char *) malloc(num_fixed_loc*20+100);
sprintf (str,"\n%4d %6d {%d,",num_fixed_loc-1,errorsum,a2[0]);
for (int i=0; i<num_fixed_loc-2; i++)
sprintf (str+strlen(str),"%d%c",a2[ xx[ sb[0*max+(num_fixed_loc-2)].xx_offset+i ] ], (i<num_fixed_loc-3)?',':'}');
printf ("%s",str);
return 0;
}
/**************************************************
Initialize and call for plant numbers of many sizes
**************************************************/
int main (int x, char **y) {
int t;
int i2;
qsort(a,numofa,sizeof(int),sortfunc);
t=1;
for (int i=1; i<numofa; i++)
if (a[i]!=a[i-1])
t++;
max=t;
i2=1;
a2=(int *)malloc(sizeof(int)*t);
a2[0]=a[0];
for (int i=1; i<numofa; i++)
if (a[i]!=a[i-1]) {
a2[i2++]=a[i];
}
sb = (SavedBundle *)calloc(sizeof(SavedBundle),max*max);
for (int i=3; i<=max; i++) {
full_memoization(i);
}
free(sb);
return 0;
}
Let me give you a simple example of a Metropolis-Hastings algorithm.
Suppose you have a state vector x, and a goodness-of-fit function P(x), which can be any function you care to write.
Suppose you have a random distribution Q that you can use to modify the vector, such as x' = x + N(0, 1) * sigma, where N is a simple normal distribution about 0, and sigma is a standard deviation of your choosing.
p = P(x);
for (/* a lot of iterations */){
// add x to a sample array
// get the next sample
x' = x + N(0,1) * sigma;
p' = P(x');
// if it is better, accept it
if (p' > p){
x = x';
p = p';
}
// if it is not better
else {
// maybe accept it anyway
if (Uniform(0,1) < (p' / p)){
x = x';
p = p';
}
}
}
Usually it is done with a burn-in time of maybe 1000 cycles, after which you start collecting samples. After another maybe 10,000 cycles, the average of the samples is what you take as an answer.
It requires diagnostics and tuning. Typically the samples are plotted, and what you are looking for is a "fuzzy caterpilar" plot that is stable (doesn't move around much) and has a high acceptance rate (very fuzzy). The main parameter you can play with is sigma.
If sigma is too small, the plot will be fuzzy but it will wander around.
If it is too large, the plot will not be fuzzy - it will have horizontal segments.
Often the starting vector x is chosen at random, and often multiple starting vectors are chosen, to see if they end up in the same place.
It is not necessary to vary all components of the state vector x at the same time. You can cycle through them, varying one at a time, or some such method.
Also, if you don't need the diagnostic plot, it may not be necessary to save the samples, but just calculate the average and variance on the fly.
In the applications I'm familiar with, P(x) is a measure of probability, and it is typically in log-space, so it can vary from 0 to negative infinity.
Then to do the "maybe accept" step it is (exp(logp' - logp))
Unless I've made an error, here are exact solutions (obtained through a dynamic programming approach):
N Dist Sites
2 60950 {10,4840}
3 40910 {10,2890,6760}
4 30270 {10,2250,4840,7840}
5 23650 {10,1690,3610,5760,8410}
6 19170 {10,1210,2560,4410,6250,8410}
7 15840 {10,1000,2250,3610,5290,7290,9000}
8 13330 {10,810,1960,3240,4410,5760,7290,9000}
9 11460 {10,810,1690,2890,4000,5290,6760,8410,9610}
10 9850 {10,640,1440,2250,3240,4410,5760,7290,8410,9610}
11 8460 {10,640,1440,2250,3240,4410,5290,6250,7290,8410,9610}
12 7350 {10,490,1210,1960,2890,3610,4410,5290,6250,7290,8410,9610}
13 6470 {10,490,1000,1690,2250,2890,3610,4410,5290,6250,7290,8410,9610}
14 5800 {10,360,810,1440,1960,2560,3240,4000,4840,5760,6760,7840,9000,10240}
15 5190 {10,360,810,1440,1960,2560,3240,4000,4840,5760,6760,7840,9000,9610,10240}
16 4610 {10,360,810,1210,1690,2250,2890,3610,4410,5290,6250,7290,8410,9000,9610,10240}
17 4060 {10,360,810,1210,1690,2250,2890,3610,4410,5290,6250,7290,7840,8410,9000,9610,10240}
18 3550 {10,360,810,1210,1690,2250,2890,3610,4410,5290,6250,6760,7290,7840,8410,9000,9610,10240}
19 3080 {10,360,810,1210,1690,2250,2890,3610,4410,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
20 2640 {10,250,640,1000,1440,1960,2560,3240,4000,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
21 2230 {10,250,640,1000,1440,1960,2560,3240,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
22 1860 {10,250,640,1000,1440,1960,2560,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
23 1520 {10,250,490,810,1210,1690,2250,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
24 1210 {10,250,490,810,1210,1690,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
25 940 {10,250,490,810,1210,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
26 710 {10,160,360,640,1000,1440,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
27 500 {10,160,360,640,1000,1210,1440,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
28 330 {10,160,360,640,810,1000,1210,1440,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
29 200 {10,160,360,490,640,810,1000,1210,1440,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
30 100 {10,90,250,360,490,640,810,1000,1210,1440,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
31 30 {10,90,160,250,360,490,640,810,1000,1210,1440,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
32 0 {10,40,90,160,250,360,490,640,810,1000,1210,1440,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}